Measure Theory

\[\newcommand{\st}{\, : \:} \newcommand{\ind}[1]{\mathbf{1}_{#1}} \newcommand{\dd}{\mathrm{d}}\]

This appendix provides a concise overview of the key concepts and results from Measure Theory; it covers more than is actually used throughout the book. Some proofs are only sketched: in particular, long but simple points are stated with few details. In such cases, however, precise references are given. We refer to the excellent (Bogachev 2007) for a comprehensive treatment of measure theory.

Measure Spaces

Takeaway of Section 1

The concepts of measure, volume, perimeter, area, probability and so on are mathematically formalized by a triple \((\Omega,\mathcal{F},\mu)\). \(\Omega\) corresponds to the space where we want to measure; it can be any non-empty set. \(\mathcal{F}\) is the actual family of subsets of \(\Omega\) that we want to measure, the so-called measurable events. \(\mathcal{F}\) should satisfy some intuitive conditions (e.g., if we can measure the occurrence of \(A\) and \(B\), we can measure the non-occurrence of \(A\) or the occurrence of \(A\cup B\)). \(\mu\) is the actual measure (or volume, probability etc.), so it is a map that associates to each measurable set \(A\in \mathcal{F}\) a number \(\mu(A)\). The basic property is that \(\mu(A\sqcup B)=\mu(A)+\mu(B)\), where \(A\sqcup B\) denotes the union of two disjoint sets \(A\) and \(B\).

There are two problems to keep in mind:

  • We want \(\mu(A\sqcup B)=\mu(A)+\mu(B)\) to hold also for infinite (countable) sums and unions, because in Math people like to take limits.
  • We want to identify measures without the need to specify their values on each measurable set. For instance, on the real numbers, we can decide that the length of an interval \((a,b]\) is \(b-a\), and that should be enough to characterize the way we measure lengths on \(\mathbb{R}\).

These two problems complicate matters a bit. In this section we give some basic definitions and results to tackle those problems later.

Sigma-Algebras and Measurable Spaces

Just as topologies provide a rigorous framework for concepts like proximity and convergence, \(\sigma\)-algebras and filtrations formalize the notions of available information and measurability.

Definition 1 (Pi-System) A collection \(\mathcal{P}\) of subsets of a set \(\Omega\) is a \(\pi\)-system if it satisfies

  1. \(\mathcal{P}\) is non-empty.
  2. For \(A,B \in \mathcal{P}\), \(A\cap B \in \mathcal{P}\).

A collection \(\mathcal{P}\) of subsets of a set \(\Omega\) is a quasi \(\pi\)-system, or q\(\pi\)-system for short, if it satisfies

  1. \(\mathcal{P}\) is non-empty.
  2. For \(A,B \in \mathcal{P}\), there exist finitely many pairwise disjoint \(C_1,\ldots,C_n \in \mathcal{P}\) such that \(A\cap B = \cup_i C_i\).

Definition 2 (Lambda-System) A collection \(\mathcal{L}\) of subsets of a set \(\Omega\) is a \(\lambda\)-system if it satisfies the following properties:

  1. \(\emptyset \in \mathcal{L}\).
  2. If \(A \in \mathcal{L}\), then \(A^c \in \mathcal{L}\) (closed under complementation).
  3. If \((A_n)_{n=1}^\infty\) is a sequence of pairwise disjoint sets in \(\mathcal{L}\), then \(\bigcup_{n=1}^\infty A_n \in \mathcal{L}\) (closed under countable disjoint unions).

Definition 3 (sigma-algebra) A \(\sigma\)-algebra on a set \(\Omega\) is a collection \(\mathcal{F}\) of subsets of \(\Omega\) that is both a q\(\pi\)-system and \(\lambda\)-system. Equivalently it satisfies:

  1. \(\emptyset \in \mathcal{F}\).
  2. If \(A \in \mathcal{F}\), then \(A^c \in \mathcal{F}\) (closed under complementation).
  3. If \((A_n)_{n=1}^\infty\) is a sequence of sets in \(\mathcal{F}\), then \(\bigcup_{n=1}^\infty A_n \in \mathcal{F}\) (closed under countable unions).

A measurable space is a pair \((\Omega, \mathcal{F})\), where \(\Omega\) is a non-empty set and \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\).

Remark. The intersection of any collection of \(\sigma\)-algebras (respectively \(\pi\)-systems, \(\lambda\)-systems) on \(\Omega\) is also a \(\sigma\)-algebra (respectively a \(\pi\)-system, a \(\lambda\)-system). This allows us to define the \(\sigma\)-algebra generated by a family of sets \(\mathcal{C}\), denoted by \(\sigma(\mathcal{C})\), as the weakest (also called coarsest or smallest) \(\sigma\)-algebra containing \(\mathcal{C}\) (as well as the \(\pi\)-system and the \(\lambda\)-system generated by \(\mathcal{C}\)). It is the intersection of all \(\sigma\)-algebras containing \(\mathcal{C}\). More generally, by the weakest \(\sigma\)-algebra such that a given property holds we mean the intersection of all such \(\sigma\)-algebras.
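On a finite set, the generated \(\sigma\)-algebra of the previous Remark can be computed by brute force. The following Python sketch is only an illustration under that finiteness assumption (countable unions reduce to finite ones); the function name `generate_sigma_algebra` is hypothetical and not taken from the text.

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Smallest family containing `generators` that is closed under
    complement and pairwise union. On a finite set this is sigma(generators)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)  # snapshot, so we can add to `family` safely
        for a in current:
            complement = omega - a
            if complement not in family:
                family.add(complement)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family

omega = {1, 2, 3, 4}
sigma = generate_sigma_algebra(omega, [{1}, {1, 2}])
print(sorted(sorted(s) for s in sigma))
```

Here the generators \(\{1\}\) and \(\{1,2\}\) separate the points \(1\) and \(2\) from the rest, but never \(3\) from \(4\), so the resulting family has eight elements and does not contain \(\{3\}\).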

Example 1 (Trivial and Discrete sigma-algebras) For any set \(\Omega\), the power set (the set of all subsets of \(\Omega\)) is a \(\sigma\)-algebra, called the discrete \(\sigma\)-algebra. This is the strongest (maximal) \(\sigma\)-algebra on \(\Omega\).

The collection \(\{\emptyset, \Omega\}\) is also a \(\sigma\)-algebra, called the trivial \(\sigma\)-algebra. This is the weakest \(\sigma\)-algebra on \(\Omega\).

Example 2 (Borel sigma-algebras) If \((\Omega,\mathcal{T})\) is a topological space, the \(\sigma\)-algebra \(\sigma(\mathcal{T})\) generated by the open sets (equivalently the closed sets) is called the Borel \(\sigma\)-algebra. Its elements are called Borel sets or Borelians. When \(\mathcal{T}\) is understood, the Borel \(\sigma\)-algebra is often denoted \(\mathcal{B}(\Omega)\). This is usually the case on Euclidean spaces, so that \(\mathcal{B}(\mathbb{R})\) denotes the Borel \(\sigma\)-algebra generated by the standard topology on \(\mathbb{R}\).

If \((\Omega,\mathcal{T})\) is a second-countable topological space, and \(\mathcal{B}\) is a base of the topology \(\mathcal{T}\), then the Borel \(\sigma\)-algebra coincides with the one generated by \(\mathcal{B}\), \(\sigma(\mathcal{T})=\sigma(\mathcal{B})\). For instance, on \(\mathbb{R}\) the Borel \(\sigma\)-algebra is generated by intervals.

Example 3 (Classes of intervals) If \(\Omega=\mathbb{R}\), intervals of the form \((a,b]\), together with the empty set, form a \(\pi\)-system. The same can be said for open or closed intervals, or intervals of the form \([a,b)\). Similar statements hold for rectangles in \(\mathbb{R}^2\).

If \(\Omega=S^1\), the class of closed arcs of \(\Omega\) is a q\(\pi\)-system but not a \(\pi\)-system. As opposed to the case \(\Omega=\mathbb{R}\), the intersection of two arcs can be a union of two arcs.

Theorem 1 (Pi-Lambda Theorem) If \(\mathcal{P}\) is a q\(\pi\)-system and \(\mathcal{L}\) is a \(\lambda\)-system such that \(\mathcal{P} \subset \mathcal{L}\), then \(\sigma(\mathcal{P}) \subset \mathcal{L}\). In other words, the weakest \(\sigma\)-algebra and the weakest \(\lambda\)-system generated by a q\(\pi\)-system coincide.

Proof (Theorem 1). With no loss of generality, we can assume \(\mathcal{L}\) to be the weakest \(\lambda\)-system containing \(\mathcal{P}\). For \(A\in \mathcal{L}\) define \[ \mathcal{L}_A:= \left\{B \subset \Omega : A\cap B \in \mathcal{L} \right\} \] Notice that for \(A\in \mathcal{L}\)

  • \(\emptyset \in \mathcal{L}_A\).
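  • If \(B\in \mathcal{L}_A\), then \(A\cap B^c = (A^c \cup (A \cap B) )^c \in \mathcal{L}\), since \(A^c\) and \(A\cap B\) are disjoint elements of \(\mathcal{L}\). Namely \(\mathcal{L}_A\) is closed under complementation.
  • \(\mathcal{L}_A\) is closed under countable disjoint union, since \(A \cap (\cup_i B_i)= \cup_i (A\cap B_i)\).
  • If \(A\in \mathcal{P}\) and \(B\in \mathcal{P}\), then \(A\cap B \in \mathcal{L}\) as a finite disjoint union of elements of \(\mathcal{P}\subset \mathcal{L}\).

Therefore, for \(A\in \mathcal{P}\), \(\mathcal{L}_A\) is a \(\lambda\)-system containing \(\mathcal{P}\), and by the minimality of \(\mathcal{L}\), we conclude \(\mathcal{L}_A \supset \mathcal{L}\). Since \(A\cap B = B \cap A\), this means that \(\mathcal{P}\subset \mathcal{L}_A\) for every \(A\in \mathcal{L}\), and the same minimality argument gives \(\mathcal{L}_A \supset \mathcal{L}\) for all \(A\in \mathcal{L}\). This in turn means that, for \(A,B \in \mathcal{L}\), it holds \(A\cap B\in \mathcal{L}\). In other words \(\mathcal{L}\) is a \(\pi\)-system and a \(\lambda\)-system, and thus a \(\sigma\)-algebra containing \(\mathcal{P}\), so that \(\sigma(\mathcal{P})\subset \mathcal{L}\).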
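To see why some stability under intersections is required in Theorem 1, one can check by brute force that a \(\lambda\)-system need not be a \(\sigma\)-algebra. The following Python sketch is an illustration only: the helper names `subsets`, `is_pi_system` and `is_lambda_system` are hypothetical, and the check assumes a finite \(\Omega\), where closure under countable disjoint unions reduces (by induction) to closure under pairwise disjoint unions.

```python
from itertools import combinations

def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_pi_system(family):
    """Non-empty and closed under pairwise intersections."""
    fam = set(family)
    return len(fam) > 0 and all((a & b) in fam for a in fam for b in fam)

def is_lambda_system(omega, family):
    """Contains the empty set, closed under complements and disjoint unions."""
    fam = set(family)
    omega = frozenset(omega)
    if frozenset() not in fam:
        return False
    if any((omega - a) not in fam for a in fam):
        return False
    return all((a | b) in fam for a in fam for b in fam if not (a & b))

omega = {1, 2, 3, 4}
# Subsets of even cardinality: a classical lambda-system that is not a pi-system.
even = [s for s in subsets(omega) if len(s) % 2 == 0]
print(is_lambda_system(omega, even), is_pi_system(even))  # prints: True False
```

On \(\Omega=\{1,2,3,4\}\) the subsets of even cardinality form a \(\lambda\)-system, but \(\{1,2\}\cap\{2,3\}=\{2\}\) has odd cardinality, so the family is neither a \(\pi\)-system nor a \(\sigma\)-algebra.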

Sometimes it is convenient to use algebras and monotone classes instead of \(\sigma\)-algebras, \(\pi\)-systems and \(\lambda\)-systems.

Definition 4 (Algebra) A collection \(\mathcal{A}\) of subsets of a set \(\Omega\) is an algebra (or field) if it satisfies the following properties:

  1. \(\Omega \in \mathcal{A}\).
  2. If \(A \in \mathcal{A}\), then \(A^c \in \mathcal{A}\) (closed under complementation).
  3. If \(A, B \in \mathcal{A}\), then \(A \cup B \in \mathcal{A}\) (closed under finite unions).

Definition 5 (Monotone Class) A collection \(\mathcal{M}\) of subsets of a set \(\Omega\) is a monotone class if it satisfies the following properties:

  1. If \((A_n)_{n=1}^\infty\) is an increasing sequence of sets in \(\mathcal{M}\) (i.e., \(A_1 \subset A_2 \subset A_3 \subset \dots\)), then \(\bigcup_{n=1}^\infty A_n \in \mathcal{M}\) (closed under countable increasing unions).
  2. If \((B_n)_{n=1}^\infty\) is a decreasing sequence of sets in \(\mathcal{M}\) (i.e., \(B_1 \supset B_2 \supset B_3 \supset \dots\)), then \(\bigcap_{n=1}^\infty B_n \in \mathcal{M}\) (closed under countable decreasing intersections).

Theorem 2 (Monotone Class Theorem) Let \(\mathcal{A}\) be an algebra of subsets of a set \(\Omega\). Let \(\mathcal{M}\) be a monotone class of subsets of \(\Omega\). If \(\mathcal{A} \subset \mathcal{M}\), then \(\sigma(\mathcal{A}) \subset \mathcal{M}\).

Proof (Theorem 2). With no loss of generality, assume that \(\mathcal{M}\) is the minimal monotone class containing \(\mathcal{A}\). We will prove that \(\mathcal{M}\) is a \(\sigma\)-algebra.

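For \(A \in \mathcal{M}\), let \(\mathcal{M}_A = \{B \subset \Omega : A \cap B \in \mathcal{M}, B^c \in \mathcal{M}\}\). It is straightforward to verify that \(\mathcal{M}_A\) is a monotone class containing \(\mathcal{A}\): first for \(A\in \mathcal{A}\), which gives \(\mathcal{M}_A \supset \mathcal{M}\) for such \(A\), and then, using this and \(A\cap B = B\cap A\), for every \(A\in \mathcal{M}\). Thus \(\mathcal{M}_A \supset \mathcal{M}\) for all \(A\in \mathcal{M}\), which in particular implies that \(\mathcal{M}\) is closed under complementation and finite intersections, namely it is an algebra. But a monotone class which is also an algebra is a \(\sigma\)-algebra, as can be easily checked.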

Classes of Measurable Spaces

In this section we review a few properties that define “well-behaved” classes of measurable spaces. These can be viewed as the measurable analogue of the various regularity properties for topological spaces.

Definition 6 (Atoms) In a measurable space \((\Omega, \mathcal{F})\), the atoms are the subsets of \(\Omega\) obtained as equivalence classes of the equivalence relation on \(\Omega\): \(x\sim y\) if and only if \[ \mathbf{1}_A(x)= \mathbf{1}_A(y) \quad \forall A \in \mathcal{F} \]

Definition 7 (Hausdorff and Separable spaces) The measurable space \((\Omega, \mathcal{F})\) is called:

  • Hausdorff if all atoms are singletons (points of \(\Omega\)).

  • Separable if \(\mathcal{F}\) is generated by a countable collection of sets of \(\Omega\).

Example 4 \(\mathbb{R}\) with the Borel \(\sigma\)-algebra is Hausdorff and separable.

As a counterexample to the Hausdorff property, consider the space \(\Omega = \{a, b, c\}\) with the \(\sigma\)-algebra \(\mathcal{F} = \{\emptyset, \{a\}, \{b, c\}, \Omega\}\). This space is not Hausdorff because the atoms are \(\{a\}\) and \(\{b,c\}\).

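As a counterexample to separability, \(\mathbb{R}\) with the discrete \(\sigma\)-algebra is not separable: a countably generated \(\sigma\)-algebra has at most the cardinality of the continuum, while the power set of \(\mathbb{R}\) is strictly larger.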
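The atoms of Definition 6 can be computed directly on a finite space by grouping points according to their membership pattern. The snippet below is a minimal Python sketch of this idea on the three-point counterexample above; the function name `atoms` is a hypothetical helper introduced for illustration.

```python
def atoms(omega, sigma_algebra):
    """Group the points of omega by their indicator values on every set of
    the sigma-algebra: x ~ y iff 1_A(x) == 1_A(y) for all A."""
    sets = [frozenset(a) for a in sigma_algebra]
    classes = {}
    for x in omega:
        signature = tuple(x in a for a in sets)
        classes.setdefault(signature, set()).add(x)
    return {frozenset(c) for c in classes.values()}

omega = {"a", "b", "c"}
F = [set(), {"a"}, {"b", "c"}, omega]
result = atoms(omega, F)
# Hausdorff (in the measurable sense) means every atom is a singleton.
is_hausdorff = all(len(atom) == 1 for atom in result)
print(result, is_hausdorff)
```

No set of \(\mathcal{F}\) separates \(b\) from \(c\), so they fall in the same atom and the space is not Hausdorff.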

Example 5 If \((\Omega,\mathcal{T})\) is Hausdorff (topological sense), the Borelian measurable space \((\Omega,\sigma(\mathcal{T}))\) is Hausdorff (measurable sense).

Measurable Functions

Definition 8 (Measurable Function) Let \((\Omega_1, \mathcal{F}_1)\) and \((\Omega_2, \mathcal{F}_2)\) be measurable spaces. A function \(f: \Omega_1 \to \Omega_2\) is called measurable (or \(\mathcal{F}_1 / \mathcal{F}_2\)-measurable) if for every \(A \in \mathcal{F}_2\), the preimage \(f^{-1}(A) = \{\omega \in \Omega_1: f(\omega) \in A\}\) is in \(\mathcal{F}_1\).

Proposition 1 Let \((\Omega_1, \mathcal{F}_1)\) and \((\Omega_2, \mathcal{F}_2)\) be measurable spaces, and let \(\mathcal{C}\) be a collection of subsets in \(\Omega_2\) such that \(\mathcal{F}_2=\sigma(\mathcal{C})\). Then \(f\) is \(\mathcal{F}_1 / \mathcal{F}_2\)-measurable if and only if: \[ f^{-1}(E)\in \mathcal{F}_1, \quad \forall E\in \mathcal{C} \]

Proof (Proposition 1). The “only if” part is immediate, since \(\mathcal{C}\subset \mathcal{F}_2\). Conversely, set operations are natural under pull-backs (a fancy way to say that \(f^{-1}(E\cap F) = f^{-1}(E) \cap f^{-1}(F)\), and similar properties for the elementary set operations). Therefore it is easy to check that \(\mathcal{F}:= \{E\subset \Omega_2 : f^{-1}(E)\in \mathcal{F}_1\}\) is a \(\sigma\)-algebra. Since it contains \(\mathcal{C}\), it contains \(\mathcal{F}_2\), namely \(f\) is \(\mathcal{F}_1 / \mathcal{F}_2\)-measurable.
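Proposition 1 can be illustrated on finite spaces, where both sides of the equivalence can be checked exhaustively. In the Python sketch below, functions are represented as dictionaries and all names (`power_set`, `preimage`, `pulls_back_into`, and the example spaces) are hypothetical choices made for illustration; the family \(\mathcal{C}=\{\{x\},\{y\}\}\) generates the power set of \(\{x,y,z\}\), since \(\{z\}\) is the complement of \(\{x\}\cup\{y\}\).

```python
from itertools import combinations

def power_set(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def preimage(f, target):
    """f is a dict omega_1 -> omega_2; returns f^{-1}(target)."""
    return frozenset(x for x, y in f.items() if y in target)

def pulls_back_into(f, F1, family):
    """True if the preimage of every set in `family` belongs to F1."""
    F1 = {frozenset(a) for a in F1}
    return all(preimage(f, e) in F1 for e in family)

omega1 = {1, 2, 3, 4}
F1 = [set(), {1, 2}, {3, 4}, omega1]
omega2 = {"x", "y", "z"}
C = [{"x"}, {"y"}]        # generators of the power set of omega2
F2 = power_set(omega2)    # the sigma-algebra generated by C

f = {1: "x", 2: "x", 3: "y", 4: "y"}   # constant on the atoms of F1
g = {1: "x", 2: "y", 3: "y", 4: "y"}   # splits the atom {1, 2}

print(pulls_back_into(f, F1, C), pulls_back_into(f, F1, F2))
print(pulls_back_into(g, F1, C), pulls_back_into(g, F1, F2))
```

Checking the two generators gives the same verdict as checking all eight sets of \(\mathcal{F}_2\), for both the measurable map `f` and the non-measurable map `g`.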

If \((\Omega_2, \mathcal{F}_2)\) is a measurable space, a collection of functions \(\mathcal{F}\subset \Omega_2^{\Omega_1}\) induces on \(\Omega_1\) a \(\sigma\)-algebra \(\mathcal{F}_1 := \sigma(\mathcal{F}) \equiv \sigma\big(\{f^{-1}(A) : f\in \mathcal{F}, A\in \mathcal{F}_2\}\big)\): the weakest \(\sigma\)-algebra such that all functions in \(\mathcal{F}\) are measurable.

Remark. Checking the measurability of functions can be a daunting (and somewhat boring) task. Some standard ways to easily determine that a function is measurable are:

  • The composition of measurable functions is measurable (trivially).
  • Continuous functions between topological spaces are measurable with respect to the respective Borel \(\sigma\)-algebras (from Proposition 1).
  • Let \(f_n \colon \Omega \to \mathbb{R}\cup \{-\infty\} \cup \{+\infty\}\) be a sequence of Borel measurable functions. The pointwise supremum and infimum of \(f_n\) are also measurable (intervals \((-\infty,c]\) generate the Borel \(\sigma\)-algebra and \(\{\sup_n f_n \le c\}= \cap_n \{f_n \le c\}\)). In particular, \(\varlimsup_n f_n\) and \(\varliminf_n f_n\) are measurable.

Nontrivial examples

In general measurability issues do not receive much love in the world of Mathematics, since they are either trivial or artificial. There are of course exceptions.

A typical measurability problem is the following, which arises in optimization theory: for each “random” outcome \(\omega\in \Omega\), we have several optimal strategies \(A_\omega \subset E\). Can we measurably sample an optimal strategy? Namely, can we find a measurable map \(f\colon \Omega \to E\) such that \(f(\omega)\in A_\omega\) for \(\omega \in \Omega\)? Measurability here is crucial, since it is a minimal requirement to practically construct the function. If we have no further information, we can only assume the axiom of choice, which will give us the existence of an optimal \(f(\omega)\in E\) with \(f(\omega)\in A_\omega\). However, the axiom of choice is the standard way to define non-constructive objects, and typically non-measurable. This approach has no interest in the optimization framework. A more interesting (measurable) result is due to Kuratowski and Ryll-Nardzewski, and features many variants, see (Bogachev 2007, Theorem 6.9.3).

Theorem 3 (Kuratowski and Ryll-Nardzewski Selection) Let \((\Omega,\mathcal{F})\) be a measurable space and \((E,\mathcal{E})\) be a Polish space, where \(\mathcal{E}\) is the Borel \(\sigma\)-algebra. Suppose that we have a map \(\Omega \ni \omega \mapsto A_\omega \in \mathcal{E}\) such that

  • \(A_\omega\) is a non-empty closed set of \(E\), for all \(\omega \in \Omega\).
  • \(\{\omega \in \Omega : A_\omega \cap O = \emptyset \} \in \mathcal{F}\) for all \(O\) open in \(E\) (weak measurability of \(A_\omega\)).

Then there exists a measurable map \(f\colon \Omega \to E\) such that \(f(\omega)\in A_\omega\) for all \(\omega \in \Omega\).

Another example concerns the following problem. Let \((E,\mathcal{E})\) be a measurable space and \(\Omega\) non-empty. A function \(f\colon \Omega \to E\) defines a \(\sigma\)-algebra on \(\Omega\), let’s call it \(\mathcal{F}\), defined as the weakest \(\sigma\)-algebra such that \(f\) is measurable. Let \(g\) be another function \(g\colon \Omega \to E\) and assume that \(g\) is \(\mathcal{F}\)-measurable. Is it true that \(g\) is a function of \(f\)? Namely, does a measurable \(h\colon E\to E\) such that \(g= h \circ f\) exist? One expects an affirmative answer, since for \(g\) to be \(\mathcal{F}\)-measurable, it has to be determined by \(f\). The answer is indeed yes, but hypotheses are needed: If \((E,\mathcal{E})\) is a Polish space (or measurably isomorphic to a Polish space), then indeed there exists such an \(h\).

Product of measurable spaces

Definition 9 (Product Sigma-Algebra) Let \(T\) be an arbitrary index set. For each \(t \in T\), let \((\Omega_t, \mathcal{F}_t)\) be a measurable space. For \(J \subset T\), define the product space \[ \Omega_J := \prod_{t \in J} \Omega_t \] and let \(\mathcal{F}_J = \prod_{t \in J} \mathcal{F}_t\) be the product \(\sigma\)-algebra on \(\Omega_J\), defined as the \(\sigma\)-algebra generated by the cylinder sets of the form \(\prod_{t \in J} A_t\), where \(A_t \in \mathcal{F}_t\) for all \(t \in J\) and \(A_t = \Omega_t\) for all but finitely many \(t\).

In the previous Definition 9, we restricted to a subset \(J\subset T\) so that we can also define a \(\sigma\)-algebra on \(\Omega_T\) (not just on \(\Omega_J\)) induced by a subset \(J\subset T\). This is the weakest \(\sigma\)-algebra on \(\Omega_T\) such that the canonical projection \(\pi^T_J \colon \Omega_T \to \Omega_J\) is measurable. That is, we can consider on \(\Omega_T\) the \(\sigma\)-algebra \(\{(\pi^T_J)^{-1}(A), A\in \mathcal{F}_J \}\). With a somewhat confusing notation, this \(\sigma\)-algebra is usually still denoted \(\mathcal{F}_J\).

Measures

Definition 10 (Measure) Let \((\Omega, \mathcal{F})\) be a measurable space (i.e., \(\Omega\) is a set and \(\mathcal{F}\) is a \(\sigma\)-algebra on \(\Omega\)). A measure \(\mu\) on \((\Omega, \mathcal{F})\) is a function \(\mu: \mathcal{F} \to [0, \infty]\) such that:

  1. \(\mu(\emptyset) = 0\).
  2. (Countable Additivity) If \((A_n)_{n=1}^\infty\) is a sequence of pairwise disjoint sets in \(\mathcal{F}\) (i.e., \(A_i \cap A_j = \emptyset\) for \(i \neq j\)), then \(\mu\left(\bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty \mu(A_n)\).

A triple \((\Omega, \mathcal{F}, \mu)\), where \((\Omega, \mathcal{F})\) is a measurable space and \(\mu\) a measure, is called a measure space.

A measure \(\mu\) is a probability measure if \(\mu(\Omega)=1\). In such a case \((\Omega, \mathcal{F}, \mu)\) is called a probability space.

A measure \(\mu\) is \(\sigma\)-finite if there exists a countable collection of measurable sets \(\Omega_i \in \mathcal{F}\) such that \(\Omega=\bigcup_i \Omega_i\) and \(\mu(\Omega_i)<\infty\).

A set function with properties similar to a measure, but defined on a collection of sets which is not necessarily a \(\sigma\)-algebra, is called a content. In the literature, this is also called a premeasure, in particular when the \(\sigma\)-algebra is replaced by an algebra.

Definition 11 (Content) Let \(\Omega\) be a non-empty set and \(\mathcal{A}\) a collection of subsets of \(\Omega\) with \(\emptyset \in \mathcal{A}\). A map \(\mu\colon \mathcal{A}\to [0,\infty]\) is an outer content if

  • \(\mu(\emptyset)=0\) (or equivalently \(\mu(\emptyset)<\infty\)).
  • If \(A,A_1,\ldots,A_n \in \mathcal{A}\) with \(A\subset \cup_{i=1}^n A_i\), then \(\mu(A)\le \sum_{i=1}^n \mu(A_i)\).

\(\mu\) is a content if the last condition is replaced by the stronger

  • \(\mu(\cup_{i=1}^n A_i)= \sum_{i=1}^n \mu(A_i)\) for any finite collection \((A_i)_{i=1}^n\) of pairwise disjoint elements of \(\mathcal{A}\) such that \(\cup_{i=1}^n A_i\in \mathcal{A}\).

\(\mu\) is a \(\sigma\)-additive content if the last condition holds on countable collections:

  • \(\mu(\cup_i A_i)= \sum_i \mu(A_i)\) for any countable collection \((A_i)_i\) of pairwise disjoint elements of \(\mathcal{A}\) such that \(\cup_i A_i\in \mathcal{A}\).

Proposition 2 Let \((\Omega, \mathcal{F})\) be a measurable space, and let \(\mu\) and \(\nu\) be two probability measures on \(\mathcal{F}\). If \(\mathcal{P}\) is a \(\pi\)-system such that \(\sigma(\mathcal{P}) = \mathcal{F}\), and \(\mu(A) = \nu(A)\) for all \(A \in \mathcal{P}\), then \(\mu = \nu\) on \(\mathcal{F}\).

Proof (Proposition 2). It is easy to check that the collection of sets on which the two probability measures coincide is a \(\lambda\)-system. Since it contains a \(\pi\)-system that generates \(\mathcal{F}\), they coincide on the entire \(\sigma\)-algebra by Theorem 1.

Example 6 Let \(\Omega = \mathbb{R}\), and let \(\mathcal{P} = \{(-\infty, a] : a \in \mathbb{R}\}\). \(\mathcal{P}\) is a \(\pi\)-system. The \(\sigma\)-algebra generated by \(\mathcal{P}\), \(\sigma(\mathcal{P})\), is the Borel \(\sigma\)-algebra on \(\mathbb{R}\). By the Pi-Lambda Theorem, any \(\lambda\)-system \(\mathcal{L}\) with \(\mathcal{P} \subset \mathcal{L}\) contains every Borel set. The inclusion \(\mathcal{P} \subset \mathcal{L}\) is essential: the collection of all subsets \(A \subset \mathbb{R}\) such that \(A\) or \(A^c\) is countable is a \(\lambda\)-system (indeed a \(\sigma\)-algebra), but it contains no interval \((-\infty,a]\), and it does not contain the Borel \(\sigma\)-algebra.

Furthermore, by Proposition 2, if two probability measures \(\mu\) and \(\nu\) on \((\mathbb{R}, \mathcal{B}(\mathbb{R}))\) agree on all intervals of the form \((-\infty, a]\), then \(\mu = \nu\). The (cumulative) distribution function \(F_\mu\) of a probability measure \(\mu\) is defined by \(F_\mu(x) = \mu((-\infty, x])\), so this shows that the distribution function uniquely determines the probability measure.

Pushforward of Measures

If \((\Omega,\mathcal{F})\) and \((E,\mathcal{E})\) are measurable spaces, then a measurable \(f\colon \Omega \to E\) can be lifted to a linear map \(f_\sharp\) mapping measures on \(\Omega\) to measures on \(E\) as \[ f_\sharp \mu := \mu \circ f^{-1} \] which means, for \(A\in \mathcal{E}\) \[ (f_\sharp \mu)(A):= \mu(\{ x\in \Omega : f(x)\in A \}) \] Notice that \(f_\sharp\) maps probabilities to probabilities.

Remark. If we let \(\wp(\Omega)\) be the space of probability measures over \(\Omega\), we can consider the map \(\Omega \ni x \mapsto \delta_x \in \wp(\Omega)\) (an injection when \((\Omega,\mathcal{F})\) is Hausdorff), where the Dirac mass at \(x\) is defined as the probability measure \(\delta_x(A)=\mathbf{1}_A(x)\). The map \(f_\sharp\) then features the following properties

  • \(f_\sharp \delta_x= \delta_{f(x)}\)
  • \(f_\sharp (\alpha \mu + (1-\alpha) \nu)= \alpha f_\sharp \mu +(1-\alpha)f_\sharp \nu\)

We will see when discussing topologies of probabilities that these two properties characterize \(f_\sharp\) among continuous maps \(\wp(\Omega)\to \wp(E)\).
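For finitely supported measures, the pushforward and the two properties of the Remark can be verified by direct computation. The following Python sketch represents a discrete measure as a dictionary from points to masses; this representation and the helper names (`pushforward`, `dirac`, `mix`, `square`) are assumptions made for illustration only.

```python
from fractions import Fraction

def pushforward(f, mu):
    """f_# mu for a finitely supported measure mu given as {point: mass}."""
    nu = {}
    for x, mass in mu.items():
        y = f(x)
        nu[y] = nu.get(y, 0) + mass   # mass of f^{-1}({y}) accumulates on y
    return nu

def dirac(x):
    """Dirac mass at x."""
    return {x: Fraction(1)}

def mix(alpha, mu, nu):
    """Convex combination alpha * mu + (1 - alpha) * nu."""
    out = {}
    for x, m in mu.items():
        out[x] = out.get(x, 0) + alpha * m
    for x, m in nu.items():
        out[x] = out.get(x, 0) + (1 - alpha) * m
    return out

def square(x):
    return x * x

mu = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}
nu = {2: Fraction(1)}
alpha = Fraction(1, 3)

print(pushforward(square, mu))
print(pushforward(square, dirac(3)) == dirac(9))
print(pushforward(square, mix(alpha, mu, nu))
      == mix(alpha, pushforward(square, mu), pushforward(square, nu)))
```

Since `square` maps both \(-1\) and \(1\) to \(1\), their masses add up under the pushforward, while the total mass stays equal to \(1\).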

Continuity of Finite Measures

Given a sequence (or a net) \((A_n)\), we can define \[ \varlimsup_n A_n:= \cap_{k\in \mathbb{N}} \cup_{n \ge k}A_n, \qquad \varliminf_n A_n := \cup_{k\in \mathbb{N}} \cap_{n \ge k}A_n \] Whenever \(\varlimsup_n A_n= \varliminf_n A_n\) we say that \(A_n\) converges and denote \(\lim_n A_n\) the limit. Notice that monotone sequences converge.

Remark. Equivalently, \(\varlimsup_n A_n\) is the set of points \(x\in \Omega\) such that \(x\in A_n\) for infinitely many \(n\). \(\varliminf_n A_n\) is the set of \(x\in \Omega\) such that \(x\not \in A_n\) for at most finitely many \(n\), that is, \(x\in A_n\) for all but finitely many \(n\).

Proposition 3 Let \((\Omega, \mathcal{F},\mu)\) be a finite measure space and \((A_n)\) a sequence of measurable sets. Then \(\varliminf_n A_n\) and \(\varlimsup_n A_n\) are measurable and \[ \varliminf_n \mu(A_n) \ge \mu(\varliminf_n A_n), \qquad \varlimsup_n \mu(A_n) \le \mu(\varlimsup_n A_n) \tag{1}\] In particular whenever \(A_n\) converges, \(\lim_n \mu(A_n)=\mu(\lim_n A_n)\).
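The definitions of \(\varlimsup_n A_n\), \(\varliminf_n A_n\) and the inequalities in Equation 1 can be explored numerically. The Python sketch below uses a hypothetical sequence `A` that is arbitrary for \(n<5\) and alternates afterwards, together with the counting measure on a finite set. It only evaluates finite-horizon versions of the tail unions and intersections (with cut-offs `K` and `N`, both assumptions of this illustration), which agree with the true limits here because the sequence is periodic from \(n=5\) on.

```python
def A(n):
    """Hypothetical sequence of sets: arbitrary for n < 5, then alternating."""
    if n < 5:
        return {0, n}
    return {1, 2} if n % 2 == 0 else {2, 3}

def tail_limits(seq, K, N):
    """Finite-horizon limsup and liminf of sets:
    the intersection over k < K of the unions over k <= n < N,
    and the union over k < K of the intersections over k <= n < N."""
    tail_unions = [set().union(*(seq(n) for n in range(k, N)))
                   for k in range(K)]
    tail_inters = [set.intersection(*(seq(n) for n in range(k, N)))
                   for k in range(K)]
    limsup = set.intersection(*tail_unions)
    liminf = set().union(*tail_inters)
    return limsup, liminf

limsup_A, liminf_A = tail_limits(A, K=10, N=20)
sizes = [len(A(n)) for n in range(5, 20)]   # counting measure along the tail
print(limsup_A, liminf_A, min(sizes), max(sizes))
```

With the counting measure \(\mu\), the tail values \(\mu(A_n)\) are all equal to \(2\), which sits between \(\mu(\varliminf_n A_n)=1\) and \(\mu(\varlimsup_n A_n)=3\), so both inequalities are strict in this example.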

Denote now \(A\Delta B\) the symmetric difference of \(A\) and \(B\). Consider the equivalence relation on \(\mathcal{F}\) given by \(A\sim B\) whenever \(\mu(A\Delta B)=0\). It is easily seen that, if \(A\sim A'\) and \(B\sim B'\), then \(\mu(A\Delta B)= \mu(A'\Delta B')\). Thus the function \[ d_\mu(A,B):= \mu(A\Delta B) \] passes to the quotient \(\mathcal{F}/\sim\), and it is easily seen to be a distance, called the \(\mu\)-Boolean distance. From Equation 1, we see that \((\mathcal{F}/\sim,d_\mu)\) is a complete metric space.

Regular Measure Spaces

Definition 12 (Regular Measure) Let \((\Omega, \mathcal{F})\) be a measurable space where \(\Omega\) is a topological space. A measure \(\mu\) on \((\Omega, \mathcal{F})\) is

  • Outer regular if, for every \(A \in \mathcal{F}\), \[ \mu(A) = \inf\{\mu(U) : U \supset A, U \text{ open and measurable}\} \]
  • Inner regular if, for every \(A \in \mathcal{F}\) \[ \mu(A) = \sup\{\mu(K) : K \subset A, K \text{ compact and measurable}\} \]
  • Regular if it is both outer and inner regular. For sets of finite measure this means that, for every \(A \in \mathcal{F}\) with \(\mu(A)<\infty\) and every \(\varepsilon > 0\), there exist an open set \(U\in \mathcal{F}\) and a compact set \(K \in \mathcal{F}\) such that \(K \subset A \subset U\) and \(\mu(U \setminus K) < \varepsilon\).

If \(\Omega\) is a Polish space (a separable, completely metrizable topological space) equipped with its Borel \(\sigma\)-algebra, see Example 2, then every finite measure on \((\Omega, \mathcal{B}(\Omega))\) is regular. Similar statements hold on compact metric spaces, and on locally compact second-countable Hausdorff spaces. More generally, every locally finite measure on a Polish space is regular. In other words, under mild assumptions on the topology of \(\Omega\), regularity of all finite Borel measures holds.

Complete Measure Spaces

Definition 13 (Negligible set) Let \((\Omega, \mathcal{F}, \mu)\) be a measure space. A subset \(A\subset \Omega\) is called negligible if it is contained in a set of measure \(0\).

Definition 14 (Complete Measure Space) A measure space \((\Omega, \mathcal{F}, \mu)\) is complete if every negligible set is measurable (and has necessarily measure \(0\)).

Definition 15 (Almost Everywhere Equivalence) Let \((\Omega, \mathcal{F}, \mu)\) be a measure space. We say that a property holds almost everywhere (\(\mu\)-a.e., or simply a.e. when the measure is clear from context) if it holds for all \(\omega \in \Omega \setminus N\), where \(N\) is a \(\mu\)-negligible set.

Two measurable functions \(f, g : \Omega \to \mathbb{R}\) are almost everywhere equivalent (or \(\mu\)-a.e. equivalent) if \(f(\omega) = g(\omega)\) for \(\mu\)-a.e. \(\omega \in \Omega\); that is, if \(\mu(\{\omega : f(\omega) \neq g(\omega)\}) = 0\). We often treat a.e. equivalent functions as identical, especially in the context of integration.

Proposition 4 (Completion of a Measure Space) Let \((\Omega, \mathcal{F}, \mu)\) be a measure space. Then there exists a complete measure space \((\Omega, \overline{\mathcal{F}}, \overline{\mu})\), called the completion of \((\Omega, \mathcal{F}, \mu)\), such that:

  1. \(\mathcal{F} \subset \overline{\mathcal{F}}\).
  2. \(\overline{\mu}(A) = \mu(A)\) for all \(A \in \mathcal{F}\).
  3. For every \(E \in \overline{\mathcal{F}}\), there exist sets \(A, B \in \mathcal{F}\) such that \(A \subset E \subset B\) and \(\mu(B \setminus A) = 0\).

Moreover, the completion is unique in the following sense: If \((\Omega, \mathcal{F}', \mu')\) is another complete measure space satisfying (1), (2) and (3), then \(\mathcal{F}' = \overline{\mathcal{F}}\) and \(\mu' = \overline{\mu}\).

Proof (Proposition 4). Define \[ \overline{\mathcal{F}} = \{E \cup N : E \in \mathcal{F}, N \subset A \text{ for some } A \in \mathcal{F} \text{ with } \mu(A) = 0\}. \tag{2}\] In other words, \(\overline{\mathcal{F}}\) consists of sets that can be formed by taking a measurable set \(E\) and adding a negligible set. \(\overline{\mathcal{F}}\) is a \(\sigma\)-algebra since:

  1. \(\Omega \in \mathcal{F} \subset \overline{\mathcal{F}}\).
  2. Let \(E \cup N \in \overline{\mathcal{F}}\), where \(E \in \mathcal{F}\), \(N \subset A\), and \(\mu(A) = 0\). Then \[ (E \cup N)^c = E^c \cap N^c = (E^c \cap A^c)\cup (E^c \cap (A\setminus N)) \tag{3}\] and \(E^c \cap A^c \in \mathcal{F}\) while \((E^c \cap (A\setminus N))\subset A\) is negligible.
  3. Let \((E_i \cup N_i)_i\) be a sequence in \(\overline{\mathcal{F}}\), where \(E_i \in \mathcal{F}\), \(N_i \subset A_i\), and \(\mu(A_i) = 0\). Then \(\cup_i (E_i \cup N_i) = (\cup_i E_i) \cup (\cup_i N_i )\) is also in \(\overline{\mathcal{F}}\) since \(\mu(\cup_i A_i)=0\).

We next define \(\overline{\mu}\) on \(\overline{\mathcal{F}}\) by \[ \overline{\mu}(E \cup N) = \mu(E) \tag{4}\] for \(E \in \mathcal{F}\) and \(N \subset A\) with \(\mu(A) = 0\). It is easy to see that \(\overline{\mu}\) is well-defined: suppose \(E_1 \cup N_1 = E_2 \cup N_2\), where \(E_1, E_2 \in \mathcal{F}\), \(N_1 \subset A_1\), \(N_2 \subset A_2\), and \(\mu(A_1) = \mu(A_2) = 0\). Then \(E_1 \subset E_2 \cup A_2\), thus \(\mu(E_1 \setminus E_2) = 0\) and similarly \(\mu(E_2 \setminus E_1) = 0\). Thus \[ \mu(E_1) = \mu(E_1 \cap E_2) = \mu(E_2). \tag{5}\] This shows that \(\overline{\mu}\) is well-defined. Now, we show that \(\overline{\mu}\) is a measure on \(\overline{\mathcal{F}}\).

  1. \(\overline{\mu}(\emptyset) = \mu(\emptyset) = 0\).
  2. Let \((E_i \cup N_i)_i\) be a sequence of pairwise disjoint sets in \(\overline{\mathcal{F}}\). Then \[ \overline{\mu}\left(\cup_i (E_i \cup N_i)\right) = \mu\left(\cup_i E_i \right) = \sum_i \mu(E_i) = \sum_i \overline{\mu}(E_i \cup N_i). \tag{6}\] Therefore \(\overline{\mu}\) is a measure. It is also clear from the construction that it is complete. Properties (1), (2), (3) are trivial, as is uniqueness.
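The construction in Equations 2 and 4 can be carried out explicitly on a finite measure space. The following Python sketch is an illustration under that assumption; the names `subsets` and `completion`, and the three-point example, are hypothetical and chosen only to make the construction concrete.

```python
from itertools import combinations

def subsets(s):
    """All subsets of a finite set, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def completion(F, mu):
    """F: list of frozensets forming a sigma-algebra on a finite set,
    mu: dict mapping each set of F to its measure.
    Returns mu_bar as a dict on {E | N : E in F, N inside a null set}."""
    null_sets = [a for a in F if mu[a] == 0]
    negligible = {n for a in null_sets for n in subsets(a)}
    mu_bar = {}
    for e in F:
        for n in negligible:
            key = e | n
            # The value must not depend on the chosen representation (E, N).
            if key in mu_bar and mu_bar[key] != mu[e]:
                raise ValueError("mu_bar would be ill-defined")
            mu_bar[key] = mu[e]
    return mu_bar

omega = frozenset({1, 2, 3})
F = [frozenset(), frozenset({1}), frozenset({2, 3}), omega]
mu = {F[0]: 0, F[1]: 1, F[2]: 0, F[3]: 1}

mu_bar = completion(F, mu)
print(len(mu_bar), mu_bar[frozenset({2})], mu_bar[frozenset({1, 3})])
```

In this example \(\{2,3\}\) is a null set, so its subsets \(\{2\}\) and \(\{3\}\) become measurable with measure \(0\), and the completed \(\sigma\)-algebra is the whole power set of \(\{1,2,3\}\).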

Compactly approximated measures

We first introduce the notion of compact class.

Definition 16 (Compact Class) A family \(\mathcal{K}\) of subsets of a set \(\Omega\) is called a compact class if, for any sequence \((K_n)\) of elements of \(\mathcal{K}\) with \(\cap_n K_n = \emptyset\), there exists a finite \(N\) such that \(\bigcap_{n=1}^N K_n = \emptyset\).

Remark. A well-known fact in elementary topology (the finite intersection property) states that, in a Hausdorff topological space, the collection of compact subsets forms a compact class. This is the origin of the name compact class.

Definition 17 Let \(\Omega\) be a non-empty set, \(\mathcal{A}\) a collection of subsets of \(\Omega\), \(\mu\) an outer content on \((\Omega,\mathcal{A})\) and \(\mathcal{K}\) a compact class of subsets of \(\Omega\). We say that \(\mu\) is compactly approximated by \(\mathcal{K}\) if for every \(A\in \mathcal{A}\) and \(\varepsilon>0\), there exist \(K_\varepsilon \in \mathcal{K}\), \(A_\varepsilon \in \mathcal{A}\) such that \(K_\varepsilon \subset A\), \(A \setminus K_\varepsilon \subset A_\varepsilon\) and \(\mu(A_\varepsilon)\le \varepsilon\).

Example 7 An inner regular measure on a topological space is compactly approximated by the collection of compact sets, see Definition 12.

Lemma 1 A compactly approximated (by some compact class \(\mathcal{K}\)) outer content is continuous at \(0\). Namely, if \(A_n\in \mathcal{A}\) is a decreasing sequence with \(\cap_n A_n=\emptyset\), then \(\lim_n \mu(A_n)=0\).

Proof (Lemma 1). Fix \(\varepsilon>0\), and let \(K_n \in \mathcal{K}\) and \(B_n\in \mathcal{A}\) be such that \(K_n\subset A_n\), \(A_n\setminus K_n\subset B_n\) and \(\mu(B_n)\le \varepsilon 2^{-n}\). Then \(\cap_n K_n \subset \cap_n A_n =\emptyset\). Thus for some finite \(N\), \(\cap_{n=1}^N K_n=\emptyset\). Therefore \[ A_N \cap (\cup_{n=1}^N B_n)^c = \cap_{n=1}^N (A_n\cap B_n^c) \subset \cap_{n=1}^N (A_n\cap (A_n^c \cup K_n)) \subset \cap_{n=1}^N K_n =\emptyset \] Namely \(A_N \subset \cup_{n=1}^N B_n\) and since \(\mu\) is an outer content \[ \mu(A_N) \le \sum_{n=1}^N \mu(B_n)\le \varepsilon \] The same bound holds for all \(n\ge N\), since \(A_n\subset A_N\) and \(\mu\) is an outer content.

Extension Theorems

In this section we briefly describe some results that easily allow us to define measures on a space, by only defining their values on a specific class of sets. Let us consider the following problems.

  • We want to define a measure on \(\mathbb{R}\). The most intuitive approach would consist in defining its values on intervals such as \((a,b]\) (for \(\sigma\)-finite measures) or \((-\infty,a]\) (for finite measures, since the measure of this interval could be identically \(\infty\) for non-finite measures). Would these values ‘be enough’ to characterize the measure on the whole \(\sigma\)-algebra they generate? Caratheodory’s Extension Theorem (Theorem 4) answers this question (positively).
  • We want to define a measure on functions, say on the space \(E^{\mathbb{N}}=\{X : \mathbb{N}\to E\}\). Certainly we need to know the measure of sets of the form \(\{X \in E^{\mathbb{N}} : (X_1\in A_1, X_2 \in A_2,\ldots, X_n \in A_n) \}\). Since these sets are characterized by finitely many conditions \(X_i\in A_i\), in many cases their measure can be described explicitly. If we know the measure on sets of this form, can we extend it to a measure on the whole product \(\sigma\)-algebra of \(E^{\mathbb{N}}\)? Kolmogorov’s Extension Theorem (Theorem 5) answers this question (positively under mild conditions).

Caratheodory’s Extension Theorem

Takeaway of Section 2.1

In order to define a measure \(\mu\) on a measurable space \((\Omega,\mathcal{F})\) it is enough to define:

  • A quasi semiring \(\mathcal{S}\), that generates the \(\sigma\)-algebra \(\mathcal{F}\).
  • A content \(\mu_0\) on \(\mathcal{S}\), that identifies \(\mu\) on the quasi semiring.
  • A compact class \(\mathcal{K}\) on \(\Omega\), that compactly approximates \(\mu_0\) on \(\mathcal{S}\).

Moreover if \(\mu_0\) is \(\sigma\)-finite on \(\mathcal{S}\), the measure \(\mu\) on \((\Omega,\mathcal{F})\) that coincides with \(\mu_0\) on \(\mathcal{S}\) is unique.

Examples: \(\Omega=\mathbb{R}\), \(\mathcal{F}\) is the Borel \(\sigma\)-algebra, \(\mathcal{S}\) is the collection of intervals \((a,b]\), \(\mathcal{K}\) is the class of compact sets (or simply compact intervals).

Definition 18 (Quasi Semiring) A quasi semiring of sets \(\mathcal{S}\) on a set \(\Omega\) is a collection of subsets of \(\Omega\) that satisfies:

  1. \(\emptyset \in \mathcal{S}\).

  2. If \(A, B \in \mathcal{S}\), then both \(A\cap B\) and \(A\cap B^c\) are finite disjoint unions of elements of \(\mathcal{S}\). Namely there exist pairwise disjoint sets \(B_1,\ldots, B_n \in \mathcal{S}\) and pairwise disjoint sets \(C_1, \ldots, C_m \in \mathcal{S}\) such that \(A\cap B= \cup_i B_i\), \(A \cap B^c = \cup_j C_j\).
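As an illustration of Definition 18, the following minimal sketch checks the two conditions for half-open intervals \((a,b]\) of the real line. The encoding of an interval as a pair `(a, b)` and the helper names `intersect`, `difference` and `length` are assumptions made for this example only.

```python
from fractions import Fraction

# An interval (a, b] is encoded as the pair (a, b) with a < b.
# A finite disjoint union of intervals is encoded as a list of such pairs;
# the empty set is the empty list.

def intersect(A, B):
    """A ∩ B: either empty or a single interval (a, b]."""
    a, b = max(A[0], B[0]), min(A[1], B[1])
    return [(a, b)] if a < b else []

def difference(A, B):
    """A ∩ B^c as a list of at most two pairwise disjoint intervals (a, b]."""
    left = (A[0], min(A[1], B[0]))    # points of A lying in (-inf, B[0]]
    right = (max(A[0], B[1]), A[1])   # points of A lying in (B[1], +inf)
    return [(a, b) for a, b in (left, right) if a < b]

def length(pieces):
    """Total length of a finite disjoint union of intervals."""
    return sum((b - a for a, b in pieces), Fraction(0))

A = (Fraction(0), Fraction(3))
B = (Fraction(1), Fraction(2))
# A ∩ B and A ∩ B^c are finite disjoint unions of intervals (a, b],
# and the lengths of the pieces add up to the length of A.
pieces = intersect(A, B) + difference(A, B)
```

Here \(A\cap B^c\) genuinely needs two pieces, which is why the definition asks for finite disjoint unions rather than single elements of \(\mathcal{S}\).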

Theorem 4 (Caratheodory’s Extension Theorem) Let \(\mathcal{S}\) be a quasi semiring of subsets of \(\Omega\), and let \(\mu_0: \mathcal{S} \to [0, \infty]\) be a \(\sigma\)-additive content, see Definition 11. Then there exists a measure \(\mu\) (called the extension of \(\mu_0\)) on \(\sigma(\mathcal{S})\) such that \(\mu(A) = \mu_0(A)\) for all \(A \in \mathcal{S}\). If \(\mu_0\) is \(\sigma\)-finite, then the extension is unique.

Proof (Theorem 4). We give a sketch of the proof, which can be found in (Patriota 2011).

  1. Outer content. For any \(E \subset \Omega\), define the outer content \[ \mu^*(E) = \inf\left\{ \sum_{i=1}^\infty \mu_0(A_i) \mid A_i \in \mathcal{S}, E \subset \cup_{i=1}^\infty A_i \right\}, \tag{7}\] where the infimum of the empty set is understood to be \(+\infty\). It is easy to see that \(\mu^*(\emptyset)=0\), and that \(\mu^*\) is monotone, namely \(\mu^*(A) \le \mu^*(B)\) if \(A\subset B\). We next show that \(\mu^*\) is countably subadditive: if \(E = \bigcup_{n=1}^\infty E_n\), then \(\mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n)\). Without loss of generality, assume that \(\mu^*(E_n) < \infty\) for all \(n\). Fix \(\varepsilon > 0\). For each \(n\), there exists a sequence \((A_{n,i})_{i\in \mathbb{N}}\) in \(\mathcal{S}\) such that \(E_n \subset \cup_i A_{n,i}\) and \(\sum_i \mu_0(A_{n,i}) < \mu^*(E_n) + 2^{-n}\varepsilon\). Then \(E \subset \cup_n \cup_i A_{n,i}\), so \[ \mu^*(E) \le \sum_n \sum_i \mu_0(A_{n,i}) \le \sum_n (\mu^*(E_n) + 2^{-n}\varepsilon) = \sum_n \mu^*(E_n) + \varepsilon \tag{8}\] and the claim follows since \(\varepsilon\) is arbitrary.

  2. The \(\sigma\)-algebra \(\mathcal{F}^*\). Define the collection of sets \[ \mathcal{F}^\ast:= \{ E \subset \Omega : \mu^*(A) = \mu^*(A \cap E) + \mu^*(A \cap E^c), \forall A\subset \Omega\} \tag{9}\] Since \(\mu^*\) is subadditive, in the last formula the equality can be equivalently replaced with \(\mu^*(A) \ge \mu^*(A \cap E) + \mu^*(A \cap E^c)\). It is then an elementary task to check that \(\mathcal{F}^\ast\) is a \(\sigma\)-algebra.

  3. \(\mathcal{F}^* \supset \mathcal{S}\). First notice that, in Equation 7, the covering \(A_i\) can be taken to be pairwise disjoint. With this in mind, one can prove that \(\mu^*(A) = \mu^*(A \cap E) + \mu^*(A \cap E^c)\) for all \(A \subset \Omega\) and \(E\in \mathcal{S}\).

  4. \(\mu^*\) restricted to \(\mathcal{F}^*\) is a measure. We skip the proof of this lengthy but elementary check.

  5. Extension existence. Since \(\mu^*\) is a measure on the \(\sigma\)-algebra \(\mathcal{F}^*\) containing \(\mathcal{S}\), it induces a measure on \(\sigma(\mathcal{S})\). We need to show \(\mu^*(S) = \mu_0(S)\) for all \(S \in \mathcal{S}\). Clearly, \(\mu^*(S) \le \mu_0(S)\) by definition, while we skip the elementary but lengthy proof of the opposite inequality.

  6. Extension uniqueness. Assume \(\mu_0\) is \(\sigma\)-finite, with \(\Omega = \cup_{n=1}^\infty \Omega_n\), \(\Omega_n \in \mathcal{S}\) increasing, and \(0<\mu_0(\Omega_n) < \infty\). Let \(\mu\) and \(\nu\) be two extensions of \(\mu_0\) to \(\sigma(\mathcal{S})\). Since \(\mathcal{S}\) is a q\(\pi\)-system, \(\mu\) and \(\nu\) coincide on \(\Omega_n\), namely \(\mu(A\cap \Omega_n)=\nu(A\cap \Omega_n)\) for \(A\in \sigma(\mathcal{S})\), in view of Theorem 1. Since we assumed the \(\Omega_n\) monotone, one then deduces the equality taking the limit in \(n\).
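Steps 1 to 3 of the proof can be checked by brute force on a finite set. The sketch below is a toy example, not part of the proof: the set `OMEGA`, the quasi semiring and the content stored in `MU0` are hypothetical choices. Since the quasi semiring here is finite, the countable covers in Equation 7 can be replaced by subfamilies of it without changing the infimum.

```python
from itertools import combinations

OMEGA = frozenset({0, 1, 2, 3})

# Hypothetical quasi semiring S (the keys) with a content mu_0 (the values).
MU0 = {
    frozenset(): 0,
    frozenset({0}): 1,
    frozenset({1}): 2,
    frozenset({2, 3}): 3,
}

def subsets(s):
    """All subsets of the finite set s."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def outer(E):
    """Outer content of E as in Equation 7, minimizing over subfamilies of S."""
    best = float("inf")
    members = list(MU0)
    for r in range(len(members) + 1):
        for cover in combinations(members, r):
            if E <= frozenset().union(*cover):
                best = min(best, sum(MU0[a] for a in cover))
    return best

def is_measurable(E):
    """Caratheodory condition of Equation 9, tested against every A ⊂ Ω."""
    return all(outer(A) == outer(A & E) + outer(A - E)
               for A in subsets(OMEGA))

F_STAR = [E for E in subsets(OMEGA) if is_measurable(E)]
```

In this example `F_STAR` consists of the 8 unions of the atoms \(\{0\}\), \(\{1\}\), \(\{2,3\}\), so it coincides with \(\sigma(\mathcal{S})\). The set \(\{2\}\) is not in it, because testing against \(A=\{2,3\}\) gives \(3 \neq 3+3\).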

Remark. It is worth noting that for the uniqueness of the extension, it is necessary that \(\mu_0\) is \(\sigma\)-finite. It is not enough (for uniqueness) that an extension of \(\mu_0\) is \(\sigma\)-finite.

Proposition 5 If a content \(\mu\) on a quasi semiring \(\mathcal{S}\) is compactly approximated, then it is \(\sigma\)-additive.

Proof (Proposition 5). Suppose that \(A_1,\ldots,A_m \in \mathcal{S}\), \(B_1,\ldots,B_n \in \mathcal{S}\), are two pairwise disjoint collections of elements of \(\mathcal{S}\) such that \(A:= \cup_i A_i = \cup_j B_j\). Then \(C_{i,j}:=A_i \cap B_j\) can be written as a disjoint union of \(K_{i,j}\) elements \(C_{i,j,k}\) of \(\mathcal{S}\). Therefore, since \(\mu\) is additive \[ \sum_{i} \mu(A_i)= \sum_{i,j} \sum_k \mu(C_{i,j,k})=\sum_j \mu(B_j)=: \mu(A) \] This allows us to consistently extend \(\mu(\cdot)\) to a content on the collection \(\mathcal{S}^u\) of finite disjoint unions of elements of \(\mathcal{S}\). We can define an outer content \(\mu^\ast\) on the collection \(\mathcal{S}^\ast\) of finite (possibly non disjoint) unions of elements of \(\mathcal{S}\), as in Equation 7. We have already remarked in the proof of Theorem 4 that in Equation 7 we can restrict to pairwise disjoint \(A_i\), see (Patriota 2011, Proposition 1.1), therefore \(\mu^\ast(A)=\mu(A)\) for \(A\in \mathcal{S}^u\).

It is easy to check that \(\mu^\ast\) on \(\mathcal{S}^\ast\) can be compactly approximated, see (Bogachev 2007, Proposition 1.12.4). From Lemma 1, \(\mu^\ast\) is continuous at \(0\) in \(\mathcal{S}^u\).

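Let now \(A_i\in \mathcal{S}\) be a pairwise disjoint sequence such that \(\cup_i A_i=:A\in \mathcal{S}\). Then \(\cup_{i>n} A_i= \cap_{i \le n} (A \cap A_i^c)\). Each \(A \cap A_i^c\) is in \(\mathcal{S}^u\) since \(\mathcal{S}\) is a quasi semiring, and for the same reason \(\mathcal{S}^u\) is closed under finite intersections. Therefore \(\cup_{i>n} A_i \in \mathcal{S}^u \subset \mathcal{S}^\ast\) and \[ \mu(A) = \mu^\ast(\cup_{i\ge 1} A_i) \le \mu^\ast(\cup_{i> n} A_i) + \mu^\ast(\cup_{i\le n} A_i) = \mu^\ast(\cup_{i>n} A_i) + \sum_{i\le n} \mu(A_i) \] and \(\mu^\ast(\cup_{i>n} A_i) \to 0\) as \(n\to \infty\), since these sets decrease to \(\emptyset\). Hence \(\mu(A)\le \sum_i \mu(A_i)\). The opposite inequality holds since \(\sum_{i\le n} \mu(A_i) = \mu(\cup_{i\le n} A_i) \le \mu(A)\) for every \(n\), by additivity and monotonicity of \(\mu\) on \(\mathcal{S}^u\).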

Example 8 (Lebesgue measure) On \(\Omega=\mathbb{R}\), consider the semiring \(A\) of intervals \([a,b)\), and the content \(\hat{\lambda}([a,b))=b-a\). Then \(\hat{\lambda}\) is compactly approximated by the compact class of finite closed intervals, thus it is \(\sigma\)-additive on \(A\) by Proposition 5. By Theorem 4, \(\hat{\lambda}\) extends uniquely to a \(\sigma\)-additive measure \(\lambda_{\mathrm{Borel}}\) on the Borel \(\sigma\)-algebra of \(\mathbb{R}\). Recalling Proposition 4, the completion of the Borel \(\sigma\)-algebra w.r.t. \(\lambda_{\mathrm{Borel}}\) is called the Lebesgue \(\sigma\)-algebra; the completion \(\lambda\) of \(\lambda_{\mathrm{Borel}}\) is called the Lebesgue measure.

Kolmogorov’s Extension Theorem

Takeaway of Section 2.2

If we want to define a probability measure on a space of functions or sections, it is enough to define its finite-dimensional projections, provided these projections are regular enough (compactly approximated). This is a minimal requirement which always holds on reasonable topological spaces, like Polish spaces.

Kolmogorov’s Extension Theorem provides conditions for the existence of a probability measure on an infinite product space, given a consistent family of measures on finite-dimensional projections.

Theorem 5 (Kolmogorov’s Extension Theorem) Let \(T\) be an arbitrary index set. For each \(t \in T\), let \((\Omega_t, \mathcal{F}_t)\) be a measurable space. For \(J \subset T\), define the product space \[ \Omega_J := \prod_{t \in J} \Omega_t \tag{10}\] and let \(\mathcal{F}_J\) be the product \(\sigma\)-algebra on \(\Omega_J\), see Definition 9. For subsets \(I \subset J \subset T\), let \(\pi^J_I: \Omega_J \to \Omega_I\) denote the canonical projection map.

Suppose that for each finite subset \(J \Subset T\), we are given a compactly approximated (see Definition 17) probability measure \(\mu_J\) on \((\Omega_J, \mathcal{F}_J)\). Then the following are equivalent

  • The consistency condition holds, namely for any finite subsets \(I \subset J \Subset T\) \[ \mu_I = (\pi^J_I)_* \mu_J, \tag{11}\] where \((\pi^J_I)_* \mu_J\) is the pushforward measure defined by \(((\pi^J_I)_* \mu_J)(A) = \mu_J((\pi^J_I)^{-1}(A))\) for \(A \in \mathcal{F}_I\).
  • There exists a (necessarily unique) probability measure \(\mu\) on the product space \(\Omega_T := \prod_{t \in T} \Omega_t\) such that \[ \mu_J = (\pi^T_J)_* \mu \tag{12}\] for all finite subsets \(J \Subset T\).

Remark 1. The consistency condition is clearly necessary for the existence of a measure \(\mu\) on \(\Omega_T\) satisfying Equation 12. Indeed its pushforwards \((\mu_J)_{J\Subset T}\) defined by the canonical projections trivially satisfy Equation 11, since \(\pi^J_I \circ \pi^T_J=\pi^T_I\) for \(I\subset J \Subset T\), and therefore \[ \mu_I := \mu \circ (\pi^T_I)^{-1} = \mu \circ (\pi^T_J)^{-1} \circ (\pi^J_I)^{-1} = \mu_J \circ (\pi^J_I)^{-1} \tag{13}\] The theorem thus states that, when the \(\mu_J\) are compactly approximated, the converse implication holds as well.
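The consistency condition of Equation 11 can be verified directly when all spaces are finite. The following sketch assumes \(\Omega_t=\{0,1\}\) and takes \(\mu_J\) to be a product of Bernoulli laws. The parameters in `P` and the helper names `mu` and `pushforward` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical success probabilities p_t for the indices t = 0, 1, 2.
P = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 4)}

def mu(J):
    """Finite-dimensional law mu_J on {0,1}^J, as a dict point -> mass.

    Points are tuples whose coordinates follow the sorted order of J.
    """
    J = sorted(J)
    pmf = {}
    for x in product((0, 1), repeat=len(J)):
        w = Fraction(1)
        for t, xt in zip(J, x):
            w *= P[t] if xt == 1 else 1 - P[t]
        pmf[x] = w
    return pmf

def pushforward(pmf_J, J, I):
    """Pushforward of a law on {0,1}^J under the projection onto I ⊂ J."""
    J, I = sorted(J), sorted(I)
    idx = [J.index(t) for t in I]          # coordinates kept by the projection
    out = {}
    for x, w in pmf_J.items():
        y = tuple(x[k] for k in idx)
        out[y] = out.get(y, Fraction(0)) + w
    return out

# Consistency (Equation 11) for I = {0, 2} and J = {0, 1, 2}.
consistent = pushforward(mu({0, 1, 2}), {0, 1, 2}, {0, 2}) == mu({0, 2})
```

Exact rational arithmetic is used so that the equality of the two laws can be tested without rounding tolerances.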

Proof (Theorem 5). Let \(\mathcal{R}\) be the semiring of subsets of \(\Omega_T\) of the form \[ C^J_B := \prod_{t\in J} A_t \times \prod_{t\in T\setminus J} \Omega_t = (\pi^T_J)^{-1}(B), \qquad B:=\prod_{t\in J} A_t \] for any finite \(J\Subset T\) and \((A_t\in \mathcal{F}_t)_{t\in J}\). We define a set function \(\mu\) on \(\mathcal{R}\) by \(\mu(C^J_B) = \mu_J(B)\). The consistency condition ensures that \(\mu\) is well-defined and that \(\mu\) is a content on \(\mathcal{R}\), see Definition 11.

Let \(\mathcal{K}\) be the collection of subsets of \(\Omega_T\) of the form \[ \prod_{t\in J} K_t \times \prod_{t\in T\setminus J} \Omega_t \] for any finite \(J\Subset T\) and \((K_t\in \mathcal{K}_t)_{t\in J}\), where \(\mathcal{K}_t\) is a compact class on \(\Omega_t\) approximating the marginal \(\mu_{\{t\}}\). \(\mathcal{K}\) is easily seen to be a compact class, see Definition 16. Moreover \(\mu\) is compactly approximated by \(\mathcal{K}\) on \(\mathcal{R}\). By Lemma 1, \(\mu\) is continuous at \(0\) on \(\mathcal{R}\). This in turn yields that \(\mu\) is \(\sigma\)-additive in view of Proposition 5. Since \(\mathcal{R}\) is a semiring, and thus a quasi semiring, we conclude by Theorem 4.

Integration and Lebesgue Spaces

Takeaway of Section 3

Measure-theoretic integrals provide a general definition that includes sums, series, classical Riemann integrals, expected values of random variables, and more. The integral behaves as one would expect from elementary sums (e.g. it is linear and monotone). The precise mathematical characterization, however, allows us to control convergence rigorously, see Theorem 12.

Measure-theoretic integral

Let \((\Omega, \mathcal{F},\mu)\) be a measure space and let \(f\colon \Omega \to [0,\infty)\). If \(f = \sum_j a_j \ind{A_j}\) takes finitely many values \(a_1,\ldots,a_n\), respectively on the measurable sets \(A_1,\ldots,A_n\), then \(f\) is called simple or discrete. For such functions we set \[ \mu(f) \equiv \int f d\mu := \sum_j a_j \mu(A_j) \] \(\mu(f)\) is easily seen to depend only on \(f\), and not on the values \((a_j)\) and measurable sets \((A_j)\) used to represent it as a sum.

For any (not necessarily simple) function \(f\colon \Omega \to [0,\infty]\), we have two reasonable definitions for the integral of \(f\) with respect to \(\mu\): \[ \mu(f) \equiv \int_\Omega f(x) \, d\mu(x) := \sup \left\{ \mu(g) \st g\le f, g \text{ simple} \right\} \tag{14}\] \[ \mu(f) \equiv \int_\Omega f(x) \, d\mu(x) := \inf \left\{ \mu(g) \st g\ge f, g \text{ simple} \right\} \tag{15}\]

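Definition 19 (Integral) If \(f\colon \Omega \to [0,\infty]\) is measurable, the r.h.s. of Equation 14 is called the integral of \(f\) with respect to \(\mu\), and it is denoted by \(\mu(f)\). If moreover \(f\) is bounded and \(\mu(\Omega)<\infty\), the r.h.s. of Equation 14 and Equation 15 coincide (as can be easily checked). In general the r.h.s. of Equation 15 can be strictly larger: for instance it equals \(+\infty\) whenever \(f\) is unbounded, since no simple function dominates \(f\).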

If \(f=f^+ - f^-\) is not necessarily positive, one then sets \(\mu(f)=\mu(f^+)-\mu(f^-)\), provided the two integrals are not both \(+\infty\).

If \(\mu(|f|)<\infty\), \(f\) is called \(\mu\)-integrable, or integrable if the measure \(\mu\) is understood from the context.

It is immediate to verify that the integral is

  • linear: \(\mu(\alpha \,f+g)=\alpha \mu(f)+\mu(g)\).
  • monotone: if \(f\ge g\) then \(\mu(f)\ge \mu(g)\).
  • compatible with the \(\mu\)-a.e. equivalence relation: if \(f=g\) \(\mu\)-a.e. then \(\mu(f)=\mu(g)\).

The notation \(\mu(f)\) is particularly useful to stress the linearity of the operation, as in some contexts measures and functions are in duality. Notice that the integral may be \(+\infty\) even if \(f\) is finite.

Example 9 If \(\Omega\) is a countable space endowed with the discrete \(\sigma\)-algebra (or more in general if the measurable space \((\Omega,\mathcal{F})\) has at most countably many atoms), then the integral can be written as a (possibly finite, at most countable) sum \[ \mu(f)=\sum_{x\in \Omega} f(x) \mu(\{x\}) \]
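On a finite space the formula of Example 9 can be evaluated directly. The sketch below uses a hypothetical three-point space with weights stored in `mu`. It checks that the integral of a simple function does not depend on the representation \(\sum_j a_j \mathbf{1}_{A_j}\), and that the integral is linear. All names are assumptions made for this illustration.

```python
from fractions import Fraction

# Hypothetical measure on the three-point space {"a", "b", "c"}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def integral(f, mu):
    """mu(f) = sum over x of f(x) * mu({x}), as in Example 9."""
    return sum(f(x) * w for x, w in mu.items())

def simple(terms):
    """The function x -> sum_j a_j 1_{A_j}(x) for terms = [(a_j, A_j), ...]."""
    return lambda x: sum(a for a, A in terms if x in A)

def integral_from_terms(terms, mu):
    """sum_j a_j mu(A_j), computed from a given representation."""
    return sum(a * sum(mu[x] for x in A) for a, A in terms)

# Two different representations of the same simple function.
terms1 = [(2, {"a", "b"}), (5, {"c"})]
terms2 = [(2, {"a"}), (2, {"b"}), (3, {"c"}), (2, {"c"})]
f1, f2 = simple(terms1), simple(terms2)

# Linearity: mu(3 f1 + f2) = 3 mu(f1) + mu(f2).
lhs = integral(lambda x: 3 * f1(x) + f2(x), mu)
rhs = 3 * integral(f1, mu) + integral(f2, mu)
```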

Example 10 If \(\Omega=\mathbb{R}\), \(\mu\) is the Lebesgue measure and \(f\) is Riemann integrable, then \(\mu(f)=\int f(x) dx\) coincides with the classical Riemann integral.

Theorem 6 (Fubini’s Theorem) Let \((\Omega_1, \mathcal{F}_1, \mu_1)\) and \((\Omega_2, \mathcal{F}_2, \mu_2)\) be two \(\sigma\)-finite measure spaces, and let \((\Omega_1 \times \Omega_2, \mathcal{F}_1 \otimes \mathcal{F}_2, \mu_1 \otimes \mu_2)\) be the product measure space. Let \(f\colon \Omega_1 \times \Omega_2 \to \mathbb{R}\) be a measurable function.

If \(f\) is integrable, namely \(\int_{\Omega_1 \times \Omega_2} |f| \, d(\mu_1 \otimes \mu_2) < \infty\), then the iterated integrals are equal to the integral over the product space: \[ \int_{\Omega_1} \left( \int_{\Omega_2} f(x,y) \, d\mu_2(y) \right) d\mu_1(x) = \int_{\Omega_2} \left( \int_{\Omega_1} f(x,y) \, d\mu_1(x) \right) d\mu_2(y) = \int_{\Omega_1 \times \Omega_2} f(x,y) \, d(\mu_1 \otimes \mu_2)(x,y) \]
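The integrability assumption cannot be dropped. A classical counterexample, sketched below, takes the counting measure on \(\mathbb{N}\times\mathbb{N}\) and the function \(a(m,n)\) equal to \(1\) if \(n=m\), to \(-1\) if \(n=m+1\), and to \(0\) otherwise, so that \(\sum_{m,n}|a(m,n)|=\infty\). The function names and the truncation parameter `N` are choices made for this example. Each row and column has finite support, so the inner sums are computed exactly.

```python
def a(m, n):
    """a(m, n) = 1 if n == m, -1 if n == m + 1, 0 otherwise (m, n >= 0)."""
    if n == m:
        return 1
    if n == m + 1:
        return -1
    return 0

def rows_then_columns(N):
    """Partial sum over m < N of the full inner sum over n.

    Row m is supported in n ∈ {m, m + 1}, so n <= N captures it entirely.
    """
    return sum(sum(a(m, n) for n in range(N + 1)) for m in range(N))

def columns_then_rows(N):
    """Partial sum over n < N of the full inner sum over m.

    Column n is supported in m ∈ {n - 1, n}, so m <= N captures it entirely.
    """
    return sum(sum(a(m, n) for m in range(N + 1)) for n in range(N))
```

Every row sums to \(0\), while column \(0\) sums to \(1\) and all other columns to \(0\). The two iterated sums are therefore \(0\) and \(1\), independently of `N`.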

Lebesgue spaces

Let \((\Omega, \mathcal{F}, \mu)\) be a measure space. For \(p\in (0,\infty]\), the quotient space (here \(\sim\) denotes the a.e. equivalence Definition 15) \[ \begin{aligned} & L^p(\Omega, \mathcal{F}, \mu) := \left\{ f \colon \Omega \to \mathbb{R} \text{ measurable } \st \int |f|^p d\mu <\infty \right\} / \sim, \qquad p<\infty \\ & L^\infty(\Omega, \mathcal{F}, \mu):= \left\{ f \colon \Omega \to \mathbb{R} \text{ measurable } \st \exists C>0 \st |f| \le C \text{ $\mu$-a.e.} \right\} / \sim, \qquad p=\infty \end{aligned} \] is called the Lebesgue space with exponent \(p\).

Proposition 6 For \(p\in [1,\infty]\), \(L^p(\Omega, \mathcal{F}, \mu)\) is a Banach space when equipped with the norm \[ \begin{aligned} & \|f\|_{L^p}:= \left( \int |f|^p d\mu \right)^{1/p}, \qquad p<\infty \\ & \|f\|_{L^\infty}:= \inf \left\{C \ge 0 \st |f| \le C \text{ $\mu$-a.e.} \right\}, \qquad p=\infty \end{aligned} \] Moreover if \(\mu(\Omega)<\infty\), \(\|f\|_{L^\infty}=\lim_{p\to \infty} \|f\|_{L^p}\).
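The limit \(\|f\|_{L^\infty}=\lim_{p\to\infty}\|f\|_{L^p}\) can be observed numerically on a finite probability space. The sketch below uses hypothetical values for `f` and `mu`. To avoid floating-point overflow for large \(p\), the function is first divided by its largest value on points of positive mass, which is its \(L^\infty\) norm in this discrete setting.

```python
def lp_norm(f, mu, p):
    """L^p norm of f on a finite space; f and mu are dicts point -> value."""
    support = [x for x in f if mu[x] > 0]
    M = max(abs(f[x]) for x in support)       # L^infinity norm of f
    if M == 0:
        return 0.0
    # Factor out M so that every ratio is at most 1 and powers cannot overflow.
    s = sum((abs(f[x]) / M) ** p * mu[x] for x in support)
    return M * s ** (1.0 / p)

# Hypothetical example on a three-point probability space.
f = {"a": 1.0, "b": 2.0, "c": 3.0}
mu = {"a": 0.5, "b": 0.25, "c": 0.25}

norms = [lp_norm(f, mu, p) for p in (1, 2, 10, 100, 2000)]
```

In this example the norms increase with \(p\) and approach \(3\), the largest value of \(f\) on the support of \(\mu\).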

Decomposition of Measures

Takeaway of Section 4

If \(\mu\) is a reference measure on a measurable space, we can decompose any other measure \(\nu\) as follows. \[ \nu = \varrho \,\mu + \nu^s \] where \(\nu^s\) is singular with respect to \(\mu\), in other words \(\nu^s\) only charges a set of \(\mu\)-measure \(0\), while \(\varrho \mu\) should be interpreted as in Equation 16.

Definition 20 (Absolutely Continuous and Singular Measures) Let \((\Omega, \mathcal{F})\) be a measurable space. A measure \(\nu\) is said to be absolutely continuous with respect to a measure \(\mu\) (denoted \(\nu \ll \mu\)) if for every \(A \in \mathcal{F}\), \(\mu(A) = 0\) implies \(\nu(A) = 0\). If \(\nu \ll \mu\) and \(\mu \ll \nu\), namely if they have the same sets of \(0\) measure, they are called equivalent.

On the opposite side, \(\mu\) and \(\nu\) are mutually singular (denoted \(\mu \perp \nu\)) if there exists a set \(A \in \mathcal{F}\) such that \(\mu(A) = 0\) and \(\nu(A^c) = 0\).

Remark. On \(\mathbb{R}\) with the Borel \(\sigma\)-algebra, a finite measure \(\nu\) is absolutely continuous with respect to the Lebesgue measure \(\mu\) if and only if for every \(\varepsilon > 0\), there exists \(\delta > 0\) such that for any finite collection of disjoint intervals \((a_i, b_i)\) with \(\sum_i (b_i - a_i) < \delta\), we have \(\sum_i \nu((a_i, b_i)) < \varepsilon\). This corresponds to the elementary notion of absolute continuity for functions.

Let \((\Omega, \mathcal{F}, \mu)\) be a \(\sigma\)-finite measure space, and let \(\varrho \colon \Omega \to [0, \infty]\) be measurable. Then the set function \[ \nu(A) := \int_A \varrho\,d\mu, \quad A \in \mathcal{F} \tag{16}\] is a measure on \((\Omega, \mathcal{F})\), as can be easily checked using the Monotone Convergence Theorem (Theorem 10). It satisfies \[ \int f d\nu = \int f \varrho d\mu \quad f\ge 0, \text{ measurable} \tag{17}\] This measure is denoted by \(\nu=\varrho \mu\) or, in view of Equation 17, the notation \(d\nu=\varrho d\mu\) is also used.

Theorem 7 (Radon-Nikodym derivative) Let \((\Omega, \mathcal{F}, \mu)\) be a \(\sigma\)-finite measure space, and let \(\nu\) be another measure on \((\Omega, \mathcal{F})\). The following are equivalent

  • \(\nu\) is absolutely continuous with respect to \(\mu\), \(\nu \ll \mu\).
  • There exists a measurable function \(\varrho \colon \Omega \to [0, \infty]\) such that \(\nu=\varrho \mu\), namely Equation 16 holds.

The function \(\varrho\) is called the Radon-Nikodym derivative of \(\nu\) with respect to \(\mu\), and is denoted by \(\tfrac{d\nu}{d\mu}\), and it is unique up to \(\mu\)-a.e. identification.

Moreover, if any of the two equivalent conditions above holds, the following are also equivalent

  • \(\varrho\) is finite, namely takes values in \([0,\infty)\) (up to its irrelevant definition on a \(\mu\)-negligible set).
  • \(\nu\) is \(\sigma\)-finite.

Finally, for \(\sigma\)-finite measures \(\nu \ll \mu \ll \lambda\), the chain rule holds \[ \frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu}\frac{d\mu}{d\lambda} \tag{18}\]
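On a finite set the Radon-Nikodym derivative is simply a ratio of point masses, which makes Equation 16 and the chain rule of Equation 18 easy to check. The following sketch uses hypothetical measures `lam`, `mu`, `nu` with \(\nu\ll\mu\ll\lambda\). The value \(0\) assigned to the density on \(\mu\)-null points is an arbitrary choice, irrelevant up to \(\mu\)-a.e. identification.

```python
from fractions import Fraction

def density(nu, mu):
    """d(nu)/d(mu) on a finite set, assuming nu << mu."""
    for x in nu:
        if mu[x] == 0 and nu[x] != 0:
            raise ValueError("nu is not absolutely continuous w.r.t. mu")
    # On mu-null points the value is arbitrary; we pick 0.
    return {x: (nu[x] / mu[x] if mu[x] != 0 else Fraction(0)) for x in nu}

def weighted(rho, mu, A):
    """(rho mu)(A) = sum over x in A of rho(x) * mu({x}), as in Equation 16."""
    return sum(rho[x] * mu[x] for x in A)

# Hypothetical measures on {1, 2, 3} with nu << mu << lam.
lam = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}
mu = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2)}
nu = {1: Fraction(1, 8), 2: Fraction(0), 3: Fraction(7, 8)}

dnu_dmu = density(nu, mu)
dmu_dlam = density(mu, lam)
dnu_dlam = density(nu, lam)

# Chain rule (Equation 18), checked pointwise.
chain_rule_holds = all(dnu_dlam[x] == dnu_dmu[x] * dmu_dlam[x] for x in lam)
```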

Proof (Theorem 7). We give a short sketch of the proof, and refer the interested reader to (Bogachev 2007, 3.2; Conway 2012, 113). If \(\nu=\varrho \mu\), then certainly \(\nu(A)=0\) whenever \(\mu(A)=0\). So we only need to prove that, if \(\nu \ll \mu\), there exists \(\varrho\) such that \(\nu =\varrho \mu\). We can also restrict to the case where \(\mu\) is finite, since one can prove the theorem for the restriction of \(\mu\) to sets of finite measure, and get the \(\sigma\)-finite case as an immediate consequence. We also restrict to \(\nu\) being finite for the sake of simplicity.

Consider the set \(\mathcal{E}:= \{f \in L^0(\Omega,\mu) : f \mu \le \nu \}\) and define \[ \varrho(x) = \sup_{f\in \mathcal{E}} f(x) \] \(\mathcal{E}\) is non-empty since the zero function is in \(\mathcal{E}\). Moreover if \(f_1,f_2 \in \mathcal{E}\), then \(\max(f_1,f_2)\in \mathcal{E}\). From this, one can deduce that \(\varrho \in \mathcal{E}\). Therefore one only needs to check that \(\varrho \mu(A) - \nu(A) \ge 0\) for all \(A\in \mathcal{F}\), to conclude.

Theorem 8 (Lebesgue Decomposition Theorem) Let \((\Omega, \mathcal{F}, \mu)\) be a \(\sigma\)-finite measure space, and let \(\nu\) be another \(\sigma\)-finite measure on \((\Omega, \mathcal{F})\). Then there exists a unique decomposition \(\nu = \nu_{ac} + \nu_s\), where \(\nu_{ac}\) and \(\nu_s\) are \(\sigma\)-finite measures on \((\Omega, \mathcal{F})\) such that \(\nu_{ac} \ll \mu\) (absolutely continuous) and \(\nu_s \perp \mu\) (singular).

Proof (Theorem 8). Since \(\nu\) and \(\mu\) are \(\sigma\)-finite, so is \(\nu+\mu\). Moreover, \(\mu \ll \nu+\mu\). By the Radon-Nikodym theorem, there is a measurable function \(\varrho\) such that \[ \mu(A) = \int_A \varrho \, d(\nu+\mu), \qquad \forall A \in \mathcal{F} \tag{19}\] Let \(B = \{\omega: \varrho(\omega) = 0\}\) and define \(\nu_{ac}\) and \(\nu_{s}\) by \[ \nu_{ac}(A) = \nu(A\cap B^c), \qquad \nu_s(A) = \nu(A\cap B), \qquad \forall A\in \mathcal{F} \tag{20}\]

Clearly \(\nu = \nu_{ac}+\nu_s\), and \(\nu_s(B^c)=0\) and \(\mu(B)=0\). That is \(\nu_s\) and \(\mu\) are singular.

Let now \(A\) be such that \(\mu(A)=0\). Then \[ 0=\mu(A\cap B^c)=\int_{A\cap B^c} \varrho \, d(\nu+\mu)\ge \int_{A\cap B^c} \varrho \, d\nu \] Since \(\varrho>0\) on \(B^c\), necessarily \(\nu(A\cap B^c)=0\). Namely \(\nu_{ac}\ll \mu\).

To show uniqueness, suppose \(\nu = \nu'_{ac} + \nu'_s\) is another such decomposition. Then \(\nu_{ac} - \nu'_{ac} = \nu'_s - \nu_s\). The left-hand side is absolutely continuous with respect to \(\mu\), while the right-hand side is singular with respect to \(\mu\). The only measure that is both absolutely continuous and singular with respect to \(\mu\) is the zero measure.
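The construction in the proof of Theorem 8 can be followed step by step on a finite set. In the sketch below the measures `mu` and `nu` are hypothetical. `rho` plays the role of the density in Equation 19, and `B` is the set where it vanishes, as in Equation 20.

```python
from fractions import Fraction

# Hypothetical measures on {1, 2, 3, 4}, stored as point masses.
mu = {1: Fraction(1), 2: Fraction(2), 3: Fraction(0), 4: Fraction(0)}
nu = {1: Fraction(3), 2: Fraction(0), 3: Fraction(5), 4: Fraction(0)}

# Density of mu with respect to nu + mu (Equation 19); where nu + mu has
# no mass the value is irrelevant and set to 0.
total = {x: mu[x] + nu[x] for x in mu}
rho = {x: (mu[x] / total[x] if total[x] != 0 else Fraction(0)) for x in mu}

# B = {rho = 0}, and the two parts of nu as in Equation 20.
B = {x for x in mu if rho[x] == 0}
nu_ac = {x: (nu[x] if x not in B else Fraction(0)) for x in nu}
nu_s = {x: (nu[x] if x in B else Fraction(0)) for x in nu}

# nu_ac << mu: every mu-null point is nu_ac-null.
ac_ok = all(nu_ac[x] == 0 for x in mu if mu[x] == 0)
# nu_s ⊥ mu: mu(B) = 0 and nu_s(B^c) = 0.
singular_ok = (sum(mu[x] for x in B) == 0
               and sum(nu_s[x] for x in mu if x not in B) == 0)
```

In this example \(B=\{3,4\}\), so the singular part is the mass \(5\) at the point \(3\), while the absolutely continuous part is the mass \(3\) at the point \(1\).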

Convergence results

Considering \(\sigma\)-additive measures is a strong restriction when compared to merely additive measures. The reason for developing such a theory lies in the ensuing powerful results about convergence.

Definition 21 (Convergence of measurable functions) Let \((\Omega, \mathcal{F}, \mu)\) be a measure space and let \((E,d)\) be a metric space (equipped with its Borel \(\sigma\)-algebra). A sequence (or net) \((f_n)\) of measurable functions \(f_n\colon \Omega \to E\) converges to \(f\)

  • \(\mu\)-a.e. if \(f_n(x)\to f(x)\) outside a set of measure \(0\). Namely \[ \mu\left( \{x\in \Omega \st \varlimsup_n d(f_n(x),f(x))\neq 0 \} \right)=0 \]
  • in \(\mu\)-measure if for each \(\varepsilon>0\) \[ \lim_n \mu\left( \{x\in \Omega \st d(f_n(x),f(x))> \varepsilon \} \right)=0 \]
  • in \(L^p(\Omega, \mathcal{F}, \mu)\) if \[ \lim_n \int d(f_n(x),f(x))^p d\mu(x)=0 \]

Theorem 9 Let \((\Omega, \mathcal{F}, \mu)\), \((E,d)\) \(f_n,f\) as in Definition 21. The following holds

  • If \(\mu(\Omega)<\infty\) and \(f_n\to f\) \(\mu\)-a.e., then \(f_n\to f\) in measure.
  • If \(f_n\to f\) in \(L^p\), then \(f_n\to f\) in measure.
  • If \(f_n\to f\) in measure, then there exists a subsequence \(f_{n_k}\) that converges to \(f\) \(\mu\)-a.e..
  • If \(f_n\to f\) in \(L^p\), then there exists a subsequence \(f_{n_k}\) that converges to \(f\) \(\mu\)-a.e..
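The implications of Theorem 9 cannot be reversed without passing to subsequences. A standard example is the so-called typewriter sequence on \([0,1)\) with the Lebesgue measure: \(f_n\) is the indicator of \([k2^{-m},(k+1)2^{-m})\), where \(n=2^m+k\) and \(0\le k<2^m\). The sketch below, whose function names are chosen for this example, shows that \(f_n\to 0\) in measure, that \(f_n(x)\) does not converge at a given \(x\), and that the subsequence \(f_{2^k}\) does converge at that \(x\).

```python
from fractions import Fraction

def typewriter_interval(n):
    """Endpoints of [k/2^m, (k+1)/2^m) where n = 2^m + k, 0 <= k < 2^m, n >= 1."""
    m = n.bit_length() - 1
    k = n - (1 << m)
    return Fraction(k, 1 << m), Fraction(k + 1, 1 << m)

def f(n, x):
    """f_n(x): indicator of the n-th typewriter interval."""
    a, b = typewriter_interval(n)
    return 1 if a <= x < b else 0

def measure_nonzero(n):
    """Lebesgue measure of {f_n != 0}, namely 2^(-m)."""
    a, b = typewriter_interval(n)
    return b - a

x = Fraction(3, 10)
# In every dyadic block 2^m <= n < 2^(m+1) exactly one f_n equals 1 at x,
# so f_n(x) = 1 for infinitely many n, and f_n(x) = 0 for infinitely many n.
hits_per_block = [sum(f(n, x) for n in range(2 ** m, 2 ** (m + 1)))
                  for m in range(8)]
# Along n_k = 2^k the interval is [0, 2^(-k)), which eventually misses x.
subsequence_values = [f(2 ** k, x) for k in range(2, 20)]
```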

Theorem 10 (Monotone Convergence Theorem) Let \((\Omega, \mathcal{F}, \mu)\) be a measure space. If \((f_n)_{n=1}^\infty\) is a sequence of non-negative measurable functions such that \(f_n(\omega) \uparrow f(\omega)\) for \(\mu\)-a.e. \(\omega \in \Omega\), then \(\int f_n \, d\mu \uparrow \int f \, d\mu\).

Theorem 11 (Dominated Convergence Theorem) Let \((\Omega, \mathcal{F}, \mu)\) be a measure space. If \((f_n)_{n=1}^\infty\) is a sequence of measurable functions such that \(f_n(\omega) \to f(\omega)\) for \(\mu\)-a.e. \(\omega \in \Omega\), and there exists an integrable function \(g\) (i.e., \(\int |g| \, d\mu < \infty\)) such that \(|f_n(\omega)| \leq g(\omega)\) for all \(n\) and \(\mu\)-a.e. \(\omega\), then \(\int f_n \, d\mu \to \int f \, d\mu\).

Theorem 12 (Vitali Convergence Theorem (finite measure)) Let \((\Omega, \mathcal{F}, \mu)\) be a finite measure space, and let \((f_n)_{n=1}^\infty\) be a sequence of functions in \(L^p(\Omega, \mathcal{F}, \mu)\) for \(1 \le p < \infty\). Then \(f_n\) converges to \(f\) in \(L^p(\Omega, \mathcal{F}, \mu)\) if and only if the following two conditions hold:

  • The sequence \((f_n)\) converges in measure to \(f\).
  • The sequence \((|f_n|^p)\) is uniformly integrable: \[ \lim_{M\to \infty} \sup_n \int_{|f_n|\ge M} |f_n|^p d\mu =0 \]
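Uniform integrability is what separates the two sequences \(f_n=n\,\mathbf{1}_{(0,1/n)}\) and \(g_n=\sqrt{n}\,\mathbf{1}_{(0,1/n)}\) on \([0,1]\) with the Lebesgue measure. Both converge to \(0\) in measure, but only the second converges in \(L^1\). The sketch below evaluates the tail integrals of the theorem in closed form for such scaled indicators. The helper names and the truncation `n_max` of the supremum are assumptions of this example, and the truncated supremum equals the true one here as soon as `n_max` is at least \(M\) (for \(f_n\)) or \(M^2\) (for \(g_n\)).

```python
import math

def tail(height, n, M, p=1):
    """Integral of |f_n|^p over {|f_n| >= M}, for f_n = height(n) * 1_(0, 1/n)."""
    h = height(n)
    return h ** p / n if h >= M else 0.0

def sup_tail(height, M, n_max, p=1):
    """Supremum over 1 <= n <= n_max of the tail integrals."""
    return max(tail(height, n, M, p) for n in range(1, n_max + 1))

# f_n = n 1_(0,1/n): the tail integral equals 1 for every n >= M,
# so the supremum does not vanish as M grows.
tails_f = [sup_tail(lambda n: n, M, 10_000) for M in (10, 100, 1000)]

# g_n = sqrt(n) 1_(0,1/n): the tail integral equals n^(-1/2) for n >= M^2,
# so the supremum is 1/M and vanishes as M grows.
tails_g = [sup_tail(math.sqrt, M, 10_000) for M in (10, 50, 100)]
```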

Theorem 13 (Vitali Convergence Theorem (general measure)) Let \((\Omega, \mathcal{F}, \mu)\) be a measure space, and let \((f_n)_{n=1}^\infty\) be a sequence of functions in \(L^p(\Omega, \mathcal{F}, \mu)\) for \(1 \le p < \infty\). Then \(f_n\) converges to \(f\) in \(L^p(\Omega, \mathcal{F}, \mu)\) if and only if the following three conditions hold:

  • The sequence \((f_n)\) converges in measure to \(f\).
  • The functions \((|f_n|^p)\) are uniformly integrable.
  • For every \(\varepsilon>0\), there exists a set \(E\) of finite measure, such that \(\int_{E^c}|f_n|^p \, d\mu < \varepsilon\) for all \(n\).

Theorem 14 (Borel-Cantelli Lemma) Let \((\Omega, \mathcal{F}, \mu)\) be a measure space, and let \((A_n)\) be a sequence of measurable sets. If \(\sum_{n=1}^\infty \mu(A_n) < \infty\), then the measure of the limsup of the sets is zero: \[ \mu\left( \limsup_{n \to \infty} A_n \right) = \mu\left( \bigcap_{m=1}^\infty \bigcup_{n=m}^\infty A_n \right) = 0. \] This means that the set of points that belong to infinitely many \(A_n\) has measure zero.

References

Bogachev, Vladimir I. 2007. Measure Theory. Springer.
Conway, J. B. 2012. A Course in Abstract Analysis. Graduate Studies in Mathematics. American Mathematical Society.
Patriota, Alexandre Galvao. 2011. “An Extended Version of the Caratheodory Extension Theorem.” https://api.semanticscholar.org/CorpusID:211837752.