Chapter 5: Invariant Measures

\[\newcommand{\st}{\, : \:} \newcommand{\ind}[1]{\mathbf{1}_{#1}} \newcommand{\dd}{\mathrm{d}}\]

Invariant Measures (Stationary Distributions)

Invariant measures play a central role in understanding the long-time behavior of Markov chains. Even more, a standard motivation for studying Markov chains is to investigate or sample their invariant measures.

Definition 1 (Invariant measure) A positive measure \(m\) on the countable space \(S\) is invariant if \[ m_x = \sum_y m_y p_{y,x}, \qquad x\in S \tag{1}\]

If \(m \in \mathcal{P}(S)\) is a probability, we say it is an invariant probability or stationary distribution for the Markov chain.

For invariant probabilities, it is better to use the operator \(P\), and its interpretation via the Markov process, to understand invariance. Indeed the following remark follows immediately from the definition of \(P\) (which however is defined on probabilities, not on any measure).

Remark. Let \(m\in \mathcal{P}(S)\) be a probability on \(S\). The following are equivalent

  • \(m\) is invariant.
  • \(mP=m\).
  • For each function \(f\in \mathcal{B}(S)\), \(\mathbb{E}_m[f(X_t)]= \mathbb{E}_m[f(X_0)]\) for all \(t\ge 0\).
  • For each \(t\ge 0\), \(\mathbf{Y}:= \theta_t \mathbf{X}\) is a Markov chain with initial distribution \(m\), where \(\theta_t\) is the time shift.

This remark (which can be proved easily) clarifies the meaning of invariance: if \(\mathbf{X}\) starts at time \(0\) with distribution \(m\), then \(X_t\) has distribution \(m\) for all \(t\ge 0\).

Example 1 For \(n\ge 2\), take \(S= \mathbb{Z}_n = \{0,1,\ldots,n-1\}\), and consider the transition probabilities associated to jumping clockwise with probability \(p\) and counterclockwise with probability \(1-p\), namely \[ p_{x,y}= \begin{cases} p & \text{if $y=x+1 \pmod n$} \\ 1-p & \text{if $y=x-1 \pmod n$} \\ 0 & \text{otherwise} \end{cases} \] Then \(m_x=1/n\) is an invariant probability, since \(1/n= p/n + (1-p)/n\).

Example 2 On \(S=\mathbb{Z}\), consider the asymmetric simple random walk characterized by the transition probabilities \[ p_{x,y}= \begin{cases} p & \text{if $y=x+1$} \\ 1-p & \text{if $y=x-1$} \\ 0 & \text{otherwise} \end{cases} \] The equation for the invariant measure becomes \[ m_x= p m_{x-1}+(1-p)m_{x+1} \] So we have

  • If \(p=0\) or \(p=1\), then \(m_x=1\) is the unique invariant measure up to multiplicative constants.
  • For \(p=1/2\), all the solutions have the form \(m_x=c_1 + c_2 x\). However since we are looking for positive measures \(m_x\ge 0\), necessarily we have \(c_2=0\). Therefore in this case we have a unique invariant measure \(m_x=1\) up to multiplicative constants.
  • For \(p \in (0,1)\), \(p\neq 1/2\), all the measures \(m_x= c_1 +c_2 (p/1-p)^x\) are invariant for \(c_1,c_2\ge 0\).

Invariant Measures and Recurrence

Definition 2 The return measure based at \(x\in S\) is the measure (regarded as a measure over \(y\)) \[ \mu^x_y:= \mathbb{E}_x\left[\sum_{t=0}^{\tau^+_x-1} \ind{y}(X_t) \right] \in [0,\infty] \tag{2}\] namely \(\mu^x_y\) counts the average time of visits to \(y\) before coming back to \(x\) (or over all times if one never comes back to \(x\)).

Theorem 1 Fix \(x\in S\). Consider the system for the unknown variable \(\mu \equiv (\mu_y)_{y\in S}\). \[ \begin{cases} \mu_x= 1 & \\ \mu P \le \mu & \text{meaning $\sum_z \mu_z P_{z,y} \le \mu_y$} \end{cases} \tag{3}\] Then:

  1. The return measure \(\mu^x\) based at \(x\) is the minimal positive solution to Equation 3, in other words the minimal measure solving it.
  2. If \(x\) is recurrent, \(\mu^x\) solves Equation 3 with equality, i.e. \(\mu^x\) is invariant.
  3. If the Markov chain is recurrent and irreducible, then there exists a unique invariant measure \(\mu\) up to a multiplicative constant, and moreover \(\mu_y \in (0,\infty)\) for all \(y\). In particular \(\mu^{x'}_{\cdot}= \mu_x^{x'} \mu^{x}_{\cdot}\).

Proof. For \(x,y\in S\), we have that \[ \sum_{t=1}^{\tau^+_x} \ind{y}(X_t) =- \ind{x=y} + \ind{\tau_x^+<\infty, x=y} + \sum_{t=0}^{\tau^+_x-1} \ind{y}(X_t) \le \sum_{t=0}^{\tau^+_x-1} \ind{y}(X_t) \tag{4}\] In other words, the inequality between the l.h.s. and the r.h.s is an a.s. equality unless \(x=y\) and \(\tau_x^+=+\infty\) (in all the other cases, we either added and removed \(0\), or added and removed \(1\)).

  1. Clearly \(\mu^x_x=1\) by the definition. On the other hand, by Equation 4

    \[ \begin{aligned} \mu^x_y & \ge \mathbb{E}_x \left[ \sum_{t=1}^{\tau^+_x} \ind{y}(X_t) \right] = \mathbb{E}_x\left[\sum_{s=1}^\infty \sum_{t=1}^{\tau^+_x} \ind{y}(X_t) \ind{\tau_x^+=s }\right] = \mathbb{E}_x\left[\sum_{t=1}^\infty\ind{y}(X_t) \ind{\tau_x^+ \ge t}\right] \\ & =\sum_{t=1}^\infty \sum_{z\in S} \mathbb{P}_x\left(\tau_x^+> t-1, X_{t-1}=z, X_t=y \right) \end{aligned} \] Since \(\tau_x^+\) is a stopping time, \(\{\tau_x^+> t-1\}\) is determined by \((X_1,\ldots,X_{t-1})\) (namely \(\ind{\{\tau_x^+> t-1\}}\) is a measurable function of \((X_1,\ldots,X_{t-1})\)). We can thus apply to Markov property to get \[ \begin{aligned} \mu^x_y & \ge \sum_{t=1}^\infty \sum_{z\in S} p_{z,y} \mathbb{P}_x\left(\tau_x^+> t-1, X_{t-1}=z\right) = \sum_{z\in S} \sum_{t=0}^\infty p_{z,y} \mathbb{E}_x\left[\ind{\tau_x^+> t} \ind{z}(X_t)\right] = \sum_{z\in S} \mu^x_z p_{z,y}= (P\mu^x)_y \end{aligned} \] This proves that \(\mu^x\) solves Equation 3. Minimality follows easily iterating the inequality: for \(\mu\) any solution to Equation 3 and \(y\neq x\) \[ \begin{aligned} \mu_y & \ge (\mu P)_y = p_{x,y} + \sum_{z_1\neq x} \mu_{z_1} p_{z_1,y} \ge p_{x,y}+ \sum_{z_1\neq x} p_{x,z_1}+ \sum_{z_1,z_2 \neq x} \mu_{z_2} p_{z_2,z_1} p_{z_1,y} \ge \ldots \\ & \sum_{t=1}^T\mathbb{P}_x(X_t=y, \tau_x^+\ge t) + \sum_{z_1,\ldots,z_T \neq x} \mu_{z_T} p_{z_T,z_{T-1}}\cdots p_{z_2,z_1} \end{aligned} \] As we drop the last (non-negative) term as send \(T\to \infty\), we get \(\mu \ge \mu^x\).

  2. If \(x\) is recurrent, then Equation 4 is an equality \(\mathbb{P}_x\)-a.s., and thus in the previous argument all inequalities are equalities.

  3. Assuming recurrence and irreducibilility, let \(\mu\) be an invariant measure. By irreducibility for each \(x,y\in S\) there exists \(t\) such that \(p^{(t)}_{x,y}>0\), thus \[ \mu_y= \sum_z \mu_z p^{(t)}_{z,y} \ge p^{(t)}_{x,y} \mu_x \tag{5}\] from which one deduces \(\mu_y \in (0,\infty)\) for all \(x,y \in S\), as long as there is one point with such property. Fix now \(x\in S\), take \(\mu:=\mu^x\) and let now \(\mu'\) be an invariant measure such that \(\mu_x=1\) (this fixes the multiplicative constant). By the minimality of \(\mu\), we have that \(\lambda=\mu'-\mu \ge 0\), and \(\lambda P^t=\lambda\), but \(\lambda_x=0\). Then reasoning as in Equation 5, we get \(\lambda_y=0\) for all \(y\in S\).

Definition 3 A point \(x\in S\) is called positive-recurrent if \(\mathbb{E}_x[\tau_x^+]<\infty\). In particular such an \(x\) is recurrent.

A point \(x\in S\) is called null-recurrent if it is recurrent and \(\mathbb{E}_x[\tau_x^+]=\infty\).

Theorem 2 For an irreducible Markov Chain, the following are equivalent:

  1. Every state \(x\in S\) is positive-recurrent.
  2. There exists a positive-recurrent state \(x\in S\).
  3. There exists a (necessarily unique) invariant probability measure \(m\in \mathcal{P}(S)\).

If any of these conditions holds, then \(m_x= 1/\mathbb{E}_x[\tau_x^+]\). In particular \[ \sum_{x\in S}\frac{1}{\mathbb{E}_x[\tau_x^+]}=1 \]

Proof.

  • a. \(\implies\) b., is trivial.
  • b. \(\implies\) c., we know that \(\mu^x\) is invariant and finite in each point, from point (b) in Theorem 1. Notice that \[ \sum_y \mu^x_y= \mathbb{E}_x\left[\sum_{t=0}^{\tau^+_x-1} \sum_{y\in S} \ind{y}(X_t) \right] =\mathbb{E}_x\left[\tau^+_x\right]<\infty \tag{6}\] by assumption. So \(m=\mu^x/{E}_x\left[\tau^+_x\right]\) is the unique invariant probability measure.
  • c. \(\implies\) a. Let \(m\) be an invariant probability, then \(\mu:=m/m_x\) satisfies Equation 3, and thus \(\mu^x\) (as defined in Equation 2 and being the minimal solution to Equation 3), satisfies _y^x_y$ for all \(y\in S\). Therefore by Equation 6 \[ 1= \sum_y m_y \ge m_x \sum_y \mu^x_y= m_x \mathbb{E}_x\left[\tau^+_x\right] \] which implies that \(x\) is positive-recurrent, as well as the explicit formula for \(m_x\).

Example 3 Suppose that we shuffle a deck of \(N\) cards, repeating some random shuffling action over and over. Assume that:

  • It is a complete mixing method, meaning that, in finitely many iterations of the shuffling, we can obtain any order of the cards (e.g. we do not just swipe the first and second card repeatedly).
  • We do not see the cards when we mix them (that is, the probability of performing a given shuffling does not depend on the order of the cards).

Let us say that we start with a completely ordered deck. Then the average number of shuffling actions needed to come back to the original order is \(N!\), no matter the allowed shuffling actions (method) or the exact probabilities of each of them. Indeed, the order of the cards \(X_t\) at each iteration is a Markov chain on the space \(S\) of permutations over \(N\) cards. The first hypothesis states that this Markov chain is irreducible, the second hypothesis states that \(p_{\sigma\circ x, \sigma \circ y}=p_{x,y}=p_{e,x^{-1}\circ y}\) for each permutation \(\sigma\). Thus the uniform measure on \(S\) is invariant, namely \(m_x=1/N!\) is the unique invariant probability. Therefore \(\mathbb{E}_x[\tau_x^+]=N!\) for all \(x\in S\).

Relation between closure, recurrence and invariant measures
Figure 1

Convergence to the invariant measure

If \(X_t\) is an irreducible, positive-recurrent Markov chain, it can be proven (as a consequence of the law of large numbers applied to the return times \(\tau_x^+\)) \[ \lim_T \frac{1}{T}\sum_{t=0}^{T-1} f(X_t)= m(f) \tag{7}\] a.s., for all function \(f\in L^1(m)\). We will get refinements of this statement in later chapters. Here we want to focus on a stronger statement: does the law of \(X_t\) converge to \(m\) as \(t\to \infty\)? Notice that this is in principle a different question compared to Equation 7. Take \(f=\ind{x}\), for some fixed \(x\in S\). Then Equation 7 states that the number of times the chain passes in \(x\) in the first \(T\) steps, is roughly \(m_x T\). This holds for any irreducible, positive-recurrent chain and is quite intuitive.

In Equation 7 with \(f=\ind{x}\), taking expectation we get \[ \lim_T \frac{1}{T}\sum_{t=0}^{T-1}\mathbb{P}(X_t=x)= m_x \tag{8}\] In other words, the Cesaro mean (in time) of \(\mathbb{P}(X_t=x)\) converges to \(m_x\). One may wonder whether \(\mathbb{P}(X_t=x)\) converges to \(m_x\), not just its Cesaro mean. In other words, we are asking if \(X_t\) converges to \(m\) in distribution. On the one hand, this result would be stronger than Equation 7, since it concerns the direct limit in \(t\) of \(\mathbb{P}(X_t=x)\), or more in general of observables like \(\mathbb{E}[f(X_t)]\). On the other hand, we are just making a statement for the distribution of \(X_t\), not an a.s. statement, and on the other hand it is clear that for a generic chain, the chain \(X_t\) will continue to randomly jump and would not converge anywhere as \(t\to \infty\), unless there are closed singletons (known as absorbing states). So we are asking whether we can trade a weaker stochastic notion of convergence (in distribution) for a stronger notion of limit in time (usual direct limit vs Cesaro mean).

However, it is easy to see that \(\mathbb{P}(X_t=x)\to m_x\) cannot be true, even on a finite state space, without further assumptions. Indeed, the chain on \(Z_{2n}=\{0,1,\ldots,2n\}\) that each steps jumps from \(i\) to \(i+1 \pmod 2n\) with probability \(1\), features an obvious periodicity: \(X_t-X_0\) is even if \(t\) is even, and odd \(t\) is odd. So while the uniform measure is the unique invariant measure, \(\mathbb{P}_0(X_t=x)\) will not converge (indeed, it will converge to the uniform measure on even/odd numbers, when taking the limit along even/odd times \(t\)). This periodic constraint phenomenon, however, is the only one that may hinder convergence, and justifies the following definition.

Definition 4 (aperiodic) A state \(x\) is aperiodic if there exists \(T\) such that \(p^{(t)}_{x,x}>0\) for all \(t\ge T\).

Exercise 1 Let \(z\) be an aperiodic state of an irreducible Markov chain. Prove that for each \(x,y\in S\) there exists a time \(T\) such that \(p^{(t)}_{x,y}>0\) for all \(t\ge T\). In particular for an irreducible Markov chain, a state is aperiodic iff all the states are aperiodic. The chain is called aperiodic in this case.

There exists \(s,s'\) such that \(p^{(s)}_{x,z}>0\), \(p^{(s')}_{z,y}>0\). Then \[ p^{(s+t+s')}_{x,y}\ge p^{(s)}_{x,z}p^{(t)}_{z,z}p^{(s')}_{z,y} \] Thus if \(p^{(t)}_{z,z}>0\) for \(t\ge T_z\), then \(p^{(t)}_{x,y}>0\) for all \(t\ge T_z+s+s'\).

Exercise 2 Define the set \(\Pi_x:=\{t\ge 1 \st p^{(t)}_{x,x}>0 \}\). The period of \(x\) is the greatest common divisor of \(\Pi_x\). Prove that for an irreducible Markov chain, all points share the same period.

We are ready to prove that periodicity is indeed the only obstruction to the convergence of the law of \(X_t\) to the invariant probability. The proof, despite simple, is really influential and a favorite of many probabilists.

Theorem 3 For an irreducible aperiodic Markov chain \(\mathbf{X}\) with invariant probability \(m\) (in particular \(\mathbf{X}\) is positive-recurrent) it holds \[ \lim_{t\to \infty} \sup_{x\in S} |\mathbb{P}(X_t=x)-m_x|=0 \]

Proof. Let \(\mathbf{X'}\) be a Markov chain on the same state space, independent of \(\mathbf{X}\), and starting with initial distribution \(m\). Notice that the pair \(\mathbf{\xi}:=(\mathbf{X},\mathbf{X'})\) is a Markov chain on the state space \(\Sigma:=S\times S\) with transition probabilities \[ q_{(x,x'),(y,y')}:=p_{x,y}p_{x',y'} \] From this definition, it is intuitive and easy to check that \[ q^{(t)}_{(x,x'),(y,y')}:=p^{(t)}_{x,y}p^{(t)}_{x',y'} \tag{9}\]

The Markov chain \(\mathbf{\xi}\)

  • admits the measure \(\mu_{(x,x')}:=m_x m_{x'}\) as invariant probability.
  • is irreducible (for this we need the aperiodicity of the original chain). Indeed from Equation 9 and Exercise 1, for all \((x,x'), (y,y')\in \Sigma\), \(q^{(t)}_{(x,x'),(y,y')}>0\) for \(t\) large enough (so \(\mathbf{\xi}\) is actually aperiodic, although we just need irreducibility here).

Therefore \(\mathbf{\xi}\) is positive-recurrent from Theorem 2. Let \(\Delta:=\{(x,x), x\in S\} \subset \Sigma\) be the diagonal in \(\Sigma\). Then \[ \tau:= \inf\{ t\ge 0 \st X_t=X_t' \}= \tau_{\Delta}(\mathbf{\xi}) \] is the hitting time of \(\Delta\) for \(\mathbf{\xi}\), and thus a stopping time. In particular \[ \mathbb{P}(X_t=x,\tau\le t)= \mathbb{P}(X_t'=x,\tau\le t) \] as indeed both coincide with \(\mathbb{P}(Y_t=x,\tau\le t)\) where \(Y_t=X_t \ind{t\le \tau}+ X_t' \ind{t>\tau}\). Therefore

\[ \left| \mathbb{P}(X_t=x) - m_x \right| = \left| \mathbb{P}(X_t=x) - \mathbb{P}(X_t'=x) \right| = \left| \mathbb{P}(X_t=x,\tau>t) - \mathbb{P}(X_t'=x,\tau>t) \right| \le \mathbb{P}(\tau >t) \] However, since \(\mathbf{\xi}\) is irreducible and positive-recurrent \(\mathbb{P}(\tau<\infty)=1\), and thus the r.h.s. vanishes as we take \(t\to \infty\).

The operator \(P\) and invariant measures

Proposition 1 Let \(m\) be an invariant measure. For \(q\in [1,\infty]\), the operator \(P\) is a contraction in \(L^q(m)\). Namely \[ \begin{aligned} & \int |Pf|^q dm \equiv \sum_x |Pf(x)|^q m_x \le \sum_x |f(x)|^q m_x \equiv \int |f|^q dm, \qquad q\in [1,\infty) \\ & \|Pf\|_{L^\infty(m)}\equiv \sup_{x \st m_x>0} |Pf(x)| \le \sup_{x \st m_x>0} |f(x)| \equiv \|f\|_{L^\infty(m)} \end{aligned} \]

Proof. Take \(q\in(1,\infty)\). By Jensen’s inequality applied to the function \(u\mapsto |u|^q\), we have that \[ |Pf(x)|^q = |\mathbb{E}_x[f(X_1)]|^q \le \mathbb{E}_x[|f(X_1)|^q]= Pg(x) \] where \(g(\cdot)=|f(\cdot)|^q\). If we sum (integrate over \(m\)) we get, since \(mP=m\), \(m(|Pf|^q) \le m(Pg)= m(g)=m(|f|^q)\). Or more explicitly \[ \sum_x |Pf(x)|^q m_x \le \sum_x m_x \sum_y p_{x,y} g(y)= \sum_{x,y} m_x p_{x,y} g(y)=\sum_y m_y g(y)= \sum_y m_y |f(y)|^q \] where in the last equality we used the invariance \(\sum_x m_x p_{x,y}=m_y\). The case with \(q=\infty\) is an immediate consequence of the fact that \(|Pf(x)| \le c\) if \(|f|\le c\).

The adjoint operator

In particular it is useful to consider \(P\) acting on \(L^2(m)\). Since \(P\) is bounded in \(L^2(m)\), the operator \(P^\dagger \colon L^2(m)\to L^2(m)\) is defined by \[ m( f \,Pg):= m(g P^\dagger f) \qquad \text{for all $f,g\in L^2(m)$} \tag{10}\] It is immediate to check that \(P^\dagger\) has entries such that \[ m_x p^\dagger_{x,y}= m_y p_{y,x} \tag{11}\]

Remark. If \(m\) is invariant, then \(p^\dagger_{x,y}\) as in Equation 11 are transition probabilities, namely \(\sum_y p^\dagger_{x,y}=1\).

Conversely, if for some measure \(m\), \(p^\dagger_{x,y}\) as in Equation 11 are transition probabilities, then \(m\) is invariant.

Reversibility

Exercise 3 Let \(X_0, X_1,\ldots\) be a Markov chain on a countable state space \(S\), with transition probabilities \((p_{x,y})_{x,y}\), and invariant probability \(m\). Assume that \(X_0\) (and thus all of the \(X_t\)) has distribution \(m\). Let \(T\ge 1\), and consider the backward-time chain \(Y_0,Y_1,\ldots,Y_T\) defined as \(Y_t=X_{T-t}\). Check that \((Y_t)_{t\le T}\) is a Markov chain, and find its transition probabilities.

Definition 5 A measure \(m\) is called reversible for the transition probabilities \((p_{x,y})\) if for all \(x,y\in S\) \[ m_x p_{x,y}= m_y p_{y,x} \tag{12}\] or equivalently if \(P^\dagger=P\).

If a Markov chain admits a unique invariant measure, the chain is called reversible if the invariant measure is reversible.

Notice that reversibility implies invariance (take the sum over \(y\) in Equation 12).

Markov triples

Definition 6 A Markov triple is a triple \((S,P,m)\) where

  1. \(S\) is a countable set.
  2. \(P\) is a Markov operator, meaning \(Pf(x)=\mathbb{E}_x[f(X_t)]\) for some Markov chain \(\mathbf{X}\), and for all bounded measurable \(f\colon S\to \mathbb{R}\).
  3. \(m\) is a \(\sigma\)-finite invariant measure on \(S\), namely \(m(Pf)=m(f)\), for \(f\in L^1(S)\).

Definition 7 A measurable map \(\phi \colon S \to S'\) is a morphism of Markov triples
\(\phi \colon (S, P, m) \to (S', P', m')\) if for all bounded measureable \(f\colon S' \to \mathbb{R}\) \[ P (f\circ \phi) = (P'f)\circ \phi \quad \text{and} \quad m' = m \circ \phi^{-1}. \] In this case, \((Y_t:=\phi(X_t))_t\) is a Markov chain with Markov operator \(P'\) and invariant measure \(m'\), whenever \(\mathbf{X}\) is a Markov chain with transition operator \(P\) and invariant measure \(m\).

Hereafter, given a Markov triple \((S,P,m)\) we denote \({\langle f,g \rangle}_m:= \sum_x f(x) g(x) m_x\) the scalar product in \(L^2(m)\).

Definition 8 Let \((S,P,m)\) be a Markov triple. If \(f,g \in L^2(m)\) define the Dirichlet form \[ \mathcal{E}(f,g)= {\langle f, (I-P)g \rangle}_m \]

Proposition 2 Denote \(P^\dagger\) the adjoint of \(P\) w.r.t. the measure \(m\) and \(P^s:=(P+P^\dagger)/2\), and similarly \(p^s_{x,y}:=(p_{x,y}+p^\dagger_{x,y})/2\), \(\mathcal{E}^\dagger(f,g):={\langle f,(I-P^\dagger)g\rangle}_m\), \(\mathcal{E}^s(f,g):={\langle f,(I-P^s)g\rangle}_m\). Then it holds

  1. \(\mathcal{E}^\dagger(f,g)=\mathcal{E}(g,f)\). In particular \(m\) is reversible iff \(\mathcal{E}\) is symmetric.
  2. \(\mathcal{E}(f,f)=\mathcal{E}^s(f,f)=\frac{1}{2} \sum_{x,y\in S} m_x p^s_{x,y} (f(x)-f(y))^2\).
  3. \(\mathcal{E}^s(f,g)= \frac{1}{2} \sum_{x,y\in S} m_x p^s_{x,y} (f(x)-f(y))(g(x)-g(y))\).

\[ \mathcal{E}(f,f)= {\langle f, (I-P)g \rangle}_m \]

Proof.

  1. It follows immediately from the definition of \(P^\dagger\).
  2. Notice that from a. \[ \begin{aligned} \mathcal{E}(f,f) & =\mathcal{E}^\dagger(f,f)=\frac{1}{2} \left(\mathcal{E}(f,f)+\mathcal{E}^\dagger(f,f)\right)=\mathcal{E}^s(f,f) \\ & =\sum_{x,y}\left(m_x p^s_{x,y} (f(x)-f(y))f(x) \right) =\tfrac{1}{2}\sum_{x,y}\left(m_x p^s_{x,y} (f(x)-f(y))f(x) + m_y p^s_{y,x} (f(y)-f(x))f(y) \right) \end{aligned} \] and since \(m_x p^s_{x,y}=m_y p^s_{y,x}\), the terms are easily rearranges in a square as in the statement.
  3. It follows by polarization from b., since \(\mathcal{E}^s\) is a symmetric bilinear form.

Definition 9 An invariant measure \(m\) is ergodic if any of the following equivalent conditions holds

  • If \(Pf=f\) \(m\)-a.s., then \(f\) is constant \(m\).a.s.
  • If \(\mathcal{E}(f,f)=0\), then \(f\) is constant.

Another look at linear equations

Let \((S,P,m)\) be a Markov triple, and let \(c \colon S\times S \to \mathbb{R}\) and \(g\colon S \to \mathbb{R}\) be given, and let \(A\subset S\). We assume that \(c\ge 0\) to be sure that all the sums are well-defined. Define \[ u(x):= \mathbb{E}_x\left[ g(X_{\tau_A}) \ind{\tau_A<\infty}+\sum_{t=1}^{\tau_A} c(X_{t-1},X_t) \right] \in [0,\infty] \] Then, conditioning on the first step \(X_1\), we get that \(u\) satisfies (it is indeed the minimal solution if \(c,g\ge 0\)) \[ \begin{cases} (I-P)u=f & \text{on $A^c$} \\ u=f & \text{on $A$} \end{cases} \tag{13}\] where \[ \begin{cases} f(x)=\sum_y p_{x,y}c(x,y) & \text{for $x\in A^c$} \\ f(x)=g(x) & \text{for $x\in A$} \end{cases} \]

Remark. Assume that \((S,P,m)\) is ergodic and \(A\) non-empty. Equation Equation 13 admits at most one solution in \(L^2(m)\). Indeed, let \(u,v\) be two solutions, and set \(w=u-v\). Then \(w\) satisfies Equation 13 with \(f\equiv 0\). In particular \(w(x) ((I-P)w)(x)=0\) everywhere on \(S\) (if \(x\in A\) then \(w(x)=0\), otherwise \(((I-P)w)(x)=0\)). Thus \(\mathcal{E}(w,(I-P)w)=0\), and \(w\) is constant by ergodicity. But \(w=0\) on \(A\), thus \(w\equiv 0\).

Let \(S\) be a Polish space, and recall that in this case we have that the transition probabilities are measurable maps \(p\colon S \to \mathcal{P}(S)\), meaning that \(p(x,A) = \mathbb{P}(X_1\in A|X_0=x)\) represents the probability of jumping to \(A\) when at point \(x\in S\). In this case, we have that \(P\) has the following three properties

  • \(P \ind{}=\ind{}\).
  • \(Pf \ge 0\) of \(f\ge 0\).
  • \(Pf_n \to 0\) if \(f_n\) is a monotone sequence converging pointwise to \(0\).

It is not hard to check that these properties characterize the Markov operators. In other words, if a linear operator acting on bounded measurable functions satisfies them, then \(Pf(x)=\mathbb{E}_x[f(X_1)]\) for some Markov chain with transition probabilities \(p(x,A):= (P \ind{A})(x)\).