Chapter 2. The strong Markov property

\[\newcommand{\st}{\, : \:} \newcommand{\ind}[1]{\mathbf{1}_{#1}} \newcommand{\dd}{\mathrm{d}}\]

Hereafter, we always consider time-homogeneous transition probabilities \(p_{x,y}\) on a finite or countable state space \(S\) (although most of the ideas generalize to a rather general state space \(S\)). \(\mathbf{X}=(X_0,X_1,\ldots)\) denotes the Markov chain, and \(\mathbb{P}_\mu\) the probability measure when \(\mathbf{X}\) starts with distribution \(\mu\), \(X_0 \sim \mu\). \(\mathcal{B}(S)\) is the space of bounded measurable functions \(f\colon S\to \mathbb{R}\) and \(\mathcal{P}(S)\) is the space of probability measures on \(S\).

We will think that the transition probabilities \(p_{x,y}\) are fixed once and for all. We then define the space \(D(S)\) of sequences \((x_i)\) in \(S\), such that \(p_{x_i,x_{i+1}}>0\) for all \(i\ge 0\). Finally recall that \(P\) is the Markov operator \(Pf(x):= \mathbb{E}_x[f(X_1)]\).

In this chapter we aim to formalize some intuitive properties of a Markov chain. For instance, conditioning on the intermediate times we have for \(f\in \mathcal{B}(S)\) \[ \mathbb{E}_x[f(X_t)]= (P^tf)(x) \] and for \(t\ge s\), regardless of \(\mu \in \mathcal{P}(S)\) \[ \mathbb{E}_\mu[f(X_t)\mid X_s=x]= (P^{t-s} f)(x) \] or say, for \(f\in \mathcal{B}(S)\) \[ \mathbb{E}_\mu[f(X_t,X_{t+1})\mid X_s=x]= \mathbb{E}_{x}[f(X_{t-s},X_{t-s+1}) ] \]

In this chapter we see how to properly write these kinds of properties and many others in a general framework.

The (homogeneous) Markov property

First, we introduce the time shift operator \(\theta_t\), for \(t\ge 0\) \[ \begin{aligned} & \theta_t \colon D(S) \to D(S) \\ & (\theta_t \mathbf{X})_{s}:=\mathbf{X}_{t+s} \end{aligned} \tag{1}\]

Proposition 1 (Homogeneous Markov property) Let \(\mathbf{X}\) be a homogeneous Markov chain. Then for \(s\ge 0\) and for all bounded measurable functions \(F\colon D(S)\to \mathbb{R}\) \[ \mathbb{E}_\mu[F(\theta_s \mathbf{X})\mid X_s=x]= \mathbb{E}_\mu[F(\theta_s \mathbf{X})\mid X_0=x_0, \ldots, X_{s-1}=x_{s-1}, X_s=x]= \mathbb{E}_x[F(\mathbf{X})] \] In other words, conditional on \(X_t=x\), the sequence \((X_t,X_{t+1},\ldots)\) is a Markov chain starting at \(x\), with the same transition probabilities as \(\mathbf{X}\), and independent of \((X_0,\ldots,X_{t-1})\).

Exercise 1 Proposition 1 may sound abstract, but it is a rather obvious statement. Use the previous example about the random walker, to make a trivial example of this statement. Take the transition probabilities \(p_{x,y}\) homogeneous, and let \(f(x,y)\) be the price of the flight from town \(x\) to town \(y\). Let \(F(\mathbf{X})\) be the cost paid for the first \(10\) flights. What does the proposition say?

Proof (Proposition 1). With no loss of generality, to keep the notation explicit, we can assume that \(F\) depends on finitely many coordinates \(F(\mathbf{X})= F(X_0,X_1,X_2,\ldots,X_t)\), since measurable functions of \(D(S)\) are limit of such functions. By linearity in \(F\), it is enough to prove the equality for indicator functions, namely for \(F(X_0,X_1,X_2,\ldots,X_t)= \ind{z_0}(X_0) \ind{z_1}(X_1)\cdots \ind{z_t}(X_t)\) for some \(z_0,\ldots,z_t\in S\). Then for such an \(F\) \[ \begin{aligned} \mathbb{E}_\mu[F(\theta_s \mathbf{X})\mid X_0=x_0, \ldots, X_{s-1}=x_{s-1}, X_s=x] & = \mathbb{P}_\mu(X_{s}=z_0,X_{s+1}=z_1,\ldots,X_{s+t}=z_t \mid X_0=x_0, \ldots, X_{s-1}=x_{s-1}, X_s=x ) \\ & = \ind{z_0}(x) \frac{ p_{x_0,x_1}\cdots p_{x_{s-1},x} p_{z_0,z_1}\cdots p_{z_{t-1},z_t}}{p_{x_0,x_1} \cdots p_{x_{s-1},x}}=\ind{z_0}(x) p_{z_0,z_1}\cdots p_{z_{t-1},z_t} =\mathbb{E}_x[F(\mathbf{X}) ] \end{aligned} \]

Stopping times

Definition 1 (Stopping times) A map \(\tau \colon \Omega \to \mathbb{N}\cup \{\infty\}\) is called a stopping time if, for each \(t\ge 0\), the value \(\{\tau=t\}\) is determined by \((X_0,\ldots,X_t)\). That is if \(\ind{\{\tau=t\}}\) is a (measurable) function of \((X_0,X_1,\ldots,X_t)\).

Remark. Notice that, since \(\{\tau =t\}= \{\tau\le t \} \setminus \{\tau \le t-1 \}\), it is equivalent to say that \(\tau\) is a stopping time if \(\{\tau \le t\}\) is determined by \((X_1,\ldots,X_t)\).

For instance, for \(A\subset S\), the hitting time (discussed more accurately in the next chapter) of the set \(A\) \[ \tau_A := \inf \{ t\ge 0 \st X_t \in A \} \] is a stopping time: \(\{\tau_A=t\}=\{X_0 \not \in A,\ldots X_{t-1}\not \in A, X_t \in A\}\). However \(\tau_A-1\) is not (in general) a stopping time, since \(\tau_A-1=t\) depends on \(X_{t+1}\).

Exercise 2 Let \(\tau_A^+:=\inf \{t \ge 1 \st X_t \in A\}\) be the first return time to \(A\). Check that \(\tau_A^+\) is a stopping time.

Let \(\xi_a:=\sup \{ t\ge 0 \st X_t \in A\}\) be the time of the last visit in \(A\). Check that \(\xi_A\) is not, in general, a stopping time.

Exercise 3 Let \(\tau_A^{(0)}=0\), and for \(k\ge 1\) \[ \tau_A^{(k)}:= \inf \{ t> \tau_A^{(k-1)} \st X_t\in A \} \] be the \(k\)-th time the process visits \(A\). Is \(\tau_A^{(k)}\) a stopping time?

The Strong Markov Property

Theorem 1 (Strong Markov property) Let \(\sigma\) be a finite stopping time: \(\mathbb{P}(\sigma<\infty)=1\). Then a stronger version of the Markov property holds: for all bounded measurable functions \(F\colon D(S)\to \mathbb{R}\). \[ \mathbb{E}_\mu[F(\theta_\sigma \mathbf{X})\mid (X_r)_{r\le \sigma}]= \mathbb{E}_{X_{\sigma}}[F(\mathbf{X})] \] In particular \[ \mathbb{E}_\mu[F(\theta_\sigma \mathbf{X})\mid X_\sigma=x]= \mathbb{E}_x[F(\mathbf{X})] \] In other words, conditional on \(X_\sigma=x\), \((X_{\sigma}, X_{\sigma+1},\ldots)\) is a Markov chain starting at \(x\), independent of \(X_0,\ldots,X_{\sigma-1}\), and with the same transition probabilities as \(\mathbf{X}\).

Proof. For all \((x_0,x_1,\ldots) \in D(S)\) and \(x\in S\) \[ \begin{aligned} \mathbb{E}_\mu[F(\theta_\sigma \mathbf{X})\mid X_0=x_0, \ldots, X_{\sigma}=x] & = \frac{\sum_{t=0}^\infty \mathbb{E}_\mu[F(\theta_t \mathbf{X}) \ind{\sigma=t} \ind{X_0=x_0, \ldots, X_{t}=x}] }{\sum_{t=0}^\infty \mathbb{E}_\mu[\ind{\sigma=t} \ind{X_0=x_0, \ldots, X_{t}=x}]} \\ & = \frac{\sum_{t=0}^\infty \mathbb{E}_\mu[F(\theta_t \mathbf{X}) | \sigma=t, X_0=x_0, \ldots, X_{t}=x] \mathbb{P}_\mu({\sigma=t} ,X_0=x_0, \ldots, X_{t}=x)}{\sum_{t=0}^\infty \mathbb{P}_\mu({\sigma=t} ,X_0=x_0, \ldots, X_{t}=x)} \end{aligned} \] where we understand \(\mathbb{E}[F|A]\mathbb{P}(A)=0\) whenever \(\mathbb{P}(A)=0\). Since \(\sigma\) is a stopping time, \(\{\sigma=t\}\) only depends on \((X_0,\ldots,X_t)\). Thus by the Markov Property \[ \mathbb{E}_\mu[F(\theta_t \mathbf{X}) | \sigma=t, X_0=x_0, \ldots, X_{t}=x] = \mathbb{E}_x[F(\mathbf{X})] \] from which we easily conclude, plugging the last equality in the above formula.

The trace of a Markov chain

Let \(A\subset S\). Suppose that we can only observe the Markov Chain inside \(A\), namely we define \[ Y_t:=X_{\tau_A^{(t+1)}} \tag{2}\] with \(\tau_A^{(t)}\) defined in Exercise 3.

The Markov chain \(\mathbf{Y}\) is called the trace of \(\mathbf{X}\) on \(A\).

Proposition 2 (Trace of a Markov Chain) If \(\mathbb{P}(\tau_A^{(t)}<\infty)=1\) for all \(t\ge 1\), the process \(\mathbf{Y}=(Y_0,Y_1,\ldots)\) defined in Equation 2 is a Markov chain with transition probabilities \(q_{x,y}=\mathbb{P}_x(X_{\tau^+_A}=y)\), where \(\tau_A^+:=\inf \{t \ge 1 \st X_t \in A\}\) is the stopping time defined in Exercise 2.

Proof. For \(t \ge 0\), \(y, y_0, \ldots, y_t \in A\) \[ \mathbb{P}(Y_{t+1}=y \mid Y_0=y_0, \ldots, Y_t=y_t) = \mathbb{P}(X_{\tau^{(t+2)}_A}=y \mid X_{\tau^{(1)}_A}=y_0, \ldots, X_{\tau^{(t+1)}_A}=y_t) \] However:

  1. Since the \(\tau^{(s)}_A\) are stopping times, \(\ind{X_{\tau^{(1)}_A}=y_0, \ldots, X_{\tau^{(t+1)}_A}=y_t}\) is a function of \((X_0, \ldots, X_{\tau^{(t+1)}_A})\).
  2. We have \(\ind{X_{\tau^{(t+2)}_A}=y}= \theta_{\tau^{(t+1)}_A}\ind{X_{\tau_A^+}=y}\).

We can therefore apply the Strong Markov property to the function \(F(\mathbf{X})=\ind{X_{\tau_A^+}=y}\) to obtain \[ \mathbb{P}(Y_{t+1}=y \mid Y_0=y_0, \ldots, Y_t=y_t)= \mathbb{E}[ \theta_{\tau^{(t+1)}_A}\ind{X_{\tau_A^+}=y} \mid X_{\tau^{(1)}_A}=y_0, \ldots, X_{\tau^{(t+1)}_A}=y_t] = \mathbb{E}_{y_t}[ \ind{X_{\tau_A^+}=y}] \] which is the claim.

Figure 1: \(\mathbf{Y}\) (the golden annulus) is the trace of \(\mathbf{X}\) (the red dot) on the set \(A\) drawn with yellow vertices.

Example 1 Each 30 minutes, the cat can either eat, sleep, or play outside. Each action influences the probability of the action to execute next: \(p_{e,s}=0.7\), \(p_{e,p}=0.3\), \(p_{s,s}=0.9\), \(p_{s,e}=0.1\) and \(p_{p,e}=1\) since the cat is tired after playing.

We take note of what the cat is doing every 30 minutes, but we forget about it if we do not see the cat (namely if it’s playing outside). So our notes will only have two states \(A=\{s,e\}\). The random sequence of our notes, e.g. \((s,s,s,e,e,s,\ldots)\) is still a Markov chain with \(p_{e,e}=0.3\), \(p_{e,s}=0.7\), \(p_{s,s}=0.9\), \(p_{s,e}=0.1\).

Exercise 4 An algorithm that does not change state is just a waste of computing resources. Suppose \(\mathbf{X}\) is a Markov chain where \(p_{x,x}>0\) for some \(x\) but \(p_{x,x}<1\) for all \(x\).

  • Define \(\mathbf{Y}\) corresponding to the idea that \(\mathbf{Y}\) should skip all the time ticks where \(\mathbf{X}\) does not change its position.
  • Prove that \(\mathbf{Y}\) is a Markov chain.

Define \(\tau_0=0\) and \(\tau_{t+1}:=\inf \{ s> \tau_{t} \st X_{s}\neq X_{\tau_t} \}\). \(\tau_t\) is a stopping time and by the Strong Markov Property \[ \mathbb{P}(Y_{t+1}=y| Y_0=x_0,\ldots,Y_{t-1}=x_{t-1}, Y_t=x)= \mathbb{P}(X_{\tau_{t+1}}=y| X_{\tau_0}=x_0,\ldots X_{\tau_t}=x)= \mathbb{P}(X_{\tau_{t+1}}=y| X_{\tau_t}=x)= \mathbb{P}_x(X_{\tau_{1}}=y)=:q_{x,y} \] and the transition probabilities of \(\mathbf{Y}\) are given by \(q_{x,y}:= p_{x,y}/(1-p_{x,x})\) for \(x\neq y\) and \(q_{x,x}=0\).