Probability and Statistics 2
Markov chains
- example 1
- machine in 2 states – working vs. broken
- w → b … probability 0.01
- w → w … 0.99
- b → w … 0.9
- b → b … 0.1
- example 2
- a fly which may get caught in a web … absorbing states
- df: stochastic process (Czech: náhodný proces)
- sequence of random variables X0,X1,… such that "all probabilities are defined"
- e.g. P(X0=1∧X1=2∧X3=0) is defined
- df: Markov process/chain
- stochastic process such that
- ∃ finite/countable set S:ImXt⊆S∀t
- elements of S … states
- t … time
- (∀i,j∈S)(∃pij∈[0,1])(∀t)(∀a0,a1,…,at−1): P(Xt+1=j ∣ Xt=i, Xt−1=at−1, …, X0=a0) = pij
- notes
- this is a discrete time Markov chain
- discrete space … ∣S∣≤∣N∣
- time-homogeneous
- pij not depending on t
- Markov condition
- only applies if the conditional probabilities are defined
- Markov condition … "forgetting the past"
- df: transition matrix P=(pij)
- df: transition diagram/graph
- vertices … S
- directed edges … {(i,j):pij>0}
- observation: every row of P sums to 1
- Markov chain is equivalent to a random walk on the transition diagram
- from state i we go to j with the probability of pij independent of the past
- in example 1, assume X0=w; what is P(X2=w)?
- P(X1=w)=p11=0.99
- P(X2=w)=p11⋅p11+p12⋅p21=0.99⋅0.99+0.01⋅0.9
- using the total probability theorem
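A quick numeric check of this computation (a sketch with numpy; state order (w, b) and the start X0=w are assumptions matching example 1):

```python
import numpy as np

# Transition matrix of example 1, states ordered (w, b)
P = np.array([[0.99, 0.01],
              [0.90, 0.10]])

pi0 = np.array([1.0, 0.0])                 # start in state w, i.e. X0 = w
pi2 = pi0 @ np.linalg.matrix_power(P, 2)   # distribution of X2

# matches the hand computation 0.99*0.99 + 0.01*0.9
print(pi2[0])   # P(X2 = w) ~ 0.9891
```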
- probability mass function (PMF, Czech: pravděpodobnostní funkce) for X0,X1,…
- πi(t)=P(Xt=i)
- t∈N0
- i∈S
- theorem
- π(t+1)=π(t)⋅P
- where π(t)=(π1(t),…,πn(t)) row vector
- proof … definition of matrix multiplication & total probability theorem
- theorem
- P(X0=a0,X1=a1,…,Xt=at)=πa0(0)⋅Pa0a1⋅Pa1a2⋅…⋅Pat−1at
- proof by induction
- df: probability of k-step transition … rij(k)
- rij(k)=P(Xt+k=j∣Xt=i)
- rij(1)=P(Xt+1=j∣Xt=i)=pij
- it is independent of t
- theorem (Chapman-Kolmogorov)
- ∀ Markov chain, ∀i,j,k:
- rij(k+1) = ∑_{u=1}^{n} riu(k)⋅puj
- rij(k+ℓ) = ∑_{u=1}^{n} riu(k)⋅ruj(ℓ)
- rij(k)=(Pk)ij
- proof
- 1 is a special case of 2 (ruj(1)=puj)
- 1⟹3 … matrix multiplication & induction
- …
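The identities above can be checked numerically, e.g. on the chain from example 1 (a numpy sketch; the Chapman–Kolmogorov sum is exactly matrix multiplication):

```python
import numpy as np

# Example 1 chain (states w, b); by the theorem, r_ij(k) = (P^k)_ij
P = np.array([[0.99, 0.01],
              [0.90, 0.10]])

k, l = 3, 5
R_k = np.linalg.matrix_power(P, k)
R_l = np.linalg.matrix_power(P, l)
R_kl = np.linalg.matrix_power(P, k + l)

# r_ij(k+l) = sum_u r_iu(k) r_ul(l) is the (i,j) entry of R_k @ R_l
print(np.allclose(R_k @ R_l, R_kl))   # True
```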
- df: j is accessible from i (where i,j∈S)
- i→j
- j∈A(i)
- j is accessible from i ≡ ∃t: rij(t) = P(Xt=j ∣ X0=i) > 0
- ⟺ there is a directed walk i→j (of some length t) in the transition digraph
- df: i↔j (i and j communicate) ≡i∈A(j)∧j∈A(i)
- theorem: ↔ is an equivalence relation
- proof
- reflexive: i∈A(i) … t=0 in the definition
- symmetric: the formula i∈A(j)∧j∈A(i) is symmetric
- transitive: in the digraph ∃ path i to j, ∃ path j to k ⟹∃ walk i→k
- decomposition of the transition digraph to components of strong connectivity ≡ finding equivalence classes of ↔
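Finding the classes of ↔ is exactly finding the strongly connected components; a small sketch (the helper `communication_classes` and the concrete chain are illustrative, not from the notes), using a fly-and-web style chain with one absorbing state:

```python
import numpy as np

# Communication classes of a Markov chain = strongly connected components of
# its transition digraph. Transitive-closure approach, fine for small S.
def communication_classes(P):
    n = len(P)
    reach = np.eye(n, dtype=bool) | (np.asarray(P) > 0)   # walks of length <= 1
    for _ in range(n):                                    # iterate closure
        reach = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
    classes = {}
    for i in range(n):
        # j communicates with i iff i reaches j and j reaches i
        key = tuple((reach[i] & reach[:, i]).tolist())
        classes.setdefault(key, []).append(i)
    return list(classes.values())

# States 0 and 1 communicate; state 2 is absorbing (own class)
P = [[0.5, 0.4, 0.1],
     [0.3, 0.6, 0.1],
     [0.0, 0.0, 1.0]]
print(communication_classes(P))   # [[0, 1], [2]]
```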
- df: a state i∈S is recurrent if we return to it with probability 1
- it is transient otherwise
- Czech: rekurentní, tranzientní
- df: Markov chain is irreducible if ∀i,j∈S:i↔j
- df: Ti=min{t≥1:Xt=i}
- Ti=∞ if the set is empty (there is no such t)
- Ti … random variable
- fij(n)=P(we get to j first at time n∣X0=i)=P(Tj=n∣X0=i)
- we define it for n>0
- fij=P(Tj<∞∣X0=i)=∑n≥1fij(n)
- i is transient
- fii<1 … probability of ever getting back
- ⟺P(Ti=∞)>0
- ⟺P(get back infinitely often)<1
- example – random walk on a line
- with probability 1/2 we go to the left
- f00(2) = (1/2)^2 + (1/2)^2 = 1/2
- f00(1) = f00(3) = ⋯ = 0
- f00(4) = (1/2)^4 ⋅ 2
- r00(4) ≠ f00(4)
- r00(4) = (1/2)^4 ⋅ (4 choose 2) = 6/16
- is 0 recurrent?
- by definition, we should add f00(2)+f00(4)+… and check if it equals 1
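The sum can be evaluated numerically using the renewal identity r00(n) = ∑_{k=1}^{n} f00(k)⋅r00(n−k) (a standard first-return decomposition, assumed here rather than proved above), which lets us solve for f00 recursively:

```python
from math import comb

# Symmetric random walk on Z: r00(n) = (n choose n/2) / 2^n for even n, else 0
N = 1000
r = [comb(n, n // 2) / 2**n if n % 2 == 0 else 0.0 for n in range(N + 1)]

# First-return decomposition: r00(n) = sum_{k=1}^{n} f00(k) * r00(n-k),
# with r00(0) = 1, so f00(n) can be computed recursively
f = [0.0] * (N + 1)
for n in range(1, N + 1):
    f[n] = r[n] - sum(f[k] * r[n - k] for k in range(1, n))

print(f[2], f[4])   # 0.5 and 0.125, matching the hand computations
print(sum(f))       # partial sums creep towards 1: state 0 is recurrent
```

The partial sums converge to 1 only slowly (the tail behaves like 1/√n), which is consistent with the null recurrence of this walk mentioned below.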
- theorem: i is a recurrent state ⟺ ∑_{n=1}^{∞} rii(n) = ∞
- Bn = 1 if Xn=i ("we got back"), 0 otherwise
- E(Bn∣X0=i)=P(Xn=i∣X0=i)=rii(n)
- r00(2n)= probability that out of the 2n steps, n were to the left, n were to the right
- = (1/2^{2n}) ⋅ (2n choose n) ≐ c/√n … see Matoušek, Nešetřil
- ∑_{n=1}^{∞} c/√n = ∞ (taught in calculus)
- E(T0∣X0=0)=∞ (we did not prove that)
- theorem
- if i↔j, then either both i and j are recurrent or both i and j are transient
- for finite Markov chains
- C … equiv class of ↔ in a finite Markov chain
- C is recurrent (∀i∈C is recurrent) ⟺(∀i∈C)(∀j∈S): if i→j then j∈C
- stationary distribution / steady state distribution
- df: π:S→[0,1] such that ∑i∈Sπi=1 is called a stationary distribution if “πP=π"
- ∀j: ∑i πi pij = πj
- if π(0) (the PMF of X0) is π, then π(1) is π as well
- π(1)=π(0)P
- theorem: if a Markov chain is finite, aperiodic (→ not periodic) and irreducible, then
- ∃ unique stat. distribution π
- ∀i:limn→∞(Pn)ij=πj
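A numeric illustration of the limit statement for the chain from example 1 (a numpy sketch; the stationary distribution (90/91, 1/91) follows from solving πP=π by hand):

```python
import numpy as np

# Example 1 is finite, irreducible and aperiodic, so P^n converges to a
# matrix whose rows are all equal to the stationary distribution pi
P = np.array([[0.99, 0.01],
              [0.90, 0.10]])

Pn = np.linalg.matrix_power(P, 200)
print(Pn)   # both rows ~ (90/91, 1/91) ~ (0.989, 0.011)
```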
- example of periodic Markov chain: two states, we change the state with probability 1
- df: i∈S has period di:=gcd{t:rii(t)>0}
- i∈S is aperiodic if di=1
- df: i is null recurrent if i is recurrent and E(Ti∣X0=i)=∞
- i is positive recurrent if i is recurrent and E(Ti∣X0=i)<∞
- example: random walk on a line
- theorem
- if i,j∈S, i↔j, then
- di=dj
- i is transient ⟺ j is transient
- i is recurrent ⟺j is recurrent
- i is null recurrent ⟺j is null recurrent
- i is positive recurrent ⟺j is positive recurrent
- these are properties of the class of ↔
- theorem
- if a Markov chain is irreducible, aperiodic and finite, then
- there exists a unique stationary distribution π: πP=π
- ∀ij:lim(Pt)ij=πj
- P(Xt=j∣X0=i)≐πj
- actually, MC does not have to be finite, it suffices if all states are positive recurrent (?)
- steady state (?)
- if π(0)=π then π(1)=π
- the proof is not easy, here is a cheat proof
- P𝟙 = 𝟙 = I𝟙, where 𝟙 is the all-ones column vector (row sums are 1)
- (P−I)𝟙 = 0
- P−I is a singular matrix
- ∃x≠0: x(P−I)=0 ⟹ xP=x
- π=cx such that ∑πi=1
- problem
- x may have negative coordinates
- to fix: use Perron-Frobenius theorem
- the correct proof is shown in class of probabilistic techniques
- to find π, solve system of linear equations πP=π, add ∑i∈Sπi=1
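A sketch of this recipe in numpy (the helper name `stationary` and the least-squares handling of the redundant equation are my choices, not from the notes):

```python
import numpy as np

# Solve pi P = pi together with sum(pi) = 1 as one linear system:
# pi (P - I) = 0, transposed to (P^T - I) pi^T = 0, plus a normalization row
def stationary(P):
    n = len(P)
    A = np.vstack([np.asarray(P).T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    # least squares copes with the system being overdetermined (one row is redundant)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = np.array([[0.99, 0.01],
              [0.90, 0.10]])
print(stationary(P))   # [90/91, 1/91] ~ [0.989, 0.011]
```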
- π describes long-term behavior of the MC
- Page Rank (original google search) … MC model of people browsing WWW
- given π, we can find a MC such that π is its stationary distribution; then we can run the MC to generate random objects with distribution π
- Markov chain Monte Carlo (MCMC)
- detailed balance equation
- MC may have this property
- ∀i≠j: πi pij = πj pji
- to imagine this: ant colony moving independently according to a Markov chain
- stationary distribution ⟺ the expected number of ants at each state stays the same over time – ants don't "accumulate"
- detailed balance equation implies πP=π
- detailed balance equation is stronger than πP=π
- MCMC algo. sketch
- choose aperiodic irreducible digraph
- pij = min{1, πj/πi} ⋅ C
- pji = min{1, πi/πj} ⋅ C
- choose C such that
- ∀i: ∑_{j≠i} pij < 1
- df. pii = 1 − ∑_{j≠i} pij > 0
- ⟹di=1
- tune the process to make convergence fast
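A minimal Metropolis-style sketch of the algorithm (all names are illustrative; the proposal picks a uniformly random other state, i.e. C = 1/(n−1) folded into the proposal, and rejections create the self-loops that make the chain aperiodic):

```python
import random

# Sample from a target distribution pi on {0,...,n-1} via MCMC:
# move i -> j (j != i uniform) with acceptance probability min(1, pi_j/pi_i),
# which satisfies the detailed balance equation, so pi is stationary
def mcmc_sample(pi, steps, seed=0):
    rng = random.Random(seed)
    n = len(pi)
    counts = [0] * n
    state = 0
    for _ in range(steps):
        j = rng.randrange(n - 1)
        j = j if j < state else j + 1      # uniform proposal over states != state
        if rng.random() < min(1.0, pi[j] / pi[state]):
            state = j                      # accept; otherwise stay put
        counts[state] += 1
    return [c / steps for c in counts]

target = [0.5, 0.25, 0.125, 0.125]
freq = mcmc_sample(target, 200_000)
print(freq)   # empirical frequencies approach the target distribution
```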
- absorbing state i:pii=1
- A … set of absorbing states
- question 1: which i∈A we end at?
- question 2: how fast?
- example: 0∈A (?)
- ai=P(∃t:Xt=0∣X0=i)
- μi=E(T∣X0=i)
- T=min{t:Xt∈A}
- theorem: (ai)i∈S are the unique solution to
- a0=1
- ai=0 if i∈A, i≠0
- ai=∑jpij⋅aj otherwise
- theorem: (μi)i∈S are the unique solution to
- μi=0 if i∈A
- μi=1+∑jpijμj if i∉A
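Both theorems turn into ordinary linear systems. A sketch for a symmetric random walk on {0,…,4} with A = {0, 4} absorbing (this concrete chain is my example, not from the notes):

```python
import numpy as np

# Symmetric random walk on {0,...,4}; states 0 and 4 are absorbing
n = 4
P = np.zeros((n + 1, n + 1))
P[0, 0] = P[n, n] = 1.0
for i in range(1, n):
    P[i, i - 1] = P[i, i + 1] = 0.5

A = {0, n}
M = np.eye(n + 1) - P   # transient rows encode x_i - sum_j p_ij x_j = rhs_i
b = np.zeros(n + 1)     # rhs for a_i (0 on transient rows)
c = np.ones(n + 1)      # rhs for mu_i (the "+1 step" on transient rows)
for i in A:
    M[i, :] = 0.0
    M[i, i] = 1.0       # absorbing rows fix the value directly
    b[i] = 1.0 if i == 0 else 0.0
    c[i] = 0.0

a = np.linalg.solve(M, b)    # a_i = P(we end in state 0 | X0 = i)
mu = np.linalg.solve(M, c)   # mu_i = E(time to absorption | X0 = i)
print(a)    # ~ [1, 0.75, 0.5, 0.25, 0]
print(mu)   # ~ [0, 3, 4, 3, 0]
```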
- proof
- P(∃t:Xt=0∣X0=0)=1
- P(∃t:Xt=0∣X0=i∈A∖{0})=0
- case i∉A
- B … the event {∃t: Xt=0}, so ai = P(B ∣ X0=i)
- P(B ∣ X0=i) = ∑_{j∈S} pij ⋅ P(B ∣ X1=j) = ∑_{j∈S} pij ⋅ aj (total probability theorem + Markov condition)
- example: drunk person on their way home
- A={0}
- μ0=0
- μ1 = 1 + (1/2)μ0 + (1/2)μ2
- μ2 = 1 + (1/2)μ1 + (1/2)μ3
- μn−1 = 1 + (1/2)μn−2 + (1/2)μn
- μn = 1 + μn−1
- solution
- μ1 = 2n − 1
- μn = n^2
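The system above can be solved numerically to check the claimed closed forms (a numpy sketch with n = 10; the chain has 0 absorbing and a reflecting boundary at n):

```python
import numpy as np

# Drunk-person chain on {0,...,n}: mu_0 = 0,
# mu_i = 1 + (mu_{i-1} + mu_{i+1})/2 for 0 < i < n, and mu_n = 1 + mu_{n-1}
def mu(n):
    M = np.zeros((n + 1, n + 1))
    c = np.ones(n + 1)
    M[0, 0] = 1.0
    c[0] = 0.0                           # mu_0 = 0 (home is absorbing)
    for i in range(1, n):
        M[i, i] = 1.0
        M[i, i - 1] = M[i, i + 1] = -0.5
    M[n, n] = 1.0
    M[n, n - 1] = -1.0                   # mu_n = 1 + mu_{n-1}
    return np.linalg.solve(M, c)

m = mu(10)
print(m[1], m[10])   # ~ 19 and 100, i.e. 2n-1 and n^2 for n = 10
```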