# Probability and Statistics 2

## Markov chains

- example 1
  - machine in 2 states: working × broken
  - w → b … probability 0.01
  - w → w … 0.99
  - b → w … 0.9
  - b → b … 0.1
- example 2
  - a fly which may get caught in a web … absorbing states
- df: stochastic process (in Czech: náhodný proces)
  - sequence of random variables $X_0,X_1,\dots$ such that "all probabilities are defined"
  - e.g. $P(X_0=1\land X_1=2\land X_3=0)$ is defined
- df: Markov process/chain
  - stochastic process such that
    - $\exists$ finite/countable set $S:\text{Im}X_t\subseteq S\;\forall t$
      - elements of $S$ … states
      - $t$ … time
    - $(\forall i,j\in S)(\exists p_{ij}\in [0,1])(\forall t)(\forall a_0,a_1,\dots,a_{t-1})$:
      $P(X_{t+1}=j\mid X_t=i,X_{t-1}=a_{t-1},\dots,X_0=a_0)=p_{ij}$
  - notes
    - this is a discrete-time Markov chain
    - discrete space … $|S|\leq |\mathbb N|$
    - time-homogeneous … $p_{ij}$ not depending on $t$
    - the Markov condition only applies if the conditional probabilities are defined
    - Markov condition … "forgetting the past"
- df: transition matrix $P=(p_{ij})$
- df: transition diagram/graph
  - vertices … $S$
  - directed edges … $\set{(i,j):p_{ij}\gt 0}$
- observation: every row of $P$ sums to 1
- a Markov chain is equivalent to a random walk on the transition diagram
  - from state $i$ we go to $j$ with probability $p_{ij}$, independently of the past
- in example 1, assume $X_0=w$; what is $P(X_2=w)$?
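The hand computation that follows can be checked numerically. A minimal sketch (assuming `numpy`), encoding example 1 with the states ordered (w, b):

```python
import numpy as np

# Example 1 as a transition matrix, states ordered (w, b).
P = np.array([[0.99, 0.01],   # w -> w, w -> b
              [0.90, 0.10]])  # b -> w, b -> b

pi0 = np.array([1.0, 0.0])    # start in state w, i.e. X_0 = w

# Distribution of X_2: pi^(2) = pi^(0) * P^2
pi2 = pi0 @ np.linalg.matrix_power(P, 2)
print(pi2[0])  # P(X_2 = w) = 0.99*0.99 + 0.01*0.9 = 0.9891
```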
- $P(X_1=w)=p_{11}=0.99$
- $P(X_2=w)=p_{11}\cdot p_{11}+p_{12}\cdot p_{21}=0.99\cdot 0.99+0.01\cdot 0.9$
  - using the total probability theorem
- probability mass function (PMF, in Czech: pravděpodobnostní funkce) for $X_0,X_1,\dots$
  - $\pi_i^{(t)}=P(X_t=i)$
    - $t\in\mathbb N_0$
    - $i\in S$
- theorem
  - $\pi^{(t+1)}=\pi^{(t)}\cdot P$
    - where $\pi^{(t)}=(\pi_1^{(t)},\dots,\pi_n^{(t)})$ is a row vector
  - proof … definition of matrix multiplication & total probability theorem
- theorem
  - $P(X_0=a_0,X_1=a_1,\dots,X_t=a_t)=\pi_{a_0}^{(0)}\cdot P_{a_0a_1}\cdot P_{a_1a_2}\cdot\ldots\cdot P_{a_{t-1}a_t}$
  - proof by induction
- df: probability of $k$-step transition … $r_{ij}(k)$
  - $r_{ij}(k)=P(X_{t+k}=j\mid X_t=i)$
  - $r_{ij}(1)=P(X_{t+1}=j\mid X_t=i)=p_{ij}$
  - it is independent of $t$
- theorem (Chapman–Kolmogorov)
  - for every Markov chain and all $i,j,k$:
    1. $r_{ij}(k+1)=\sum_{u=1}^n r_{iu}(k)\, p_{uj}$
    2. $r_{ij}(k+\ell)=\sum_{u=1}^n r_{iu}(k)\, r_{uj}(\ell)$
    3. $r_{ij}(k)=(P^k)_{ij}$
  - proof
    - 1 is a special case of 2 ($r_{uj}(1)=p_{uj}$)
    - $1\implies 3$ … matrix multiplication & induction
    - …
- df: $j$ is accessible from $i$ (where $i,j\in S$)
  - notation: $i\to j$, or $j\in A(i)$
  - $j$ is accessible from $i\equiv\exists t:\underbrace{P(X_t=j\mid X_0=i)}_{r_{ij}(t)}\gt 0$
  - $\iff$ the transition digraph contains a directed walk $i\to j$ (of some length $t$)
- df: $i\leftrightarrow j$ ($i$ and $j$ communicate) $\equiv i\in A(j)\land j\in A(i)$
- theorem: $\leftrightarrow$ is an equivalence relation
  - proof
    - reflexive: $i\in A(i)$ … $t=0$ in the definition
    - symmetric: the formula $i\in A(j)\land j\in A(i)$ is symmetric
    - transitive: in the digraph $\exists$ walk $i$ to $j$ and $\exists$ walk $j$ to $k$ $\implies\exists$ walk $i\to k$
  - decomposing the transition digraph into strongly connected components $\equiv$ finding the equivalence classes of $\leftrightarrow$
- df: a state $i\in S$ is recurrent if we return to it with probability 1
  - it is transient otherwise
  - (in Czech: rekurentní, tranzientní)
- df: a Markov chain is irreducible if $\forall i,j\in S:i\leftrightarrow j$
- df: $T_i=\min\set{t\geq 1:X_t=i}$
  - $T_i=\infty$ if the set is empty (there is no such $t$)
  - $T_i$ … random variable
  - $f_{ij}(n)=P(\text{we first reach }j\text{ at time }n\mid X_0=i)=P(T_j=n\mid X_0=i)$
    - we define it for $n\gt 0$
  - $f_{ij}=P(T_j\lt\infty\mid X_0=i)=\sum_{n\geq 1} f_{ij}(n)$
- $i$ is transient
  - $\iff f_{ii}\lt 1$ … probability of ever getting back
  - $\iff P(T_i=\infty\mid X_0=i)\gt 0$
  - $\iff P(\text{we get back infinitely often}\mid X_0=i)\lt 1$
- example: random walk on a line
  - with probability 1/2 we go to the left, with probability 1/2 to the right
  - $f_{00}(2)=(\frac12)^2+(\frac12)^2=\frac12$
  - $f_{00}(1)=f_{00}(3)=\dots=0$
  - $f_{00}(4)=\frac{1}{2^4}\cdot 2$
  - $r_{00}(4)\neq f_{00}(4)$
    - $r_{00}(4)=\frac{1}{2^4}\cdot {4\choose 2}=\frac{6}{16}$
  - is 0 recurrent?
    - by definition, we should add $f_{00}(2)+f_{00}(4)+\dots$ and check whether the sum equals 1
- theorem: $i$ is a recurrent state $\iff\sum_{n=1}^\infty r_{ii}(n)=\infty$
  - proof idea: $B_n=\begin{cases} 1&\text{ if }X_n=i\\ 0 &\text{ otherwise}\end{cases}$ … indicator that we are back at $i$ at time $n$
  - $\mathbb E(B_n\mid X_0=i)=P(X_n=i\mid X_0=i)=r_{ii}(n)$
- back to the random walk: $r_{00}(2n)=$ probability that out of the $2n$ steps, $n$ were to the left and $n$ to the right
  - $=\frac1{2^{2n}}\cdot{2n\choose n}\doteq\frac c{\sqrt n}$ … see Matoušek–Nešetřil
  - $\sum_{n=1}^\infty \frac{c}{\sqrt n}=\infty$ (taught in calculus), so 0 is recurrent
  - $\mathbb E(T_0\mid X_0=0)=\infty$ (we did not prove that)
- theorem
  - if $i\leftrightarrow j$, then either both $i$ and $j$ are recurrent, or both are transient
- for finite Markov chains
  - $C$ … equivalence class of $\leftrightarrow$ in a finite Markov chain
  - $C$ is recurrent (i.e. every $i\in C$ is recurrent) $\iff(\forall i \in C)(\forall j\in S):$ if $i\to j$ then $j\in C$
    - → $C$ is closed
- stationary distribution / steady-state distribution
  - df: $\pi:S\to [0,1]$ such that $\sum_{i\in S}\pi_i=1$ is called a stationary distribution if $\pi P=\pi$, i.e.
    - $\forall j:\sum_i\pi_i p_{ij}=\pi_j$
  - if $\pi^{(0)}$ (the PMF of $X_0$) is $\pi$, then $\pi^{(1)}$ is $\pi$ as well
    - $\pi^{(1)}=\pi^{(0)}P=\pi P=\pi$
- theorem: if a Markov chain is finite, aperiodic (→ not periodic), and irreducible, then
  1. there exists a unique stationary distribution $\pi$
  2. $\forall i,j:\lim_{n\to\infty}(P^n)_{ij}=\pi_j$
- example of a periodic Markov chain: two states, we switch state with probability 1
- df: $i\in S$ has period $d_i:=\gcd\set{t:r_{ii}(t)\gt 0}$
  - $i\in S$ is aperiodic if $d_i=1$
- df: $i$ is null recurrent if $i$ is recurrent and $\mathbb E(T_i\mid X_0=i)=\infty$
  - $i$ is positive recurrent if $i$ is recurrent and $\mathbb E(T_i\mid X_0=i)\lt\infty$
  - example: the random walk on a line (null recurrent)
- theorem
  - if $i,j\in S$ and $i\leftrightarrow j$, then
    - $d_i=d_j$
    - $i$ is transient $\iff j$ is transient
    - $i$ is recurrent $\iff j$ is recurrent
    - $i$ is null recurrent $\iff j$ is null recurrent
    - $i$ is positive recurrent $\iff j$ is positive recurrent
  - these are properties of the classes of $\leftrightarrow$
- theorem
  - if a Markov chain is irreducible, aperiodic, and finite, then
    - there exists a unique stationary distribution $\pi$: $\pi P=\pi$
    - $\forall i,j:\lim_{t\to\infty}(P^t)_{ij}=\pi_j$
      - i.e. for large $t$: $P(X_t=j\mid X_0=i)\doteq\pi_j$
  - actually, the MC does not have to be finite; it suffices that all states are positive recurrent (?)
  - steady state (?)
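The convergence statement can be checked numerically for the machine from example 1. A minimal sketch (assuming `numpy`) that solves $\pi P=\pi$ together with the normalization $\sum_i\pi_i=1$, and compares the rows of $P^n$ against $\pi$:

```python
import numpy as np

# Transition matrix of the machine from example 1 (states ordered w, b).
P = np.array([[0.99, 0.01],
              [0.90, 0.10]])

# Solve pi P = pi together with sum(pi) = 1:
# stack (P^T - I) with a row of ones and solve in the least-squares sense.
A = np.vstack([P.T - np.eye(2), np.ones((1, 2))])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi)  # [90/91, 1/91], i.e. roughly [0.989, 0.011]

# Convergence theorem: every row of P^n tends to pi as n grows.
Pn = np.linalg.matrix_power(P, 100)
print(Pn)  # both rows are (numerically) equal to pi
```

The chain is irreducible and aperiodic (both self-loops have positive probability), so the theorem applies and the power iteration converges quickly.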
- if $\pi^{(0)}=\pi$, then $\pi^{(1)}=\pi$
- the proof is not easy; here is a "cheat" proof
  - $P\mathbf j=I\mathbf j$, where $\mathbf j$ is the all-ones column vector (row sums of $P$ are 1)
  - $(P-I)\mathbf j=0$
  - $P-I$ is a singular matrix
  - $\exists x:x(P-I)=0\implies xP=x$
  - set $\pi=\frac xc$ with $c$ such that $\sum_i \pi_i=1$
  - problem: $x$ may have negative coordinates
    - fix: use the Perron–Frobenius theorem
  - the correct proof is shown in the probabilistic techniques class
- to find $\pi$, solve the system of linear equations $\pi P=\pi$ together with $\sum_{i\in S}\pi_i=1$
- $\pi$ describes the long-term behavior of the MC
  - PageRank (the original Google search) … MC model of people browsing the WWW
- given $\pi$, we can find a MC such that $\pi$ is its stationary distribution; then we can run the MC to generate random objects with distribution $\pi$
  - Markov chain Monte Carlo (MCMC)
- detailed balance equation
  - a MC may have this property
  - $\forall i\neq j:\pi_i P_{ij}=\pi_j P_{ji}$
  - to imagine this: an ant colony, each ant moving independently according to the Markov chain
    - stationary distribution $\iff$ the same number of ants at each state at each time – ants don't "accumulate"
  - the detailed balance equation implies $\pi P=\pi$
  - the detailed balance equation is stronger than $\pi P=\pi$
- MCMC algorithm sketch
  - choose an aperiodic irreducible digraph
  - $p_{ij}=\min\set{1,\frac{\pi _j}{\pi_i}}\cdot C$
  - $p_{ji}=\min\set{1,\frac{\pi_i}{\pi_j}}\cdot C$
  - choose $C$ such that $\forall i:\sum_{j\neq i} p_{ij}\lt 1$
  - df. $p_{ii}=1-\sum_{j\neq i}p_{ij}\gt 0$
    - $\implies d_i=1$
  - tune the process to make convergence fast
- absorbing state $i$: $p_{ii}=1$
  - $A$ … set of absorbing states
  - question 1: at which $i\in A$ do we end up?
  - question 2: how fast?
  - example: $0\in A$ (?)
- $a_i=P(\exists t:X_t=0\mid X_0=i)$
- $\mu_i=\mathbb E(T\mid X_0=i)$, where $T=\min\set{t: X_t\in A}$
- theorem: $(a_i)_{i\in S}$ are the unique solution to
  - $a_0=1$
  - $a_i=0$ if $i\in A,\,i\neq 0$
  - $a_i=\sum_j p_{ij}\cdot a_j$ otherwise
- theorem: $(\mu_i)_{i\in S}$ are the unique solution to
  - $\mu_i=0$ if $i\in A$
  - $\mu_i=1+\sum_j p_{ij}\mu_j$ if $i\notin A$
- proof (of the first theorem)
  - $P(\exists t:X_t=0\mid X_0=0)=1$
  - $P(\exists t: X_t=0\mid X_0=i)=0$ for $i\in A\setminus\set{0}$
  - for $i\notin A$: let $B=\set{\exists t:X_t=0}$, then by the total probability theorem
    - $a_i=P(B\mid X_0=i)=\sum_{j\in S} p_{ij}\cdot \underbrace{P(B\mid X_0=i,X_1=j)}_{=P(B\mid X_0=j)=a_j}$
- example: drunk person on their way home
  - states $0,1,\dots,n$; $A=\set{0}$
  - $\mu_0=0$
  - $\mu_1=1+\frac12\mu_0+\frac12\mu_2$
  - $\mu_2=1+\frac12\mu_1+\frac12\mu_3$
  - $\vdots$
  - $\mu_{n-1}=1+\frac12\mu_{n-2}+\frac12\mu_{n}$
  - $\mu_n=1+\mu_{n-1}$
  - solution
    - $\mu_1=2n-1$
    - $\mu_n=n^2$
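The system of equations for the expected absorption times can also be solved mechanically. A minimal sketch (assuming `numpy`) for the drunk-walk example with states $0,\dots,n$, state 0 absorbing, interior states stepping left/right with probability $\frac12$, and state $n$ always stepping to $n-1$ (this last boundary rule is read off from the equation $\mu_n=1+\mu_{n-1}$ above):

```python
import numpy as np

n = 10
size = n + 1

# Build the linear system A mu = b from
#   mu_0 = 0,
#   mu_i = 1 + (mu_{i-1} + mu_{i+1}) / 2   for 0 < i < n,
#   mu_n = 1 + mu_{n-1}.
A = np.eye(size)
b = np.ones(size)
b[0] = 0.0                      # mu_0 = 0 (state 0 is absorbing)
for i in range(1, n):
    A[i, i - 1] -= 0.5
    A[i, i + 1] -= 0.5
A[n, n - 1] -= 1.0

mu = np.linalg.solve(A, b)
print(mu[1], mu[n])             # expect 2n - 1 and n^2
```

This confirms the stated solution: for $n=10$ we get $\mu_1=19=2n-1$ and $\mu_{10}=100=n^2$ (and, more generally, the computed values match $\mu_i=i(2n-i)$).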