Probability and Statistics 2
Markov chains
- example 1
- machine in 2 states – working vs. broken
- w → b … probability 0.01
- w → w … 0.99
- b → w … 0.9
- b → b … 0.1
- example 2
- a fly which may get caught in a web … absorbing states
- df: stochastic process (Czech: náhodný proces)
- sequence of random variables X0,X1,… such that "all probabilities are defined"
- e.g. P(X0=1∧X1=2∧X3=0) is defined
- df: Markov process/chain
- stochastic process such that
- ∃ finite/countable set S:ImXt⊆S∀t
- elements of S … states
- t … time
- (∀i,j∈S)(∃pij∈[0,1])(∀t)(∀a0,a1,…,at−1): P(Xt+1=j ∣ Xt=i, Xt−1=at−1, …, X0=a0) = pij
- notes
- this is a discrete time Markov chain
- discrete space … ∣S∣≤∣N∣
- time-homogeneous
- pij not depending on t
- Markov condition
- only applies if the conditional probabilities are defined
- Markov condition … "forgetting the past"
- df: transition matrix P=(pij)
- df: transition diagram/graph
- vertices … S
- directed edges … {(i,j):pij>0}
- observation: every row of P sums to 1
- Markov chain is equivalent to a random walk on the transition diagram
- from state i we go to j with the probability of pij independent of the past
- in example 1, assume X0=w; what is P(X2=w)?
- P(X1=w)=p11=0.99
- P(X2=w)=p11⋅p11+p12⋅p21=0.99⋅0.99+0.01⋅0.9
- using the total probability theorem
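A quick numeric check of this computation (a sketch with numpy; state order (w, b) and the start X0=w are assumptions matching example 1):

```python
import numpy as np

# Transition matrix of example 1, states ordered (w, b)
P = np.array([[0.99, 0.01],
              [0.90, 0.10]])

pi0 = np.array([1.0, 0.0])                 # start in state w, i.e. X0 = w
pi2 = pi0 @ np.linalg.matrix_power(P, 2)   # distribution of X2

# matches the hand computation 0.99*0.99 + 0.01*0.9
print(pi2[0])   # P(X2 = w) ~ 0.9891
```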
- probability mass function (PMF, Czech: pravděpodobnostní funkce) for X0,X1,…
- πi(t)=P(Xt=i)
- t∈N0
- i∈S
- theorem
- π(t+1)=π(t)⋅P
- where π(t)=(π1(t),…,πn(t)) row vector
- proof … definition of matrix multiplication & total probability theorem
- theorem
- P(X0=a0,X1=a1,…,Xt=at)=πa0(0)⋅Pa0a1⋅Pa1a2⋅…⋅Pat−1at
- proof by induction
- df: probability of k-step transition … rij(k)
- rij(k)=P(Xt+k=j∣Xt=i)
- rij(1)=P(Xt+1=j∣Xt=i)=pij
- it is independent of t
- theorem (Chapman-Kolmogorov)
- ∀ Markov chain, ∀i,j,k:
- rij(k+1) = ∑_{u=1}^{n} riu(k)⋅puj
- rij(k+ℓ) = ∑_{u=1}^{n} riu(k)⋅ruj(ℓ)
- rij(k)=(Pk)ij
- proof
- 1 is a special case of 2 (ruj(1)=puj)
- 1⟹3 … matrix multiplication & induction
- …
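The identities above can be checked numerically, e.g. on the chain from example 1 (a numpy sketch; the Chapman–Kolmogorov sum is exactly matrix multiplication):

```python
import numpy as np

# Example 1 chain (states w, b); by the theorem, r_ij(k) = (P^k)_ij
P = np.array([[0.99, 0.01],
              [0.90, 0.10]])

k, l = 3, 5
R_k = np.linalg.matrix_power(P, k)
R_l = np.linalg.matrix_power(P, l)
R_kl = np.linalg.matrix_power(P, k + l)

# r_ij(k+l) = sum_u r_iu(k) r_ul(l) is the (i,j) entry of R_k @ R_l
print(np.allclose(R_k @ R_l, R_kl))   # True
```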
- df: j is accessible from i (where i,j∈S)
- i→j
- j∈A(i)
- j is accessible from i ≡ ∃t: rij(t) = P(Xt=j ∣ X0=i) > 0
- ⟺ there is a directed walk i→j (of some length t) in the transition digraph
- df: i↔j (i and j communicate) ≡i∈A(j)∧j∈A(i)
- theorem: ↔ is an equivalence relation
- proof
- reflexive: i∈A(i) … t=0 in the definition
- symmetric: the formula i∈A(j)∧j∈A(i) is symmetric
- transitive: in the digraph ∃ path i to j, ∃ path j to k ⟹∃ walk i→k
- decomposition of the transition digraph to components of strong connectivity ≡ finding equivalence classes of ↔
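Finding the classes of ↔ is exactly finding the strongly connected components; a small sketch (the helper `communication_classes` and the concrete chain are illustrative, not from the notes), using a fly-and-web style chain with one absorbing state:

```python
import numpy as np

# Communication classes of a Markov chain = strongly connected components of
# its transition digraph. Transitive-closure approach, fine for small S.
def communication_classes(P):
    n = len(P)
    reach = np.eye(n, dtype=bool) | (np.asarray(P) > 0)   # walks of length <= 1
    for _ in range(n):                                    # iterate closure
        reach = reach | ((reach.astype(int) @ reach.astype(int)) > 0)
    classes = {}
    for i in range(n):
        # j communicates with i iff i reaches j and j reaches i
        key = tuple((reach[i] & reach[:, i]).tolist())
        classes.setdefault(key, []).append(i)
    return list(classes.values())

# States 0 and 1 communicate; state 2 is absorbing (own class)
P = [[0.5, 0.4, 0.1],
     [0.3, 0.6, 0.1],
     [0.0, 0.0, 1.0]]
print(communication_classes(P))   # [[0, 1], [2]]
```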
- df: a state i∈S is recurrent if we return to it with probability 1
- it is transient otherwise
- Czech: rekurentní, tranzientní
- df: Markov chain is irreducible if ∀i,j∈S:i↔j
- df: Ti=min{t≥1:Xt=i}
- Ti=∞ if the set is empty (there is no such t)
- Ti … random variable
- fij(n)=P(we get to j first at time n∣X0=i)=P(Tj=n∣X0=i)
- we define it for n>0
- fij=P(Tj<∞∣X0=i)=∑n≥1fij(n)
- i is transient
- fii<1 … probability of ever getting back
- ⟺P(Ti=∞)>0
- ⟺P(get back infinitely often)<1
- example – random walk on a line
- with probability 1/2 we go to the left
- f00(2) = (1/2)^2 + (1/2)^2 = 1/2
- f00(1) = f00(3) = ⋯ = 0
- f00(4) = (1/2)^4 ⋅ 2
- r00(4) ≠ f00(4)
- r00(4) = (1/2)^4 ⋅ (4 choose 2) = 6/16
- is 0 recurrent?
- by definition, we should add f00(2)+f00(4)+… and check if it equals 1
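The sum can be evaluated numerically using the renewal identity r00(n) = ∑_{k=1}^{n} f00(k)⋅r00(n−k) (a standard first-return decomposition, assumed here rather than proved above), which lets us solve for f00 recursively:

```python
from math import comb

# Symmetric random walk on Z: r00(n) = (n choose n/2) / 2^n for even n, else 0
N = 1000
r = [comb(n, n // 2) / 2**n if n % 2 == 0 else 0.0 for n in range(N + 1)]

# First-return decomposition: r00(n) = sum_{k=1}^{n} f00(k) * r00(n-k),
# with r00(0) = 1, so f00(n) can be computed recursively
f = [0.0] * (N + 1)
for n in range(1, N + 1):
    f[n] = r[n] - sum(f[k] * r[n - k] for k in range(1, n))

print(f[2], f[4])   # 0.5 and 0.125, matching the hand computations
print(sum(f))       # partial sums creep towards 1: state 0 is recurrent
```

The partial sums converge to 1 only slowly (the tail behaves like 1/√n), which is consistent with the null recurrence of this walk mentioned below.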
- theorem: i is a recurrent state ⟺ ∑_{n=1}^{∞} rii(n) = ∞
- Bn = 1 if Xn=i ("we got back"), 0 otherwise
- E(Bn∣X0=i)=P(Xn=i∣X0=i)=rii(n)
- r00(2n)= probability that out of the 2n steps, n were to the left, n were to the right
- = (1/2^{2n}) ⋅ (2n choose n) ≐ c/√n … see Matoušek, Nešetřil
- ∑_{n=1}^{∞} c/√n = ∞ (taught in calculus)
- E(T0∣X0=0)=∞ (we did not prove that)
- theorem
- if i↔j, then either both i and j are recurrent or both i and j are transient
- for finite Markov chains
- C … equiv class of ↔ in a finite Markov chain
- C is recurrent (∀i∈C is recurrent) ⟺(∀i∈C)(∀j∈S): if i→j then j∈C
- stationary distribution / steady state distribution
- df: π:S→[0,1] such that ∑i∈Sπi=1 is called a stationary distribution if “πP=π"
- ∀j: ∑i πi pij = πj
- if π(0) (the PMF of X0) is π, then π(1) is π as well
- π(1)=π(0)P
- theorem: if a Markov chain is finite, aperiodic (→ not periodic) and irreducible, then
- ∃ unique stat. distribution π
- ∀i:limn→∞(Pn)ij=πj
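A numeric illustration of the limit statement for the chain from example 1 (a numpy sketch; the stationary distribution (90/91, 1/91) follows from solving πP=π by hand):

```python
import numpy as np

# Example 1 is finite, irreducible and aperiodic, so P^n converges to a
# matrix whose rows are all equal to the stationary distribution pi
P = np.array([[0.99, 0.01],
              [0.90, 0.10]])

Pn = np.linalg.matrix_power(P, 200)
print(Pn)   # both rows ~ (90/91, 1/91) ~ (0.989, 0.011)
```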
- example of periodic Markov chain: two states, we change the state with probability 1
- df: i∈S has period di:=gcd{t:rii(t)>0}
- i∈S is aperiodic if di=1
- df: i is null recurrent if i is recurrent and E(Ti∣X0=i)=∞
- i is positive recurrent if i is recurrent and E(Ti∣X0=i)<∞
- example: random walk on a line
- theorem
- if i,j∈S, i↔j, then
- di=dj
- i is transient ⟺ j is transient
- i is recurrent ⟺j is recurrent
- i is null recurrent ⟺j is null recurrent
- i is positive recurrent ⟺j is positive recurrent
- these are properties of the class of ↔
- theorem
- if a Markov chain is irreducible, aperiodic and finite, then
- there exists a unique stationary distribution π: πP=π
- ∀ij:lim(Pt)ij=πj
- P(Xt=j∣X0=i)≐πj
- actually, MC does not have to be finite, it suffices if all states are positive recurrent (?)
- steady state (?)
- if π(0)=π then π(1)=π
- the proof is not easy, here is a cheat proof
- P𝟙 = 𝟙 = I𝟙, where 𝟙 is the all-ones column vector (row sums are 1)
- (P−I)𝟙 = 0
- P−I is a singular matrix
- ∃x≠0: x(P−I)=0 ⟹ xP=x
- π=cx such that ∑πi=1
- problem
- x may have negative coordinates
- to fix: use Perron-Frobenius theorem
- the correct proof is shown in class of probabilistic techniques
- to find π, solve system of linear equations πP=π, add ∑i∈Sπi=1
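A sketch of this recipe in numpy (the helper name `stationary` and the least-squares handling of the redundant equation are my choices, not from the notes):

```python
import numpy as np

# Solve pi P = pi together with sum(pi) = 1 as one linear system:
# pi (P - I) = 0, transposed to (P^T - I) pi^T = 0, plus a normalization row
def stationary(P):
    n = len(P)
    A = np.vstack([np.asarray(P).T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    # least squares copes with the system being overdetermined (one row is redundant)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = np.array([[0.99, 0.01],
              [0.90, 0.10]])
print(stationary(P))   # [90/91, 1/91] ~ [0.989, 0.011]
```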
- π describes long-term behavior of the MC
- Page Rank (original google search) … MC model of people browsing WWW
- given π, we can find a MC such that π is its stationary distribution; then we can run the MC to generate random objects with distribution π
- Markov chain Monte Carlo (MCMC)
- detailed balance equation
- MC may have this property
- ∀i≠j: πi pij = πj pji
- to imagine this: ant colony moving independently according to a Markov chain
- stationary distribution ⟺ the expected number of ants at each state stays the same over time – ants don't "accumulate"
- detailed balance equation implies πP=π
- detailed balance equation is stronger than πP=π
- MCMC algo. sketch
- choose aperiodic irreducible digraph
- pij = min{1, πj/πi} ⋅ C
- pji = min{1, πi/πj} ⋅ C
- choose C such that
- ∀i: ∑_{j≠i} pij < 1
- df. pii = 1 − ∑_{j≠i} pij > 0
- ⟹di=1
- tune the process to make convergence fast
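A minimal Metropolis-style sketch of the algorithm (all names are illustrative; the proposal picks a uniformly random other state, i.e. C = 1/(n−1) folded into the proposal, and rejections create the self-loops that make the chain aperiodic):

```python
import random

# Sample from a target distribution pi on {0,...,n-1} via MCMC:
# move i -> j (j != i uniform) with acceptance probability min(1, pi_j/pi_i),
# which satisfies the detailed balance equation, so pi is stationary
def mcmc_sample(pi, steps, seed=0):
    rng = random.Random(seed)
    n = len(pi)
    counts = [0] * n
    state = 0
    for _ in range(steps):
        j = rng.randrange(n - 1)
        j = j if j < state else j + 1      # uniform proposal over states != state
        if rng.random() < min(1.0, pi[j] / pi[state]):
            state = j                      # accept; otherwise stay put
        counts[state] += 1
    return [c / steps for c in counts]

target = [0.5, 0.25, 0.125, 0.125]
freq = mcmc_sample(target, 200_000)
print(freq)   # empirical frequencies approach the target distribution
```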
- absorbing state i:pii=1
- A … set of absorbing states
- question 1: which i∈A we end at?
- question 2: how fast?
- example: 0∈A (?)
- ai=P(∃t:Xt=0∣X0=i)
- μi=E(T∣X0=i)
- T=min{t:Xt∈A}
- theorem: (ai)i∈S are the unique solution to
- a0=1
- ai=0 if i∈A, i≠0
- ai=∑jpij⋅aj otherwise
- theorem: (μi)i∈S are the unique solution to
- μi=0 if i∈A
- μi=1+∑jpijμj if i∉A
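Both theorems turn into ordinary linear systems. A sketch for a symmetric random walk on {0,…,4} with A = {0, 4} absorbing (this concrete chain is my example, not from the notes):

```python
import numpy as np

# Symmetric random walk on {0,...,4}; states 0 and 4 are absorbing
n = 4
P = np.zeros((n + 1, n + 1))
P[0, 0] = P[n, n] = 1.0
for i in range(1, n):
    P[i, i - 1] = P[i, i + 1] = 0.5

A = {0, n}
M = np.eye(n + 1) - P   # transient rows encode x_i - sum_j p_ij x_j = rhs_i
b = np.zeros(n + 1)     # rhs for a_i (0 on transient rows)
c = np.ones(n + 1)      # rhs for mu_i (the "+1 step" on transient rows)
for i in A:
    M[i, :] = 0.0
    M[i, i] = 1.0       # absorbing rows fix the value directly
    b[i] = 1.0 if i == 0 else 0.0
    c[i] = 0.0

a = np.linalg.solve(M, b)    # a_i = P(we end in state 0 | X0 = i)
mu = np.linalg.solve(M, c)   # mu_i = E(time to absorption | X0 = i)
print(a)    # ~ [1, 0.75, 0.5, 0.25, 0]
print(mu)   # ~ [0, 3, 4, 3, 0]
```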
- proof
- P(∃t:Xt=0∣X0=0)=1
- P(∃t:Xt=0∣X0=i∈A∖{0})=0
- case i∉A
- B … the event {∃t: Xt=0}, so ai = P(B ∣ X0=i)
- P(B ∣ X0=i) = ∑_{j∈S} pij ⋅ P(B ∣ X1=j) = ∑_{j∈S} pij ⋅ aj (total probability theorem + Markov condition)
- example: drunk person on their way home
- A={0}
- μ0=0
- μ1 = 1 + (1/2)μ0 + (1/2)μ2
- μ2 = 1 + (1/2)μ1 + (1/2)μ3
- μn−1 = 1 + (1/2)μn−2 + (1/2)μn
- μn = 1 + μn−1
- solution
- μ1 = 2n − 1
- μn = n^2
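The system above can be solved numerically to check the claimed closed forms (a numpy sketch with n = 10; the chain has 0 absorbing and a reflecting boundary at n):

```python
import numpy as np

# Drunk-person chain on {0,...,n}: mu_0 = 0,
# mu_i = 1 + (mu_{i-1} + mu_{i+1})/2 for 0 < i < n, and mu_n = 1 + mu_{n-1}
def mu(n):
    M = np.zeros((n + 1, n + 1))
    c = np.ones(n + 1)
    M[0, 0] = 1.0
    c[0] = 0.0                           # mu_0 = 0 (home is absorbing)
    for i in range(1, n):
        M[i, i] = 1.0
        M[i, i - 1] = M[i, i + 1] = -0.5
    M[n, n] = 1.0
    M[n, n - 1] = -1.0                   # mu_n = 1 + mu_{n-1}
    return np.linalg.solve(M, c)

m = mu(10)
print(m[1], m[10])   # ~ 19 and 100, i.e. 2n-1 and n^2 for n = 10
```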