Lecture
- credit
- homeworks
- attendance at practicals is mandatory, at most 3 absences are allowed
- final test
- sample space Ω
- event A as a set of basic outcomes
- we can estimate the probability of event A by repeated experiments
- we divide the number of times A occurred by the total number of trials
- this relative frequency is the maximum likelihood estimate (see the estimation sketch after these notes)
- axioms
- p(A)∈[0,1]
- p(Ω)=1
- p(⋃Ai)=∑p(Ai) for pairwise disjoint events Ai
- joint probability, conditional probability
- estimating conditional probability
- Bayes rule
- independence
- chain rule
- golden rule of statistical NLP: argmax_A p(A∣B) = argmax_A p(B∣A)·p(A), since p(B) does not depend on A
- expectation
- entropy
- the uniform distribution maximizes entropy (nothing can be more uncertain)
- perplexity (computed together with entropy in a sketch after these notes)
- G(p)=2^H(p)
- joint entropy, conditional entropy (see the sketch after these notes)
- entropy is non-negative
- chain rule
- H(X,Y)=H(Y∣X)+H(X)
- H(Y∣X)≤H(Y)
- other properties of entropy
- coding interpretation
- entropy … the least average number of bits needed to encode a message
- KL distance (divergence)
- mutual information (see the sketch after these notes)
- I(X,Y)=D(p(x,y)∥p(x)p(y))
- we can derive that I(X,Y)=H(X)−H(X∣Y)
- by symmetry I(X,Y)=H(Y)−H(Y∣X)
- cross-entropy (see the sketch after these notes)
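
A minimal sketch of the relative-frequency (maximum likelihood) estimates, conditional probability, and Bayes rule from the notes above; the die-rolling experiment and the events A and B are made up for illustration.

```python
import random

random.seed(0)

# Simulated experiment: roll a fair six-sided die many times (illustrative data).
trials = [random.randint(1, 6) for _ in range(100_000)]

# Relative-frequency (maximum likelihood) estimate: p(A) ~ count(A) / number of trials.
A = {2, 4, 6}                                   # event "the outcome is even"
count_A = sum(1 for t in trials if t in A)
p_A = count_A / len(trials)

# Conditional probability estimated as p(A|B) ~ count(A and B) / count(B).
B = {4, 5, 6}                                   # event "the outcome is at least 4"
count_B = sum(1 for t in trials if t in B)
count_AB = sum(1 for t in trials if t in A and t in B)
p_B = count_B / len(trials)
p_A_given_B = count_AB / count_B

# Bayes rule: p(A|B) = p(B|A) * p(A) / p(B); the two sides agree up to sampling noise.
p_B_given_A = count_AB / count_A
print(p_A_given_B, p_B_given_A * p_A / p_B)
```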
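
A sketch of entropy H(p) = −∑ p(x) log2 p(x) and perplexity G(p) = 2^H(p); the two distributions are invented, and the example only illustrates that the uniform distribution gives the largest entropy over a fixed outcome set.

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) * log2 p(x); terms with p(x) = 0 contribute 0."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def perplexity(p):
    """G(p) = 2 ** H(p)."""
    return 2 ** entropy(p)

uniform = {x: 1 / 6 for x in range(1, 7)}                     # fair die
skewed = {1: 0.5, 2: 0.2, 3: 0.1, 4: 0.1, 5: 0.05, 6: 0.05}   # loaded die

print(entropy(uniform), perplexity(uniform))   # ~2.585 bits, perplexity 6
print(entropy(skewed), perplexity(skewed))     # smaller on both counts
```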
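
A numerical check of the chain rule H(X,Y) = H(Y∣X) + H(X) and of H(Y∣X) ≤ H(Y); the small joint distribution is invented just for the check.

```python
import math

def H(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Invented joint distribution p(x, y) over X in {'a', 'b'} and Y in {0, 1}.
p_xy = {('a', 0): 0.4, ('a', 1): 0.1, ('b', 0): 0.2, ('b', 1): 0.3}

# Marginals p(x) and p(y).
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

# Conditional entropy H(Y|X) = sum_x p(x) * H(Y | X = x).
H_Y_given_X = sum(
    px * H({y: p_xy[(x, y)] / px for (x2, y) in p_xy if x2 == x})
    for x, px in p_x.items()
)

# Chain rule: H(X, Y) = H(Y|X) + H(X).
print(H(p_xy), H_Y_given_X + H(p_x))   # the two numbers agree

# Conditioning never increases entropy: H(Y|X) <= H(Y).
print(H_Y_given_X, H(p_y))
```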
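
A sketch of the KL divergence and mutual information: I(X,Y) = D(p(x,y)∥p(x)p(y)), checked against the form I(X,Y) = H(X) − H(X∣Y); it reuses the same invented joint distribution as the previous sketch.

```python
import math

def H(dist):
    """Entropy in bits of {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def D(p, q):
    """KL divergence D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# The same invented joint distribution as above, with its marginals.
p_xy = {('a', 0): 0.4, ('a', 1): 0.1, ('b', 0): 0.2, ('b', 1): 0.3}
p_x = {'a': 0.5, 'b': 0.5}
p_y = {0: 0.6, 1: 0.4}

# Mutual information as a KL divergence from the joint to the product of marginals.
product = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}
I_xy = D(p_xy, product)

# The equivalent form I(X, Y) = H(X) - H(X|Y).
H_X_given_Y = sum(
    py * H({x: p_xy[(x, y)] / py for x in p_x})
    for y, py in p_y.items()
)
print(I_xy, H(p_x) - H_X_given_Y)   # the two numbers agree
```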
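
A sketch of cross-entropy H(p,q) = −∑ p(x) log2 q(x), i.e. the average number of bits needed when the data come from p but the code is optimal for the model q; the distributions are invented, and the check uses the standard identity H(p,q) = H(p) + D(p∥q).

```python
import math

def entropy(p):
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def kl(p, q):
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log2 q(x)."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}   # "true" distribution (illustrative)
q = {'a': 0.4, 'b': 0.4, 'c': 0.2}   # model distribution (illustrative)

# Cross-entropy decomposes as H(p, q) = H(p) + D(p || q), so it equals H(p)
# exactly when the model q matches p, and is larger otherwise.
print(cross_entropy(p, q), entropy(p) + kl(p, q))
```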