data compression
measuring the performance
limits of lossless compression
basic concepts
typical strategy
codes
prefix codes
Shannon-Fano coding
algorithm
optimality of Huffman code
can we modify Huffman algorithm to guarantee that the resulting code minimizes the maximum codeword length?
yes, when constructing the tree, we can sort the nodes not only by their frequencies but also by the depth of their subtree
what is the maximum height of a Huffman tree for an input message of length ?
generalize the construction of binary Huffman code to the case of -ary coding alphabet ()
implementation notes
adaptive compression
if we know that the code contains 2 codewords of length 2 and 4 codewords of length 3, we can construct canonical Huffman code
00, 01, 100, 101, 110, 111
adaptive Huffman code
each string associate with a subinterval such that
encoding
decoding
problem: using floating point arithmetic leads to rounding errors → we will use integers
underflow may happen that way if we use too short integers
another problem
adaptive version
Hartley's formula
minimum and average number of yes/no questions to find the answer
Shannon's formula
Kraft-McMillan theorem
entropy of a discrete random vector
analysis of arithmetic coding
… the midpoint of the interval
problems
a better probabilistic model
according to an experimental estimate
the entropy of English is 1.3 bits per symbol
methods
Prediction by Partial Matching
unary code
binary code
Elias codes
observation
each probability distribution satisfies
idea
problems
LZ77
LZSS
other variants
Deflate algorithm
LZ77 disadvantages
LZ78
LZW
LZC
LZMW
LZAP
instead of ST add all substrings Sp where p is a prefix of T
digital images
color spaces
representation of bitmap images
GIF
PNG
interlacing schemes
Burrows-Wheeler transform
matrix is not neccessary
we can use suffix array
why does BW transform work?
how to measure the loss of information
capturing audio
quantization
vector quantization
differential encoding
transform coding
JPEG
…
WebP
example
can we reduce the problem to image compression?
basic types of algorithms
video signal representation
MPEG-1
H.261