data compression
compression
methods
measuring the performance
limits of lossless compression
basic concepts
typical strategy
codes
prefix codes
Shannon-Fano coding
optimality of Huffman code
can we modify Huffman algorithm to guarantee that the resulting code minimizes the maximum codeword length?
yes, when constructing the tree, we can break ties between nodes of equal frequency by the depth of their subtrees: among equally frequent nodes, merge the ones with the shallowest subtrees first
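The tie-breaking idea can be tried on a small example. The following is a minimal sketch, not a reference implementation: the function name `huffman_lengths` and the flag `prefer_shallow` are made up for illustration, and the code only computes codeword lengths, not the codewords themselves.

```python
import heapq
from itertools import count

def huffman_lengths(freqs, prefer_shallow=True):
    """Return {symbol: codeword length} for a binary Huffman code.

    Ties between equal frequencies are broken by subtree depth:
    shallowest first if prefer_shallow is True, deepest first otherwise.
    (Hypothetical helper, written for illustration only.)
    """
    order = count()  # last tie-breaker, so symbol lists are never compared
    # heap entries: (frequency, depth key, insertion order, depth, symbols)
    heap = [(w, 0, next(order), 0, [sym]) for sym, w in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        w1, _, _, d1, s1 = heapq.heappop(heap)
        w2, _, _, d2, s2 = heapq.heappop(heap)
        # every symbol below the new node gets one bit longer
        for sym in s1 + s2:
            lengths[sym] += 1
        depth = max(d1, d2) + 1
        key = depth if prefer_shallow else -depth
        heapq.heappush(heap, (w1 + w2, key, next(order), depth, s1 + s2))
    return lengths

freqs = {"a": 1, "b": 1, "c": 2, "d": 2}
print(huffman_lengths(freqs, prefer_shallow=True))   # {'a': 2, 'b': 2, 'c': 2, 'd': 2}
print(huffman_lengths(freqs, prefer_shallow=False))  # {'a': 3, 'b': 3, 'c': 2, 'd': 1}
```

Both results have the same total cost (12 bits for this input), but the shallow-first rule gives maximum codeword length 2 instead of 3.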
what is the maximum height of a Huffman tree for an input message of length n?
is there an optimal prefix code which cannot be obtained using Huffman algorithm?
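yes, e.g. for frequencies 3, 3, 2, 2 the code 00, 10, 01, 11 is optimal, but Huffman algorithm always makes the two symbols of frequency 2 siblings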
generalize the construction of binary Huffman code to the case of m-ary coding alphabet (m > 2)
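One common way to do the generalization is sketched below, assuming the usual trick of padding the input with dummy symbols of frequency 0 until the number of symbols n satisfies (n - 1) mod (m - 1) = 0, and then merging the m least frequent nodes in every step. The name `mary_huffman_lengths` is hypothetical and the sketch returns only codeword lengths.

```python
import heapq
from itertools import count

def mary_huffman_lengths(freqs, m):
    """Return {symbol: codeword length} for an m-ary Huffman code (sketch)."""
    if m < 2:
        raise ValueError("coding alphabet needs at least 2 symbols")
    order = count()  # tie-breaker, so symbol lists are never compared
    heap = [(w, next(order), [sym]) for sym, w in freqs.items()]
    # pad with zero-frequency dummies so that every merge takes exactly m nodes
    while (len(heap) - 1) % (m - 1) != 0:
        heap.append((0, next(order), []))
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        total, symbols = 0, []
        for _step in range(m):
            w, _tie, syms = heapq.heappop(heap)
            total += w
            symbols += syms
        # every real symbol below the new node gets one digit longer
        for sym in symbols:
            lengths[sym] += 1
        heapq.heappush(heap, (total, next(order), symbols))
    return lengths

freqs = {"a": 5, "b": 3, "c": 2, "d": 1, "e": 1, "f": 1}
print(mary_huffman_lengths(freqs, 3))
# {'a': 1, 'b': 1, 'c': 2, 'd': 3, 'e': 3, 'f': 2}
```

For m = 2 no padding is needed and the sketch reduces to the binary construction.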
implementation notes
adaptive compression
if we know that the code contains 2 codewords of length 2 and 4 codewords of length 3, we can construct canonical Huffman code (?)
00, 01, 100, 101, 110, 111
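The codewords above can be reproduced from the codeword lengths alone. A minimal sketch, assuming the usual canonical assignment (codewords sorted by length, each one obtained by incrementing the previous one and shifting left when the length grows); the function name `canonical_codes` is made up for illustration.

```python
def canonical_codes(lengths):
    """Assign canonical codewords to a list of positive codeword lengths."""
    code = 0
    previous = 0
    result = []
    for length in sorted(lengths):
        # moving to a longer length appends zeros to the running value
        code <<= length - previous
        result.append(format(code, "0{}b".format(length)))
        code += 1
        previous = length
    return result

print(canonical_codes([2, 2, 3, 3, 3, 3]))
# ['00', '01', '100', '101', '110', '111']
```

The sketch assumes that the lengths satisfy the Kraft inequality; it does not check this.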
adaptive Huffman code
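associate each string with a subinterval of [0, 1) such that distinct strings of the same length get disjoint subintervals and the length of the subinterval equals the probability of the string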
encoding
decoding
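The interval-based scheme here appears to be arithmetic coding. Its encoding and decoding can be sketched with exact rational arithmetic, which avoids rounding altogether. This is only an illustration under stated assumptions: the names `build_intervals`, `encode` and `decode` are hypothetical, the model is static, the decoder is told the message length, and the encoder returns the whole subinterval instead of a bit string.

```python
from fractions import Fraction

def build_intervals(probs):
    """Split [0, 1) into one subinterval per symbol, in dictionary order."""
    low = Fraction(0)
    table = {}
    for sym, p in probs.items():
        table[sym] = (low, low + p)
        low += p
    return table

def encode(message, probs):
    """Return the subinterval [low, high) associated with the message."""
    table = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        sym_low, sym_high = table[sym]
        low += width * sym_low            # uses the width before narrowing
        width *= sym_high - sym_low
    return low, low + width

def decode(value, length, probs):
    """Recover `length` symbols from any number inside the subinterval."""
    table = build_intervals(probs)
    out = []
    for _step in range(length):
        for sym, (sym_low, sym_high) in table.items():
            if sym_low <= value < sym_high:
                out.append(sym)
                # rescale so that the chosen subinterval becomes [0, 1)
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return "".join(out)

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = encode("abac", probs)
print(low, high)              # 19/64 5/16
print(decode(low, 4, probs))  # abac
```

The width of the resulting subinterval, 1/64, equals the probability of the message under this model.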
problem: using floating point arithmetic leads to rounding errors → we will use integers
with integers, underflow may occur if the integers are too short
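The rounding problem with floating point arithmetic can be observed directly. The sketch below narrows an interval with ordinary Python floats (the function name `float_interval` and the two-symbol model are assumptions made for the demonstration) and checks whether the interval still has a distinguishable upper bound.

```python
def float_interval(message, probs):
    """Narrow [0, 1) symbol by symbol using floats; return (low, width)."""
    cumulative = {}
    acc = 0.0
    for sym, p in probs.items():
        cumulative[sym] = acc
        acc += p
    low, width = 0.0, 1.0
    for sym in message:
        low += width * cumulative[sym]
        width *= probs[sym]
    return low, width

probs = {"a": 0.5, "b": 0.5}

low, width = float_interval("ab" * 10, probs)
print(low + width > low)   # True: the interval is still representable

low, width = float_interval("ab" * 50, probs)
print(low + width > low)   # False: low + width rounds back to low
```

After 100 symbols the width is still nonzero, but it is smaller than the spacing of floats around `low`, so the two ends of the interval coincide and the message can no longer be recovered.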
another problem