Huffman coding
Huffman coding is an algorithm for the lossless compression of files, based on the frequency of occurrence of each symbol in the file being compressed. In any file, certain characters are used more often than others. In a binary representation, the number of bits required to represent each character depends on how many distinct characters must be represented: with one bit we can represent two characters (0 represents the first character and 1 represents the second), with two bits we can represent four characters, and so on. Unlike ASCII, which is a fixed-length code using seven bits per character, Huffman compression is a variable-length coding system that assigns shorter codes to more frequently used characters and longer codes to less frequently used characters, in order to reduce the size of files being compressed and transferred.
For example, consider a file with the following data: 'XXXXXXYYYYZZ'. The frequency of "X" is 6, the frequency of "Y" is 4, and the frequency of "Z" is 2. If each character is represented using a fixed-length code of two bits, then the number of bits required to store this file is (2 x 6) + (2 x 4) + (2 x 2) = 24. If the same data is compressed using Huffman compression, the more frequently occurring characters are represented by shorter codes, such as X by the code 0 (1 bit), Y by the code 10 (2 bits), and Z by the code 11 (2 bits), and the size of the file becomes (1 x 6) + (2 x 4) + (2 x 2) = 18 bits. In this example, more frequently occurring characters are assigned smaller codes, resulting in fewer bits in the final compressed file.
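As a quick sanity check of this arithmetic, here is a minimal Python sketch (the frequencies and code lengths are taken straight from the example above; nothing else is assumed):

freqs = {"X": 6, "Y": 4, "Z": 2}
# Fixed-length code: 2 bits for every character.
fixed_bits = sum(2 * f for f in freqs.values())
# Huffman code lengths from the example: X -> 0, Y -> 10, Z -> 11.
code_len = {"X": 1, "Y": 2, "Z": 2}
huffman_bits = sum(code_len[s] * f for s, f in freqs.items())
print(fixed_bits, huffman_bits)   # 24 18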
Huffman compression is named after its inventor, David Huffman.
To generate Huffman codes, we build a binary tree of nodes. Initially, all nodes are leaf nodes, each containing a symbol and its weight (frequency of appearance). As a common convention, the bit '0' represents following the left child and the bit '1' represents following the right child. A finished tree has up to n leaf nodes and n - 1 internal nodes; a Huffman tree that omits unused symbols produces the optimal code lengths. The process begins with the leaf nodes, each holding the probability of the symbol it represents. A new node is then created whose children are the two nodes with the smallest probabilities, such that the new node's probability equals the sum of its children's probabilities. With the previous two nodes merged into one, and with the new node now under consideration, the procedure is repeated until only one node remains in the Huffman tree. The simplest construction algorithm uses a priority queue in which the node with the lowest probability is given the highest priority.
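As an illustration of this procedure, here is a minimal Python sketch using the heapq module as the priority queue (the function name huffman_codes and the nested-tuple tree representation are choices made for this sketch, not part of the algorithm itself):

import heapq
from itertools import count

def huffman_codes(freqs):
    # Each heap entry is (weight, tiebreaker, tree); the counter keeps
    # heapq from trying to compare trees when two weights are equal.
    tie = count()
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)    # the two nodes with the
        w2, _, b = heapq.heappop(heap)    # smallest weights
        heapq.heappush(heap, (w1 + w2, next(tie), (a, b)))
    codes = {}
    def walk(node, prefix):               # '0' = left, '1' = right
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"   # single-symbol edge case
    walk(heap[0][2], "")
    return codes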
Example
The following example is based on a data source using a set of five different symbols. The symbols' frequencies are:
Symbol Frequency
A 24
B 12
C 10
D 5
E 8
----> total (24 + 12 + 10 + 5 + 8) x 3 = 177 bits (with 3 bits per code word)
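Running the huffman_codes sketch from above on these frequencies (a quick check, not part of the original example) assigns a 1-bit code to A and 3-bit codes to B, C, D, and E:

freqs = {"A": 24, "B": 12, "C": 10, "D": 5, "E": 8}
codes = huffman_codes(freqs)   # reuses the sketch defined earlier
total = sum(len(codes[s]) * f for s, f in freqs.items())
print(codes)   # {'A': '0', 'D': '100', 'E': '101', 'C': '110', 'B': '111'}
print(total)   # 129 bits, versus 177 with the fixed 3-bit code

The exact bit patterns depend on tie-breaking and the left/right convention, but the code lengths, and hence the 129-bit total, come out the same.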
Algorithm
A greedy algorithm can construct a Huffman code, which is an optimal prefix code. The tree corresponding to the optimal code is constructed in a bottom-up manner, starting from the |C| leaves and performing |C| - 1 merging operations. A priority queue Q keeps the nodes ordered by frequency; here the priority queue is implemented as a binary heap.
HuffmanAlgo(C)
{
    n = |C|;
    Q = C;
    for (i = 1; i <= n - 1; i++)
    {
        z = Allocate-Node();
        x = Extract-Min(Q);
        y = Extract-Min(Q);
        left(z) = x;
        right(z) = y;
        f(z) = f(x) + f(y);
        Insert(Q, z);
    }
    return Extract-Min(Q);   // root of the Huffman tree
}
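A direct Python translation of HuffmanAlgo might look like the following (the Node class and the heapq-based queue are assumptions of this sketch; the pseudocode leaves the queue implementation abstract beyond its being a binary heap):

import heapq
from itertools import count

class Node:
    def __init__(self, f, symbol=None, left=None, right=None):
        self.f = f              # weight (frequency)
        self.symbol = symbol    # None for internal nodes
        self.left = left
        self.right = right

def huffman_algo(C):
    # Q holds (f, tiebreaker, node); the counter avoids comparing
    # Node objects when two frequencies are equal.
    tie = count()
    Q = [(node.f, next(tie), node) for node in C]
    heapq.heapify(Q)                    # BuildHeap(C), O(n)
    for _ in range(len(C) - 1):         # |C| - 1 merge operations
        fx, _, x = heapq.heappop(Q)     # Extract-Min(Q), O(log n)
        fy, _, y = heapq.heappop(Q)
        z = Node(fx + fy, left=x, right=y)
        heapq.heappush(Q, (z.f, next(tie), z))   # Insert(Q, z)
    return heapq.heappop(Q)[2]          # root of the Huffman tree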
Analysis
We can use BuildHeap(C) to create the priority queue, which takes O(n) time. Inside the for loop, the expensive operations (Extract-Min and Insert) each take O(log n) time. Since the loop body executes n - 1 times, the total running time of the Huffman algorithm is O(n log n).