David Betteridge
Theme

Following a conversation with David Turner after the YorkCodeDojo on Wednesday night I thought I would have a go at implementing Huffman encoding using C#

As I couldn't remember the finer details (University was a long time ago), it was off to Wikipedia to look it up. https://en.wikipedia.org/wiki/Huffman_coding

My first step was to order the letters by the frequency in which they occured in the source text. At this point I got a bit sidetracked as I wondered if using Linq would be a lot slower than writing a manual loop.

I now have a nice reuseable timing function :-)

```c# internal static T Time(Func function) { var sw = new Stopwatch(); sw.Start(); var result = function(); sw.Stop();

Console.WriteLine($"{sw.Elapsed.TotalMilliseconds}ms");

return result;

} ``` (It turns out for this size of text there is no difference in performance)

The Linq code is

c# static IEnumerable<LeafNode> OrderLettersByFrequencyUsingLinq(IEnumerable<char> fileText) { return fileText.GroupBy(c => c) .Select(grp => new LeafNode(grp.Key, grp.Count())) .OrderBy(grp => grp.Weight); }

Once you have the letters sorted into frequency order, you can build a tree using two queues.

There are two types of nodes in the tree 1. LeafNodes which are the nodes which represent a letter and it's original weight.
2. CombinedNodes which combine two nodes together.

Both LeafNodes and CombinedNodes implement a INode interface.

I started off with a helper function which returns the lowest two values from the queues.

c# private static INode GetLowestItem(Queue<LeafNode> leafQueue, Queue<CombinedNode> combinedQueue) { if (leafQueue.Count() == 0) return combinedQueue.Dequeue(); else if (combinedQueue.Count() == 0) return leafQueue.Dequeue(); else if (leafQueue.Peek().Weight < combinedQueue.Peek().Weight) return leafQueue.Dequeue(); else return combinedQueue.Dequeue(); }

Which then makes building the tree straight forward. We keep combining nodes until only 1 node remains. ```c# private static CombinedNode BuildTree(IEnumerable lettersByFrequency) { var leafQueue = new Queue(); var combinedQueue = new Queue();

foreach (var item in lettersByFrequency)
    leafQueue.Enqueue(item);

while (leafQueue.Count() + combinedQueue.Count() > 1)
{
    var lhs = GetLowestItem(leafQueue, combinedQueue);
    var rhs = GetLowestItem(leafQueue, combinedQueue);
    var combinedNode = new CombinedNode(lhs, rhs);
    combinedQueue.Enqueue(combinedNode);
}
return combinedQueue.Dequeue();

} ```

Once we have the tree we can recurse down it to generate the unique code for each letter ```c# private static void DisplayTree(INode node, string parentValue = "") { if (node is LeafNode leafNode) Console.WriteLine($"{leafNode.Key} is {parentValue}");

if (node is CombinedNode combinedNode)
{
    DisplayTree(combinedNode.LHS, parentValue + "0");
    DisplayTree(combinedNode.RHS, parentValue + "1");
}

} ```

To decode the message is quite fun. First of all you must build a tree with the same structure as you used to encode it. There is a bit in the article about the best way to transmit that. You then start at the beginning of the encoded text and walk the tree based on the current 0 or 1 value. Once you reach a leaf node, you output the character and return to the top of the tree.

```c# private static string DecodeText(CombinedNode tree, string encodedText) { var result = ""; var offset = 0; INode currentNode = tree; while (offset < encodedText.Length) { if (currentNode is CombinedNode combinedNode) { if (encodedText[offset] == '0') currentNode = combinedNode.LHS; else currentNode = combinedNode.RHS; }

    if (currentNode is LeafNode leafNode)
    {
        result += leafNode.Key;
        currentNode = tree;
    }

    offset++;
}
return result;

}

```

The complete code is available on github