In standard Huffman coding, the compressor builds a Huffman Tree based upon the counts/frequencies of the symbols occurring in the file to be compressed and then assigns to each symbol the codeword implied by the path from the root to the leaf node associated with that symbol. For example, if we adopt the convention that an edge from a node to its left (respectively, right) child is labeled 0 (resp., 1), then if the path from the root to a particular leaf is left, left, right, left, right, left, then the codeword assigned to the associated symbol will be 001010.
Canonical Huffman Coding recognizes that the essential information provided by a Huffman Tree is the mapping from symbols to their codeword lengths; the particular bit patterns of the codewords are secondary and can be computed independently of the tree. Indeed, in Canonical Huffman Coding the set of codewords that is employed depends solely upon the distribution of codeword lengths. This codeword set is chosen so as to satisfy not only the familiar prefix-freeness property (i.e., no codeword is a prefix of any other), which guarantees that the deciphering delay is zero, but also this property:
Longer-is-Lesser property:
If x and y are codewords, with |x| > |y|, then x' ≺ y, where x' is the prefix of x of length |y|.
Using standard notation, |z| denotes the length of z and ≺ denotes the "lexicographically less than" relation. Lexicographic ordering is essentially the same as alphabetic ordering.
With respect to bit strings u and v, to say that u ≺ v is to say that either u is a proper prefix of v or else the leftmost bit in which they differ is a 0 in u and a 1 in v. For example, 100101 ≺ 10100 because of the bits in the 3rd position (counting from one at the left). (For essentially the same reason, the word "carwash" precedes "cattle" in the dictionary.)
Now, if A and B are leaves in a Huffman Tree (in which edges to left (respectively, right) children are labeled 0 (resp., 1)) with corresponding codewords x and y (i.e., the labels on the edges along the path from the root to A (respectively, B) spell out x (resp., y)) then x ≺ y is equivalent to A being "to the left" of B in the tree. (Let C be the nearest common ancestor of A and B in the tree. For A to be to the left of B means that A is in the left subtree of C and B is in the right subtree of C.)
Thus, in order for the set of codewords induced by a Huffman Tree to satisfy the Longer-is-Lesser property, the tree must have this property:
Lefter-is-Deeper property:
If A and B are leaves and A is to the left of B, then depthOf(A) ≥ depthOf(B). (The depth of a node is its distance from the root.) 
But we can take any Huffman Tree and, by a judicious sequence of swaps of subtrees rooted at nodes of the same depth, arrive at another Huffman Tree having the Lefter-is-Deeper property and having a set of codewords whose length distribution is the same as that in the original tree.
Even though such a Huffman Tree transformation process is possible, it's not necessary to do it that way. A better approach is to take the codeword length distribution of the original tree and to build a Lefter-is-Deeper tree directly therefrom. Indeed, for any given distribution of lengths, there is only one possible Lefter-is-Deeper tree structure.
For example, suppose that the symbol frequencies led us to build one of the many Huffman Trees in which the codeword length distribution was as on the left below. Then the corresponding (unique) Lefter-is-Deeper tree (where each leaf's depth is explicitly indicated) is in the middle, and the resulting set of codewords (listed in lexicographically increasing, and thus length non-increasing, order) is to the right:

[Figure: on the left, the codeword length distribution (one codeword of length 2, four of length 3, three of length 4, one of length 5, two of length 6); in the middle, the corresponding unique Lefter-is-Deeper tree, whose leaf depths, from left to right, are 6, 6, 5, 4, 4, 4, 3, 3, 3, 3, 2.]

Codewords: 000000, 000001, 00001, 0001, 0010, 0011, 010, 011, 100, 101, 11
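As a quick sanity check, the following Python snippet (ours, not part of the original development) verifies that the eleven codewords listed above are prefix-free and satisfy the Longer-is-Lesser property; on equal-length strings of '0's and '1's, Python's `<` is exactly the relation ≺.

```python
# The codeword set from the example above, in lexicographically increasing order.
codewords = ["000000", "000001", "00001", "0001", "0010", "0011",
             "010", "011", "100", "101", "11"]

# Prefix-freeness: no codeword is a prefix of any other.
for a in codewords:
    for b in codewords:
        assert a == b or not b.startswith(a)

# Longer-is-Lesser: if |x| > |y|, the length-|y| prefix of x precedes y.
for x in codewords:
    for y in codewords:
        if len(x) > len(y):
            assert x[:len(y)] < y  # equal lengths, so '<' is lexicographic
```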
In preparation for describing several properties enjoyed by such codeword sets, we offer a few definitions:
Definition 1:
For a bit string z, let #(z) be the natural number represented by z in accord with the binary numeral system. This function can be described recursively like this (where λ denotes the empty string, and a single bit is read as its own value, so that #(0) = 0 and #(1) = 1):

#(λ) = 0
#(zb) = 2·#(z) + #(b)    for any bit string z and single bit b
Examples: #(1001) = 9, #(00110) = 6, #(00110110) = 54.
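A direct Python transcription of Definition 1 (the helper name `num` is ours, for illustration):

```python
def num(z: str) -> int:
    """#(z): the natural number that bit string z represents in binary.

    Follows the recursive definition: #(empty) = 0, #(zb) = 2*#(z) + #(b).
    """
    v = 0
    for b in z:           # consume bits left to right
        v = 2 * v + int(b)
    return v

assert num("1001") == 9
assert num("00110") == 6
assert num("00110110") == 54
```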
Theorem 1: #(uv) = 2^{|v|}·#(u) + #(v)
Proof: The proof is by mathematical induction on |v|:
For the basis, suppose that |v| = 0 (i.e., v = λ), so that uv = u. Then we have

#(uv)
    =    < v is the empty string >
#(u)
    =    < 1, 0 are the identities of ·, +, respectively >
1·#(u) + 0
    =    < 2^{0} = 1 >
2^{0}·#(u) + 0
    =    < |v| = 0 and #(λ) = 0 >
2^{|v|}·#(u) + #(v)
For the induction step, let n ≥ 0 be arbitrary and assume as an
induction hypothesis (IH) that the theorem holds whenever |v| = n.
We show that it holds when |v| = n+1.
Toward this end, suppose that v = wb, where w is a bit string of length n
and b is either '0' or '1'.
Then we have
#(uv)
    =    < v = wb >
#(uwb)
    =    < Definition 1 >
2·#(uw) + #(b)
    =    < IH applied to uw (note that |w| = n) >
2·(2^{|w|}·#(u) + #(w)) + #(b)
    =    < · distributes over + >
2·2^{|w|}·#(u) + 2·#(w) + #(b)
    =    < Definition 1 >
2·2^{|w|}·#(u) + #(wb)
    =    < 2·2^{k} = 2^{k+1} >
2^{|w|+1}·#(u) + #(wb)
    =    < |wb| = |w| + 1 >
2^{|wb|}·#(u) + #(wb)
    =    < v = wb >
2^{|v|}·#(u) + #(v)
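Theorem 1 is also easy to spot-check mechanically. The snippet below (an illustration, not part of the proof; the helper `num` is ours) exhaustively verifies the identity for all bit strings u and v of length at most 6:

```python
from itertools import product

def num(z):
    # #(z): value of bit string z, with #(empty) = 0
    v = 0
    for b in z:
        v = 2 * v + int(b)
    return v

# All bit strings of length 0 through 6.
bitstrings = [""] + ["".join(bits) for n in range(1, 7)
                     for bits in product("01", repeat=n)]

for u in bitstrings:
    for v in bitstrings:
        # Theorem 1: #(uv) = 2^{|v|} * #(u) + #(v)
        assert num(u + v) == 2 ** len(v) * num(u) + num(v)
```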
Longer's-Prefix-is-Lesser property:
Let x and y be codewords, with |x| > |y|, and let x' be the prefix of x of length |y|. Then #(x') < #(y).

Proof: Because |x| > |y|, the leaf A satisfying P(A) = x must be to the left of the leaf B satisfying P(B) = y. Let C be the nearest common ancestor of A and B, and let P(C) = u. Then, for some bit strings v and w of the same length r ≥ 0, x' = u0v and y = u1w. We maximize #(x') by choosing v = 1^{r} and minimize #(y) by choosing w = 0^{r}. Thus, it suffices to show that #(u01^{r}) < #(u10^{r}), or, equivalently, #(u10^{r}) - #(u01^{r}) > 0.

#(u10^{r}) - #(u01^{r})
    =    < Theorem 1 >
(2^{r+1}·#(u) + #(10^{r})) - (2^{r+1}·#(u) + #(01^{r}))
    =    < algebra >
#(10^{r}) - #(01^{r})
    =    < Theorem 2b, 2c, 2d >
2^{r} - (2^{r} - 1)
    =    < algebra >
1
    >    < number theory! >
0
Consecutive-Values property:
For any particular length, the codewords of that length represent a consecutive range of natural numbers.

Proof: It suffices to show that, in a left-to-right traversal of the fringe of the tree, any two "consecutive" leaves A and B of the same depth are such that #(P(A)) + 1 = #(P(B)). Let node C be the nearest common ancestor of consecutive leaves A and B. Then the path from C to A (respectively, B) is composed of an edge labeled 0 (resp., 1) followed by r edges labeled 1 (resp., 0), for some r ≥ 0. That is, letting P(C) = x, we have P(A) = x01^{r} and P(B) = x10^{r}. Now we show that #(P(A)) + 1 = #(P(B)):

#(P(A)) + 1
    =    < P(A) = x01^{r} >
#(x01^{r}) + 1
    =    < Theorem 1 >
2^{r+1}·#(x) + #(01^{r}) + 1
    =    < Theorem 2a, 2d >
2^{r+1}·#(x) + 2^{r} - 1 + 1
    =    < -1 and +1 cancel; Theorem 2c >
2^{r+1}·#(x) + #(10^{r})
    =    < Theorem 1 >
#(x10^{r})
    =    < P(B) = x10^{r} >
#(P(B))
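For the running example, the Consecutive-Values property can be confirmed directly. The check below (ours, for illustration) groups the example's codewords by length and verifies that each group's values form a consecutive range:

```python
# Codewords of the running example, grouped by length.
cw = {2: ["11"],
      3: ["010", "011", "100", "101"],
      4: ["0001", "0010", "0011"],
      5: ["00001"],
      6: ["000000", "000001"]}

for length, group in cw.items():
    values = sorted(int(s, 2) for s in group)
    # The values of the codewords of this length are consecutive.
    assert values == list(range(values[0], values[0] + len(values)))
```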
Half-of-Successor property:
For all k less than the maximum length among codewords, the smallest codeword of length k has value (m+1)/2, where m is the value of the largest codeword of length k+1.

Proof: Let A be the leaf corresponding to the largest codeword of length k+1 and B the leaf corresponding to the smallest codeword of length k. First we observe that, by the Consecutive-Values property, A must be the rightmost leaf of depth k+1. As such, A must be the right child of its parent. For suppose instead that it were the left child of its parent. Then either A has no sibling to the right, which contradicts the tree being full; or A's right sibling is a leaf, which contradicts A's being the rightmost leaf of depth k+1; or else A's right sibling is not a leaf, which contradicts the Lefter-is-Deeper property of the tree.

Let C be the nearest common ancestor of A and B, and suppose that P(C) = u. Node A must be the rightmost leaf in the left subtree of C and B must be the leftmost leaf in the right subtree of C. Thus, the path from C to A must follow an edge labeled 0, followed by some number r ≥ 0 of edges labeled 1, followed by one edge labeled 1. (Recall that A is a right child.) Meanwhile, the path from C to B must follow an edge labeled 1 followed by that same number r of edges labeled 0. In other words, there exists some r ≥ 0 such that P(A) = u01^{r+1} and P(B) = u10^{r}. (Note that k = |u| + r + 1.) To complete the proof, we must show that #(P(B)) = (#(P(A)) + 1) / 2:

(#(P(A)) + 1) / 2
    =    < P(A) = u01^{r+1} >
(#(u01^{r+1}) + 1) / 2
    =    < Theorem 1 >
(2^{r+2}·#(u) + #(01^{r+1}) + 1) / 2
    =    < Theorem 2a, 2d >
(2^{r+2}·#(u) + 2^{r+1} - 1 + 1) / 2
    =    < -1 and +1 cancel >
(2^{r+2}·#(u) + 2^{r+1}) / 2
    =    < / distributes over +; 2^{m+1}/2 = 2^{m} >
2^{r+1}·#(u) + 2^{r}
    =    < Theorem 2c >
2^{r+1}·#(u) + #(10^{r})
    =    < Theorem 1 >
#(u10^{r})
    =    < P(B) = u10^{r} >
#(P(B))
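Again using the running example, the Half-of-Successor property can be checked numerically for each pair of adjacent lengths (a quick illustration, ours):

```python
# Codewords of the running example, grouped by length.
cw = {2: ["11"],
      3: ["010", "011", "100", "101"],
      4: ["0001", "0010", "0011"],
      5: ["00001"],
      6: ["000000", "000001"]}

for k in (2, 3, 4, 5):
    m = max(int(s, 2) for s in cw[k + 1])        # largest value among length-(k+1) codewords
    smallest_k = min(int(s, 2) for s in cw[k])   # smallest value among length-k codewords
    assert smallest_k == (m + 1) // 2            # Half-of-Successor
```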
Generalized Half-of-Successor property:
If k and d > 0 are such that there are codewords of length k and of length k+d, but none of any length strictly in between, then the smallest codeword of length k has value (m+1)/2^{d}, where m is the value of the largest codeword of length k+d.

Proof: Let A be the leaf corresponding to the largest codeword of length k+d and B the leaf corresponding to the smallest codeword of length k. Let C be the nearest common ancestor of A and B, and suppose that P(C) = u. Following reasoning similar to that in the proof of the basic version of this property, A must be the rightmost leaf in the left subtree of C and B must be the leftmost leaf in the right subtree of C. The depth of A is greater by d than the depth of B, and so the path from C to A must follow an edge labeled 0, followed by r+d edges labeled 1, for some r ≥ 0, while the path from C to B must follow an edge labeled 1 followed by r edges labeled 0. In other words, there exists some r ≥ 0 such that P(A) = u01^{r+d} and P(B) = u10^{r}. (Note that k = |u| + r + 1.) To complete the proof, we must show that #(P(B)) = (#(P(A)) + 1) / 2^{d}:

(#(P(A)) + 1) / 2^{d}
    =    < P(A) = u01^{r+d} >
(#(u01^{r+d}) + 1) / 2^{d}
    =    < Theorem 1 >
(2^{r+d+1}·#(u) + #(01^{r+d}) + 1) / 2^{d}
    =    < Theorem 2a, 2d >
(2^{r+d+1}·#(u) + 2^{r+d} - 1 + 1) / 2^{d}
    =    < -1 and +1 cancel >
(2^{r+d+1}·#(u) + 2^{r+d}) / 2^{d}
    =    < / distributes over +; 2^{m+d}/2^{d} = 2^{m} >
2^{r+1}·#(u) + 2^{r}
    =    < Theorem 2c >
2^{r+1}·#(u) + #(10^{r})
    =    < Theorem 1 >
#(u10^{r})
    =    < P(B) = u10^{r} >
#(P(B))
All this is quite interesting, of course, but is there any advantage in employing a set of codewords that arises from a Lefter-is-Deeper tree? Answer: Yes. Briefly, the advantages are two: the metadata describing the code can be encoded compactly, and decompression can be carried out with especially simple data structures and logic. We take these up in turn.
One way to encode the codeword length distribution is by indicating the minimum and maximum among the codeword lengths, and then indicating, for each length in that range, the number of codewords of that length. For the example in the figure above, the list would be <2, 6, 1, 4, 3, 1, 2>.
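For instance, given the example's codeword lengths, that list can be produced like this (a sketch; the variable names are ours):

```python
lengths = [6, 6, 5, 4, 4, 4, 3, 3, 3, 3, 2]   # leaf depths of the example tree

lo, hi = min(lengths), max(lengths)
# <min length, max length, count of each length from min to max>
header = [lo, hi] + [lengths.count(k) for k in range(lo, hi + 1)]
assert header == [2, 6, 1, 4, 3, 1, 2]
```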
For our ridiculously small example, encoding this list would require a slightly larger number of bits than encoding the tree's structure. But for a realistic-sized example, the opposite would usually be the case, although the savings would typically be small.
Details to be provided at some point in time...
The biggest gains from using Canonical Huffman Coding come in performing decompression, so we look at that first.
Because of the constrained nature of the codeword set (in particular, the Longer's-Prefix-is-Lesser and Consecutive-Values properties), it turns out that, in place of storing an explicit representation of the Huffman Tree, all that the decompressor needs are two arrays, minCW[] and CW2Symbol[][]. For each relevant value of i, minCW[i] contains the (numeric) value of the smallest codeword of length i. For each pair of relevant values of i and j, CW2Symbol[i][j] contains the native representation of the symbol to which has been assigned the jth-smallest codeword of length i (counting from zero). In a real-life application, a native representation would likely be a byte or a short sequence of bytes (e.g., representing an English word).
For the sake of making our example (as seen in the figure above) concrete, we use the lower case letters a through k to refer to the (native representations of the) eleven symbols in the source alphabet and we assign a codeword to each one. This assignment, as well as the corresponding values of the arrays minCW[] and CW2Symbol[], can be seen in the figure below. Note that each element of minCW[] actually contains the numeric value of a codeword (i.e., #(x) for codeword x), but we also show in parentheses the (binary) codeword itself.
Codeword   Symbol
000000     j
000001     g
00001      f
0001       h
0010       c
0011       b
010        d
011        a
100        k
101        i
11         e

minCW:
  minCW[2] = 3   (11)
  minCW[3] = 2   (010)
  minCW[4] = 1   (0001)
  minCW[5] = 1   (00001)
  minCW[6] = 0   (000000)

CW2Symbol:
  CW2Symbol[2] = 'e'
  CW2Symbol[3] = 'd' 'a' 'k' 'i'
  CW2Symbol[4] = 'h' 'c' 'b'
  CW2Symbol[5] = 'f'
  CW2Symbol[6] = 'j' 'g'
Of course, the decompressor must make use of the metadata at the beginning of the compressed file to construct these arrays. (How that is accomplished is addressed later.) Having done that, its job is to carry out this high-level algorithm:
while (hasMoreBits()) {
   BitString x := emptyString;
   do {
      x.append(nextBit());      // append next bit onto rear of x
   } while (!isCodeword(x));
   emit nativeRepOf(x);         // emit the native representation of
}                               // the symbol whose codeword is x
What is not obvious is how to implement isCodeword() and nativeRepOf() making use of nothing but the data stored in arrays minCW[] and CW2Symbol[][].
The solutions to these two problems rely, respectively, upon the guarantees that the set of codewords possesses the Longer's-Prefix-is-Lesser and Consecutive-Values properties!
To illustrate how we can tell, as bits are appended to x, when it has finally become equal to some codeword, suppose that z is a codeword and let z_{k} be the prefix of z of length k, for all k in the range 1..|z|. By the Longer's-Prefix-is-Lesser property of the codewords, we have that #(z_{i}) < minCW[i] for all i < |z|. Trivially, we also have that #(z) ≥ minCW[|z|]. That is, every proper prefix z' of z is numerically less than the smallest codeword of length |z'|, but z itself is (obviously) numerically greater than or equal to the smallest codeword of length |z|. Thus, as bits are appended to x during execution of the algorithm, we know that its value has become a codeword upon the condition #(x) < minCW[|x|] becoming false. Moreover, because of the Consecutive-Values property, we know at that point that the symbol whose codeword is x is the one in CW2Symbol[|x|][j], where j = #(x) - minCW[|x|].
Here, then, is a more concrete version of the decompression algorithm. Rather than using variable x to store the bit string consumed so far (which is necessarily a prefix of some codeword), variables v and len are used, where v = #(x) and len = |x|.
while (hasMoreBits()) {
   v := 0;  len := 0;
   // loop invariant:
   //    Let x be the bit string consumed so far (during the current
   //    iteration of the outer loop).
   //    Then #(x) = v  ∧  len = |x|  ∧  no prefix of x is a codeword
   do {
      v := 2*v + nextBit();
      len := len + 1;
   } while (v < minCW[len]);
   // v is the value of a codeword of length len
   j := v - minCW[len];
   emit(CW2Symbol[len][j]);
}
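Here is an executable Python rendering of this algorithm (ours, for illustration), specialized to the example arrays shown earlier; minCW[1] is filled with 2 = 2^{1}, anticipating the rule, discussed below, for lengths that have no codewords:

```python
MIN_CW = {1: 2, 2: 3, 3: 2, 4: 1, 5: 1, 6: 0}   # minCW[1] = 2^1: no codewords of length 1
CW2SYMBOL = {2: "e", 3: "daki", 4: "hcb", 5: "f", 6: "jg"}

def decode(bits: str) -> str:
    """Decode a string of '0'/'1' characters using only the two arrays."""
    out = []
    v = length = 0
    for b in bits:
        v = 2 * v + int(b)        # v = #(x), where x is the bits consumed so far
        length += 1               # length = |x|
        if v >= MIN_CW[length]:   # x has become a codeword of this length
            out.append(CW2SYMBOL[length][v - MIN_CW[length]])
            v = length = 0        # start consuming the next codeword
    return "".join(out)

assert decode("11" + "010" + "0010") == "edc"
```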
One issue that must be addressed is what value to place into minCW[k] in case there are no codewords of length k. Of course, that value must be greater than the value of the length-k prefix of any codeword having length greater than k. One solution is to set minCW[k] to 2^{k}, as every bit string of length k has a smaller value. Later we will see that, in order to avoid having to treat any lengths (except for the maximum codeword length) as special cases, we can fill the values of minCW[] like this, where r_{i} is the number of codewords of length i:
minCW[maxLen] := 0;
k := maxLen - 1;
while (k != 0) {
   minCW[k] := (minCW[k+1] + r_{k+1}) / 2;
   k := k - 1;
}
Note that this calculation of minCW[k] is consistent with the Consecutive-Values and Half-of-Successor properties. In particular, the m mentioned in the Half-of-Successor property, which refers to the value of the largest codeword of length k+1, corresponds to minCW[k+1] + r_{k+1} - 1 because of the Consecutive-Values property, so that m + 1 = minCW[k+1] + r_{k+1}.
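In Python, the filling of minCW[] from the length counts might look like this (a sketch; the function name is ours, and the example's counts are used to check the result against the values tabulated earlier):

```python
def fill_min_cw(r, max_len):
    """Compute minCW[k] for k = max_len down to 1.

    r maps each codeword length to its count; lengths with no
    codewords may simply be absent from r.
    """
    min_cw = {max_len: 0}
    for k in range(max_len - 1, 0, -1):
        # Half-of-Successor: the smallest codeword of length k has value
        # (m+1)/2, where m+1 = minCW[k+1] + r_{k+1} by Consecutive-Values.
        min_cw[k] = (min_cw[k + 1] + r.get(k + 1, 0)) // 2
    return min_cw

# Counts for the running example: lengths 2..6 occur 1, 4, 3, 1, 2 times.
assert fill_min_cw({2: 1, 3: 4, 4: 3, 5: 1, 6: 2}, 6) == \
       {6: 0, 5: 1, 4: 1, 3: 2, 2: 3, 1: 2}
```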
Showing that these calculations work out even when there are lengths for which there are no codewords requires a little bit of work.
We will assume, of course, that the file produced by the compressor begins with metadata describing a symbol-to-codeword mapping that is consistent with a Lefter-is-Deeper Huffman Tree. We consider two possible ways in which the metadata might describe that mapping, one in which the Huffman tree is described explicitly and the other in which it is described implicitly. (Both of these possibilities were mentioned earlier.)
Following that would be a list of the native codes of the symbols, going from the symbol with the lexicographically smallest codeword (corresponding to the tree's leftmost leaf) to the one with the largest (corresponding to the tree's rightmost leaf). Of course, this list of native codes would have to be parsable, meaning that the boundaries between the elements could be determined algorithmically. (If the native codes are of a known fixed length, that would not be a problem; otherwise one could precede each native code with a length indicator in Elias gamma form, for example.) Here we are not concerned with the details of how to encode the list of native codes, however.