Thread: Trying to figure out DEFLATE

  1. #1
    Registered User
    Join Date
    Jun 2009
    Posts
    30

    Trying to figure out DEFLATE

    The part I'm having trouble with is decoding the Huffman codes when they're specified in the header. This is done by the code lengths, and I get most of the process. The part I don't get, is the application of DEFLATE's two extra rules to make decoding using just the lengths possible. To quote the RFC:

    * All codes of a given bit length have lexicographically
    consecutive values, in the same order as the symbols
    they represent;

    * Shorter codes lexicographically precede longer codes.
    The part I don't get is the second rule. Again, quoting the RFC:

    We could recode the example above to follow this rule as
    follows, assuming that the order of the alphabet is ABCD:

    Symbol Code
    ------ ----
    A 10
    B 0
    C 110
    D 111

    I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are
    lexicographically consecutive.
    Shouldn't B be before A, seeing that it's shorter than A? How is 0 preceding 10? It's shown again in another example in the RFC:

    Symbol Length Code
    ------ ------ ----
    A 3 010
    B 3 011
    C 3 100
    D 3 101
    E 3 110
    F 2 00
    G 4 1110
    H 4 1111
    Shouldn't the length 2 be at the top, making 00 the code for A, and so on? I don't get it.

    EDIT: Sorry, formatting didn't work as expected.

  2. #2
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    The shortest code goes with the symbol used the most.
    The listing you posted was in alphabetical order; not the lexicographically order mentioned in the RFC.

    Huffman coding - Wikipedia, the free encyclopedia

    Tim S.
    Last edited by stahta01; 11-13-2011 at 09:21 AM.

  3. #3
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by stahta01 View Post
    The listing you posted was in alphabetical order; not the lexicographically order mentioned in the RFC.
    Umm -- just for clarity, alphabetical and lexicographical are same thing:

    lexicographical order

    So the difference is that the lexicographical/alphabetical order used in the table is not the same as the lexicographical/alphabetical order "mentioned in the RFC", which is potentially confusing considering the table is in the RFC :\

    Ie, the table is ordered with the left hand column, but discussed in terms of ordering with the right hand column. RFCs can be very obtuse. Not that I know anything much about Huffman encoding.
    Last edited by MK27; 11-13-2011 at 10:03 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #4
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Quote Originally Posted by MK27 View Post
    Umm -- alphabetical and lexicographical are same thing:

    lexicographical order

    So the difference is that the lexicographical/alphabetical order used in the table is not the same as the lexicographical/alphabetical order "mentioned in the RFC".

    Ie, the table is ordered with the left hand column, but discussed in terms of ordering with the right hand column. RFCs can be very obtuse. Not that I know anything much about Huffman encoding.
    Visually version of above remark

    alphabetical/lexicographical of column 1
    Code:
    Symbol Code
    ------    ----
    A         10
    B         0
    C         110
    D         111
    alphabetical/lexicographical of column 2
    Code:
    Symbol Code
    ------    ----
    B         0
    A         10
    C         110
    D         111
    There are not the same in the RFC because they were likely talking about sorting by different columns.

    Tim S.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Cant figure it out.
    By Coder87C in forum C++ Programming
    Replies: 6
    Last Post: 05-20-2005, 01:47 PM
  2. Can't figure out!
    By bear_26 in forum C Programming
    Replies: 3
    Last Post: 10-30-2002, 08:31 AM
  3. Can't figure this out
    By dat in forum C Programming
    Replies: 3
    Last Post: 10-24-2002, 01:11 AM
  4. Can't Figure this out
    By Unregistered in forum C++ Programming
    Replies: 7
    Last Post: 07-08-2002, 03:33 PM
  5. Can't figure this out
    By Unregistered in forum C Programming
    Replies: 4
    Last Post: 06-04-2002, 07:58 AM