Thread: Increment gone wrong

  1. #16
    Registered User
    Join Date
    Sep 2020
    Posts
    288
    Re-reading the spec, now I understand it and I see I haven't honored that "all code lengths form a single sequence"

    Ah well, better fix it up...

  2. #17
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,392
    Quote Originally Posted by hamster_nz View Post
    Re-reading the spec, now I understand it and I see I haven't honored that "all code lengths form a single sequence"

    Ah well, better fix it up...
    Dunno if it's of value to you but this is where I've got with just the deflation of the huffman data, currently got a bug with UIDs, specifically with getting the right length for length/literal symbols. As soon as that's fixed the only thing left to do is implement the look back through deflated data part of the code.

    Also since you're working on PNGs anyways this resource should be helpful (don't remember if I mentioned it in a previous post but here it is anyways):

    PngSuite - the official set of PNG test images
    Attached Files Attached Files

  3. #18
    Registered User
    Join Date
    Sep 2020
    Posts
    288
    I ran you code against a test file, and got this output:


    Code:
    main.c:281: symbols[  0] (0x5560170045f0): src =   0, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0010
    main.c:281: symbols[  4] (0x556017004690): src =   4, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0011
    main.c:281: symbols[  5] (0x5560170046b8): src =   5, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0100
    main.c:281: symbols[  6] (0x5560170046e0): src =   6, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0101
    main.c:281: symbols[  7] (0x556017004708): src =   7, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 110
    main.c:281: symbols[  8] (0x556017004730): src =   8, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 2, uid = 10
    main.c:281: symbols[  9] (0x556017004758): src =   9, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 111
    main.c:281: symbols[ 10] (0x556017004780): src =  10, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0110
    main.c:281: symbols[ 11] (0x5560170047a8): src =  11, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 000

    The first shortest Hufman code should be always be all zeros.

    So assuming your length table is correct, here's how they should be generated:


    We start with "length 2" codes - "00" -

    Assign to the length 2 codes, in order they appear in the table:


    Code:
    main.c:281: symbols[  0] len = 4, 
    main.c:281: symbols[  4] len = 4, 
    main.c:281: symbols[  5] len = 4, 
    main.c:281: symbols[  6] len = 4, 
    main.c:281: symbols[  7] len = 3, 
    main.c:281: symbols[  8] len = 2, 00
    main.c:281: symbols[  9] len = 3, 
    main.c:281: symbols[ 10] len = 4, 
    main.c:281: symbols[ 11] len = 3,
    At this point next symbol that would be assigned is "01". Add a '0' to it and do the threes:

    Code:
    main.c:281: symbols[  0] 
    main.c:281: symbols[  4]
    main.c:281: symbols[  5]
    main.c:281: symbols[  6]
    main.c:281: symbols[  7] 010
    main.c:281: symbols[  8] 00
    main.c:281: symbols[  9] 011
    main.c:281: symbols[ 10]
    main.c:281: symbols[ 11] 100
    At this point next symbol that would be assigned is "101". Add a '0' to it and do the fours:

    Then do the same again (Lengths 4):

    Code:
    main.c:281: symbols[  0] 1010
    main.c:281: symbols[  4] 1011
    main.c:281: symbols[  5] 1100
    main.c:281: symbols[  6] 1101
    main.c:281: symbols[  7] 010
    main.c:281: symbols[  8] 00
    main.c:281: symbols[  9] 011
    main.c:281: symbols[ 10] 1110
    main.c:281: symbols[ 11] 100
    All done (but we seem to be one code short (1111)!
    Last edited by hamster_nz; 07-28-2021 at 05:16 PM.

  4. #19
    Registered User
    Join Date
    Sep 2020
    Posts
    288
    Here's my code for that, with error checking removed for clarity:

    Code:
    void codetable_generate(struct Codetable_entry *table, size_t len) {
       uint32_t next_code = 0;
       for(uint32_t pass_len = 1; pass_len < 16; pass_len++) {
          for(size_t i = 0; i < len; i++) {
             if(table[i].length == pass_len) {
                table[i].code = reverse(next_code,pass_len);
                next_code++;
             }
          }
          next_code <<= 1;
       } 
    }
    'reverse(x,y)' just flips the last 'y' bits around in 'x' - i.e. reverses the bit ordering in the lowest 'y' bits.

  5. #20
    Registered User
    Join Date
    Sep 2020
    Posts
    288
    I *think* this is what you want in BuildCodes():

    Code:
    void BuildCodes( BUFF *Symbols, uint longest )
    {
            uint uid = 0;
            HUFFMAN_SYMBOL *symbols = Symbols->addr;
            for ( uint len = 1; len <= longest; ++len )
            {
                    for ( uint i = 0; i < Symbols->used; ++i )
                    {
                            if ( symbols[i].len == len )
                               symbols[i].uid = uid++;
                    }
                    uid <<= 1;
            }
    }
    Seems to be better:

    Code:
    main.c:281: Hidden Symbols, used = 13
    main.c:281: symbols[  0] (0x55d7b4b7a5f0): src =   0, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 1010
    main.c:281: symbols[  4] (0x55d7b4b7a690): src =   4, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 1011
    main.c:281: symbols[  5] (0x55d7b4b7a6b8): src =   5, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 1100
    main.c:281: symbols[  6] (0x55d7b4b7a6e0): src =   6, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 1101
    main.c:281: symbols[  7] (0x55d7b4b7a708): src =   7, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 010
    main.c:281: symbols[  8] (0x55d7b4b7a730): src =   8, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 2, uid = 00
    main.c:281: symbols[  9] (0x55d7b4b7a758): src =   9, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 011
    main.c:281: symbols[ 10] (0x55d7b4b7a780): src =  10, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 1110
    main.c:281: symbols[ 11] (0x55d7b4b7a7a8): src =  11, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 100
    main.c:285: Hidden Symbols, used = 13
    Bit ordering might need to be reversed - I didn't look too far into it..

  6. #21
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,392
    Quote Originally Posted by hamster_nz View Post
    I ran you code against a test file, and got this output:


    Code:
    main.c:281: symbols[  0] (0x5560170045f0): src =   0, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0010
    main.c:281: symbols[  4] (0x556017004690): src =   4, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0011
    main.c:281: symbols[  5] (0x5560170046b8): src =   5, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0100
    main.c:281: symbols[  6] (0x5560170046e0): src =   6, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0101
    main.c:281: symbols[  7] (0x556017004708): src =   7, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 110
    main.c:281: symbols[  8] (0x556017004730): src =   8, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 2, uid = 10
    main.c:281: symbols[  9] (0x556017004758): src =   9, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 111
    main.c:281: symbols[ 10] (0x556017004780): src =  10, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 4, uid = 0110
    main.c:281: symbols[ 11] (0x5560170047a8): src =  11, use = true, sym = '' (  0), lit = '' (  0), get = 0, len = 3, uid = 000

    The first shortest Hufman code should be always be all zeros.

    So assuming your length table is correct, here's how they should be generated:


    We start with "length 2" codes - "00" -

    Assign to the length 2 codes, in order they appear in the table:


    Code:
    main.c:281: symbols[  0] len = 4, 
    main.c:281: symbols[  4] len = 4, 
    main.c:281: symbols[  5] len = 4, 
    main.c:281: symbols[  6] len = 4, 
    main.c:281: symbols[  7] len = 3, 
    main.c:281: symbols[  8] len = 2, 00
    main.c:281: symbols[  9] len = 3, 
    main.c:281: symbols[ 10] len = 4, 
    main.c:281: symbols[ 11] len = 3,
    At this point next symbol that would be assigned is "01". Add a '0' to it and do the threes:

    Code:
    main.c:281: symbols[  0] 
    main.c:281: symbols[  4]
    main.c:281: symbols[  5]
    main.c:281: symbols[  6]
    main.c:281: symbols[  7] 010
    main.c:281: symbols[  8] 00
    main.c:281: symbols[  9] 011
    main.c:281: symbols[ 10]
    main.c:281: symbols[ 11] 100
    At this point next symbol that would be assigned is "101". Add a '0' to it and do the fours:

    Then do the same again (Lengths 4):

    Code:
    main.c:281: symbols[  0] 1010
    main.c:281: symbols[  4] 1011
    main.c:281: symbols[  5] 1100
    main.c:281: symbols[  6] 1101
    main.c:281: symbols[  7] 010
    main.c:281: symbols[  8] 00
    main.c:281: symbols[  9] 011
    main.c:281: symbols[ 10] 1110
    main.c:281: symbols[ 11] 100
    All done (but we seem to be one code short (1111)!
    I did say there was a bug with UIDs, thank you for the code you provided after this though

  7. #22
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,392
    Quote Originally Posted by hamster_nz View Post
    Here's my code for that, with error checking removed for clarity:

    Code:
    void codetable_generate(struct Codetable_entry *table, size_t len) {
       uint32_t next_code = 0;
       for(uint32_t pass_len = 1; pass_len < 16; pass_len++) {
          for(size_t i = 0; i < len; i++) {
             if(table[i].length == pass_len) {
                table[i].code = reverse(next_code,pass_len);
                next_code++;
             }
          }
          next_code <<= 1;
       } 
    }
    'reverse(x,y)' just flips the last 'y' bits around in 'x' - i.e. reverses the bit ordering in the lowest 'y' bits.
    Colour me surprised that such a simple change would fix all of them (I had fixed the code & length UIDs but distance ones got mess up, this resolved that a lot better), thanks

  8. #23
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,392
    Hit a crash with the final section of my code, since I need to do some shopping I figured I dump it here and see if I get lucky by the time I get back and someone both felt like looking through it and found the bug. If you do look I recommend just downloading the attached file, compile and search for the "Deflating data..." string, at least you'll have the syntax highlighting then.

    A reminder of the document I was refering to when making this code:
    Understanding gzip

    The Code:
    Code:
    			i = 0; j = 0;
    			while ( stream.num < stream.max )
    			{
    				leng_symbol = IdentifyNextSymbol( &stream, &Tree, leng_longest, true );
    
    				if ( !leng_symbol )
    				{
    					PrintStreamDetails( &stream );
    					return Return( ret, EINVAL );
    				}
    
    				//puts("Found:\n");
    				PrintSymbol( leng_symbol, __LINE__, leng_symbol->uid );
    
    				if ( leng_symbol->sym < 256 )
    				{
    					j = i++;
    					lazy[j] = leng_symbol->sym;
    				}
    				else
    				{
    					uint lookup, stop;
    					dist_symbol = IdentifyNextSymbol( &stream, &Dist, dist_longest, true );
    
    					if ( !dist_symbol )
    					{
    						PrintStreamDetails( &stream );
    						return Return( ret, EINVAL );
    					}
    
    					lookup = CopyStreamBits( &stream, dist_symbol->get, true );
    					lookup += hufd_extra_bits_array[dist_symbol->src];
    					lookup = j - lookup;
    
    					for ( j = 0; j < dist_symbol->len; ++j )
    						lazy[i++] = lazy[lookup++];
    
    					j = i - 1;
    
    					PrintSymbol( dist_symbol, __LINE__, dist_symbol->uid );
    				}
    			}
    The Output:
    Code:
    ./a.out aba.gz
    path = 'aba.gz'
    PrintStreamDetails( 0x7fffcc29b530 ): ptr = 0x56257145d480, pos = 0, num = 0, max = 328, fed = 0, got =
    PrintBytes( 0x56257145d480, 41, 16 )
    1F 8B  8  0  0  0  0  0  0  3 1D C6 49  1  0  0
    40 C0 AC A3 7F 88 3D 3C 20 2A 97 9D 37 5E 1D  C
    29 34 94 23  0  0  0
    PrintStreamDetails( 0x7fffcc29b530 ): ptr = 0x56257145d480, pos = 136, num = 80, max = 328, fed = 56, got = 00010000000000000000000000000001010010011100011000011101
    gzip.magic = 1F8B, gzip.format =  8, gzip.flags =  0, gzip.mtime =        0, zlib.xflags =        0, zlib.system =        3
    gzip.flag_TEXT = false, gzip.flag_HCRC = false, gzip.flag_MORE = false, gzip.flag_NAME = false, gzip.flag_NOTE = false, gzip.flag_RESERVED = 0
    PrintStreamDetails( 0x7fffcc29b530 ): ptr = 0x56257145d480, pos = 136, num = 80, max = 296, fed = 56, got = 00010000000000000000000000000001010010011100011000011101
    last = true, type = 2
    lengc = 260, distc = 7, codec = 18, left = 60
    pos = 212, max = 296, count = 267
    left = 61, byte = 18, bit = 7
    main.c:757: Code Symbols, used = 16
    main.c:757: symbols[  0] (0x5625714604d0): src =   2, use = true, sym = '' (  0), lit = '' (  0), get = 0, cpy = 0, len = 1, uid = 0
    main.c:757: symbols[  2] (0x562571460520): src =  18, use = true, sym = '' (  0), lit = '' (  0), get = 7, cpy = 11, len = 2, uid = 10
    main.c:757: symbols[ 12] (0x5625714606b0): src =   1, use = true, sym = '' (  0), lit = '' (  0), get = 0, cpy = 0, len = 4, uid = 1100
    main.c:757: symbols[ 13] (0x5625714606d8): src =   4, use = true, sym = '' (  0), lit = '' (  0), get = 0, cpy = 0, len = 4, uid = 1101
    main.c:757: symbols[ 14] (0x562571460700): src =  16, use = true, sym = '' (  0), lit = '' (  0), get = 2, cpy = 3, len = 4, uid = 1110
    main.c:757: symbols[ 15] (0x562571460728): src =  17, use = true, sym = '' (  0), lit = '' (  0), get = 3, cpy = 3, len = 4, uid = 1111
    main.c:878: RdHuffmanLengDistCodes( 0x7fffcc29b530, 0x56256f60e420, 0x56256f60e440, 260, 0 )
    main.c:771: Length Symbols, used = 16
    main.c:771: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:771: symbols[  2] (0x562571460820): src =  98, use = true, sym = 'b' ( 98), lit = 'b' ( 98), get = 0, cpy = 0, len = 2, uid = 10
    main.c:771: symbols[ 12] (0x5625714609b0): src = 256, use = true, sym = '' (256), lit = '' (256), get = 0, cpy = 0, len = 4, uid = 1100
    main.c:771: symbols[ 13] (0x5625714609d8): src = 257, use = true, sym = '' (257), lit = '' (257), get = 2, cpy = 0, len = 4, uid = 1101
    main.c:771: symbols[ 14] (0x562571460a00): src = 258, use = true, sym = '' (257), lit = '' (258), get = 2, cpy = 0, len = 4, uid = 1110
    main.c:771: symbols[ 15] (0x562571460a28): src = 259, use = true, sym = '' (257), lit = '' (259), get = 2, cpy = 0, len = 4, uid = 1111
    main.c:878: RdHuffmanLengDistCodes( 0x7fffcc29b530, 0x56256f60e420, 0x56256f60e460, 7, 260 )
    main.c:785: Distance Symbols, used = 4
    main.c:785: symbols[  0] (0x562571460a60): src =   0, use = true, sym = '' (260), lit = '' (260), get = 0, cpy = 0, len = 2, uid = 00
    main.c:785: symbols[  1] (0x562571460a88): src =   4, use = true, sym = '' (264), lit = '' (264), get = 0, cpy = 0, len = 2, uid = 01
    main.c:785: symbols[  2] (0x562571460ab0): src =   5, use = true, sym = '' (265), lit = '' (265), get = 0, cpy = 0, len = 2, uid = 10
    main.c:785: symbols[  3] (0x562571460ad8): src =   6, use = true, sym = '' (266), lit = '' (266), get = 0, cpy = 0, len = 2, uid = 11
    main.c:806: main() Deflating data...
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[  2] (0x562571460820): src =  98, use = true, sym = 'b' ( 98), lit = 'b' ( 98), get = 0, cpy = 0, len = 2, uid = 10
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    double free or corruption (out)
    main.c:820: symbols[  2] (0x562571460820): src =  98, use = true, sym = 'b' ( 98), lit = 'b' ( 98), get = 0, cpy = 0, len = 2, uid = 10
    main.c:820: symbols[  2] (0x562571460820): src =  98, use = true, sym = 'b' ( 98), lit = 'b' ( 98), get = 0, cpy = 0, len = 2, uid = 10
    main.c:820: symbols[  2] (0x562571460820): src =  98, use = true, sym = 'b' ( 98), lit = 'b' ( 98), get = 0, cpy = 0, len = 2, uid = 10
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[ 14] (0x562571460a00): src = 258, use = true, sym = '' (257), lit = '' (258), get = 2, cpy = 0, len = 4, uid = 1110
    main.c:847: symbols[  1] (0x562571460a88): src =   4, use = true, sym = '' (264), lit = '' (264), get = 0, cpy = 0, len = 2, uid = 01
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[ 13] (0x5625714609d8): src = 257, use = true, sym = '' (257), lit = '' (257), get = 2, cpy = 0, len = 4, uid = 1101
    main.c:847: symbols[  1] (0x562571460a88): src =   4, use = true, sym = '' (264), lit = '' (264), get = 0, cpy = 0, len = 2, uid = 01
    main.c:820: symbols[  2] (0x562571460820): src =  98, use = true, sym = 'b' ( 98), lit = 'b' ( 98), get = 0, cpy = 0, len = 2, uid = 10
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[ 15] (0x562571460a28): src = 259, use = true, sym = '' (257), lit = '' (259), get = 2, cpy = 0, len = 4, uid = 1111
    main.c:847: symbols[  0] (0x562571460a60): src =   0, use = true, sym = '' (260), lit = '' (260), get = 0, cpy = 0, len = 2, uid = 00
    main.c:820: symbols[ 12] (0x5625714609b0): src = 256, use = true, sym = '' (256), lit = '' (256), get = 0, cpy = 0, len = 4, uid = 1100
    main.c:847: symbols[  0] (0x562571460a60): src =   0, use = true, sym = '' (260), lit = '' (260), get = 0, cpy = 0, len = 2, uid = 00
    main.c:820: symbols[ 15] (0x562571460a28): src = 259, use = true, sym = '' (257), lit = '' (259), get = 2, cpy = 0, len = 4, uid = 1111
    main.c:847: symbols[  0] (0x562571460a60): src =   0, use = true, sym = '' (260), lit = '' (260), get = 0, cpy = 0, len = 2, uid = 00
    main.c:820: symbols[  2] (0x562571460820): src =  98, use = true, sym = 'b' ( 98), lit = 'b' ( 98), get = 0, cpy = 0, len = 2, uid = 10
    main.c:820: symbols[  2] (0x562571460820): src =  98, use = true, sym = 'b' ( 98), lit = 'b' ( 98), get = 0, cpy = 0, len = 2, uid = 10
    main.c:820: symbols[ 14] (0x562571460a00): src = 258, use = true, sym = '' (257), lit = '' (258), get = 2, cpy = 0, len = 4, uid = 1110
    main.c:847: symbols[  0] (0x562571460a60): src =   0, use = true, sym = '' (260), lit = '' (260), get = 0, cpy = 0, len = 2, uid = 00
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[ 12] (0x5625714609b0): src = 256, use = true, sym = '' (256), lit = '' (256), get = 0, cpy = 0, len = 4, uid = 1100
    main.c:847: symbols[  0] (0x562571460a60): src =   0, use = true, sym = '' (260), lit = '' (260), get = 0, cpy = 0, len = 2, uid = 00
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[  0] (0x5625714607d0): src =  97, use = true, sym = 'a' ( 97), lit = 'a' ( 97), get = 0, cpy = 0, len = 1, uid = 0
    main.c:820: symbols[ 14] (0x562571460a00): src = 258, use = true, sym = '' (257), lit = '' (258), get = 2, cpy = 0, len = 4, uid = 1110
    main.c:847: symbols[  1] (0x56257
    make: *** [GNUmakefile:2: run] Aborted (core dumped)
    Compilation failed.
    Attached Files Attached Files

  9. #24
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,392
    Judging by the lack of replies I take it no one felt like it, no biggie, eventually "fixed" it, I say fixed because I got passed the section only to encounter a crash during the attempt to free memory on the way out, here's my current code (with mini makefile and the test archives)

    deflate.zip - Google Drive

  10. #25
    Registered User
    Join Date
    Sep 2020
    Posts
    288
    I just tested it with a gz file I made with "echo test | gzip > a.gz" and it crashed out.

    Cause was around line 680 - with a buffer overflow.

    Code:
                                            lengi = RevBits( CopyStreamBits( &stream, 7, true ), 7 ) - 1;
    
    
                                            bits = leng_extra_bits_array[lengi];
    lengi is unsigned, so if CopyStreamBits returns 0, then lengi gets set to 4294967295, and of course leng_extra_bits_array[4294967295] is a crash.

  11. #26
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,392
    Quote Originally Posted by hamster_nz View Post
    I just tested it with a gz file I made with "echo test | gzip > a.gz" and it crashed out.

    Cause was around line 680 - with a buffer overflow.

    Code:
                                            lengi = RevBits( CopyStreamBits( &stream, 7, true ), 7 ) - 1;
    
    
                                            bits = leng_extra_bits_array[lengi];
    lengi is unsigned, so if CopyStreamBits returns 0, then lengi gets set to 4294967295, and of course leng_extra_bits_array[4294967295] is a crash.
    Strange, that should only be reached under a type 1 scenario, the one used for the "hello hello hello hello" test (hello.gz in the zip I provided), well either way I should probably add a check that it's in bounds anyways

    Edit:
    For now, to get the output, add this after the assignment:
    Code:
    					if ( lengi >= 32 )
    						return Return( ret, err );
    Edit 2: btw I'm going to bed in case anyone responds after I've shut the computer down (need to reset after an update anyways)

  12. #27
    Registered User
    Join Date
    Sep 2020
    Posts
    288
    Oh, thanks for that link to the test images... It is proving invaluable.

    One thing I find really fascinating about zlib compression is that nowhere in the file can you point to a set of bits and say "those bits encode that letter" - so if the compressed file contains a capital A there never is 8 bits that represent the capital A symbol.

    It's all indirectly implied.

  13. #28
    Registered User
    Join Date
    Sep 2020
    Posts
    80
    engi is unsigned, so if CopyStreamBits returns 0, then lengi gets set to 4294967295, and of course leng_extra_bits_array[4294967295] is a crash.
    What are you talking about? 0 fits in an unsigned int.
    Code:
    #include <stdio.h>
    
    int f()
    {
        return 0;
    }
    
    
    int main()
    {
        unsigned int lengi = f();
        
        printf("lengi: %u", lengi);
    }
    Output:
    lengi: 0

  14. #29
    Registered User
    Join Date
    Sep 2020
    Posts
    288
    Quote Originally Posted by thmm View Post
    What are you talking about? 0 fits in an unsigned int.
    Code:
    #include <stdio.h>
    
    int f()
    {
        return 0;
    }
    
    
    int main()
    {
        unsigned int lengi = f();
        
        printf("lengi: %u", lengi);
    }
    Output:
    lengi: 0
    You did look at the code included in the post, where one is subtracted from the value returned before it is assigned?

  15. #30
    Registered User
    Join Date
    Sep 2020
    Posts
    80
    No I didn't. I just saw the comment.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. hex increment
    By davidx in forum C Programming
    Replies: 3
    Last Post: 10-19-2019, 07:06 AM
  2. Two pre increment in one expression
    By h255874 in forum C Programming
    Replies: 4
    Last Post: 09-21-2019, 08:47 AM
  3. Post Increment an Pre Increment operators in c++
    By anil_ in forum C++ Programming
    Replies: 4
    Last Post: 11-12-2011, 08:27 PM
  4. can't get loop to increment
    By rivkyfried1 in forum C Programming
    Replies: 2
    Last Post: 10-11-2010, 04:03 AM
  5. Post increment and pre increment help
    By noob2c in forum C++ Programming
    Replies: 5
    Last Post: 08-05-2003, 03:03 AM

Tags for this Thread