Thread: Increment gone wrong

  1. #31
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Yeah, I prefer a different compression I thought up that uses some similar concepts but is a LOT easier to deal with, basically the 1st 64 bits should read ".HUFFMAN" to make it clear that it's huffman type compression, the nibble that follows should have 1 added and be multiplied against 4 to get the size of each element, the next 12 bit are reserved for flags I haven't decided yet, the next bits (size decided by aforementioned (nibble + 1) * 4) are for the expected number of elements in the resulting data, I think I'll probably just make it a flag holder, next follows a list of indices (their size is as previously explained with the (nibble + 1) * 4) with a prefix bit that says to whether or not to use everything from the index itself to what comes before the next index provided so the shortest list could be 1 0000 1 1111, those 2 are the only required indices, these indices have a 1:1 relationship with the values to add into the buffer later, the huffman codes are to be generated in the same order, until the last index is reached, whatever your final next code is is gonna be the code of the last index anyways, it will be considered uncompressed data if more than a qtr of the available indices are used before getting the huffman codes in which case every value from here on until will be for the buffer or in the case of compressed data huffman tree to pull the value to put into the buffer, the will be only one prefix bit to these values, that tells you to either use it as uncompressed data/huffman code or to read 2 counts (again size is determine by (nibble + 1) * 4) to 1. minus from current value count and 2. begin copying existing values upto the 2nd count read

    So "hello hello hello hello" could result in something like this (I've given interpretations, not bit representations in most cases):
    ".HUFFMAN" 10 00 24 [1 '\0', 0 1, 1 ' ', 0 33, 1 'e', 0 'f', 1 'h', 0 'i', 1 'l', 0 'm', 1 'o', 0 'p', 0 255] 0 'h' 0 'e' 0 'l' 0 'l' 0 'o' 0 ' ' 1 6 17 0 '\0'

    I still need to do a bit more thinking on it but I like it better than zlib, it's a lot clearer what is what after all

  2. #32
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    Quote Originally Posted by awsdert View Post
    Yeah, I prefer a different compression I thought up that uses some similar concepts but is a LOT easier to deal with, basically the 1st 64 bits should read ".HUFFMAN" to make it clear that it's huffman type compression, the nibble that follows should have 1 added and be multiplied against 4 to get the size of each element, the next 12 bit are reserved for flags I haven't decided yet, the next bits (size decided by aforementioned (nibble + 1) * 4) are for the expected number of elements in the resulting data, I think I'll probably just make it a flag holder, next follows a list of indices (their size is as previously explained with the (nibble + 1) * 4) with a prefix bit that says to whether or not to use everything from the index itself to what comes before the next index provided so the shortest list could be 1 0000 1 1111, those 2 are the only required indices, these indices have a 1:1 relationship with the values to add into the buffer later, the huffman codes are to be generated in the same order, until the last index is reached, whatever your final next code is is gonna be the code of the last index anyways, it will be considered uncompressed data if more than a qtr of the available indices are used before getting the huffman codes in which case every value from here on until will be for the buffer or in the case of compressed data huffman tree to pull the value to put into the buffer, the will be only one prefix bit to these values, that tells you to either use it as uncompressed data/huffman code or to read 2 counts (again size is determine by (nibble + 1) * 4) to 1. minus from current value count and 2. begin copying existing values upto the 2nd count read

    So "hello hello hello hello" could result in something like this (I've given interpretations, not bit representations in most cases):
    ".HUFFMAN" 10 00 24 [1 '\0', 0 1, 1 ' ', 0 33, 1 'e', 0 'f', 1 'h', 0 'i', 1 'l', 0 'm', 1 'o', 0 'p', 0 255] 0 'h' 0 'e' 0 'l' 0 'l' 0 'o' 0 ' ' 1 6 17 0 '\0'

    I still need to do a bit more thinking on it but I like it better than zlib, it's a lot clearer what is what after all
    I feel there has been a lot of research put into zlib's scheme that shouldn't be discounted so quickly. For example, the way the code tables are bootstrapped is pretty neat and effective.

    Or for another perspective, it's always good to implement following an existing standard, as it means you don't have to write the documentation

  3. #33
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Why'd you think I copied some of the concepts? For example the look back feature, or the huffman code usage, there are places for storing the huffman codes sure, and that's part of the reason I left room for flags but in cases like this it is better not to store them at all, for the example of "hello hello hello hello" it is is also possible with my algorithm to go so small that it's almost just the string itself, for example:

    ".HUFFMAN" 10 01 24 0 'h' 0 'e' 0 'l' 0 'l' 0 'o' 0 ' ' 1 6 17 0 '\0'

    64 + 4 + 12 + 8 + (8 * 1) + (7 * 8) + (2 * 8) results in 168 bits or 21 bytes, literally smaller than the source string which is something not even the zlib compression can do, there's even room for me to add another 3 bytes for extra flags if I deem necessary, and thinking about it I just decided on a method for psuedo infinite flags, just use a flag in that 1st nibble to say more flags expected, if I move the psuedo NO_LIST flag I cooked up in the example I just provided to the first nibble of flags then I can further reduce the size to 20 bytes, then if I introduce a pseudo NO_SIZE flag to that 1st nibble also then I can further reduce the bytes to 19 bytes, whereas gzip needed a whole 29 bytes which is contrary the purpose of compression

    Edit: For streams that can't get an EOF error a lookup at the end can be introduced that is a simple 1 0000 0000, which basically says "because we are copying 0 values stop reading the stream here".

    Edit 2: As another space saving if ".HUF" is available as a "magic number" then I can further reduce that 19 to 15 bytes, nearly half the original 24 bytes

  4. #34
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Fixed the crash, now my problem is missing data, this is what I get versus what I should get:
    Code:
    abaabbbaba       babaaaaba    bbbaa
    abaabbbabaababbaababaaaabaaabbbbbaa
    Attached Files Attached Files

  5. #35
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    zlib encoding for "hello hello hello hello" would be

    3 bits - list block, uses default codeblook.
    8 bits - h
    8 bits - e
    8 bits - l
    8 bits - l
    8 bits - o
    8 bits - space
    8 bits - Symbol 268 - repeat of length 17, including the extra bit
    6 bits - distance code 4, distance of 6 bytes, including one extra bit
    7 bits - End of block code

    That makes 72 bits (9 bytes). to encode 184 bits (23 bytes).

    Whatever metadata you want to wrap around it is pretty much a moot point - if it is 10 or 15 bytes won't make a difference when you are encoding things longer than a tens of bytes.
    Last edited by hamster_nz; 07-30-2021 at 10:29 PM.

  6. #36
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    PNG decoding progress going quite well - all the Basic images in PngSuite convert correctly to PPM.

    Might look at interlacing tonight..
    Attached Images Attached Images Increment gone wrong-success-jpg 

  7. #37
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by hamster_nz View Post
    zlib encoding for "hello hello hello hello" would be

    3 bits - list block, uses default codeblook.
    8 bits - h
    8 bits - e
    8 bits - l
    8 bits - l
    8 bits - o
    8 bits - space
    8 bits - Symbol 268 - repeat of length 17, including the extra bit
    6 bits - distance code 4, distance of 6 bytes, including one extra bit
    7 bits - End of block code

    That makes 72 bits (9 bytes). to encode 184 bits (23 bytes).

    Whatever metadata you want to wrap around it is pretty much a moot point - if it is 10 or 15 bytes won't make a difference when you are encoding things longer than a tens of bytes.
    I was more thinking about the resulting file as a whole, sure there are cases where zlib is more appropriate, I was simply saying in this case picking zlib inside gzip results in wasted space whereas the encoding I thought up saves space as an archive (which is obviously the point of an archive, if I were to convert it to just an encoding gzip could make use of the scrapping the remain 4 bytes of the "magic number" it would bring the size down to 11 bytes, then if ( disinclude the null character as you did then it goes further to just barely under 11 bytes I think

    4 + 4 + (7 * 1) + (6 * 8) + (2 * 8) = 4 + 4 + 7 + 48 + 16 = 79 / 8 = 9 r7 = 10 bytes

    If I also introduce a further flag that tells of a multiplier for look back / copy bit lengths and reduce their default size to 4 bits also then I save another byte giving the same 9 bytes in a simpler form at 71 bits, if you restrain yourself by saying that further efficiency is impossible, you'll never find improvements.

    Edit: BTW I did the math after stating barely under 11 bytes

  8. #38
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Since you've managed to implement the deflate algorithm correctly (unless I've misunderstood you somewhere) I'll ask you, can you see where I'm going wrong here?

    Output of length/distance construction:
    Code:
    main.c:782: Code Symbols, foresee =  18, longest =  4, have =  32, used = 16
    main.c:782: symbols[  0] (0x5598b41d14d0): src =   2, use = true, sym =   0, get = 0, cpy = 0, len = 1, name = '?', lit = 'ÿ' (  0), uid = 0
    main.c:782: symbols[  2] (0x5598b41d1528): src =  18, use = true, sym =   0, get = 7, cpy = 11, len = 2, name = '?', lit = 'ÿ' (  0), uid = 10
    main.c:782: symbols[ 12] (0x5598b41d16e0): src =   1, use = true, sym =   0, get = 0, cpy = 0, len = 4, name = '?', lit = 'ÿ' (  0), uid = 1100
    main.c:782: symbols[ 13] (0x5598b41d170c): src =   4, use = true, sym =   0, get = 0, cpy = 0, len = 4, name = '?', lit = 'ÿ' (  0), uid = 1101
    main.c:782: symbols[ 14] (0x5598b41d1738): src =  16, use = true, sym =   0, get = 2, cpy = 3, len = 4, name = '?', lit = 'ÿ' (  0), uid = 1110
    main.c:782: symbols[ 15] (0x5598b41d1764): src =  17, use = true, sym =   0, get = 3, cpy = 3, len = 4, name = '?', lit = 'ÿ' (  0), uid = 1111
    main.c:909: RdHuffmanLengDistCodes( 0x7fff047ee6a0, 0x5598b397a4c0, 0x5598b397a500, 0 )
    main.c:958: symbols[ 12] (0x5598b41d16e0): src =   1, use = true, sym =   0, get = 0, cpy = 0, len = 4, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 1100
    main.c:970: symbols[ 97] (0x5598b41d2b0c): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'Leng Symbol', lit = 'a' ( 97), uid = 0
    main.c:958: symbols[  0] (0x5598b41d14d0): src =   2, use = true, sym =   0, get = 0, cpy = 0, len = 1, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 0
    main.c:970: symbols[ 98] (0x5598b41d2b38): src =  98, use = true, sym =   0, get = 0, cpy = 2, len = 2, name = 'Leng Symbol', lit = 'b' ( 98), uid = 00
    main.c:958: symbols[ 13] (0x5598b41d170c): src =   4, use = true, sym =   0, get = 0, cpy = 0, len = 4, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 1101
    main.c:970: symbols[256] (0x5598b41d4660): src = 256, use = true, sym =   0, get = 0, cpy = 4, len = 4, name = 'Leng Symbol', lit = 'ÿ' (256), uid = 0000
    main.c:958: symbols[ 14] (0x5598b41d1738): src =  16, use = true, sym =   0, get = 2, cpy = 3, len = 4, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 1110
    main.c:970: symbols[257] (0x5598b41d468c): src = 257, use = true, sym =   3, get = 0, cpy = 0, len = 4, name = 'Leng Symbol', lit = 'ÿ' (257), uid = 0000
    main.c:958: symbols[ 14] (0x5598b41d1738): src =  16, use = true, sym =   0, get = 2, cpy = 3, len = 4, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 1110
    main.c:970: symbols[258] (0x5598b41d46b8): src = 258, use = true, sym =   4, get = 0, cpy = 0, len = 4, name = 'Leng Symbol', lit = 'ÿ' (258), uid = 0000
    main.c:958: symbols[ 14] (0x5598b41d1738): src =  16, use = true, sym =   0, get = 2, cpy = 3, len = 4, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 1110
    main.c:970: symbols[259] (0x5598b41d46e4): src = 259, use = true, sym =   5, get = 0, cpy = 0, len = 4, name = 'Leng Symbol', lit = 'ÿ' (259), uid = 0000
    main.c:909: RdHuffmanLengDistCodes( 0x7fff047ee6a0, 0x5598b397a4c0, 0x5598b397a540, 260 )
    main.c:958: symbols[  0] (0x5598b41d14d0): src =   2, use = true, sym =   0, get = 0, cpy = 0, len = 1, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 0
    main.c:963: symbols[  0] (0x5598b41d4750): src =   0, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'Dist Symbol', lit = 'ÿ' (260), uid = 00
    main.c:958: symbols[  0] (0x5598b41d14d0): src =   2, use = true, sym =   0, get = 0, cpy = 0, len = 1, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 0
    main.c:963: symbols[  4] (0x5598b41d4800): src =   4, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'Dist Symbol', lit = 'ÿ' (264), uid = 00
    main.c:958: symbols[  0] (0x5598b41d14d0): src =   2, use = true, sym =   0, get = 0, cpy = 0, len = 1, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 0
    main.c:963: symbols[  5] (0x5598b41d482c): src =   5, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'Dist Symbol', lit = 'ÿ' (265), uid = 00
    main.c:958: symbols[  0] (0x5598b41d14d0): src =   2, use = true, sym =   0, get = 0, cpy = 0, len = 1, name = 'Code Symbol', lit = 'ÿ' (  0), uid = 0
    main.c:963: symbols[  6] (0x5598b41d4858): src =   6, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'Dist Symbol', lit = 'ÿ' (266), uid = 00
    Related code:
    Code:
    	while ( pos < Symbols->foresee && stream->num < stream->max )
    	{
    		uint i = 0, copy = 0;
    		uintmax_t val;
    
    		code = IdentifyNextSymbol( stream, Codes, true, false );
    
    		if ( !code )
    		{
    #if 0
    			Symbols->entries.used = pos;
    			printf( PFX_FORMAT " pos = %u\n", PFX_VALUES, pos );
    			PrintSymbols( Codes, __LINE__, "Code Symbols" );
    			PrintSymbols( Symbols, __LINE__, name );
    			PrintStreamDetails( stream, __LINE__ );
    #endif
    			return EINVAL;
    		}
    
    		//PrintStreamDetails( stream );
    		val = CopyStreamBits( stream, code->get, true );
    
    		//PrintSymbol( code, __LINE__, j );
    
    		copy = (uint)(code->cpy + val);
    
    		//if ( copy ) printf( "copy = %3u\n", copy );
    
    		i = 0;
    
    		do
    		{
    			if ( pos >= Symbols->entries.have )
    			{
    				printf( PFX_FORMAT " pos = %u\n", PFX_VALUES, pos );
    				PrintSymbols( Symbols, __LINE__, name );
    				return ERANGE;
    			}
    
    			symbol = symbols + pos;
    			symbol->src = pos;
    			symbol->lit = lit;
    
    			if ( !copy || (!start_with_literal && lit >= 256) )
    			{
    				symbol->use = true;
    
    				PrintSymbol( code, __LINE__, code->uid, "Code Symbol" );
    				if ( start_with_literal )
    				{
    					symbol->len = code->len + 1;
    					symbol->get = code->get;
    					PrintSymbol( symbol, __LINE__, pos, "Dist Symbol" );
    				}
    				else
    				{
    					symbol->sym = (lit > 256) ? used : 0;
    					symbol->len += (lit < 256) ? code->src : 4;
    					symbol->cpy = (code->src < 16) ? code->src : 0;
    					PrintSymbol( symbol, __LINE__, pos, "Leng Symbol" );
    				}
    
    				if ( symbol->len > Symbols->longest )
    					Symbols->longest = symbol->len;
    
    				++used;
    			}
    
    			++lit;
    			++pos;
    			++i;
    		}
    		while ( i < copy );
    	}
    Defaltation Output:
    Code:
    main.c:812: main() Deflating data...
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[  2] (0x5598b41d1ab8): src =  98, use = true, sym =   0, get = 0, cpy = 2, len = 2, name = 'leng_symbol', lit = 'b' ( 98), uid = 10
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[  2] (0x5598b41d1ab8): src =  98, use = true, sym =   0, get = 0, cpy = 2, len = 2, name = 'leng_symbol', lit = 'b' ( 98), uid = 10
    main.c:826: symbols[  2] (0x5598b41d1ab8): src =  98, use = true, sym =   0, get = 0, cpy = 2, len = 2, name = 'leng_symbol', lit = 'b' ( 98), uid = 10
    main.c:826: symbols[  2] (0x5598b41d1ab8): src =  98, use = true, sym =   0, get = 0, cpy = 2, len = 2, name = 'leng_symbol', lit = 'b' ( 98), uid = 10
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[ 14] (0x5598b41d1cc8): src = 258, use = true, sym =   4, get = 0, cpy = 0, len = 4, name = 'leng_symbol', lit = 'ÿ' (258), uid = 1110
    main.c:856: symbols[  2] (0x5598b41d47a8): src =   5, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'dist_symbol', lit = 'ÿ' (265), uid = 10
    main.c:861: lookup =   4
    main.c:826: symbols[ 13] (0x5598b41d1c9c): src = 257, use = true, sym =   3, get = 0, cpy = 0, len = 4, name = 'leng_symbol', lit = 'ÿ' (257), uid = 1101
    main.c:856: symbols[  3] (0x5598b41d47d4): src =   6, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'dist_symbol', lit = 'ÿ' (266), uid = 11
    main.c:861: lookup =   3
    main.c:826: symbols[ 15] (0x5598b41d1cf4): src = 259, use = true, sym =   5, get = 0, cpy = 0, len = 4, name = 'leng_symbol', lit = 'ÿ' (259), uid = 1111
    main.c:856: symbols[  1] (0x5598b41d477c): src =   4, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'dist_symbol', lit = 'ÿ' (264), uid = 01
    main.c:861: lookup =   6
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[ 15] (0x5598b41d1cf4): src = 259, use = true, sym =   5, get = 0, cpy = 0, len = 4, name = 'leng_symbol', lit = 'ÿ' (259), uid = 1111
    main.c:856: symbols[  1] (0x5598b41d477c): src =   4, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'dist_symbol', lit = 'ÿ' (264), uid = 01
    main.c:861: lookup =   5
    main.c:826: symbols[  2] (0x5598b41d1ab8): src =  98, use = true, sym =   0, get = 0, cpy = 2, len = 2, name = 'leng_symbol', lit = 'b' ( 98), uid = 10
    main.c:826: symbols[ 14] (0x5598b41d1cc8): src = 258, use = true, sym =   4, get = 0, cpy = 0, len = 4, name = 'leng_symbol', lit = 'ÿ' (258), uid = 1110
    main.c:856: symbols[  0] (0x5598b41d4750): src =   0, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = 'dist_symbol', lit = 'ÿ' (260), uid = 00
    main.c:861: lookup =   4
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[  0] (0x5598b41d1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = 'leng_symbol', lit = 'a' ( 97), uid = 0
    main.c:826: symbols[ 12] (0x5598b41d1c70): src = 256, use = true, sym =   0, get = 0, cpy = 4, len = 4, name = 'leng_symbol', lit = 'ÿ' (256), uid = 1100
    PrintBytes( 0x5598b41d14a0, 0, 16 )
    abaabbbaaaabaa
    Related Code:
    Code:
    		else if ( type == 2 )
    		{
    			uint count = 0;
    			uint i = 0, j = 0;
    			HUFFMAN_SYMBOL *codes, *code, *leng_symbol, *dist_symbol;
    			int err;
    
    			LengSymbols.foresee = CopyStreamBits( &stream, 5, true ) + 257;
    			DistSymbols.foresee = CopyStreamBits( &stream, 5, true ) + 1;
    			CodeSymbols.foresee = CopyStreamBits( &stream, 4, true ) + 4;
    
    			PrintSymbols( &DistSymbols, __LINE__, "Distance Symbols" );
    			PrintSymbols( &LengSymbols, __LINE__, "Length Symbols" );
    			PrintSymbols( &CodeSymbols, __LINE__, "Code Symbols" );
    			PrintStreamDetails( &stream, __LINE__ );
    
    			i = CodeSymbols.foresee;
    
    			codes = ReAllocSymbols( &CodeSymbols, i < 32 ? 32 : i + 1 );
    
    			if ( !codes )
    				return Return( ret, ENOMEM );
    
    			for ( i = 0; i < CodeSymbols.foresee; ++i )
    			{
    				static uint adjust[] =
    				{
    					16, 17, 18, 0,  8, 7,
    					 9,  6, 10, 5, 11, 4,
    					12,  3, 13, 2, 14, 1,
    					15, 19
    				};
    
    				code = codes + adjust[i];
    
    				code->src = adjust[i];
    				code->get = hufd_extra_bits_array[code->src];
    				code->cpy = hufd_repeat_bits_array[code->src];
    				code->len = CopyStreamBits( &stream, 3, true );
    				code->use = !!(code->len);
    
    				if ( code->len > CodeSymbols.longest )
    					CodeSymbols.longest = code->len;
    			}
    
    			CodeSymbols.entries.used = i + 1;
    
    			count = LengSymbols.foresee + DistSymbols.foresee;
    			printf( "pos = %u, max = %u, count = %u\n", stream.pos, stream.max, count );
    			size = stream.num;
    			printf( "left = %u, byte = %zu, bit = %zu\n", stream.fed , size / CHAR_BIT, size % CHAR_BIT );
    
    			BuildCodes( &CodeSymbols );
    
    			SortSymbolsByUID( &CodeSymbols );
    
    			PrintSymbols( &CodeSymbols, __LINE__, "Code Symbols" );
    
    			err = RdHuffmanLengDistCodes
    			(
    				&stream,
    				&CodeSymbols,
    				&LengSymbols,
    				0,
    				"Length Symbols"
    			);
    
    			if ( err )
    				return Return( ret, err );
    
    			//PrintSymbols( &LengSymbols, __LINE__, "Length Symbols" );
    
    			err = RdHuffmanLengDistCodes
    			(
    				&stream,
    				&CodeSymbols,
    				&DistSymbols,
    				LengSymbols.foresee,
    				"Distance Symbols"
    			);
    
    			if ( err )
    				return Return( ret, err );
    
    			//PrintSymbols( &DistSymbols, __LINE__, "Distance Symbols" );
    
    			printf("%s:%u: %s() Deflating data...\n", __FILE__, __LINE__, __func__ );
    
    			i = 0; j = 0;
    			while ( stream.num < stream.max )
    			{
    				leng_symbol = IdentifyNextSymbol( &stream, &LengSymbols, true, false );
    
    				if ( !leng_symbol )
    				{
    					PrintStreamDetails( &stream, __LINE__ );
    					return Return( ret, EINVAL );
    				}
    
    				//puts("Found:\n");
    				PrintSymbol( leng_symbol, __LINE__, leng_symbol->uid, "leng_symbol" );
    
    				if ( leng_symbol->lit == 256 )
    					break;
    				else if ( leng_symbol->lit < 256 )
    				{
    					if ( i == Lazy.size )
    					{
    						printf("%s:%u: Hit buffer limit!\n", __FILE__, __LINE__ );
    						return Return( ret , ERANGE );
    					}
    					j = i++;
    					lazy[j] = leng_symbol->lit;
    				}
    				else
    				{
    					uint dref;
    					uint lookup;
    					static uint extends[8] = { 0, 0, 0, 0, 1, 1, 2, 0 };
    
    					dist_symbol = IdentifyNextSymbol( &stream, &DistSymbols, true, false );
    
    					if ( !dist_symbol )
    					{
    						PrintStreamDetails( &stream, __LINE__ );
    						return Return( ret, EINVAL );
    					}
    
    					dref = dist_symbol->src;
    
    					PrintSymbol( dist_symbol, __LINE__, dist_symbol->uid, "dist_symbol" );
    
    					lookup = leng_symbol->sym;
    					lookup += CopyStreamBits( &stream, extends[dref], true );
    
    					printf( "%s:%u: lookup = %3u\n", __FILE__, __LINE__, lookup );
    
    					lookup = i - lookup;
    
    					for ( j = 0; j < leng_symbol->cpy; ++j )
    					{
    						if ( i == Lazy.size )
    						{
    							printf("%s:%u: Hit buffer limit!\n", __FILE__, __LINE__ );
    							return Return( ret , ERANGE );
    						}
    						lazy[i++] = lazy[lookup++];
    					}
    
    					j = i - 1;
    
    					//PrintSymbol( dist_symbol, __LINE__, dist_symbol->uid );
    				}
    			}
    		}
    There's only 1 instance of "else if ( type == 2 )" in the attached file so if you want to get more information you can just search for that to get your starting point.
    Attached Files Attached Files

  9. #39
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    The second table will never work - there are two entries of length 1, so that uses both codes '0' & '1', leaving no other possible codes for longer length codes.

    A few posts ago supplied an edited bit of your code that assigned code symbols correctly, as well as a walk through of the assignment algorithm.

    It would pay to revisit that, after working out why your table has too many entries with short lengths.

  10. #40
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    If you can post a ASCII hex dump of *just* the zlib data I will turn up the debug level on my code and print out the code tables.

  11. #41
    Registered User
    Join Date
    Sep 2020
    Posts
    425
    I got a few spare minutes.

    Attached is a program that should decode a .gz file less than 8kb, and if you add "-v" on the command line it will print out things like code tables/dictionaries along the way.

    Code:
    $ echo "Hi there" | gzip > a.gz
    
    $ ./main2 a.gz
    Have loaded 29 bytes from file
    Last block = true
    Compressed (default dictionary)
    data: 48 'H'
    data: 69 'i'
    data: 20 ' '
    data: 74 't'
    data: 68 'h'
    data: 65 'e'
    data: 72 'r'
    data: 65 'e'
    data: 0a
      End of block 256
      This is the last block
    It assumes that the zlib data start at byte 10.

    Don't judge the code too harshly - it's just a hack to test my understanding, not production ready.

    So here I'm gziping up the source file "inflate.c", then decoding it with 'verbose' enabled:
    Code:
    $ cat inflate.c | gzip > abc.gz
    $ ./main2 -v abc.gz | more
    And here is the output showing the code tables that were used:
    Code:
    Compressed (dynamic dictionary)
      HLIT 282, HDIST 29, HCLEN 13
    Codelen table
        0:  4 1100
        4:  4 1101
        5:  3 000
        6:  3 001
        7:  3 010
        8:  3 011
        9:  3 100
       10:  3 101
       11:  4 1110
       12:  5 11110
       16:  6 111110
       17:  7 1111110
       18:  7 1111111
    Reading in literal/length plus distance dictionary lengths
    Literal/Repeat Length table
       10:  7 1011010
       32:  5 00100
       33:  9 111101000
       34:  8 11100010
       35: 12 111111111100
       37:  8 11100011
       38:  8 11100100
       39:  9 111101001
       40:  7 1011011
       41:  6 011110
       42:  9 111101010
       43:  7 1011100
       44:  6 011111
       45:  8 11100101
       46:  9 111101011
       47:  9 111101100
       48:  7 1011101
       49:  6 100000
       50:  6 100001
       51:  7 1011110
       52:  7 1011111
       53:  7 1100000
       54:  7 1100001
       55:  7 1100010
       56:  7 1100011
       57:  7 1100100
       58:  8 11100110
       59:  7 1100101
       60:  8 11100111
       61:  8 11101000
       62:  8 11101001
    ......
    Attached Files Attached Files
    Last edited by hamster_nz; 07-31-2021 at 05:21 PM.

  12. #42
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by hamster_nz View Post
    The second table will never work - there are two entries of length 1, so that uses both codes '0' & '1', leaving no other possible codes for longer length codes.

    A few posts ago supplied an edited bit of your code that assigned code symbols correctly, as well as a walk through of the assignment algorithm.

    It would pay to revisit that, after working out why your table has too many entries with short lengths.
    I've just double checked the data (no modifications from what I posted) and there's no duplicate lengths, are you sure you're not confusing symbols from 2 different tables? I added the names to make clear which table/tree they're from.

  13. #43
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by hamster_nz View Post
    If you can post a ASCII hex dump of *just* the zlib data I will turn up the debug level on my code and print out the code tables.
    Did you mean this?
    Code:
    main.c:782: Code Symbols, foresee =  18, longest =  4, have =  32, used = 16
    main.c:782: symbols[  0] (0x5598b41d14d0): src =   2, use = true, sym =   0, get = 0, cpy = 0, len = 1, name = '?', lit = 'ÿ' (  0), uid = 0
    main.c:782: symbols[  2] (0x5598b41d1528): src =  18, use = true, sym =   0, get = 7, cpy = 11, len = 2, name = '?', lit = 'ÿ' (  0), uid = 10
    main.c:782: symbols[ 12] (0x5598b41d16e0): src =   1, use = true, sym =   0, get = 0, cpy = 0, len = 4, name = '?', lit = 'ÿ' (  0), uid = 1100
    main.c:782: symbols[ 13] (0x5598b41d170c): src =   4, use = true, sym =   0, get = 0, cpy = 0, len = 4, name = '?', lit = 'ÿ' (  0), uid = 1101
    main.c:782: symbols[ 14] (0x5598b41d1738): src =  16, use = true, sym =   0, get = 2, cpy = 3, len = 4, name = '?', lit = 'ÿ' (  0), uid = 1110
    main.c:782: symbols[ 15] (0x5598b41d1764): src =  17, use = true, sym =   0, get = 3, cpy = 3, len = 4, name = '?', lit = 'ÿ' (  0), uid = 1111
    main.c:808: Length Symbols, foresee = 260, longest =  4, have = 261, used = 16
    main.c:808: symbols[  0] (0x560a164e1a60): src =  97, use = true, sym =   0, get = 0, cpy = 1, len = 1, name = '?', lit = 'a' ( 97), uid = 0
    main.c:808: symbols[  2] (0x560a164e1ab8): src =  98, use = true, sym =   0, get = 0, cpy = 2, len = 2, name = '?', lit = 'b' ( 98), uid = 10
    main.c:808: symbols[ 12] (0x560a164e1c70): src = 256, use = true, sym =   0, get = 0, cpy = 4, len = 4, name = '?', lit = 'ÿ' (256), uid = 1100
    main.c:808: symbols[ 13] (0x560a164e1c9c): src = 257, use = true, sym =   3, get = 0, cpy = 0, len = 4, name = '?', lit = 'ÿ' (257), uid = 1101
    main.c:808: symbols[ 14] (0x560a164e1cc8): src = 258, use = true, sym =   4, get = 0, cpy = 0, len = 4, name = '?', lit = 'ÿ' (258), uid = 1110
    main.c:808: symbols[ 15] (0x560a164e1cf4): src = 259, use = true, sym =   5, get = 0, cpy = 0, len = 4, name = '?', lit = 'ÿ' (259), uid = 1111
    main.c:809: Distance Symbols, foresee =   7, longest =  2, have =   8, used = 4
    main.c:809: symbols[  0] (0x560a164e4750): src =   0, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = '?', lit = 'ÿ' (260), uid = 00
    main.c:809: symbols[  1] (0x560a164e477c): src =   4, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = '?', lit = 'ÿ' (264), uid = 01
    main.c:809: symbols[  2] (0x560a164e47a8): src =   5, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = '?', lit = 'ÿ' (265), uid = 10
    main.c:809: symbols[  3] (0x560a164e47d4): src =   6, use = true, sym =   0, get = 0, cpy = 0, len = 2, name = '?', lit = 'ÿ' (266), uid = 11
    Just in case you didn't realise that's what those commented out PrintSymbols() calls produce when given the final tables, they iterate through the all the nodes up until it hits the used count and prints only those whose "use" boolean is equal to true, otherwise it only prints info about the table.

    Edit: Ah, btw that "len" member is only used for generating the UIDs later (I deem UID less confusing than code when code is also used for the code table, confusing to say every code, leng & dist symbol has a code member, less confusing to say they all have a UID for huffman purposes).

    The "get" member is for how many extra bits need to be read after the symbol's UID is found in the stream/buffer/file.

    The "lit" member will always be of the original leng "alphabet" which is why it's used as the value to be put in the target buffer

    Currently the "sym" member is being used for for how to look back

    The "cpy" member is naturally supposed to hold the base number of characters to copy, the number obtained from using the "get" member to read more of the stream is added to this.

    The "src" member is supposed to hold the non-huffman code mentioned in the tables of this document: Understanding gzip

    I think that about covers the potentially confusing members

  14. #44
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by hamster_nz View Post
    I got a few spare minutes.

    Attached is a program that should decode a .gz file less than 8kb, and if you add "-v" on the command line it will print out things like code tables/dictionaries along the way.

    Code:
    $ echo "Hi there" | gzip > a.gz
    
    $ ./main2 a.gz
    Have loaded 29 bytes from file
    Last block = true
    Compressed (default dictionary)
    data: 48 'H'
    data: 69 'i'
    data: 20 ' '
    data: 74 't'
    data: 68 'h'
    data: 65 'e'
    data: 72 'r'
    data: 65 'e'
    data: 0a
      End of block 256
      This is the last block
    It assumes that the zlib data start at byte 10.

    Don't judge the code too harshly - it's just a hack to test my understanding, not production ready.

    So here I'm gziping up the source file "inflate.c", then decoding it with 'verbose' enabled:
    Code:
    $ cat inflate.c | gzip > abc.gz
    $ ./main2 -v abc.gz | more
    And here is the output showing the code tables that were used:
    Code:
    Compressed (dynamic dictionary)
      HLIT 282, HDIST 29, HCLEN 13
    Codelen table
        0:  4 1100
        4:  4 1101
        5:  3 000
        6:  3 001
        7:  3 010
        8:  3 011
        9:  3 100
       10:  3 101
       11:  4 1110
       12:  5 11110
       16:  6 111110
       17:  7 1111110
       18:  7 1111111
    Reading in literal/length plus distance dictionary lengths
    Literal/Repeat Length table
       10:  7 1011010
       32:  5 00100
       33:  9 111101000
       34:  8 11100010
       35: 12 111111111100
       37:  8 11100011
       38:  8 11100100
       39:  9 111101001
       40:  7 1011011
       41:  6 011110
       42:  9 111101010
       43:  7 1011100
       44:  6 011111
       45:  8 11100101
       46:  9 111101011
       47:  9 111101100
       48:  7 1011101
       49:  6 100000
       50:  6 100001
       51:  7 1011110
       52:  7 1011111
       53:  7 1100000
       54:  7 1100001
       55:  7 1100010
       56:  7 1100011
       57:  7 1100100
       58:  8 11100110
       59:  7 1100101
       60:  8 11100111
       61:  8 11101000
       62:  8 11101001
    ......

    TBH I find that output to be unclear about what the values represent, part of the reason I print the member names is so I can more easily see what should've been what compared to what it actually is. Did look through you're code but didn't find what I was looking for so at present it wasn't of help to me, thx anyways

  15. #45
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Finally got that blasted algorithm right, for anyone wanting to reference my code (which I've already cleaned up a bit prior to getting the algorithm right) here's where it's uploaded:
    Gitlab upload instance

    Just don't expect it to be super clean, I literally only just got it right just now, I'll do a cleaner version for my paw project later.
    Last edited by awsdert; 08-02-2021 at 06:32 PM. Reason: Auto-generated URL name didn't make sense in context

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. hex increment
    By davidx in forum C Programming
    Replies: 3
    Last Post: 10-19-2019, 07:06 AM
  2. Two pre increment in one expression
    By h255874 in forum C Programming
    Replies: 4
    Last Post: 09-21-2019, 08:47 AM
  3. Post Increment an Pre Increment operators in c++
    By anil_ in forum C++ Programming
    Replies: 4
    Last Post: 11-12-2011, 08:27 PM
  4. can't get loop to increment
    By rivkyfried1 in forum C Programming
    Replies: 2
    Last Post: 10-11-2010, 04:03 AM
  5. Post increment and pre increment help
    By noob2c in forum C++ Programming
    Replies: 5
    Last Post: 08-05-2003, 03:03 AM

Tags for this Thread