Thread: Bit manipulation

  1. #1
    Internet Superhero
    Join Date
    Sep 2006
    Location
    Denmark
    Posts
    964

    Bit manipulation

    I'm trying to implement MD5 in a C++ function, and even though the math is killing me, the real problem is the preprocessing of the data that has to be done before doing the MD5 rounds.

    I need to add a "1" bit to the end of my data and then append "0" bits until the length in bits is 448 modulo 512 (LengthInBits % 512 == 448).

    Then finally I have to append the length of the unpadded data to the data as a 64-bit little-endian integer.

    I've got no idea how to manipulate the bits in my data; the smallest type I know of is a char, which is 1 byte (normally 8 bits). I've been googling around and I think what I need to use is a bitfield, am I right?

    My question is, how should I declare my bitfield struct? At the moment I've got this:
    Code:
    struct BitField
    {
    	char data : 1;
    };
    Would this work for what I'm trying to do? Basically I've got a vector full of these BitField structs, and I was planning to just loop over a file and load it into my vector bit by bit.

    Then, to add the "1" bit, I could do something like this:
    Code:
    BitField BF;
    BF.data = 1;     //Is this correct?
    vData.push_back(BF);     //vData is my vector
    This will probably not be the last problem I'm going to have with this, so I hope you guys can be patient with me.
    How I need a drink, alcoholic in nature, after the heavy lectures involving quantum mechanics.

  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    "I need to add a "1" bit to the end of my data and then append "0" bits onto my data until LengthInBits = 448 % 512."

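    This simply means that the first byte you append should have its "high" bit set (0x80).

    Also, don't use bit fields. They suck and are subject to "endianness".

    Also, 'memcat' (append) '(512 - ((length + 64) % 512))' bits of the following onto the input, where 'length' is the input length in bits (divide the result by 8 to get a byte count):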

    Code:
    // One byte per entry: 0x80 is the "1" bit followed by seven "0" bits; the rest is zero padding.
    static const unsigned char padding[64] =
    	{
    		0x80, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0
    	};
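
    A minimal sketch of that append step, assuming the input sits in a std::vector<unsigned char> and everything is counted in bytes instead of bits (so 512 becomes 64 and 64 becomes 8). The name append_padding is made up for illustration, and the 64-bit length field is deliberately left out:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from any library): appends the 0x80 byte plus
// zero bytes so that data.size() % 64 == 56 afterwards, which is the byte
// equivalent of "448 modulo 512" bits. The length field is NOT appended.
inline void append_padding(std::vector<unsigned char>& data)
{
    // Only the first entry is written out; the other 63 are zero-initialised.
    static const unsigned char padding[64] = { 0x80 };

    // Byte version of 512 - ((length + 64) % 512); always between 1 and 64.
    const std::size_t pad_bytes = 64 - ((data.size() + 8) % 64);

    data.insert(data.end(), padding, padding + pad_bytes);
}
```

    With a 3-byte input this appends 53 bytes, so the vector ends up 56 bytes long and the last 8 bytes of the block stay free for the length.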
    Soma

  3. #3
    Internet Superhero
    Join Date
    Sep 2006
    Location
    Denmark
    Posts
    964
    Okay, so I've dropped the bitfields for now...

    I'm not quite sure I understand the memcat() part though. Can I memcat data into a vector? :S

    I've tried the following with my data vector:
    Code:
    std::vector <unsigned> vData;
    ....
    vData.push_back(0x80);
    	
    	FileSizeInBits = FileSize.QuadPart * 8;
    	while(FileSizeInBits != 448 % 512)
    	{
    		vData.push_back(0);
    		FileSizeInBits++;
    	}
    	vData.push_back(FileSizeInBits);
    But it isn't working; if I try to cout the contents of the vector I just get a segfault. I'm not sure a vector was such a good idea after all. What do you guys think? I bet some of you have implemented MD5 before; what did you do to make it work?

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    For single bit bitfields, I'd recommend using "unsigned" - whether it's "unsigned char" or "unsigned long" makes little difference here, but if you say "char bit:1", then you have a valid value of 0 or -1...
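
    A small sketch of the difference, assuming a mainstream two's-complement compiler. Before C++20 the result of storing 1 in a signed one-bit field is implementation-defined, so the -1 is what you would typically see rather than a guarantee. The struct and function names are invented for this example:

```cpp
struct SignedBit   { signed int   bit : 1; };  // one bit, read back as signed
struct UnsignedBit { unsigned int bit : 1; };  // one bit, read back as unsigned

// Store the lowest bit of 'value' in a signed one-bit field and read it back.
inline int read_back_signed(int value)
{
    SignedBit s;
    s.bit = value & 1;   // a set bit typically comes back as -1
    return s.bit;
}

// Same thing with an unsigned one-bit field.
inline int read_back_unsigned(int value)
{
    UnsignedBit u;
    u.bit = value & 1;   // a set bit comes back as 1
    return u.bit;
}
```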

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    Internet Superhero
    Join Date
    Sep 2006
    Location
    Denmark
    Posts
    964
    I was, however, just told that bitfields "suck". Do you reckon that a vector of bitfields would get the job done, similar to what I suggested in my original post?

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    My 'memcat' is the same as 'memcpy(buffer + size_of_current_buffered_content, data_to_append, size_of_data_to_append);'.

    A bare 'unsigned', as in 'std::vector <unsigned> vData;', means 'unsigned int'. (That is, each element you are manipulating is typically a double word, four bytes, not a single byte.)

    This, '448 % 512', will always be equal to 448, and the compiler will almost certainly fold it into a constant. (That is, '%' has a higher precedence than '!=', so your loop condition compares against plain 448.)
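
    To make the precedence point concrete, here is a short sketch contrasting how the loop condition actually parses with what was presumably intended. Both function names are made up for the comparison:

```cpp
// How the posted condition parses: % binds tighter than !=, so this is
// size_in_bits != (448 % 512), which is simply size_in_bits != 448.
inline bool original_condition(unsigned long long size_in_bits)
{
    return size_in_bits != 448 % 512;
}

// Presumably intended: keep padding until the size is 448 modulo 512.
inline bool intended_condition(unsigned long long size_in_bits)
{
    return (size_in_bits % 512) != 448;
}
```

    For a size of 960 bits (448 + 512) the first version still says "keep going", while the second one says "done".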

    A 'char' can be treated as 'signed' or 'unsigned'. The value of the target when the bit is set to one is compiler dependent.

    No, a 'std::vector' of bit fields will not work.

    Soma

  7. #7
    Internet Superhero
    Join Date
    Sep 2006
    Location
    Denmark
    Posts
    964
    Okay, so no vectors and no bitfields, just a dynamically allocated array of unsigned chars.

    So far, this is how my code looks (note: no hashing going on yet, I'm just trying to get the preprocessed data right):
    Code:
    DWORD GetHashFromFile(HANDLE hFile, std::string &Hash)
    {
    	unsigned char *pBuffer = NULL;
    	unsigned int FileSizeInBits, NoOfZeroes = 0;
    	LARGE_INTEGER FileSizeInBytes;
    	DWORD i, j, dwBytesRead;
    	
    	if(!GetFileSizeEx(hFile, &FileSizeInBytes))
    	{
    		return (GetLastError());
    	}
    	
            /*Convert from bytes to bits and  add 1 to the size for the "1" bit.*/
    	FileSizeInBits = (FileSizeInBytes.QuadPart * 8) + 1;
    
            /* Calculate how many zeroes are needed. */
    	while((FileSizeInBits % 512) != 448)
    	{
    		FileSizeInBits++;
    		NoOfZeroes++;
    	}
    	
            /*Add 64 to the size for the 64 bit size at the end of the data. */
    	FileSizeInBits += 64;
    	
    	try
    	{
                    /* Allocate enough memory. */
    		pBuffer = new unsigned char[FileSizeInBits];
    	}
    	catch (std::bad_alloc&)
    	{
    		return 1;
    	}
    	
    	if(!ReadFile(hFile, pBuffer, FileSizeInBytes.QuadPart, &dwBytesRead, NULL))
    	{
    		return (GetLastError());
    	}
    	
            /* Add the "1" bit */
    	pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;
    	
    	for(i = 0; i < NoOfZeroes; i++)
    	{
                    /* Backtrack through the buffer to right after the "1" bit, and start adding zeroes. */
    		pBuffer[FileSizeInBits - 64 - NoOfZeroes + 1 + i] = 0;	
    	}
    	
    	pBuffer[FileSizeInBits - 64] = FileSizeInBits;
            // So now, i _should_ have a buffer with the file loaded into it, 
            // a "1" bit after the file data, a bunch of zeroes and a 64 bit size value at the end, with a filesize divisible by 512.
    }
    When I try to output the contents of my buffer, I get the file and then a bunch of irrelevant data, which is not what I want.

    Another thing: a char is 8 bits, right? So when I do
    Code:
    pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;
    I'm not adding one "1" bit, I'm actually adding "00000001", or am I completely misunderstanding something here? :S

    Perhaps I'm in over my head. I must admit, I didn't expect this part to be the one where I'd get stuck.
    Last edited by Neo1; 03-22-2008 at 05:02 PM.

  8. #8
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Code:
    pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;
    In general, just read my first post again, and turn the warning level up as high as it will go. You have some conversion problems.
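
    A quick way to see why the value matters, sketched with std::bitset (the helper name bits_of is invented here): storing 1 in a byte gives the bit pattern 00000001, while 0x80 gives 10000000, which is the "1" bit followed by seven "0" bits.

```cpp
#include <bitset>
#include <string>

// Returns the eight bits of 'byte' as text, most significant bit first.
inline std::string bits_of(unsigned char byte)
{
    return std::bitset<8>(byte).to_string();
}
```

    So bits_of(1) yields "00000001" and bits_of(0x80) yields "10000000".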

    Code:
    FileSizeInBits = (FileSizeInBytes.QuadPart * 8) + 1;
    Wrong. You are, with this code, assuming an 8 bit byte. (That's fine. It is virtually always correct.) Stop converting to bits! (That is, divide 448 and 512 by 8 and use those values in the calculation.)

    Code:
    while((FileSizeInBits % 512) != 448)
    {
    	FileSizeInBits++;
    	NoOfZeroes++;
    }
    Wrong. The mathematics are really simple. You do not need a loop.
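
    A sketch of the arithmetic in bytes (448 / 8 = 56, 512 / 8 = 64, and the 64-bit length is 8 bytes), assuming byte-granular input. Both function names are hypothetical; the loop version is only there to show that the closed form gives the same count:

```cpp
#include <cstdint>

// Loop version, in bytes: count padding bytes (the 0x80 byte included)
// until the padded size is 56 modulo 64.
inline std::uint64_t pad_bytes_loop(std::uint64_t size_in_bytes)
{
    std::uint64_t padded = size_in_bytes + 1;   // the 0x80 byte is always added
    std::uint64_t pad = 1;
    while (padded % 64 != 56)
    {
        ++padded;
        ++pad;
    }
    return pad;
}

// Closed form: the same count without a loop, always between 1 and 64.
inline std::uint64_t pad_bytes_closed_form(std::uint64_t size_in_bytes)
{
    return 64 - ((size_in_bytes + 8) % 64);
}
```

    For example, an empty input needs 56 padding bytes, a 55-byte input needs 1, and a 56-byte input needs a full 64.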

    Soma

  9. #9
    Internet Superhero
    Join Date
    Sep 2006
    Location
    Denmark
    Posts
    964
    phantomotap:

    Do you by:
    'memcat' input by '(512 - ((length + 64) % 512))' with:

    Code:
    static const unsigned char padding[64] =
    	{
    		0x80, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0,
    		0, 0, 0, 0, 0, 0, 0, 0
    	};
    Mean something like this?
    Code:
    unsigned char *pBuffer2 = &pBuffer[FileSize.QuadPart];
    	
    memcpy(pBuffer2, Padding, (512 - ((FileSize.QuadPart + 64) % 512)) ) ;
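
    For comparison, a hedged sketch of the whole preprocessing step done purely in bytes. Note that the snippet above mixes a byte count (FileSize.QuadPart) with the bit constants 512 and 64. The function name pad_message is invented for this example, and it assumes the whole input is already in memory:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the input followed by one 0x80 byte, zero
// bytes up to 56 modulo 64, and the original length in bits as 8
// little-endian bytes. The result size is always a multiple of 64.
inline std::vector<unsigned char> pad_message(const unsigned char* data, std::size_t size)
{
    static const unsigned char padding[64] = { 0x80 };   // rest is zero

    const std::size_t pad_bytes = 64 - ((size + 8) % 64);   // 1..64
    std::vector<unsigned char> out(size + pad_bytes + 8);

    if (size != 0)
        std::memcpy(&out[0], data, size);                // the original data
    std::memcpy(&out[0] + size, padding, pad_bytes);     // the "memcat" step

    // Length in bits, least significant byte first.
    const std::uint64_t bit_length = static_cast<std::uint64_t>(size) * 8;
    for (int i = 0; i < 8; ++i)
        out[size + pad_bytes + i] = static_cast<unsigned char>(bit_length >> (8 * i));

    return out;
}
```

    Padding the three bytes "abc" this way gives a single 64-byte block: the data, 0x80 at index 3, zeroes up to index 55, then 24 (the length in bits) at index 56 followed by seven zero bytes.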
