Bit manipulation

**Neo1** · 03-20-2008

I'm trying to implement MD5 in a C++ function, and even though the math is killing me, the real problem is the preprocessing of the data that has to be done before doing the MD5 rounds.

I need to add a "1" bit to the end of my data and then append "0" bits until LengthInBits % 512 = 448.

Then, finally, I have to append the length of the unpadded data (in bits) as a 64-bit little-endian integer.
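One way to append that length without depending on the machine's own byte order is to emit it one byte at a time, least significant byte first, using shifts. MD5 (RFC 1321) stores the original length in bits, taken modulo 2^64. A minimal sketch; the helper name `append_length_le64` is hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends 'value' as 8 bytes, least significant byte
// first (little-endian), independent of the host's byte order.
void append_length_le64(std::vector<unsigned char>& out, std::uint64_t value)
{
    for (int i = 0; i < 8; ++i)
    {
        out.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
    }
}
```

For a 3-byte message the bit length is 24, so the appended bytes are 0x18 followed by seven zero bytes.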

I've got no idea how to manipulate the bits in my data; the smallest type I know of is a char, which is guaranteed to be 1 byte (8 bits on virtually every platform). I've been googling around and I think what I need to use is a bit field, amirite?

My question is, how should I declare my bit field struct? At the moment I've got this:

Code:

struct BitField
{
	char data : 1;
};

Would this work for what I'm trying to do? Basically I've got a vector full of these BitField structs, and I was planning to just loop over a file and load it into my vector bit by bit.

If so, to add the "1" bit I could do something like this:

Code:

BitField BF;
BF.data = 1;     //Is this correct?
vData.push_back(BF);     //vData is my vector

This will probably not be the last problem I'm going to have with this, so I hope you guys can be patient with me.

**phantomotap** · 03-20-2008

"I need to add a "1" bit to the end of my data and then append "0" bits until LengthInBits % 512 = 448."

Since your data comes in whole bytes, this simply means appending one byte with only the "high" bit set (0x80), followed by zero bytes.

Also, don't use bit fields. They suck: how they are laid out in memory (including the bit order) is implementation-defined and tied to "endianness".
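Instead of bit fields, individual bits of an 'unsigned char' can be set and tested with shifts and masks, which behave the same on every compiler. A small sketch with made-up helper names; bit 7 is the high bit, so setting only that bit gives 0x80:

```cpp
#include <cassert>

// Hypothetical helpers: bit 0 is the least significant bit, bit 7 the most significant.
unsigned char set_bit(unsigned char byte, int bit)
{
    return static_cast<unsigned char>(byte | (1u << bit));
}

bool test_bit(unsigned char byte, int bit)
{
    return (byte >> bit) & 1u;
}
```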

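Also, 'memcat' the input with '(512 - ((length + 64) % 512))' bits from the array below, where 'length' is in bits. Since the array holds bytes, the byte count is '64 - ((length + 8) % 64)' with 'length' in bytes (always between 1 and 64):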

Code:

static const unsigned char padding[64] =
	{
		0x80, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0
	};
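A minimal sketch of that padding step, assuming the data sits in a 'std::vector<unsigned char>' and the array is declared as 'unsigned char' so each entry is exactly one byte. It works in bytes, so the count is '64 - ((length + 8) % 64)'; the 8-byte length field is left for a separate step. The function name 'append_md5_padding' is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static const unsigned char padding[64] = { 0x80 }; // remaining 63 entries are zero

// Hypothetical helper: appends the 0x80 byte plus zero bytes so that
// data.size() % 64 == 56, leaving exactly 8 bytes for the length field.
void append_md5_padding(std::vector<unsigned char>& data)
{
    std::size_t count = 64 - ((data.size() + 8) % 64); // 1..64 bytes
    data.insert(data.end(), padding, padding + count);
}
```

When the data already ends at 56 mod 64 bytes, a full 64 bytes of padding is added, because at least the 0x80 byte is always required.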

Soma

**Neo1** · 03-22-2008


Okay, so I've dropped the bit fields for now.

I'm not quite sure I understand the memcat() part, though. Can I memcat data into a vector? :S

I've tried the following with my data vector:

Code:

std::vector <unsigned> vData;
....
vData.push_back(0x80);

FileSizeInBits = FileSize.QuadPart * 8;
while(FileSizeInBits != 448 % 512)
{
	vData.push_back(0);
	FileSizeInBits++;
}
vData.push_back(FileSizeInBits);

But it isn't working: if I try to cout the contents of the vector, I just get a segfault. I'm not sure a vector was such a good idea after all; what do you guys think? I bet some of you have implemented MD5 before. What did you do to make it work?

**matsp** · 03-22-2008

For single-bit bit fields, I'd recommend using "unsigned"; whether it's "unsigned char" or "unsigned long" makes little difference here. If you say "char bit:1" and plain char is signed on your compiler, the only valid values are 0 and -1...
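A short illustration of that advice, with made-up struct and function names: an 'unsigned' 1-bit field always reads back exactly what you stored (0 or 1), while a plain 'char' 1-bit field reads back 1 only if 'char' is unsigned on your compiler. Storing 1 into a signed 1-bit field gives an implementation-defined result, which is -1 on the common compilers, so the check below only requires "1 or -1":

```cpp
#include <cassert>

struct CharFlag
{
    char data : 1;      // plain char: may be signed, so 1 may read back as -1
};

struct UnsignedFlag
{
    unsigned data : 1;  // always holds exactly 0 or 1
};

int store_in_char_field(int value)
{
    CharFlag f{};
    f.data = value;
    return f.data;      // 1 if char is unsigned; typically -1 if it is signed
}

int store_in_unsigned_field(unsigned value)
{
    UnsignedFlag f{};
    f.data = value;
    return f.data;      // 1 when value is 1
}
```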

--
Mats

**Neo1** · 03-22-2008


I was just told, however, that bit fields "suck". Do you reckon a vector of bit fields would get the job done, similar to what I suggested in my original post?

**phantomotap** · 03-22-2008

'memcat' isn't a standard function; by it I mean the same as 'memcpy(buffer + size_of_current_buffered_content, data_to_append, size_of_data_to_append);'.
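A sketch of that definition (the name 'memcat' and its parameters are hypothetical, not a library function), next to the equivalent for a 'std::vector<unsigned char>', where 'insert' does the appending and grows the storage for you. With the raw-buffer version, the caller must make sure the buffer is large enough:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 'memcat': copies 'count' bytes from 'src' to just after the
// 'used' bytes already in 'buffer' and returns the new used size.
// The caller must guarantee that buffer has room for used + count bytes.
std::size_t memcat(unsigned char* buffer, std::size_t used,
                   const void* src, std::size_t count)
{
    std::memcpy(buffer + used, src, count);
    return used + count;
}

// Vector equivalent: the vector tracks its own size and reallocates as needed.
void append_bytes(std::vector<unsigned char>& data,
                  const unsigned char* src, std::size_t count)
{
    data.insert(data.end(), src, src + count);
}
```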

'unsigned', as in 'std::vector <unsigned> vData;', is just shorthand for 'unsigned int'. (That is, each element is typically 4 bytes, a "double word", not a single byte.)

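This, '448 % 512', is always equal to 448; the compiler folds it to a constant, so I doubt the instructions even get to the assembler. ('%' has a higher precedence than '!=', so your loop compares the whole bit count against 448 instead of testing its remainder.)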
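To see the difference concretely, here is a tiny sketch with made-up function names: the first condition is what the loop above actually tests, the second is what the padding rule needs:

```cpp
#include <cassert>

// What the original loop condition tests: 448 % 512 is just 448.
bool keep_padding_wrong(unsigned long long bits)
{
    return bits != 448 % 512;
}

// What the MD5 rule needs: the remainder of the running length.
bool keep_padding_right(unsigned long long bits)
{
    return bits % 512 != 448;
}
```

With 960 bits (960 % 512 == 448), padding is already complete, yet the first condition says to keep going; any length already past 448 bits never matches it on the way up.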

Whether plain 'char' is treated as 'signed' or 'unsigned' is implementation-defined, so the value you read back after setting a 'char' bit field to one is compiler-dependent (1 or, typically, -1).

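No, a 'std::vector' of bit fields will not work: every element is still a complete object of at least one byte, so you end up storing a byte per bit and gain nothing.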
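A quick check of that, using the 'BitField' struct from the first post (with 'unsigned char' instead of 'char'); the helper name 'storage_bytes' is made up:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BitField
{
    unsigned char data : 1;
};

// Bytes of element storage used by a vector holding 'bits' single-bit structs.
std::size_t storage_bytes(std::size_t bits)
{
    std::vector<BitField> v(bits);
    return v.size() * sizeof(BitField);
}
```

Since 'sizeof' is never less than 1, eight "bits" stored this way take at least eight bytes. MD5 works on whole bytes anyway, so a plain byte buffer is the simpler choice.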

Soma

**Neo1** · 03-22-2008

Okay, so no vectors and no bit fields, just a dynamically allocated array of unsigned chars.

So far, this is how my code looks (note: no hashing going on yet, I'm just trying to get the preprocessed data right):

Code:

DWORD GetHashFromFile(HANDLE hFile, std::string &Hash)
{
	unsigned char *pBuffer = NULL;
	unsigned int FileSizeInBits, NoOfZeroes = 0;
	LARGE_INTEGER FileSizeInBytes;
	DWORD i, j, dwBytesRead;

	if(!GetFileSizeEx(hFile, &FileSizeInBytes))
	{
		return (GetLastError());
	}

	/* Convert from bytes to bits and add 1 to the size for the "1" bit. */
	FileSizeInBits = (FileSizeInBytes.QuadPart * 8) + 1;

	/* Calculate how many zeroes are needed. */
	while((FileSizeInBits % 512) != 448)
	{
		FileSizeInBits++;
		NoOfZeroes++;
	}

	/* Add 64 to the size for the 64-bit size at the end of the data. */
	FileSizeInBits += 64;

	try
	{
		/* Allocate enough memory. */
		pBuffer = new unsigned char[FileSizeInBits];
	}
	catch (std::bad_alloc&)
	{
		return 1;
	}

	if(!ReadFile(hFile, pBuffer, FileSizeInBytes.QuadPart, &dwBytesRead, NULL))
	{
		return (GetLastError());
	}

	/* Add the "1" bit. */
	pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;

	for(i = 0; i < NoOfZeroes; i++)
	{
		/* Backtrack through the buffer to right after the "1" bit, and start adding zeroes. */
		pBuffer[FileSizeInBits - 64 - NoOfZeroes + 1 + i] = 0;
	}

	pBuffer[FileSizeInBits - 64] = FileSizeInBits;
	// So now I _should_ have a buffer with the file loaded into it,
	// a "1" bit after the file data, a bunch of zeroes and a 64-bit size value at the end, with a file size divisible by 512.
}

When I try to output the contents of my buffer, I get the file followed by a bunch of irrelevant data, which is not what I want.

Another thing: a char is 8 bits, right? So when I do

Code:

pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;

I'm not adding one "1" bit; I'm actually adding "00000001". Or am I completely misunderstanding something here? :S

Perhaps I'm in over my head. I must admit I didn't expect this part to be the one where I'd get stuck.

**phantomotap** · 03-22-2008

Code:

pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;

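Right: that stores 00000001 in a whole byte. The byte you want is 0x80 (10000000); just read my first post again. In general, turn the warning level up as high as it will go. You have some conversion problems (for example, the 64-bit 'FileSizeInBytes.QuadPart' is passed where 'ReadFile' expects a 'DWORD', and 'pBuffer[FileSizeInBits - 64] = FileSizeInBits;' squeezes the size into a single 'unsigned char').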
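'std::bitset' can print the two byte patterns side by side; the helper name 'bits_of' is made up for illustration:

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Returns the 8-bit pattern of a byte, most significant bit first.
std::string bits_of(unsigned char byte)
{
    return std::bitset<8>(byte).to_string();
}
```

'bits_of(1)' gives "00000001", while 'bits_of(0x80)' gives "10000000", which is the "1" bit followed by seven "0" bits that MD5 padding starts with.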

Code:

FileSizeInBits = (FileSizeInBytes.QuadPart * 8) + 1;

Wrong. With this code you are assuming an 8-bit byte. (That's fine; it is virtually always correct.) But stop converting to bits! Your data comes in whole bytes, so divide 448 and 512 by 8 (giving 56 and 64) and use those values in the calculation.

Code:

while((FileSizeInBits % 512) != 448)
{
	FileSizeInBits++;
	NoOfZeroes++;
}

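Wrong. The mathematics are really simple; you do not need a loop. Counting the 0x80 byte, the number of padding bytes is '64 - ((length + 8) % 64)', with 'length' in bytes.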
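Working in bytes (448 / 8 = 56 and 512 / 8 = 64), a loop and a closed form give the same padding count, including the 0x80 byte. A sketch comparing the two; both function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Loop version: at least one byte (the 0x80), then zeros until length % 64 == 56.
std::uint64_t padding_bytes_loop(std::uint64_t length)
{
    std::uint64_t count = 1;
    while ((length + count) % 64 != 56)
    {
        ++count;
    }
    return count;
}

// Closed form: no loop needed; the result is always between 1 and 64.
std::uint64_t padding_bytes(std::uint64_t length)
{
    return 64 - ((length + 8) % 64);
}
```

Both agree because exactly one count in 1..64 makes 'length + count' congruent to 56 modulo 64, and the closed form computes it directly.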

Soma

**Neo1** · 03-24-2008

phantomotap:

By "'memcat' input by '(512 - ((length + 64) % 512))' with" your padding array, do you mean something like this?

Code:

unsigned char *pBuffer2 = &pBuffer[FileSize.QuadPart];
	
memcpy(pBuffer2, Padding, (512 - ((FileSize.QuadPart + 64) % 512)) ) ;
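For comparison, note that the snippet above mixes units: the count uses the bit formula (512 and 64), while 'FileSize.QuadPart' and the padding array are in bytes, and the array was declared as 'padding', not 'Padding'. Below is a self-contained sketch of the whole preprocessing step in bytes, assuming the file contents are already in a 'std::vector<unsigned char>' (reading the file with 'ReadFile' is left out). The function name 'md5_preprocess' is hypothetical; the length field is the original length in bits, little-endian, as RFC 1321 describes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns 'message' padded for MD5.
// Layout: message bytes, one 0x80 byte, zero bytes up to a length of
// 56 mod 64, then the original length in bits as 8 little-endian bytes.
std::vector<unsigned char> md5_preprocess(const std::vector<unsigned char>& message)
{
    static const unsigned char padding[64] = { 0x80 }; // rest are zero

    std::vector<unsigned char> out(message);
    std::size_t count = 64 - ((message.size() + 8) % 64); // 1..64 padding bytes
    out.insert(out.end(), padding, padding + count);

    // MD5 stores the bit length modulo 2^64; unsigned arithmetic wraps for us.
    std::uint64_t bit_length = static_cast<std::uint64_t>(message.size()) * 8;
    for (int i = 0; i < 8; ++i)
    {
        out.push_back(static_cast<unsigned char>((bit_length >> (8 * i)) & 0xFF));
    }
    return out; // out.size() is a multiple of 64 bytes (512 bits)
}
```

Once this output is right, the MD5 rounds simply walk through it in 64-byte blocks.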

