1. ## Bit manipulation

I'm trying to implement MD5 in a C++ function, and even though the math is killing me, the real problem is the preprocessing of the data that has to be done before doing the MD5 rounds.

I need to add a "1" bit to the end of my data and then append "0" bits onto my data until LengthInBits = 448 % 512.

Then, finally, I have to append the length of the unpadded data, in bits, to the data as a 64-bit little-endian integer.
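For what it's worth, the 64-bit little-endian part can be done with plain shifts and masks on bytes. A minimal sketch, assuming the data lives in a `std::vector<unsigned char>`; `append_le64` and `append_le64_demo` are made-up names for illustration, not anything from an MD5 library:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: append `value` to `out` as 8 bytes,
// least significant byte first (little-endian).
void append_le64(std::vector<unsigned char>& out, std::uint64_t value)
{
    for (int i = 0; i < 8; ++i)
    {
        // Shift the wanted byte down to the low 8 bits, then mask it off.
        out.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
    }
}

// Usage sketch: a 3-byte input is 24 bits long.
void append_le64_demo()
{
    std::vector<unsigned char> tail;
    append_le64(tail, 24);
    assert(tail.size() == 8);
    assert(tail[0] == 24); // low byte comes first
    assert(tail[7] == 0);  // high byte comes last
}
```

The value passed in is the size of the unpadded data in bits, not in bytes.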

I've got no idea how to manipulate the bits in my data; the smallest type I know of is a char, which is guaranteed to be 1 byte (8 bits). I've been googling around and I think what I need to use is a bitfield, right?

My question is, how should I declare my bitfield struct? At the moment I've got this:
Code:
```struct BitField
{
char data : 1;
};```
Would this work for what I'm trying to do? Basically I've got a vector full of these BitField structs, and I was planning to just loop over a file and load it into my vector bit by bit.

If so, to add the "1" bit I could do something like this:
Code:
```BitField BF;
BF.data = 1;     //Is this correct?
vData.push_back(BF);     //vData is my vector```
This will probably not be the last problem I'm going to have with this, so I hope you guys can be patient with me.

2. "I need to add a "1" bit to the end of my data and then append "0" bits onto my data until LengthInBits = 448 % 512."

This simply means that the "high" bit of the first padding byte should be set (0x80).

Also, don't use bit fields. They suck and are subject to "endianness".

Also, 'memcat' input by '(512 - ((length + 64) % 512))' with:

Code:
```static const unsigned char padding[64] =
{
0x80, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0
};```
Soma

3. Okay, so I've dropped the bitfields for now...

I'm not quite sure I understand the memcat() part though; can I memcat data into a vector? :S

I've tried the following with my data vector:
Code:
```std::vector <unsigned> vData;
....
vData.push_back(0x80);

while(FileSizeInBits != 448 % 512)
{
vData.push_back(0);
FileSizeInBits++;
}
vData.push_back(FileSizeInBits);```
But it isn't working: if I try to cout the contents of the vector I just get a segfault. I'm not sure if a vector was such a good idea after all; what do you guys think? I bet some of you have implemented MD5 before. What did you do to make it work?
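One way the vector approach could be made to work is to store one byte per element rather than one bit, since a file is always a whole number of bytes. A rough sketch under that assumption (`md5_pad` is a hypothetical name, and this only does the preprocessing, no hashing):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the MD5 preprocessing step on whole bytes.
std::vector<unsigned char> md5_pad(std::vector<unsigned char> data)
{
    // Message length in bits, taken before any padding is added.
    const std::uint64_t bit_length = static_cast<std::uint64_t>(data.size()) * 8;

    // The "1" bit followed by seven "0" bits: binary 10000000.
    data.push_back(0x80);

    // Zero bytes until the size is 56 mod 64 (448 mod 512 in bits).
    while (data.size() % 64 != 56)
    {
        data.push_back(0x00);
    }

    // The original length as a 64-bit little-endian integer.
    for (int i = 0; i < 8; ++i)
    {
        data.push_back(static_cast<unsigned char>((bit_length >> (8 * i)) & 0xFF));
    }

    assert(data.size() % 64 == 0); // a whole number of 512-bit blocks
    return data;
}
```

For a 3-byte input this gives exactly 64 bytes: the data, 0x80, 52 zero bytes, and then the length 24 in the last 8 bytes.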

4. For single-bit bitfields, I'd recommend using "unsigned" - whether it's "unsigned char" or "unsigned long" makes little difference here, but if you say "char bit:1" and char is signed on your compiler, then the valid values are 0 and -1, not 0 and 1...

--
Mats
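To see the difference in isolation, here is a small sketch with both kinds of one-bit field. `Flags` and the functions are invented for illustration; it spells out `signed int` because whether a plain `char` field is signed depends on the compiler:

```cpp
#include <cassert>

struct Flags
{
    signed int   s : 1; // the only bit is the sign bit: holds 0 or -1
    unsigned int u : 1; // no sign bit: holds 0 or 1
};

// Reads back a signed one-bit field whose single bit is set.
int read_signed_bit()
{
    Flags f = {};
    f.s = -1; // -1 is representable in the field, so nothing is lost
    return f.s;
}

// Reads back an unsigned one-bit field after storing 1.
unsigned read_unsigned_bit()
{
    Flags f = {};
    f.u = 1;
    return f.u;
}

void bit_field_demo()
{
    assert(read_signed_bit() == -1);
    assert(read_unsigned_bit() == 1u);
}
```

Storing 1 into the signed field would be out of range for it, and the result of that conversion was implementation-defined before C++20, which is one more reason to prefer the unsigned form.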

5. I was, however, just told that bitfields "suck". Do you reckon that a vector of bitfields would get the job done, similar to what I suggested in my original post?

6. My 'memcat' is the same as 'memcpy(buffer + size_of_current_buffered_content, data_to_append, size_of_data_to_append);'.
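Spelled out as a function, that might look like the sketch below. `memcat` is not a standard library function; this version, including its parameter list and return value, is an assumption made for illustration, and the caller is responsible for the buffer being large enough:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy `count` bytes from `src` to the first unused byte of `buffer`
// and return the new number of bytes in use.
// The caller must guarantee room for `used + count` bytes.
std::size_t memcat(unsigned char* buffer, std::size_t used,
                   const void* src, std::size_t count)
{
    assert(buffer != NULL && src != NULL);
    std::memcpy(buffer + used, src, count);
    return used + count;
}
```

Returning the new size makes it easy to chain calls: append the padding, then append the 8 length bytes at the offset the first call returned.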

'unsigned', as in 'std::vector <unsigned> vData;', is shorthand for 'unsigned int'. (That is, on most platforms you are manipulating content of a double word size.)

This, '448 % 512', will always be equal to 448. I doubt the instructions even get to the assembler. (That is, '%' has a higher precedence than '!='.)
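A small sketch of the two readings side by side; the function names are made up for illustration:

```cpp
#include <cassert>

// '%' binds tighter than '!=', so the right-hand side is just a constant.
static_assert(448 % 512 == 448, "448 % 512 is 448");

// What the posted loop condition actually tests: bits != (448 % 512).
bool as_written(unsigned bits)
{
    return bits != 448 % 512;
}

// What was presumably intended: the remainder of bits, compared with 448.
bool as_intended(unsigned bits)
{
    return (bits % 512) != 448;
}

void precedence_demo()
{
    assert(as_written(960));   // 960 != 448, so the posted loop keeps going
    assert(!as_intended(960)); // 960 % 512 == 448, so the loop should stop
}
```

With the condition as written, the loop can only stop when the counter is exactly 448, so any input that is already longer than that never hits the exit condition.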

A 'char' can be treated as 'signed' or 'unsigned'. The value of the target when the bit is set to one is compiler dependent.

No, a 'std::vector' of bit fields will not work.

Soma

7. Okay, so no vectors and no bitfields, just a dynamically allocated array of unsigned chars.

So far, this is how my code looks (note: no hashing going on, just trying to get the preprocessed data right):
Code:
```DWORD GetHashFromFile(HANDLE hFile, std::string &Hash)
{
unsigned char *pBuffer = NULL;
unsigned int FileSizeInBits, NoOfZeroes = 0;
LARGE_INTEGER FileSizeInBytes;

if(!GetFileSizeEx(hFile, &FileSizeInBytes))
{
return (GetLastError());
}

/*Convert from bytes to bits and  add 1 to the size for the "1" bit.*/
FileSizeInBits = (FileSizeInBytes.QuadPart * 8) + 1;

/* Calculate how many zeroes are needed. */
while((FileSizeInBits % 512) != 448)
{
FileSizeInBits++;
NoOfZeroes++;
}

/*Add 64 to the size for the 64 bit size at the end of the data. */
FileSizeInBits += 64;

try
{
/* Allocate enough memory. */
pBuffer = new unsigned char[FileSizeInBits];
}
{
return 1;
}

{
return (GetLastError());
}

/* Add the "1" bit */
pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;

for(i = 0; i < NoOfZeroes; i++)
{
/* Backtrack through the buffer to right after the "1" bit, and start adding zeroes. */
pBuffer[FileSizeInBits - 64 - NoOfZeroes + 1 + i] = 0;
}

pBuffer[FileSizeInBits - 64] = FileSizeInBits;
// So now, i _should_ have a buffer with the file loaded into it,
// a "1" bit after the file data, a bunch of zeroes and a 64 bit size value at the end, with a filesize divisible by 512.
}```
When I try to output the contents of my buffer, I get the file and then a bunch of irrelevant data, which is not what I want.

Another thing: a char is 8 bits, right? So when I do
Code:
`pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;`
I'm not adding one "1" bit, I'm actually adding "00000001", or am I completely misunderstanding something here? :S
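A quick way to check this kind of thing is to look at the byte in binary. A small sketch using `std::bitset` (`bits_of` is a made-up helper): the value 1 is indeed 00000001, while 0x80 is 10000000, i.e. a "1" bit followed by seven "0" bits when the most significant bit is taken first, which is how MD5 reads the bits of a byte.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Render one byte as eight '0'/'1' characters, most significant bit first.
std::string bits_of(unsigned char byte)
{
    return std::bitset<8>(byte).to_string();
}

void bits_of_demo()
{
    assert(bits_of(1) == "00000001");    // what assigning 1 stores
    assert(bits_of(0x80) == "10000000"); // the padding byte MD5 wants
}
```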

Perhaps I'm in over my head. I must admit, I didn't expect this part to be the one where I'd get stuck.

8. Code:
`pBuffer[FileSizeInBits - 64 - NoOfZeroes] = 1;`
In general, just read my first post again, and turn the warning level up as high as it will go. You have some conversion problems.

Code:
`FileSizeInBits = (FileSizeInBytes.QuadPart * 8) + 1;`
Wrong. You are, with this code, assuming an 8 bit byte. (That's fine. It is virtually always correct.) Stop converting to bits! (That is, divide 448 and 512 by 8 and use those values in the calculation.)

Code:
```while((FileSizeInBits % 512) != 448)
{
FileSizeInBits++;
NoOfZeroes++;
}```
Wrong. The mathematics are really simple. You do not need a loop.
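As a sketch of the loop-free version, working in bytes (448 / 8 = 56 and 512 / 8 = 64): the function names are hypothetical, and the second function is only there to compare against the step-by-step count.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of padding (the 0x80 byte plus the zero bytes) needed after
// `length` bytes of input so that the running total is 56 mod 64.
std::uint64_t padding_bytes(std::uint64_t length)
{
    return 64 - ((length + 8) % 64);
}

// The same count found by stepping one byte at a time, for comparison.
std::uint64_t padding_bytes_by_loop(std::uint64_t length)
{
    std::uint64_t count = 1;            // the mandatory 0x80 byte
    while ((length + count) % 64 != 56)
    {
        ++count;
    }
    return count;
}

void padding_demo()
{
    assert(padding_bytes(0) == 56);  // empty input: one block in total
    assert(padding_bytes(55) == 1);  // only the 0x80 byte fits
    assert(padding_bytes(56) == 64); // spills into a second block
}
```

The result is always between 1 and 64, and adding the 8 length bytes afterwards brings the total to a multiple of 64.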

Soma

9. phantomotap:

By 'memcat' input by '(512 - ((length + 64) % 512))' with the 'padding' array from your first post, do you mean something like this?
Code:
```unsigned char *pBuffer2 = &pBuffer[FileSize.QuadPart];