Can you explain these bitwise operations?

• 10-29-2008
6tr6tr
I am working with a file that has a header with the size of the file encoded into 4 bytes like:

Quote:

The ID3v2 tag size is encoded with four bytes where the most significant bit (bit 7) is set to zero in every byte, making a total of 28 bits. The zeroed bits are ignored, so a 257 bytes long tag is represented as $00 00 02 01.
I have found someone's code that puts this together into the integer value, but I don't understand why they do each step. Is there anyone here who can explain this code to me?

(I know the code's not C++, but I'm hoping you can explain what they're doing and I can convert it to C++.)

Code:

```
//Read in the bytes (why do they read char[] instead of byte[]?)
char[] tagSize = br.ReadChars(4);  // I use this to read the bytes in from the file

//Store the shifted bytes (why is it int[], not byte[]?)
int[] bytes = new int[4];    // for bit shifting

int size = 0;  // for the final number

/*
 * Why are they combining these bytes in this way if they're
 * going to again combine them below (in the line setting "size")?
 */

//how do they know they only care about the rightmost bit on the 3rd byte?
//how do they know to shift it 7 to the left?
bytes[3] =  tagSize[3] | ((tagSize[2] & 1) << 7) ;

//Why do they use 63 here (I know it's 111111)?
//how do they know they only want the 3 rightmost of byte 2nd byte?
//And how know to shift it 6 to the left?
bytes[2] = ((tagSize[2] >> 1) & 63) | ((tagSize[1] & 3) << 6) ;
bytes[1] = ((tagSize[1] >> 2) & 31) | ((tagSize[0] & 7) << 5) ;
bytes[0] = ((tagSize[0] >> 3) & 15) ;

//how do they know to shift these bytes the amount that they do to the left?
size  = ((UInt64)bytes[3] | ((UInt64)bytes[2] << 8)  | ((UInt64)bytes[1] << 16) | ((UInt64)bytes[0] << 24)) ;
```
• 10-29-2008
matsp
First of all, this is perfectly valid C++ code, I see no reason why you would want/need to change it.

Second, the original format uses 7 bits out of each byte. To make that into a 32-bit (actually 28-bit) number, it is first converted to a set of 8-bit bytes.

Bytes[3] is the lowest byte, so it holds 7 bits from tagsize[3] and 1 bit from tagsize[2].
Bytes[2] is the second lowest byte, so it holds the remaining 6 bits from tagsize[2], and 2 bits from tagsize[1].
Bytes[1] holds the remaining 5 bits of tagsize[1] and 3 bits from tagsize[0].
Bytes[0] holds the last 4 bits of tagsize[0].

The number of bits corresponds to the mask used in the & operation, for example 1 bit -> & 1, 2 bits -> & 3 and 6 bits -> & 63.
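
To make the bit-count-to-mask relationship concrete: a mask that keeps the low n bits can be computed as (1 << n) - 1. The helper name `low_bits_mask` below is made up for illustration and is not part of the posted code; treat it as a small sketch only.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original code): returns a mask with the
// low n bits set. Valid for 0 <= n < 32.
constexpr std::uint32_t low_bits_mask(unsigned n) {
    return (std::uint32_t{1} << n) - 1u;
}

// The masks that appear in the posted code, checked at compile time.
static_assert(low_bits_mask(1) == 1u,  "1 bit  -> & 1");
static_assert(low_bits_mask(2) == 3u,  "2 bits -> & 3");
static_assert(low_bits_mask(3) == 7u,  "3 bits -> & 7");
static_assert(low_bits_mask(4) == 15u, "4 bits -> & 15");
static_assert(low_bits_mask(5) == 31u, "5 bits -> & 31");
static_assert(low_bits_mask(6) == 63u, "6 bits -> & 63");
```

Reading the masks this way also shows why & 3 keeps two bits (binary 11) rather than three (binary 111, which is 7).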

Once we have the byte values, we can then shuffle them all into a 32-bit integer. As each byte is 8 bits, we need to shift by 0, 8, 16 and 24 bits to form the 32-bit number.
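
For anyone converting this to C++, the steps above might be sketched as follows. The function names are invented for this example, the input is assumed to be four bytes already read from the file as `unsigned char`, and tagSize[3] is additionally masked with 0x7F here (the posted code leaves it unmasked). The second function is an alternative that skips the intermediate bytes and shifts each 7-bit group by a multiple of 7; it is not what the posted code does, but it should give the same result.

```cpp
#include <cstdint>

// Two-step version mirroring the posted code: repack four 7-bit values into
// four 8-bit bytes, then combine those bytes into one integer.
std::uint32_t decode_size_two_step(const unsigned char tag[4]) {
    std::uint32_t bytes[4];
    bytes[3] =  (tag[3] & 0x7Fu)      | ((tag[2] & 1u) << 7); // 7 bits + 1 bit
    bytes[2] = ((tag[2] >> 1) & 63u)  | ((tag[1] & 3u) << 6); // 6 bits + 2 bits
    bytes[1] = ((tag[1] >> 2) & 31u)  | ((tag[0] & 7u) << 5); // 5 bits + 3 bits
    bytes[0] =  (tag[0] >> 3) & 15u;                          // last 4 bits
    // Each repacked byte is 8 bits wide, hence shifts of 0, 8, 16 and 24.
    return bytes[3] | (bytes[2] << 8) | (bytes[1] << 16) | (bytes[0] << 24);
}

// Direct version: each input byte contributes 7 bits, so shift by 0, 7, 14, 21.
std::uint32_t decode_size_direct(const unsigned char tag[4]) {
    return  static_cast<std::uint32_t>(tag[3] & 0x7Fu)
         | (static_cast<std::uint32_t>(tag[2] & 0x7Fu) << 7)
         | (static_cast<std::uint32_t>(tag[1] & 0x7Fu) << 14)
         | (static_cast<std::uint32_t>(tag[0] & 0x7Fu) << 21);
}
```

With the bytes from the quoted example, 00 00 02 01, both functions return 257.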

--
Mats
• 10-29-2008
6tr6tr
THANK YOU! That was a fantastic explanation!

I was pretty close to figuring it out but what threw me off was the second line where the byte is "and"-ed with 3: (tagSize[1] & 3). I was reading that as:
xxxxxxxx &
- - - - - xxx
But obviously 3 means: 00000011 not 00000111.

Thanks again!
• 10-29-2008
laserlight
Quote:

Originally Posted by matsp
First of all, this is perfectly valid C++ code, I see no reason why you would want/need to change it.

With at least one small caveat: in C++, the brackets used to denote an array when declaring an array come after the array name.
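
As a small illustration of that difference, here is how the array declaration from the posted code could look in C++. The function name `sum_example` and the sample values are made up for this sketch.

```cpp
#include <array>

// As posted (not C++):   int[] bytes = new int[4];
// In C++ the brackets follow the name, and no `new` is needed for a
// fixed-size local array.
int sum_example() {
    int bytes[4] = {0, 0, 2, 1};             // built-in array
    std::array<int, 4> same = {0, 0, 2, 1};  // std::array alternative (C++11)
    return bytes[2] + same[3];               // 2 + 1
}
```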
• 10-29-2008
matsp
Quote:

Originally Posted by laserlight
With at least one small caveat: in C++, the brackets used to denote an array when declaring an array come after the array name.

Yes. I originally thought it was C-code [I didn't look very carefully, of course, as there is a "new" in there, as well as brackets in "weird" places].

--
Mats
• 10-29-2008
6tr6tr
One question:

when constructing the last byte, why is it "& 15" and not "& 255"? Aren't we constructing an 8-bit byte? Or is it that it won't matter, it's the same thing since everything moving in from the left is "0" anyways? Is there any reason that makes "& 15" better/quicker than "&255"?

Referring to:
Code:

`bytes[0] = ((tagSize[0] >> 3) & 15) ;`
• 10-29-2008
tabstop
Quote:

Originally Posted by 6tr6tr
One question:

when constructing the last byte, why is it "& 15" and not "& 255"? Aren't we constructing an 8-bit byte?

Referring to:
Code:

`bytes[0] = ((tagSize[0] >> 3) & 15) ;`

Because you just want the last four bits of tagSize[0]>>3? (In other words, if the sign bit gets set somehow, and the machine fills in with the sign bit when shifting right, you don't want all that in your number.)
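
A short C++ sketch of that point, assuming the byte is stored in a signed 8-bit type (the function names are hypothetical). For a well-formed tag byte, whose top bit is zero, both masks give the same answer; they differ only when the top bit is set and the shift drags copies of it into the result.

```cpp
#include <cstdint>

// Keeps only the four bits that belong to this byte (bits 3..6 of the input).
int top_bits_mask15(std::int8_t b) {
    return (b >> 3) & 15;
}

// Keeps eight bits, so any sign-extension bits shifted in from the left survive.
int top_bits_mask255(std::int8_t b) {
    return (b >> 3) & 255;
}
```

For the value 120 (binary 01111000) both functions return 15. For -8 (binary 11111000 in two's complement) the first still returns 15, while the second returns 255.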