# Thread: Can you explain these bitwise operations?

1. ## Can you explain these bitwise operations?

I am working with a file that has a header with the size of the file encoded into 4 bytes like:

The ID3v2 tag size is encoded with four bytes where the most significant bit (bit 7) is set to zero in every byte, making a total of 28 bits. The zeroed bits are ignored, so a 257 bytes long tag is represented as \$00 00 02 01.
I have found someone's code that puts this together into the integer value, but I don't understand why they do each step. Is there anyone here who can explain this code to me?

(I know the code's not C++, but I'm hoping you can explain what they're doing and I can convert it to C++)

Code:
```
//Read in the bytes (why do they read char[] instead of byte[]?)
char[] tagSize = br.ReadChars(4);    // I use this to read the bytes in from the file

//Store the shifted bytes (why is it int[], not byte[]?)
int[] bytes = new int[4];      // for bit shifting

int size = 0;    // for the final number

/*
* Why are they combining these bytes in this way if they're
* going to again combine them below (in the line setting "size")?
*/

//how do they know they only care about the rightmost bit on the 3rd byte?
//how do they know to shift it 7 to the left?
bytes[3] =  tagSize[3] | ((tagSize[2] & 1) << 7) ;

//Why do they use 63 here (I know it's 111111)?
//how do they know they only want the 3 rightmost bits of the 2nd byte?
//And how know to shift it 6 to the left?
bytes[2] = ((tagSize[2] >> 1) & 63) | ((tagSize[1] & 3) << 6) ;
bytes[1] = ((tagSize[1] >> 2) & 31) | ((tagSize[0] & 7) << 5) ;
bytes[0] = ((tagSize[0] >> 3) & 15) ;

//how do they know to shift these bytes the amount that they do to the left?
size  = ((UInt64)bytes[3] | ((UInt64)bytes[2] << 8)  | ((UInt64)bytes[1] << 16) | ((UInt64)bytes[0] << 24)) ;
```

2. First of all, this is perfectly valid C++ code, I see no reason why you would want/need to change it.

Second, the original format uses 7 bits out of each byte. To turn that into a 32-bit number (of which only 28 bits are actually used), the 7-bit groups are first repacked into a set of 8-bit bytes.

bytes[3] is the lowest byte, so it holds 7 bits from tagSize[3] and 1 bit from tagSize[2].
bytes[2] is the second-lowest byte, so it holds the remaining 6 bits from tagSize[2] and 2 bits from tagSize[1].
bytes[1] holds the remaining 5 bits of tagSize[1] and 3 bits from tagSize[0].
bytes[0] holds the last 4 bits of tagSize[0].

The number of bits corresponds to the mask used in the & operation: a mask that keeps the lowest n bits is 2^n - 1, for example 1 bit -> & 1, 2 bits -> & 3 and 6 bits -> & 63.
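
As a rough illustration of how those mask values come about, here is a small C++ sketch. The name `lowMask` is made up for this example and does not appear in the original code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the original code): returns a mask with
// the lowest n bits set, e.g. n = 2 gives binary 11 = 3.
std::uint32_t lowMask(unsigned n) {
    assert(n < 32);  // shifting a 32-bit value by 32 or more is undefined
    return (std::uint32_t{1} << n) - 1;
}
```

With this, `lowMask(1)` is 1, `lowMask(2)` is 3, `lowMask(5)` is 31 and `lowMask(6)` is 63, which are the constants used in the code being discussed.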

Once we have the byte values, we can then combine them into a 32-bit integer. As each byte is 8 bits, we need to shift by 0, 8, 16 and 24 bits to form the 32-bit number.
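
A minimal C++ sketch of the steps above, assuming the four size bytes have already been read into an `unsigned char` array with the most significant byte first. The function names `decodeTagSizeBytewise` and `decodeTagSize` are invented for this example. The second function is an alternative that appends 7 bits at a time and should give the same result for valid input.

```cpp
#include <cstdint>

// Hypothetical helper: repacks the four 7-bit groups into 8-bit bytes first,
// mirroring the original code (tagSize[0] is the most significant byte).
std::uint32_t decodeTagSizeBytewise(const unsigned char tagSize[4]) {
    std::uint32_t bytes[4];
    bytes[3] = tagSize[3] | ((tagSize[2] & 1u) << 7);                 // 7 bits + 1 bit
    bytes[2] = ((tagSize[2] >> 1) & 63u) | ((tagSize[1] & 3u) << 6);  // 6 bits + 2 bits
    bytes[1] = ((tagSize[1] >> 2) & 31u) | ((tagSize[0] & 7u) << 5);  // 5 bits + 3 bits
    bytes[0] = (tagSize[0] >> 3) & 15u;                               // last 4 bits
    // Each repacked byte is 8 bits wide, so the shifts are 0, 8, 16 and 24.
    return bytes[3] | (bytes[2] << 8) | (bytes[1] << 16) | (bytes[0] << 24);
}

// Hypothetical alternative: make room for 7 bits, then append the low 7 bits
// of the next byte, four times in a row.
std::uint32_t decodeTagSize(const unsigned char tagSize[4]) {
    std::uint32_t size = 0;
    for (int i = 0; i < 4; ++i) {
        size = (size << 7) | (tagSize[i] & 127u);
    }
    return size;
}
```

For the example from the tag description, the bytes 00 00 02 01, both functions return 257.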

--
Mats

3. THANK YOU! That was a fantastic explanation!

I was pretty close to figuring it out but what threw me off was the second line where the byte is "and"-ed with 3: (tagSize[1] & 3). I was reading that as:
xxxxxxxx &
- - - - - xxx
But obviously 3 means: 00000011 not 00000111.

Thanks again!

4. Originally Posted by matsp
First of all, this is perfectly valid C++ code, I see no reason why you would want/need to change it.
With at least one small caveat: in C++, the brackets used to denote an array when declaring an array come after the array name.
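
To make the difference concrete, here is a small sketch of how the declaration `int[] bytes = new int[4];` might look in C++. The function `declaredLengths` is hypothetical and only exists to show the declarations in context.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, only here to show the two declaration styles.
std::size_t declaredLengths() {
    int bytes[4] = {};             // built-in array: brackets follow the name, no `new` needed
    std::array<int, 4> same = {};  // standard-library equivalent that knows its own size
    // Both hold 4 zero-initialized ints, so this returns 4 + 4.
    return sizeof(bytes) / sizeof(bytes[0]) + same.size();
}
```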

5. Originally Posted by laserlight
With at least one small caveat: in C++, the brackets used to denote an array when declaring an array come after the array name.
Yes. I originally thought it was C code [I didn't look very carefully, of course, as there is a "new" in there, as well as brackets in "weird" places].

--
Mats

6. One question:

When constructing the last byte, why is it "& 15" and not "& 255"? Aren't we constructing an 8-bit byte? Or does it not matter, since everything shifted in from the left is "0" anyway? Is there any reason that makes "& 15" better/quicker than "& 255"?

Referring to:
Code:
`bytes[0] = ((tagSize[0] >> 3) & 15) ;`

7. Originally Posted by 6tr6tr
One question:

when constructing the last byte, why is it "& 15" and not "& 255"? Aren't we constructing an 8-bit byte?

Referring to:
Code:
`bytes[0] = ((tagSize[0] >> 3) & 15) ;`
Because you just want the last four bits of tagSize[0] >> 3: tagSize[0] carries only 7 size bits, and the lowest 3 already went into bytes[1], so 4 remain. (In other words, if the sign bit gets set somehow, and the machine fills in with the sign bit when shifting right, you don't want all that in your number.)
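
A small C++ sketch of that sign-bit case, assuming the byte is held in a `signed char` and the compiler fills with the sign bit when shifting a negative value right (guaranteed since C++20, implementation-defined before that, though mainstream compilers behave this way). The function names are made up for this example.

```cpp
// Hypothetical helpers: the same right shift followed by two different masks.
// c is promoted to int before the shift, so a negative value stays negative.
int shiftThenMask15(signed char c)  { return (c >> 3) & 15; }
int shiftThenMask255(signed char c) { return (c >> 3) & 255; }
```

For a well-formed byte such as 0x78 both functions return 15. For c = -8 (bit pattern 11111000, i.e. the top bit wrongly set), the shift yields -1, so the "& 15" version still returns 15 while the "& 255" version returns 255, letting the copied sign bits leak into the result.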