# How to set an 32 bit allocation to 31 bit for address, 1 bit for flag?

This is a discussion on How to set an 32 bit allocation to 31 bit for address, 1 bit for flag? within the C Programming forums, part of the General Programming Boards category; I am using the following code to read in the characters of a file and put them into an array. ...

1. ## How to set an 32 bit allocation to 31 bit for address, 1 bit for flag?

I am using the following code to read in the characters of a file and put them into an array. Thus, each array element contains a character from the file

Code:
unsigned char *str;
str = (unsigned char*)malloc(filelength*sizeof(unsigned char));
fread(str, sizeof(unsigned char), filelength, file);
But how do I just assign 31 bits of str as the address location and the MSB 1 bit as a flag bit?

This flag bit is used to indicate if this array element is read before. If this array element is read before, then the flag bit is set to 1.

I think it is possible to assign an integer to every array element as the flag bit, but that will be a waste of space, using 1 byte as a flag bit.

Any suggestions or help! Thanks!

2. >> But how do I just assign 31 bits of str as the address location and the MSB 1 bit as a flag bit? <<

You don't*. Store the flag elsewhere.

>> I think it is possible to assign an integer to every array element as the flag bit, but that will be a waste of space, using 1 byte as a flag bit. <<

Then store 8 flag bits in one byte.

*This is technically possible on some platforms, some of the time, but must not be relied on.

3. Huh? You've got a string where each element is 8 bits (1 byte). But then you switch to 32-bit usage. If you want a flag bit for each array element then you're going to have to do it on the MSB of every byte, not every 4 bytes.

As long as you stick to the standard ASCII character set you can use the MSB of each array element as a flag bit without any problem since standard ASCII characters fit in 7 bits.

So if your string looks like this:
Code:
char str[] = "itsme86";
You could do something like this:
Code:
itsme@dreams:~/C$cat flag.c #include <stdio.h> int main(void) { char str[] = "itsme86"; int i; // Let's mark the 3rd and 5th letters as read str[2] |= 0x80; str[4] |= 0x80; for(i = 0;str[i];i++) { printf("Character '%c' has been read? ", str[i] & 0x7F); if(str[i] & 0x80) puts("Yes"); else puts("No"); } return 0; } itsme@dreams:~/C$ ./flag
Character 'i' has been read? No
Character 't' has been read? No
Character 's' has been read? Yes
Character 'm' has been read? No
Character 'e' has been read? Yes
Character '8' has been read? No
Character '6' has been read? No
itsme@dreams:~/C\$

4. Well ive not got to the stage in my textbook on I/O so im not sure on responses etc. but here is a little bit .... about bits!

for instance:
you could create a struct of 3 unsigned ints; list, seen and number that do *something*. However, the fields 'list' and 'seen' can have only two values, 0 and 1, so only one bit is needed to represent them. we can redefine the structure using bit fields, so that it takes only two bytes, by following each field with a colon and a number of bits to be used for that field:

Code:
struct item {
unsigned int list:1;    /* true if item is in the list */
unsigned int seen:1;    /* true if this item has been seen */
unsigned int number:14; /* item number */
};
note: packed structures should be used with care. The code to extract data from bit fields is relatively large and slow. So there is no need to use this unless storage is a problem?!

i mean, that whole thing could be irrelevant, but it sounds like something you might consider? i mean i dunno, im only a student! itsme86 looks like hes got a good idea too!

--CHriS

5. If you attempt to use a bit in a string as a numeric value and/or flag then all I/O functions relating to strings will fail miserably and cause all sorts of problems. The problem is that the system does not know that you want bit <x> to represent something other than a string literal. In essence you are working in direct contradiction to what your compiler and your OS expect. Don't do it.

6. itsme86, you ROCK!!!

thanks for your help! This is what I want!
But is it possible that there is character that need 8 bits to represent it?

Another question, since I can use 7 bits to represent a character and 1 bit to represent a flag. How do I use 30 bits to represent a pointer (that is, the 30 bits is use to represent an address that the pointer to), and 2 bits as flag?

I am reading a program source code which is doing this, but contain alot of files and its coding is too hard for me to comprehend. Here is a comment from the source code
Code:
/*
The most significant bits in the entries of table \texttt{streetab}
are used as marking bits. For each branching node we need 3 marking bits, and
for each leaf we need 2 marking bits. For each leaf and each
branching node we store 2 marking bits in their first integer: The
\emph{leaf-bit} and the \emph{rightmost-child} bit. For
each branching node we store 1 marking bit in the second integer,
the \emph{unevaluated} bit.
*/

/*
As a consequence, we have 30 bits to store
$$\mathit{left}(\overline{u})$$ or $$\mathit{lp}(\overline{u})$$
for a branching node or leaf $$\overline{u}$$.
Moreover, we have 31 bits to store $$\mathit{right}(\overline{u})$$ or
$$\mathit{firstchild}(\overline{u})$$ for a branching node $$\overline{u}$$.
$$\mathit{left}(\overline{u})$$,
$$\mathit{right}(\overline{u})$$, and
$$\mathit{lp}(\overline{u})$$ are in the range $$[0,n]$$.
$$\mathit{firstchild}(\overline{u})$$ is in the range $$[0,3n]$$. Hence
$$3n\leq 2^{31}-1$$ must be satisfied, i.e.\
the maximal length of the input string is $$715{,}827{,}882$$.
*/
It talks about the data structure of a suffix tree, where a branching node got 2 integers. The first integer is a pointer to the string, second integer is a pointer to a leaf node. But the first integer use 30 bits to store address, second integer use 31 bits to store address.

7. Originally Posted by franziss
But is it possible that there is character that need 8 bits to represent it?
Didn't you want to represent your characters as 2-bits (in your last post?). Because if you don't, it doesn't make sense to try to cramp a few more bits into your sequence pointers (below).
Anyways, Google for ASCII table. Most english characters including the entire alphabet are within the lower 128 entries of the table, so none of them requires their 8th bit. You can use that, but realize that by changing it you also change the representation of that character. If you want to get back to the original char, you'll have to mask that bit before printing.

Originally Posted by franziss
Another question, since I can use 7 bits to represent a character and 1 bit to represent a flag. How do I use 30 bits to represent a pointer (that is, the 30 bits is use to represent an address that the pointer to), and 2 bits as flag?
Why would you want to do this? How long is the sequence/string your pointer points to? If it's any longer than a few bytes there is absolutely no reason to get so drastic. Just store a byteflag at the beginning of the sequence.

Usually you don't know much about the location malloc returns to you. But if you really want to produce some bad, non-portable code, try using bit 0 and bit 31. Again, you'll have to strip those before dereferencing the pointer.
Bit 0 should usually be 0 due to alignment rules on Windows and most Linux platforms. Bit 31 is usually unused, too. On an AMD64 you'll get into serious trouble though.
Again, you shouldn't be doing this though. It's highly unportable, unreadable, bad code.
Note that you cannot assume that on a, i.e. 256MB system all pointers will be below 2^28 because all memory locations are virtual to your process and don't tell you anything about the real location in memory.

8. Originally Posted by Nyda
Why would you want to do this? How long is the sequence/string your pointer points to? If it's any longer than a few bytes there is absolutely no reason to get so drastic. Just store a byteflag at the beginning of the sequence.
Well, my project also requires me to implement an suffix tree data structure (which is not to store the input string) of an algorithm by Dr. Stefan Kurtz, paper titled "Efficient Implementation of lazy suffix trees". His data structure is supposed to be currently the most efficient one for suffix tree.

And his has coded it and the code is at http://bibiserv.techfak.uni-bielefeld.de/wotd

But.... I'm too lousy to understand his code....