I would like to know whether there is some minimum address below which a pointer can be assumed never to refer to data that the program can actually dereference. Let me give a use case.
I am defining transitions for a scanner's state machine. Let's assume we have something like:
Code:
typedef enum {Q0, Q1, ..., UNDEFINED, ..., num_states } state;
state scanner_table[num_states][UCHAR_MAX+1];
typedef struct {
    state current_state;
    state next_state;
    const unsigned char *transition_chars;
} transition;
const transition transitions[] = {
    { Q0, UNDEFINED, NULL },
    { Q0, Q1, "ABCDEFGH...XYZabc...xyz" },
    ...
};
My approach is to populate scanner_table by iterating over the elements of transitions with the following algorithm:
Code:
/* scanner_table will automatically be built from transitions as follows:
 *
 * foreach transition in transitions:
 *     s = transition.transition_chars
 *     if s is NULL:
 *         scanner_table[transition.current_state][*] <-
 *             transition.next_state
 *     else:
 *         foreach c in s:
 *             scanner_table[transition.current_state][c] <-
 *                 transition.next_state
 *
 * In order for this to work as intended, any NULL transitions must come
 * first. This is so that default transitions can be included without
 * having to list every possible character. All states must be part of
 * the enum state. */
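For concreteness, the pseudocode above could be realized roughly as follows. This is only a sketch under assumptions: the three-state enum, the function name build_table, and the sample_transitions data are made up for illustration, and transition_chars is declared as plain const char * (rather than const unsigned char *) so that string literals can initialize it without a pointer-sign warning.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical three-state enum standing in for the elided state list. */
typedef enum { Q0, Q1, UNDEFINED, NUM_STATES } state;

typedef struct {
    state current_state;
    state next_state;
    const char *transition_chars;   /* NULL means "every character" */
} transition;

state scanner_table[NUM_STATES][UCHAR_MAX + 1];

/* Fill scanner_table from an array of n transitions, in array order. */
void build_table(const transition *ts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const transition *t = &ts[i];
        assert(t->current_state < NUM_STATES);

        if (t->transition_chars == NULL) {
            /* Default transition: applies to every possible character. */
            for (int c = 0; c <= UCHAR_MAX; c++)
                scanner_table[t->current_state][c] = t->next_state;
        } else {
            /* Explicit transition: one entry per listed character.
             * The walk stops at the terminating NUL, so '\0' itself
             * can never be listed this way. */
            for (const char *s = t->transition_chars; *s != '\0'; s++)
                scanner_table[t->current_state][(unsigned char)*s] =
                    t->next_state;
        }
    }
}

/* Hypothetical sample data: the default comes first, then letters override it. */
const transition sample_transitions[] = {
    { Q0, UNDEFINED, NULL },
    { Q0, Q1, "abc" },
};
```

Calling build_table(sample_transitions, 2) leaves the entries for 'a', 'b', and 'c' in row Q0 at Q1 and the rest of that row, including the entry for '\0', at UNDEFINED.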
The problem here is the "foreach c in s" step. The only way to detect the end of the string is to encounter a NUL character, which becomes a problem if one wishes to accept NUL in the input. My idea for fixing this was to do something like:
Code:
enum {DEFAULT, NUL_CHAR, ...};
In this way, special transitions could be indicated with the following syntax:
Code:
{ Q0, QNUL, (const unsigned char *) NUL_CHAR }
And my initialization code would go something like:
Code:
const transition *t = &transitions[i];
const unsigned char *s = t->transition_chars;
switch ((uintptr_t) s) {   /* uintptr_t comes from <stdint.h> */
case DEFAULT:
    /* code to initialize scanner_table[t->current_state][0..UCHAR_MAX]
     * to t->next_state */
    break;
case NUL_CHAR:
    scanner_table[t->current_state]['\0'] = t->next_state;
    break;
default:
    /* code to initialize scanner_table[t->current_state][s[0...]] to
     * t->next_state */
    break;
}
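Filling in the commented-out parts, the switch above might be completed along these lines. Again, this is a sketch under assumptions: the small state enum, the SPECIAL macro, and the function name apply_transition are hypothetical; uintptr_t from <stdint.h> is used for the pointer-to-integer cast; transition_chars is plain const char * so string literals fit without a warning; and the code relies on a null pointer converting to the integer 0, which holds on mainstream platforms but is not checked here. Converting a small nonzero integer to a pointer is implementation-defined, which is exactly the uncertainty this question is about.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state list for illustration. */
typedef enum { Q0, Q1, QNUL, UNDEFINED, NUM_STATES } state;

/* Special "pointer constants"; DEFAULT is 0 and so plays the role of NULL. */
enum { DEFAULT, NUL_CHAR };

/* Hypothetical helper: turn a small tag into a pointer-typed marker. */
#define SPECIAL(tag) ((const char *)(uintptr_t)(tag))

typedef struct {
    state current_state;
    state next_state;
    const char *transition_chars;   /* real string or SPECIAL(tag) */
} transition;

state scanner_table[NUM_STATES][UCHAR_MAX + 1];

/* Apply a single transition to scanner_table. */
void apply_transition(const transition *t)
{
    const char *s = t->transition_chars;
    assert(t->current_state < NUM_STATES);

    switch ((uintptr_t)s) {
    case DEFAULT:
        /* Every character goes to next_state. */
        for (int c = 0; c <= UCHAR_MAX; c++)
            scanner_table[t->current_state][c] = t->next_state;
        break;
    case NUL_CHAR:
        /* Only the NUL character goes to next_state. */
        scanner_table[t->current_state]['\0'] = t->next_state;
        break;
    default:
        /* Treat s as an ordinary NUL-terminated list of characters. */
        for (; *s != '\0'; s++)
            scanner_table[t->current_state][(unsigned char)*s] =
                t->next_state;
        break;
    }
}
```

With this in place, a table entry such as { Q0, QNUL, SPECIAL(NUL_CHAR) } marks the NUL transition, and the marker is never dereferenced because the switch intercepts it before the default case.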
In my opinion, this would be a nice solution: it keeps the format of transitions simple while making it reasonably efficient to populate scanner_table (and in any case, that is a one-time operation).
The only gotcha is that e.g. NUL_CHAR is actually the pointer value 0x00000001 (and if I defined helpers like AtoZ, they would live at 0x00000002, etc.). In general, this should not be a problem, because it is highly unlikely that any usable data (and especially my const strings declared in transitions) would actually live at one of those addresses, but I would like to know if there is any way to be certain of it. Obviously, a null pointer (0) never points to a valid object.
Is there a C standard which defines a minimum valid address? If so, I would be able to ensure that any special "pointer constants" fall below this value. I skimmed C99 but didn't see anything related. If there's not a C standard, is there a Unix standard or a strong convention covering this?