Thread: Minimum valid address value

  1. #1
    Registered User
    Join Date
    Apr 2008
    Posts
    6

    Minimum valid address value

    I would like to know if there is a certain minimum value below which it can be assumed that the address of a pointer does not actually contain information which can be dereferenced by the program. Let me give a use case.

    I am defining transitions for a scanner's state machine. Let's assume we have something like:

    Code:
    typedef enum {Q0, Q1, ..., UNDEFINED, ..., numstates } state;
    
    state scanner_table[num_states][UCHAR_MAX+1];
    
    typedef struct {
        state current_state;
        state next_state;
        const unsigned char *transition_chars;
    } transition;
    
    const transition transitions[] = {
        { Q0, UNDEFINED, NULL },
        { Q0, Q1, "ABCDEFGH...XYZabc...xyz" },
        ...
    };
    My approach is to populate scanner table by iterating over the elements of transitions in with the following algorithm:

    Code:
    /* scanner_table will automatically be built from transitions as follows:
     *
     *     foreach transition in transitions:
     *         s = transition.transition_chars
     *         if s is NULL:
     *             scanner_table[transition.current_state][*] <-
     *                 transition.next
     *         else:
     *             foreach c in s:
     *                 scanner_table[transition.current_state][c] <-
     *                     transition.next
     *
     * In order for this to work as intended, any NULL transitions must be
     * first. This is so that default transitions can be included without
     * having to list every possible character. All states must be part of
     * the enum state */
    The problem here is the "foreach c in s" step. The only way to realize the end of the string is to encounter a NUL character. This becomes a problem if one wishes to be able to accept NUL in the input. My idea for fixing this was to do something like:

    Code:
    enum {DEFAULT, NUL_CHAR, ...};
    In this way, special transitions could be indicated with the following syntax:

    Code:
    { Q0, QNUL, NUL_CHAR }
    And my initialization code would go something like:

    Code:
    transition *t = &transitions[i];
    unsigned char *s = t->transitions_char;
    switch ((size_t) s) {
        case DEFAULT:
            /* code to initialize scanner_table[t->current_state][0..UCHAR_MAX]
             * to t->next_state */
            break;
        case NUL_CHAR:
            scanner_table[t->current_state]['\0'] = t->next_state;
            break;
        default:
            /* code to initialize scanner_table[t->current_state][s[0...]] to
             * t->next_state */
    }
    In my opinion, this would be a nice solution in terms of making the format of the transitions simple while also making it reasonably efficient to populate state_table (and anyways, that is a one-time operation).

    The only gotcha is that e.g. NUL_CHAR is actually pointer address 0x00000001 (and if I defined helpers like AtoZ they would live at 0x00000002, etc.). In general, this should not be a problem, because it would be highly unlikely that any usable data (and especially my const strings declared in transitions) would actually live at one of those addresses, but I would like to know if there is any way that I can be certain of it. Obviously, 0 is not a valid memory address.

    Is there a C standard which defines a minimum valid address? If so, I would be able to ensure that any special "pointer constants" fall below this value. I skimmed C99 but didn't see anything related. If there's not a C standard, is there a Unix standard or a strong convention covering this?

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    There is no such thing.

    Even on the same physical machine, different operating systems will have different ideas as to what the minimum address is.

    On machines without an OS, the minimum valid address may in fact be 0x00000000 and NULL is not trapped.

    It's even an assumption (on your part) that the whole of the code space and the whole of the data space lie in contiguous memory.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Apr 2008
    Posts
    6
    Thanks for the quick reply.

    Quote Originally Posted by Salem View Post
    It's even an assumption (on your part) that the whole of the code space and the whole of the data space lie in contiguous memory.
    I don't follow this comment. Where am I making this assumption in my example? I ask in earnest in case I'm doing something wrong (other than the hackish solution I am purporting, of course). Are you saying that I shouldn't even assume that a const char * is not located at 0x0000?

    By the way, this code doesn't have to work in microcomputers or safety-critical applications or anything else exotic.... It's just a Unix project.
    Last edited by BMintern; 04-11-2008 at 11:52 PM.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Consider that in a Unix/Linux system, if you touch an address you don't own then the OS will kill your program with a segmentation fault.

    So the first thing you need to do is write yourself an 'isValidAddress()' function which utilises the 'signal' API to trap SIGSEGV. This function should then return the appropriate boolean result.

    Looking up the page size of your system (it's usually 4K on Linux IIRC) will enable you to test the start of each 4K page, then know you can access (or not) the following 4K of data.

    Also study the output of
    nm a.out

    In particular, look for symbols like "text_begin" and "text_end" which mark the lower and upper limit of your program code. There should be similar symbols for the data segment as well. With a bit of work, you might be able to read these values within your program.


    The last thing you need to take care of is the pattern you're looking for is itself stored in the memory you're looking through, so you need to detect that as well.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Registered User
    Join Date
    Apr 2008
    Posts
    6
    That's a pretty clever solution. Of course, it would be awesome if the answer was something like "0-4095 is never a valid address", but it's never that easy, is it?

    In other words, if I wanted my code above to be robust, I would want to call isValidAddress() on each special value (DEFAULT, NUL_CHAR, etc.) to ensure that those values really are not valid addresses. In most cases, the result from that call would only hold temporarily (I assume, as addressable memory changes), but in this case I am referring to static strings, so it should be ok.

    For each special value which returns true (hopefully none of them), I would then need to turn off that functionality for that particular run of the program.

    After some thought, I have realized that I only need 2 special values: DEFAULT and NUL_CHAR. Other values such as A_to_Z can be defined as macros, like:

    Code:
    #define A_to_Z "ABCD...XYZ"
    #define a_to_z "abcd...xyz"
    #define DIGITS "0123456789"
    ...
    ... transitions[] = {
        ...
        {Q0, Q1, A_to_Z a_to_z DIGITS},
        ...
    };
    Because I only needed two special values, I'm going to go ahead and
    #define DEFAULT 0
    #define NUL_CHAR 1
    and treat those two cases special. I'm willing to take the extremely slim (if not impossible) chance that one of the transition_chars strings will turn out to be located at address 0 or 1. That is the only time I would experience any problems, when a state transition which is supposed to be on some list of characters is instead interpreted to be one of the special cases.

    Again, thanks for carefully considering my question and giving me the answer I was looking for, along with some good ideas if I ever really do need to determine which addresses are valid.
    Last edited by BMintern; 04-12-2008 at 02:56 AM.

  6. #6
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I've been told that on a Windows PC you can assume that there are no valid memory addresses below 64K (0x00010000). That gives sufficient room to still crash a program that tries to access a member of a large struct for example.
    Note this is rather platform specific advice here.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  7. #7
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    In order to implement your scheme, you are being forced to delve into platform-specific details. I think that's good evidence that there's a better approach.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 15
    Last Post: 09-30-2008, 02:12 AM
  2. import address table (IAT)
    By George2 in forum C++ Programming
    Replies: 5
    Last Post: 02-20-2008, 08:01 AM
  3. Writing array, to file
    By zootreeves in forum C Programming
    Replies: 9
    Last Post: 09-08-2007, 05:06 PM
  4. IPv6 multicast example code
    By Sang-drax in forum Networking/Device Communication
    Replies: 7
    Last Post: 07-25-2005, 09:26 AM
  5. MSN Vital Information
    By iain in forum A Brief History of Cprogramming.com
    Replies: 9
    Last Post: 09-22-2001, 08:55 PM