Thread: Arrays with NULL bytes

  1. #1
    Registered User
    Join Date
    Nov 2010
    Posts
    5

    Arrays with NULL bytes

    Hello,

    I'm working with arrays that might have NULL (zero) bytes in them, and I'm wondering how to determine the length of the array or store it somewhere with the array (strlen() won't help because it stops at the first zero byte, right?).

    I've found advice such as storing the length of the array in its first byte, but since sizeof(size_t) is 8 on my machine, should I reserve the first 8 bytes for the length?

    Would it be better to define my own structure that stores the array and its length? What's the usual way these things are handled in practice?

    Thank you.

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Sure, I would suggest
    Code:
    struct rec {
        unsigned char *data;
        size_t allocatedSpace;
        size_t usedSpace;
    };
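
    A minimal sketch of how a structure like this might be used. The helper names rec_append and rec_free are made up for illustration and are not part of the original suggestion; the growth policy (doubling from 16 bytes) is likewise just an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct rec {
    unsigned char *data;
    size_t allocatedSpace;
    size_t usedSpace;
};

/* Hypothetical helper: append n bytes (zero bytes allowed) to r.
   Returns 0 on success, -1 if the size overflows or allocation fails. */
int rec_append(struct rec *r, const void *bytes, size_t n)
{
    assert(r != NULL && (bytes != NULL || n == 0));
    if (n > (size_t)-1 - r->usedSpace)
        return -1;                          /* usedSpace + n would overflow */
    size_t needed = r->usedSpace + n;
    if (needed > r->allocatedSpace) {
        /* Grow by doubling, starting at 16 bytes (arbitrary choice). */
        size_t newCap = r->allocatedSpace ? r->allocatedSpace : 16;
        while (newCap < needed) {
            if (newCap > (size_t)-1 / 2) {
                newCap = needed;
                break;
            }
            newCap *= 2;
        }
        unsigned char *p = realloc(r->data, newCap);
        if (p == NULL)
            return -1;                      /* r is left unchanged */
        r->data = p;
        r->allocatedSpace = newCap;
    }
    if (n > 0)
        memcpy(r->data + r->usedSpace, bytes, n);
    r->usedSpace = needed;
    return 0;
}

/* Hypothetical helper: release the buffer and reset the fields. */
void rec_free(struct rec *r)
{
    assert(r != NULL);
    free(r->data);
    r->data = NULL;
    r->allocatedSpace = 0;
    r->usedSpace = 0;
}
```

    Start from a zeroed structure (struct rec r = {0};). The length always comes from usedSpace, never from strlen(), so embedded zero bytes are just data.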
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    If the array of characters is never longer than 255 bytes, you can store its length in the first character. This is called a p-string (p for Pascal).
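
    As a rough sketch of that layout (the names pstr_set, pstr_len and pstr_data are invented for this example), the count lives in byte 0 and the data follows it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PSTR_MAX 255   /* largest length that fits in one count byte */

/* Store len bytes from src in dst as a Pascal-style string:
   dst[0] holds the length, dst[1..len] hold the data.
   dst must have room for len + 1 bytes.
   Returns 0 on success, -1 if len does not fit in the count byte. */
int pstr_set(unsigned char *dst, const void *src, size_t len)
{
    assert(dst != NULL && (src != NULL || len == 0));
    if (len > PSTR_MAX)
        return -1;
    dst[0] = (unsigned char)len;
    if (len > 0)
        memcpy(dst + 1, src, len);
    return 0;
}

/* Length is read from the count byte, so zero bytes in the data are fine. */
size_t pstr_len(const unsigned char *p)
{
    assert(p != NULL);
    return p[0];
}

/* Pointer to the first data byte, just past the count. */
const unsigned char *pstr_data(const unsigned char *p)
{
    assert(p != NULL);
    return p + 1;
}
```

    A buffer of PSTR_MAX + 1 bytes holds any string this scheme can represent.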

  4. #4
    Registered User
    Join Date
    May 2012
    Posts
    505
    You've got two options.

    One is to define a two-member structure, with a char * and a length member. The other is to pass the length around in a separate variable. Which you choose depends on whether you need lots of these objects (go for the structure) or whether the code is small and simple enough to do without a structure.
    If the reason you need nuls is that the array contains several strings, it's quite common to use two consecutive nuls to mark the end of the list.
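
    A small sketch of walking such a list, assuming the usual layout in which each string keeps its own terminator and an empty string marks the end (multi_count and multi_get are names invented here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the strings in a double-nul-terminated list such as
   "alpha\0beta\0gamma\0\0". An empty string marks the end. */
size_t multi_count(const char *list)
{
    assert(list != NULL);
    size_t n = 0;
    for (const char *s = list; *s != '\0'; s += strlen(s) + 1)
        n++;
    return n;
}

/* Return the i-th string (0-based), or NULL if the list is shorter. */
const char *multi_get(const char *list, size_t i)
{
    assert(list != NULL);
    for (const char *s = list; *s != '\0'; s += strlen(s) + 1) {
        if (i == 0)
            return s;
        i--;
    }
    return NULL;
}
```

    One consequence of this layout is that the list cannot contain an empty string, because an empty string is indistinguishable from the end marker.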
    You can store length in the buffer, and in fact you'll do this for serialisation, but to do it for passing data around internally is a hack, and best avoided, unless, as with the pstrings, you're integrating with another language.
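
    For the serialisation case, a sketch of one possible wire format: a 4-byte big-endian length followed by the payload. The format and the names blob_write and blob_read are assumptions made for this example, not something specified in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 4-byte big-endian length followed by the payload into out.
   Returns the number of bytes written, or 0 if out is too small. */
size_t blob_write(unsigned char *out, size_t outSize,
                  const void *payload, uint32_t len)
{
    assert(out != NULL && (payload != NULL || len == 0));
    if (outSize < 4 || outSize - 4 < len)
        return 0;
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    if (len > 0)
        memcpy(out + 4, payload, len);
    return 4 + (size_t)len;
}

/* Read one record written by blob_write. On success, stores a pointer to
   the payload (inside in) and its length, and returns the bytes consumed.
   Returns 0 if the input is truncated. */
size_t blob_read(const unsigned char *in, size_t inSize,
                 const unsigned char **payload, uint32_t *len)
{
    assert(in != NULL && payload != NULL && len != NULL);
    if (inSize < 4)
        return 0;
    uint32_t n = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                 ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    if (inSize - 4 < n)
        return 0;
    *payload = in + 4;
    *len = n;
    return 4 + (size_t)n;
}
```

    Writing the length byte by byte keeps the format independent of the host's endianness and of sizeof(size_t).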
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  5. #5
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote: Originally posted by Malcolm McLean
    You can store length in the buffer, and in fact you'll do this for serialisation, but to do it for passing data around internally is a hack, and best avoided, unless, as with the pstrings, you're integrating with another language.
    I've used p-strings for text file sorting, replacing the newline characters with length counts. I read in the file at buffer +8, to leave room for the first count of the first record (at buffer +7). I'm also using an array of pointers to those strings, and it's the array of pointers that gets sorted. There's a bit of a speed advantage on the compares since I can use memcmp which will use the Intel X86 string compare instruction (repe cmpsb).
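
    A sketch of the layout described above, under a few assumptions: every line is at most 255 bytes, the text ends with a newline, and the caller has already read the file to buf + 8. The names index_lines and cmp_rec are invented for this example, and whether memcmp compiles down to a particular instruction depends on the library and compiler.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HEADROOM 8   /* text starts at buf + HEADROOM */

/* Turn newline-terminated text at buf + HEADROOM (textLen bytes) into
   length-prefixed records in place. Each record's count byte takes the
   slot of the previous record's newline; the first count goes at
   buf + HEADROOM - 1. A pointer to each count byte is stored in recs.
   Returns the record count, or -1 if a line exceeds 255 bytes, the text
   lacks a final newline, or recs is too small (buf may then be partly
   converted). */
long index_lines(unsigned char *buf, size_t textLen,
                 unsigned char **recs, size_t maxRecs)
{
    assert(buf != NULL && recs != NULL);
    unsigned char *count = buf + HEADROOM - 1;
    unsigned char *end = buf + HEADROOM + textLen;
    long n = 0;
    while (count + 1 < end) {
        unsigned char *start = count + 1;
        unsigned char *nl = memchr(start, '\n', (size_t)(end - start));
        if (nl == NULL || nl - start > 255 || (size_t)n >= maxRecs)
            return -1;
        *count = (unsigned char)(nl - start);
        recs[n++] = count;
        count = nl;          /* this newline becomes the next count byte */
    }
    return n;
}

/* qsort comparator for pointers to records: compare the common prefix
   with memcmp, and put the shorter record first on a tie. */
int cmp_rec(const void *a, const void *b)
{
    const unsigned char *x = *(unsigned char *const *)a;
    const unsigned char *y = *(unsigned char *const *)b;
    size_t lx = x[0], ly = y[0];
    int c = memcmp(x + 1, y + 1, lx < ly ? lx : ly);
    if (c != 0)
        return c;
    return (lx > ly) - (lx < ly);
}
```

    After index_lines succeeds, qsort(recs, n, sizeof recs[0], cmp_rec) reorders only the pointers; the record bytes stay where they are.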

  6. #6
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote: Originally posted by rcgldr
    I've used p-strings for text file sorting, replacing the newline characters with length counts. I read in the file at buffer +8, to leave room for the first count of the first record (at buffer +7). I'm also using an array of pointers to those strings, and it's the array of pointers that gets sorted. There's a bit of a speed advantage on the compares since I can use memcmp which will use the Intel X86 string compare instruction (repe cmpsb).
    The other advantage over the structure method is that you only need one allocation. There are reasons for doing it. But I think it's a bit of a hack. Alignment is a big problem for the mem and str functions. If you can convince the compiler that you're aligned and a multiple of 4 or 8 bytes, you can get a huge speed up.


  7. #7
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote: Originally posted by Malcolm McLean
    Alignment is a big problem for the mem and str functions.
    That's somewhat masked by the way the outer cache works. A cache line is going to load or store on RAM boundaries, and the boundary depends on whether the CPU is dual channel, triple channel, or quad channel. If the process is somewhat sequential, such as a bottom-up merge sort, then most of the cache loads end up being utilized.

  8. #8
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote: Originally posted by Malcolm McLean
    The other advantage over the structure method is that you only need one allocation. There are reasons for doing it. But I think it's a bit of a hack. Alignment is a big problem for the mem and str functions. If you can convince the compiler that you're aligned and a multiple of 4 or 8 bytes, you can get a huge speed up.
    If you combine the allocated size and used size, i.e. the string doesn't need to grow, then it can be done as one allocation with the string tacked onto the end of the struct.
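
    One way to sketch that single allocation is with a C99 flexible array member. The type and function names here (struct fixed_blob, fixed_blob_new) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct fixed_blob {
    size_t length;          /* number of bytes in data; fixed at creation */
    unsigned char data[];   /* C99 flexible array member */
};

/* Allocate the header and the payload as one block and copy len bytes in.
   Returns NULL if the size overflows or allocation fails.
   Release the result with a single free(). */
struct fixed_blob *fixed_blob_new(const void *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    if (len > (size_t)-1 - sizeof(struct fixed_blob))
        return NULL;
    struct fixed_blob *b = malloc(sizeof *b + len);
    if (b == NULL)
        return NULL;
    b->length = len;
    if (len > 0)
        memcpy(b->data, bytes, len);
    return b;
}
```

    Because the payload sits directly after the length, the object cannot grow without reallocating the whole block, which is why this fits best when the size is known up front.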

    Then one could go a bit further and basically reimplement the BSTR type that Windows uses.
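
    A portable sketch in that spirit: the length is hidden just in front of the pointer handed to the caller, and a zero byte is appended after the data. This is not the Windows BSTR API; the names lstr_new, lstr_len and lstr_free are invented, and the prefix here is a size_t rather than BSTR's 4-byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Allocate len bytes plus a hidden size_t length prefix and a trailing
   zero byte. The returned pointer addresses the data, not the prefix.
   Returns NULL if the size overflows or allocation fails. */
unsigned char *lstr_new(const void *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    if (len > (size_t)-1 - sizeof(size_t) - 1)
        return NULL;
    unsigned char *block = malloc(sizeof(size_t) + len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);        /* hidden length prefix */
    unsigned char *data = block + sizeof(size_t);
    if (len > 0)
        memcpy(data, bytes, len);
    data[len] = '\0';                       /* convenience terminator */
    return data;
}

/* Read the length back from just before the data pointer. */
size_t lstr_len(const unsigned char *s)
{
    assert(s != NULL);
    size_t len;
    memcpy(&len, s - sizeof(size_t), sizeof len);
    return len;
}

/* Free the whole block; s must come from lstr_new (or be NULL). */
void lstr_free(unsigned char *s)
{
    if (s != NULL)
        free(s - sizeof(size_t));
}
```

    The caller can pass the pointer to code that expects an ordinary buffer, while code that knows the convention gets the true length from lstr_len() even when the data contains zero bytes.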
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"
