Thread: Pointing to a flexible array member (C99). How do we know the point in the array?

  1. #1
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357

    Pointing to a flexible array member (C99). How do we know the point in the array?

    Hello to all.

    I have this code :

    Code:
    /*********************************************************
     * From C PROGRAMMING: A MODERN APPROACH, Second Edition *
     * By K. N. King                                         *
     * Copyright (c) 2008, 1996 W. W. Norton & Company, Inc. *
     * All rights reserved.                                  *
     * This program may be freely distributed for class use, *
     * provided that this copyright notice is retained.      *
     *********************************************************/
    
    /* remind2.c (Chapter 17, page 418) */
    /* Prints a one-month reminder list (dynamic string version) */
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    
    #define MAX_REMIND 50   /* maximum number of reminders */
    #define MSG_LEN 120     /* max length of reminder message */
    
    struct vstring {
      int size;
      char s[];
    };
    
    int read_line(char str[], int n);
    
    int main(void)
    {
      struct vstring  *reminders[MAX_REMIND];
      char day_str[3], msg_str[MSG_LEN];
      int len, day, i, j, num_remind = 0;
    
      for (;;) {
        if (num_remind == MAX_REMIND) {
          printf("-- No space left --\n");
          break;
        }
    
        printf("Enter day and reminder: ");
        scanf("%2d", &day);
        if (day == 0)
          break;
        sprintf(day_str, "%2d", day);
    
        len = 3;  /* dd + space */
        len += read_line(msg_str, MSG_LEN);
    
        for (i = 0; i < num_remind; i++)
          if (strcmp(day_str, reminders[i]->s) < 0)
            break;
        for (j = num_remind; j > i; j--)
          reminders[j] = reminders[j-1];
    
        reminders[i] = malloc(2 + sizeof(struct vstring) + len);
        if (reminders[i] == NULL) {
          printf("-- No space left --\n");
          break;
        }
        reminders[i]->size = len;
    
        strcpy(reminders[i]->s, day_str);
        strcat(reminders[i]->s, " ");
        strcat(reminders[i]->s, msg_str);
    
        num_remind++;
      }
    
      printf("\nDay Reminder\n");
      for (i = 0; i < num_remind; i++)  {
        printf(" %.*s\n", reminders[i]->size, reminders[i]->s);
      }
    
      return 0;
    }
    
    int read_line(char str[], int n)
    {
      int ch, i = 0;
    
      while (isspace(ch = getchar()));
    
      while (ch != '\n' && ch != EOF) {
        if (i < n)
          str[i++] = ch;
        ch = getchar();
      }
      return i;
    }
    This is from here https://github.com/twcamper/c-progra...ng-reminders.c

    if someone wants the solutions of King's Book C programming a Modern Approach Second Edition. The above link in github is very helpful.

    I don't understand here :

    Code:
     strcpy(reminders[i]->s, day_str);
     strcat(reminders[i]->s, " ");
     strcat(reminders[i]->s, msg_str);
    We have a structure that is growing according to size of len variable that represents the number of characters of msg_str. read_line functions returns this length.

    What is not clear to me is the point that every pointer points in the s array. For example if I have the "20 June shopping" and "22 June Holidays" then reminders[0] points to "20 June shopping" and so forth . But what is the view of the memory? At the first point s array has 3 + 13 chars size and reminders[0] points to this string. At the second point s array has 13 + 13 = 26 and reminders[1] points to this string?

    How the reminder[i] "knows" where it points in s array? s is standart address is not changed.I am little confused.

    In the Wikipedia's example http://en.wikipedia.org/wiki/Flexible_array_member it is clear that every time vec pointer points to array[0] , array[1] and so forth. Instead of reminder code that I posted above.
    Last edited by Mr.Lnx; 05-08-2015 at 07:21 AM.

  2. #2
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Mr.Lnx View Post
    What is not clear to me is the point that every pointer points in the s array.
    No. There is no "the s array".

    Each element in the reminders array has their own member s , which happens to be a flexible array of char type.

    Consider a different example:
    Code:
    struct pair {
        int a;
        int b;
    };
    
    struct pair my_pairs[50];
    Now, my_pairs[0].b and my_pairs[1].b are obviously not the same, are they? If you modify one, the other does not change.

    In the code you showed, reminder is an array of pointers (to struct vstring). Each of them is dynamically allocated, to sizeof (struct vstring) + string_length + 1 to accommodate both the structure and the string contained in the flexible array member -- the +1 is for the end-of-string '\0'. (Well, actually the example uses +2, just to make sure there is room for one extra char, in addition to the end of string mark.)

    Assuming that reminder[0] and reminder[1] are valid pointers, and differ (point to different instances of struct vstring, for example separately allocated using malloc()), then it should be obvious that reminder[0]->s and reminder[1]->s point to different strings, just like reminder[0]->size and reminder[1]->size are different integers.

    In the loop, the loop variable i changes for each new reminder. Although within the loop body, reminder[i] does refer to the same pointer, the value of i changes from loop iteration to the next, and therefore reminder[i] refers to different pointers during different iterations of the loop body. Remember: i changes!

    The memory layout of the structure, according to C99 rules, is
    Code:
    int size;
    [ optional padding, if the compiler thinks it is necessary ]
    s[0]
    s[1]
    s[2]
    [ and so on ]
    Because the structure ends with a flexible array member, you cannot create an array of them (struct vstring doesnotwork[15]; for example), but must always use a pointer to refer to such structs. If you wanted to use a single malloc()'d area to hold more than one of them, it'd be difficult; the alignment required is platform and architecture-specific, and I don't think you can find it out using pure C99 alone. In practice, you'd skip bytes/chars until the pointer, converted to intptr_t, is a multiple of the required alignment.

    All that matters, really, is that the contents of the flexible array member immediately follow the other contents of the structure. Not counting any padding the compiler decides to insert between struct members, of course.

  3. #3
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    @Nominal Animal

    the correct version of read_line function I think is :

    Code:
    int read_line(char str[] , int n)
    {
    int ch , i =0 ;
    while( (ch= getchar()) !='\n' && i<n) 
              str[i++]= ch;
    str[i]='\0';
    return i;
    }
    In order to take the '\0' character. In addition the +2 in the malloc I think is for the day i.e day_str.
    Code:
     strcpy(reminders[i]->s, day_str);
    So according to your explanation if user gives 5 strings for this reminder program and assume that they have all 5 characters length for simplicity then the memory layout will be

    Code:
    int size // the length of the string
    [optional padding -wholes for alignment]
    s[0]
    s[1]
    s[2]
    s[3]
    s[4]
    s[5]
    and this * 5 times. Right?
    Last edited by Mr.Lnx; 05-08-2015 at 05:21 PM.

  4. #4
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Mr.Lnx View Post
    the correct version of read_line function I think is
    Well, there is a slight issue: it does not consider premature end of input (EOF), only a newline. Using while((ch=getchar()) != EOF && ch != '\n' && i<n) would fix that.

    Plus, I prefer getline() semantics. One way to write it yourself, with small changes, is
    Code:
    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>
    #include <ctype.h>
    #include <errno.h>
    
    size_t read_line(char **const lineptr, size_t *const sizeptr, FILE *const in)
    {
        char *line;
        size_t size, have;
        int c;
    
        if (!lineptr || !sizeptr || !in) {
            errno = EINVAL; /* Invalid parameters! */
            return 0;
        }
    
        if (*lineptr) {
            line = *lineptr;
            size = *sizeptr;
        } else {
            line = NULL;
            size = 0;
        }
        have = 0;
    
        while (1) {
    
            if (have + 2 >= size) {
                /* Grow size by a bit more, to avoid calling realloc too often.
                 * This one grows size to the next multiple of 128 from have.
                 * If you don't like it, just use
                 *     size = have + 256;
                 * or similar, that'll work just as well.
                 * Note that realloc(NULL, size) == malloc(size).
                */
                size = (have | 127) + 129;
                line = realloc(line, size);
                if (!line) {
                    errno = ENOMEM;
                    return (size_t)0;
                }
                *lineptr = line;
                *sizeptr = size;
            }
    
            c = getc(in);
            if (c == EOF || c == '\n')
                break;
    
            /* Note: We do not store newline in line. */
    
            /* Skip control characters, except whitespace. */
            if (iscntrl(c) && !isspace(c))
                continue;
    
            line[have++] = c;
        }
    
        /* I/O error? Those are rare, but possible. */
        if (ferror(in)) {
            errno = EIO;
            return 0;
        }
    
        /* Because we made sure we had room for
         * at least two characters, and one might have
         * taken by c, we still are guaranteed to have
         * room for the string-terminating '\0'. */
        line[have] = '\0';
    
        /* We always set errno, unlike getline(). */
        errno = 0;
        return have;
    }
    This has no line length limitations, and is very easy to use:
    Code:
        char  *line = NULL;
        size_t size = 0;
        size_t len;
    
        while (1) {
            len = read_line(&line, &size, stdin);
            if (errno) {
                fprintf(stderr, "Error reading from standard input: %s.\n", strerror(errno));
                return EXIT_FAILURE;
            }
    
            /* End of input? Note, we tried to read first. */
            if (len == 0 && feof(stdin))
                break;
    
            /* We have 'len' characters in line, no newline,
             * and no non-whitespace control characters
             * such as embedded '\0'. Do something with line,
             * but do not modify the pointer itself. */
    
        }
    
        /* Line buffer is no longer needed, so free it. */
        free(line); /* Note, free(NULL) is safe. */
        line = NULL;
        size = 0;
    In your use case, I'd read each user day and reminder as a line, then just parse the day out of the line using sscanf():
    Code:
        char *curr = line;
        int day, n;
    
        n = -1; (void)sscanf(curr, " %d %n", &day, &n);
        if (n > 0) {
            curr += n;
            have -= n;
    
            /* Day is in 'day', 'curr' has the rest of the line,
             * and 'have' is updated to the length left at 'curr'. */
    
        } else {
    
            /* Silly user, did not start the line with a number. */
    
        }
    Although sscanf() returns the number of successful conversions, the implementations differ on whether %n (which means "store the number of characters consumed thus far to an int") is included as a conversion or not. The above is a safe workaround.

    Whether you use this unlimited-line-length approach or not, is completely up to you; I'm sure you can see why it would be useful in real world applications. For learning purposes or an exercise, it might be just unnecessary added complexity. However, because it is so useful in practice, I wanted to show how to do it.

    Quote Originally Posted by Mr.Lnx View Post
    the memory layout will be
    Pretty much right, yes.

    There is optional padding at the end of each structure, because malloc() allocates in aligned chunks. (That padding just makes the size of the allocated memory chunk a multiple of that alignment.)

    Depending on how the C library implements memory allocation, there is often also metadata stored before each chunk. So, if we're really precise, the memory layout is likely to be something like
    Code:
    [ C library metadata for dynamically allocated chunk 1 ]
    size == 5
    [ possible padding between an int and a char ]
    s[0]
    s[1]
    s[2]
    s[3]
    s[4]
    s[5] == '\0'
    [ possible padding depending on the C library ]
    
    [ C library metadata for dynamically allocated chunk 2 ]
    size == 5
    [ possible padding between an int and a char ]
    s[0]
    s[1]
    s[2]
    s[3]
    s[4]
    s[5] == '\0'
    [ possible padding depending on the C library ]
    
    [ C library metadata for dynamically allocated chunk 3 ]
    size == 5
    [ possible padding between an int and a char ]
    s[0]
    s[1]
    s[2]
    s[3]
    s[4]
    s[5] == '\0'
    [ possible padding depending on the C library ]
    
    [ C library metadata for dynamically allocated chunk 4 ]
    size == 5
    [ possible padding between an int and a char ]
    s[0]
    s[1]
    s[2]
    s[3]
    s[4]
    s[5] == '\0'
    [ possible padding depending on the C library ]
    
    [ C library metadata for dynamically allocated chunk 5 ]
    size == 5
    [ possible padding between an int and a char ]
    s[0]
    s[1]
    s[2]
    s[3]
    s[4]
    s[5] == '\0'
    [ possible padding depending on the C library ]
    but the amount of padding depends on both the compiler (it selects the best one for the architecture) and the C library (as it chooses the alignment for the chunks), and the metadata depends on the C library malloc() implementation. And there is no guarantee the chunks are consecutive in memory, either.

    (Technically, most C libraries extend the uninitialized data segment using architecture-specific implementation of sbrk() for small allocations, and switch to memory-mapping for large allocations. This means that freeing a lot of small allocations is unlikely to release the memory back to the operating system, but freeing a large allocation is likely to be released back to the operating system. But we're getting well outside what the C standards tell us, into the practical details of most implementations, which are subject to differ and change.)

    Because the metadata often precedes the structure itself, writing to flexible array members at negative indexes or exceeding the amount of memory allocated to a previous structure in memory, garbles the metadata. Such bugs are often only seen when you attempt to free() the structures. The errors seen vary from C library to another, but any error at free() time is almost certainly an indication that you accidentally wrote to a dynamically allocated memory past the allocated amount. (Further complication is the allocation padding, as often you have something like one to 15 extra char's available in the structure, before you start overwriting the metadata in another dynamically allocated structure.)
    Anyway, it does explain why, if you see library errors or crashes at free(), the almost guaranteed culprit is a write out of bounds in one of your dynamically allocated structures. I've never seen any other reason, actually

  5. #5
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    @Nominal Thank you for the big explanation. I have some queries though

    I don't understand what is [ C library metadata for dynamically allocated chunk n ] for example is only a label for your example? or it is a part of the memory layout of the structure?
    what the metadata is? the [ possible padding depending on the C library] is a part in the structure? because It has to do with the pointer reminder[i].
    Last edited by Mr.Lnx; 05-09-2015 at 12:20 PM.

  6. #6
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    As the program begins. There is no something that you call "metadata" for dynamically allocated chunk 1 at the beginning because malloc hasn't called yet.
    Last edited by Mr.Lnx; 05-09-2015 at 03:00 PM.

  7. #7
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Mr.Lnx View Post
    I don't understand what is [ C library metadata for dynamically allocated chunk n ]
    When you allocate memory using malloc(), you specify a size. When you call free(), you do not specify a size, only the pointer malloc() returned to you. How do you think the C library knows how much was allocated for that pointer? How do you think the C library keeps track of which are in use, and which already freed? No such thing as magic; no such thing as "somehow".

    That data is C library metadata, the internal bookkeeping data the C library needs to keep track of the allocations. Using a separate arrays of pointers and sizes turns out not that efficient, and many C libraries are implemented to keep that bookkeeping metadata near the allocations themselves.

    One possible implementation is
    Code:
    struct chunk {
        struct chunk *next; /* Next chunk in a singly-linked list */
        size_t        size; /* Size of this chunk, not including these two fields */
    
        /* User data follows. For illustration: */
        union {
            double d;
            float f;
            long l;
            int i;
            time_t t;
            size_t s;
        } data;
    };
    Now, a malloc() call does not return a pointer to the above structure, but a pointer to the data member in it. (The above union is just one simplified example how a C library might ensure that the data member is suitably aligned for all standard data types. The C library does not actually care what type it is, or what data it contains, so typically you won't see any such unions at all; the C library will just manipulate the pointer directly.)

    Using the above, a C library could have two linked lists internally: one for chunks still in use, and one for unused or freed chunks. A malloc() call will check the unused chunk list to see if there is one of suitable size, and use that; otherwise it will ask more memory from the OS (kernel), and add two chunks describing the new memory: one to serve the malloc() call, and one to cover any leftover unused memory.

    When free() is called, the C library can verify that the pointer is valid -- it should point to a data member in one of the chunks still in use --, and if so, move the chunk to the unused list. If it is adjacent to another entry in the unused list, it is merged with that one.

    Let's assume a 32-bit architecture with 8 byte alignment. Your five structures might have the following layout in memory:
    Code:
    Offset │   Type  │ Description
    ═══════╪═════════╪═══════════════════════
        -8 │ chunk * │ Internal next pointer, points to -8 at next chunk
        -4 │ size_t  │ Internal size, here 16
      ─ ─ ─┼─ ─ ─ ─ ─┼─ ─ ─ ─ ─ ─ ─ ─
         0 │   int   │ size = 5
         4 │  char   │ s[0]
         5 │  char   │ s[1]
         6 │  char   │ s[2]
         7 │  char   │ s[3]
         8 │  char   │ s[4]
         9 │  char   │ s[5] = '\0'
        10 │  char   │ s[6] = '\0'
        11 │ padding │
        12 │ padding │
        13 │ padding │
        14 │ padding │
        15 │ padding │
    ═══════╪═════════╪═══════════════════════
        -8 │ chunk * │ Internal next pointer, points to -8 at next chunk
        -4 │ size_t  │ Internal size, here 16
      ─ ─ ─┼─ ─ ─ ─ ─┼─ ─ ─ ─ ─ ─ ─ ─
         0 │   int   │ size = 5
         4 │  char   │ s[0]
         5 │  char   │ s[1]
         6 │  char   │ s[2]
         7 │  char   │ s[3]
         8 │  char   │ s[4]
         9 │  char   │ s[5] = '\0'
        10 │  char   │ s[6] = '\0'
        11 │ padding │
        12 │ padding │
        13 │ padding │
        14 │ padding │
        15 │ padding │
    ═══════╪═════════╪═══════════════════════
        -8 │ chunk * │ Internal next pointer, points to -8 at next chunk
        -4 │ size_t  │ Internal size, here 16
      ─ ─ ─┼─ ─ ─ ─ ─┼─ ─ ─ ─ ─ ─ ─ ─
         0 │   int   │ size = 5
         4 │  char   │ s[0]
         5 │  char   │ s[1]
         6 │  char   │ s[2]
         7 │  char   │ s[3]
         8 │  char   │ s[4]
         9 │  char   │ s[5] = '\0'
        10 │  char   │ s[6] = '\0'
        11 │ padding │
        12 │ padding │
        13 │ padding │
        14 │ padding │
        15 │ padding │
    ═══════╪═════════╪═══════════════════════
        -8 │ chunk * │ Internal next pointer, points to -8 at next chunk
        -4 │ size_t  │ Internal size, here 16
      ─ ─ ─┼─ ─ ─ ─ ─┼─ ─ ─ ─ ─ ─ ─ ─
         0 │   int   │ size = 5
         4 │  char   │ s[0]
         5 │  char   │ s[1]
         6 │  char   │ s[2]
         7 │  char   │ s[3]
         8 │  char   │ s[4]
         9 │  char   │ s[5] = '\0'
        10 │  char   │ s[6] = '\0'
        11 │ padding │
        12 │ padding │
        13 │ padding │
        14 │ padding │
        15 │ padding │
    ═══════╪═════════╪═══════════════════════
        -8 │ chunk * │ Internal next pointer, could be NULL
        -4 │ size_t  │ Internal size, here 16
      ─ ─ ─┼─ ─ ─ ─ ─┼─ ─ ─ ─ ─ ─ ─ ─
         0 │   int   │ size = 5
         4 │  char   │ s[0]
         5 │  char   │ s[1]
         6 │  char   │ s[2]
         7 │  char   │ s[3]
         8 │  char   │ s[4]
         9 │  char   │ s[5] = '\0'
        10 │  char   │ s[6] = '\0'
        11 │ padding │
        12 │ padding │
        13 │ padding │
        14 │ padding │
        15 │ padding │
    The zero offsets are where reminder[0], reminder[1], reminder[2], reminder[3], and reminder[4] would point to. The difference in successive pointers ((size_t)(reminder[4] - reminder[3])) would be 24, and the entire data above would fit in 120 bytes of memory.

    Now, the C standards do not say how the C library should do its job. There is no standard for how to track the memory, so the above is just one possibility among many. However, if the C library keeps track of dynamically allocated chunks using internal data fields before and/or after the memory chunk, those fields are the metadata I referred to. Just internal C library bookkeeping we must avoid mangling, but otherwise unknown and irrelevant to us.

    The reason the bookkeeping data is put before the user data, and not after, is realloc(). If you put the bookkeeping data after the user data, you'd have to move it every time it was realloc()'d. Putting it before the user data means it can stay put, unless the entire chunk is moved elsewhere.

    None of this is purely academic. At least in Linux, you can easily write your own malloc()/calloc()/realloc()/free() functions, using sbrk() or mmap() to ask more memory from the OS/kernel, just by writing those functions yourself. (The ELF binary format allows that, and the functions are specifically marked to allow that.)

    In most cases, you don't override them, but write your own interface. A typical example is implementing pool-based allocations, like the Apache web server uses internally. (Pool-based allocations are nothing weird, it just means that your allocations are logically linked, in "pools", so that instead of keeping track of each and every single allocation, you free an entire pool, not the individual allocations in it. When allocating a chunk of memory, you also name the "pool" the allocation should belong to, in addition to the size you need. For complex tasks where realloc() is rare, and you need many (smallish) allocations that all get released at the same time (when the task completes), pool-based allocations are a very good choice. Beats "automagic" garbage collection, for example.)

  8. #8
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    @Nominal ok we have a structure with members (one integer and seven characters) that have 11 byte size (assuming that int is 4 and each character 1 byte) so 16byte size and 8byte alignment. (5 bytes padding) I understood what the metadata is. I tried to catch the general meaning of your text because is very big and advanced!!!!

    Last question the internal next pointer chunk* is different from reminder[i] pointer isn't it? maybe a silly question :P
    Internal chunk* points to metadata each time the structure used?

    Thank you very much for your time.
    Last edited by Mr.Lnx; 05-09-2015 at 06:09 PM.

  9. #9
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Mr.Lnx View Post
    ok
    Yes, that was what I meant. (Sorry for being so long-winded; I can't help it.)

    Quote Originally Posted by Mr.Lnx View Post
    Internal chunk* points to metadata each time the structure used?
    Yes. It is only used by the C library, and it points to the other chunk* pointers, making a linked list.

    The application never sees them, and they are different to the reminder[] pointers.

    The reminder[] pointers in your application would point to the offset 0 rows in the table (8 bytes after the location of a chunk *)s. Your application doesn't see the metadata/bookkeeping details, because they reside before the pointer we get from malloc().

    If you are interested in the details, check out Doug Lea's malloc, or dlalloc(). It uses a different metadata format -- a size/status word before the user data, and a size word after it. It also uses 128 separate lists for unused chunks of different sizes to make allocation much faster and more efficient. Many C libraries use some derivative of dlmalloc(), but it is a bit more complicated than my example above. GNU glibc uses a derivative, ptmalloc(), which is extended to work better with threaded programs.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Flexible array members C99
    By Mr.Lnx in forum C Programming
    Replies: 7
    Last Post: 07-29-2013, 12:38 PM
  2. array type flexible array?
    By AaronP in forum C Programming
    Replies: 12
    Last Post: 12-12-2011, 09:21 PM
  3. memory allocation for flexible array member of struct
    By jcarouth in forum C Programming
    Replies: 3
    Last Post: 09-11-2005, 12:19 PM
  4. flexible array member not at end of struct
    By krappa in forum C Programming
    Replies: 13
    Last Post: 05-14-2004, 01:36 PM
  5. Replies: 5
    Last Post: 09-10-2003, 09:30 PM