Thread: using realloc for a dynamically growing array

  1. #1
    Registered User
    Join Date
    Apr 2008
    Posts
    87

    using realloc for a dynamically growing array

    Hi, I'm trying to write a program that will create a dynamically growing array. There is a parent array and from this I want to create a seperate array with elements that are less than <= 166 (some random number). For some reason, this code is working for smaller values like 5 or 6 but failed when i entered parent_size = 23.


    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(void)
    {
    
        int  *parent;
        int *child = NULL, *tempstore;
        size_t child_size, i, parent_size;
    
        printf("Enter size of parent array\n");
        if(scanf("&#37;d", &parent_size) != 1)
          return (1);
    
        parent = malloc(sizeof(int) * parent_size);
        if(parent == NULL)
        { 
            fprintf(stderr, "Memory allocation failed\n");
            return (1);
        }
    
        for(i = 0; i < parent_size; i++)
        {
           printf("Enter element %d:", i);
           if(scanf("%d", &parent[i]) != 1)
              return (1);
           printf("\n");
        }
       
        child_size = 0;
    
        for(i = 0; i < parent_size; i++)
        {
           if(parent[i] <= 166)
           {
              tempstore = realloc(child, ++child_size);
              if(tempstore == NULL)
              {
                  fprintf(stderr, "Memory reallocation failed\n");
                  return (1);
              }
               tempstore[child_size - 1] = parent[i];
               child = tempstore;
           }
       }
    
        for(i = 0; i < child_size; i++)
          printf("%d\n", child[i]);
    
        return (0);
    }

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Your realloc call isn't multiplying by sizeof(int), so you are only allocating child_size number of bytes, not space for child_size number of integers, which is what the rest of the code expects.

    In general, I wouldn't call realloc every time you have a new item to put in the second array, since it essentially means that you copy all the data every time you realloc [1] - two possible solutions:
    1. Walk the entire array and count the number of child entries.
    2. Have a separate size for the allocated size, and double that each time it reaches it's limit. That means you only do log2(n) allocations+copies, rather than (n) allocation/copy calls.

    I don't see anything else directly wrong with your code.

    [1] We do assume that realloc is stupid here - it may be that SOMETIMES you get the same pointer back. But eventually, if you keep doing it, you will grow the allocation by only a small amount and then having to copy it each time it outgrows it's space. [You could test that by comparing tempstore with child and increment a counter each time they are different - compare that with the child_size.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    The larch
    Join Date
    May 2006
    Posts
    3,573
    The problem is probably that you are asking for child_size bytes, not for room for child_size integers.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  4. #4
    Registered User
    Join Date
    Apr 2008
    Posts
    87
    Quote Originally Posted by matsp View Post
    Your realloc call isn't multiplying by sizeof(int), so you are only allocating child_size number of bytes, not space for child_size number of integers, which is what the rest of the code expects.

    In general, I wouldn't call realloc every time you have a new item to put in the second array, since it essentially means that you copy all the data every time you realloc [1] - two possible solutions:
    1. Walk the entire array and count the number of child entries.
    2. Have a separate size for the allocated size, and double that each time it reaches it's limit. That means you only do log2(n) allocations+copies, rather than (n) allocation/copy calls.

    I don't see anything else directly wrong with your code.

    [1] We do assume that realloc is stupid here - it may be that SOMETIMES you get the same pointer back. But eventually, if you keep doing it, you will grow the allocation by only a small amount and then having to copy it each time it outgrows it's space. [You could test that by comparing tempstore with child and increment a counter each time they are different - compare that with the child_size.
    Thanks. I have tried the approach 1 i.e. counting the number of child entries using the condition, allocating memory using malloc function, and finally storing the elements in child array by once again running through the parent array and checking for condition. If I understand correctly, that's O(n^2) time complexity. What I posted is just a sample code. In my actual project, there can be as many as 10,000,000 entries in the parent array. Do you think going through 10,000,000 entries twice is a better idea than using realloc ?

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    I have tried the approach 1 i.e. counting the number of child entries using the condition, allocating memory using malloc function, and finally storing the elements in child array by once again running through the parent array and checking for condition. If I understand correctly, that's O(n^2) time complexity.
    You are only making two passes, so it is still linear time, not quadratic time.

    In my actual project, there can be as many as 10,000,000 entries in the parent array. Do you think going through 10,000,000 entries twice is a better idea than using realloc ?
    Are you able to estimate how large the new array will be? If you are, then perhaps you could malloc() that size, then expand if necessary with realloc(). On the other hand, if you are very concerned about space, then the two pass solution would be appropriate.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #6
    Registered User
    Join Date
    Apr 2008
    Posts
    87
    Quote Originally Posted by laserlight View Post
    You are only making two passes, so it is still linear time, not quadratic time..
    ok..agree

    Quote Originally Posted by laserlight View Post
    Are you able to estimate how large the new array will be? If you are, then perhaps you could malloc() that size, then expand if necessary with realloc().
    No, there is no proper estimates.


    Quote Originally Posted by laserlight View Post
    On the other hand, if you are very concerned about space, then the two pass solution would be appropriate.
    ok, but what is wrong with just realloc option ? Is it because its not a very portable option to pass a null pointer as a parameter to realloc function. I've read that only ANSI C allows this.

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    ok, but what is wrong with just realloc option ?
    Nothing, except that you may waste some space since the capacity (amount allocated) will probably be larger than the size (amount in use).

    Is it because its not a very portable option to pass a null pointer as a parameter to realloc function. I've read that only ANSI C allows this.
    C99 states that realloc() behaves like malloc() if a null pointer is passed to it. If C90 states the same, then this would be portable without doubt.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    There is nothing WRONG with the realloc() as such, but assuming that you have 10M entries, and 10% of those are > 166, then you would call realloc 1M times. With my suggested doubling method, it would call realloc 20 times - that's a whole lot better, because compared to your test and insert functions, the realloc probably takes 10-100x the amount of time.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  9. #9
    Registered User
    Join Date
    Apr 2008
    Posts
    87
    Quote Originally Posted by matsp View Post
    There is nothing WRONG with the realloc() as such, but assuming that you have 10M entries, and 10% of those are > 166, then you would call realloc 1M times. With my suggested doubling method, it would call realloc 20 times - that's a whole lot better, because compared to your test and insert functions, the realloc probably takes 10-100x the amount of time.
    but wouldn't doubling method lead to wastage of memory.

    Suppose you have allocated space for 6,000,000 and you need 8,000,000 in total then if the allocated size is doubled it means new array size is 12,000,000 which causes a lot of space to be wasted and that memory cannot be freed either.

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by broli86 View Post
    but wouldn't doubling method lead to wastage of memory.

    Suppose you have allocated space for 6,000,000 and you need 8,000,000 in total then if the allocated size is doubled it means new array size is 12,000,000 which causes a lot of space to be wasted and that memory cannot be freed either.
    Yes it would. Almost all optimizations are a compromise between space and time. If you want to allocate the exact number, then I would recommend passing through the array twice and count the number needed. Acdtually, you can realloc() to the actual size at the end to reduce it's size. Not all realloc() implementations will always actually reduce the size of the allocation tho'.

    There are other options: You could start with a small number, e.g. 10 or 100 elements. When you run out of space, you calculate how much of the total you have used up, and calculate how much you will need for the whole set (e.g, if you run out of 100 elements after 1100 items in the original array and the original size is 10M, then you need to have 10M/(1100/100) elements to cope with the whole array). Of course, this will not work if your selected elements are not distributed reasonably evenly (say all items are at the very beginning of the array, so after you've done 100 items, you have found 100 selected items, and there are only another 20 items, you'd have a lot of wastage).

    Another possibility is that you use a hybrid technique, where you double each time you run off the edge until you have reached the quarter or halfway point of the bigger array, and then use the above method for further extensions. Of course, this will still fail in the same way as the orignal double if all the items you want to keep are in the initial section.

    You could also reallocate with a smaller factor than 2, say 1.5 or 1.25 - you would then do logX(n) reallocations, so if you need 1M elements in your array, you would do log(n)/log(1.25) or log(n)/log(1.5) reallocations. You would still waste some space, but not quite as much.

    I personally don't see wasting some memory as such a big issue - but it does depend on how many of these child arrays you would have, and how much of the total memory would be wasted.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  11. #11
    Registered User
    Join Date
    Apr 2008
    Posts
    87
    Quote Originally Posted by matsp View Post
    Yes it would. Almost all optimizations are a compromise between space and time. If you want to allocate the exact number, then I would recommend passing through the array twice and count the number needed. Acdtually, you can realloc() to the actual size at the end to reduce it's size. Not all realloc() implementations will always actually reduce the size of the allocation tho'.

    There are other options: You could start with a small number, e.g. 10 or 100 elements. When you run out of space, you calculate how much of the total you have used up, and calculate how much you will need for the whole set (e.g, if you run out of 100 elements after 1100 items in the original array and the original size is 10M, then you need to have 10M/(1100/100) elements to cope with the whole array). Of course, this will not work if your selected elements are not distributed reasonably evenly (say all items are at the very beginning of the array, so after you've done 100 items, you have found 100 selected items, and there are only another 20 items, you'd have a lot of wastage).

    Another possibility is that you use a hybrid technique, where you double each time you run off the edge until you have reached the quarter or halfway point of the bigger array, and then use the above method for further extensions. Of course, this will still fail in the same way as the orignal double if all the items you want to keep are in the initial section.

    You could also reallocate with a smaller factor than 2, say 1.5 or 1.25 - you would then do logX(n) reallocations, so if you need 1M elements in your array, you would do log(n)/log(1.25) or log(n)/log(1.5) reallocations. You would still waste some space, but not quite as much.

    I personally don't see wasting some memory as such a big issue - but it does depend on how many of these child arrays you would have, and how much of the total memory would be wasted.

    --
    Mats


    hi, I checked up the requirements and the array is to be split into two child arrays. The array has to be split at the median value such that the left array contains all elements <= median and right array contains elements >= median. A binary tree of such arrays will be built so its likely some space *might* be wasted. For eg.

    1 2 3 3 4 5 6

    median is 3

    the left array is

    1 2 3 3

    right array is

    3 3 4 5 6

    Based on some tests that I have conducted on a program that i wrote, I got following result :

    number of elements in parent = 7778
    number of elements in left child = 3961
    number of elements in right child = 3961

    Splitting left child,
    number of elements in left child = 2017
    number of elements in right child = 2017

    Splitting right child,
    number of elements in left child = 2017
    number of elements in right child = 2017


    Because we split at the median , it is bound to happen that both child arrays size is a little more than half of the parent array's size. Now I tried with a even bigger data size :

    number of elements in parent = 249698
    number of elements in left child = 125257
    number of elements in right child = 125257

    Splitting left child,
    number of elements in left child = 62833
    number of elements in right child = 62833

    Splitting right child,
    number of elements in left child = 62833
    number of elements in right child = 62833

    This actually provides a pretty good estimate to the size of child array: n/2 and from there on we can increase the size by calling realloc a few times. This will be still slower but definetely an improvement. Speed is probably not as much concern to me right now.
    Last edited by broli86; 06-27-2008 at 08:55 AM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. File I/O problem for dynamically allocated struct array
    By veecee in forum C++ Programming
    Replies: 2
    Last Post: 05-05-2006, 09:28 PM
  2. Merge sort please
    By vasanth in forum C Programming
    Replies: 2
    Last Post: 11-09-2003, 12:09 PM
  3. Hi, could someone help me with arrays?
    By goodn in forum C Programming
    Replies: 20
    Last Post: 10-18-2001, 09:48 AM
  4. Replies: 4
    Last Post: 09-12-2001, 02:05 PM