using realloc for a dynamically growing array

**broli86** · 06-27-2008

Hi, I'm trying to write a program that will create a dynamically growing array. There is a parent array and from this I want to create a seperate array with elements that are less than <= 166 (some random number). For some reason, this code is working for smaller values like 5 or 6 but failed when i entered parent_size = 23.

Code:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{

    int  *parent;
    int *child = NULL, *tempstore;
    size_t child_size, i, parent_size;

    printf("Enter size of parent array\n");
    if(scanf("&#37;d", &parent_size) != 1)
      return (1);

    parent = malloc(sizeof(int) * parent_size);
    if(parent == NULL)
    { 
        fprintf(stderr, "Memory allocation failed\n");
        return (1);
    }

    for(i = 0; i < parent_size; i++)
    {
       printf("Enter element %d:", i);
       if(scanf("%d", &parent[i]) != 1)
          return (1);
       printf("\n");
    }
   
    child_size = 0;

    for(i = 0; i < parent_size; i++)
    {
       if(parent[i] <= 166)
       {
          tempstore = realloc(child, ++child_size);
          if(tempstore == NULL)
          {
              fprintf(stderr, "Memory reallocation failed\n");
              return (1);
          }
           tempstore[child_size - 1] = parent[i];
           child = tempstore;
       }
   }

    for(i = 0; i < child_size; i++)
      printf("%d\n", child[i]);

    return (0);
}

**matsp** · 06-27-2008

Your realloc call isn't multiplying by sizeof(int), so you are only allocating child_size number of bytes, not space for child_size number of integers, which is what the rest of the code expects.

In general, I wouldn't call realloc every time you have a new item to put in the second array, since it essentially means that you copy all the data every time you realloc [1] - two possible solutions:
1. Walk the entire array and count the number of child entries.
2. Have a separate size for the allocated size, and double that each time it reaches it's limit. That means you only do log2(n) allocations+copies, rather than (n) allocation/copy calls.

I don't see anything else directly wrong with your code.

[1] We do assume that realloc is stupid here - it may be that SOMETIMES you get the same pointer back. But eventually, if you keep doing it, you will grow the allocation by only a small amount and then having to copy it each time it outgrows it's space. [You could test that by comparing tempstore with child and increment a counter each time they are different - compare that with the child_size.

--
Mats

**anon** · 06-27-2008

The problem is probably that you are asking for child_size bytes, not for room for child_size integers.

**broli86** · 06-27-2008

Originally Posted by matsp

Your realloc call isn't multiplying by sizeof(int), so you are only allocating child_size number of bytes, not space for child_size number of integers, which is what the rest of the code expects.

In general, I wouldn't call realloc every time you have a new item to put in the second array, since it essentially means that you copy all the data every time you realloc [1] - two possible solutions:
1. Walk the entire array and count the number of child entries.
2. Have a separate size for the allocated size, and double that each time it reaches it's limit. That means you only do log2(n) allocations+copies, rather than (n) allocation/copy calls.

I don't see anything else directly wrong with your code.

[1] We do assume that realloc is stupid here - it may be that SOMETIMES you get the same pointer back. But eventually, if you keep doing it, you will grow the allocation by only a small amount and then having to copy it each time it outgrows it's space. [You could test that by comparing tempstore with child and increment a counter each time they are different - compare that with the child_size.

Thanks. I have tried the approach 1 i.e. counting the number of child entries using the condition, allocating memory using malloc function, and finally storing the elements in child array by once again running through the parent array and checking for condition. If I understand correctly, that's O(n^2) time complexity. What I posted is just a sample code. In my actual project, there can be as many as 10,000,000 entries in the parent array. Do you think going through 10,000,000 entries twice is a better idea than using realloc ?

**laserlight** · 06-27-2008

I have tried the approach 1 i.e. counting the number of child entries using the condition, allocating memory using malloc function, and finally storing the elements in child array by once again running through the parent array and checking for condition. If I understand correctly, that's O(n^2) time complexity.

You are only making two passes, so it is still linear time, not quadratic time.

In my actual project, there can be as many as 10,000,000 entries in the parent array. Do you think going through 10,000,000 entries twice is a better idea than using realloc ?

Are you able to estimate how large the new array will be? If you are, then perhaps you could malloc() that size, then expand if necessary with realloc(). On the other hand, if you are very concerned about space, then the two pass solution would be appropriate.

**broli86** · 06-27-2008

Originally Posted by laserlight

You are only making two passes, so it is still linear time, not quadratic time..

ok..agree

Originally Posted by laserlight

Are you able to estimate how large the new array will be? If you are, then perhaps you could malloc() that size, then expand if necessary with realloc().

No, there is no proper estimates.

Originally Posted by laserlight

On the other hand, if you are very concerned about space, then the two pass solution would be appropriate.

ok, but what is wrong with just realloc option ? Is it because its not a very portable option to pass a null pointer as a parameter to realloc function. I've read that only ANSI C allows this.

**laserlight** · 06-27-2008

ok, but what is wrong with just realloc option ?

Nothing, except that you may waste some space since the capacity (amount allocated) will probably be larger than the size (amount in use).

Is it because its not a very portable option to pass a null pointer as a parameter to realloc function. I've read that only ANSI C allows this.

C99 states that realloc() behaves like malloc() if a null pointer is passed to it. If C90 states the same, then this would be portable without doubt.

**matsp** · 06-27-2008

There is nothing WRONG with the realloc() as such, but assuming that you have 10M entries, and 10% of those are > 166, then you would call realloc 1M times. With my suggested doubling method, it would call realloc 20 times - that's a whole lot better, because compared to your test and insert functions, the realloc probably takes 10-100x the amount of time.

--
Mats

**broli86** · 06-27-2008

Originally Posted by matsp

There is nothing WRONG with the realloc() as such, but assuming that you have 10M entries, and 10% of those are > 166, then you would call realloc 1M times. With my suggested doubling method, it would call realloc 20 times - that's a whole lot better, because compared to your test and insert functions, the realloc probably takes 10-100x the amount of time.

but wouldn't doubling method lead to wastage of memory.

Suppose you have allocated space for 6,000,000 and you need 8,000,000 in total then if the allocated size is doubled it means new array size is 12,000,000 which causes a lot of space to be wasted and that memory cannot be freed either.

**matsp** · 06-27-2008

Originally Posted by broli86

but wouldn't doubling method lead to wastage of memory.

Suppose you have allocated space for 6,000,000 and you need 8,000,000 in total then if the allocated size is doubled it means new array size is 12,000,000 which causes a lot of space to be wasted and that memory cannot be freed either.

Yes it would. Almost all optimizations are a compromise between space and time. If you want to allocate the exact number, then I would recommend passing through the array twice and count the number needed. Acdtually, you can realloc() to the actual size at the end to reduce it's size. Not all realloc() implementations will always actually reduce the size of the allocation tho'.

There are other options: You could start with a small number, e.g. 10 or 100 elements. When you run out of space, you calculate how much of the total you have used up, and calculate how much you will need for the whole set (e.g, if you run out of 100 elements after 1100 items in the original array and the original size is 10M, then you need to have 10M/(1100/100) elements to cope with the whole array). Of course, this will not work if your selected elements are not distributed reasonably evenly (say all items are at the very beginning of the array, so after you've done 100 items, you have found 100 selected items, and there are only another 20 items, you'd have a lot of wastage).

Another possibility is that you use a hybrid technique, where you double each time you run off the edge until you have reached the quarter or halfway point of the bigger array, and then use the above method for further extensions. Of course, this will still fail in the same way as the orignal double if all the items you want to keep are in the initial section.

You could also reallocate with a smaller factor than 2, say 1.5 or 1.25 - you would then do logX(n) reallocations, so if you need 1M elements in your array, you would do log(n)/log(1.25) or log(n)/log(1.5) reallocations. You would still waste some space, but not quite as much.

I personally don't see wasting some memory as such a big issue - but it does depend on how many of these child arrays you would have, and how much of the total memory would be wasted.

--
Mats

**broli86** · 06-27-2008

Originally Posted by matsp

Yes it would. Almost all optimizations are a compromise between space and time. If you want to allocate the exact number, then I would recommend passing through the array twice and count the number needed. Acdtually, you can realloc() to the actual size at the end to reduce it's size. Not all realloc() implementations will always actually reduce the size of the allocation tho'.

There are other options: You could start with a small number, e.g. 10 or 100 elements. When you run out of space, you calculate how much of the total you have used up, and calculate how much you will need for the whole set (e.g, if you run out of 100 elements after 1100 items in the original array and the original size is 10M, then you need to have 10M/(1100/100) elements to cope with the whole array). Of course, this will not work if your selected elements are not distributed reasonably evenly (say all items are at the very beginning of the array, so after you've done 100 items, you have found 100 selected items, and there are only another 20 items, you'd have a lot of wastage).

Another possibility is that you use a hybrid technique, where you double each time you run off the edge until you have reached the quarter or halfway point of the bigger array, and then use the above method for further extensions. Of course, this will still fail in the same way as the orignal double if all the items you want to keep are in the initial section.

You could also reallocate with a smaller factor than 2, say 1.5 or 1.25 - you would then do logX(n) reallocations, so if you need 1M elements in your array, you would do log(n)/log(1.25) or log(n)/log(1.5) reallocations. You would still waste some space, but not quite as much.

I personally don't see wasting some memory as such a big issue - but it does depend on how many of these child arrays you would have, and how much of the total memory would be wasted.

--
Mats

hi, I checked up the requirements and the array is to be split into two child arrays. The array has to be split at the median value such that the left array contains all elements <= median and right array contains elements >= median. A binary tree of such arrays will be built so its likely some space *might* be wasted. For eg.

1 2 3 3 4 5 6

median is 3

the left array is

1 2 3 3

right array is

3 3 4 5 6

Based on some tests that I have conducted on a program that i wrote, I got following result :

number of elements in parent = 7778
number of elements in left child = 3961
number of elements in right child = 3961

Splitting left child,
number of elements in left child = 2017
number of elements in right child = 2017

Splitting right child,
number of elements in left child = 2017
number of elements in right child = 2017

Because we split at the median , it is bound to happen that both child arrays size is a little more than half of the parent array's size. Now I tried with a even bigger data size :

number of elements in parent = 249698
number of elements in left child = 125257
number of elements in right child = 125257

Splitting left child,
number of elements in left child = 62833
number of elements in right child = 62833

Splitting right child,
number of elements in left child = 62833
number of elements in right child = 62833

This actually provides a pretty good estimate to the size of child array: n/2 and from there on we can increase the size by calling realloc a few times. This will be still slower but definetely an improvement. Speed is probably not as much concern to me right now.

Thread: using realloc for a dynamically growing array

Thread Tools

Search Thread

Display

using realloc for a dynamically growing array

Similar Threads

File I/O problem for dynamically allocated struct array

Merge sort please

Hi, could someone help me with arrays?

Array of pointers for dynamically allocating class instances