Dynamic array help

**Perogy** · 04-08-2009

Ok here's what I've got now, I have made some minor revisions to the structure and names of stuff so it might be a little hard to follow from the previous example, but nothing too major.

The main.c file

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "student.h"

#define BUFSIZE 256

int main()
{
	record *array = 0;
	record std;
	int number_of_elements = 0;
	int array_size = 0;
	
	char line[BUFSIZE];
	char first[21];
	char last[21];
	int score;
	
	/*get the user input line while not EOF*/
	while (fgets(line, BUFSIZE, stdin) != 0)
	{
		sscanf(line, "%s%s%d", last, first, &score);
		strcpy(std.name.lastname, last);
		strcpy(std.name.firstname, first);
		std.score = score;
		add(&array, &std, &number_of_elements, &array_size);
	}
	free(array);
	return 0;
}

The student.h header file (note I made a couple of changes to the structures)

Code:

#include <stdlib.h>

#ifndef STUDENT_C
#define STUDENT_C

#define NAMESIZE 21
typedef struct name
{
	char lastname[NAMESIZE];
	char firstname[NAMESIZE];
} name;

typedef struct record
{
	name name;
	int score;
} record;

void add(record**, record*, int*, int*);

#endif

and finally the student.c file which includes the dynamic array

Code:

#include <stdlib.h>
#include <stdio.h>
#include "student.h"

void add(record **array, record *item, int *number_of_elements, int *array_size)
{
	if (*number_of_elements == *array_size) /*resize array*/
	{
		if (*array_size == 0)
		{
			*array_size = 1;
		}
		else
		{
			*array_size = *array_size * 2;
		}
		*array = (record*)realloc(*array, ((*array_size) * sizeof(record)));
		
		if (*array == 0)
		{
			printf("no space for memory");
			exit(1);
		}
	}
	*array[*number_of_elements] = *item;
	(*number_of_elements)++;
}

To recap, it compiles and runs fine, the dynamic array works fine for the first string I enter, but it segfaults when I enter the second one :S this random behavior kind of leads me to believe it has to do with something not being initalized or something along those lines.

I'm thinking maybe it is something to do with the main, but whatever it is I've been fooling around with it for hours and can't seem to figure it out.

TIA

**R.Stiltskin** · 04-08-2009

As I said last night, it wasn't necessary to make 'item' a pointer, but it's just as well that you did. It's more efficient to send the function a pointer to a struct rather than making an extra copy of the entire struct -- this would be important if you were dealing with a large structure.

Anyway, your segfault is due to one little detail in student.c. Access the array like this

Code:

    (*array)[*number_of_elements] = *item;

or

Code:

    *(*array + *number_of_elements) = *item;

By the way, I don't like the idea of reusing a typename to name a field in the structure, as in

Code:

name name;

although apparently the compiler doesn't mind. I just think that's looking for trouble/confusion.

Also, do you really want the arraysize to double every time it runs out of space? That may be OK going from 1 to 2 and 2 to 4, but do you really want to jump from 16 to 32, and so on?

**matsp** · 04-08-2009

Originally Posted by R.Stiltskin

Also, do you really want the arraysize to double every time it runs out of space? That may be OK going from 1 to 2 and 2 to 4, but do you really want to jump from 16 to 32, and so on?

Why not - certainly it would NOT be a good idea to grow it by a small constant value. Imagine if you have to grow it to 16000 or so, by steps of (say) 8? On average you'd have copied half the array 2000 times, instead of 11 times with a doubling scheme. [In fact, this very problem existed in a driver a former colleague of mine workd on, where it grew an array by 4KB every time it ran out of space - but in this particular "error report", the software product needed 16MB of space. So it would copy the data 4000 times, averaging 8MB per copy. That takes a while even on the latest greates processors! By doubling instead, it went from 0.01 fps to 5 fps or some such - it still wasn't a fantastic frame-rate, but it was good enough that the application wasn't deemed "broken" any more].

It obviously does depend on how large you expect the array to grow, and thus how many times you need to copy it. But growing by a factor other than 1 element more every time is certainly not a bad idea.

--
Mats

**R.Stiltskin** · 04-08-2009

Originally Posted by matsp

It obviously does depend on how large you expect the array to grow, and thus how many times you need to copy it. But growing by a factor other than 1 element more every time is certainly not a bad idea.

--
Mats

I certainly agree with that. I had in mind something more in line with the nature of the application.

As for your example, granted, doubling each time is an easy way out of estimating what's actually needed, but I'd be pretty annoyed if the next error report was only 4K and cost me 16MB to store it.

**matsp** · 04-08-2009

But doubling gives a maximum unused overhead of (number_of_elements / 2)-1 times the size of the element. And that's only if the number of elements is 2^n+1. For anything else it is less. Since 2^n grows exponentially, it also is exponentially less likely that you will hit that special case [unless it so happens that your data ALWAYS is powers of 2 + 1 of course].

But it certainly is a trade-off between "how many times do we copy this" and "how much space is wasted".

--
Mats

**laserlight** · 04-08-2009

Originally Posted by R.Stiltskin

I certainly agree with that. I had in mind something more in line with the nature of the application.

What do you propose as an alternative, given that we have no idea if Perogy needs to store the data of students who took a class test or the data of students who took a national examination?

Originally Posted by R.Stiltskin

As for your example, granted, doubling each time is an easy way out of estimating what's actually needed, but I'd be pretty annoyed if the next error report was only 4K and cost me 16MB to store it.

I do not see how a factor of 2 can possibly lead to 16 MB being used to store 4 KB of data. More likely it would be 8 KB, with a temporary total of 12 KB if realloc() needs to copy to a separate block in memory. Of course, one can choose a smaller factor like 1.5 and still get amortized constant time performance.

**R.Stiltskin** · 04-08-2009

Originally Posted by laserlight

What do you propose as an alternative, given that we have no idea if Perogy needs to store the data of students who took a class test or the data of students who took a national examination?

...which is exactly why I didn't propose an alternative. Presumably Perogy has some idea of his needs.

Originally Posted by laserlight

I do not see how a factor of 2 can possibly lead to 16 MB being used to store 4 KB of data. More likely it would be 8 KB, with a temporary total of 12 KB if realloc() needs to copy to a separate block in memory. Of course, one can choose a smaller factor like 1.5 and still get amortized constant time performance.

Just following the example given by matsp. If the previous "unusual" 16MB report filled the array, a subsequent 4K report would result in doubling the array size.

(Why are we arguing about this?)

**matsp** · 04-08-2009

Originally Posted by R.Stiltskin

(Why are we arguing about this?)

Because whilst you are arguing it shouldn't be done this way, I and Laserlight think that doubling is a fine thing to do.

Yes, if you need 16MB + 4K, then you get 32MB. Compared to the OVERALL needs of a program that uses 16MB of vertices, I'm sure it's peanuts, but I guess there's always someone who will complain about it.

--
Mats

**Perogy** · 04-08-2009

Originally Posted by R.Stiltskin

As I said last night, it wasn't necessary to make 'item' a pointer, but it's just as well that you did. It's more efficient to send the function a pointer to a struct rather than making an extra copy of the entire struct -- this would be important if you were dealing with a large structure.

Anyway, your segfault is due to one little detail in student.c. Access the array like this

Code:

    (*array)[*number_of_elements] = *item;

or

Code:

    *(*array + *number_of_elements) = *item;

By the way, I don't like the idea of reusing a typename to name a field in the structure, as in

Code:

name name;

although apparently the compiler doesn't mind. I just think that's looking for trouble/confusion.

Also, do you really want the arraysize to double every time it runs out of space? That may be OK going from 1 to 2 and 2 to 4, but do you really want to jump from 16 to 32, and so on?

ahhh that was it, thank you very much. Those precedence errors can really catch a C newbie like me off guard.

And just for clarification this is a school assignment which states we have to double the array, so that is why i chose to do it that way :-)

**laserlight** · 04-08-2009

Originally Posted by R.Stiltskin

...which is exactly why I didn't propose an alternative. Presumably Perogy has some idea of his needs.

That's a cop out

If we consider asymptotic running time, the choice is between allocating just the right amount of space, or increasing the space by some factor greater than unity. Increasing the space by a constant amount risks quadratic time performance, though realloc() may expand the block in-place where possible, so it would not necessarily be so bad. As such, having some idea of your needs means knowing exactly the amount of space needed, or at least knowing the upper bound on space (and thus also wasting space if the upper bound is not reached).

Originally Posted by Perogy

And just for clarification this is a school assignment which states we have to double the array, so that is why i chose to do it that way :-)

Sure, it is good to follow your assignment requirements, but also try to understand why it is done that way.

**R.Stiltskin** · 04-08-2009

Originally Posted by matsp

Because whilst you are arguing it shouldn't be done this way, I and Laserlight think that doubling is a fine thing to do.

Yes, if you need 16MB + 4K, then you get 32MB. Compared to the OVERALL needs of a program that uses 16MB of vertices, I'm sure it's peanuts, but I guess there's always someone who will complain about it.

--
Mats

I would like my money to grow exponentially (it doesn't), but I tend to think that I should avoid having my resource requirements grow exponentially absent some compelling reason. Call me stingy.

Edit: ...although I suppose it's not so bad if we consume only twice as much as we need in the worst case...

**laserlight** · 04-08-2009

Originally Posted by R.Stiltskin

I tend to think that I should avoid having my resource requirements grow exponentially absent some compelling reason

Memory is one resource, and processing time is another resource. Being able to allocate precisely the amount of space needed is indeed ideal. But if you do not know the size of the input before hand, then to achieve the insertion of an element at the back in amortized constant time, you have to increase space consumption by a factor. This space-time trade-off is often worth it, and of course one could resize to exact fit when done if necessary.

**R.Stiltskin** · 04-08-2009

Yes, that is certainly valid. Thanks laserlight and mats for pointing that out.

Thread: Dynamic array help

Thread Tools

Search Thread

Display

Similar Threads

Dynamic Array Resizing

need help with dynamic array syntax

Class Template Trouble

2D dynamic array problem

Dynamic array allocation and reallocation