# Thread: Doubt with struct and malloc

1. ## Doubt with struct and malloc

Good afternoon everyone,

I need a struct where its internal fields are dynamically allocated and the number of records is also dynamically allocated.

My question is that I have no idea how it is possible to implement this!

Imagine the struct below, each field will have a different size and will vary according to the input text.

Example: Field1 can receive 1000 bytes or 3000 bytes

The number of records in the struct will start at 0, and will increase as a record is registered.

Code:
```struct {
char
Field1[   20],
Field2[   20],
Field3[ 1001],
Field4[ 1001],
Field5[ 1001],
Field6[30000];
} Fields[50000];```
I know how to use malloc to increase each field.
Code:
`Field1 = malloc(101);`
But how do I increase the number of records in the struct?

I also don't know if I can have for example different sizes in each record, since in a simple struct of char field[100] this is true for all records.

Record 1 = Field1 = 100 bytes
Record 2 = Field1 = 600 bytes

Can anyone help?

2. Hello, my fellow countryman!! Well... let's stick to English so people don't complain...

When you declare a structure, its size is fixed at the sum of the sizes of its members (there are some "alignment" side effects, but let's leave those out of the discussion for now), so:
Code:
`struct S { char a[10]; char b[10]; };`
This declares a struct 20 bytes (chars) in size. The sizes are fixed: 10 chars for a and 10 chars for b.

If you want variable sizes you must allocate space dynamically with malloc or calloc, and later, free them with free. You can "reallocate" the space previously allocated using the realloc function.

Note this new structure, like this, for example:
Code:
```struct S {
char *a;    // pointers to dynamically allocated buffer.
char *b;

size_t a_size;  // to keep track of the sizes.
size_t b_size;
};```
Now we don't have contiguous bytes for a and b as the arrays provide, since what we store in the structure are pointers to those dynamically allocated buffers. We need to keep the sizes because the compiler doesn't know how many bytes are allocated behind these pointers, and you'll probably need that information later.

Notice, as well, that your Fields example is at least 1.54 GiB long (not counting any improbable alignment side effects): (20*2 + 1001*3 + 30000)*50000 = 1652150000 bytes. On some systems this will not even compile.

Let's suppose you have to allocate a 20 char buffer. You do it easily like this:
Code:
```  char *p = malloc( 20 );
// we should test if p isn't NULL afterwards.```
And now you want to reallocate this buffer to 200 bytes. You should do this like:
Code:
`  char *q = realloc( p, 200 );`
Why the second pointer q? Because, like malloc, realloc can fail, returning NULL, but not changing the previous allocation. If you do:
Code:
`  p = realloc( p, 200 );`
And realloc fails, you'll have a memory leak on your hands. So, by using a second pointer you have the opportunity to test for the failure without losing the reference to the original buffer.

Is this enough to answer your question? (I can explain in pt-BR).

[]s
Fred

3. I know this answer will be a little bit over the top, but here is what I would do....

It's half way to an in-memory database

Code:
```#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N_FIELDS 5

struct Record {
char *field[N_FIELDS];
};

struct Table {
struct Record *records;
int records_used;
int records_allocated;
int grow_by;
};

static struct Table *table_new(int grow_by) {
if(grow_by <= 0) return NULL; // a step of 0 would never grow the table
struct Table *table = malloc(sizeof(struct Table));
if(table == NULL) return NULL;

table->grow_by = grow_by;
table->records = NULL;
table->records_allocated = 0;
table->records_used = 0;
return table;
}

static int table_add_record(struct Table *table) {
// Grow the table if needed
if(table->records_used == table->records_allocated) {
struct Record *new_records = realloc(table->records,sizeof(struct Record)*(table->records_allocated+table->grow_by));
if(new_records == NULL)
return -1; // Failed
table->records = new_records;
table->records_allocated += table->grow_by;
}

// Add the new record to the end (should really check for errors...)
int this_record = table->records_used;
for(int i = 0; i < N_FIELDS; i++)
table->records[this_record].field[i] = NULL;
table->records_used++;
return this_record;
}

static void table_set_field(struct Table *table, int rec_no, int field_no, char *value) {
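if(rec_no < 0 || rec_no >= table->records_used) return; // no such record
if(field_no < 0 || field_no >= N_FIELDS) return; // no such field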
if(table->records[rec_no].field[field_no] != NULL)
free(table->records[rec_no].field[field_no]);
if(value != NULL)
table->records[rec_no].field[field_no] = strdup(value);
else
table->records[rec_no].field[field_no] = NULL;
}

static void table_print(struct Table *table) {
printf("%d out of %d records used\n",table->records_used, table->records_allocated);
for(int i = 0; i < table->records_used; i++) {
printf("%3d:",i);
printf(" %s",table->records[i].field[0]);
for(int j = 1; j < N_FIELDS; j++) {
printf(", %s",table->records[i].field[j]);
}
printf("\n");
}
}

static void table_free(struct Table *table) {
for(int i = 0; i < table->records_used; i++) {
for(int j = 0; j < N_FIELDS; j++) {
free(table->records[i].field[j]);
}
}
free(table->records);
free(table);
}
//===========================================================================
int main(void) {
struct Table *t = table_new(5);
if(t == NULL) return 1;
// As a test, add 28 records
for(int i = 0; i < 28; i++ ) {
// Reserve a slot first; its index is what table_set_field() needs
int rec_no = table_add_record(t);
if(rec_no < 0) break; // Out of memory
table_set_field(t, rec_no, 0, "a");
table_set_field(t, rec_no, 1, "b");
table_set_field(t, rec_no, 2, "dog");
table_set_field(t, rec_no, 3, "cat");
table_set_field(t, rec_no, 4, "mouse");
}
table_print(t);
table_free(t);
return 0;
}```

4. Originally Posted by flp1969
Code:
`char *q = realloc( p, 200 );`
Why the second pointer q? Because, like malloc, realloc can fail, returning NULL, but not changing the previous allocation. If you do:
Code:
`p = realloc( p, 200 );`
And realloc fails, you'll have a memory leak on your hands. So, by using a second pointer you have the opportunity to test for the failure without losing the reference to the original buffer.
Great tip!

Anyway,
Can I assign different bytes to each record without problems? As I asked here?
Code:
```Record 1 = Field1 = 100 bytes = Fields[0].Field1
Record 2 = Field1 = 600 bytes = Fields[1].Field1```
Notice that above I show 2 records but each of these records has different bytes! Because in a normal struct of type char[100] when I set the field size, I am setting it for all records that are created in the struct!

Another doubt, which I also mentioned in the question, how do I add 1 more record? Because I think that before I increase the field space, I need to increase the record qty!

Thanks

5. Originally Posted by marcelo.br
Great tip!
Anyway,
Can I assign different bytes to each record without problems? As I asked here?
No, you cannot. A 'struct' can only have one fixed size - the size returned by 'sizeof'.

6. Originally Posted by hamster_nz
No, you cannot. A 'struct' can only have one fixed size - the size returned by 'sizeof'.
Can I enter a size for each field here? example:
Code:
```char *Database[][6];
Database[0][0] = malloc 100 bytes
Database[1][0] = malloc 500 bytes
Database[2][0] = malloc 800 bytes```

7. Originally Posted by marcelo.br
Can I enter a size for each field here? example:
Code:
```char *Database[][6];
Database[0][0] = malloc 100 bytes
Database[1][0] = malloc 500 bytes
Database[2][0] = malloc 800 bytes```
That won't work. You want to define a structure to hold your fields, like this:

Code:
```#include <stdio.h>
#include <stdlib.h>

struct record {
char *fields[6];
};

int main() {
struct record *records;

// Create a table for 10 records
records = malloc(sizeof(struct record)*10);
if(records == NULL) return 1;
for(int i = 0; i < 10; i++) {
// Give each field a semi-random length of 100 to 1090 bytes
// (the parentheses matter: 10*rand()%100 would compute (10*rand())%100)
for(int j = 0; j < 6; j++) {
records[i].fields[j] = malloc(100 + 10*(rand()%100));
}
}

// How to release everything
for(int i = 0; i < 10; i++) {
for(int j = 0; j < 6; j++) {
free(records[i].fields[j]);
}
}
free(records);
return 0;
}```

8. Originally Posted by hamster_nz
That won't work.
Ok, if it doesn't work with struct and not with matrix.
Is there something else I could use in C that would work?

Because there are fields that are extremely huge, and others that take only a few bytes.

And it would be a huge waste of memory for me to allocate, for example, 500 MB for every record, since most of them will use 100 bytes!

Example: A database with 1000 records
Record 1:
Title field: uses 100 bytes
Description field: uses 500 MB

Record 2:
Title field: uses 90 bytes
Description field: uses 200 bytes

Just think, I would have to allocate 1000 x 500 MB just because, out of these 1000 records, some 20 use 500 MB while the other 980 use very few bytes!

If I can't use struct, nor matrix, how can I do this?

9. Why do you need to load all records into memory? What if you have 1000000 records of 128 KiB each? Do you really expect to allocate about 122 GiB of memory?
Does each record have any meaning other than as an array of chars?

10. Originally Posted by marcelo.br
Ok, if it doesn't work with struct and not with matrix.
Is there something else I could use in C that would work?

If you look closely at that last code I supplied, it does let you do what you want.

* The memory for each field is malloc()ed, and can be any size.

* A record has 6 fields.

* You can malloc() (and maybe realloc()) storage for as many records as you require for your dataset.

So each record needs storage for 6 pointers (48 bytes on a typical 64-bit system) and the storage allocated for its fields, plus 7 times the overhead required by your malloc() implementation (e.g. something like 8 bytes per allocation).

11. There are a couple of answers already mentioning dynamic allocation of strings, so I'll skip those.

You don't tell us what you are trying to do, so I'm going to focus on that for a bit. Here are some common scenarios:

1. You might be trying to parse an arbitrary file format, like TOML or HTML or something, where there is a tree-like object model. If this is the case, I encourage you to be less generic and more tree-like. Instead of having a record with thousands of fields, create a series of typed records (boldRecord, italicRecord, listRecord, arrayRecord, tableCellRecord) that are specific to the particular purpose you have. In many cases, yes, there will be text fields. But you will find that a small number of text records in a specific node is easier to think about than thousands of "line of text" entries in an array.

2. You might be trying to deal with "optional" fields, like the 2nd line of a US mailing address: name, address, 2nd-address-line-optional, city-state-zip, country-optional. In this case, the general rule is to use either an empty string or a null pointer. I prefer the null pointer, myself.

3. You might be trying to write something like a text editor. In this case, I suggest you create one main "text" field that stores the entire contents of the file, and then parse that field into lines in a separate step. It's much easier to read in an entire buffer and then parse it than to perform all the separate calls to malloc() for line-at-a-time processing.

4. You might really have thousands of unnamed fields to process. In that case, be aware of the dynamic array. You already know you can malloc() an area of memory and get a pointer, which you can use to store a string or other object. But that pointer is itself an "other object" that can be stored. So feel free to declare a char ** array-of-strings that has a variable size, and use calloc(nstrings, sizeof (char *)) to allocate space to hold pointers to however many strings you need. Then you can call malloc(length-of-string1), malloc(length-of-string2), ... over and over to allocate storage for each field, and store the pointers into fields[0], fields[1], etc.

5. You may be conflating "fields" with "records." Make sure that you know the difference between "this is a field in the earlier record" and "this is the first field in a new record." It might make sense for you to decompose your data into records, instead of fields.

6. You don't seem to have any idea about names. Usually in a "record" scenario, the fields have names. If your fields don't have names, are they really fields? Or is it just an array of something? (You are allowed to have fields that are themselves arrays of things.) Maybe you need to think more about the structure.

7. You didn't mention delimiters. Are all these things going to be generated dynamically by your program, or are you reading them from a file? Will you have to parse the file text using scanf or some other parser you have written? Is there already a library to do this for you?

12. Originally Posted by aghast
You don't tell us what you are trying to do, so I'm going to focus on that for a bit. Here are some common scenarios:
I have attached images for everyone to understand

I don't mind using struct, matrix, or any other kind of resource that I don't know the name of!

What I don't want is to allocate 500Mb for a record field that is only using 10 bytes! This is because another record in this same field consumes 500Mb.

As I show in the question
Record1->Field0 = 10 bytes
Record2->Field0 = 500Mb
Record3->Field0 = 100Mb

As I show in the images there are 4 fields in each record as an example

Notice that in the 4th field, Steve used 1 GB, and I don't want to allocate 1 GB in Marcelo's record, for example, just because Steve used more bytes.

This is a simple program, like a registry or a task list, i.e., a program where we have records and each record has some fields!

I only know how to do this using a struct or a matrix; I don't know any other way. But allocating a gigantic amount of memory for, say, 100,000 records, where some 50 records use that many bytes and the other 99,950 use at most 200 bytes, is a bit of a waste.

Can you help?

13. Originally Posted by flp1969
Why do you need to load all records into memory? What if you have 1000000 records of 128 KiB each? Do you really expect to allocate about 122 GiB of memory?
Does each record have any meaning other than as an array of chars?
The problem you mention is valid! And it will give problems in the future.
At the moment, until the end of the year, I won't have this problem. In the future I plan to look into reading only part of the file.

Example: When I filter a search, only the result will be loaded in memory. The initial program load would not load any record.
But for this, I would have to change my search algorithm, which at the moment is searching in a struct

I would also have to work on my sort algorithm, because at the moment it also sorts in a struct
That's why I'm leaving this for later! When it really starts to give problems

Because I have to think carefully about working with a file:
1) Sorting and changing the sort of records in a Grid, coming from the file is the same thing as me loading everything into memory. I have to think a lot about how I would do this

2) Even filtering records from the file, reading a file each time I filter will be very slow I believe, so this also needs a lot of thinking to solve all related problems

3) This database with 4 thousand records already takes some seconds to be loaded from the file, imagine with 100 thousand, this means that the task of working with files needs to be very well thought out!

4) When I save 1 record in file, it would also be slow, because I need to replicate the whole file in a new one with the new modified record, the same to delete, or edit, and Add, Edit and Delete is done every 2 minutes

14. This sounds like the sort of problems for which actual databases exist

But I still can't see how your problem hasn't already been answered - maybe I need to be more explicit.

Here is an example that allocates 100, 1000000 or 2000000 bytes for the data held in fields[2]:

Code:
```#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct record {
char *fields[3];
};

int main() {
struct record *records;
int records_used = 0;

// Create a table for up to 100 records
// (or more if you realloc() it later)
records = malloc(sizeof(struct record)*100);
if(records == NULL) return 1;

// Add a record, with field 2 being 100 bytes
records[records_used].fields[0] = strdup("I'm record 0");
records[records_used].fields[1] = strdup("100");
records[records_used].fields[2] = malloc(100);
records_used++;

// Add a record, with field 2 being 1000000 bytes
records[records_used].fields[0] = strdup("I'm record 1");
records[records_used].fields[1] = strdup("1000000");
records[records_used].fields[2] = malloc(1000000);
records_used++;

// Add a record, with field 2 being 2000000 bytes
records[records_used].fields[0] = strdup("I'm record 2");
records[records_used].fields[1] = strdup("2000000");
records[records_used].fields[2] = malloc(2000000);
records_used++;

// Print out the table
for(int i = 0; i < records_used; i++) {
printf("%d:  '%s', data is %d bytes\n", i, records[i].fields[0], atoi(records[i].fields[1]));
}

// How to release everything
for(int i = 0; i < records_used; i++) {
for(int j = 0; j < 3; j++) {
free(records[i].fields[j]);
}
}
free(records);
return 0;
}```
With it you could add new records until you run out of memory...

But it makes even more sense to do this - note the names and types in the 'record' struct:

Code:
```#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct record {
char *name;
size_t data_size;
char *data;
};

int main() {
struct record *records;
int records_used = 0;

// Create a table for up to 100 records
// (or more if you realloc() it later)
records = malloc(sizeof(struct record)*100);
if(records == NULL) return 1;

// Add a record, with data being 100 bytes
records[records_used].name = strdup("I'm record 0");
records[records_used].data_size = 100;
records[records_used].data = malloc(100);
records_used++;

// Add a record, with data being 1000000 bytes
records[records_used].name = strdup("I'm record 1");
records[records_used].data_size = 1000000;
records[records_used].data = malloc(1000000);
records_used++;

// Add a record, with data being 2000000 bytes
records[records_used].name = strdup("I'm record 2");
records[records_used].data_size = 2000000;
records[records_used].data = malloc(2000000);
records_used++;

// Print out the table
for(int i = 0; i < records_used; i++) {
printf("%d:  '%s', data is %zu bytes\n", i, records[i].name, records[i].data_size);
}

// How to release everything
for(int i = 0; i < records_used; i++) {
free(records[i].name);
free(records[i].data);
}
free(records);
return 0;
}```
Both have the same output:

0: 'I'm record 0', data is 100 bytes
1: 'I'm record 1', data is 1000000 bytes
2: 'I'm record 2', data is 2000000 bytes

15. Originally Posted by hamster_nz
This sounds like the sort of problems for which actual databases exist

But I still can't see how your problem hasn't already been answered - maybe I need to be more explicit.

Here is an example that allocates 100, 1000000 or 2000000 bytes for the data held in fields[2]:
But that is exactly what I wanted to know!
If it was possible for me to have malloc with different sizes, as you showed in char *data.

I will now try to make a test of it here!