1. A question about performance and free()

While writing some code that is very heavy in calculations I came to think of 2 things today:

1. Performance of raw calculation vs. table lookup
Suppose I have a nested for loop:
Code:
```for(i=0;i<i_max;i++)
{
for(j=0;j<j_max;j++)
{
a=some_array[j]*2;
b=some_other_array[i];
c[i]=a+b;
}

}```
Would it be more efficient to calculate all the a's first in some other loop, and simply access them in memory? This does require that I malloc some space for another array to contain the a's. I guess it is a waste to calculate the a's over and over again. How much faster is it to access values in memory rather than doing explicit calculations (assuming all numbers are doubles)? Of course there are several unknowns in this question (most notably i_max and j_max), but can anything be said in general? I'd just like a rule of thumb, since I try to write my programs as efficiently as possible.

2. My second question is about the inner workings of free(). Suppose I have an array I don't need any more. Let's call it arr. The equivalence of arrays and pointers says that an array is the same as a pointer to the first element. Now when I call free(arr), how does free know when to stop? What prevents it from simply continuing to free up memory until it segfaults? Somehow free() ignores the equivalence between pointers to the first element and arrays. It always knows when the array ends.

I hope I have made myself clear enough, if not just ask and I will try to elaborate.

2. Originally Posted by The Physicist
Would it be more efficient to calculate all the a's first in some other loop, and simply access them in memory? This does require that I malloc some space for another array to contain the a's. I guess it is a waste to calculate the a's over and over again.
Makes sense and is generally considered true, yep. This is also why if the middle condition of a for loop involves a function call (eg, strlen) but the result is not expected to change, you should put the result of the call into a variable first and use that value instead, since otherwise strlen is called for every iteration. Obviously the same applies to function calls inside the loop. Some beginners get it in their head that "optimization" means using less code and packing lots of ops into one clever line of code, but that is a bad rule.

How much faster is it to access values in memory rather than doing explicit calculations (assuming all numbers are doubles)?
Going by literally that particular loop, the calculations are very minor and probably not much more expensive (or even less) than a lookup (I'm guessing, I dunno any assembly). Doing an arithmetic operation on values already in processor registers is faster than fetching something from memory, so it depends on how much arithmetic you are doing. Maybe someone knows of a ratio here (e.g. beyond two or three add/subtracts, the fetch is faster).

2. My second question is about the inner workings of free(). Suppose I have an array I don't need any more. Let's call it arr. The equivalence of arrays and pointers says that an array is the same as a pointer to the first element. Now when I call free(arr), how does free know when to stop?
Stop right there. You cannot free an array of that sort (try it). A pointer which is malloc'd memory for an array is a pointer, not "an array". The equivalence of arrays and pointers is meant to refer to this kind of array:
Code:
`int array[32];`
This is stack memory and cannot be freed.

3. 1. Table lookup is in general faster, but usually less precise (for floating-point tables) and requires more memory. If you want to calculate the table at runtime, only do this if you know that there will be more lookups than entries in the table.
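Applied to the loop from the original post, the precompute-then-lookup idea might look like this (a sketch; the function name and `a_table` are mine, the other identifiers are borrowed from the post):

```c
#include <stdlib.h>

/* Precompute the a-values once instead of recomputing them i_max times
 * inside the nested loop.  The inner-loop logic otherwise mirrors the
 * original post exactly, including its repeated overwriting of c[i]. */
void fill(double *c, const double *some_array, const double *some_other_array,
          size_t i_max, size_t j_max)
{
    double *a_table = malloc(j_max * sizeof *a_table);
    if (a_table == NULL)
        return;

    for (size_t j = 0; j < j_max; j++)          /* computed once... */
        a_table[j] = some_array[j] * 2;

    for (size_t i = 0; i < i_max; i++)
        for (size_t j = 0; j < j_max; j++)
            c[i] = a_table[j] + some_other_array[i];  /* ...looked up many times */

    free(a_table);
}
```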

2. If you free a malloc'd array, free most likely looks at a hidden header just before the array, where the bookkeeping information for your allocation is stored. But if you free an actual (C) array, behavior is undefined. Free can't free arrays, only pointers (which may point to malloc'd arrays).

the array and the pointer are no longer strictly equivalent
As far as free is concerned, an actual array decays to a pointer that lies outside the heap, so passing any other non-heap pointer would result in similar (mis)behavior.

4. It seems I wasn't quite clear in my terminology in the second question. I know I cannot free an array such as int a[32]; what I really meant was an array like int *a = calloc(32, sizeof(int)). It still seems magical to me that once free reaches a[31] it simply stops freeing up memory instead of continuing. It is as if there is a table hidden somewhere that keeps account of the blocks that have been malloc'd during execution.

5. Originally Posted by The Physicist
It is as if there is a table hidden somewhere that keeps account of the blocks that have been malloced during execution.
Well, yeah. The runtime keeps track, for every call of malloc, how much memory was requested by that call so that free stops in the right place. (It has to not give you the same memory twice, it has to not give more memory than it has*, etc.)

--
* Actually my understanding of glibc is that it will happily give you more memory than exists on your system without complaint, until you try to actually use it all.
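One common way an allocator keeps that "hidden table" is to stash the size in a small header just in front of the pointer it hands out. This is only a toy sketch (function names are mine; real allocators, glibc's included, use more elaborate metadata), but the principle is the same:

```c
#include <stdlib.h>

/* Toy illustration: a size header lives immediately before the
 * pointer the caller sees, which is how "free knows when to stop". */
typedef struct { size_t size; } header_t;

void *my_malloc(size_t size)
{
    header_t *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;          /* remember how much was requested */
    return h + 1;            /* hand back the memory just past the header */
}

size_t my_size(void *p)      /* what free() effectively consults */
{
    return ((header_t *)p - 1)->size;
}

void my_free(void *p)
{
    if (p != NULL)
        free((header_t *)p - 1);   /* step back to the real start of the block */
}
```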

6. Originally Posted by tabstop
* Actually my understanding of glibc is that it will happily give you more memory than exists on your system without complaint, until you try to actually use it all.
This is a Linux kernel option (memory overcommit); you can control it with a kernel switch (can't remember exactly what it is though).

7. Originally Posted by Brafil
1. Table lookup is in general faster, but usually less precise (for floating-point tables) and requires more memory. If you want to calculate the table at runtime, only do this if you know that there will be more lookups than entries in the table.
How can it be less precise if it is a lookup in a table? A double is still a double, regardless of calculation or table lookup?

Is it possible to access this table of malloc calls during runtime? Seems like a neat way to find out how big your arrays are. The information is there, it just has to be found.

8. Originally Posted by The Physicist
Is it possible to access this table of malloc calls during runtime? Seems like a neat way to find out how big your arrays are. The information is there, it just has to be found.
Good point, but not as far as I know. You can do the same thing yourself fairly easily, of course (keep a count).

9. Originally Posted by The Physicist
Code:
```for (i=0; i<i_max; i++)
{
for (j=0; j<j_max; j++)
{
a = some_array[j] * 2;
b = some_other_array[i];
c[i] = a+b;
}
}```
FTFY!
Unfortunately we cannot help you optimise your real code from that mock example. It's calculating each entry for the c array multiple times needlessly, and your real code almost certainly is not doing that. To show you what I mean, assuming that the final values of a and b are not required, this snippet has the same effect as the above one:
Code:
```for (i=0; i<i_max; i++)
{
c[i] = some_array[j_max-1] * 2 + some_other_array[i];
}```
Now if you can't perform the exact same optimisation on your real code, then you need to post the real code, so we can do some actual useful optimisations. Otherwise this discussion is moot.

10. Originally Posted by The Physicist
How can it be less precise if it is a lookup in a table? A double is still a double, regardless of calculation or table lookup?
AFAIK double tables are mostly used for math functions, and of course nobody can store all possible results in a table: the table only samples the function at a finite number of points, so the input has to be rounded (or interpolated) to the nearest entry, and that rounding error shows up in the result. I correct myself: for math applications, tables may be less precise if used for floating-point operations.

11. Originally Posted by The Physicist
How can it be less precise if it is a lookup in a table? A double is still a double, regardless of calculation or table lookup?
Nope. On many processors, including the x86, the internal precision of the wide floating point types is greater than the precision of a double. For example, a double in memory is 64 bits, but a double in the FPU is 80 bits. By storing intermediate results to memory you lose the 80 bit precision.

12. Originally Posted by MK27
Good point, but not as far as I know. You can do the same thing yourself fairly easily, of course (keep a count).
Yes, it is fairly simple to keep track of the sizes of your arrays manually. However it would be neat, and perhaps also less error prone, to simply access the table. It's not a big thing, but I find it interesting nevertheless.

I guess a nice way to easily keep track of the sizes of your arrays would be to simply declare them as structs containing a pointer which holds all the data, plus constant values such as rows, columns and so on to store the dimensions of the array. In this way you could also think of these structs as vectors, matrices or tensors in physical terminology, depending on the dimensions.
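Such a struct might be sketched like this (type and function names are mine, purely for illustration):

```c
#include <stdlib.h>

/* Bundle the data pointer with its dimensions so the size always
 * travels with the array. */
typedef struct {
    double *data;
    size_t rows;
    size_t cols;
} matrix_t;

matrix_t matrix_create(size_t rows, size_t cols)
{
    matrix_t m = { calloc(rows * cols, sizeof(double)), rows, cols };
    return m;
}

double *matrix_at(matrix_t *m, size_t i, size_t j)
{
    return &m->data[i * m->cols + j];   /* row-major indexing */
}

void matrix_destroy(matrix_t *m)
{
    free(m->data);
    m->data = NULL;
    m->rows = m->cols = 0;
}
```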

Originally Posted by brewbuck
Nope. On many processors, including the x86, the internal precision of the wide floating point types is greater than the precision of a double. For example, a double in memory is 64 bits, but a double in the FPU is 80 bits. By storing intermediate results to memory you lose the 80 bit precision.
I didn't know this. This is a good thing to know, I will keep it in mind. Thank you very much.

I have noticed that if I make an array like char *arr = calloc(8,1) it is sometimes possible to write to arr[8] without getting a segfault. I assume this is because the program has reserved other memory from another calloc call, and that this memory lies just past the end of the array. Is it possible to tell the compiler, for debugging purposes, to spread out all the malloc'd memory, thus forcing the program to segfault when accidentally writing to a value that is beyond the end of the array? Or is there another clever way to get the compiler to notice whenever you write beyond the end of an array? I'm using gcc, but I assume that the answer will be the same for all the most common compilers.

13. Originally Posted by The Physicist
I have noticed that if I make an array like char *arr = calloc(8,1) it is sometimes possible to write to arr[8] without getting a segfault. I assume this is because the program has reserved other memory from another calloc call, and that this memory lies just past the end of the array. Is it possible to tell the compiler, for debugging purposes, to spread out all the malloc'd memory, thus forcing the program to segfault when accidentally writing to a value that is beyond the end of the array? Or is there another clever way to get the compiler to notice whenever you write beyond the end of an array? I'm using gcc, but I assume that the answer will be the same for all the most common compilers.
Actually this is more like "there is no way the system is only going to give you eight bytes", assuming you have some sort of normal-ish system (not, say, a chip controlling a wristwatch). You probably get 1K or so at a time, although you really should be nice and not go over what you claimed you needed.

14. Originally Posted by The Physicist
Or is there an other clever way to get the compiler to notice whenever you write beyond the end of an array?
There are ways available in other languages, and you could implement one for C I suppose, just it is not part of the standard. C is usually considered an "unsafe" language in the sense that if you don't know what you are doing, there's not much to stop you from making a real mess. However, once you get used to it, the boundaries here make sense -- nothing is left impossible, just very few things are automatic. The assumption is that you can take care of yourself.
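A home-grown bounds check of the sort hinted at above could be as simple as this sketch (type and function names are mine; all access is funnelled through one checking function):

```c
#include <stdio.h>
#include <stdlib.h>

/* Keep the length next to the data and check every access against it. */
typedef struct {
    char  *data;
    size_t len;
} checked_buf;

checked_buf buf_create(size_t len)
{
    checked_buf b = { calloc(len, 1), len };
    return b;
}

char *buf_at(checked_buf *b, size_t i)
{
    if (i >= b->len) {                  /* catch the overrun ourselves */
        fprintf(stderr, "out-of-bounds access: %zu >= %zu\n", i, b->len);
        abort();
    }
    return &b->data[i];
}
```

Nothing automatic, as said, but the check happens exactly where you choose to put it.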

15. Originally Posted by tabstop
Actually this is more like "there is no way the system is only going to give you eight bytes", assuming you have some sort of normal-ish system (not, say, a chip controlling a wristwatch). You probably get 1K or so at a time, although you really should be nice and not go over what you claimed you needed.
The CRT doesn't waste that much memory. Typically the CRT gets memory from the system in large chunks (perhaps 4K) and then malloc further allocates smaller chunks from within this if you make requests for smaller allocations. The smallest chunk the CRT will give you is basically 8 bytes on Windows.
This does not mean you can assume you have 8 bytes if you only asked for, say, 2, of course.

Strictly speaking, going past the size you asked for is undefined behaviour. It might not crash until quite a while past the end, or it might crash if you go one over. There are ways of configuring Windows to place allocations within your program such that an overrun is typically detected the moment you go over the end (provided your element size isn't too small, e.g. one byte), but they waste a ton of memory to do so. It's called "page heap allocation", and it usually involves using gflags. Linux probably has something similar.