A question about performance and free()

**Subsonics** · 05-22-2010

Regarding the precision, couldn't the doubles just be replaced with long doubles for 80 bit precision in a lookup table?

**Brafil** · 05-23-2010

They take 10 bytes each, which means any meaningful table would eat a huge amount of resources. For example, a sin table with a step of 0.01 would take (2 * PI / 0.01) * 10 = about 6 kilobytes. Apply that to cos and tan as well and you'll quickly eat resources and risk cache misses. And that is a very coarse step if you're using long double.

For comparison, at the same 0.01 step a float table takes about 2.5 kilobytes and a double table about 5 kilobytes.
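The arithmetic above can be checked with a few lines of C. This is only a sketch: the function names `table_entries` and `table_bytes` are made up for illustration, and the element sizes are passed in explicitly because the size of `long double` varies between platforms.

```c
#include <assert.h>
#include <stddef.h>

/* Number of entries needed to cover one period [0, 2*pi) at the given step.
   (Hypothetical helper, not from the original code.) */
size_t table_entries(double step)
{
    const double two_pi = 6.283185307179586;
    assert(step > 0.0);
    return (size_t)(two_pi / step);
}

/* Bytes needed for a table with that many entries of elem_size bytes each. */
size_t table_bytes(double step, size_t elem_size)
{
    return table_entries(step) * elem_size;
}
```

With a step of 0.01 this gives 628 entries, so 4-byte, 8-byte and 10-byte elements need 2512, 5024 and 6280 bytes respectively, which matches the rough figures quoted.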

**iMalc** · 05-23-2010

There isn't anything yet to suggest that a lookup table is remotely useful here. The question raised is about efficiency:

Would it be more efficient to calculate all the a's first in some other loop, and simply access them in memory?

How can one even entertain the notion of introducing a lookup table when there's a glaringly obvious optimisation that makes it redundant immediately?! If I can see it then perhaps the compiler can too. The whole question of the code efficiency hinges directly on what is stopping the compiler from making this optimisation itself. That's where the answer to this efficiency question lies. It's not as important whether it actually does that optimisation or not, it's more about how the presence of such other code completely alters the efficiency of this entire snippet. The code posted must be as close to the real code as possible, and the mock example we have so far just doesn't tell half of the real story.

As for number 2, one way or another it just knows, and it is likely different from one compiler, CRT, or OS to another. If you can imagine a way for the CRT to know that, then there's a reasonable chance that you're not far off for how at least one system does it.
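One way to imagine how the CRT "just knows" is a small header stored in front of every block. The sketch below wraps `malloc` to show the idea; it is an assumption about how an allocator could work, not a description of any particular CRT, and the names `sized_malloc`, `sized_block_size` and `sized_free` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header placed immediately before each user block.
   The union with max_align_t keeps the user pointer suitably aligned. */
typedef union block_header {
    size_t size;
    max_align_t align;
} block_header;

/* Allocate n bytes and remember n in the hidden header. */
void *sized_malloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(block_header))
        return NULL;                       /* would overflow */
    block_header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    return h + 1;                          /* user memory starts after the header */
}

/* Recover the size by stepping back from the user pointer to the header. */
size_t sized_block_size(const void *p)
{
    assert(p != NULL);
    return ((const block_header *)p - 1)->size;
}

/* Free the whole block, header included. */
void sized_free(void *p)
{
    if (p != NULL)
        free((block_header *)p - 1);
}
```

Real allocators may instead keep sizes in separate tables or derive them from which size-class pool the address falls in, which is why the answer differs from one system to another.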

**brewbuck** · 05-23-2010

Originally Posted by The Physicist

Is it possible to tell the compiler, for debugging purposes, to spread out all the malloc'ed memory, thus forcing the program to segfault when accidentally writing to a value that is beyond the end of the array? Or is there another clever way to get the compiler to notice whenever you write beyond the end of an array? I'm using gcc, but I assume that the answer will be the same for all the most common compilers.

There is no COMPILER option to do this, that I am aware of. However, your idea is very insightful, and something that I have implemented manually in the past.

I call it "memscatter," and the idea is to allocate thousands of blocks of memory of random sizes at program startup, then randomly select some fraction of these blocks to free up. The resulting heap fragmentation changes the behavior of the program and makes it easier to shake out memory corruption bugs (i.e. it crashes sooner and therefore makes it easier to find the problem). However, this is a last resort technique which is useful only when no other option is available.
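A rough sketch of what such a routine might look like follows. The block count, the size limit and the function name `memscatter` as a C function are assumptions made for illustration; the post only says "thousands of blocks of memory of random sizes".

```c
#include <assert.h>
#include <stdlib.h>

#define SCATTER_BLOCKS   4096  /* hypothetical count */
#define SCATTER_MAX_SIZE 1024  /* hypothetical upper bound on block size */

/* Blocks that stay allocated for the life of the program. */
static void *scatter_kept[SCATTER_BLOCKS];

/* Allocate SCATTER_BLOCKS blocks of random size, then free roughly
   free_percent percent of them at random, leaving holes in the heap.
   Returns the number of blocks still held. */
size_t memscatter(unsigned seed, int free_percent)
{
    assert(free_percent >= 0 && free_percent <= 100);
    srand(seed);

    for (size_t i = 0; i < SCATTER_BLOCKS; i++)
        scatter_kept[i] = malloc((size_t)(rand() % SCATTER_MAX_SIZE) + 1);

    size_t kept = 0;
    for (size_t i = 0; i < SCATTER_BLOCKS; i++) {
        if (rand() % 100 < free_percent) {
            free(scatter_kept[i]);         /* free(NULL) is harmless */
            scatter_kept[i] = NULL;
        } else if (scatter_kept[i] != NULL) {
            kept++;
        }
    }
    return kept;
}
```

Calling something like `memscatter(seed, 50)` at the top of `main` and varying the seed between runs changes where later allocations land, which is the point of the technique.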

You may want to investigate a tool called Electric Fence, which overrides the default memory allocator to align all allocations to the beginning, or end, of a VM page, with invalid pages before and after the allocation. If the code goes outside the array, it will segfault and you can immediately see the bug.
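The guard-page trick can be sketched with POSIX `mmap` and `mprotect`. This is a minimal illustration of the idea, not Electric Fence's actual code; `guarded_alloc` and `guarded_free` are hypothetical names, only the page after the block is protected, and the returned pointer is deliberately placed so that its end touches the guard page (which means it may not be aligned for every type).

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate n bytes so that the byte just past the end lies on an
   inaccessible page. Touching p[n] then faults immediately. */
void *guarded_alloc(size_t n)
{
    assert(n > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (n + page - 1) / page;
    size_t total = (data_pages + 1) * page;   /* data pages + one guard page */

    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Make the last page inaccessible. */
    if (mprotect(base + data_pages * page, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    /* Place the block so that its end touches the guard page. */
    return base + data_pages * page - n;
}

/* Release a block from guarded_alloc; the caller must pass the same n. */
void guarded_free(void *p, size_t n)
{
    if (p == NULL)
        return;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (n + page - 1) / page;
    unsigned char *base = (unsigned char *)p + n - data_pages * page;
    munmap(base, (data_pages + 1) * page);
}
```

Every allocation costs at least two pages this way, so it is a debugging aid rather than something to leave enabled.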

Also, tools like Valgrind, which execute your program in a virtual machine and monitor memory accesses at the level of single bits, can easily find such bugs.

At work, we use Purify, which is an on-line instrumenter which changes the code in magical ways to automatically detect such problems.

**Subsonics** · 05-23-2010

Originally Posted by Brafil

They take 10 bytes each.

Actually they take 16 bytes each.

**Brafil** · 05-23-2010

Internally, AFAIK, the processor just treats them as 80-bit doubles. If they take 16 bytes, then this is far less efficient.

**Subsonics** · 05-23-2010

Originally Posted by Brafil

Internally, AFAIK, the processor just treats them as 80-bit doubles. If they take 16 bytes, then this is far less efficient.

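On x86-64, gcc stores it in 128 bits (16 bytes), but only 80 of those bits are used: the x87 extended format has a 64-bit mantissa, and the rest is padding for alignment. Efficiency would be relative, depending on your CPU; a new Xeon has 12 MB of L2 cache, for example. Not promoting the lookup table here, btw.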
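Rather than argue about it, the storage size and mantissa width can be read from the standard headers. The helper names below are made up for this sketch; the macros `LDBL_MANT_DIG`, `FLT_RADIX` and `CHAR_BIT` are standard C.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Bits of storage the compiler reserves for a long double,
   including any padding. */
size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Bits of mantissa (significand) precision in a long double.
   LDBL_MANT_DIG counts digits in base FLT_RADIX, so this only
   equals a bit count when the radix is 2. */
int long_double_mantissa_bits(void)
{
    assert(FLT_RADIX == 2);
    return LDBL_MANT_DIG;
}
```

With gcc on x86-64 Linux these typically report 128 and 64; other compilers and targets may report something else, for example a `long double` that is identical to `double`.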

**The Physicist** · 05-26-2010

Thank you very much for your answers! They are greatly appreciated.

I have noticed something strange. My program seems to run much faster on Ubuntu than on Windows for some reason. Using the same settings, my laptop with Ubuntu runs the program in 100 seconds, while my Windows machine takes 189 seconds. To test further, I tried running it on Ubuntu in a virtual machine on my Windows machine, where it took 76 seconds. It is really strange, since my Windows machine is by far the most powerful computer. The program is just a number-crunching console application, so it seems very weird to me. Any ideas why this is the case?

Originally Posted by brewbuck

You may want to investigate a tool called Electric Fence, which overrides the default memory allocator to align all allocations to the beginning, or end, of a VM page, with invalid pages before and after the allocation. If the code goes outside the array, it will segfault and you can immediately see the bug.

Also, tools like Valgrind, which execute your program in a virtual machine and monitor memory accesses at the level of single bits, can easily find such bugs.

At work, we use Purify, which is an on-line instrumenter which changes the code in magical ways to automatically detect such problems.

Great information! It seems Electric Fence and Valgrind are Linux-only, however, and I would like it to run on Windows as well. Purify is way out of my budget. Do you know of any tools that do the same as these but run on both Windows and Linux?

**KCfromNC** · 05-26-2010

What compiler are you using on each OS? What command line options are used for each? Even if you're using the same compiler (e.g. gcc) on both, it's possible the defaults are set differently for each. That could mean that optimization is enabled in one case but not in the other.
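A quick way to compare the two builds is to have the program report how it was compiled. The sketch below relies on `__VERSION__` and `__OPTIMIZE__`, which gcc predefines; the function names are hypothetical, and other compilers may not define these macros, hence the fallbacks.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Pointer width of this build, e.g. 32 or 64. A 32-bit build on one OS
   and a 64-bit build on the other can perform quite differently. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* 1 if the compiler reports that optimization was enabled, else 0. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Compiler version string, or "unknown" if the macro is not defined. */
const char *compiler_version(void)
{
#ifdef __VERSION__
    return __VERSION__;
#else
    return "unknown";
#endif
}

/* Print a short build summary to the given stream. */
void print_build_info(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "compiler: %s\npointer width: %d bits\noptimized: %s\n",
            compiler_version(), pointer_bits(),
            built_with_optimization() ? "yes" : "no");
}
```

Calling `print_build_info(stderr)` at startup on both machines makes differences in compiler version or target width visible immediately.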

**The Physicist** · 05-26-2010

gcc on both, -Wall and -O3 on both

**MK27** · 05-26-2010

Originally Posted by The Physicist

I would like it to run on Windows as well. Purify is way out of my budget. Do you know any tools that do the same as these but run on both Windows and Linux?

If I remember previous discussions of this sort, there are no free mem checkers for Windows.

**The Physicist** · 05-26-2010

Actually I did find one called DUMA. It's based on Electric Fence. Couldn't get it to compile though. I did try Valgrind on my Linux machine, and I must say this is a VERY nice tool. It helped me correct quite a few memory leaks. I think I will just memory debug in my Linux environment from now on.

**MK27** · 05-26-2010

Yeah, valgrind is cool, it catches leaks and access errors and is dead simple to use. At least to do that much -- I haven't tried to do anything else with it.

Be nice if it reported errors by line, like a debugger, instead of by function tho. Actually a debugger that integrated something like valgrind would be pretty nice.

==2419== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 1)
==2419== malloc/free: in use at exit: 0 bytes in 0 blocks.
==2419== malloc/free: 5,999,977 allocs, 5,999,977 frees, 155,999,984 bytes allocated.
==2419== For counts of detected errors, rerun with: -v
==2419== All heap blocks were freed -- no leaks are possible.

Joy....

**brewbuck** · 05-26-2010

Originally Posted by MK27

Be nice if it reported errors by line, like a debugger, instead of by function tho. Actually a debugger that integrated something like valgrind would be pretty nice.

It does, but you have to build your program in debug mode. Unless you mean something else.

**MK27** · 05-26-2010

Originally Posted by brewbuck

It does, but you have to build your program in debug mode. Unless you mean something else.

By golly yer right!

==2483== Invalid read of size 8
==2483== at 0x401E19: cleartree (bayer-demo.c:78)
==2483== by 0x401F5D: main (bayer-demo.c:132)
==2483== Address 0x519d198 is 8 bytes inside a block of size 48 free'd
==2483== at 0x4C2509F: free (vg_replace_malloc.c:323)
==2483== by 0x401E14: cleartree (bayer-demo.c:77)
==2483== by 0x401F5D: main (bayer-demo.c:132)
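A report like this (a read at line 78 from a block freed at line 77) is typical of a cleanup routine that frees a node and then follows a pointer stored inside it. The real `bayer-demo.c` is not shown, so the node layout and function bodies below are purely hypothetical; they only illustrate the pattern and one way to avoid it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the real one is not shown in the thread. */
typedef struct node {
    struct node *left;
    struct node *right;
    int key;
} node;

/* Buggy pattern (shown only as a comment): n is freed, then n->right
 * is read from the freed block, which Valgrind reports as an invalid read.
 *
 *     cleartree(n->left);
 *     free(n);
 *     cleartree(n->right);
 */

/* Fixed version: copy both child pointers before releasing the node.
   Returns the number of nodes freed. */
size_t cleartree(node *n)
{
    if (n == NULL)
        return 0;
    node *left = n->left;
    node *right = n->right;
    free(n);
    return 1 + cleartree(left) + cleartree(right);
}

/* Small helper to build test trees. */
node *make_node(int key, node *left, node *right)
{
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    n->left = left;
    n->right = right;
    return n;
}
```

Building with `-g` is what lets Valgrind attach file names and line numbers to each frame, as in the output above.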
