I'm interested in doing some benchmarking, but am new to the concept of doing so. Any pointers? What do you guys use to measure the running time of your algorithms/functions in C? Do you use tools?
I'm certainly no expert on the matter, but I'll put in my 2 cents on how I do it (I only really need an approximation).
On Linux I use time to measure the running time of my program (I usually 'benchmark' algorithms in a program by themselves, which eliminates the need for 'internal timing'). Otherwise I'd use time() or a more accurate time function like setitimer(). You should also be aware of caching: if you time an application once and then run it again, the second run will probably be quicker because things are cached.
I'm sure someone will either reinforce what I've said or shoot it out of the sky.
This simple question leads to a long answer.
Benchmarking and performance improvement is an iterative process: you have to go through the same steps many times.
The first thing to consider is:
* What are you trying to achieve?
Is the goal to improve the performance of your own code, or to produce something you can give to customers to measure the performance of their systems? What the benchmark needs to look like depends on the chosen goal.
* Micro or macro benchmark?
Do you want to measure the performance of the WHOLE application, or individual functions?
There are benefits in both. In the former, you probably need some sort of script or similar to "run" the application, or a suitable "typical" input file, perhaps.
In the latter case, you will have to write some code/script to perform specific functions (e.g. draw the same picture hundreds of times on the screen, output 100MB to a file, or whatever you want to achieve). It's also worth noting that some forms of micro-benchmarks can lead to "overoptimization": a waste of time, and sometimes even an overall reduction in performance. Over-inlining is a good example, where some particular function gets so large that it throws other bits of code out of the cache, which harms the overall performance.
After all, your "customers" will never run your code function by function; they will run the overall product, and that is what needs to run sufficiently fast.
Tools:
First of all, you need something that tells you what the current performance is. In its simplest form, that's just a stopwatch and timing the application from start to finish. To depend less on the finger-speed of the benchmarker, adding some code to the application to report the time taken, and perhaps "loops per second" or "bytes per second" or such, will help a lot.
Once you have the "current performance data", you will obviously want to do something about it. The first step is probably to make sure you have the right compiler switches: a debug build with no optimization can easily be 2-3x slower than a release build with full optimization. For large apps it's also worth considering "optimize for size" instead of "optimize for speed", but measure on YOUR application, because there is no set rule here. Note also that once you have changed the code sufficiently, a different compiler option may turn out to be the best choice compared to the first round.
During my experiments to improve things, I use Excel to record what I've done, what the result was and compare improvements.
The second step is to figure out where the code is spending time. Here is where a profiler comes in. There are two basic kinds:
1. Sampling profiler.
Interrupts in the system are instrumented to record where in the code execution was when the interrupt was taken. Given sufficient runtime of the application, you should get a pretty good idea of where the app spent its time. VTune, CodeAnalyst and OProfile are tools in this range, as is "writing your own" if you have an embedded system.
2. Compiler helped profiler.
This involves the compiler adding extra code to the application, which stores data about the execution time, and how many times each portion of code is hit. I'm sure MS has such a feature, though I've never used it; in gcc it's "-pg" to generate the extra code, and you use "gprof" to get the data presented in a readable form.
You can of course do this yourself too, and for "only some" functions - just create an array with entries for as many functions as you think you need, and use a macro to indicate start and end of the function call, adding up the time taken for the function.
Just be aware that adding this sort of code also slows the function down quite a bit.
So we now have some sort of histogram or "top ten" list of where the time is spent. Take a look at the top entry: it's quite often a LARGE proportion of the time spent in one function, so it's usually easy. Now look at that function: how can it be made faster? Is it doing a linear search, where keeping the list sorted and using a binary search would make it 8, 20 or 100x faster?
Sometimes you find that the code isn't spending a lot of time anywhere in particular, just small bits in lots and lots of functions. This makes things harder. It can be a hint that you have too many really small functions, or that the code is hard to optimize in general; perhaps it's already pretty well optimized?
Finally, you have to "stop somewhere". For large projects, it's almost always possible to continue "forever" to improve the performance, but you have to set a goal, or set a time-limit for your improvements.
I'm sure others have other ideas.
--
Mats
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
I would say the easiest way is to write a simple test 'stub' program which calls the
algorithms/functions a set number of times, and then simply time it with a clock or
wrist watch or whatever.
Then change the number of times it is called from, say, 10 to 100 to 1000 or 1,000,000
or whatever, so you get a value you can easily measure, for example one that takes about
10 minutes or so. Then just divide the time by the number of iterations.
Obviously there may be a slight overhead in loading the program, but that will be constant
for every run. So you might get results like
10 runs + constant = D1 seconds
100 runs + constant = D2 seconds
1000 runs + constant = D3 seconds
100,000 runs + constant = D4 seconds etc..
So it will be fairly simple to work out what the constant value is and hence the run time
accurately. Anyhow, after 10 minutes the load time will be negligible anyway.
Also, this will help smooth out the results, as the run time will be affected by other
activities your computer has to perform, such as processing spurious network activity etc...
matsp, thanks for the overview. I've been meaning to profile my C++ code in Xcode on a mac. I know it can be done, but haven't dug into it yet. I'm again motivated. Thanks!
Todd
Mainframe assembler programmer by trade. C coder when I can.
On i686 and later:
This measures the clock cycles elapsed since the computer booted.
Code:
uint64_t rdtsc()
{
    uint32_t lo, hi;
    /* We cannot use "=A", since this would use %rax on x86_64 */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return (uint64_t)hi << 32 | lo;
}
Simply subtract two consecutive calls and divide by your CPU frequency.
It doesn't get more accurate than that.
A typical example of ...cheap programming practices.
Code:
...
goto johny_walker_red_label;
johny_walker_blue_label: exit(-149$);
johny_walker_red_label : exit( -22$);