![]() |
| | #1 |
| Registered User Join Date: Jan 2008
Posts: 69
| benchmarking? |
| cs32 is offline | |
| | #2 |
| Woof, woof! Join Date: Mar 2007 Location: Australia
Posts: 3,295
| I'm certainly no expert on the matter, However, I will put my 2 cents in on how I do it (I only really need an approximation). On Linux I use time to measure the running time of my program (usually I 'benchmark' algorithms in one program by themselves -- eliminates the need for 'internal timing'). Otherwise I'd use time() or some more accurate time function like setitimer(). However you should be aware of caching, for example if you time an application once, and then do it again the 2nd time is probably going to be quicker because it's cached. I'm sure someone will reinforce what I've said, or shoot it out of the sky |
| zacs7 is offline | |
| | #3 |
| Kernel hacker Join Date: Jul 2007 Location: Farncombe, Surrey, England
Posts: 15,686
| The simple question leads to a long answer ![]() Benchmarking and performance improvement is an iterative process, you have to go through the same steps many times. The first thing to consider is: * What are you trying to achieve? Is the goal to improve the performance of your code, or something you can give to customers to measure the performance of their systems? The quality of the benchmark product would depend on the chosen goal. * Micro or macro benchmark? Do you want to measure the performance of the WHOLE application, or individual functions? There are benefits in both. In the former, you probably need some sort of script or similar to "run" the application, or a suitable "typical" input file, perhaps. In the latter case, you will have to write some code/script to perform specific functions (e.g. draw the same picture hundreds of times on the screen, output 100MB to a file, or whatever you want to achieve). It's also worth noting that some forms of micro-benchmarks can lead to "overoptimization" - waste of time, and sometimes also leading to overall reduction in performance, for example due to over-inlining, where some particular function gets too large, and throws out other bits of code from the cache, which harms the overall performance. After all, your "customers" will never run your code function by function, they will run the overall product - and that's what they require to run sufficiently fast. Tools: First of all, you need something to tell you what the current performance is. In it's simplest form, it's just a stopwatch and timing the application from start to finish. For less dependency on the finger-speed of the benchmarker, adding some code to the application to show the time taken and perhaps "loops per second" or "bytes per second" or such will help a lot. Once you have the "current performance data", you obviously will want to do something about it. First step is probably to make sure you have the right switches to the compiler. A debug build with no optimization can easily be 2-3x slower than a release build with full optimization. For large apps, it's also worth considering "Optimize for size" instead of "optimise for speed" - but measure on YOUR application, because there is no set rule here. Note also that once you have changed the code sufficiently, a different compiler option may turn out to be the best, compared to the first step. During my experiments to improve things, I use Excel to record what I've done, what the result was and compare improvements. The second step is to figure out where the code is spending time. Here is where a profiler comes in. There are two basic kinds: 1. Sampling profiler. Interrupts in the system are instrumented to record where in the code it was currently executing when the interrupt was taken. Given sufficient runtime of the application, you should get a pretty good idea of where the app spent it's time. Vtune, CodeAnalyst and oprofile are tools in this range, as well as "writing your own" if you have an embedded system. 2. Compiler helped profiler. This involves the compiler adding extra code to the application, which stores data about the execution time, and how many times it's hit, for each portion of code. I'm sure MS has such a feature, but I've never used it, but in gcc it's "-pg" to generate the extra code, and you use "gprof" to get the data presented in a readable form. You can of course do this yourself too, and for "only some" functions - just create an array with entries for as many functions as you think you need, and use a macro to indicate start and end of the function call, adding up the time taken for the function. Just be aware that adding this sort of code also slows the function down quite a bit. So we now have some sort of histogram or "top-ten" list of where the time is spent. Take a look at the top one - it's quite often a LARGE proportion of time spent in one function, so it's usually easy. Now look at that function - how can it be improved for speed? Is it doing a linear search, where keeping a list sorted and using a binary search would make it 8, 20 or 100x faster? Sometimes you find that the code isn't spending a lot of time anywhere, just small bits in lots and lots of functions. This makes it hard. It can be a hint that perhaps you have too many really small functions. Or that it's hard to optimize this code in general - perhaps it's already pretty well optimized? Finally, you have to "stop somewhere". For large projects, it's almost always possible to continue "forever" to improve the performance, but you have to set a goal, or set a time-limit for your improvements. I'm sure others have other ideas. -- Mats
__________________ Compilers can produce warnings - make the compiler programmers happy: Use them! Please don't PM me for help - and no, I don't do help over instant messengers. |
| matsp is offline | |
| | #4 | |
| Fountain of knowledge. Join Date: May 2006
Posts: 662
| Quote:
I would say the easiest way is to write a simple test 'stub' program which call the algorithms/functions a set number of times and then simply time it with a clock or wrist watch or whatever. Then change the number of times it is called form say 10 to 100 to 1000 or 1,000,000 or whatever so you get a value you can easily measure, which for example takes about 10 minutes or so. Then just divide the time by the number of iterations. Obviously there may be a slight overhead in loading the program but that will be constant for evey run. So you might get results like 10 runs + constant = D1 seconds 100 runs + constant = D2 seconds 1000 runs + constant = D3 seconds 100,000 runs + constant = D4 seconds etc.. So it will be fairly simple to work out what the constant value is and hence the run time accurately. Anyhow after 10 minutes the load time will be neglible anyway. Also this is will help smooth out the results as the run time will be affected by other activities your computer has to perform such as processing spurious network activity etc... | |
| esbo is offline | |
| | #5 |
| Jack of many languages Join Date: Nov 2007 Location: Katy, Texas
Posts: 2,071
| matsp, thanks for the overview. I've been meaning to profile my C++ code in Xcode on a mac. I know it can be done, but haven't dug into it yet. I'm again motivated. Thanks! Todd
__________________ Mac and Windows cross platform programmer. Ruby lover. |
| Dino is offline | |
| | #6 |
| uint64_t...think positive Join Date: Feb 2008 Location: Pacem
Posts: 355
| On i686 and later: Code: uint64_t rdtsc()
{
uint32_t lo, hi;
/* We cannot use "=A", since this would use %rax on x86_64 */
__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
return (uint64_t)hi << 32 | lo;
}
Simply, subtract two consecutive calls and divide by your CPU frequency. It doesn't get more accurate.
__________________ Code: ...
goto johny_walker_red_label;
johny_walker_blue_label: exit(-149$);
johny_walker_red_label : exit( -22$);
|
| xuftugulus is offline | |
![]() |
| Thread Tools | |
| Display Modes | |
|
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Benchmarking | redruby147 | C Programming | 4 | 06-18-2009 07:44 AM |
| Benchmarking c compilers | freevryheid | C Programming | 6 | 01-23-2009 05:41 PM |
| Benchmarking and Optimisation Q's | studiesrule | C++ Programming | 11 | 10-19-2006 07:57 AM |
| benchmarking | codegeek | C Programming | 8 | 02-15-2005 03:30 PM |
| Benchmarking | RoD | Tech Board | 5 | 10-14-2002 02:04 PM |