# approx. # CPU cycles

• 01-04-2005
approx. # CPU cycles
I don't know how this works at all, but I need a very rough estimation of how many cycles it would take to complete these three things.
1) Comparison of a 106 digit variable and a 106 digit constant
2) Finding the remainder of the division between a 106 digit variable and a 212 digit constant
3) Addition of a 106 digit variable and 2
My friend told me it was in the order of 60 cycles but something about the length of those numbers and the division is making me really doubt the accuracy of that.

Thanks!
• 01-04-2005
Quantrizi
What processor speed? What processor? How much memory? How much RAM? How much ROM?

This question cannot be answered, since we know nothing of the specs (since all processors run at different rates).
• 01-04-2005
On a dual core machine with two pentium 3 1 ghzs, one to run the system and one to run that.
RAM - 128 megs
This may show how little I know, but most of those things don't really affect how many cycles it would take does it?
• 01-04-2005
Hunter2
That is VERY vague. I won't pretend to know what goes on under the hood, but there are about a million ways to:
-store the variables, and therefore to compare them also
-Find the remainder etc. etc.
-Add 2 to a large number

Depending on exactly how you plan on implementing each of these, the number of cycles required will likely vary enough to render any estimation useless. Anyway, there will be best and worst case scenarios for each, which will also have significant differences in CPU usage - say, if two numbers differ at the first digit, it will take vastly less time to compare for equality than if they are exactly equal. Etc. etc. etc.

Provide some more detail, and someone who knows more than me will help you out ;)

**EDIT**
And, the processor type will likely affect how many cycles it will take. Some processors are smarter than others, and will optimize away a bunch of operations before running them.

**EDIT2**
The CPU speed will have no effect on the number of cycles. Video card may have an effect depending on how and where the algorithm is run, though only in very rare and probably non-existent cases. Same with sound card. RAM probably won't make a difference, unless you have 1kb of RAM and you want your program to fit inside that space. I have no idea what ROM does, so no comment about that.
• 01-04-2005
Kybo_Ren
If you have a Pentium-class or better processor, you can use the RDTSC assembly instruction.

Here's some example code:
Code:

```typedef  struct _BinInt32 { __int32 i32[2]; } BigInt32; typedef  struct _BigInt64 {   __int64 i64; } BigInt64; typedef union _bigInt {   BigInt32 int32val;   BigInt64 int64val; } BigInt; long long benchmark() {   BigInt start_ticks, end_ticks;   _asm   {           CPUID       RDTSC       mov start_ticks.int32val.i32[0], eax       mov start_ticks.int32val.i32[4], edx   }//force serialization, start the tick count   myfunc();//replace this with your function name     _asm   {       CPUID       RDTSC       mov end_ticks.int32val.i32[0], eax       mov end_ticks.int32val.i32[4], edx   }//force serialization, end the tick count   return(end_ticks.int64val.i64-start_ticks.int64val.i64);   //return the total tick count }```
Please note that this code isn't entirely mine.
I took the BigInt part from some website, the serialization from the Intel manual, etc. I've found it quite useful, though, and thought I'd share it. Keep in mind it only works on some processors, though. Any Pentium or above or Athlon or above (not sure about K6-2s) will have it.

EDIT: If your compiler doesn't recognize __int64 or __int32, then I'd say just assume a long has 32 bits and long long has 64 bits (might not work on AMD64 and AMD FX processors, though).

EDIT2: I'd say it will take quite a bit longer than 60 cycles. Please note that the width of the processors' pipelines that this code will work on is 32 bits (and maybe 64 bits for AMD 64s and AMD FXs). You would need quite a few cycles to fit that whole number.
• 01-04-2005
[qoute]
And, the processor type will likely affect how many cycles it will take. Some processors are smarter than others, and will optimize away a bunch of operations before running them.
[/qoute]
I meant more I didn't think the amount memory, RAM, and ROM would matter.
Anyways, I'm realizing now that this kind of question is too difficult to be answered by anything other then say, programming an efficient implementation and running it for a couple hours and see how many ititerations it completes.

EDIT
The library for big ints I use is GMP, because supposdly it handles arithmitec really well. I'm hoping it turns out this can be completed in less 200 cycles for normal cases.

EDIT2
My compiler (Visual c++ 6) can handle _int32 and 64 but the problem is you can't do much with it. Like cout and everything don't allow it to be used.
• 01-04-2005
Kybo_Ren
VC6 will run that code just fine, though I forget if you have to #define the opcode for RDTSC, which I believe is 0x0F31.
• 01-05-2005
DougDbug
Quote:

programming an efficient implementation and running it for a couple hours and see how many iterations it completes.
YES! This is the easiest, quickest, and most practical way. The "answer" cannot be calculated before the code is written anyway. Then, it's easy to add a loop, counter, and timer. But, you don't need hours! :) If the "answer" is 200 cycles, your 3 GHz system could do 15 million iterations in one second!

Quote:

with two pentium 3 1 ghzs, one to run the system and one to run that.
If you're running a "normal" single-threaded application, Windows will probably do this for you. You don't normally have control over how the CPUs are used. The operating system decides how system resources are allocated. (It is possible to "take control away" from the operation system by writing a kernel mode program... Windows drivers are kernel-mode). This probably won't be an issue, because the "idle" operating system doesn't require much CPU time.

If you want to really take advantage of both processors, you could write a multi-threaded application. The operating system will then allocate the threads to the CPUs. That assumes that your algorithm can make use of multithreading... I think you can. You could run two identical threads... one comparing one set of numbers, and the other comparing another set of numbers.

EDIT
Heck, an easier way to take advantage of both processors would be to simply run two instances of your (normal single-thread) program simultaneously, on two different data-sets. …If that would accomplish whatever you’re trying to do. (?)

BACKGROUND-
You cannot know, in advance, how many CPU cycles a given C++ statement will require.
Two different compilers will generate different machine-code for the same statement.

Assembly language is different. Each assembly instruction translates precisely to a specific machine-language instruction. You can look-up the number of cycles required for each instruction. However, instructions that access memory can vary depending on memory speed, and more importantly, if the data is stored in cache, RAM, or disk. For memory access instructions, the processor’s user manual will give the minimum number of cycles.
• 01-05-2005
VirtualAce
You can use QueryPerformanceCounter() to do accurate timings from within Windows applications. There are certain performance registers and performance-based opcodes that are not available to programs running below an RPL of 1 or 2, which is why Windows provides ways to query these otherwise inaccessible counters.

It is very important to keep in mind, however, that processor timings are not all that easy to compute anymore given the complex nature of the architecture, branch predictions schemes, and caches. Very rarely anymore is a program ran through synchronously - in fact the fetch unit is used far less than you might think in a modern CPU. Many of the newer CPUs use branch prediction and cache schemes as well as parallel opcode processing to perform operations. So unless you can simulate exactly how the CPU, cache, and internal branch prediction registers will look - your timings are an estimation at best. Also given that you are in a multi-tasking environment inside of Windows timing becomes even more difficult. You can get a pretty good idea of timings - but even Intel does not provide specific instruction timings/cycle counts in their newest literature. This is because it is dependent on so many things it's nearly impossible to say how fast any one opcode will execute at a given time. There are a lot of factors involved.