Thread: approx. # CPU cycles

  1. #1
    Slave MadCow257's Avatar
    Join Date
    Jan 2005

    approx. # CPU cycles

    I don't know how this works at all, but I need a very rough estimation of how many cycles it would take to complete these three things.
    1) Comparison of a 106 digit variable and a 106 digit constant
    2) Finding the remainder of the division between a 106 digit variable and a 212 digit constant
    3) Addition of a 106 digit variable and 2
    My friend told me it was in the order of 60 cycles but something about the length of those numbers and the division is making me really doubt the accuracy of that.


  2. #2
    I am the worst best coder Quantrizi's Avatar
    Join Date
    Mar 2002
    What processor speed? What processor? How much memory? How much RAM? How much ROM?

    This question cannot be answered, since we know nothing of the specs (since all processors run at different rates).

  3. #3
    Slave MadCow257's Avatar
    Join Date
    Jan 2005
    On a dual core machine with two pentium 3 1 ghzs, one to run the system and one to run that.
    RAM - 128 megs
    This may show how little I know, but most of those things don't really affect how many cycles it would take does it?

  4. #4
    Carnivore ('-'v) Hunter2's Avatar
    Join Date
    May 2002
    That is VERY vague. I won't pretend to know what goes on under the hood, but there are about a million ways to:
    -store the variables, and therefore to compare them also
    -Find the remainder etc. etc.
    -Add 2 to a large number

    Depending on exactly how you plan on implementing each of these, the number of cycles required will likely vary enough to render any estimation useless. Anyway, there will be best and worst case scenarios for each, which will also have significant differences in CPU usage - say, if two numbers differ at the first digit, it will take vastly less time to compare for equality than if they are exactly equal. Etc. etc. etc.

    Provide some more detail, and someone who knows more than me will help you out

    And, the processor type will likely affect how many cycles it will take. Some processors are smarter than others, and will optimize away a bunch of operations before running them.

    The CPU speed will have no effect on the number of cycles. Video card may have an effect depending on how and where the algorithm is run, though only in very rare and probably non-existent cases. Same with sound card. RAM probably won't make a difference, unless you have 1kb of RAM and you want your program to fit inside that space. I have no idea what ROM does, so no comment about that.
    Last edited by Hunter2; 01-04-2005 at 06:27 PM.
    Just Google It. √

    (\ /)
    ( . .)
    c(")(") This is bunny. Copy and paste bunny into your signature to help him gain world domination.

  5. #5
    Registered User Kybo_Ren's Avatar
    Join Date
    Sep 2004
    If you have a Pentium-class or better processor, you can use the RDTSC assembly instruction.

    Here's some example code:
    typedef  struct _BinInt32
    __int32 i32[2];
    } BigInt32;
    typedef  struct _BigInt64
       __int64 i64;
    } BigInt64;
    typedef union _bigInt
       BigInt32 int32val;
       BigInt64 int64val;
    } BigInt;
    long long benchmark()
       BigInt start_ticks, end_ticks;
           mov start_ticks.int32val.i32[0], eax
           mov start_ticks.int32val.i32[4], edx
       }//force serialization, start the tick count
       myfunc();//replace this with your function name
           mov end_ticks.int32val.i32[0], eax
           mov end_ticks.int32val.i32[4], edx
       }//force serialization, end the tick count
       //return the total tick count
    Please note that this code isn't entirely mine.
    I took the BigInt part from some website, the serialization from the Intel manual, etc. I've found it quite useful, though, and thought I'd share it. Keep in mind it only works on some processors, though. Any Pentium or above or Athlon or above (not sure about K6-2s) will have it.

    EDIT: If your compiler doesn't recognize __int64 or __int32, then I'd say just assume a long has 32 bits and long long has 64 bits (might not work on AMD64 and AMD FX processors, though).

    EDIT2: I'd say it will take quite a bit longer than 60 cycles. Please note that the width of the processors' pipelines that this code will work on is 32 bits (and maybe 64 bits for AMD 64s and AMD FXs). You would need quite a few cycles to fit that whole number.
    Last edited by Kybo_Ren; 01-04-2005 at 06:33 PM.

  6. #6
    Slave MadCow257's Avatar
    Join Date
    Jan 2005
    And, the processor type will likely affect how many cycles it will take. Some processors are smarter than others, and will optimize away a bunch of operations before running them.
    I meant more I didn't think the amount memory, RAM, and ROM would matter.
    Anyways, I'm realizing now that this kind of question is too difficult to be answered by anything other then say, programming an efficient implementation and running it for a couple hours and see how many ititerations it completes.

    The library for big ints I use is GMP, because supposdly it handles arithmitec really well. I'm hoping it turns out this can be completed in less 200 cycles for normal cases.

    My compiler (Visual c++ 6) can handle _int32 and 64 but the problem is you can't do much with it. Like cout and everything don't allow it to be used.
    Last edited by MadCow257; 01-04-2005 at 06:33 PM.

  7. #7
    Registered User Kybo_Ren's Avatar
    Join Date
    Sep 2004
    VC6 will run that code just fine, though I forget if you have to #define the opcode for RDTSC, which I believe is 0x0F31.

  8. #8
    Hardware Engineer
    Join Date
    Sep 2001
    programming an efficient implementation and running it for a couple hours and see how many iterations it completes.
    YES! This is the easiest, quickest, and most practical way. The "answer" cannot be calculated before the code is written anyway. Then, it's easy to add a loop, counter, and timer. But, you don't need hours! If the "answer" is 200 cycles, your 3 GHz system could do 15 million iterations in one second!

    with two pentium 3 1 ghzs, one to run the system and one to run that.
    If you're running a "normal" single-threaded application, Windows will probably do this for you. You don't normally have control over how the CPUs are used. The operating system decides how system resources are allocated. (It is possible to "take control away" from the operation system by writing a kernel mode program... Windows drivers are kernel-mode). This probably won't be an issue, because the "idle" operating system doesn't require much CPU time.

    If you want to really take advantage of both processors, you could write a multi-threaded application. The operating system will then allocate the threads to the CPUs. That assumes that your algorithm can make use of multithreading... I think you can. You could run two identical threads... one comparing one set of numbers, and the other comparing another set of numbers.

    Heck, an easier way to take advantage of both processors would be to simply run two instances of your (normal single-thread) program simultaneously, on two different data-sets. …If that would accomplish whatever you’re trying to do. (?)

    You cannot know, in advance, how many CPU cycles a given C++ statement will require.
    Two different compilers will generate different machine-code for the same statement.

    Assembly language is different. Each assembly instruction translates precisely to a specific machine-language instruction. You can look-up the number of cycles required for each instruction. However, instructions that access memory can vary depending on memory speed, and more importantly, if the data is stored in cache, RAM, or disk. For memory access instructions, the processor’s user manual will give the minimum number of cycles.
    Last edited by DougDbug; 01-05-2005 at 04:33 PM.

  9. #9
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    You can use QueryPerformanceCounter() to do accurate timings from within Windows applications. There are certain performance registers and performance-based opcodes that are not available to programs running below an RPL of 1 or 2, which is why Windows provides ways to query these otherwise inaccessible counters.

    It is very important to keep in mind, however, that processor timings are not all that easy to compute anymore given the complex nature of the architecture, branch predictions schemes, and caches. Very rarely anymore is a program ran through synchronously - in fact the fetch unit is used far less than you might think in a modern CPU. Many of the newer CPUs use branch prediction and cache schemes as well as parallel opcode processing to perform operations. So unless you can simulate exactly how the CPU, cache, and internal branch prediction registers will look - your timings are an estimation at best. Also given that you are in a multi-tasking environment inside of Windows timing becomes even more difficult. You can get a pretty good idea of timings - but even Intel does not provide specific instruction timings/cycle counts in their newest literature. This is because it is dependent on so many things it's nearly impossible to say how fast any one opcode will execute at a given time. There are a lot of factors involved.
    Last edited by VirtualAce; 01-05-2005 at 04:43 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. pthread question
    By quantt in forum Linux Programming
    Replies: 7
    Last Post: 04-07-2009, 01:21 AM
  2. questions on multiple thread programming
    By lehe in forum C Programming
    Replies: 11
    Last Post: 03-27-2009, 07:44 AM
  3. Upgrading my old CPU (for another old one!)
    By foxman in forum Tech Board
    Replies: 16
    Last Post: 01-11-2008, 05:41 PM
  4. CPU temp
    By PING in forum Tech Board
    Replies: 5
    Last Post: 01-28-2006, 06:25 AM
  5. How many CPU cycles (ARM) for add vs if..then
    By Pea in forum C Programming
    Replies: 11
    Last Post: 01-26-2005, 02:40 PM