Thread: Questions on Speed of Execution

  1. #16
    Unregistered
    Guest
    Actually, I've found if/else to be more useful than switch. Mostly because it's easier to read and can be slightly more compact.
    Code:
    switch(foo){
      case 1:
        bar();
        break;
      case 2:
        baz();
        break;
      case 3:
        foob();
        break;
      default:
        printf("Bad input\n");
    }
    Isn't as short or easy to follow as
    Code:
    if(foo == 1)
      bar();
    else if(foo == 2)
      baz();
    else if(foo == 3)
      foob();
    else
      printf("Bad input\n");
    Though the if/else may be less efficient in that several comparisons are made if foo is the last option. I wouldn't mind seeing which is actually more efficient in this case but most of the time you don't need your program to run as fast as can be done and readability is moer important. I've also seen in the ISO C standard that break is not recommended for good code because it severely hampers the flow of the program, this may be correct or not, but I think if/else is easier to both code and read.

  2. #17
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    I take it back - my last set of figures are bogus

    I checked the assembly code of the optimised version, and there was no trace of the case statement. The optimiser has determined that since 'a' is only calculated and not used, there was no point in having the code in there to start with.

    To make 'a' used, I added this at the end
    Code:
    printf( "Result=%f\n", a );
    The revised figures are
    gcc cpr-test.c
    a.exe
    Program took 2.20 secs for 10000000 loopsResult=1000.000000
    a.exe
    Program took 2.09 secs for 10000000 loopsResult=1000.000000
    a.exe
    Program took 2.09 secs for 10000000 loopsResult=1000.000000

    gcc -O2 cpr-test.c
    a.exe
    Program took 2.20 secs for 10000000 loopsResult=1000.000000
    a.exe
    Program took 2.20 secs for 10000000 loopsResult=1000.000000
    a.exe
    Program took 2.31 secs for 10000000 loopsResult=1000.000000

    Be careful optimising test code - it may remove some of it (and give you the wrong impression).

  3. #18
    Registered User wavering's Avatar
    Join Date
    Dec 2001
    Posts
    26
    Unregistered,

    Firstly, concerning the original code I posted I should have pointed out that each of the eight cases of the switch IS visited as j%8 ( ie j Mod 8 ) cycles between 0 to 7 and back down to zero again. The code thus alternately sets "a" to a*1.01 and a/1.01 so "a" remains the same - who said computer programs had to be useful?

    The problem with replacing switch with if/else is that if you are looking at 8 alternatives it means that on average you do 4 separate tests. Even so I have found from time trials in the past that if/else is a bit quicker than switch. What I have found fastest to date is using indirect jmp in asm ( this was shown in a previous thread of mine about inline asm - "inline asm - I love it" ) if you are interested.

    Another way if you were choosing between say 16 case is to divide into < and > 8 and then < and > 4 and so on. This cuts down the 8 tests to about 4 or 5. If you try this in asm with a lot of cases ( as I have ) it gets confusing as asm only allows 127 byte conditional jumps and so you have to get round that by using unconditional jmps as well....

    Here is an example of an 8 way switch in asm where the short jump problem is not a limitation:

    Code:
         bits = code[liner][1];   //RHS
          _asm{
    	       mov  ax,bits
    	       cmp  ax,3         ;bits is 0 to 7 incl
    	       jl   int11        ;jump if bits is 0, 1 or 2
    	       jg   int12        ;jump if bits is 4, 5, 6 or 7
    	       je   asm13        ;jump out as bits = 3
    	int11: cmp  ax,1         ;bits is 0, 1 or 2
    	       jl   asm10        ;jump out as bits = 0
    	       je   asm11        ;jump out as bits = 1
    	       jmp  asm12        ;jump out as bits = 2
    	int12: cmp  ax,5         ;bits is 4, 5, 6 or 7
    	       jl   asm14        ;jump out as bits = 4
    	       je   asm15        ;jump out as bits = 5
    	       cmp  ax,6         ;bits is 6 or 7
    	       je   asm16        ;jump out as bits = 6
    	       jmp  asm17        ;jump out as bits = 7
          }
          asm10:
    	r1 = a;
    	goto ext2;
          asm11:
    	r1 = b;
    	goto ext2;
          asm12:
    	r1 = c;
    	goto ext2;
          asm13:
    	r1 = d;
    	goto ext2;
          asm14:
    	r1 = x;
    	goto ext2;
          asm15:
    	r1 = 1;
    	goto ext2;
          asm16:
    	r1 = 2;
    	goto ext2;
          asm17:
    	r1 = 3;
    
          ext2: bits = code[liner][2];            //RHS 2
    In this kind of example you could replace the whole lot with an array so that r1 = array[j]; but that is slower still ....

  4. #19
    Sayeh
    Guest
    There are other things you can do in order to optimize--

    Remember, C is a syntax, and as such is "converted" to some form of assembler-- which is _not_ ideal for performance.

    If you want to make your program run faster, in the C code portion (to help the compiler compile it more effeciently, and the processor work with it easier), then try these things--

    Use the 'register' keyword on your variables. Otherwise, your variables are always referred from the stackframe with an offset.

    Another thing is to make all your variables 32-bits (even if you only count from 1-10). This way the compiler won't have to mask and/or shift.

    Anything that you calculate that generates the same results everytime the program is run (such as "j%8"), you should put in a table-- a table can be accessed faster than the math can be calculated.

    Finally, eliminate testing & calculations. Anytime you can copy a code block 'n' times in a row, instead of using an if or a case or a do or a while, you will increase speed because you're not doing a test. Anytime you have calculations, try to code your logic such that you do the calculation _once_ at the beginning of your algorithm, so that calculations aren't being done in the middle of the run.

    enjoy.

  5. #20
    Registered User wavering's Avatar
    Join Date
    Dec 2001
    Posts
    26
    Thanks sayeh

    >Use the 'register' keyword on your variables.

    Yup - done that already

    >Another thing is to make all your variables 32-bits (even if you only count from 1-10). This way the compiler won't have to mask and/or shift.

    Now that I did NOT realise. Do you think that applies in asm as well? Should I use eax instead of ax etc?

    >Anything that you calculate that generates the same results everytime the program is run (such as "j%8"), you should put in a table-- a table can be
    accessed faster than the math can be calculated.

    Yes. Have already gone to great lengths to avoid calculations BUT have found that reference to arrays ( is that what you mean by tables or is there something else here? ) is pretty slow ( as are a lot of C built in functions such as fabs - much quicker to cover both the positive and negative case with ifs )

    >Finally, eliminate testing & calculations.

    Yes - have gone to great lengths to avoid - my code looks like beginners BASIC with lots of repeated code and gotos! But quicker ... I guess there is a price to pay for elegance.

    Concerning compilers am now using MS Visual C++ compiler ( nice IDE ) which copes with "ordinary" C and my benchmark figures for the original code I posted are now:

    266 Mhz Machine 2.2 secs
    800 MHz Machine 0.8 secs

    So is now in line with other compilers people have mentioned. Have not got my main program working as this compiler is an awful lot fussier that QuickC and the thing is festooned with warnings and has crashed so far ... but I live in hope

  6. #21
    Banned Troll_King's Avatar
    Join Date
    Oct 2001
    Posts
    1,784
    Trying to measure speed of execution on a professional operating sytem is a guessing game. You can get an idea, but that's about it.

  7. #22
    Sayeh
    Guest
    > Now that I did NOT realise. Do you think that applies
    > in asm as well? Should I use eax instead of ax etc?

    yes, it can. I used the word 'compiler', I meant CPU. The CPU _always_ prefers to work in its native size-- which in most systems today is 32-bits.

    > ...avoid calculations BUT have found that reference
    > to arrays ( is that what you mean by tables...

    Don't use conventional arrays. The compiler can access these things in a very sloppy manner sometimes.

    Just put your data in a 'long' word array. Get the base address and add the offset yourself. If you can find away to eliminate the offset, so much the better. This method is _always_ faster than any tests (such as 'if').

    Also, _do not_ use local variables. These are allocated on the stack and referenced via offset on the stack. This can create a varying performance hit.

  8. #23
    Registered User wavering's Avatar
    Join Date
    Dec 2001
    Posts
    26

    Smile

    Thanks everybody for your help.

    I don't know how many people are interested in speed of execution and/or inline asm but for those who are, here is a part disassembly of the switch statement courtesy of the MS VC++ compiler. The lines numbered 924: etc are C code and the stuff following each of these lines is the asm equivalent.

    Code:
    924:        switch(code[liner][4]){
    0040382B   mov         ecx,dword ptr [ebp-6Ch]
    0040382E   and         ecx,0FFFFh
    00403834   shl         ecx,4
    00403837   xor         edx,edx
    00403839   mov         dx,word ptr code+8 (00434f9c)[ecx]
    00403840   mov         dword ptr [ebp-0C0h],edx
    00403846   cmp         dword ptr [ebp-0C0h],0Fh
    0040384D   ja          $L53487+39h (00403b9e)
    00403853   mov         eax,dword ptr [ebp-0C0h]
    00403859   jmp         dword ptr [eax*4+403CDCh]
    925:          case 0:
    926:            ax=rhs;
    00403860   mov         ecx,dword ptr [ebp-48h]
    00403863   mov         dword ptr [ebp-10h],ecx
    00403866   mov         edx,dword ptr [ebp-44h]
    00403869   mov         dword ptr [ebp-0Ch],edx
    927:            liner++;
    0040386C   mov         ax,word ptr [ebp-6Ch]
    00403870   add         ax,offset $L53439+12h (00403872)
    00403874   mov         word ptr [ebp-6Ch],ax
    928:            break;
    00403878   jmp         $L53487+59h (00403bbe)
    929:          case 1:
    930:            if (ax==rhs) liner=(unsigned short)(code[liner][6] + 1);
    0040387D   fld         qword ptr [ebp-10h]
    00403880   fcomp       qword ptr [ebp-48h]
    00403883   fnstsw      ax
    00403885   test        ah,40h
    00403888   je          $L53440+2Bh (004038a8)
    0040388A   mov         ecx,dword ptr [ebp-6Ch]
    0040388D   and         ecx,0FFFFh
    00403893   shl         ecx,4
    00403896   xor         edx,edx
    00403898   mov         dx,word ptr code+0Ch (00434fa0)[ecx]
    0040389F   add         edx,1
    004038A2   mov         word ptr [ebp-6Ch],dx
    931:              else
    004038A6   jmp         $L53440+37h (004038b4)
    932:                liner++;
    004038A8   mov         ax,word ptr [ebp-6Ch]
    004038AC   add         ax,offset $L53440+31h (004038ae)
    004038B0   mov         word ptr [ebp-6Ch],ax
    933:            break;
    004038B4   jmp         $L53487+59h (00403bbe)
    934:          case 2:
    935:            if (ax<rhs) liner=(unsigned short)(code[liner][6] + 1);
    004038B9   fld         qword ptr [ebp-10h]
    004038BC   fcomp       qword ptr [ebp-48h]
    004038BF   fnstsw      ax
    004038C1   test        ah,1
    004038C4   je          $L53444+2Bh (004038e4)
    004038C6   mov         ecx,dword ptr [ebp-6Ch]
    004038C9   and         ecx,0FFFFh
    004038CF   shl         ecx,4
    004038D2   xor         edx,edx
    004038D4   mov         dx,word ptr code+0Ch (00434fa0)[ecx]
    004038DB   add         edx,1
    004038DE   mov         word ptr [ebp-6Ch],dx
    936:              else
    004038E2   jmp         $L53444+37h (004038f0)
    937:                liner++;
    004038E4   mov         ax,word ptr [ebp-6Ch]
    004038E8   add         ax,offset $L53444+31h (004038ea)
    004038EC   mov         word ptr [ebp-6Ch],ax
    938:            break;
    004038F0   jmp         $L53487+59h (00403bbe)
    939:          case 3:
    .........
    lots more!
    .........
    
    1001:         default:
    1002:           conloc(1,22);
    00403B9E   push        16h
    00403BA0   push        1
    00403BA2   call        @ILT+90(_conloc) (0040105f)
    00403BA7   add         esp,8
    1003:           puts("Error 1 in scorer - default case!");
    00403BAA   push        offset string "Error 1 in scorer - default case"... (00430274)
    00403BAF   call        puts (00404b60)
    00403BB4   add         esp,4
    1004:           exit(0);
    00403BB7   push        0
    00403BB9   call        exit (00405080)
    1005:       }
    1006:       if(liner>length)
    00403BBE   mov         ecx,dword ptr [ebp-6Ch]
    00403BC1   and         ecx,0FFFFh
    00403BC7   xor         edx,edx
    00403BC9   mov         dx,word ptr [length (00434f92)]
    00403BD0   cmp         ecx,edx
    00403BD2   jle         $L53487+71h (00403bd6)
    1007:         goto success;             // .. acceptable end of "prog"
    00403BD4   jmp         success (00403be2)
    1008:
    1009:     }                             //end of "while" loop ( from old progrun )
    00403BD6   jmp         scorer+112h (00403612)
    1010:     return(FALSE);                //.. is looping too long
    00403BDB   xor         eax,eax
    00403BDD   jmp         success+0E6h (00403cc8)
    1011:
    1012:     success:
    From the above you can see ( I think ) that the heart of the switch() statement is:

    00403859 jmp dword ptr [eax*4+403CDCh]

    which is an indirect jump. I am also surprised to see that:

    liner++; translates to:

    0040386C mov ax,word ptr [ebp-6Ch] //Put the value of liner into the ax register
    00403870 add ax,offset $L53439+12h (00403872) // and add one ...
    00403874 mov word ptr [ebp-6Ch],ax // put the value of ax back into liner

    The use of "offset $L53439+12h (00403872)" instead of just adding on a bit appears cumbersome to say the least but there is no doubt a good reason.

    As for my goto in line 1007 the excuse is that I am exiting from a loop ... Incidentally, I use variables ax, bx, cx and dx as C variables so don't confuse these with registers.

    Hope this is of interest

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Execution speed test in c++
    By nick2price in forum C++ Programming
    Replies: 1
    Last Post: 03-12-2009, 04:23 PM
  2. Execution Time - Rijandael encryption
    By gamer4life687 in forum C++ Programming
    Replies: 5
    Last Post: 09-20-2008, 09:25 PM
  3. I am very new . . . :(
    By Eternalglory47 in forum C++ Programming
    Replies: 6
    Last Post: 09-05-2008, 11:29 AM
  4. A very long list of questions... maybe to long...
    By Ravens'sWrath in forum C Programming
    Replies: 16
    Last Post: 05-16-2007, 05:36 AM
  5. Flight Simulator Wind Speed!!
    By Dilmerv in forum C++ Programming
    Replies: 6
    Last Post: 03-20-2006, 12:40 AM