-
Actually, I've found if/else to be more useful than switch. Mostly because it's easier to read and can be slightly more compact.
Code:
switch(foo){
case 1:
bar();
break;
case 2:
baz();
break;
case 3:
foob();
break;
default:
printf("Bad input\n");
}
Isn't as short or easy to follow as
Code:
if(foo == 1)
bar();
else if(foo == 2)
baz();
else if(foo == 3)
foob();
else
printf("Bad input\n");
Though the if/else may be less efficient in that several comparisons are made if foo is the last option. I wouldn't mind seeing which is actually more efficient in this case but most of the time you don't need your program to run as fast as can be done and readability is moer important. I've also seen in the ISO C standard that break is not recommended for good code because it severely hampers the flow of the program, this may be correct or not, but I think if/else is easier to both code and read.
-
I take it back - my last set of figures are bogus
I checked the assembly code of the optimised version, and there was no trace of the case statement. The optimiser has determined that since 'a' is only calculated and not used, there was no point in having the code in there to start with.
To make 'a' used, I added this at the end
Code:
printf( "Result=%f\n", a );
The revised figures are
gcc cpr-test.c
a.exe
Program took 2.20 secs for 10000000 loopsResult=1000.000000
a.exe
Program took 2.09 secs for 10000000 loopsResult=1000.000000
a.exe
Program took 2.09 secs for 10000000 loopsResult=1000.000000
gcc -O2 cpr-test.c
a.exe
Program took 2.20 secs for 10000000 loopsResult=1000.000000
a.exe
Program took 2.20 secs for 10000000 loopsResult=1000.000000
a.exe
Program took 2.31 secs for 10000000 loopsResult=1000.000000
Be careful optimising test code - it may remove some of it (and give you the wrong impression).
-
Unregistered,
Firstly, concerning the original code I posted I should have pointed out that each of the eight cases of the switch IS visited as j%8 ( ie j Mod 8 ) cycles between 0 to 7 and back down to zero again. The code thus alternately sets "a" to a*1.01 and a/1.01 so "a" remains the same - who said computer programs had to be useful?
The problem with replacing switch with if/else is that if you are looking at 8 alternatives it means that on average you do 4 separate tests. Even so I have found from time trials in the past that if/else is a bit quicker than switch. What I have found fastest to date is using indirect jmp in asm ( this was shown in a previous thread of mine about inline asm - "inline asm - I love it" ) if you are interested.
Another way if you were choosing between say 16 case is to divide into < and > 8 and then < and > 4 and so on. This cuts down the 8 tests to about 4 or 5. If you try this in asm with a lot of cases ( as I have ) it gets confusing as asm only allows 127 byte conditional jumps and so you have to get round that by using unconditional jmps as well....
Here is an example of an 8 way switch in asm where the short jump problem is not a limitation:
Code:
bits = code[liner][1]; //RHS
_asm{
mov ax,bits
cmp ax,3 ;bits is 0 to 7 incl
jl int11 ;jump if bits is 0, 1 or 2
jg int12 ;jump if bits is 4, 5, 6 or 7
je asm13 ;jump out as bits = 3
int11: cmp ax,1 ;bits is 0, 1 or 2
jl asm10 ;jump out as bits = 0
je asm11 ;jump out as bits = 1
jmp asm12 ;jump out as bits = 2
int12: cmp ax,5 ;bits is 4, 5, 6 or 7
jl asm14 ;jump out as bits = 4
je asm15 ;jump out as bits = 5
cmp ax,6 ;bits is 6 or 7
je asm16 ;jump out as bits = 6
jmp asm17 ;jump out as bits = 7
}
asm10:
r1 = a;
goto ext2;
asm11:
r1 = b;
goto ext2;
asm12:
r1 = c;
goto ext2;
asm13:
r1 = d;
goto ext2;
asm14:
r1 = x;
goto ext2;
asm15:
r1 = 1;
goto ext2;
asm16:
r1 = 2;
goto ext2;
asm17:
r1 = 3;
ext2: bits = code[liner][2]; //RHS 2
In this kind of example you could replace the whole lot with an array so that r1 = array[j]; but that is slower still ....
-
There are other things you can do in order to optimize--
Remember, C is a syntax, and as such is "converted" to some form of assembler-- which is _not_ ideal for performance.
If you want to make your program run faster, in the C code portion (to help the compiler compile it more effeciently, and the processor work with it easier), then try these things--
Use the 'register' keyword on your variables. Otherwise, your variables are always referred from the stackframe with an offset.
Another thing is to make all your variables 32-bits (even if you only count from 1-10). This way the compiler won't have to mask and/or shift.
Anything that you calculate that generates the same results everytime the program is run (such as "j%8"), you should put in a table-- a table can be accessed faster than the math can be calculated.
Finally, eliminate testing & calculations. Anytime you can copy a code block 'n' times in a row, instead of using an if or a case or a do or a while, you will increase speed because you're not doing a test. Anytime you have calculations, try to code your logic such that you do the calculation _once_ at the beginning of your algorithm, so that calculations aren't being done in the middle of the run.
enjoy.
-
Thanks sayeh
>Use the 'register' keyword on your variables.
Yup - done that already
>Another thing is to make all your variables 32-bits (even if you only count from 1-10). This way the compiler won't have to mask and/or shift.
Now that I did NOT realise. Do you think that applies in asm as well? Should I use eax instead of ax etc?
>Anything that you calculate that generates the same results everytime the program is run (such as "j%8"), you should put in a table-- a table can be
accessed faster than the math can be calculated.
Yes. Have already gone to great lengths to avoid calculations BUT have found that reference to arrays ( is that what you mean by tables or is there something else here? ) is pretty slow ( as are a lot of C built in functions such as fabs - much quicker to cover both the positive and negative case with ifs )
>Finally, eliminate testing & calculations.
Yes - have gone to great lengths to avoid - my code looks like beginners BASIC with lots of repeated code and gotos! But quicker ... I guess there is a price to pay for elegance.
Concerning compilers am now using MS Visual C++ compiler ( nice IDE ) which copes with "ordinary" C and my benchmark figures for the original code I posted are now:
266 Mhz Machine 2.2 secs
800 MHz Machine 0.8 secs
So is now in line with other compilers people have mentioned. Have not got my main program working as this compiler is an awful lot fussier that QuickC and the thing is festooned with warnings and has crashed so far ... but I live in hope
:)
-
Trying to measure speed of execution on a professional operating sytem is a guessing game. You can get an idea, but that's about it.
-
> Now that I did NOT realise. Do you think that applies
> in asm as well? Should I use eax instead of ax etc?
yes, it can. I used the word 'compiler', I meant CPU. The CPU _always_ prefers to work in its native size-- which in most systems today is 32-bits.
> ...avoid calculations BUT have found that reference
> to arrays ( is that what you mean by tables...
Don't use conventional arrays. The compiler can access these things in a very sloppy manner sometimes.
Just put your data in a 'long' word array. Get the base address and add the offset yourself. If you can find away to eliminate the offset, so much the better. This method is _always_ faster than any tests (such as 'if').
Also, _do not_ use local variables. These are allocated on the stack and referenced via offset on the stack. This can create a varying performance hit.
-
Thanks everybody for your help.
I don't know how many people are interested in speed of execution and/or inline asm but for those who are, here is a part disassembly of the switch statement courtesy of the MS VC++ compiler. The lines numbered 924: etc are C code and the stuff following each of these lines is the asm equivalent.
Code:
924: switch(code[liner][4]){
0040382B mov ecx,dword ptr [ebp-6Ch]
0040382E and ecx,0FFFFh
00403834 shl ecx,4
00403837 xor edx,edx
00403839 mov dx,word ptr code+8 (00434f9c)[ecx]
00403840 mov dword ptr [ebp-0C0h],edx
00403846 cmp dword ptr [ebp-0C0h],0Fh
0040384D ja $L53487+39h (00403b9e)
00403853 mov eax,dword ptr [ebp-0C0h]
00403859 jmp dword ptr [eax*4+403CDCh]
925: case 0:
926: ax=rhs;
00403860 mov ecx,dword ptr [ebp-48h]
00403863 mov dword ptr [ebp-10h],ecx
00403866 mov edx,dword ptr [ebp-44h]
00403869 mov dword ptr [ebp-0Ch],edx
927: liner++;
0040386C mov ax,word ptr [ebp-6Ch]
00403870 add ax,offset $L53439+12h (00403872)
00403874 mov word ptr [ebp-6Ch],ax
928: break;
00403878 jmp $L53487+59h (00403bbe)
929: case 1:
930: if (ax==rhs) liner=(unsigned short)(code[liner][6] + 1);
0040387D fld qword ptr [ebp-10h]
00403880 fcomp qword ptr [ebp-48h]
00403883 fnstsw ax
00403885 test ah,40h
00403888 je $L53440+2Bh (004038a8)
0040388A mov ecx,dword ptr [ebp-6Ch]
0040388D and ecx,0FFFFh
00403893 shl ecx,4
00403896 xor edx,edx
00403898 mov dx,word ptr code+0Ch (00434fa0)[ecx]
0040389F add edx,1
004038A2 mov word ptr [ebp-6Ch],dx
931: else
004038A6 jmp $L53440+37h (004038b4)
932: liner++;
004038A8 mov ax,word ptr [ebp-6Ch]
004038AC add ax,offset $L53440+31h (004038ae)
004038B0 mov word ptr [ebp-6Ch],ax
933: break;
004038B4 jmp $L53487+59h (00403bbe)
934: case 2:
935: if (ax<rhs) liner=(unsigned short)(code[liner][6] + 1);
004038B9 fld qword ptr [ebp-10h]
004038BC fcomp qword ptr [ebp-48h]
004038BF fnstsw ax
004038C1 test ah,1
004038C4 je $L53444+2Bh (004038e4)
004038C6 mov ecx,dword ptr [ebp-6Ch]
004038C9 and ecx,0FFFFh
004038CF shl ecx,4
004038D2 xor edx,edx
004038D4 mov dx,word ptr code+0Ch (00434fa0)[ecx]
004038DB add edx,1
004038DE mov word ptr [ebp-6Ch],dx
936: else
004038E2 jmp $L53444+37h (004038f0)
937: liner++;
004038E4 mov ax,word ptr [ebp-6Ch]
004038E8 add ax,offset $L53444+31h (004038ea)
004038EC mov word ptr [ebp-6Ch],ax
938: break;
004038F0 jmp $L53487+59h (00403bbe)
939: case 3:
.........
lots more!
.........
1001: default:
1002: conloc(1,22);
00403B9E push 16h
00403BA0 push 1
00403BA2 call @ILT+90(_conloc) (0040105f)
00403BA7 add esp,8
1003: puts("Error 1 in scorer - default case!");
00403BAA push offset string "Error 1 in scorer - default case"... (00430274)
00403BAF call puts (00404b60)
00403BB4 add esp,4
1004: exit(0);
00403BB7 push 0
00403BB9 call exit (00405080)
1005: }
1006: if(liner>length)
00403BBE mov ecx,dword ptr [ebp-6Ch]
00403BC1 and ecx,0FFFFh
00403BC7 xor edx,edx
00403BC9 mov dx,word ptr [length (00434f92)]
00403BD0 cmp ecx,edx
00403BD2 jle $L53487+71h (00403bd6)
1007: goto success; // .. acceptable end of "prog"
00403BD4 jmp success (00403be2)
1008:
1009: } //end of "while" loop ( from old progrun )
00403BD6 jmp scorer+112h (00403612)
1010: return(FALSE); //.. is looping too long
00403BDB xor eax,eax
00403BDD jmp success+0E6h (00403cc8)
1011:
1012: success:
From the above you can see ( I think ) that the heart of the switch() statement is:
00403859 jmp dword ptr [eax*4+403CDCh]
which is an indirect jump. I am also surprised to see that:
liner++; translates to:
0040386C mov ax,word ptr [ebp-6Ch] //Put the value of liner into the ax register
00403870 add ax,offset $L53439+12h (00403872) // and add one ...
00403874 mov word ptr [ebp-6Ch],ax // put the value of ax back into liner
The use of "offset $L53439+12h (00403872)" instead of just adding on a bit appears cumbersome to say the least but there is no doubt a good reason.
As for my goto in line 1007 the excuse is that I am exiting from a loop ... Incidentally, I use variables ax, bx, cx and dx as C variables so don't confuse these with registers.
Hope this is of interest
:)