I'm back with my little off-topic aside.
I ran the test with MSVC++ 6.0, with the build configuration set to "Release" and optimization set to "maximize speed" (the default for the release configuration). I couldn't run it with gcc since I don't know how to write assembly for it. Note that the assembly version could be tweaked even more by writing a doubleword instead of a byte at a time (but you could also do that in C). I repeated the test in 3 separate sessions (4 runs of each version) and got similar results each time:
Classic (for loop): 2213, 2173, 2163, 2233
Assembly: 260, 250, 250, 250
Classic (while loop): 2203, 2133, 2133, 2143
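For the doubleword idea, here is a rough sketch of what it could look like in plain C. The function name fill_dwords is made up for illustration, and I haven't timed it against the numbers above, so take it as a starting point only:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with value,
   storing 4 bytes (one doubleword) at a time where possible. */
void fill_dwords(unsigned char *dst, unsigned char value, size_t n)
{
    /* Replicate the byte into all four bytes of a 32-bit word. */
    uint32_t pattern = (uint32_t)value * 0x01010101u;
    size_t i = 0;

    /* Bulk part: one 4-byte store per iteration. Copying a fixed
       4 bytes with memcpy sidesteps alignment and aliasing issues;
       compilers normally turn it into a single store. */
    for (; n - i >= 4; i += 4)
        memcpy(dst + i, &pattern, sizeof pattern);

    /* Tail: the remaining 0 to 3 bytes, one at a time. */
    for (; i < n; i++)
        dst[i] = value;
}
```

You would call it as fill_dwords(tab, VALEUR, TAILLE) in place of the byte loop.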
If you want, you can run the test on your own computer; I'd be curious to see the results if someone does. I'm not really surprised by mine, though, since the IA-32 instruction set comes with some really efficient string instructions that I bet compilers make only minor use of (because of their very specific nature).
I mean, just take a look at the code generated by the compiler (I added the comments):
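For anyone who wants to try this with gcc, here is a minimal sketch. The assembly version isn't reproduced in this post, so this assumes it boils down to a single REP STOSB; the function names (fill_loop, fill_stos, time_fill_ms) are made up for the example, and non-x86 targets fall back to memset:

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* The classic version: one byte store per loop iteration. */
void fill_loop(unsigned char *tab, unsigned char value, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        tab[i] = value;
}

/* Assumed equivalent of the assembly version: a single REP STOSB. */
void fill_stos(unsigned char *tab, unsigned char value, size_t n)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* EDI/RDI = destination, ECX/RCX = byte count, AL = byte to store. */
    __asm__ volatile ("rep stosb"
                      : "+D"(tab), "+c"(n)
                      : "a"(value)
                      : "memory");
#else
    memset(tab, value, n);  /* fallback for other compilers or CPUs */
#endif
}

/* Time one call of fill() and return the CPU time in milliseconds. */
double time_fill_ms(void (*fill)(unsigned char *, unsigned char, size_t),
                    unsigned char *tab, unsigned char value, size_t n)
{
    clock_t start = clock();
    fill(tab, value, n);
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To mirror the test, malloc a buffer of TAILLE bytes (5F5E100h = 100,000,000 in the listing), then call time_fill_ms(fill_loop, tab, VALEUR, TAILLE) and time_fill_ms(fill_stos, tab, VALEUR, TAILLE) a few times each. Keep in mind that a recent gcc with optimization turned on may recognize the plain loop and replace it with a memset call, so the gap could look very different from the MSVC 6 numbers.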
193: for (i = 0; i < TAILLE; i++)
0040C6AA C7 45 F0 00 00 00 00 mov dword ptr [ebp-10h],0 // i = 0
0040C6B1 EB 09 jmp main+5Ch (0040c6bc)
0040C6B3 8B 45 F0 mov eax,dword ptr [ebp-10h] // EAX = i
0040C6B6 83 C0 01 add eax,1 // EAX++
0040C6B9 89 45 F0 mov dword ptr [ebp-10h],eax // i = EAX
0040C6BC 81 7D F0 00 E1 F5 05 cmp dword ptr [ebp-10h],5F5E100h // if (i >= TAILLE)
0040C6C3 7D 0B jge main+70h (0040c6d0) // jump out of the loop
195: tab[i] = VALEUR;
0040C6C5 8B 4D F4 mov ecx,dword ptr [ebp-0Ch] // ECX = tab
0040C6C8 03 4D F0 add ecx,dword ptr [ebp-10h] // ECX = ECX + i
0040C6CB C6 01 1E mov byte ptr [ecx],1Eh // [ecx] = VALEUR (1Eh = 30)
0040C6CE EB E3 jmp main+53h (0040c6b3) // jmp at start of the loop
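I think it's pretty clear why the assembly version is so much faster: the compiled loop runs nine instructions, most of them touching memory, for every single byte it writes.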