Quote: Reason being, is that any HLL will just use the processor op codes.

So does assembler. Pardon me, but what are you talking about? How do you suggest we execute programs without opcodes? What do you think a compiled program is?
All assembler, all C/C++, all of any language ultimately executes as opcodes/instructions; the two terms are synonymous. The opcode is the hexadecimal encoding of the instruction, and the instruction mnemonic is the human-readable form of that opcode. The hexadecimal opcode is simply the base-16 representation of the binary opcode.
Quote: These numbers are off the top of my head, but for the Pentium 4, addition, subtraction and multiplication are around 5, which isn't going to get faster for the most part.

False.
And how do you know how long each instruction takes? Intel stopped publishing clock-timing information for opcodes after the 486 manuals. Timing is not constant in all cases and cannot be relied upon; hence they offer performance counters to the programmer so that the code can be clocked directly.
No timing information is given for these opcodes in the IA-32 or IA-64 manuals.
Quote: These numbers are on integers, and as vBladeRunnervE said, floating point numbers are uglier.

False. Actually, on newer FPUs floating-point operations are faster than integer ops. Perhaps your statement was true in years past, but it is no longer so. There are floating-point operations that actually run faster than their integer counterparts, due to the gains in FPU technology and the on-chip implementation of floating-point math. Profile it.
Quote: Sometimes you can get away with using some of your own bit operations in an HLL to lower these counts. However, one can use bit rotations that are allowed in asm in order to get really quick calculations. Bit rotation cannot be done quickly without using asm commands.

False. You cannot bit shift any floating-point data type. Bit shifting is not supported in floating-point registers and should not be relied upon. Floating point is represented in such a way that certain numbers simply cannot be represented accurately; in other words, there is no exact binary equivalent for the value. For more information consult the IA-32 manual, specifically the sections on numeric applications and the format for converting between binary and floating point.
fld [floatingpointvalue]
shr ?????
Since SHR and SHL both operate on integer registers, there is no way to bit-shift the floating-point value that lies in ST(0). Bit shifting is not defined for floating point and never will be.
Also in C this:
unsigned int x2 = x >> 1;
is the same as:
mov eax,[x]
shr eax,1
No difference at all. The compiler will automatically see the integral data type and use a bit shift where possible. If a single shift won't do, it will check whether it can use several shifts added together; only if that fails will it fall back to an actual MUL or DIV. For instance, value*320 can be done as a sequence of bit shifts like this: (value<<6)+(value<<8). And since bit rotations are allowed both in C/C++ and in assembly language, I have no idea what you are talking about.
C was designed to let you do anything and everything you could do in assembler while gaining the benefits of a structured language, thereby reducing development time and increasing productivity; that was at least one of the reasons it was created. The statement that C does not allow bit shifts is an extremely uninformed one.

The days of gaining 30 to 40 FPS simply by using integer data types are gone, my friends. The bottleneck today is getting the vertex data to the GPU. What you want is one burst of data sent to the GPU per frame: a very limited number of DrawPrimitive calls in D3D, complex meshes that contain nearly all of the geometric data for the level, etc. Cut down on state changes and state-block changes, such as:
D3D offers a state block mechanism in which blocks of states can be created, instantiated, and destroyed. This allows a very compact method of changing the render states. They also offer effect files, which will cut down on render-state changes. Also, the pixel/vertex shader assembly language has been superseded by HLSL (High Level Shader Language), and OpenGL has an equivalent high-level shading language as well.

Code:
// Render all alpha-blended surfaces in one batch
Device->SetRenderState(D3DRS_ALPHABLENDENABLE, true);
// ...render all alpha-blended objects...
//Device->SetRenderState(D3DRS_ALPHABLENDENABLE, false);
Saying that assembly is always faster than C would be like saying vertex/pixel shaders written in pure opcodes will always be faster than those in HLSL - simply not true.
There are places where hand-tuned assembly will yield better results than C/C++, but unless you have:
- Profiled the code and located the bottleneck
- Tried different algorithms, code structures, and code logic
- Ensured there are no memory leaks or hidden bugs (pointers, exceptions, etc.) causing frame-rate or execution slowdowns
- Enabled all the optimizations in the compiler
- Ensured you are building with no compiler run-time error checking
then you have no business coding the algorithm in asm. Assembly is not always the answer. It is, again, a tool that can be used and abused. The optimizations you have suggested are not what needs optimizing on modern systems; they are not the bottleneck, and hence a perfect illustration of why people should not rely on assembler alone for optimization.
For optimization methods and tricks I would recommend reading several of the books over at amazon.com. There are lots of them.
For more information consult Randall Hyde's books, the IA-32 and IA-64 (Itanium) manuals, AMD's x86 manuals, the MMX manual (Intel Corporation), and the SSE/SSE2 manuals (compiled into volumes 3/4, the instruction set reference, of Intel's IA-32 and IA-64 technical references).
Before you begin to discuss pros/cons between two languages it might be a good idea to research them prior to making statements about either of them.
And since I know Ev, he would never say that you cannot do bit shifts in C, or suggest that a floating-point divide/multiply could be done by a bit shift; and I'm sure he realizes that floating point on today's FPUs is slamming fast.
Finally:
Quote: This concept is true for all cases unless you're testing different processors. Some processors are just plain faster, and some may work better with certain algorithms just because the circuitry is built differently. However, that circuitry is a giant algorithm on its lonesome.

False. Instruction execution depends a lot upon the context/state of the processor before the instructions are executed. The timings are not identical in every case, which is probably why Intel no longer publishes a cycle-count column for each opcode; the timing is effectively undefined without known processor state.