Quote: Reason being, is that any HLL will just use the processor op codes.

So does assembler. Pardon me, but what are you talking about? How do you suggest we execute programs without opcodes? What do you think a compiled program is?
All assembler, all C/C++, all of any language ultimately executes as opcodes/instructions; the two terms are synonymous. The opcode is the hexadecimal encoding of the instruction, and the instruction mnemonic is the human-readable form of that opcode. The hexadecimal opcode is simply the base-16 representation of the binary opcode.
Quote: These numbers are off the top of my head, but for the Pentium 4, addition, subtraction and multiplication are around 5, which isn't going to get faster for the most part.

False.
And how do you know how long each instruction takes? Intel stopped publishing clock-timing information for opcodes after the 486 manuals. Timing is not constant in all cases and cannot be relied upon; hence they offer performance counters to the programmer so that the code can be clocked directly.
No timing information is given for these opcodes in the IA-32 or IA-64 manuals.
Quote: These numbers are on integers, and as vBladeRunnervE said, floating point numbers are uglier.

False. Actually, on newer FPUs floating-point operations are faster than integer ops. Perhaps your statement was true in years past, but it is no longer so. There are floating-point operations that actually run faster than their integer counterparts, due to the gains in FPU technology and the on-chip implementation of floating-point math. Profile it.
Quote: Sometimes you can get away with using some of your own bit operations in an HLL to lower these counts. However, one can use bit rotations that are allowed in asm in order to get really quick calculations. Bit rotation cannot be done quickly without using asm commands.

False. You cannot bit shift any floating-point data type. Bit shifting is not supported in floating-point registers and should not be relied upon. Floating point is represented in such a way that certain numbers simply cannot be represented accurately; in other words, there is no exact binary equivalent for the value. For more information consult the IA-32 manual, specifically the sections on numeric applications and the format for converting between binary and floating point.
fld [floatingpointvalue]
shr ?????
Since SHR and SHL both operate on integer registers, there is no way to bit-shift the floating-point value that lies in ST(0). Bit shifting is not defined for floating point and never will be.
Also in C this:
unsigned int x2 = x >> 1;
is the same as:
mov eax,[x]
shr eax,1
No difference at all. The compiler will automatically see the integral data type and use a bit shift where possible. If a single shift won't do, it will check whether it can use several shifts added together; only if that fails will it fall back to an actual MUL or DIV. For instance, value*320 can be done as a sequence of bit shifts like this: (value<<6)+(value<<8). And since bit rotations are allowed both in C/C++ and in assembly language, I have no idea what you are talking about.
C was designed to let you do anything and everything you could do in assembler while gaining the benefits of a structured language, thereby reducing development time and increasing productivity; that was at least one of the reasons it was created. The statement that C does not allow bit shifts is an extremely uninformed one.

The days of gaining 30 to 40 FPS simply by using integer data types are gone, my friends. The bottleneck today is getting the vertex data to the GPU. What you want is one burst of data sent to the GPU per frame: a very limited number of DrawPrimitive calls in D3D, complex meshes that contain nearly all of the geometric data for the level, etc. Cut down on state changes and state-block changes, such as:
D3D offers a state block mechanism in which blocks of states can be created, instantiated, and destroyed. This allows a very compact method of changing the render states. They also offer effect files, which will cut down on render-state changes. Also, the pixel/vertex shader assembly language has been superseded by HLSL (High Level Shader Language), and OpenGL has an equivalent high-level shading language as well.

Code:
// Render all alpha-blended surfaces in one batch
Device->SetRenderState(D3DRS_ALPHABLENDENABLE, true);
// ...render all alpha-blended objects...
//Device->SetRenderState(D3DRS_ALPHABLENDENABLE, false);
Saying that assembly is always faster than C would be like saying vertex/pixel shaders written in pure opcodes will always be faster than those in HLSL - simply not true.
There are places where hand-tuned assembly will yield better results than C/C++, but unless you have:
- Profiled the code and located the bottleneck
- Tried different algorithms, code structures, and code logic
- Ensured there are no memory leaks or hidden bugs (pointers, exceptions, etc.) causing frame-rate or execution slowdowns
- Enabled all the optimizations in the compiler
- Ensured you are building with no compiler run-time error checking
then you have no business coding the algorithm in asm. Assembly is not always the answer. It is, again, a tool that can be used and abused. The optimizations you have suggested are not what needs optimizing on modern systems; they are not the bottleneck, and hence a perfect illustration of why people should not rely on assembler alone for optimization.
For optimization methods and tricks I would recommend reading several of the books over at amazon.com. There are lots of them.
For more information consult Randall Hyde's books, the IA-32 and IA-64 (Itanium) manuals, AMD's x86 manuals, the MMX manual (Intel Corporation), and the SSE/SSE2 manuals (compiled into volumes 3/4, the instruction set reference, of Intel's IA-32 and IA-64 technical references).
Before you begin to discuss pros/cons between two languages it might be a good idea to research them prior to making statements about either of them.
And since I know Ev, he would never say that you cannot do bit shifts in C, or suggest that a floating-point divide/multiply could be done by a bit shift; and I'm sure he realizes that floating point on today's FPUs is slamming fast.
Finally:
Quote: This concept is true for all cases unless you're testing different processors. Some processors are just plain faster, and some may work better with certain algorithms just because the circuitry is built differently. However, that circuitry is a giant algorithm on its lonesome.

False. Instruction execution depends a lot upon the context/state of the processor before the instructions are executed. The timings are not identical in every case, which is probably why Intel no longer publishes a cycle-count column for each opcode; the timing is effectively undefined without known processor state.