1. ## Raycasting question

Just got done coding a small raycaster like the old Castle Wolfenstein - well, not the oldest Castle Wolfenstein. First one was in text mode and use the ASCII table for graphics and the PC speaker for digital sound- anyone remember this great game - best game I had on my PCjr. Well, nuff nostalgia.

I'm wondering how to raycast windows and doors with windows on one pass. I know how to floor map so can I use the same function to determine what y coord I'm on in my raycaster?

Riight now it only casts rays at one level, but I don't need it to be a voxel-type caster I just need an algorithm that will tell me what y texel I'm on. If the texel is transparent, continue casting the ray, if not - stop since we've hit a wall.

For voxels I use a 2D caster with heightmaps:

Code:
```int v1=HeightMap[dry>>8+drx];
int v2=HeightMap[dry>>8+(drx+1)];
int v3=HeightMap[(dry+1)>>8+drx];
int v4=HeightMap[(dry+1)>>8+(drx+1)];

//Bilinear filter the heights based on distance in cell on x and z
int y=BI(v1,v2,v3,v4,fx,fz);

int ViewHeight=-y+ManHeight;  //ManHeight is height of viewer
int screeny=((y+ViewHeight)*PerspScale)/dist+halfscreenheight;```

But I don't really need wall heights in my code since it is not a DOOM style caster. And I don't want to cast in 3 dimensions either (dry=-tan(VFOV/2); if (dry<VoxelHeight) ...) cuz it's too slow.

Anyone have any suggestions?

2. Yes, here is how it is done. this will allow you to put webbing, or any other semi-transparent wall into your dungeon.

-----

For our purposes, we will assume you are using texture maps that are 64 x64 pixels in size. 8-bit color (256 colors). You could be using 128 x 128 pixel textures, the principle still applies.

-----

Since this is a piece of performance code, here is how you optimize it--

1) Once your start drawing, don't make function calls-- wastes stack time
2) Do as few test as possible (if's, loops, whiles, etc.)
3) Don't recurse (wastes stack time)
4) Use inline assembly if you can
5) Use 'register' variables (code won't be installed to move your var into a register, then)
6) Use 'long' variables for any indexes or counters, etc. (processor won't have to shift and/or mask to work with it)

Okay, here is how the algorithm works--

TEXTURES
--------------
You don't use the last row of your textures (or the first-- whichever is easier for you). This row (not a column) is reserved for 'flagbits' to tell you whether or not a column has an transparent bits in it).

So you only use 63 rows of your 64-row texture for display data.

COLORS
-----------
Since you are working with "indexed-color", you choose one color from your palette that you will never draw in. For example, let's say that color #117 on your palette is set to Vibrant Orange (by you). This is your "transparent" color. Whenever your drawing algorithm sees this color, it won't draw a pixel.

FLAGS
--------
In each texture (64 x 64), if a column (aka a "slice") is completely opaque and solid, the 64th pixel in the column (which is never drawn) is set to '1'.

If a column has transparent "see-thru" pixels in it (like in the middle of a window in a door), the 64th "flagbit" in the column is set to '0'.

ALGORITHM
---------------
You write a function called "DrawWall()", or whatever fits into your naming logic. Once inside the function, the first thing you do is run all the calculations to figure out where your source texture pixels start addresswise, and where your destination pixels' addresses are, so you know where to put data.

In otherwords, you precalc as much stuff as possible, so you don't have to do calcs, testing, and thinking while you're drawing. Such things use up too much time.

Now it gets interesting.

You, using an unrolled loop, look at the flagbit (bit 64) in each column and see what it is. If it's a solid wall, you know very little testing will be performed, so things will draw fast. If it's a wall with any transparency (a cobweb, a window, a doorway with a window, etc.) things will go slower because you have to test each pixel in the slice.

If it's an opaque wall, you've done your calculations and then you simply copy one pixel from your source texture, to a memory address in the destination pixmap. If you have 63 pixels to copy (you would), then you have 63 lines that say copy from address 'A' to address 'B'. Brute force. No loop, no test-- just very fast.

If it's a transparent wall, you've done your calculations and you goto a second part of the same function. This one just copies pixels from address 'A' to address 'B', too-- but with a difference.

It looks at the index-color of each pixel in the source pixmap. If it sees a 117 (vibrant orange) or whatever you choose, it doesn't draw it. By default, you simply go to the next pixel and do the 'test & copy' over, and so on until the slice is drawn.

GOTOs
----------
Here is where the professionals are divided from the amateurs. Yes you use GOTOs in this function. Why? because you don't want to call another function because you are in the middle of a high-speed operation and you don't want to waste processor time on allocating and deallocating a stackframe to handle the call and subsequent cleanup of the stack.

Here is pseudo-code:
==================

DrawWall()
{
register long sliceDex; /* what slice are we on? */

sliceDex = 0; /* init vars */

/* precalc everything that we can */

if(pixel_#_63 in column[sliceDex]) /* draw the type of slice */
goto DrawSolidWall; /* it's a wall */
else
goto DrawTransparentWall; /* it's a see-thru */

Cleanup:
goto Done;

DrawSolidWall:
...

goto Cleanup:

DrawTransparentWall:

...

goto Cleanup:

Done:
}

===============

Yes, the second part is a little slower because it must test at each pixel for transparency, but it's worth it! You can do cobwebs, etc. This is also how you draw your creatures, fire, etc.

enjoy.

3. One other thing-- you need to calculate your wall heights at a distance and your increments. Don't do this calculation on the fly, put it in a table so your program can just look it up.

For example, when you are drawing a wall-slice that is 25 feet away, you might not draw every other row/pixel. Or every third, etc.

This is how you draw your textures at different heights. You check this for every slice you draw. that way, if you're standing beside a wall in a hallway, each slice in the wall is not drawing different rows. And hence, the makes each slice shorter.

This is how you foreshorten your walls.

enjoy.

4. Thanks a million. Why did you rail on me so bad in my post about DJGPP assembly. Here you tell me to use inline assembly (which I can), but there you just about killed me for suggesting it.

I don't know why I did not think of that. So basically I check the 64th or 1st texel, if its flag is set I know the whole column is transparent and the ray continues. If it's not, I'm going to have to walk the column pixel by pixel to test which ones are transparent. Since I really do not have a memory shortage I could probably write a function that hardcoded the offsets of transparent texels in a bitmap. That way I could merely jump to the transparent texel instead of testing each texel. It would require iterating two arrays at the same time but it might be faster than testing each texel. After all, if I know that the current offset in the bitmap does not have a trans texel, there is no need to test it and vice versa.

Here is what I'm thinking about for swinging doors (not sliding)
Rotate the screen coords (top x,y and bottom x,y of the slice) using a matrix multiply. Then render between the new points. This could not be done on one pass, though because the left to right render scheme would not work as the new pos of the door in the buffer would be overwritten. Doors done this way would probably have to have a second pass, but only on doors that are closing or opening, which would only be a few, if any, per frame.

By the way, everything in the engine including screen addresses is pre-calculated. Any ideas on how to implement lighting - I've searched gamedev.net but not much on raycasting since it is old school. Fading in the distance is done, but no light sources within the dungeon just ambient.

Instead of the goto, couldn't I just insert the code at the labels in between the if else. Both the transparent code and opaque code will be done in assembly.

if (Transparent)
{
//code here - transparent
}
else
{
//code here - opaque
}

5. > Why did you rail on me so bad in my post about DJGPP assembly. Here you tell me to use inline assembly (which I can), but there you just about killed me for suggesting it.

Uh, are you sure? I don't remember this and I can't recall ever telling anyone to _not_ use inline assembly. Maybe somebody used my sign on....

---

I like your 'texel' idea, because anytime you can eliminate tests-- you want to. To give you an idea of the amount of work your program is doing, your math calculations will probably be executed millions of times of a few seconds....

---

As for doors, raycasting engines aren't suited very well for anything but doors that lift up, slide sideways, or slide back. It's because you are tied to corners on the floor. The model isn't vector based, which gives you simple control over planes at any angle.

---

Lighting-- raycasting usually precludes vector-based lighting because then you are mixing to models and sometimes the math may not line up after fish-eye calcs are done.

Typically, lighting is handled by drawing your wall textures as if they have light cast on them. For example, a ceiling texture might have a white bulb in the middle of the tile. The walls on either sides will be normal, except they will have a brighter area inside the parabolic 'cone of light' and a darker area of the imaged outside of the light-cone.

---

Waterwheels, water, flame...

To do these effects, you simply need to draw alternate tiles in. For example, you might have a waterwheel inset into a wall that is turning. You just simply keep redrawing the tile at a given clock rate with each succeeding tile so it looks as if the wheel is turning. Same thing with water/lava or flame.

6. Bubba, I found the thread you referred to. I did not rail against you specifically, I railed against the concept that using 'goto' was a bad thing.

I will be 68 this year. I have been doing this stuff, including writing compilers and O/Ss since long before any professor you probably ever knew was introduced to programming.

Schools usually are not accurate representations of how things are actually done in the real world. Schools are pedagogal, theoretical environments.

The problem with what you and Vulcan are saying about not using goto is that you are missing the point. Of course you could code without using it, but the primary

*reason*

you use 'goto' is for performance. It eliminates a test. If you use 'while', 'do', 'for', or 'if' instead, you cause additional code to be generated that must load registers with variables, mask and shift them, perform math on them and then compare the results to zero. This takes a _lot_ more processor time than a simple jump mnemonic

In fact, judicious use of goto in the right code can mean the difference between C and being forced to use inline assembly, which in essence forces you to use a 'goto'.

----

Okay? The problem with saying you should never use a goto is that people who say that rarely understand what the CPU has to go through if you don't. That's why I'm always saying you need to learn how to work *with* the machine, not against it.

I have used goto probably less than 20 times in my career. But again, I use it for performance situations where I am trying literally to gain processor _cycle time_.

My statement still stands-- you don't throw a tool away just because you don't know how to use it properly. That's like throwing a basic freedom away because you've never needed to exercise the right to do something.

I am so sick of educational institutions educating people to the point of ignorance. Learn to think for yourselves and stop being brainwashed by people who quite honestly, can't get a real job. That's why they teach.

7. Well, that's not the discussion I was referring to. I was referring to the one about DJGPP assembly syntax ills. You gave me this huge list of questions to answer like I had not done my homework and did not need to use inline assembly.

My problem was that I had a voxel engine with a cellsize larger than 1 to eliminate all the blockiness. For this reason I had to call a bilinear height filter inside the loop or at least do the filter in the loop to eliminate the function call overhead. If you go back to that thread I answered every one of your questions very thoughtfully. I have to have the bilinear filter in ASM, the framerate difference is clearly noticeable. With the FPU code, the renderer flies, and w/o it is quite chunky.

The 'goto' issue is probably something we will all just agree to disagree on. I've never set foot inside of a college classroom for computer programming, only in high school for BASIC courses which I blew out of the water. I'm self taught at C/C++, assembly, VB, and some Pascal. I do hold a degree in another field not related to computers and yes you are right that most professors seem to have lost their grip on the 'real' world cuz they have been in their education bubble too long. Out in the real world their neat little heuristic boxes did nothing for me and their theories did not help a bit. Practics is not something that colleges do very well. Lots of theory but very little in the way of practical application.

My reasoning is that most of the time (you even said yourself you don't use goto that much) you can do without the goto. In the code snippet you gave above, the goto can be eliminated by placing the actual code in the if-else structure. From what you are saying it sounds like a goto is more like a jmp or jx instruction than an actual call - which I agree should not be done inside of speed critical loops due to the pushing and popping incurred. If you do not use goto that frequently then we are really all saying the same thing, but you are just proving that you don't toss the baby out with the bathwater and I agree. Since we both rarely use the goto it is a mute point to argue so much over so very little.

I've read Intel's manuals on their processor and have studied and do study the instruction timing and other aspects of my assembly. Those books have helped me optimize my C code and my assembly code as well as their chapters on machine architecture.

8. The jmp family of instructions should be avoided whenever possible. They will cause the pipeline to stall. On x86 architectures, especially the new PIII and PIV CPUs, these and cache misses are probably the two largest single performance hits. In your code, Sayeh, you do a couple of things that add unnecessary jumps and therfore slow down the code unnecessarily. For example, you frequently do the following:

goto Cleanup;

What is at Cleanup?

goto Done;

So essentially you've got two jumps, neither of which are necessary. The best way to exit a function is not to jump to the end of it, but to return. You do know, I hope, that you can exit a void function with a simple "return;" command!

Also, you say that using gotos eliminate tests. However, if all you do with gotos is jump around code in the same manner regardless of circumstances, you might as well just write your code in that order! Gotos are almost invariably used in conjunction with IF statements. This is in fact what you did. This is inefficient. Why? Let's look at the following code:

Code:
```if(x == 2)
{
x++;
}
else
{
x--;
}```
This will break down into ASM something like this:

Code:
```mov eax, x
cmp eax, 2
jne notequal
inc eax
jmp done
notequal:
dec eax
done:
mov x, eax```
Using your method, however, you write C++ code like this:

Code:
```goto start
cleanup:
goto done;
start:
if(x == 2)
goto xisequal;
else
goto xnotequal;

xisequal:
x++;
goto cleanup;

xnotequal:
x--;
goto cleanup;

done:```
This code will break down into the following:
Code:
```jmp start
cleanup:
jmp done
start:
mov eax, x
cmp eax, 2
je xisequal
jmp xnotequal
xisequal:
inc eax
jmp cleanup
xnotequal:
dec eax
jmp cleanup
done:
mov x, eax```
Not only are the latter C++ and ASM versions longer and slower; they're harder to read.

I sympathize with your disallusionment with acedemic programming instruction, Sayeh, but sometimes what is said is indeed practical. There may, I admit, be times when GOTO should be used. The hard-and-fast rules of professors can conflict with the (ideally) more important rule of efficiency. But in this example they do not. Academics say you should avoid GOTO; in this instance, they're right.

9. >> Well, that's not the discussion I was referring to. I was referring to the one about DJGPP assembly syntax ills.

Bubba
--------

I just really don't think it was me. I tried to search for every thread you and I were in... It just doesn't sound like the kinda of thing I'd rail on. I mean, inline assembly is great. I use it all the time. And for a fact I haven't been on the board for almost year, up until recently. I had to get away for awhile.

TerranFury
--------------
Don't waste time cutting the example I gave up, it was not "ideal". I could write performance code you couldn't even understand without a thorough explanation.

I was more interested in discussing how an opaque slice would be drawn, versus a slice with transparency. The only reason I even covered the 'cleanup' and 'done' labels was for balanced completeness so a process flow could be seen from start to end. If I'd wanted to show you specifically how to code the algorithm I would have dug out my source and posted the specific code.

You can also ignore pipeline 'stall' issues in this case. All it shows is that you've been reading things you don't fully understand. 'Stalls' are unavoidable. You cannot entirely eliminate jumps from any program. A jump instruction is handled by the processor like this:

The effective address is pushed into the program counter and the processor simply continues execution at the new effective address.

This is just how your processor works. Period. The only reason there is a pipeline stall is because one aspect of the pipeline is 'predictive processing'-- (aka 'Fetch' and 'Decode'). When you jump, all current predictions become invalid.

Look-- trust me. I have no reason to lie or mislead you. And I really don't think you'd understand it if I dropped down even another level (or's, nand's, gates, diodes, transistors, bus transfers, operand write-timing, register post-indexing, clock, signal, and self-modifying code...) and explained the ttl logic to you.

---

To conclude, 'goto' is not a function. it is not a test. It is a jmp mnemonic/instruction and disassembles as such. It is faster than any other form of jump, such as branching, because it does not test. All I was trying to say was that in performance code, you don't use a test or a branch where you can use a goto.

Bare in mind gentlmen that I used to design CPUs. Motorola and Intel processors are familiar to me like the skin on my body. I've been at this a little while-- I helped work the bugs out of Intel's first real processor, the 4004 (which was a 4-bit microprocessor) before it's release in '71.

enjoy.

10. Conditionals invariably involve jumps; I understand that. Why then add the additional jump of a goto? You say that conditionals involve tests whereas gotos do not, and you're right. However, without any conditionals, the gotos will result in the same program flow every time, so why not just write the code in that order? And if you do want them to change the flow based on certain conditions, then the gotos will result in redundant jumps that are part of the conditionals anyway. That is my point. In the vast majority of cases (though I wouldn't say all), the same effect can be achieved just as well without gotos as with them, and usually in a more clear and concise way.

It seems, however, that for the most part, we agree. You said "you don't use a test or a branch where you can use a goto;" I was just taking it a step further, adding to the end of that sentence, "...and if you didn't need a test or a branch, then chances are you don't need a goto either."

11. Well, Sayeh, what you are saying must be true. My first processor was the 8088 on the IBM PCjr, but I do remember Intel's 4004 processor. I met some friends at college who used to work for Intel and they have told me some of the same things that you are. I guess in high level language it is easy to get away from the pure basics of how the processor works. I've often wondered myself if classes and objects are truly the fastest way to go about things, or if pure assembler is the best.

In modern times it seems that we have foregone doing things the fastest, most efficient way because of all of the APIs available and because modern bus speeds make up the difference. So, modern graphic engines and programs may not actually be that efficient when you look at them. Perhaps we have become spoiled by all this speed and have gotten a bit lazy in our optimizations and in our coding. What did the 4004 run at? The 8088 ran at 4.77 MHz. In order to program for that machine you had to use every optimization trick in the book. SubLogic programmed the first Flight Simulator on it and actually got an acceptable framerate. Pretty impressive for the time.