Thread: DJGPP assembly syntax ills...

  1. #1
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607

    DJGPP assembly syntax ills...

    I recently downloaded DJGPP. So far I've been able to access the LFB and double buffer in all modes (that support it). Very nice. But, my bilinear interpolation function really needs to be in assembly. I cannot write the thing in TASM or MASM because DJGPP does not read OBJs (very stupid).

    This is not one of those "I've not read the FAQ and cannot get assembly to work in DJGPP" questions. I have read the FAQ, and more FAQs, and the guide to the GNU assembler.

    The result: mass confusion, information overload, hate AT&T syntax.

    I'm wanting to access the FPU in DJGPP. It can be done, but the syntax is killing me. I also read Brennan's tutorial. Also, there was a blurb about GAS accepting the Intel syntax, but all links to it were broken or messed up. Is there a way to get the inline assembler to accept Intel syntax, or am I stuck with AT&T?

    This is very annoying because I should not have to re-learn a syntax just to use assembly on an Intel/AMD platform. The problem is not related to my knowledge of assembler, it is related to syntax which, to me, is very annoying. And, if I do re-learn the syntax, when I go back to Intel syntax it will be a huge mess. There needs to be a standard of some type on cross-platform compilers. If AT&T is the standard and not Intel (probably the case), I'm in for some very long nights.

    Here is my Intel-syntax inline asm for linear interpolation:

    Code:
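    unsigned int LI(unsigned int v1,unsigned int v2,double f1)
    {
      unsigned int rval=0;
      asm
      {
         INITFPU:
           finit        /* reset the FPU */
         INTERP:
           fild v2      /* st(0) = v2 */
           fisub v1     /* st(0) = v2 - v1 */
           fmul f1      /* st(0) = (v2 - v1) * f1 */
           fiadd v1     /* st(0) = v1 + (v2 - v1) * f1 */
           fistp rval   /* store the rounded result and pop st(0) */
      }
      return rval;
    }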
    Simple. I normally do not ask others to re-write my code or provide extensive snippets for me to copy. But, in this case, I need someone to write the code for me so I can cross-compare the two and perhaps really learn something.

    I know that it should be something like:

    "fild %1":"<register>"(v1)

    I don't know what <register> should be or maybe I'm way off.
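
    For comparison, here is one way the routine above might look in GCC extended asm with AT&T syntax. This is a sketch under assumptions, not a definitive port: it uses memory ("m") constraints instead of a register constraint, it drops the finit (resetting the FPU inside compiler-generated code can disturb state GCC relies on), it assumes at least one free x87 stack slot, and it assumes the values fit in a signed 32-bit int because fild/fisub/fiadd read signed integers. The plain C fallback for non-x86 targets is my addition.

```c
/* Linear interpolation v1 + (v2 - v1) * f1 in GCC extended asm (AT&T syntax).
   %0..%3 are replaced by GCC with the memory operands listed below.
   Suffixes: "l" on fild/fisub/fiadd/fistp = 32-bit integer,
             "l" on fmul = 64-bit double. */
static unsigned int LI(unsigned int v1, unsigned int v2, double f1)
{
    /* Assumption: values fit in a signed int, since the FPU integer
       instructions interpret their operands as signed. */
    int a = (int)v1, b = (int)v2, rval = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "fildl  %2\n\t"   /* push b:  st(0) = b              */
        "fisubl %1\n\t"   /* st(0) = b - a                    */
        "fmull  %3\n\t"   /* st(0) = (b - a) * f1             */
        "fiaddl %1\n\t"   /* st(0) = a + (b - a) * f1         */
        "fistpl %0"       /* rval = rounded st(0), then pop   */
        : "=m" (rval)                    /* output operand %0        */
        : "m" (a), "m" (b), "m" (f1)     /* input operands %1 %2 %3  */
    );
#else
    /* Fallback for non-x86 targets (not part of the original routine). */
    rval = (int)(a + (b - a) * f1 + 0.5);
#endif
    return (unsigned int)rval;
}
```

    Calling LI(10, 20, 0.5) interpolates halfway between 10 and 20. Note the AT&T conventions: the operand size is carried by the mnemonic suffix, and the C variables are bound through the constraint lists rather than named directly in the instruction text.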


    Help. I'm going out of my mind.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > But, my bilinear interpolation function really needs to be in assembly.
    Why?

    Have you finished the code?
    Have you profiled the code?
    Have you analysed the profile results to help you reconsider the implementation of specific hot spots?
    Have you researched your algorithms to find better ones?
    Have you researched your data structures to find better ones?
    Have you turned on the compiler's optimisations?

    If you've answered no to any of those, then you're probably not ready to resort to assembler.

    Basically, if an easy fix somewhere else prevents the need to do something in ASM, then you've saved yourself some grief now and in the future.

    > Is there a way to get the inline assembler to accept Intel syntax, or am I stuck with AT&T?
    I always thought it was AT&T only.
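
    For what it's worth, later releases of GAS do understand an `.intel_syntax noprefix` directive, and GCC on x86 has a `-masm=intel` option. Whether the binutils and GCC shipped with a given DJGPP install are new enough is an assumption to check. The sketch below (the function name is made up for illustration) switches syntax inside a single asm statement and switches back afterwards, using the x87 stack constraints "t" (st(0)) and "u" (st(1)) so that no memory operands have to be spelled out in either syntax. It assumes the file is compiled with GCC's default AT&T output.

```c
/* Multiply two doubles on the x87 stack with an Intel-syntax instruction.
   "=t" binds the result to st(0); "0" puts x in the same register as
   operand 0 on entry; "u" puts f in st(1). */
static double fpu_mul_intel(double x, double f)
{
#if defined(__i386__) || defined(__x86_64__)
    double r;
    __asm__ (
        ".intel_syntax noprefix\n\t"  /* Intel operand order, no % prefixes */
        "fmul st(0), st(1)\n\t"       /* st(0) = st(0) * st(1)              */
        ".att_syntax prefix"          /* restore the syntax GCC emits       */
        : "=t" (r)
        : "0" (x), "u" (f)
    );
    return r;
#else
    /* Fallback for non-x86 targets (illustration only). */
    return x * f;
#endif
}
```

    The closing `.att_syntax prefix` matters: without it the assembler would try to read the rest of the compiler's AT&T output as Intel syntax.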

    > This is very annoying because I should not have to re-learn a syntax just to use assembly on an Intel/AMD platform.
    Tough - but that's life.
    Mixing C and assembler is always far too machine/compiler specific to make it generally useful.

    There's plenty to read in the meantime
    http://www.web-sites.co.uk/nasm/
    http://www.delorie.com/djgpp/doc/
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Linguistic Engineer... doubleanti's Avatar
    Join Date
    Aug 2001
    Location
    CA
    Posts
    2,459
    >
    Have you finished the code?
    Have you profiled the code?
    Have you analysed the profile results to help you reconsider the implementation of specific hot spots?
    Have you researched your algorithms to find better ones?
    Have you researched your data structures to find better ones?
    Have you turned on the compiler's optimisations?
    <

    oh wise one... may i be graced with your procedure ingrained in my forehead...

    that sounds like something you'd have to have memorized in a college course... by coincidence am i right? good call... that basically solves every optimization problem that exists... that, and bubba, i'd assume, would already have consulted the majority of those excess... we've PM'ed about it quite a bit...
    hasafraggin shizigishin oppashigger...

  4. #4
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Have you finished the code?
    Have you profiled the code?
    Have you analysed the profile results to help you reconsider the implementation of specific hot spots?
    Have you researched your algorithms to find better ones?
    Have you researched your data structures to find better ones?
    Have you turned on the compiler's optimisations?
    Addressed in order: (very long, but I thought explaining would help)
    Have you finished the code?
    For the terrain part of the code, yes, I have finished it. Comparing the framerate with and without assembly, it is obvious that the assembly function is faster. Terrain has to be very fast for the rest of the engine to even be possible.

    Have you profiled the code?
    I cannot profile the code with my old compiler as I do not have a profiler for it. I will profile it in DJGPP, since I have one for that compiler. I wish I had had a profiler earlier, as it would have helped a great deal.

    Have you analysed the profile results to help you reconsider the implementation of specific hot spots?
    Since I have not profiled the code, no, I do not know for sure where the hotspots are. However, my renderer must call the bilinear on every map point to smooth the terrain map and get rid of the blockiness, so I'm assuming that is where the problem is. When I changed the bilinear to assembly using the FPU, the renderer flew. Other functions, such as the ones that put pixels on the screen, are also called, but all are in assembly as well and all are about 5 lines long. I've looked at several assembly books and Intel's documentation on the PIII to find ways to optimize the code. I have an AMD, but I cannot find out how to get similar docs for their processors.

    Pixels are drawn to the screen based not on x,y but on the current offset. The offset is computed once at the start of the render. From there, it is just decremented (since I render bottom to top for my voxels). This eliminates figuring out the offset on every pixel. Retrieving the data from the height map and color map has to be done, and both are linear arrays which, according to my research, are faster to access than two-dimensional ones. So I think I've ruled out the heightmap and colormap retrievals (they should not take too long; they do require a register load, and although I could store the segment in a constant, I would still have to load the es, fs or gs register). Also, the pixel plotting in the renderer does not call any other functions; it is solely responsible for plotting the pixels, which eliminates any function call overhead.
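
    The offset trick described above might look roughly like this in C. It is only a sketch: the function name, the parameters, and the 8-bit pixel format are assumptions for illustration, not the actual renderer.

```c
/* Draw one vertical span of a screen column into a linear framebuffer,
   bottom to top. The offset is computed once; after that each pixel costs
   one store and one subtraction instead of a y * width + x multiply. */
static void draw_column(unsigned char *screen, int width,
                        int x, int y_bottom, int y_top, unsigned char color)
{
    int offset = y_bottom * width + x;  /* the only multiply in the span */
    int y;

    for (y = y_bottom; y >= y_top; --y) {
        screen[offset] = color;
        offset -= width;                /* step up one scanline */
    }
}
```

    With a 320-pixel-wide mode, draw_column(screen, 320, x, 199, 120, color) would fill column x from the bottom row up to row 120.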

    Have you researched your algorithms to find better ones?
    I've looked into several sources about interpolation. Currently, I'm using the fastest ones that I've found. There is an essay on voxel rendering for DirectX (the concept is the same; the 3D implementation differs due to DirectX), and its author uses the same bilinear interpolation in what is also a real-time, non-static renderer. He uses fixed point; I'm using the FPU. I could now use fixed point, given that I have DJGPP and access to 32-bit registers. But again, this would require some assembly. I have written the function in fixed point, in floating point (using doubles), and in assembly. The fastest one is the assembly version using the FPU. I also had a function that used floats, but it was the slowest of them all.
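
    As a point of comparison, a fixed-point linear interpolation can be written in plain C once 32-bit integers are available. The sketch below assumes a 16.16 format and a compiler that provides <stdint.h> and a 64-bit integer type; the names are hypothetical and this is not the essay author's code.

```c
#include <stdint.h>

typedef int32_t fixed16;      /* 16.16 fixed point: 65536 represents 1.0 */
#define FIX_ONE 65536

/* v1 + (v2 - v1) * f, where f is a 16.16 fraction in [0, FIX_ONE]. */
static int32_t lerp_fixed(int32_t v1, int32_t v2, fixed16 f)
{
    /* A 64-bit intermediate keeps (v2 - v1) * f from overflowing. */
    int64_t delta = (int64_t)(v2 - v1) * f;

    /* Dividing (rather than shifting) truncates toward zero and avoids
       relying on how the compiler shifts negative values. */
    return v1 + (int32_t)(delta / FIX_ONE);
}
```

    For example, lerp_fixed(10, 20, FIX_ONE / 2) interpolates halfway between 10 and 20 using integer arithmetic only.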

    Have you researched your data structures to find better ones?
    My data structures should not be a problem. In the fixed point version they are longs (or ints in DJGPP), in the floating point version doubles, and in the assembly version 2 ints and 2 doubles. Only one push is required in the FPU code and all operations use st(0). So, per interpolation I incur one push to load a value into st(0) and one pop of st(0) when I'm done. As of right now I've not used any classes or structs to represent colors, voxels, etc. I plan on doing this, but wanted to get the C version working first. All sin, cos, and tan values are pre-computed and stored in tables. Angle increments are linear across the screen. I also have a version that takes a value from the left and from the far right angles. To get the other values, I linearly interpolate between those two, which eliminates some overhead. I don't think data structures are a problem.
    Example of data structures
    screenwidth/HFOV = angleinc, the number of vertical lines (table entries) per degree
    360*angleinc = number of angles (table entries) in a full circle

    Code:
    for (int i=0;i<numangles;i++)
    {
      /* i/angleinc converts the table index to degrees; *PI/180 to radians */
      SN[i]=sin((i/angleinc)*PI/180);
      CS[i]=cos((i/angleinc)*PI/180);
      /* ...tangents here - did not include - longer due to invalid tangents
         such as 90 and 270 */
    }
    So SN[160] on a 320 width screen is the sin of 30 degrees.
    All major values such as 0,30,60,90,180,270,360 are stored in ANG0,ANG30,ANG60,ANG90,ANG180,ANG270,ANG360 for speed.

    So: ViewerAngle-ANG30 = starting angle with a HFOV of 60

    Have you turned on the compiler's optimisations?
    Yes, all compiler optimizations are on. Without an actual profiler it is very hard to determine the speed difference between optimizations on and off. DJGPP does report (based on the call frame traceback and other utilities in DJGPP) that most of the time is spent in the bilinear function. I have also pre-bilinear-filtered the map, but to get smoothness instead of chunks caused by the cell size in my map, I still have to bilinear interpolate. The cell size can only be so small before the image deteriorates into a huge mess, and only so large before view distance is affected.
    Also, I'm filtering to white beyond a certain distance for effect. This is done using linear interpolations on the three RGB components.
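
    The fade to white could be sketched as three linear interpolations toward 255, one per component. The function names, the 8-bit components, and the fade_start/fade_end parameters are assumptions made for illustration.

```c
/* Interpolate one 8-bit component toward a target, t in [0, 1],
   rounding to the nearest integer. */
static unsigned char lerp_u8(unsigned char from, unsigned char to, double t)
{
    return (unsigned char)(from + (to - from) * t + 0.5);
}

/* Fade rgb[0..2] toward white according to distance:
   unchanged at or below fade_start, fully white at or beyond fade_end. */
static void fade_to_white(unsigned char rgb[3], double dist,
                          double fade_start, double fade_end)
{
    double t;
    int i;

    if (dist <= fade_start)
        t = 0.0;
    else if (dist >= fade_end)
        t = 1.0;
    else
        t = (dist - fade_start) / (fade_end - fade_start);

    for (i = 0; i < 3; ++i)
        rgb[i] = lerp_u8(rgb[i], 255, t);   /* one linear interp per component */
}
```

    Since t depends only on distance, it can be computed once per row of voxels rather than once per pixel.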


    Conclusion:
    So, all in all, the bilinear has to be the culprit. It is called for the height values (1 bilinear,1 linear for local heightmap), the three RGB values (incurs 3 bilinears) and texture inside of each cell (3 linears). This is 4 bilinears and 2 linears per pixel. Most voxel engines use these same methods as I've done a lot of research on several voxel engines - some not so good, and some very good.
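
    For readers following along, a bilinear interpolation over a linear height map is just three linear interpolations: two along x and one along y. The plain C sketch below shows the idea; the names and the 8-bit map format are assumptions, and it expects the sample point to lie at least one cell away from the right and bottom edges of the map.

```c
/* Plain linear interpolation: a + (b - a) * t. */
static double lerp_d(double a, double b, double t)
{
    return a + (b - a) * t;
}

/* Sample a height map stored as a linear array at fractional cell
   position (x, y). Assumes 0 <= x < map_width - 1 and that row y + 1
   exists, so all four neighbours are inside the map. */
static double bilinear(const unsigned char *map, int map_width,
                       double x, double y)
{
    int ix = (int)x, iy = (int)y;            /* top-left cell corner    */
    double fx = x - ix, fy = y - iy;         /* fractions inside cell   */
    const unsigned char *p = map + iy * map_width + ix;

    double top    = lerp_d(p[0],         p[1],             fx);
    double bottom = lerp_d(p[map_width], p[map_width + 1], fx);

    return lerp_d(top, bottom, fy);          /* blend the two rows      */
}
```

    Each call costs three lerps, which is why four bilinears per pixel add up so quickly.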

    I have done my homework on this one - and I need the bilinear in assembly.

  5. #5
    Linguistic Engineer... doubleanti's Avatar
    Join Date
    Aug 2001
    Location
    CA
    Posts
    2,459
    fyi the profiler for DJGPP is gprof.exe but you need to have compiled with the -pg switch...

  6. #6
    Linguistic Engineer... doubleanti's Avatar
    Join Date
    Aug 2001
    Location
    CA
    Posts
    2,459
    fyi [2]... to analyze crash dumps for DJGPP... use symify... this time with the -g switch, since it needs the debug info...
