Thread: Tile-map performance

  1. #31
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Hmm, do you have each tile in a separate texture?
    Yes. They are extracted from larger textures.

    Do you update all the vertex positions when you do the scrolling?
    Yes. This should be apparent from the posted code.

    Those two things look a little fishy to me, perhaps that's why you have had problems with speed?
    This revised function was not the one suffering from speed.

    You don't understand *pSVB++?

    Now that's fishy.

  2. #32
    Sweet
    Join Date
    Aug 2002
    Location
    Tucson, Arizona
    Posts
    1,820
    No, He was saying why are you putting that in a loop when you can just add vertexCount to it.
    Woop?

  3. #33
    Registered User
    Join Date
    Apr 2006
    Posts
    43
    But without you saying how well it's running it's hard to know if it's running good or not...It doesn't look to me like nearly the best way if you want it to run fast, but without any profiling feedback from you it's pointless to try to dig any more in it.

    I'm afraid that this might sound a little mean, but I think you have lots of things you could improve in your code, since you're evidently trying hard to make it run, you could at least be a little more open for helpful suggestions.

    No, as Sang-drax said, I don't understand why you use a loop for something that looks like it should be a one line addition assignment and the 2 points I raised are things that I think reduces the speed of your rendering (you are doing unnessesary many texture binds and poking around in vertex data that could equally well lie untouched in the graphics card memory if you used a transformation before rendering it instead), however don't poke in it without profiling, D3d is not my thing.

    I hope you take this as constructive critisism and nothing else, I'm a very nice person, promise, feels like I'm been jumping up and down on you on this board. Keep on coding...

  4. #34
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    The reason for the *pSVB++ is that each triangle in the code is 6 vertices. I'm not using indexed buffers and I'm not using a hardware buffer.

    DrawPrimitiveUP is not in hardware which means you don't have to lock it to access it. Yes, you take a little bit of a hit with it but it's not too bad. It takes more of a hit with locking and unlocking a Video RAM vertex buffer than a software system RAM buffer.

    So my point in doing 6 *pSVB++ is to increment the pointer, change the variables, and increment the pointer.

    The actual first version was pSVB[vert].x+=irelScrollX.

    But in profiling this was actually slower than *pSVB+=vertexCount which was my second version. The optimized version actually does 6 vertices at a time by incrementing a pointer which means the loop actually is faster. So instead of changing variables 1 at a time in a single increment loop, I change 6 at a time and increment by 6. I just unrolled the loop.

    I tried to also unroll the loop on the vertexCount loop inside of the render function, but it was causing issues that I couldn't quite figure out so I changed it to what worked.
    I'm trying to process 6 vertices in one loop, or one complete quad in one loop.

    Now the reason for NOT using indexed buffers. First, this is in software RAM so memory is not an issue. Second, it is much easier to do special effects on a per-quad basis when you are not using indexed buffers. So for screen dissolves, wipes, fades, and/or other special effects like screen shattering, etc, it's much easier to use 6 vertices per quad. In this way I can literally tear apart the screen and do some rotational effects on each triangle without having to lock a hardware vertex buffer, without having to de-couple the index buffer from the vertex buffer, etc., etc.

    If this was for a 3D rendering system then, yes, what you have said would be correct. In 3D I would never do any of this and don't, but for 2D I just need more control of the screen and of the sprites than Direct3D's and D3DX's API allows for.

    Even ID3DXSprite has 1 complete glaring oversight in that you can not access the vertex buffer. This means that bounding boxes cannot be computed post-transformation for sprites and you pretty much have to compute that for yourself. I'm not sure why this is so or why they don't at least let you specify a bounding volume and allow you access to those. The way I do bounding boxes is to first create the bounding volume. Second, after the object is transformed I re-compute the bounding box from the transformed vertices. Just transforming the bounding box does not work correctly as explained in Real Time Rendering and Game Coding Complete. Moller and Haines provide an extremely detailed explanation of why this does not work.

    There is a method to the madness.

    Whats with all the offsets?
    The reason for all of the offsets are so I can linearilly traverse the map in memory. I detest doing any multiplies in a loop such as that so to prevent it I pre-compute the offset. But in order to do this properly you must have two variables.
    Here is why.

    Code to traverse a 2D array (pArray) in linear fashion.
    This is using array indexing and not manual pointer addition.
    Slower, but it works.

    The row and col counters are so I can track the width and height without using (dwOffset % width), which is slower since it is a divide.
    [code]

    DWORD dwOffset=0;
    DWORD dwStartOffset=dwOffset;

    int row=0,col=0;

    int value=0;

    do
    {
    pArray[dwOffset]=value;
    value++;
    dwOffset++;
    col++;

    if (col>ArrayWidth)
    {
    //Start offset is the beginning of the current line
    //dwOffset is at the end of the current line
    //Increment dwOffset by ArrayWidth puts dwOffset at the END
    //of the next line, instead of the START of the next line
    //So we increment StartOffset and then set dwOffset to
    //equal that value
    dwStartOffset+=ArrayWidth;
    dwOffset=dwStartOffset;

    //Reset column counter
    col=0;
    //Increment row counter
    row++;
    }
    } while (row<ArrayHeight);
    [code]

    So for this loop we increment a 2D array in linear fashion and we have this:
    Code:
      pArray[dwOffset]=value;
      value++;
      dwOffset++;
      col++;
    1 indexed array access //add esi,dwOffset
    3 integral additions //inc [ebp+stack_position_of_var]

    Code:
    if (col>ArrayWidth)
    I'm not sure exactly how the compiler will treat this w/o looking at the asm source. I would do it this way which is prob not the fastest way. I would use ecx as the column counter.

    1 comparison with a parameter on the stack.

    mov eax,[ebp+stack_pos_of_ArrayWidth]
    cmp eax,ecx
    ja MOVEDOWNROW
    jmp STARTOFLOOP

    Code:
         dwStartOffset+=ArrayWidth;
         dwOffset=dwStartOffset;
         
         //Reset column counter
         col=0;
         //Increment row counter
         row++;
    Again we have (in order)

    1 Addition
    1 Assignment
    1 Assignment
    1 Addition

    Code:
    } while (row<ArrayHeight);
    Here is 1 comparison with a possible 2 register loads for row and ArrayHeight, again, depending on register/stack usage.

    So the entire loop comes down to:

    1 indexed array access
    1 integral addition
    1 integral addition
    1 integral addition
    1 comparison with a parameter on the stack.
    1 integral addition
    1 assignment
    1 assignment
    1 integral addition
    1 comparison

    So:

    1 indexed array access
    5 integral additions
    2 comparisons

    As opposed to something like this:

    Code:
    for (int i=0;i<ArrayHeight;i++)
    {
      for (int j=0;j<ArrayWidth;j++)
      {
         Array[i][j]=value;
      }
    }
    2 loops.
    1 inherent multiply executed once each iteration of the nested loop.
    1 Array access inside of nested loop.
    2 inherent comparisons, increments

    Array[i][j] works out to:

    Array[i*width+j]

    Which is what I want to avoid as well as the nested loops.
    So I unroll the loop and eliminate the multiply.

    Attached is the profile of this code - profiled for function timing.

    Here is the render function:
    222.486 0.8 500.899 1.8 173 CScreenGrid::RenderSoftware(float) (cscreengrid.obj)
    Here is the scroll function:
    305 0.0 4.305 0.0 34 CScreenGrid::ScrollSoftware(int,int) (cscreengrid.obj)

    This was computed by running the app, selecting new project, and then opening a bitmap and adding several tiles. Then I scrolled the map left and right 3 or 4 times.

    Keep in mind this is inside of MFC where most of the time is spent in the msg loop and rendering is not constantly updated. It is inly updated when:

    Invalidate() is called which calls
    CZeldaEditorView::OnDraw() which calls
    CEditor3D::RenderMainView() which calls
    CScreenGrid::RenderSoftware().

    A pure Direct3D app would not be this convoluted, but I'm using my engine DLL inside of an MFC CWnd object so it get's messy.
    Last edited by VirtualAce; 04-16-2006 at 09:50 AM.

  5. #35
    vae victus! skorman00's Avatar
    Join Date
    Nov 2003
    Posts
    594
    So my point in doing 6 *pSVB++ is to increment the pointer, change the variables, and increment the pointer.
    That's not the part of code they're trying to point out Bubba, they're looking here:
    Code:
          //If it is, draw the batch of primitives that share lastID's texture
          if (bDraw)
          {
            
            m_pDevice->SetTexture(0,m_pTexManager->GetTexture(dwLastID));
            m_pDevice->DrawPrimitiveUP(D3DPT_TRIANGLELIST,primCount,pSVB,  sizeof(TLVertex));
            
            //Increment the vertex buffer pointer
            for (int v=0;v<vertexCount;v++)
            {
              *pSVB++;
            }
            
                    
            //Reset state
            primCount=0;
            vertexCount=0;
            bDraw=false;
            dwLastID=dwTexID;
            
          }
    This is in the first code snippet you posted most recently.

  6. #36
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Well I tried pSVB+=vertexCount and it fired off an error. I thought you could increment a pointer this way as well. Perhaps something else caused the error.

    Changed the loop to pSVB+=vertexCount. I think I might have tried *pSVB+=vertexCount which is incorrect.
    Last edited by VirtualAce; 04-16-2006 at 07:02 PM.

  7. #37
    S Sang-drax's Avatar
    Join Date
    May 2002
    Location
    Göteborg, Sweden
    Posts
    2,072
    Are you using "*pSVB++;" instead of the equivalent "pSVB++;" to emphasize that it is a pointer being incremented? You are already doing that with hungarian notation.
    Last edited by Sang-drax : Tomorrow at 02:21 AM. Reason: Time travelling

  8. #38
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    No I incorrectly used *psvb+=<some_value> which didn't work. Then I used the loop.

    However I have now changed it to pSVB+=vertexCount.

  9. #39
    vae victus! skorman00's Avatar
    Join Date
    Nov 2003
    Posts
    594
    Incase you haven't already done so, shouldn't your last loop (the unrolled one) also have the same changes?

  10. #40
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Yes it should.

    You gotta give me some credit here. I've taken quite a bit of criticism, some helpful, some not and developed my own algo that runs faster than all the suggestions.

    I like my RLE-type rendering algo.



    The pointer suggestions - many thanks.
    The graphic suggestions...thanks but no thanks.

    Unless you code your own tile renderer in Direct3D you probably will not understand the issues I'm encountering which is understandable. Some of you that know Direct3D probably do realize the issue at hand and I think I've come up with an efficient feasible solution.
    Last edited by VirtualAce; 04-17-2006 at 11:26 PM.

  11. #41
    vae victus! skorman00's Avatar
    Join Date
    Nov 2003
    Posts
    594
    I have made a tile based rendering system, a few actually, and each time I did it a bit differently =) May the force be with you for whichever method you choose.

    As far as order your rendering order based on texture: that would speed any code up using DirectX 9. in DX10, might not be an issue depending on how you use the pipeline. I mention that because I know you're big into forward movement Bubba (or at least think so).

    And what's one more bit of criticism on the code?

    Code:
          //Is this ID the same as the last?
          if (dwTexID==dwLastID)
          {
            bDraw=false;
          } else bDraw=true;
    bDraw is only used to control the next if block, so having it only confusimicates things (unless you're planning for future features)

  12. #42
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    I just got your PM Skorman. The board said it was sent 4/15 but I was never notified. Quite odd.

    I was interested to see that ptr method actually is faster, even though it uses more asm. It most certainly comes down to the fact that the multiply on a non power of 2 array to retrieve the memory offset is slower than the added instructions used in the ptr method.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Smooth walking on tile based map system
    By abraham2119 in forum C Programming
    Replies: 8
    Last Post: 07-10-2009, 10:33 AM
  2. Polynomials and ADT's
    By Emeighty in forum C++ Programming
    Replies: 20
    Last Post: 08-19-2008, 08:32 AM
  3. File map performance
    By George2 in forum C++ Programming
    Replies: 8
    Last Post: 01-04-2008, 04:18 AM
  4. Need testers for editor
    By VirtualAce in forum Game Programming
    Replies: 43
    Last Post: 07-10-2006, 08:00 AM
  5. Tile map loading/saving
    By sand_man in forum Game Programming
    Replies: 16
    Last Post: 04-23-2005, 09:38 PM