Thread: Some optemizing questions

  1. #1
    Registered User
    Join Date
    Oct 2006
    Location
    UK/Norway
    Posts
    485

    Some optemizing questions

    Hallo,

    I bought a book recommended to me by someone on this forum (Tricks of the game programming gurus) and it has some interesting pages about optimizing some of the functions.

    My first question is about using bit shifting instead of multiplication. I was always under the impression that doing bit shifting was a bit faster then using multiplication. So I made a small program to test how much faster it was. To my surprise I found multiplication to have almost the exact same speed as bit shifting. Here is the program I made, with the different speed results on the top.

    Code:
    // Dev C++ 4.9.9.2
    
    // Multiplication
        // First computer
        // 4531 first
        // 4516 second
        // 4516 third
        
        // Second computer, more itterations
        // 11485
        // 11485
        // 11469
        
    // Bit shift
        // First computer
        // 4500 first
        // 4469 second
        // 4469 third
        
        // Second computer, more itterations
        // 11438
        // 11437
        // 11563
    
    #include <windows.h>
    #include <conio.h>
    #include <iostream>
    
    using namespace std;
    
    int main()
    {
        char *buffer = new char[512 * 512 * 4]; 
        
        cout << "Starting.." << endl;
        
        double startTime = timeGetTime();
        
        for(int i = 0; i < 1000; i++)
        {
                for (int y = 0; y < 512; y++)
                {
                     for (int x = 0; x < 512; x++)   
                     {
                         // The two different ways of doing the multiplication, 
                         // comment out one.
                         //int offset = ((512 * y) + x ) * 4;
                         //int offset = (((y << 9)) + x) << 2;
                     }
                }
        }
        
        double endTime = (timeGetTime() - startTime);
        
        cout << "Ending..." << endl;
        cout << endTime << endl;
        
        delete [] buffer;
        
        system("pause");
        return 0;    
    }
    Am I doing something wrong, or is << almost the same as * when it comes to speed?


    Also, the book talks about how to draw bitmaps to the screen using logical operators, but it does not really go into any details about how to use it. In the short paragraph it mentions that you can use OR to draw the bitmap. The advantage of using it is that I dont have to use an if statement for each pixel. The disadvantage is that you loose some colour and that it only works on 16bit colours. Does anyone know of where I can read up on this as I dont really understand it and I want to use 32bit colours.


    Also, is there a way to get rid of the temp variable in this function:
    Code:
    // Colour is a array of 4 values
    void setPixel(BYTE &screenBuffer, BYTE &colour, int x, int y)
    {
       int offset = (( SCREENWIDTH * y) + x) * 4;
       
       BYTE *temp = screnBuffer + offset;
       
       memcpy(temp, colour, 4);    
    }
    Or im I better of using this:
    Code:
    // Colour is a array of 4 values
    void setPixel(BYTE &screenBuffer, BYTE &colour, int x, int y)
    {
       int offset = (( SCREENWIDTH * y) + x) * 4;
       
       screenBuffer[offset++] = colour[0];   // R
       screenBuffer[offset++] = colour[0];   // G
       screenBuffer[offset] = colour[0];     // B
    }
    Thanks for your time.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > To my surprise I found multiplication to have almost the exact same speed as bit shifting.
    Probably because most modern compilers are perfectly capable of performing these (and many other) optimisations all the time, rather than just when you think about it.

    Also, if you fail to use the results (as you do in your simple test), then the compiler may well remove the whole nested loop construct because none of the results are used.

    > In the short paragraph it mentions that you can use OR to draw the bitmap
    > The disadvantage is that you loose some colour and that it only works on 16bit colours
    It works only where you have a paletted display.
    For example, if you set all the entries in the pallete which match say 0001XXXX (all 16 combinations to the same colour), then whatever you draw in the other 4 bits will always have that colour.
    You end up with 15 colours for foreground objects (one must be transparent), and 16 colours for background objects. You now have an effective palette of 31 colours rather than the full 256.

    This was a popular technique on 1980's home micros, but modern accelerated graphics cards can probably redraw the whole lot from scratch much quicker anyway, and in a full colour depth.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Oct 2006
    Location
    UK/Norway
    Posts
    485
    > To my surprise I found multiplication to have almost the exact same speed as bit shifting.
    Probably because most modern compilers are perfectly capable of performing these (and many other) optimisations all the time, rather than just when you think about it.

    Also, if you fail to use the results (as you do in your simple test), then the compiler may well remove the whole nested loop construct because none of the results are used.
    Do the compiler use do it for "harder" values, where you might need to shift two or three times? How many shifts are ok to do, or should I always use multiplication?

    You was right, the compiler optimized it away. Added so that I am using it now, but the difference between the two are still a lot smaller then what I thought it would be.

    > In the short paragraph it mentions that you can use OR to draw the bitmap
    > The disadvantage is that you loose some colour and that it only works on 16bit colours
    It works only where you have a paletted display.
    For example, if you set all the entries in the pallete which match say 0001XXXX (all 16 combinations to the same colour), then whatever you draw in the other 4 bits will always have that colour.
    You end up with 15 colours for foreground objects (one must be transparent), and 16 colours for background objects. You now have an effective palette of 31 colours rather than the full 256.
    So it will not work to good if I try to use that to draw a sprite of a car or something onto something else?

    This was a popular technique on 1980's home micros, but modern accelerated graphics cards can probably redraw the whole lot from scratch much quicker anyway, and in a full colour depth.
    All my graphic routines are old school. Im just using software. Thats why I am trying to optimize everything as much as I can, as drawing is so incredible expensive right now.

    Thanks for your reply Salem.

  4. #4
    The larch
    Join Date
    May 2006
    Posts
    3,573
    As for multiplication - are you using compiler optimizations? These tend to make the largest difference.

    It also looks like you are looking for optimization possibilities in the wrong place. Rather than wondering whether one way of doing multiplication is faster than another, you should be looking for ways to do less multiplications in the first place. For example

    Code:
        int max = 512 * 512;
        for(int i = 0; i < 1000; i++)
        {
            for (int y = 0; y < max; ++y) {
                int offset = y * 4;
            }
        }
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  5. #5
    Registered User
    Join Date
    Oct 2006
    Location
    UK/Norway
    Posts
    485
    As for multiplication - are you using compiler optimizations? These tend to make the largest difference.
    Well, I am still playing a bit with that. Still not to good with visual studio, last time I tried turning on optimization the compiler optimized it a bit to well. It optimized several of my functions away (Hey, who needs input functions in a game?) Is there any setting or anything you can recommend? So far, the only "optimizing" I am using is to change from debug mode to release.

    It also looks like you are looking for optimization possibilities in the wrong place. Rather than wondering whether one way of doing multiplication is faster than another, you should be looking for ways to do less multiplications in the first place. For example
    Well, I think I have minimized the multiplication as much as possible. Or at lest as good as I can.

    I know I am starting to look for optimization in places where there are almost nothing to be gained, but how expensive is assignment?

    In one of my draw routines I create 4 new variables which I dont really really need. (Just make the code easier to read) Is there anything to gain from removing them?
    The 4 assignments are done for each pixel.

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > So it will not work to good if I try to use that to draw a sprite of a car or something onto something else?
    Well you could, if you create more "layers" in your palette, but the number of colours goes down in response. A background, and non-overlapping foreground characters is about the limit of the technique.

    > I know I am starting to look for optimization in places where there are almost nothing to be gained, but how expensive is assignment?
    That's about the simplest thing there is.
    What you really need to do is get hold of a profiler to actually tell you where the problem areas are, rather than guessing. Also, you need to look at your algorithms, not micro-optimisations which the compiler will manage perfectly well to do itself.

    In fact, if you get too "clever" about it, you might just make it worse rather than better, as the compiler will no longer be able to figure out what you're really trying to do.

    The compiler contains the distilled wisdom of programmers with a combined total of hundreds, if not thousands of years of programming experience. If there's a trick, you can be pretty sure they know about it, and hence the compiler probably does as well, if you make your intent clear enough to being with.

    For example, no amount of micro-optimisation-inline-assembler-yada-yada would ever make a bubble sort better than a quick sort. But you as the programmer would look at the profile data, see that it was taking far too long, then do some research into better techniques. This is where you'll find all the 10x improvements.

    > It optimized several of my functions away (Hey, who needs input functions in a game?)
    In a correct program, optimisation should not change the functionality of the program. If the functionality changed, then I'd suspect bugs in your code. Many "problems" such as using uninitialised variables or minor buffer overruns are easy to gloss over in debug mode, but are then exposed when in release mode. Similarly, people sometimes complain it works in release mode but not debug mode for pretty much the same reasons.

    > int offset = (( SCREENWIDTH * y) + x) * 4;
    If you're calling this for incrementing x and y values all through the 2D array, then you can definitely improve things by moving this outside the function. Making it static and inline may give the compiler enough of a clue to achieve the kind of code motion which would reduce this to just adding 4 each time around the loop.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Oct 2006
    Location
    UK/Norway
    Posts
    485
    > I know I am starting to look for optimization in places where there are almost nothing to be gained, but how expensive is assignment?
    That's about the simplest thing there is.
    What you really need to do is get hold of a profiler to actually tell you where the problem areas are, rather than guessing. Also, you need to look at your algorithms, not micro-optimisations which the compiler will manage perfectly well to do itself.
    I am not really sure what I can do to improve my algorithm. I have looked online at other blitting functions, and they are almost the same. A few are a bit faster, but they are using evil assembly hacks which I dont understand.

    In fact, if you get too "clever" about it, you might just make it worse rather than better, as the compiler will no longer be able to figure out what you're really trying to do.

    The compiler contains the distilled wisdom of programmers with a combined total of hundreds, if not thousands of years of programming experience. If there's a trick, you can be pretty sure they know about it, and hence the compiler probably does as well, if you make your intent clear enough to being with.
    I have seen that a few times. Makes me feel bad, when the compiler better knows what I want to do then I know myself.

    > It optimized several of my functions away (Hey, who needs input functions in a game?)
    In a correct program, optimisation should not change the functionality of the program. If the functionality changed, then I'd suspect bugs in your code. Many "problems" such as using uninitialised variables or minor buffer overruns are easy to gloss over in debug mode, but are then exposed when in release mode. Similarly, people sometimes complain it works in release mode but not debug mode for pretty much the same reasons.
    I was expecting it to be my fault, I just found it amusing at the time.

    > int offset = (( SCREENWIDTH * y) + x) * 4;
    If you're calling this for incrementing x and y values all through the 2D array, then you can definitely improve things by moving this outside the function. Making it static and inline may give the compiler enough of a clue to achieve the kind of code motion which would reduce this to just adding 4 each time around the loop.
    The function is just for drawing more random like things, say a star field.

    I know(or I have read in the game programming book) that if you have a special case happening over and over again, you should make a special blitting function for that case. I am not really sure how to do this, but maybe someone can point me in the right direction?
    My player and enemy sprites are always 32x32 and always have alpha channel. Almost all of the center of the sprite if fully coloured(no alpha) while the edges have alpha. Is there any way I can use that to make a faster blitter?

    > So it will not work to good if I try to use that to draw a sprite of a car or something onto something else?
    Well you could, if you create more "layers" in your palette, but the number of colours goes down in response. A background, and non-overlapping foreground characters is about the limit of the technique.
    If I have to use more then one logic operator, it is better to use an if, as one if statement take about the same time as two logic operators.

    What you really need to do is get hold of a profiler to actually tell you where the problem areas are, rather than guessing.
    I have tried to use one, but I could not really get it to work very well. Can anyone recommend a free one that work on Vista? (I am using Visual studio)

    Here is the blitting function for alpha, it is probably not perfect, but it is as good as I can get it without any help
    Code:
    void Sprite::drawAlphaSprite(BYTE *screenData, Rectangle2D &screenRec, int xPos, int yPos)
    {
    	// Figure out which part of the texture to draw
    	Rectangle2D newTextureRec = calculateClipping(screenRec, textureRec, xPos, yPos);
    
    	// A pointer to the memory we want to copy to
    	BYTE *screenDataPnt = screenData + (xPos + yPos * screenRec.getWidth()) * 4;
    
    	// A pointer to the memory we want to copy from
    	BYTE *textureDataPnt = spriteData + (newTextureRec.left + newTextureRec.top * textureRec.right) * 4;
    
    	// Calculate the offset after each line
    	int endScreenOffset = (screenRec.getWidth() - newTextureRec.getWidth()) * 4;
    	int endTextureOffset = (textureRec.getWidth() - newTextureRec.getWidth()) * 4;
    
    	// Do the drawing
    	for (int y = 0; y < newTextureRec.getHeight(); y++)
    	{
    		for (int x = 0; x < newTextureRec.getWidth(); x++)
    		{
    			BYTE blue = textureDataPnt[0];
                BYTE green = textureDataPnt[1];
                BYTE red = textureDataPnt[2];
                BYTE alpha = textureDataPnt[3];
    
                // Calculate how much the alpha value is going to effect the pixel
                float alphaMod = alpha / 255.0;
                
                if(alphaMod != 0)
                {
                    // Assign the correct colour to the current pixel, the fast way
                    screenDataPnt[0] = screenDataPnt[0] + (( alpha * (blue - screenDataPnt[0] )) >> 8);
                    screenDataPnt[1] = screenDataPnt[1] + (( alpha * (green - screenDataPnt[1] )) >> 8);
                    screenDataPnt[2] = screenDataPnt[2] + (( alpha * (red - screenDataPnt[2] )) >> 8);
                }
                
    			// Move to the next pixel on the screen
                screenDataPnt += 4;
    
                // Move to the next pixel on the sprite
                textureDataPnt += 4;
    		}
    	
            // Move to the next line on the screen and the next line of the sprite
            screenDataPnt += endScreenOffset;
            textureDataPnt += endTextureOffset;
    	}
    }
    And I use this one when there is no alpha. This one is a lot faster
    Code:
    void Sprite::drawNoAlphaSprite(BYTE *screenData, Rectangle2D &screenRec,int xPos, int yPos)
    {
    	// Figure out which part of the texture to draw
    	Rectangle2D newTextureRec = calculateClipping(screenRec, textureRec, xPos, yPos);
    
    	// Pointer to the memory we want to copy to
    	BYTE *screenDataPnt = screenData + (xPos + yPos * screenRec.getWidth()) * 4;
    
    	// Pointer to the memory we want to copy from
    	BYTE *textureDataPnt = spriteData + (newTextureRec.left + newTextureRec.top * textureRec.getWidth()) * 4;
    
    	// Copy one line of the sprite to the screen for each loop
    	for (int y = newTextureRec.top; y < newTextureRec.bottom; y++)
    	{
    		// Copy the line
    		memcpy(screenDataPnt,textureDataPnt, newTextureRec.getWidth() * 4);
    
    		// Move to the next line of the texture
    		textureDataPnt += textureRec.getWidth() * 4;
    
    		// Move to the next line of the screen
    		screenDataPnt += screenRec.getWidth() * 4;
    	}
    }
    Last edited by h3ro; 04-05-2008 at 07:09 AM.

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Here's some suggestions that may gain you a little bit of an improvemnt. Mostly moving stuff out of the loop that can be moved out.

    Code:
                    // Assign the correct colour to the current pixel, the fast way
                    screenDataPnt[0] +=  (( alpha * (blue - screenDataPnt[0] )) >> 8);
                    screenDataPnt[1] +=  (( alpha * (green - screenDataPnt[1] )) >> 8);
                    screenDataPnt[2] +=  (( alpha * (red - screenDataPnt[2] )) >> 8);
    This change makes the code easier to read - it won't affect speed at all, but easier to read code is always better.

    Code:
                float alphaMod = (lpha / 255.0);
    alphaMod is only used in the if-statement immediately after, so changing that if-statement to use alpha == 0 instead would do a small improvement.

    Code:
            int height = newTextureRec.getHeight();
            int width = newTextureRec.getWidth();
    	for (int y = 0; y < height ; y++)
    	{
    		for (int x = 0; x < width; x++)
    		{
    This avoids a potential function call for every loop iteration - I'm assuming the texture is not changing size during the drawing.

    Code:
    void Sprite::drawNoAlphaSprite(BYTE *screenData, Rectangle2D &screenRec,int xPos, int yPos)
    {
            int scWidth = screenRec.getWidth();
            int texWidth = screenRec.getWidth();
    	// Figure out which part of the texture to draw
    	Rectangle2D newTextureRec = calculateClipping(screenRec, textureRec, xPos, yPos);
    
    	// Pointer to the memory we want to copy to
    	BYTE *screenDataPnt = screenData + (xPos + yPos * scWidth) * 4;
    
    	// Pointer to the memory we want to copy from
    	BYTE *textureDataPnt = spriteData + (newTextureRec.left + newTextureRec.top * texWidth) * 4;
    
           scWidth *=4;
           texWidth *= 4;
    
            int newTexWidth = newTextureRec.getWidth() * 4;
    
    	// Copy one line of the sprite to the screen for each loop
    	for (int y = newTextureRec.top; y < newTextureRec.bottom; y++)
    	{
    		// Copy the line
    		memcpy(screenDataPnt,textureDataPnt, newTexWidth);
    
    		// Move to the next line of the texture
    		textureDataPnt += texWidth;
    
    		// Move to the next line of the screen
    		screenDataPnt += scWidth;
    	}
    }
    Again, avoiding function (and multiplies) calls inside the for-loop.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. questions....so many questions about random numbers....
    By face_master in forum C++ Programming
    Replies: 2
    Last Post: 07-30-2009, 08:47 AM
  2. A very long list of questions... maybe to long...
    By Ravens'sWrath in forum C Programming
    Replies: 16
    Last Post: 05-16-2007, 05:36 AM
  3. Several Questions, main one is about protected memory
    By Tron 9000 in forum C Programming
    Replies: 3
    Last Post: 06-02-2005, 07:42 AM
  4. Trivial questions - what to do?
    By Aerie in forum A Brief History of Cprogramming.com
    Replies: 23
    Last Post: 12-26-2004, 09:44 AM
  5. questions questions questions.....
    By mfc2themax in forum A Brief History of Cprogramming.com
    Replies: 1
    Last Post: 08-14-2001, 07:22 AM