Thread: Make it fast !

  1. #1
    Registered User
    Join Date
    Nov 2010
    Posts
    11

    Question Make it fast !

    I have this scanning routine:

    Code:
    int DLL_EXPORT scansig(
    	BYTE file[],
    	int fileLen,
    	short sig[],
    	int sigLen)
    {
        int i, j;
        for(i = 0; i < fileLen; ++i) //loop through whole file
        {
            if(file[i + sigLen] != sig[sigLen]) //better speed ?
                continue;
            for(j = 0; j < sigLen; j++)
            {
                if(sig[j] < 0) //wildcard, all wildcard shorts are less than 0
                    continue;
                if(file[i + j] != sig[j]) //compare
                {
                    i += j;
                    break;
                }
                if(j == sigLen - 1) //if were at the end, we found a match, return index.
                    return i;
            }
        }
    
        return -1;
    }
    I have been struggling to make this algorithm faster.
    Do any of you guys have any suggestions ?

  2. #2
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Where's a sample data set? There are faster ways to compare different things.

  3. #3
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    For a single signature, the fastest algorithm you could use is probably Boyer-Moore. If you'll be scanning for many many signatures, it's not very efficient to scan for each of them one by one. You'd probably want to reduce the entire set of signatures down to a single state machine, kind of like a regular expression.

    I assume you're writing some kind of virus scanner or similar..
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  4. #4
    Registered User
    Join Date
    Nov 2010
    Posts
    11
    whiteflags,

    BYTE file[] would be a standard PE file. Ranging from 150 bytes to 20mb in length.

    short sig[] would be a pattern of shorts, including wildcards. Such as

    Code:
    short sig[] = { 0x20, 0xE6, 0xEA, 0x19, 0xBE, 0x28, -1, -1, -1, -1, 0x13,
     -1, 0x20, 0x39, 0xEA, 0x19, 0xBE, 0x28, -1, -1, -1, -1, 0x13, -1, 0x11,
     -1, 0x11, -1, 0x16, 0x1F, 0x40, 0x28, -1, -1, -1, -1, 0x26 };

  5. #5
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Personally, I'd

    1) Look at using pointers to control the loops, rather than array indices.

    2) Break the inner loop into a separate function.

    3) Find ways to avoid using any break/continue statements. This makes code more readable, and also can make it more efficient (by reducing unnecessary complexity that the compiler must deal with). Bear in mind that
    Code:
       if (a != b) continue;
       stuff();
    is equivalent (in a loop body) to
    Code:
        if (a == b) stuff();
    4) Avoid computing the same thing repeatedly in a loop (your code is doing this - accessing array[index] involves a computation - this is also one reason why controlling your loops with pointers can help)

    Short of that, look at using a better algorithm. An algorithm that speeds things up by an order of magnitude is better than a few tweaks that shave a microsecond here and there.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  6. #6
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
                if(j == sigLen - 1) //if were at the end, we found a match, return index.
                    return i;
    Why do you loop for j being less than sigLen if you're just going to add another if check every loop to see if it's sigLen -1 ?


    Quzah.
    Hope is the first step on the road to disappointment.

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by eros View Post
    whiteflags,

    BYTE file[] would be a standard PE file. Ranging from 150 bytes to 20mb in length.

    short sig[] would be a pattern of shorts, including wildcards. Such as

    Code:
    short sig[] = { 0x20, 0xE6, 0xEA, 0x19, 0xBE, 0x28, -1, -1, -1, -1, 0x13,
     -1, 0x20, 0x39, 0xEA, 0x19, 0xBE, 0x28, -1, -1, -1, -1, 0x13, -1, 0x11,
     -1, 0x11, -1, 0x16, 0x1F, 0x40, 0x28, -1, -1, -1, -1, 0x26 };
    Firstly, ditto to everything said so far.

    Secondly, what is this for?!
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  8. #8
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by iMalc View Post
    Secondly, what is this for?!
    Think of it as a strcmp with wild card support. The hex are the values of the characters in the string he wants to find, the -1 are wild cards.


    Quzah.
    Hope is the first step on the road to disappointment.

  9. #9
    Registered User
    Join Date
    Nov 2010
    Posts
    11
    Slightly faster...

    Code:
    int DLL_EXPORT ScanSig(
    	BYTE *file,
    	int fileLen,
    	short *sig,
    	int sigLen)
    {
        int i, j;
        for(i = 0; i < fileLen; ++i)
        {
            if(file[i + sigLen] == sig[sigLen])
    		{
    			for(j = 0; j <= sigLen; ++j)
    			{				
    				if(sig[j] >= 0 && file[i + j] != sig[j])
    				{
    					i += j;
    					break;
    				}
    				if(j == sigLen)
    					return i;				
    			}
    		}
        }
    
        return -1;
    }
    But im out of ideas.

  10. #10
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Why are you checking to see if j is sigLen still, when it's your loop control?
    Code:
    for( j = ... )
    {
    }
    if( j == sigLen )
        return i;
    Put it outside the loop, because you don't need to check to control the loop and inside the loop too. You're checking twice for something you only need to check once.


    Quzah.
    Hope is the first step on the road to disappointment.

  11. #11
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Personally, I would scan the file for the first byte of the signature only. When I found that I would call a routine to forward check for the signature and return true or false...

  12. #12
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Are you tied to using an array of bytes for the signature? You might consider modifying your signature structure to something like:
    Code:
    struct signature_chunk {
        char *bytes;  // null here signifies end of list
        int offset;
        int len;
    };
    Then, each signature is an array of these elements, like so:
    Code:
    struct signature_chunk sig[] = {
        {{0x20, 0xE6, 0xEA, 0x19, 0xBE, 0x28}, 0, 6},
        {{0x13}, 10, 1},
        {{0x20, 0x39, 0xEA, 0x19, 0xBE, 0x28}, 12, 6},
        {0x13}, 22, 1},
        {{0x11}, 24, 1},
        {{0x11}, 26, 1},
        {{0x16, 0x1F, 0x40, 0x28}, 28, 4}
        {{0x26}, 36, 1},
        {NULL, -1, -1}  // end of list marker
    };
    You can use Tater's method of finding possible starts by scanning your input for sig[0].bytes[0]. Once you've found a possible start, you can loop through each chunk of your signature and do something like
    Code:
    loop through each byte of file  // index i
        if the current byte in file matches the first byte of the first chunk of our signature
            for each chunk in our signature  // index j
                if (memcmp(file[i + sig[j].offset], sig[j].bytes, sig[j].len)) {
                    // memcmp failed, so this doesnt match the signature
    This saves you in at least two ways. First, you don't waste time checking for "wildcard" bytes in the signature to tell you to ignore that byte in file. You just don't check them. Second, you are leveraging the optimized library routine memcmp instead of writing your own, less optimal version. Combine this with some tips from other posters, like using pointer arithmetic instead of array indexes, and caching values you will use multiple times throughout your function, and you should be in decent shape. You should be able to extend this to multiple signatures if need be by creating an array of arrays of signature chunks.

    Since I'm guessing you actually have a number of signatures, you may want to make an array of
    Last edited by anduril462; 01-04-2011 at 01:38 AM. Reason: EDIT: clarified loop indexes

  13. #13
    Registered User
    Join Date
    Nov 2010
    Posts
    11
    Thanks for the idea anduril462, it would be very hard to impliment in my situation. But ill give it a shot soon.

    This is what i got from Taters method, it is faster:

    Code:
    BOOL valid(BYTE *file, short *sig, int offset, int len)
    {
    	int i,j;
    	int end = offset + len; //compute end before hand, so we dont have to do it every time
    
    	for(i = offset, j=0; i < end; ++i, ++j)	
    		if(sig[j] >= 0 && file[i] != sig[j]) //wildcard and check
    			return FALSE;
    
    	return TRUE;
    }
    
    BOOL DLL_EXPORT ScanSig(BYTE *file, int fileLen, short *sig, int sigLen)
    {
        int i;
        for(i = 0; i < fileLen; ++i)
        {
            if( ( file[i] == sig[0] ) && ( file[i + sigLen] == sig[sigLen] ) ) // check if plausable match
    			if( valid(file, sig, i, sigLen) ) // check if whole match
    				return TRUE;
        }
    
        return FALSE;
    }
    Is it bad to be passing the "file" and "sig" for every check ? Would not passing it speed it up?
    Last edited by eros; 01-04-2011 at 02:15 AM. Reason: reason

  14. #14
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    I suppose it would but you'd be talking about replacing a function call with a macro... not really safe.

  15. #15
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Code:
    BOOL DLL_EXPORT ScanSig(BYTE *file, int fileLen, BYTE *sig, int sigLen)
    {  int i, x, y;
        y = (fileLen - sigLen) + 1;
        for(i = 0; i <  y; ++i)
           { if ( file[i] == sig[0] )
               { for( x = 0, (x < sigLen) && (file[i + x] == sig[x]), x++);
                  if ( x > sigLen)
                    return TRUE;  } }
        return FALSE; }
    Last edited by CommonTater; 01-04-2011 at 05:16 AM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. HELP!wanting to make full screen game windowed
    By rented in forum Game Programming
    Replies: 3
    Last Post: 06-11-2004, 04:19 AM
  2. I want to make a fast scroll
    By Gustaff in forum C Programming
    Replies: 3
    Last Post: 04-29-2004, 07:40 PM
  3. make all rule
    By duffy in forum C Programming
    Replies: 9
    Last Post: 09-11-2003, 01:05 PM
  4. Question about atheists
    By gcn_zelda in forum A Brief History of Cprogramming.com
    Replies: 160
    Last Post: 08-11-2003, 11:50 AM
  5. Replies: 6
    Last Post: 04-20-2002, 06:35 PM