self modifying code

**chacham15** · 09-05-2008

So, I have an application which is VERY processor intensive. This application has a core which is general enough to work for all intended purposes. However, because of this generality, in certain situations it makes unnecessary, expensive function calls. I could wrap these function calls in if statements with global variables to switch off those functions, but I would rather not have the check at all (this code is being called billions of times). This point I don't want to argue. Therefore, I'm trying to figure out how to copy the code into memory and then replace the call with a NOP. I've figured out how to copy the code into memory and find the memory addresses right before and after the call. The problem comes when I try to execute the code: I get a SEGFAULT. Any ideas?

**chacham15** · 09-05-2008

Here's what I have so far:

Code:

void p(){
	printf("Called.\n");
}

void *start;
void *end;

void mecallp(){
	//Why do I do this address stuff here?
	//	The answer is that labels only have scope in the function
	end = &&mecallp_end;
	start = &&mecallp_start;
mecallp_start:
	printf("About to call p.\n");
mecallp_before:
	p();
mecallp_after:
	printf("Came back from p.\n");
mecallp_end: 
	;
}

int main(int argc, char *argv[]){
	mecallp();
	start = &mecallp;
	printf("Start = 0x%x\nEnd   = 0x%x.\n", start, end);
	int size = end-start;
	void *programdata = malloc(size);
	memcpy(programdata, start, size);
	printf("Data copied.\n");
	void (*FN)() = programdata;
	(*FN)();
}

**C_ntua** · 09-05-2008

Just a quick note (don't have the time atm): you could find where the segfault is with a debugger, or simply by putting printf("OK\n") calls throughout the code and seeing exactly which line the segfault occurs on.

**dwks** · 09-05-2008

I'm pretty sure you won't have much luck doing that . . . .

You don't have to have a test if you want to branch based on the value of a variable. You could use an array of function pointers instead, for example.
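As a rough sketch of the function pointer idea: the names below (expensive_step, skip_step, set_step_enabled, core_iteration) are invented for illustration and are not from the original program. The hot path makes one indirect call and contains no if test. Whether that is actually faster than a well-predicted branch depends on the CPU and compiler, so measure before committing to it.

```c
#include <assert.h>

typedef void (*step_fn)(void);

/* Counter used only so the effect of the switch can be observed. */
static unsigned long expensive_calls = 0;

/* Hypothetical stand-in for the costly call, and a do-nothing replacement. */
static void expensive_step(void) { expensive_calls++; }
static void skip_step(void) { }

/* Index 0 = disabled, index 1 = enabled. */
static const step_fn step_table[2] = { skip_step, expensive_step };
static step_fn current_step = skip_step;

/* Called rarely, outside the hot path, to switch the behaviour. */
void set_step_enabled(int enabled) {
    assert(enabled == 0 || enabled == 1);
    current_step = step_table[enabled];
}

/* Called billions of times: one indirect call, no conditional. */
void core_iteration(void) {
    current_step();
}

unsigned long expensive_call_count(void) { return expensive_calls; }
```

The switch can be flipped at run time with set_step_enabled(0) or set_step_enabled(1) without touching the code of core_iteration.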

**slingerland3g** · 09-05-2008

I kind of see what you are doing with the 32 address markers with start - end, but I believe the code issue is

int size = end-start;
void *programdata = malloc(size); <---------What do you believe size is?

**tabstop** · 09-05-2008

I didn't even know you could do && (and of course you can't, except in gcc), so I had to look it up. The manual says this:

Originally Posted by GCC Manual 3.4.6

You may not use this mechanism to jump to code in a different function. If you do that, totally unpredictable things will happen. The best way to avoid this is to store the label address only in automatic variables and never pass it as an argument.

(Yes, old manual, but it says the same in 4.3.2, so there.) Apparently in this circumstance "totally unpredictable" = "segfault".

**chacham15** · 09-06-2008

slingerland3g, you are right. I changed the size to 8000, which is way too large, and it at least now prints out "About to call p". What should size be set to? I tried to use the labels to measure the size of the code, but I guess that doesn't work. Stepping through the instructions, I've determined that the problem is that the call is a relative call: since the program code is in a different location, the relative offset leads to invalid memory, hence the segfault. I don't know much about the bit representation of these instructions, so does anyone know of a way that I can fix these relative jumps?

Thanks,
chacham15

Code:

(gdb) disassemble mecallp
Dump of assembler code for function mecallp:
0x004010e2 <mecallp+0>: push   %ebp
0x004010e3 <mecallp+1>: mov    %esp,%ebp
0x004010e5 <mecallp+3>: sub    $0x8,%esp
0x004010e8 <mecallp+6>: movl   $0x40111b,0x403050
0x004010f2 <mecallp+16>:        movl   $0x4010fc,0x403030
0x004010fc <mecallp+26>:        movl   $0x402009,(%esp)
0x00401103 <mecallp+33>:        call   0x40128c <printf>
0x00401108 <mecallp+38>:        call   0x4010ce <p>
0x0040110d <mecallp+43>:        movl   $0x40201b,(%esp)
0x00401114 <mecallp+50>:        call   0x40128c <printf>
0x00401119 <mecallp+55>:        leave
0x0040111a <mecallp+56>:        ret
End of assembler dump.

The thing that I don't get, though, is that if call is a relative jump, why did the printf work?

**matsp** · 09-06-2008

Calls in x86 are relative to the current location, so the call to printf and p would call a "random" location [the same distance from the malloc'd memory as your mecallp is from the respective functions]. Although it may be that printf is called in a different way - check the exact binary code of each call instruction.
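To make the relative addressing concrete, here is a small sketch that works on plain bytes and never executes them. It assumes 32-bit x86 and the 5-byte near call encoding (opcode 0xE8 followed by a little-endian 32-bit displacement measured from the end of the instruction). The function names call_target and retarget_call are made up for this example. Using the addresses from the disassembly above, the call at 0x00401108 to p at 0x004010ce would have displacement 0x004010ce - 0x0040110d = -0x3f.

```c
#include <assert.h>
#include <stdint.h>

/* Read and write a 32-bit little-endian value byte by byte,
   so the sketch does not depend on the host's endianness. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Where would this call land if the instruction sat at insn_addr?
   The target is computed from the address of the NEXT instruction. */
uint32_t call_target(const unsigned char *insn, uint32_t insn_addr) {
    assert(insn[0] == 0xE8);                     /* call rel32 */
    return insn_addr + 5 + read_le32(insn + 1);  /* wraps modulo 2^32 */
}

/* Rewrite the displacement so the call, now located at new_addr,
   reaches the same absolute target again. */
void retarget_call(unsigned char *insn, uint32_t new_addr, uint32_t target) {
    assert(insn[0] == 0xE8);
    write_le32(insn + 1, target - (new_addr + 5));
}
```

Copying the bytes elsewhere without calling something like retarget_call leaves the old displacement in place, so the call lands the same distance away from the new location, which is the "random" address described above.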

Also, if your system supports NX (No Execute) memory protection, memory returned by malloc() will not be executable. You need some sort of system-specific memory allocation to do that (one where you can specify that you want executable memory).
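One possible shape for such an allocation on a POSIX system is sketched below, using mmap and mprotect. This is an assumption about the platform, not what the original poster used. On Windows the rough equivalent would be VirtualAlloc with an executable protection flag. The helper name make_executable_copy is invented here, and some hardened systems may refuse the mprotect call.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copy len bytes of machine code into freshly mapped pages, then flip the
   pages from writable to executable. Returns NULL on failure.
   The caller releases the memory with munmap(ptr, len). */
void *make_executable_copy(const void *code, size_t len) {
    assert(code != NULL && len > 0);

    /* Map the pages read/write first so the copy is allowed. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* Drop write permission and add execute permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}
```

Getting executable memory only removes the NX problem. The copied bytes still contain relative calls that point at the wrong place unless they are fixed up.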

As to the right solution for your problem, I don't think you are on the right path at all - you will get better performance if you write several functions that do almost the same thing but skip a few things when they are not necessary [I presume you KNOW under which circumstances you need what functionality, otherwise you would also not be able to patch out those function calls in your modified code].

--
Mats

**Salem** · 09-06-2008

This "works" with cygwin, but YMMV for the reasons matsp states.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define USE_PRINTF  pic_printf
int (*pic_printf)(const char *p, ... );

void adder ( void **start, void **end ) {
    if ( start ) {
        *start = &&labstart;
        *end = &&labend;
    } else {
labstart:
        USE_PRINTF("In adder\n");
labend:;
    }
}
void subber ( void **start, void **end ) {
    if ( start ) {
        *start = &&labstart;
        *end = &&labend;
    } else {
labstart:
        USE_PRINTF("In subber\n");
labend:;
    }
}

size_t  calcSize ( void *start, void *end ) {
    unsigned char *s = start;
    unsigned char *e = end;
    return e - s;
}

void copyCode ( char *buff, size_t *dest_offset, size_t dest_size, void *from, size_t from_size ) {
    if ( *dest_offset + from_size <= dest_size ) {
        memcpy( buff + *dest_offset, from, from_size );
        *dest_offset += from_size;
    } else {
        fprintf( stderr, "oops: Offset=%lu, dest_size=%lu, copy_size=%lu\n",
            *dest_offset, dest_size, from_size );
    }
}

void dummy ( void ) {
}
#define DUMMY_SIZE      5
#define PROLOGUE_OFFSET 0
#define PROLOGUE_LENGTH 3
#define EPILOGUE_OFFSET 3
#define EPILOGUE_LENGTH 2
#define DUMMY_PTR(x)    (void*)(((unsigned char *)dummy)+(x))

int main ( ) {
    struct {
        void    *start;
        void    *end;
        size_t  size;
    } info[2];
    char    *buff;
    size_t  buff_offset = 0;
    size_t  buff_size;
    void    (*fnp)(void);
    
    pic_printf = printf;
    
    adder( &info[0].start, &info[0].end );
    info[0].size = calcSize( info[0].start, info[0].end );
    subber( &info[1].start, &info[1].end );
    info[1].size = calcSize( info[1].start, info[1].end );

    buff_size = info[0].size * 2 + info[1].size + DUMMY_SIZE;    /* add, sub, add */
    buff = malloc( buff_size );

    copyCode ( buff, &buff_offset, buff_size, DUMMY_PTR(PROLOGUE_OFFSET), PROLOGUE_LENGTH );
    copyCode ( buff, &buff_offset, buff_size, info[0].start, info[0].size );
    copyCode ( buff, &buff_offset, buff_size, info[1].start, info[1].size );
    copyCode ( buff, &buff_offset, buff_size, info[0].start, info[0].size );
    copyCode ( buff, &buff_offset, buff_size, DUMMY_PTR(EPILOGUE_OFFSET), EPILOGUE_LENGTH );

    fnp = (void(*)(void))buff;
    fnp();
    free( buff );

    return 0;
}

In order to call the thing from C, you need to have the standard prolog/epilog code copied as well.

Further, you could probably have local variables, but ALL your patched in code would need the same set. Plus any locals at all change prolog/epilog as well.
One way would be to use a 'struct' for all locals in each of the stub functions, then in dummy have a union of all those structs to get the worst-case size out of it.

Also demonstrated is pic_printf, which helps fix the "relative addressing" problem.

**Mole42** · 09-06-2008

Using labels to find the start and end address of a function is very bad practice, and is not compatible with all compilers and OSs. For example, what happens if another compiler decides that part of another function can be executed by just jumping to a specific place in the function you have just modified? This is a usual compiler size optimisation, and in your case, it could cause strange bugs.

**chacham15** · 09-06-2008

Thanks Salem. matsp, the reason that I don't want to write 8 different functions (although you're right that I could) is maintainability. For each new switch, the number of necessary functions doubles. Then if a change needs to be made... I pity the person that has to go through thousands of nearly identical functions. Salem, is there any way to simply replace the relative jumps with absolute jumps? Also, if I were interested in creating a patch for an executable, are there any guides about this?

Thanks!
chacham15

**iMalc** · 09-06-2008

I think this attempt at low-level optimisation is just plain wrong.
There are ways of writing code that allows for all the genericity that you need without the speed hits of doing redundant stuff.
I strongly suggest googling for "Policy-based design".

The last thing you should be doing is resorting to the extremes you're embarking on at the moment. Do you not understand how incredibly risky what you're doing is? One false move and bam, your app is dead. Not just now, but for every tiny little change in that area that is made later. It's a maintenance nightmare!
Not to mention, it won't run at all on Vista unless you do what's required to satisfy DEP.

**matsp** · 09-06-2008

Aside from the above suggestions, which are all good....

Ok, so if having multiple versions of the function is a nightmare, how is it made easier by having code that needs to know exactly what the function looks like and then patch the function in various ways to remove functions - to me, that IS a maintenance nightmare.

It wouldn't be difficult to use multiple-compilation of the same source to achieve variants of a function:

Code:

int funcVar1()
{
#define OPTION1
#include "myfunc.h"
#undef OPTION1
}

int funcVar2()
{
#define OPTION2
#include "myfunc.h"
#undef OPTION2
}

// myfunc.h:
... bits of code you always need.
#ifdef OPTION1
   some code you only want sometimes. 
#endif
.... more code that is always used.
#ifdef OPTION2 
   some other code that is only needed occasionally. 
#endif

#if defined(OPTION1) || defined(OPTION2)
  ... Some code that is needed in option1 or 2. 
#else
   ... Some other code when not Opt1 or Opt2. 
#endif
...

Of course, your option variability could be named differently, and you can use integer values if you prefer. Even bitwise flags, eg.

Code:

#if FLAGS & 3 
 ... some code
#else
  .. code you want if both of flag bits 0 and 1 are not set. 
#endif

Now you can just choose which variant you want from your main code, and you get more optimal code than patching out calls [even if NOP's are pretty much "does nothing", they still take up space in the pipeline up to the point where the processor decides that it's a NOP and can be discarded], and it's at least somewhat maintainable even if you move to another compiler, OS, or processor architecture.
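A related way to get the same variants from one copy of the source, sketched here as an alternative to the repeated #include trick, is to write the core once as an inline function that takes the flags as a parameter and then stamp out one wrapper per flag combination. Because each wrapper passes a constant, an optimizing compiler can usually drop the branches that are not taken. The option bits, the arithmetic inside core and all the names are hypothetical placeholders.

```c
#include <assert.h>

enum { OPT_SCALE = 1, OPT_OFFSET = 2 };   /* hypothetical option bits */

/* The single maintained copy of the logic. In every wrapper below,
   flags is a compile-time constant, so the untaken branches can be
   removed by the optimizer when core is inlined. */
static inline int core(int x, int flags) {
    if (flags & OPT_SCALE)
        x *= 3;        /* stands in for one optional expensive call */
    if (flags & OPT_OFFSET)
        x += 7;        /* stands in for another optional call */
    return x;
}

/* Generate one specialised function per flag combination. */
#define DEFINE_VARIANT(n) static int core_v##n(int x) { return core(x, n); }
DEFINE_VARIANT(0)
DEFINE_VARIANT(1)
DEFINE_VARIANT(2)
DEFINE_VARIANT(3)

typedef int (*core_fn)(int);
static const core_fn variants[4] = { core_v0, core_v1, core_v2, core_v3 };

/* Pick a variant at run time, once, outside the hot loop. */
core_fn select_core(int flags) {
    assert(flags >= 0 && flags < 4);
    return variants[flags];
}
```

The caller fetches a pointer with select_core(flags) before the hot loop and calls through it inside the loop, so the behaviour can still be switched dynamically while only one body of code has to be maintained.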

--
Mats

**iMalc** · 09-07-2008

Another possibility is that in many cases you perform a test inside a loop and you know in advance that this test will always give a certain result. In that case you should probably look into some loop unswitching.
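For anyone unfamiliar with the term, loop unswitching moves a test whose result cannot change during the loop out of the loop, leaving one specialised loop per outcome. The sketch below does it by hand on a made-up example (summing values, optionally as absolute values). GCC can also do this automatically via -funswitch-loops, though whether it fires for a given loop is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Before: use_abs never changes, yet it is tested on every iteration. */
long sum_test_inside(const int *v, size_t n, int use_abs) {
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (use_abs && v[i] < 0)
            total -= v[i];
        else
            total += v[i];
    }
    return total;
}

/* After: the invariant test runs once and selects one of two loops. */
long sum_unswitched(const int *v, size_t n, int use_abs) {
    assert(v != NULL || n == 0);
    long total = 0;
    if (use_abs) {
        for (size_t i = 0; i < n; i++)
            total += (v[i] < 0) ? -(long)v[i] : (long)v[i];
    } else {
        for (size_t i = 0; i < n; i++)
            total += v[i];
    }
    return total;
}
```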

By all means there is some tiny chance that what you're doing is the best approach. I highly doubt it, but you're in a better position to know than we are. However if you don't know about other techniques mentioned by posters here, then even you cannot be sure that you're going down the right path. You owe it to yourself to seek out and find the best solution.

**chacham15** · 09-07-2008

Originally Posted by iMalc

I think this attempt at low-level optimisation is just plain wrong.
There are ways of writing code that allows for all the genericity that you need without the speed hits of doing redundant stuff.
I strongly suggest googling for "Policy-based design".

The last thing you should be doing is resorting to the extremes you're embarking on at the moment. Do you not understand how incredibly risky what you're doing is? One false move and bam, your app is dead. Not just now, but for every tiny little change in that area that is made later. It's a maintenance nightmare!
Not to mention, it won't run at all on Vista unless you do what's required to satisfy DEP.

Oddly enough, it works on Vista... (maybe that's because UAC is off). All that I'm doing is overwriting the function call with NOPs. Those other methods don't work because they A. require templating, and B. do not allow me to dynamically turn a function on or off.
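For illustration only, the patching step itself can be sketched on a plain byte buffer rather than on live code. This assumes 32-bit x86, where the near call seen in the earlier disassembly is 5 bytes long (opcode 0xE8 plus a 32-bit displacement) and NOP is the single byte 0x90. The function name nop_out_call is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { X86_CALL_REL32 = 0xE8, X86_NOP = 0x90, CALL_LEN = 5 };

/* Overwrite the 5-byte call at code[offset] with five NOPs.
   Returns 0 on success, or -1 if the range does not fit in the buffer
   or the byte at offset is not a call rel32 opcode. */
int nop_out_call(unsigned char *code, size_t code_len, size_t offset) {
    assert(code != NULL);
    if (offset > code_len || code_len - offset < (size_t)CALL_LEN)
        return -1;
    if (code[offset] != X86_CALL_REL32)
        return -1;
    memset(code + offset, X86_NOP, CALL_LEN);
    return 0;
}
```

Checking the opcode before writing is a cheap guard against patching the wrong offset after the function has been recompiled. It does not make the approach safe in general, since the layout of compiled code can change with any compiler or source change.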
