Thread: Utilizing genetic algorithms for reverse engineering

1. Utilizing genetic algorithms for reverse engineering

Please correct me if I'm wrong in any part or sense.

Genetic algorithms are most commonly used to optimize special tasks, products etc. This is done by testing different approaches towards a solution in multiple "generations", combining some generations and mutating others. (I read the article about genetic algorithms back at the main site)

I came to think about various areas in which genetic algorithms could be used, preferably with effective results. Two such areas popped up in my head: reverse engineering (of course nothing illegal) and application development.
I'm not really sure about creating applications, or at least components of them, with genetic algorithms using the method I've described below. I will also explain why I think so.

The goal of the method: To create, for the purpose of reverse engineering, a function which takes output from a specified program and rebuilds the program in an optimized way, rather then decompiling or disassembling. The goal is the same for the application development area, except input is taken from the user using pre-set commands.

Lets continue...

The main idea is that to supply a function, which uses genetic algorithms (not sure how it would look, though, so you could point me towards a tutorial or syntax involving such), with input (later...) and then output code that satisfies the input.
The function is given code syntax in which it will run the generation process, combining pieces of code in several generations until a solution has been accomplished. To test each generation, the function has to be linked to an external compiler and then the results of the code produced has to be checked if it fulfills the input. If it does, the produced code is outputted.

Some of the problems with this approach:

In the terms of reverse engineering input is the biggest problem, from what I can see. Just taking the absolute final output, i.e letters and numbers, and running them in the theoretical function would most certainly result in the object cout as the output of the function, followed by the specific characters. I know no such solution to this problem (suggestions?), but then, I haven't given it so much thought.
Also a minor problem is communication between the function and an external compiler.

Application development in the way I described in the goals section pretty much resembles an ineffective compiler, although maybe easier for a non-programmer depending in which commands you link to which syntax and such.

Anyway, it was just an idea. Suggestions are welcome.

2. It's certainly possible. Just not very efficient - it takes far too long.

Basically, you'd have more or less random functions. Their instructions are their DNA. Recombination means swapping out instructions for those of the other parent. Mutation means random changes.

Fitness is evaluated by taking random input (a large set of random input is generated once and every individual is tested against it all) and feeding it to the function, then comparing the result to the target function (the function to be reverse-engineered/optimized/whatever). This is the target proximity. A second fitness test evaluates performance - perhaps in execution speed, perhaps in memory use, perhaps in code size. You weigh those factors as you need them. Correctness is the primary fitness test and more important. Performance is secondary. This means that correctness has a much greater weight.

The most complicated part is actually the code generation, i.e. generating an initial population, recombining and mutating.