Programming a Disassembler

  1. #1
    Registered User Xzyx987X's Avatar
    Join Date
    Sep 2003
    Posts
    107

    Programming a Disassembler

    This isn't really a C question, but there is no general programming board for this kind of stuff, so I decided this was as good a place as any. Anyway, I've been wondering about a few things regarding how disassemblers work. First, suppose you start parsing a file looking for opcodes and run into something that looks like an opcode but is actually data. Is there any way to discern between the two? Second, if the end of a chunk of data and the beginning of an opcode, taken together, happen to look like an opcode in their own right, how would you catch this? Last, when you're dealing with a fixed-length opcode processor, will opcodes only be stored at addresses that are multiples of the opcode length, or can they be anywhere?

  2. #2
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,333
    > Is there any way to discern between the two?
    Most object file formats have enough additional information to decide which is code and which is data.
    If you've just got a raw block of bytes (from a memory dump, say), then you're on your own.

    If you start off in the wrong place as it were, then the first few instructions will make no sense, but it usually sorts itself out. Try it with Microsoft debug in a console window.
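
    As a rough illustration of the "sorts itself out" behaviour, here is a minimal sketch using a made-up variable-length encoding (not any real instruction set): the low two bits of an instruction's first byte are assumed to give its length minus one. The names insn_len and resync_offset are hypothetical. Two sweeps started at different offsets are stepped forward until they land on the same instruction boundary, after which they decode identically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy encoding: the low two bits of the first byte
 * give the instruction length minus one (so 1 to 4 bytes). */
static size_t insn_len(unsigned char first)
{
    return (size_t)(first & 3u) + 1;
}

/* Step two linear sweeps, starting at offsets a and b, until they
 * reach the same instruction boundary. Returns that offset, or len
 * if they do not meet inside the buffer. */
size_t resync_offset(const unsigned char *buf, size_t len, size_t a, size_t b)
{
    assert(buf != NULL);
    while (a != b) {
        if (a >= len || b >= len)
            return len;            /* ran off the end without meeting */
        if (a < b)
            a += insn_len(buf[a]); /* advance whichever sweep is behind */
        else
            b += insn_len(buf[b]);
    }
    return a < len ? a : len;
}
```

    With real variable-length instruction sets the same thing tends to happen after a handful of instructions, but nothing guarantees it, so treat the early output of a sweep from an unknown starting point with suspicion.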

    For a fixed-length opcode machine, all opcodes begin on word boundaries (or whatever the word size of the machine is). This is so they can be fetched efficiently in one cycle.
    Well, they have to be aligned to run, but they could be stored at misaligned addresses as obfuscation to hinder disassembly.
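
    For the fixed-length case, the sweep itself is just a loop that steps through the buffer one word at a time. The sketch below assumes 4-byte big-endian instructions and a placeholder validity rule (top byte between 0x01 and 0x03); WORD_SIZE, looks_valid, read_be32 and count_valid are all invented for illustration and would be replaced by the real decoder for whatever processor you are targeting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_SIZE 4u /* assumed instruction width in bytes */

/* Placeholder check for a toy ISA: the top byte is the opcode and
 * only 0x01..0x03 are defined. A real decoder goes here instead. */
static int looks_valid(uint32_t word)
{
    uint32_t op = word >> 24;
    return op >= 0x01 && op <= 0x03;
}

/* Assemble one big-endian 32-bit word from four bytes. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walk the buffer from 'start' in WORD_SIZE steps and count how many
 * words pass the validity check. Trailing partial words are ignored. */
size_t count_valid(const unsigned char *buf, size_t len, size_t start)
{
    size_t n = 0;
    assert(buf != NULL);
    for (size_t off = start;
         len >= WORD_SIZE && off <= len - WORD_SIZE;
         off += WORD_SIZE) {
        if (looks_valid(read_be32(buf + off)))
            n++;
    }
    return n;
}
```

    Starting the same loop one byte off typically makes most words fail the check, which is a cheap hint that the starting offset is wrong.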

  3. #3
    Registered User Xzyx987X's Avatar
    Join Date
    Sep 2003
    Posts
    107
    Quote Originally Posted by Salem
    > Is there any way to discern between the two?
    Most object file formats have enough additional information to decide which is code and which is data.
    If you've just got a raw block of bytes (from a memory dump, say), then you're on your own.
    Unfortunately that is my case, but since I am dealing with a fixed-length opcode processor, that should simplify things a bit.

    Quote Originally Posted by Salem
    If you start off in the wrong place as it were, then the first few instructions will make no sense, but it usually sorts itself out. Try it with Microsoft debug in a console window.

    For a fixed-length opcode machine, all opcodes begin on word boundaries (or whatever the word size of the machine is). This is so they can be fetched efficiently in one cycle.
    Well, they have to be aligned to run, but they could be stored at misaligned addresses as obfuscation to hinder disassembly.
    I figured as much, but of course I could always incorporate a feature for interpreting at odd offsets.
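
    One way such a feature might look, as a sketch only: try every starting offset from 0 to width-1, count how many words at that alignment look like instructions, and keep the offset with the most hits. The predicate toy_valid is a stand-in (first byte between 0x01 and 0x03) and best_offset is a hypothetical name; the real validity test depends entirely on the target processor.

```c
#include <assert.h>
#include <stddef.h>

/* Caller-supplied test: does the word starting at 'word' look like
 * a plausible instruction? */
typedef int (*valid_fn)(const unsigned char *word);

/* Stand-in predicate for a toy encoding: first byte 0x01..0x03. */
int toy_valid(const unsigned char *word)
{
    return word[0] >= 0x01 && word[0] <= 0x03;
}

/* Try each starting offset in [0, width) and return the one whose
 * sweep yields the most plausible instructions. Ties go to the
 * lowest offset. */
size_t best_offset(const unsigned char *buf, size_t len,
                   size_t width, valid_fn valid)
{
    size_t best = 0, best_hits = 0;
    assert(buf != NULL && width > 0 && valid != NULL);
    for (size_t start = 0; start < width; start++) {
        size_t hits = 0;
        for (size_t off = start;
             len >= width && off <= len - width;
             off += width) {
            if (valid(buf + off))
                hits++;
        }
        if (hits > best_hits) {
            best_hits = hits;
            best = start;
        }
    }
    return best;
}
```

    This is only a heuristic: a block of data can score well by accident, so the result is a guess to check by eye rather than an answer.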
