Thread: Building an assembler in C O_o

  1. #1
    Registered User
    Join Date
    Apr 2009
    Posts
    19

    Building an assembler in C O_o

    Seriously stuck here

    I have to build an assembler and I'm getting stuck on the first bit :-(

    I need to read each line of the input file, e.g.:

    Code:
    loop:
    add $1,$1,$1
    b   loop

    CHECK

    Then discard any comments and remove leading or trailing whitespace: CHECK
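    For reference, the read-and-clean step might look something like the minimal sketch below. It assumes comments start with '#', as in MIPS/SPIM assembly; the thread never says which comment character the course uses, and the name clean_line is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Assumption: a comment runs from '#' to the end of the line (MIPS/SPIM
   style). Change COMMENT_CHAR if your assembler uses ';' instead. */
#define COMMENT_CHAR '#'

/* Cuts off any comment, then strips leading and trailing whitespace
   (including the '\n' left by fgets). Works in place and returns a
   pointer to the first non-blank character inside buf. */
char *clean_line(char *buf)
{
    char *start = buf;
    char *end;
    char *comment = strchr(buf, COMMENT_CHAR);

    if (comment != NULL)
        *comment = '\0';

    while (isspace((unsigned char)*start))
        start++;

    end = start + strlen(start);
    while (end > start && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';

    return start;
}
```

    Calling it on each buffer filled by fgets() and skipping lines for which it returns an empty string leaves only lines that contain a label or an instruction.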

    Tokenise the line into instruction mnemonics, operands, labels and so on: this is where I'm stuck. What does this even mean?

    Recognise invalid instructions, invalid operands and so on: I reckon I can do this, but only once the previous stage works.


    Maybe I'm being a bit vague here, but I would really appreciate a push in the right direction if anyone could give me one. I don't necessarily need any code... just an idea of how to approach it.

    Many thanks.

  2. #2
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Read a line. Parse its contents. Basically, chop each line into its words. The number of words per line will likely be determined by what the first word is. So in this case, you look at add, and once you've figured out that it says "add", you know how many parts follow it. Those parts are then added, or whatever it is the command is supposed to do.
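    Chopping a line into its words could be sketched as below. Treating commas as separators and a trailing ':' as marking a label is an assumption based on the example input (add $1,$1,$1 and loop:); struct tokens, tokenise_line and MAX_OPERANDS are hypothetical names. Note that strtok() writes into the buffer it is given.

```c
#include <string.h>

#define MAX_OPERANDS 3   /* assumption: no instruction takes more than 3 */

/* One tokenised source line. The pointers point into the caller's buffer. */
struct tokens
{
    char *label;                   /* "loop" for "loop:", or NULL */
    char *mnemonic;                /* "add", "b", ... or NULL if label only */
    char *operands[MAX_OPERANDS];
    int noperands;
};

/* Splits a line that already has comments and outer whitespace removed.
   Words are separated by spaces, tabs or commas, so "add $1,$1,$1" gives
   mnemonic "add" and operands "$1", "$1", "$1".
   Returns 0 on success, -1 if there are too many operands. */
int tokenise_line(char *line, struct tokens *t)
{
    const char *delims = " \t,";
    char *word = strtok(line, delims);

    t->label = NULL;
    t->mnemonic = NULL;
    t->noperands = 0;

    /* A first word ending in ':' defines a label. strtok never returns
       an empty token, so word[strlen(word) - 1] is always valid. */
    if (word != NULL && word[strlen(word) - 1] == ':')
    {
        word[strlen(word) - 1] = '\0';
        t->label = word;
        word = strtok(NULL, delims);
    }

    if (word != NULL)
    {
        t->mnemonic = word;
        while ((word = strtok(NULL, delims)) != NULL)
        {
            if (t->noperands == MAX_OPERANDS)
                return -1;
            t->operands[t->noperands++] = word;
        }
    }
    return 0;
}
```

    Once t.mnemonic is known, it can be looked up to find out how many operands should follow, which is the point quzah makes above.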


    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    Registered User
    Join Date
    Apr 2009
    Posts
    19
    Therein lies the problem, though. There are like... 30 different "types" of instruction, so I have no idea what angle to come at this from.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    How different are those instructions? In my experience, assembler instructions come in groups that have similar features, e.g. there are math/binary operations such as
    add, sub, mul, div, and, or, xor, shifts (e.g. lsr, lsl), rotates (ror, rol).
    move/load/store instructions to read/write memory.
    conditional jumps
    unconditional jump and call instruction
    return instruction
    and there are usually a couple of "odd" instructions that don't resemble any others.

    For each of those groups (except the last one), you can share code to decode and validate the parameters/arguments/operands (whatever you want to call them).

    You obviously need to split the line BEFORE that, at least enough to find out which instruction it is.
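    A minimal sketch of sharing one validator per group, as described above. The three formats, the $0..$31 register syntax and all names here are assumptions for illustration, not taken from the poster's instruction table.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical operand formats: every instruction in a group shares one. */
enum format
{
    FMT_NONE,   /* e.g. nop, syscall: no operands */
    FMT_RRR,    /* e.g. add, sub, and, or: three registers */
    FMT_LABEL   /* e.g. b, j, jal: a single label */
};

/* Shared decoder for register operands.
   Assumption: registers are written $0 .. $31, as in MIPS.
   Returns the register number, or -1 if s is not a valid register. */
int parse_register(const char *s)
{
    char *end;
    long n;

    if (s[0] != '$' || !isdigit((unsigned char)s[1]))
        return -1;
    n = strtol(s + 1, &end, 10);
    if (*end != '\0' || n > 31)
        return -1;
    return (int)n;
}

/* Shared check for label operands: a letter or '_' followed by
   letters, digits or '_'. */
int is_label(const char *s)
{
    if (!isalpha((unsigned char)*s) && *s != '_')
        return 0;
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s) && *s != '_')
            return 0;
    return 1;
}

/* One validator per format rather than one per instruction.
   Returns 1 if the n operands in ops fit fmt, 0 otherwise. */
int check_operands(enum format fmt, char *ops[], int n)
{
    int i;

    switch (fmt)
    {
    case FMT_NONE:
        return n == 0;
    case FMT_RRR:
        if (n != 3)
            return 0;
        for (i = 0; i < 3; i++)
            if (parse_register(ops[i]) < 0)
                return 0;
        return 1;
    case FMT_LABEL:
        return n == 1 && is_label(ops[0]);
    }
    return 0;
}
```

    Each entry in the instruction table then only needs to record its format, so adding another three-register instruction needs no new checking code.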

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by Evenstevens
    Therein lies the problem though. There's like...30 different "types" of instruction. So I have no idea at what angle to come at this from.
    How about with a list of say ... 30 elements?

    Quzah.

  6. #6
    30 Helens Agree neandrake
    Join Date
    Jan 2002
    Posts
    640
    Parse a line

    This means tokenizing everything. Is it a mnemonic? Is it whitespace? Is it an expression? You need to compare the input against the valid symbols, then check whether the sequence of symbols is valid according to a grammar.

    Declare an enum or array of all possible symbols (all mnemonics, keywords, etc.). Now run through the line and check for your symbols. If a valid symbol is found, you want to save the symbol to check against the grammar.

    Grammar might look like this on paper:

    <line> := <mnemonic> <op> [, <op>]*
    <mnemonic> := <term_ops> | <expr_ops> | <mem_ops> | ......
    <op> := <register> | <immediate> | <address>
    etc...

    The grammar could get complicated, but it all depends on what your assembler needs to support. If it's fairly simple, you could probably get away with a few checks (check that there's a mnemonic and that it has the necessary number of operands).
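    The steps in this post (match words against a list of valid symbols, then check the sequence against the grammar) might be sketched as the small recogniser below. It assumes labels and comments have already been stripped, treats the operand list as optional so that nop and syscall pass, and uses a hypothetical subset of mnemonics. It checks only the shape of the line, not which operand kinds each instruction accepts or whether register numbers are in range.

```c
#include <ctype.h>
#include <string.h>

/* Kinds of symbol the lexer recognises. K_WORD covers both mnemonics
   and label references (the <address> operands in the grammar). */
enum kind { K_WORD, K_REGISTER, K_IMMEDIATE, K_COMMA, K_END, K_ERROR };

/* Hypothetical subset of the instruction table, for illustration only. */
static const char *const mnemonics[] =
    { "nop", "syscall", "add", "sub", "b", "j", "jal" };

static int is_mnemonic(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof mnemonics / sizeof mnemonics[0]; i++)
        if (strlen(mnemonics[i]) == len && strncmp(mnemonics[i], s, len) == 0)
            return 1;
    return 0;
}

/* Reads one symbol at *p and advances *p past it. */
static enum kind next_symbol(const char **p)
{
    const char *s = *p;
    enum kind k;

    while (*s == ' ' || *s == '\t')
        s++;

    if (*s == '\0')
        k = K_END;
    else if (*s == ',')
    {
        s++;
        k = K_COMMA;
    }
    else if (*s == '$' && isdigit((unsigned char)s[1]))
    {
        for (s++; isdigit((unsigned char)*s); s++)
            ;
        k = K_REGISTER;                     /* e.g. $1 */
    }
    else if (isdigit((unsigned char)*s)
             || (*s == '-' && isdigit((unsigned char)s[1])))
    {
        for (s++; isdigit((unsigned char)*s); s++)
            ;
        k = K_IMMEDIATE;                    /* e.g. 42 or -8 */
    }
    else if (isalpha((unsigned char)*s) || *s == '_')
    {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        k = K_WORD;                         /* e.g. add, loop */
    }
    else
        k = K_ERROR;

    *p = s;
    return k;
}

/* Checks <mnemonic> [<op> [, <op>]*] for a line with no label or comment.
   Returns 1 if the line fits, 0 otherwise. */
int check_line(const char *line)
{
    const char *p = line;
    const char *start;
    enum kind k;

    while (*p == ' ' || *p == '\t')
        p++;
    start = p;
    if (next_symbol(&p) != K_WORD || !is_mnemonic(start, (size_t)(p - start)))
        return 0;

    k = next_symbol(&p);
    if (k == K_END)
        return 1;                           /* no operands, e.g. nop */

    for (;;)
    {
        if (k != K_REGISTER && k != K_IMMEDIATE && k != K_WORD)
            return 0;                       /* expected an operand */
        k = next_symbol(&p);
        if (k == K_END)
            return 1;
        if (k != K_COMMA)
            return 0;                       /* operands need commas between them */
        k = next_symbol(&p);
    }
}
```

    Keeping the lexer (next_symbol) separate from the grammar check means extra operand forms, such as an offset(register) memory operand, can be added as new symbol kinds without touching check_line's structure much.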
    Environment: OS X, GCC / G++
    Codes: Java, C#, C/C++
    AOL IM: neandrake, Email: neandrake (at) gmail (dot) com

  7. #7
    Registered User
    Join Date
    Apr 2009
    Posts
    19
    Okay, so I have an instruction table to work from, which looks like this:

    http://i40.tinypic.com/s2e5q1.jpg

    Are you saying I make an enum with every "instruction" in it?

    e.g.

    Code:
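    enum instruction
    {
        OP_NOP = 0,    /* enumerators are separated by commas, not semicolons */
        OP_SYSCALL,
        OP_BREAK,      /* plain "break" is a C keyword, hence the OP_ prefix */
        OP_ADD,
        /* ... */
        OP_JAL
    } instruct;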
    This thing is bloody confusing me. I don't even feel like I'm over the first hurdle yet...

  8. #8
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Yes. Make something to reflect every operation. Hell, just to get something rolling, I'd probably make a structure for each one and fill out a table of them as well:
    Code:
    struct ops oplist[] =
    {
        { "nop", 0, 0, NULL },
        { "syscall", 1, 0, func_syscall },
        ...
    };
    Give your structure whatever fields you need: for example, the number of operands, the actual opcode, and possibly a pointer to the function that carries out the operation.
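    A fuller version of that table might look like the sketch below. The struct layout is an assumption: the thread only shows the initialisers, so reading the second field as the opcode and the third as the operand count is a guess, func_syscall is a placeholder, and find_op and assemble are hypothetical names. The "..." row stays elided, as in the original.

```c
#include <stdio.h>
#include <string.h>

struct ops
{
    const char *name;      /* mnemonic, e.g. "syscall" */
    int opcode;            /* assumed meaning of the 2nd field */
    int noperands;         /* assumed meaning of the 3rd field */
    void (*func)(void);    /* what the instruction does, or NULL */
};

/* Placeholder handler; a real one would emit the machine code. */
static void func_syscall(void)
{
    printf("handling syscall\n");
}

static const struct ops oplist[] =
{
    { "nop", 0, 0, NULL },
    { "syscall", 1, 0, func_syscall },
    /* ... one entry for each row of the instruction table ... */
};

#define NOPS (sizeof oplist / sizeof oplist[0])

/* Linear search by name; with about 30 entries this is fast enough. */
const struct ops *find_op(const char *name)
{
    size_t i;

    for (i = 0; i < NOPS; i++)
        if (strcmp(oplist[i].name, name) == 0)
            return &oplist[i];
    return NULL;
}

/* Looks up the mnemonic, checks the operand count and runs the handler.
   Returns the opcode, or -1 for an unknown instruction or a wrong count. */
int assemble(const char *mnemonic, int noperands)
{
    const struct ops *op = find_op(mnemonic);

    if (op == NULL)
    {
        fprintf(stderr, "unknown instruction '%s'\n", mnemonic);
        return -1;
    }
    if (noperands != op->noperands)
    {
        fprintf(stderr, "'%s' expects %d operand(s), got %d\n",
                mnemonic, op->noperands, noperands);
        return -1;
    }
    if (op->func != NULL)
        op->func();
    return op->opcode;
}
```

    Because the table is plain data, recognising an invalid instruction or a wrong operand count falls out of the lookup, which covers the "recognise invalid instructions" step from the first post.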


    Quzah.


Similar Threads

  1. Replies: 0
    Last Post: 06-08-2009, 05:33 PM
  2. 68000 assembler (building one that is)
    By Nutcasey in forum C Programming
    Replies: 7
    Last Post: 01-22-2004, 04:14 PM
  3. Assembler
    By GaPe in forum A Brief History of Cprogramming.com
    Replies: 8
    Last Post: 02-03-2003, 01:01 PM
  4. Assembler...
    By face_master in forum A Brief History of Cprogramming.com
    Replies: 14
    Last Post: 10-18-2002, 03:44 PM
  5. MAINFRAME Assembler.
    By sean in forum A Brief History of Cprogramming.com
    Replies: 3
    Last Post: 12-05-2001, 05:32 PM