Thread: Register transfer language interpreter in C

  1. #1
    Registered User
    Join Date
    Dec 2003
    Posts
    16

    Register transfer language interpreter in C

    Hello,

    I'm building a CPU simulator for my final year project at university.

    I'm trying to make the simulator accept a file that defines each instruction that the simulator supports. It has a format specified in the file "intSet.txt". This file details the exact specification of a given instruction in my program, and I was wondering how I could make my program understand the instruction operation that has been specified by the "RTL" section of the file.

    I.e. understand that by saying <SOURCE> + <DESTINATION> -> <DESTINATION> I want to create a function that will add, due to the "+" sign being used.

    The attached file is an example of how a given instruction will be defined for the program, so any help with any of this would be great.
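
    A minimal sketch of one way to act on the operator, assuming every RTL line has exactly the shape "<SOURCE> op <DESTINATION> -> <DESTINATION>" and that only "+", "-" and "*" are needed. The function name rtl_apply and this line shape are assumptions for illustration, not taken from intSet.txt:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: evaluate one RTL line of the assumed form
 * "<NAME> op <NAME> -> <DESTINATION>".
 * Returns 0 on success and stores the new destination value in *dst. */
int rtl_apply(const char *rtl, uint32_t src, uint32_t *dst)
{
    char lhs[32], rhs[32], target[32], op;

    /* %31[^>] reads the operand name found between '<' and '>'. */
    if (sscanf(rtl, " <%31[^>]> %c <%31[^>]> -> <%31[^>]>",
               lhs, &op, rhs, target) != 4)
        return -1;                      /* line does not match the shape */

    if (strcmp(target, "DESTINATION") != 0)
        return -1;                      /* this sketch only writes the destination */

    /* Map each operand name to its current value. */
    uint32_t a = strcmp(lhs, "SOURCE") == 0 ? src : *dst;
    uint32_t b = strcmp(rhs, "SOURCE") == 0 ? src : *dst;

    switch (op) {                       /* the operator selects the action */
    case '+': *dst = a + b; return 0;
    case '-': *dst = a - b; return 0;
    case '*': *dst = a * b; return 0;
    default:  return -1;                /* unsupported operator */
    }
}
```

    Calling rtl_apply with the line from above, src = 5 and *dst = 7 leaves 12 in the destination. A fuller interpreter would need a proper tokenizer, but the switch on the operator is the core idea.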

    Thanks for all your help
    Garry


  2. #2
    Registered User
    Join Date
    Dec 2003
    Posts
    16
    Does anyone know how I might do this?


    Garry

  3. #3
    Registered User Draco's Avatar
    Join Date
    Apr 2002
    Posts
    463
    Don't double post. We see your question, and people are looking at it. When someone has an answer they'll post it. This is one of the toughest questions I've seen in a while, so it might take some time.

  4. #4
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    I didn't really understand your pseudocode but I know what you are trying to do.

    How many instructions are you going to have? How many variations of each instruction are you going to support? For instance, as you probably know, the mov instruction in the Intel x86 set is one of the most complex instructions ever hard-coded for a CPU. There are several different variations of it.

    It looks like you wish to support these:

    mov <r/m,imm>,<r/m,imm>

    which breaks down to:

    Move immediate 8/16/32 bit value to corresponding register
    Note: Immediate value may reside in stack or in program global data.

    mov <r8>,<imm8>
    mov <r16>,<imm16>
    mov <r32>,<imm32>

    Move immediate value to memory location
    mov <m>,<imm8/imm16/imm32>


    This is just a sample of what I'm getting from your post. Am I on
    the right track here?

    I really need to know how many and what type of instructions you are wanting to support.

  5. #5
    Registered User
    Join Date
    Dec 2003
    Posts
    16
    I'm very sorry, I should have been more specific about what I was basing the simulator on: the CPU that the simulator is based on is the 68000. I'm trying to keep to the way the 68000's instructions operate in terms of size and addressing modes. (The simulator's registers will be like the 68000's too, with 8 data and 8 address registers, and the stack on address register A7.)

    The text file that I posted shows how a single instruction could be defined via a text file that can be loaded into a structure (within the program).

    The basic instruction set that I aim to have is:

    ADD
    ADDA
    MOVE
    SUB
    CMP
    BNE
    BEQ
    MULU

    After getting these to work, anything will be a bonus. If there is anything else anyone needs to know, please post and I will try to provide information as quickly as I can.

    Bubba, you're kind of right in that you're using a type of Backus-Naur form to define the capabilities of an instruction. That's what I hope to do with the text file, so that all of the information required about an instruction can be read by the simulator and then acted upon in the appropriate manner.

    EDIT

    Also, I would like to be able to perform b, w, l (byte, word, long) operations for any given instruction that supports them. That was meant to be part of the text file above but I accidentally forgot to include it.

    If you think inputting the instruction set into the simulator in a different way would help then I'm open to different ideas.
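
    A rough sketch of how the register file and the b/w/l sizes could be represented in C. The names Cpu, OpSize, size_mask and write_data_reg are made up for illustration; the only assumption about the 68000 is that byte and word writes to a data register leave the upper bits untouched:

```c
#include <stdint.h>

/* Register file as described above: 8 data and 8 address registers,
 * with A7 doubling as the stack pointer. */
typedef struct {
    uint32_t d[8];
    uint32_t a[8];
} Cpu;

/* Operand sizes selected by the .B/.W/.L suffix (value = size in bytes). */
typedef enum { SIZE_B = 1, SIZE_W = 2, SIZE_L = 4 } OpSize;

/* Mask that keeps only the bits belonging to the operand size. */
uint32_t size_mask(OpSize size)
{
    switch (size) {
    case SIZE_B: return 0x000000FFu;
    case SIZE_W: return 0x0000FFFFu;
    default:     return 0xFFFFFFFFu;
    }
}

/* Write a result into data register Dn, leaving the upper bits
 * untouched for byte and word operations. */
void write_data_reg(Cpu *cpu, int n, OpSize size, uint32_t value)
{
    uint32_t mask = size_mask(size);
    cpu->d[n] = (cpu->d[n] & ~mask) | (value & mask);
}
```

    With this, one routine per instruction can serve all three sizes: compute the result at full width, then let write_data_reg cut it down to the requested size.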

    Thank You!
    Last edited by Nutcasey; 12-17-2003 at 09:25 AM.

  6. #6
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    ADD
    ADDA
    MOVE
    SUB
    CMP
    BNE
    BEQ
    MULU

    Not sure what these do as I'm not familiar with them.

    ADD is obvious. How many forms of ADD do you wish to support?

    Key to abbreviations
    imm8 - an immediate byte value, imm8 is signed between -128 and +127 inclusive. For instructions in which imm8 is combined with a word or doubleword operand, the immediate value is sign extended to form a word or doubleword. The upper byte of the word is filled with the topmost bit of the immediate value.

    imm16 - an immediate word value used for instructions whose operand size attribute is 16 bits; signed number between -32768 and +32767 inclusive.

    imm32- an immediate doubleword value used for instructions whose operand size attribute is 32-bits - between +2147483647 and - 2147483648 inclusive.

    r/m8 - a one byte operand that is either the contents of a byte register (AL,BL,CL,DL,AH,BH,CH,DH) or a byte from memory

    r/m16 - a word register or memory operand used for instructions whose operand size attribute is 16 bits. The word registers are AX,BX,CX,DX,SP,BP,SI,DI. The contents of memory are found at the address provided by the effective address computation. (same as LEA - load effective address nearly)

    r/m32 - a doubleword register or memory operand used for instructions whose operand size attribute is 32 bits. The doubleword registers are: EAX,EBX,ECX,EDX,ESP,EBP,ESI,EDI. The contents of memory are found at the address provided by the effective address computation.
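
    The sign extension described for imm8 can be sketched in C as below. The function names are hypothetical; the code uses explicit masks rather than casts through signed types so that it does not rely on implementation-defined conversions:

```c
#include <stdint.h>

/* Sign-extend an 8-bit immediate to a 16-bit word: the upper byte is
 * filled with copies of bit 7 of the immediate. */
uint16_t sext8_to_16(uint8_t imm8)
{
    return (imm8 & 0x80u) ? (uint16_t)(0xFF00u | imm8) : imm8;
}

/* Sign-extend an 8-bit immediate to a 32-bit doubleword. */
uint32_t sext8_to_32(uint8_t imm8)
{
    return (imm8 & 0x80u) ? (0xFFFFFF00u | imm8) : imm8;
}
```

    For example, the byte 0x80 (-128) becomes 0xFF80 as a word, while 0x7F (+127) stays 0x007F.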

    The complete Intel ADD instruction
    ADD

    Operation
    DEST <- DEST + SRC

    Description
    The ADD instruction performs an integer addition of the two operands (DEST and SRC). The result of the addition is assigned to the first operand (DEST), and the flags are set accordingly.

    When an immediate byte is added to a word or doubleword operand, the immediate value is sign-extended to the size of the word or doubleword operand.

    Flags Affected
    The OF, SF, ZF, AF, CF, and PF flags are set according to the result.

    Protected mode exceptions
    #GP(0) if the result is in a non-writeable segment; #GP(0) for an illegal memory operand effective address in the CS,DS,ES,FS, or GS segments; #SS(0) for an illegal address in the SS segment; #PF(fault-code) for a page fault; #AC for unaligned memory reference if the current privilege level is 3.

    Real address mode exceptions
    Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFh.

    Virtual 8086 mode exceptions
    Same exceptions as in Real Address Mode; #PF(fault-code) for a page fault; #AC for unaligned memory reference if the current privilege level is 3.


    • 04 ib - ADD AL,imm8 - 1 - Add immediate byte to AL
    • 05 iw - ADD AX,imm16 - 2 - Add immediate word to AX
    • 05 id - ADD EAX,imm32 - 1 - Add immediate dword to EAX
    • 80 /0 ib - ADD r/m8,imm8 - 1/3 - Add immediate byte to r/m byte
    • 81 /0 iw - ADD r/m16,imm16 - 1/3 - Add immediate word to r/m word
    • 81 /0 id - ADD r/m32,imm32 - 1/3 - Add immediate dword to r/m dword
    • 83 /0 ib - ADD r/m16,imm8 - 1/3 - Add sign extended immediate byte to r/m word
    • 83 /0 ib - ADD r/m32,imm8 - 1/3 - Add sign extended immediate byte to r/m dword
    • 00 /r - ADD r/m8,r8 - 1/3 - Add byte register to r/m byte
    • 01 /r - ADD r/m16,r16 - 1/3 - Add word register to r/m word
    • 01 /r - ADD r/m32,r32 - 1/3 - Add dword register to r/m dword
    • 02 /r - ADD r8,r/m8 - 1/3 - Add r/m byte to byte register
    • 03 /r - ADD r16,r/m16 - Add r/m word to word register
    • 03 /r - ADD r32,r/m32 - Add r/m dword to dword register


    So you can see that instruction coding can be quite complex. Perhaps you want to support only a subset of these?

    How many registers and flags will you have? If you have flags you must have a CMP; J(x) or equivalent function that will check the flags for branching execution.

    asm:
    mov ax,5
    sub ax,5
    jz ZERO

    pseudocode:
    • Move sign-extended immediate byte value of 5 into ax register
    • Subtract sign extended immediate byte value of 5 from ax register
    • Check zero flag to see if set, if so make a short jump (+/- 128 bytes from this instruction) to ZERO label. If ZF not set, continue with next instruction.
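
    A small C sketch of those three steps in a simulator. MiniCpu and the op_* functions are hypothetical names, and only the zero flag is modelled here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Only what the three-instruction example needs: one register, one flag. */
typedef struct {
    uint16_t ax;
    bool zf;            /* zero flag */
} MiniCpu;

/* mov ax,imm: loads the register; mov does not change any flags. */
void op_mov_ax(MiniCpu *c, uint16_t imm)
{
    c->ax = imm;
}

/* sub ax,imm: subtracts and sets ZF when the result is zero. */
void op_sub_ax(MiniCpu *c, uint16_t imm)
{
    c->ax = (uint16_t)(c->ax - imm);
    c->zf = (c->ax == 0);
}

/* jz label: returns the program counter to continue from. */
int op_jz(const MiniCpu *c, int next_pc, int target_pc)
{
    return c->zf ? target_pc : next_pc;
}
```

    Running op_mov_ax(&c, 5), then op_sub_ax(&c, 5), leaves ZF set, so op_jz returns the target address instead of the address of the next instruction.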


    Here is an example of program flow:

    • Save current registers/segment registers
    • Save current stack registers/pointers
    • Save current IP or EIP
    • Save current flags
    • Check flag to see if inside of interrupt
    • If so, issue IRET
    • Shut interrupts off
    • Turn all interrupts on
    • Reset the FPU
    • Call OS for loading of opcode file into memory
    • Set CS to memory segment or descriptor
    • Set (E)IP to zero offset inside of segment
    • Set DS to data segment - either allocate from OS or allocate from your own OS
    • Setup SS for stack segment - again allocated
    • Setup (E)SP to point to top of stack
    • Setup (E)BP to point to bottom of stack
    • Clear segment registers DS, ES, FS, and GS
    • Clear all explicit return registers
    • Clear index registers (E)DI, (E)SI
    • Begin program execution
    Last edited by VirtualAce; 12-17-2003 at 01:36 PM.

  7. #7
    Registered User
    Join Date
    Dec 2003
    Posts
    16
    Despite the 68000 being a CISC CPU, I don't think it is as complex as any Intel CPU! The best way to understand the instruction set (as well as the flags and addressing modes that will be used) is to look here:

    http://www.ticalc.org/pub/text/68k/

    "Motorola 68000 Programmers Manual (Adobe Acrobat format)"

    and any other document there would be of help to you too.

    The operation you used:

    Operation
    DEST <- DEST + SRC

    is illustrated in the same way in the documents, which also provide keys to all the other aspects such as instruction size etc.

    ADD.W D1,D2
    This would be a possible format for an add instruction used on the 68000 (you will see this in the manual).

    As for flags, there are only 5 in the 68000's CCR; you can see this in the .pdf document too.

    However, I only wish to support the instructions listed above (there is a CMP there too), and I wish to only simulate the N, Z, C and V bits due to time constraints for my project.
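
    A minimal sketch of how the N, Z, C and V bits could be computed for a word-sized add. The names Ccr and add_w are made up for illustration, and the X bit is left out on purpose:

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition code bits to be simulated. */
typedef struct { bool n, z, v, c; } Ccr;

/* ADD.W: adds src to dst as 16-bit values, fills in the flags and
 * returns the 16-bit result. */
uint16_t add_w(uint16_t src, uint16_t dst, Ccr *ccr)
{
    uint32_t wide = (uint32_t)src + dst;   /* keep the carry out of bit 15 */
    uint16_t res = (uint16_t)wide;

    ccr->n = (res & 0x8000u) != 0;         /* negative: top bit of result */
    ccr->z = (res == 0);                   /* zero */
    ccr->c = (wide & 0x10000u) != 0;       /* carry out of the top bit */
    /* overflow: both operands have the same sign, and the result's
     * sign differs from it */
    ccr->v = ((~(src ^ dst)) & (src ^ res) & 0x8000u) != 0;
    return res;
}
```

    For example, 0x7FFF + 1 gives 0x8000 with N and V set, while 0xFFFF + 1 gives 0 with Z and C set. The byte and long versions follow the same pattern with a different top bit.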

    In answer to the different instruction types:

    ADD (would use data registers for the destination operand)
    ADDA (would use address registers for the destination operand)
    ADDI (would use a literal within an instruction)

    When using a normal 68000 cross assembler you can usually write ADD and still use different operand addressing modes; the cross assembler would then append the "A" to the ADD instruction and make it ADDA (if an address register has been used instead of two data registers). This is also shown in the document that can be downloaded.

    I wish to have support for ADD, ADDA and ADDI as well as the other instructions listed above.
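
    The selection a cross assembler makes could be sketched roughly like this. The names OperandKind, classify_operand and select_add_variant are hypothetical, and operands are assumed to be written in upper case as "D0".."D7", "A0".."A7" or "#literal"; a real assembler handles many more addressing modes and may also pick other variants such as ADDQ:

```c
#include <string.h>

typedef enum {
    OPER_DATA_REG, OPER_ADDR_REG, OPER_IMMEDIATE, OPER_OTHER
} OperandKind;

/* Classify operand text such as "D1", "A3" or "#42". */
OperandKind classify_operand(const char *text)
{
    if (text[0] == '#')
        return OPER_IMMEDIATE;
    if (strlen(text) == 2 && text[1] >= '0' && text[1] <= '7') {
        if (text[0] == 'D') return OPER_DATA_REG;
        if (text[0] == 'A') return OPER_ADDR_REG;
    }
    return OPER_OTHER;
}

/* Pick the ADD variant from the operands of a plain "ADD src,dst". */
const char *select_add_variant(const char *src, const char *dst)
{
    if (classify_operand(dst) == OPER_ADDR_REG)
        return "ADDA";                  /* destination is an address register */
    if (classify_operand(src) == OPER_IMMEDIATE)
        return "ADDI";                  /* source is a literal */
    return "ADD";
}
```

    So "ADD D1,A2" would be treated as ADDA and "ADD #5,D2" as ADDI, while "ADD D1,D2" stays a plain ADD.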

    I hope this is all making sense; if not then please say.

    Thank you very much for your help on this, it's much appreciated!
    Last edited by Nutcasey; 12-18-2003 at 08:19 AM.

  8. #8
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    Yikes. I would have to learn an entirely new set of instructions and addressing mode to even begin to help you. I'm a pure x86 man and really don't want to get into another type of assembly - but I'll look into the docs.

  9. #9
    Registered User
    Join Date
    Dec 2003
    Posts
    16
    Thanks mate, if you find anything either way then let us know.

  10. #10
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    The only thing I've come up with is a binary standard that you could create for each instruction.

    Different opcodes would mean different addressing modes.


    For a data register 01 opcode could be used. The register is then encoded directly into the instruction - hence 01 will always mean use a certain data register. Variations on the opcode would give you the addressing modes.

    Example for 1 data register.

    01 - ADD <constant_data_register>,operand
    ------------------------------------------
    01 ib - ADDI <dataregister8>,imm8
    01 iw - ADDI <dataregister16>,imm16
    02 id - ADDI <dataregister32>, imm32
    03 /0 ib - ADDA <dataregister8>,m8
    ...
    ...

    and so on
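
    A rough sketch of how such a home-made binary standard could be dispatched with a table. Everything here is hypothetical: the opcode numbers (0x01 for a byte ADDI, 0x02 for a word ADDI), the layout "opcode, register number, immediate bytes" and all the names are invented for this example and do not follow the list above exactly:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t d[8]; } Machine;

typedef void (*Handler)(Machine *m, const uint8_t *operands);

/* Assumed layout: operands[0] = data register number, then the immediate. */
static void addi_b(Machine *m, const uint8_t *p)
{
    uint32_t r = m->d[p[0] & 7];
    m->d[p[0] & 7] = (r & ~0xFFu) | ((r + p[1]) & 0xFFu);
}

static void addi_w(Machine *m, const uint8_t *p)
{
    uint32_t r = m->d[p[0] & 7];
    uint32_t imm = ((uint32_t)p[1] << 8) | p[2];    /* high byte first */
    m->d[p[0] & 7] = (r & ~0xFFFFu) | ((r + imm) & 0xFFFFu);
}

/* One table entry per opcode: total length in bytes and its handler. */
typedef struct { uint8_t opcode; size_t length; Handler run; } OpEntry;

static const OpEntry op_table[] = {
    { 0x01, 3, addi_b },    /* opcode, register, imm8 */
    { 0x02, 4, addi_w },    /* opcode, register, imm16 */
};

/* Execute the instruction at code[pc]; returns the new pc,
 * or -1 when the opcode is not in the table. */
long step(Machine *m, const uint8_t *code, long pc)
{
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++) {
        if (op_table[i].opcode == code[pc]) {
            op_table[i].run(m, &code[pc + 1]);
            return pc + (long)op_table[i].length;
        }
    }
    return -1;
}
```

    Adding an instruction or an addressing mode then means adding one handler and one table row, without touching the main loop.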

  11. #11
    Visionary Philosopher Sayeh's Avatar
    Join Date
    Aug 2002
    Posts
    212
    First of all, your 'intset.txt' file is nothing more than the frontispiece of the instruction, virtually as found in Motorola's manual on this chip.

    It's for humans to understand how the instruction works, not a computer.

    If you want to emulate an assembler instruction, you need to handle it the same way a processor handles it-- by actually evaluating the binary that makes up the instruction.
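
    To give an idea of what evaluating the binary looks like, here is a minimal sketch that pulls the fields out of a 68000 opcode word in the ADD group (top four bits 1101), following the field layout in the Motorola manual. The names AddFields and decode_add are made up, and the sketch does not tell ADD apart from the ADDA and ADDX forms that share those top bits:

```c
#include <stdint.h>

/* Fields of a 16-bit opcode word in the 1101 group. */
typedef struct {
    unsigned reg;       /* bits 11-9: data register Dn */
    unsigned opmode;    /* bits 8-6: operand size and direction */
    unsigned ea_mode;   /* bits 5-3: effective address mode */
    unsigned ea_reg;    /* bits 2-0: effective address register */
} AddFields;

/* Returns 1 and fills *out when the top four bits are 1101,
 * otherwise returns 0 and leaves *out untouched. */
int decode_add(uint16_t word, AddFields *out)
{
    if ((word >> 12) != 0xD)
        return 0;
    out->reg     = (word >> 9) & 7;
    out->opmode  = (word >> 6) & 7;
    out->ea_mode = (word >> 3) & 7;
    out->ea_reg  = word & 7;
    return 1;
}
```

    For the word 0xD441, which encodes ADD.W D1,D2, this gives reg = 2, opmode = 1 (word size, result to Dn), ea_mode = 0 (data register direct) and ea_reg = 1.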

    It's trivial, and a lot of fun.

    If you still need help on this, reply, and I'll show you how.

    All processors are alike; the same principle will work with any.
    It is not the spoon that bends, it is you who bends around the spoon.

  12. #12
    Registered User
    Join Date
    Dec 2003
    Posts
    16
    Yeah ok, any ideas from yourself would be great.

  13. #13
    Visionary Philosopher Sayeh's Avatar
    Join Date
    Aug 2002
    Posts
    212
    Okay, before I take the next step, I need to be clear on this point-

    Are you having actual 68K binary submitted to you for emulation/execution, or are you having human understandable mnemonics (aka assembly language) submitted to you in a text file?

    If the former, then you are writing just an emulator. If the latter, you are writing a partial assembler, and then an emulator-- much larger project, but even more fun, and supremely more educational.

    Also, how much time do you have?

    Let me know answers to both questions... thx.

  14. #14
    Visionary Philosopher Sayeh's Avatar
    Join Date
    Aug 2002
    Posts
    212
    If I can find a copy of the Motorola book for you, would you be interested in buying it? The concepts I'm willing to share will apply to any processor, anywhere. This is a _big_ jump in knowledge for you; not even your professors really understand how this stuff is done. You'll knock their socks off.

  15. #15
    Registered User
    Join Date
    Dec 2003
    Posts
    16
    The idea is to just have the simulator understand a collection of assembly language instructions (not actual 68K binary).

    I only want to simulate the following instructions:

    MOVE
    MOVEA
    ADD
    SUB
    CMP
    BEQ
    BNE
    BRA
    MULU
    CLR
    LEA

    I only have the best part of about 5 - 6 weeks to do it in. I would also like to have as many addressing modes of the 68000 simulated (where applicable) for use by the instructions.

    I already have 2 books on the M68000; I just wanted a few ideas on how I should simulate it.
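
    A rough sketch of the first step for text input: splitting one assembly line such as "ADD.W D1,D2" into mnemonic, size and operands, and checking the mnemonic against the list above. The names AsmLine and parse_line are hypothetical, and the sketch assumes upper-case mnemonics with no labels or comments on the line:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char mnemonic[8];   /* e.g. "ADD" */
    char size;          /* 'B', 'W', 'L', or 0 when there is no suffix */
    char src[16];       /* first operand */
    char dst[16];       /* second operand, empty if there is none */
} AsmLine;

/* The instructions the simulator is meant to support. */
static const char *const known[] = {
    "MOVE", "MOVEA", "ADD", "SUB", "CMP", "BEQ",
    "BNE", "BRA", "MULU", "CLR", "LEA"
};

/* Returns 1 and fills *out on success, 0 if the line is not understood. */
int parse_line(const char *text, AsmLine *out)
{
    char op[16];

    memset(out, 0, sizeof *out);
    if (sscanf(text, " %15s %15[^, \t\n] , %15[^ \t\n]",
               op, out->src, out->dst) < 2)
        return 0;                       /* need a mnemonic and an operand */

    char *dot = strchr(op, '.');        /* optional .B/.W/.L suffix */
    if (dot) {
        if (strlen(dot) != 2)
            return 0;                   /* '.' needs exactly one size letter */
        out->size = (char)toupper((unsigned char)dot[1]);
        if (!strchr("BWL", out->size))
            return 0;
        *dot = '\0';                    /* cut the suffix off the mnemonic */
    }

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(op, known[i]) == 0) {
            strcpy(out->mnemonic, known[i]);
            return 1;
        }
    }
    return 0;                           /* not a supported instruction */
}
```

    The operand strings would then go to a separate routine that works out the addressing mode, and the filled-in AsmLine to whichever function simulates that instruction.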
