Writing a MOS 6502 emulator

**Syscal** · 07-01-2013

I tried tackling this project a few years ago but never got the time to finish it. Now, I'm wanting to revisit this and have a question which might be a matter of preference. I have a register structure as illustrated below:

Code:

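/* Assumed definitions; the original post did not show these typedefs. */
typedef unsigned char  ubyte;
typedef unsigned short uword;

typedef struct{

  union{
    uword hi_lo;   //Full 16-bit Program Counter
    struct{
      ubyte hi;    //Program Counter Hi
      ubyte lo;    //Program Counter Lo
    }val;          //Which byte is hi/lo depends on host byte order
  }pc;

  ubyte ac; //Accumulator
  ubyte xr; //X Register
  ubyte yr; //Y Register
  ubyte sr; //Status Register
  ubyte sp; //Stack Pointer

}registers;

registers r;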

Should I make the "r" variable which is of type 'struct registers' a global variable or should I declare it in my main method and pass a reference all over the place between method calls? It will be used extensively. Is there a pro/con to making it global/local?

**GReaper** · 07-01-2013

What I did when I was building a Kenbak emulator was to define the global variable in a separate source file, providing access only through the interface (functions declared in the header).
Pros are that the user of the interface needn't worry about what happens when functions get called, beyond what s/he can see. There's also a performance advantage, relative to passing arguments around, particularly in the bottleneck of a CPU-intensive emulator.
Cons are that you can't have more than one state in a single process. It's also more difficult to expand/analyse the code, because many of the dependencies may be unclear.
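
A minimal sketch of that layout, shown as a single listing for brevity. The names (`cpu_reset`, `cpu_get_a` and so on) are hypothetical, not taken from the Kenbak emulator, and the reset values are placeholders rather than the real 6502 reset sequence. The point is only that the state has internal linkage (`static`), so other source files can reach it solely through the declared functions.

```c
#include <stdint.h>

/* --- What would go in cpu.h: the only things callers can see --- */
void     cpu_reset(void);
uint8_t  cpu_get_a(void);
void     cpu_set_a(uint8_t value);
uint16_t cpu_get_pc(void);
void     cpu_set_pc(uint16_t value);

/* --- What would go in cpu.c: one hidden, file-scope state --- */
static struct {
    uint16_t pc;
    uint8_t  a, x, y, sr, sp;
} cpu_state;

void cpu_reset(void)
{
    /* Placeholder values, not the real 6502 reset sequence. */
    cpu_state.pc = 0;
    cpu_state.a  = 0;
    cpu_state.x  = 0;
    cpu_state.y  = 0;
    cpu_state.sr = 0;
    cpu_state.sp = 0xFF;
}

uint8_t cpu_get_a(void)
{
    return cpu_state.a;
}

void cpu_set_a(uint8_t value)
{
    cpu_state.a = value;
}

uint16_t cpu_get_pc(void)
{
    return cpu_state.pc;
}

void cpu_set_pc(uint16_t value)
{
    cpu_state.pc = value;
}
```

Because `cpu_state` is a single object, this design is limited to one emulated CPU per process, which is the con mentioned above.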

**stahta01** · 07-01-2013

Originally Posted by Syscal

Should I make the "r" variable which is of type 'struct registers' a global variable or should I declare it in my main method and pass a reference all over the place between method calls? It will be used extensively. Is there a pro/con to making it global/local?

I suggest at least using a longer name than "r" for a global variable.
The main reasons NOT to use global variables are that they make unit testing harder, and that some bugs are very hard to find when global variables are used.

Tim S.

**Syscal** · 07-02-2013

Thanks guys... I think I'll go with passing a reference to my registers.

**Malcolm McLean** · 07-02-2013

Originally Posted by Syscal

Should I make the "r" variable which is of type 'struct registers' a global variable or should I declare it in my main method and pass a reference all over the place between method calls? It will be used extensively. Is there a pro/con to making it global/local?

I'd pass a device context (registers plus a memory buffer) to every function. Then you can easily have more than one emulator on the go.

A modern processor executes a thousand times faster than a 6502, so it's unlikely that performance will be a problem.

**Epy** · 07-02-2013

Originally Posted by Malcolm McLean

A modern processor executes a thousand times faster than a 6502, so it's unlikely that performance will be a problem.

If all he's making is a 6502 emulator then sure, but if he plans on emulating an entire system, then things will become much more complicated and programming for performance will be key.

**Nominal Animal** · 07-02-2013

Originally Posted by Malcolm McLean

I'd pass a device context (registers plus a memory buffer) to every function. Then you can easily have more than one emulator on the go.

Me too.

As an example, consider the following:

Code:

#include <stdint.h>

typedef struct {
    /* Registers */
    uint16_t  pc;
    uint8_t   a;
    uint8_t   x;
    uint8_t   y;
    uint8_t   st;
    uint8_t   sp;
    /* Other chips? */
    /* Memory */
    uint8_t   mem[65536];
} mos_6502_t;

#define PC(stateptr) ((stateptr)->pc)
#define PC_LOW(stateptr) ((stateptr)->pc & 255U)
#define PC_HIGH(stateptr) ((stateptr)->pc / 256U)

For the different addressing modes, define macros similar to above. The code is then not dependent on host byte order, and should be much easier to write.
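
As a rough sketch of what such addressing-mode macros could look like: the macro names below (`MEM8`, `MEM16`, `ADDR_ZEROPAGE` and so on) are made up for illustration, and the sketch assumes `pc` already points at the first operand byte. Every 16-bit value is assembled with shifts and masks, so nothing depends on the byte order of the host.

```c
#include <stdint.h>

typedef struct {
    uint16_t pc;
    uint8_t  a, x, y, st, sp;
    uint8_t  mem[65536];
} mos_6502_t;

/* One byte of emulated memory; the cast wraps any computed address to 16 bits. */
#define MEM8(s, addr)  ((s)->mem[(uint16_t)(addr)])

/* A 16-bit word stored low byte first, as the 6502 does. */
#define MEM16(s, addr) \
    ((uint16_t)(MEM8((s), (addr)) | (MEM8((s), (addr) + 1) << 8)))

/* Effective operand address for a few modes.
   Assumes (s)->pc points at the first operand byte. */
#define ADDR_IMMEDIATE(s)  ((s)->pc)
#define ADDR_ZEROPAGE(s)   ((uint16_t)MEM8((s), (s)->pc))
#define ADDR_ZEROPAGE_X(s) ((uint16_t)((MEM8((s), (s)->pc) + (s)->x) & 0xFFu))
#define ADDR_ABSOLUTE(s)   (MEM16((s), (s)->pc))
#define ADDR_ABSOLUTE_X(s) ((uint16_t)(MEM16((s), (s)->pc) + (s)->x))
```

Zero page,X wraps around inside page zero, which is why that macro masks with 0xFF, while absolute,X wraps at 16 bits.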

Since the mos_6502_t structure contains the entire state of the emulated machine, you can emulate more than one in a single process. Also, you only need to provide the one pointer to the state structure to your functions.
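
To illustrate the multiple-machines point, here is a small sketch. The helper names (`mos_6502_create`, `mos_6502_destroy`, `mos_6502_load`) are hypothetical and only show that each machine is an ordinary heap object reached through its own pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint16_t pc;
    uint8_t  a, x, y, st, sp;
    uint8_t  mem[65536];
} mos_6502_t;

/* Allocate one zero-initialised machine; returns NULL on allocation failure. */
mos_6502_t *mos_6502_create(void)
{
    return calloc(1, sizeof (mos_6502_t));
}

void mos_6502_destroy(mos_6502_t *machine)
{
    free(machine);
}

/* Copy a program image into emulated memory and point pc at it.
   Returns 0 on success, -1 if the image would run past the end of memory. */
int mos_6502_load(mos_6502_t *machine, uint16_t origin,
                  const uint8_t *image, size_t length)
{
    if (length > sizeof machine->mem - (size_t)origin)
        return -1;
    memcpy(machine->mem + origin, image, length);
    machine->pc = origin;
    return 0;
}
```

Two calls to `mos_6502_create()` give two machines whose registers and memory never touch each other, so loading a program into one leaves the other unchanged.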

(Most embedded scripting languages have a very similar problem, Lua being the exception that comes to mind. Instead of using a separate state variable, most scripting languages just use global variables. That means you can't have more than one independent interpreter state in one process. Some have grown warts afterwards to support multiple interpreters in one process, but those tend to be fragile and prone to leakage.)

There are very few downsides to this approach. Your functions will pass the pointer to the state all over, and you may need to add fields to describe at least some of the emulator internal state, but in my experience, those are not downsides. In fact, I think it tends to push your mindset towards modularity, and yield better organized, more readable code.

(I don't know why, but whenever global variables are used a lot in C, the code also tends to become spaghetti. I suspect it is a psychological effect; that the types of data structures you use, also affect the patterns of code you create.)

**Syscal** · 07-02-2013

Good info! After I have implemented this, I plan to expand on it to eventually implement an NES emulator.

**Epy** · 07-02-2013

As I mentioned before, if using a global gives a performance advantage (which I'm not sure it does actually), I would take it because of the other hardware you have to emulate. I don't have a great deal of understanding of it personally, but I know that SNES and other emulators have such performance issues because of the syncs performed between the various emulated chips.

**Nominal Animal** · 07-02-2013

Originally Posted by Epy

if using a global gives a performance advantage (which I'm not sure it does actually)

I'm pretty sure it does not give a meaningful performance advantage.

On x86-64, there is no advantage. Global variables are addressed using %rip. Pointer-to-structure references use a base register, often %rdi if specified as first parameter to a function. Aside from using one register to hold the pointer, there is just no difference at all.

On x86, addressing a global uses the 32-bit immediate form, which requires five bytes, so that any relocation can be done correctly. The indirect form using a pointer requires only four bytes (for offsets -128..127 from the pointer). Both addressing modes are equally fast. So direct addressing produces longer code, and indirect addressing uses one register for the pointer; otherwise both take the same number of clock cycles.

Even on architectures where there is a difference, we're talking about one or two clock cycles per access. Typical processors that run Linux, even embedded ones, already tend to be clocked near or above one gigahertz, meaning the access difference we are talking about is roughly one-thousandth of a typical MOS 6502 clock cycle. It just isn't relevant.

Furthermore, if we take into account that one can tell the compiler how each function accesses the pointer (whether the pointer is const, or the data is const, or neither, or both), this tends to let the compiler generate better code. You cannot tell the compiler which functions modify which global variables, so when the compiler compiles a function that calls other functions, it cannot always tell whether a global variable might be modified, leading to sub-optimal code.

Of course, if you do not sprinkle const in your function parameter definitions, this likely does not matter. But, if you do, using well-marked pointers will likely lead to faster code than one that uses global variables.
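
A small sketch of what "well-marked pointers" can look like in practice. The function names are invented for this example. Whether a particular compiler turns the qualifiers into faster code is not guaranteed; what the sketch shows for certain is that read-only functions take a pointer to const state, so an accidental write inside them fails to compile.

```c
#include <stdint.h>

typedef struct {
    uint16_t pc;
    uint8_t  a, x, y, st, sp;
    uint8_t  mem[65536];
} mos_6502_t;

/* Reads only: the pointee is const, so a write here is a compile error. */
uint8_t mos_6502_peek(const mos_6502_t *state, uint16_t addr)
{
    return state->mem[addr];
}

/* Writes: takes a pointer to non-const state. */
void mos_6502_poke(mos_6502_t *state, uint16_t addr, uint8_t value)
{
    state->mem[addr] = value;
}

/* Another read-only helper: nonzero when the zero flag (bit 1) is set. */
int mos_6502_zero_flag(const mos_6502_t *state)
{
    return (state->st & 0x02u) != 0;
}
```

With a global state variable there is no equivalent way to say, in a function's signature, that the function only reads the state.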

Originally Posted by Epy

I know that SNES and other emulators have such performance issues because of the syncs performed between the various emulated chips.

Synchronization issues have nothing to do with how the state is referenced. That is, the same problems exist whether the various emulated chips are described by global variables, or whether they are described by structures hanging off from pointers in the emulated machine state structure.

**phantomotap** · 07-02-2013

I know that SNES and other emulators have such performance issues because of the syncs performed between the various emulated chips.

O_o

Well, yeah, syncing to ~21 MHz and ~28 is a costly process, but you'd need enough of a boost from global variables to allow a significant increase in the number or complexity of syncs to see a benefit in emulation accuracy. I just don't think the boost is available.

Soma

**Epy** · 07-03-2013

Originally Posted by Nominal Animal

Synchronization issues have nothing to do with how the state is referenced. That is, the same problems exist whether the various emulated chips are described by global variables, or whether they are described by structures hanging off from pointers in the emulated machine state structure.

That's not what I was implying. I was implying that you want the most performance possible, as there's a lot more to be done than just emulating this one chip. Read the articles below. Soma's response and your first remark are more correct; if there's no performance increase, then there's no reason to do it.

"The primary demands of an emulator are the amount of times per second one processor must synchronize with another. An emulator is an inherently serial process. Attempting to rely on today's multi-core processors leads to all kinds of timing problems."
Why Perfect Hardware SNES Emulation Requires a 3GHz CPU - Tested
Accuracy takes power: one man’s 3GHz quest to build a perfect SNES emulator | Ars Technica

Originally Posted by Nominal Animal

Even on architectures where there is a difference, we're talking about one or two clock cycles per access.

While 1-2 clock cycles IS nothing, you can't disregard a few cycles' difference in this application. If you did take that design philosophy, apply it everywhere, you'd have a slow, crappy emulator. We're talking about the realm of programming here that isn't just about using the best algorithm necessarily, it's squeezing the living ........ out of everything to get the most. Again, look at the articles and look at how often things have to be synced. The NES is of course simpler, but the same concepts apply. If you want cycle-accurate emulation, you have to sync out the ass. Also, why do you think ZSNES is mostly written in assembly?

**Syscal** · 07-03-2013

Any tips on the instructions? I'm planning on creating a jump table with all the elements containing function pointers. Each pointer represents a different opcode and each one would have the parameters being a reference to my device context and the other being the opcode.

**Nominal Animal** · 07-03-2013

Originally Posted by Epy

While 1-2 clock cycles IS nothing, you can't disregard a few cycles' difference in this application. If you did take that design philosophy, apply it everywhere, you'd have a slow, crappy emulator.

No, that's not what I meant. I meant that compared to the other work the emulator has to do, the access delay is insignificant. You can make much bigger savings by optimizing the emulator on the algorithmic level, even if the access to the emulator innards happened to be a cycle or two slower than optimal.

The majority of these bigger savings start with having a clear structure. Not for the computers' sake, but for the programmers'. Compilers can only do so much; it's up to us humans to design the structure of the emulator to be efficient and fast.

When emulating individual chips, there is always more than one way to go. You can simply emulate the inputs and outputs, without duplicating the structure of exactly how the chip does this. Or you can write a cycle-based emulator of the chip, like say reSID for the MOS 6581 SID audio chip.

Synchronization issues only arise as a byproduct of the emulator design. I'm not saying I know how to avoid them for SNES, or that a design that avoids the synchronization issues would yield a faster SNES emulator, I'm just saying it is a result of the design, not a necessary evil of emulation.

Originally Posted by Epy

Also, why do you think ZSNES is mostly written in assembly?

Could be many reasons. Perhaps the programmers were more comfortable in assembler? Perhaps the compilers they used did not optimize the code well enough? Perhaps the programmers thought handwritten assembly would beat compiled C and C++? Perhaps they wanted to showcase their assembly skills? (Note the handle of the original programmers; in the demo world, showcasing your skills is pretty darn important.)

Some of the code I write is very performance sensitive. I'm quite proficient at assembly (for x86 and some other architectures; I wrote quite a lot of 6502 assembly years ago, and I still have a working C128), and I've written a couple of emulators myself. (For the first one, during the MS-DOS era, I didn't even have a proper x86 assembler; I had to use the DOS DEBUG to write the tricky parts of the code. Even jump targets had to be given as addresses, since there was no label support. I used Turbo Pascal for the non-performance-critical parts. So I do claim I know what I'm talking about.)

However, I'm not under the illusion that simply because something is written in assembly, it must be faster.

A very good assembly programmer can optimize a loop or a function better than a compiler. However, I've never met a programmer who could keep all the details of a complex application (with complicated internal interactions) in their mind well enough to integrate and optimize these pieces into a program even nearly as well as a compiler can. A human, it seems, can only focus on a limited area at once; at larger scales, we rely on abstractions. You can highly optimize a small part, but applying that sort of focus to a large application is inhuman.

ZSNES was developed quite a long time ago, in computing terms. We even know a lot more about how to emulate hardware in software now. Our typical machines have a lot of RAM, and multiple processing cores, and the facilities for using those in C and C++. I suspect that ZSNES is not nearly the paragon of efficiency and speed on current machines that you seem to believe.

**Nominal Animal** · 07-03-2013

Originally Posted by Syscal

I'm planning on creating a jump table with all the elements containing function pointers.

If the code is meant to be portable, then I'd use a switch statement with a uint8_t argument (the opcode byte, obviously). The compiler is then free to optimize that however it sees best. On some architectures, function pointers are slower.

Looking at one of the opcode tables, many of the opcodes can actually be implemented in the case statement itself, without a separate function call. The instruction set itself is simple enough that this might make sense.

It might even make the code easier to maintain: all the opcode implementations are in the same place, so if a bug fix applies to more than one opcode, one is likelier to remember to fix the others too.
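
A rough sketch of the switch-based variant, with a handful of simple opcodes implemented directly in the case labels. The function name `mos_6502_step` and the helper `set_nz` are invented for the example, cycle counting is omitted, and any opcode not listed is simply reported as unimplemented.

```c
#include <stdint.h>

typedef struct {
    uint16_t pc;
    uint8_t  a, x, y, st, sp;
    uint8_t  mem[65536];
} mos_6502_t;

/* Update the N (bit 7) and Z (bit 1) status flags from a result byte. */
static void set_nz(mos_6502_t *cpu, uint8_t value)
{
    cpu->st = (uint8_t)((cpu->st & ~0x82u)
                        | (value & 0x80u)
                        | (value == 0 ? 0x02u : 0u));
}

/* Fetch and execute one instruction.
   Returns 1 if the opcode was handled, 0 if it is not implemented here. */
int mos_6502_step(mos_6502_t *cpu)
{
    const uint8_t opcode = cpu->mem[cpu->pc++];

    switch (opcode) {
    case 0xA9: /* LDA #imm */
        cpu->a = cpu->mem[cpu->pc++];
        set_nz(cpu, cpu->a);
        return 1;
    case 0xA2: /* LDX #imm */
        cpu->x = cpu->mem[cpu->pc++];
        set_nz(cpu, cpu->x);
        return 1;
    case 0xAA: /* TAX */
        cpu->x = cpu->a;
        set_nz(cpu, cpu->x);
        return 1;
    case 0xE8: /* INX, wraps from 0xFF to 0x00 */
        cpu->x++;
        set_nz(cpu, cpu->x);
        return 1;
    case 0xCA: /* DEX, wraps from 0x00 to 0xFF */
        cpu->x--;
        set_nz(cpu, cpu->x);
        return 1;
    case 0xEA: /* NOP */
        return 1;
    default:
        return 0;
    }
}
```

With everything in one switch, a flag-handling fix such as a change to `set_nz` is visible next to every opcode that relies on it.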
