Efficiently access member variables

**cyberfish** · 03-31-2012

If I have a function that needs very fast (down to the instruction level) access to a few variables in a struct, how should I do it?

The program in question is an interpreting emulator. The fetch/decode/execute function needs very fast access to the (emulated) registers.

The logically correct way to do this would be to use a context struct (C style) or to make them member variables accessed from a member function (C++ style).

Code:

struct Context {
     int regs[16];
};

void f(Context *ctx)
{
     // do stuff with ctx->regs[]
}
...
Then

Context ctx;
f(&ctx);

However, that means every access to regs goes through at least one level of indirection (dereferencing ctx).

Or in C++

Code:

class C
{
     int regs[16];
     ...
     void f()
     {
          // do things with regs[]
     }
};

which will compile to exactly the same thing (except ctx would be called "this").

A more efficient way is to make them static

Code:

void f()
{
     static int regs[16];
     // do things with regs[]
}

This way would save a de-reference.

However, that's ugly. Is there a way to get best of both worlds? Somehow "instantiating" code?

**grumpy** · 03-31-2012

You're going to have a "dereference" anyway. Even if you're not accessing a member of a struct, you will still be accessing elements of an array, which in C and C++ usually means adding an offset to a pointer and dereferencing the resulting pointer.

If you want to avoid the "dereference" to access a member of a struct, just pass a pointer to regs around, either to a static member function of a class or to a "normal" (as in not a member of a class) function.
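A minimal sketch of that suggestion, assuming the Context struct from the first post. The names add_regs and demo_pass_regs are made up for illustration, and whether this is faster than going through ctx depends on the compiler and on inlining.

```cpp
struct Context {
    int regs[16];
    // ... other emulator state would live here (hypothetical)
};

// Non-member function: it receives the register array itself, so every
// access is written as regs[i] rather than ctx->regs[i].
inline void add_regs(int* regs, unsigned rd, unsigned rn, unsigned rm) {
    regs[rd] = regs[rn] + regs[rm];
}

// Usage: the caller resolves ctx.regs once and hands the pointer down.
inline int demo_pass_regs() {
    Context ctx = {};
    ctx.regs[1] = 40;
    ctx.regs[2] = 2;
    add_regs(ctx.regs, 0, 1, 2);
    return ctx.regs[0];
}
```

Note that regs[i] is still a base pointer plus an offset, so this only moves the indirection around; it does not remove it.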

You might be able to do some tricks with template instantiation that allow the compiler to optimise out the struct access. But YMMV with that, depending on the compiler implementation.
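One possible reading of the template trick, as a sketch only: the context object becomes a reference template parameter, so each instantiation of the function is tied to one specific object with static storage duration. The names add_regs, cpu_a, cpu_b and demo_template_context are assumptions for illustration, not something from the thread.

```cpp
struct Context {
    int regs[16];
};

// The context is a template parameter, so each instantiation refers to one
// specific object whose address is fixed at link time.
template <Context& Ctx>
void add_regs(unsigned rd, unsigned rn, unsigned rm) {
    Ctx.regs[rd] = Ctx.regs[rn] + Ctx.regs[rm];
}

// Hypothetical emulator instances. They must have static storage duration
// to be usable as template arguments.
Context cpu_a;
Context cpu_b;

inline int demo_template_context() {
    cpu_a.regs[1] = 40;
    cpu_a.regs[2] = 2;
    cpu_b.regs[1] = 1;
    cpu_b.regs[2] = 1;
    add_regs<cpu_a>(0, 1, 2);  // separate copy of the code for cpu_a
    add_regs<cpu_b>(0, 1, 2);  // and another one for cpu_b
    return cpu_a.regs[0] * 100 + cpu_b.regs[0];
}
```

The cost is one copy of the code per context, and the set of contexts is fixed at compile time, so this is closer to "several static emulators" than to a general context pointer.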

**Salem** · 03-31-2012

In the case of C++, ctx.f() is (usually?) an inline function when f() is a small function defined within the class itself.

Also, if your array subscripts are constants, there is no additional run-time overhead compared to accessing named variables, since the compiler will do all the address calculation at compile time.
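To illustrate the constant-subscript point, here is a small sketch. It assumes ARM-style register numbering (r13 is the stack pointer, r14 the link register); the accessor names sp and lr are hypothetical.

```cpp
struct Context {
    int regs[16];
};

// Assumed ARM-style numbering of the special registers.
enum : unsigned { kSP = 13, kLR = 14, kPC = 15 };

// The subscript is a compile-time constant, so the offset of the element
// within Context is fixed and no index arithmetic is needed at run time.
inline int& sp(Context& ctx) { return ctx.regs[kSP]; }
inline int& lr(Context& ctx) { return ctx.regs[kLR]; }

inline int demo_constant_subscript() {
    Context ctx = {};
    sp(ctx) = 7;
    lr(ctx) = 35;
    return ctx.regs[13] + ctx.regs[14];
}
```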

**manasij7479** · 04-01-2012

I'm not sure how well it will perform in comparison, but you can try offsetof (see the offsetof(3) Linux man page).
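For reference, a sketch of what manual access through offsetof might look like. The pc field is a hypothetical addition so that regs does not sit at offset 0, and read_reg_by_offset is an invented name. It spells out by hand the address calculation that ctx->regs[i] already implies.

```cpp
#include <cstddef>
#include <cstring>

struct Context {
    unsigned pc;   // hypothetical extra field so regs is not at offset 0
    int regs[16];
};

// Reads regs[i] by computing the byte address by hand:
// base of the struct + offset of regs + i * sizeof(int).
inline int read_reg_by_offset(const Context* ctx, unsigned i) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(ctx);
    int value;
    std::memcpy(&value, base + offsetof(Context, regs) + i * sizeof(int),
                sizeof value);
    return value;
}

inline bool demo_offsetof() {
    Context ctx = {};
    ctx.regs[5] = 1234;
    return read_reg_by_offset(&ctx, 5) == ctx.regs[5];
}
```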

**brewbuck** · 04-01-2012

I don't think there is any way to reasonably save the dereference. If you make them static data, then you instantly lock yourself in to a single-threaded emulator. It just seems like the wrong design.

Put trust in the compiler. After inlining, cross-module optimizations etc, you may find that the context pointer gets placed directly in a (real) register and lives there for a long time, pretty much wiping out the cost of that dereference.
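A sketch of an execute loop where this is likely to happen. The Insn struct (a pre-decoded three-register add) and the function names are hypothetical. Copying ctx->regs into a local pointer just makes explicit what an optimizing compiler will often do on its own, namely keep the base address in a host register for the whole loop.

```cpp
#include <cstddef>

struct Context {
    int regs[16];
};

// Hypothetical pre-decoded instruction: regs[rd] = regs[rn] + regs[rm].
struct Insn {
    unsigned char rd, rn, rm;
};

inline void run_adds(Context* ctx, const Insn* program, std::size_t count) {
    int* const regs = ctx->regs;  // base address computed once, before the loop
    for (std::size_t i = 0; i < count; ++i) {
        const Insn& in = program[i];
        regs[in.rd] = regs[in.rn] + regs[in.rm];
    }
}

inline int demo_run_adds() {
    Context ctx = {};
    ctx.regs[1] = 20;
    ctx.regs[2] = 1;
    // r0 = r1 + r1; r0 = r0 + r2; r0 = r0 + r2
    const Insn program[] = { {0, 1, 1}, {0, 0, 2}, {0, 0, 2} };
    run_adds(&ctx, program, 3);
    return ctx.regs[0];
}
```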

**cyberfish** · 04-01-2012

Originally Posted by Salem

In the case of C++, ctx.f() is (usually?) an inline function when f() is a small function defined within the class itself.

Also, if your array subscripts are constants, there is no additional run-time overhead compared to accessing named variables, since the compiler will do all the address calculation at compile time.

That's true, but it seems like inlining won't get rid of the dereferencing by itself. The subscripts are not always constants. Since it's a RISC arch, almost all instructions work on all registers. Though I suppose that's a possible area of optimization.
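To make the non-constant subscripts concrete, here is a heavily simplified sketch of executing an ARM register-form ADD. It assumes the usual data-processing layout (Rn in bits 16-19, Rd in bits 12-15, Rm in bits 0-3) and ignores the condition field, shifts, flags and the special behaviour of r15. The function names are made up.

```cpp
#include <cstdint>

struct Context {
    std::uint32_t regs[16];
};

// The register numbers come out of the instruction word, so the subscripts
// are only known at run time.
inline void exec_add_reg(Context* ctx, std::uint32_t insn) {
    const unsigned rn = (insn >> 16) & 0xF;
    const unsigned rd = (insn >> 12) & 0xF;
    const unsigned rm = insn & 0xF;
    ctx->regs[rd] = ctx->regs[rn] + ctx->regs[rm];
}

inline std::uint32_t demo_exec_add() {
    Context ctx = {};
    ctx.regs[1] = 40;
    ctx.regs[2] = 2;
    exec_add_reg(&ctx, 0xE0810002u);  // ADD r0, r1, r2
    return ctx.regs[0];
}
```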

Originally Posted by manasij7479

I'm not sure how well it will perform in comparison, but you can try offsetof (see the offsetof(3) Linux man page).

I believe that's what the compiler will do either way.

Originally Posted by brewbuck

I don't think there is any way to reasonably save the dereference. If you make them static data, then you instantly lock yourself in to a single-threaded emulator. It just seems like the wrong design.

Put trust in the compiler. After inlining, cross-module optimizations etc, you may find that the context pointer gets placed directly in a (real) register and lives there for a long time, pretty much wiping out the cost of that dereference.

Yeah I really don't want to go down the static route. I guess that makes sense. LTO + inlining will probably have the pointer end up in a register.

I think I'll just leave it there for now, and see what happens. Speed may not be a problem at all anyways, since I'm only trying to emulate an ARM core at 16.8MHz (GameBoy Advance), so I still have about 100 host cycles to 1 emulated cycle, which should be enough.

**VirtualAce** · 04-02-2012

Since inlining is simply a suggestion to the compiler I would check the assembly language that is created from your code to ensure it actually did inline them...that is if performance is super critical. If not then I wouldn't go through the trouble. Often times I have been surprised by the un-optimized nature of some assembly output (at least from MSVS) even with all the doo-dads and optimizations turned on.

**brewbuck** · 04-02-2012

Originally Posted by VirtualAce

Since inlining is simply a suggestion to the compiler I would check the assembly language that is created from your code to ensure it actually did inline them...that is if performance is super critical. If not then I wouldn't go through the trouble. Often times I have been surprised by the un-optimized nature of some assembly output (at least from MSVS) even with all the doo-dads and optimizations turned on.

I've noticed that the major modern compilers all seem to be very aggressive with inlining nowadays, sometimes inlining surprisingly complex functions if they believe it might lead to better optimization. Also, when evaluating a piece of assembly code, remember that the compiler usually cannot assume that two pointers do not alias, even though human programmers often know that certain pointers cannot alias each other. The compiler therefore sometimes does things that appear stupid, such as repeatedly dereferencing the same memory location. Many of these cases could probably be fixed by use of the restrict keyword.
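A small sketch of why the compiler has to be careful. Both functions below are invented for illustration. They give the same result when out and scale point to unrelated memory, but different results when scale points into out, so the compiler cannot turn the first into the second on its own.

```cpp
// *scale may be re-read on every iteration, because a store to out[i]
// could change it if the two pointers alias.
inline void scale_all(int* out, const int* scale, int n) {
    for (int i = 0; i < n; ++i) out[i] *= *scale;
}

// The programmer reads *scale once, which is only equivalent to the
// version above when scale does not point into out.
inline void scale_all_hoisted(int* out, const int* scale, int n) {
    const int s = *scale;
    for (int i = 0; i < n; ++i) out[i] *= s;
}

// Deliberately aliased call: scale points at v[0].
inline int demo_alias_plain() {
    int v[3] = {2, 3, 4};
    scale_all(v, &v[0], 3);          // v becomes {4, 12, 16}
    return v[2];
}

inline int demo_alias_hoisted() {
    int v[3] = {2, 3, 4};
    scale_all_hoisted(v, &v[0], 3);  // v becomes {4, 6, 8}
    return v[2];
}
```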

**MK27** · 04-03-2012

Originally Posted by brewbuck

the restrict keyword.

Hmmm, thanks. I'll be sure to abuse this one.

**phantomotap** · 04-03-2012

Be aware of the flaky and often bizarre support for `restrict' or `__restrict__'.

That is one of those things that should absolutely be used through a macro.

Soma
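A sketch of the macro approach, assuming the common compiler spellings: `__restrict__` on GCC and Clang, `__restrict` on MSVC, and nothing on compilers that are not recognised. The macro name EMU_RESTRICT and the functions are hypothetical. Since restrict is a C99 keyword and not part of standard C++, the empty fallback keeps the code portable.

```cpp
// Pick whatever spelling the compiler supports, or nothing at all.
#if defined(__GNUC__) || defined(__clang__)
#  define EMU_RESTRICT __restrict__
#elif defined(_MSC_VER)
#  define EMU_RESTRICT __restrict
#else
#  define EMU_RESTRICT
#endif

// The qualifier is a promise by the caller that dst and src do not overlap.
// Breaking that promise is undefined behaviour.
inline void add_arrays(int* EMU_RESTRICT dst, const int* EMU_RESTRICT src, int n) {
    for (int i = 0; i < n; ++i) dst[i] += src[i];
}

inline int demo_restrict() {
    int a[3] = {1, 2, 3};
    const int b[3] = {10, 20, 30};
    add_arrays(a, b, 3);
    return a[0] + a[1] + a[2];
}
```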
