Write it in C xD
Elysia is right. Write first, optimize later. Debugging (and you will do that a lot) is much more important than speed. A program that crashes in 9 ms less time is not useful.
Guessing that a routine will be called often is not a good basis for optimizing it. Use a profiler. You will be surprised at how much time is spent in little functions you ignored.
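If you don't have a profiler set up yet, even a crude timer beats guessing. A minimal sketch, assuming C++11 or later; time_ns is a name I made up for this post, not something from a library:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls fn() `iterations` times and returns the
// elapsed time in nanoseconds, measured with a monotonic clock.
template <typename Fn>
std::int64_t time_ns(Fn fn, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

This only measures the one routine you already suspect. A real profiler shows you the functions you did not suspect, which is the whole point, so treat this as a stopgap.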
That said, you can still optimize on the fly where it costs you nothing: for instance, choosing the right containers, passing by reference or pointer, and inlining small functions. None of that makes the code harder to debug.
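To show what I mean by cheap optimizations that stay debuggable, here is a small sketch. The Move struct and all function names are hypothetical, made up for this example only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical move record, used only for this illustration.
struct Move {
    std::uint8_t from;
    std::uint8_t to;
};

inline Move make_move(int from, int to) {
    // The assert is compiled out when NDEBUG is defined, so it only
    // costs time in debug builds.
    assert(from >= 0 && from < 64 && to >= 0 && to < 64);
    return Move{static_cast<std::uint8_t>(from), static_cast<std::uint8_t>(to)};
}

// Small helper marked inline: easy for the compiler to expand at the
// call site, and still an ordinary function you can step into.
inline bool is_null_move(const Move& m) {
    return m.from == m.to;
}

// Taking the vector by const reference avoids copying the whole list
// on every call; the function only reads it.
inline std::size_t count_real_moves(const std::vector<Move>& moves) {
    std::size_t n = 0;
    for (const Move& m : moves) {
        if (!is_null_move(m)) {
            ++n;
        }
    }
    return n;
}
```

Nothing here is clever, and that is the idea: it reads like the naive version, so there is no debugging penalty.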
Also, in my experience, more than 50% of my code is thrown away immediately, so coding everything in assembly or complicated expressions is not the best solution. Maybe some people are not that wasteful, but in my case, hey, I'm not perfect.
Or you could fill your code with #ifdefs and use a threaded dispatch mechanism for all switch-case-labels for each architecture and compiler. That may be used in a final version, but debugging it will be a pain.
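For the curious, this is roughly what that looks like. A sketch only: the three-opcode bytecode and the run function are invented for this post, and the threaded branch relies on the GCC/Clang "labels as values" extension, which is not standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical three-instruction bytecode, for illustration only.
enum Op : unsigned char { OP_INC = 0, OP_DBL = 1, OP_HALT = 2 };

// Starts with an accumulator of 0 and returns its value at OP_HALT.
// Assumes `code` contains only valid opcodes and ends with OP_HALT.
int run(const unsigned char* code) {
    assert(code != nullptr);
    int acc = 0;
    std::size_t pc = 0;
#if defined(__GNUC__)
    // Threaded dispatch: each handler jumps directly to the next
    // handler through a table of label addresses.
    void* const handlers[] = { &&do_inc, &&do_dbl, &&do_halt };
    #define DISPATCH() goto *handlers[code[pc++]]
    DISPATCH();
do_inc:
    ++acc;
    DISPATCH();
do_dbl:
    acc *= 2;
    DISPATCH();
do_halt:
    #undef DISPATCH
    return acc;
#else
    // Portable fallback: an ordinary switch inside a loop.
    for (;;) {
        switch (code[pc++]) {
            case OP_INC: ++acc; break;
            case OP_DBL: acc *= 2; break;
            default: return acc;  // OP_HALT stops the loop
        }
    }
#endif
}
```

Two copies of the same logic that must be kept in sync, plus gotos you cannot easily single-step through. That is the pain I mean, and this is the small version.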
Anyway, the switch-case thingies could just as well be written like this:
Code:
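/* offsetof (from <stddef.h>) yields a size_t, so no & in front of it */
static const size_t table[] = {offsetof(BitBoard, WhitePawns), offsetof(...)};
if (Piecetype > 13) /* assumes Piecetype is unsigned and table has 14 entries */
    return 0; /* out of range: return an empty bitboard */
return *(unsigned long long *)((char *)Board + table[Piecetype]);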
One comparison, two adds and two dereferences. That should be fast enough. But if you change something in BitBoard, e.g. make it a real class, you have to rewrite this routine, since offsetof is only guaranteed to work on plain (standard-layout) structs.
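Here is a complete version you can actually compile, plus a pointer-to-member variant that should survive BitBoard turning into a class (as long as the members are accessible where the table is defined). Treat it as a sketch: only the BitBoard and WhitePawns names come from the snippet above. The other two members, the three-entry table and the function names are made up to keep it short.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned long long U64;

// Hypothetical, cut-down board with three bitboards instead of 14.
struct BitBoard {
    U64 WhitePawns;
    U64 WhiteKnights;
    U64 BlackPawns;
};

enum { PIECE_TYPE_COUNT = 3 };

// Variant 1: byte offsets into the struct, as in the snippet above.
static const std::size_t offset_table[PIECE_TYPE_COUNT] = {
    offsetof(BitBoard, WhitePawns),
    offsetof(BitBoard, WhiteKnights),
    offsetof(BitBoard, BlackPawns),
};

U64 get_by_offset(const BitBoard* board, unsigned piece_type) {
    assert(board != nullptr);
    if (piece_type >= PIECE_TYPE_COUNT) {
        return 0;  // out of range: empty bitboard
    }
    const char* base = reinterpret_cast<const char*>(board);
    return *reinterpret_cast<const U64*>(base + offset_table[piece_type]);
}

// Variant 2: pointers to members. Same table idea, but the compiler
// checks the types and no casts are needed.
static U64 BitBoard::* const member_table[PIECE_TYPE_COUNT] = {
    &BitBoard::WhitePawns,
    &BitBoard::WhiteKnights,
    &BitBoard::BlackPawns,
};

U64 get_by_member(const BitBoard& board, unsigned piece_type) {
    if (piece_type >= PIECE_TYPE_COUNT) {
        return 0;  // out of range: empty bitboard
    }
    return board.*member_table[piece_type];
}
```

Using an unsigned index means a single comparison covers both ends of the range. Whether variant 2 is as fast as variant 1 on your compiler is something to measure, not assume.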
And inline those functions. The getter/setter ones always.