Also, there was condescension somewhere?
O_o
If you still perceive my post as condescending, I apologize.
If you instead meant "Which bit is supposed to be condescending?", note that I removed that part of my post.
Stack allocation vs. heap allocation has the potential to be costly in this case (if using `new') when running on some 16 core processor, because that's now 16 searches through the pool instead of just throwing something on top of the stack 16 times.
I get the feeling you learned about the generic forms of "nearest fit list" style allocators without ever studying more modern designs. (Granted, those are still decades old techniques in any event.) You need not think of heap allocation as always being a mutually exclusive lock around linearly polling for the next block big enough to satisfy the requested size. You have fallen into the classic trap: "If `malloc' isn't fast enough, programmers are doomed to write their own implementation.". (I probably butchered that quote, and I have no idea who said it.) The thing is, modern allocator designs for C++ know the "Best Practices" of C++ a lot better than you do.
Code:
void DoSomething(/**/)
{
    some_smart_pointer s = new int[4];
    // do something
}
// ...
int main()
{
    while(/**/)
    {
        DoSomething(/**/);
    }
}
This form of allocation is extremely common in modern C++ thanks to "Best Practices" and canonical idioms--like "type erasure" and "expression templates"--which impart the need for an allocator that internally uses some method of stack-like caching. Combine those techniques with "thread local storage" hinting at the "next nearest fit", and the comparison with "linearly searching a pool" falls apart.
Code:
GLOBALType * store = /**/;
TLSType * size[] = /**/;
TLSType * cache = /**/;
bool dirty_pool = /**/;
void DoSomething(/**/)
{
    some_smart_pointer s;
    if(cache->size > (sizeof(int) * 4))
    {
        s = cache->memory;
    }
    else
    {
        void * temp = size[LINE > (sizeof(int) * 4)]->next;
        // update size[LINE > (sizeof(int) * 4)]
        if(!temp)
        {
            // grab more memory for this thread pool
            // or
            // grab this allocation from global
        }
        s = temp;
    }
    // do something
    if(dirty_pool)
    {
        // set cache
        // release cache to thread pool
    }
    else
    {
        // release pool segment to global if $(metric)
        // or
        // release allocation from global
    }
}
Yes, this example looks scary expensive compared to a version with `alloca', but with a largely stable process--where the allocation pattern fits the cache metrics--the complexity behind the scenes is largely irrelevant.
Code:
GLOBALType * store = /**/;
TLSType * size[] = /**/;
TLSType * cache = /**/;
bool dirty_pool = /**/;
void DoSomething(/**/)
{
    some_smart_pointer s;
    if(cache->size > (sizeof(int) * 4))
    {
        s = cache->memory;
    }
    // do something
    if(!dirty_pool)
    {
        // do nothing for allocator
        // cache is "warm"
    }
}
I'm obviously glossing over a lot of implementation details, but the point is, a lot of modern allocator implementations for C++ do all sorts of nearly magic caching behind the scenes. (The suggestion of manually using "thread pools" from Elysia is an example; a lot of modern allocator designs use the same techniques transparently.) Anything you might try to "add" to the process--including using multiple allocator implementations--will often decrease performance because you aren't following the expected strategy of "lots of small thread specific allocations".
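To make the "thread local storage" caching point concrete, here is a deliberately tiny sketch of the warm-path idea; the names (`SizeClassCache', `cached_alloc', the single 64 byte size class) are mine, and real allocators in this family are enormously more sophisticated--this only shows why the second allocation of a given size on a thread needs no lock and no search.

Code:
```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Sketch only: one size class, one per-thread free list.
struct SizeClassCache
{
    std::vector<void *> free_blocks; // recycled blocks, this thread only
};

constexpr std::size_t kBlockSize = 64; // single size class for brevity

thread_local SizeClassCache cache;

void * cached_alloc()
{
    if(!cache.free_blocks.empty()) // warm path: no lock, no pool search
    {
        void * p = cache.free_blocks.back();
        cache.free_blocks.pop_back();
        return p;
    }
    return std::malloc(kBlockSize); // cold path: fall through to the heap
}

void cached_free(void * p)
{
    cache.free_blocks.push_back(p); // return the block to this thread's cache
}
```

In a `DoSomething' loop like the one above, every iteration after the first hits the warm path, which is exactly the "lots of small thread specific allocations" strategy the allocator expects.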
And if not alloca, how am I supposed to get variable length arrays on the stack in C++ if ISO C++ forbids it?
As laserlight says, you are already using a compiler extension; using a different compiler extension shouldn't be a problem for you.
That said, because you are already committed to micro-optimization, you may manually employ a "double ended stack" which, once created, has almost identical performance to `alloca' with the added benefit of being extremely portable.
[Edit]
You use one side of the stack for "this function" and the other side for "that function". So, to allocate as `alloca' does, you use only the one side, while "return values" use the other side.
Code:
void Core(/**/)
{
    // this function needs an array
    DEStack sA(global);
    int * s = sA.get<int>(4);
    // `s' is released automatically
}
Code:
int * Implementation(/**/)
{
    // the calling function needs an array
    DEStack sA(global);
    int * s = sA.rget<int>(4);
    // ...
    return s;
}
void Client(/**/)
{
    DEStack sA(global);
    int * s = Implementation(/**/);
    // `s' is released automatically
}
[/Edit]
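For the curious, here is a minimal sketch of what such a `DEStack' might look like; the `Arena' type, the fixed buffer size, and the single coarse alignment are all my assumptions (and raw casting into a byte buffer is only safe for trivial types), so treat this as an illustration of the two-ended idea rather than a production allocator.

Code:
```cpp
#include <cstddef>

// Assumed backing store: a fixed buffer with a "front" end for
// scratch and a "back" end for values returned to a caller.
struct Arena
{
    alignas(alignof(std::max_align_t)) unsigned char buffer[1 << 16];
    std::size_t front = 0;              // "this function" side
    std::size_t back = sizeof(buffer);  // "return value" side
};

Arena global;

class DEStack
{
public:
    explicit DEStack(Arena & a)
        : arena(a), saved_front(a.front), saved_back(a.back) {}

    // Unwind both ends on scope exit, mimicking `alloca' lifetime.
    ~DEStack()
    {
        arena.front = saved_front;
        arena.back = saved_back;
    }

    // Front end: the block dies with *this* DEStack.
    template <typename T>
    T * get(std::size_t n)
    {
        std::size_t bytes = align_up(sizeof(T) * n);
        T * result = reinterpret_cast<T *>(arena.buffer + arena.front);
        arena.front += bytes;
        return result;
    }

    // Back end: the block survives this DEStack, so it can be
    // returned to a caller holding its own DEStack.
    template <typename T>
    T * rget(std::size_t n)
    {
        std::size_t bytes = align_up(sizeof(T) * n);
        arena.back -= bytes;
        saved_back = arena.back; // don't release the returned block
        return reinterpret_cast<T *>(arena.buffer + arena.back);
    }

private:
    static std::size_t align_up(std::size_t s)
    {
        return (s + alignof(std::max_align_t) - 1)
             & ~(alignof(std::max_align_t) - 1);
    }

    Arena & arena;
    std::size_t saved_front;
    std::size_t saved_back;
};

// Usage matching the `Implementation'/`Client' shape above.
int * Implementation(Arena & g)
{
    DEStack sA(g);
    int * scratch = sA.get<int>(16); // front side: freed on return
    (void)scratch;
    int * s = sA.rget<int>(4);       // back side: survives for the caller
    for(int i = 0; i < 4; ++i)
    {
        s[i] = i * i;
    }
    return s;
}
```

The trick is that `rget' advances the saved "back" watermark so the callee's destructor skips the returned block, while the caller's own `DEStack'--constructed before the call--still releases it automatically.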
Be fairly warned, that is an extreme form of "micro-optimization" only to buy a passable form of `alloca'.
Elysia, I'm having a really hard time admiring you right now.
You should put your money... over your mouth because you are just ridiculous.
Software development isn't a popularity context, but at least a few people here already blame you for "chasing off" one regular. Keep up nonsense like that and you may very well find that no one wants to help you anymore.
[Edit]
^_^;
I love "popularity context" so hard.
[/Edit]
You also need to admire microkernel operating systems. Those are a piece of art, unlike monolithic operating systems...
The modern "Windows" kernels are hybrid designs being truly neither monolithic nor microlithic.
Soma