@Elysia: Why don't bitfields work very well?
"...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are, in short, a perfect match." -- Bill Bryson
Indeed. I have encountered some compilers that target embedded systems which do not support bitfields.
Even among compilers that do support them, the layout of bitfields within a storage unit is implementation-defined according to the standard, which means that code relying on a particular bitfield layout is inherently non-portable.
Malcolm, you are very knowledgeable. I'm impressed. But don't worry, I found two papers by the same authors who have researched exactly what I'm trying to do, in a way that's very similar to mine: "A Versatile and Robust Model for Geometrically Complex Deformable Solids" and "Optimized Spatial Hashing for Collision Detection of Deformable Objects", which are, for the most part, by the same people. They're very educated, and it looks like I can just copy-paste their intersection test into a new function and use it that way. They don't address intersecting edges, but I think my code should work regardless, as theirs did, because my vertices pass the Delaunay test. Because of that, I have 4 isolated vertices, and I just want to make sure that I can create 4 isolated tetrahedra that aren't touching. I have degenerate cases, so there's a correct combination somewhere.
My code uses Springel's Peano-Hilbert hash function to sort points in space. Whether I can actually put it to good use is something I will find out on my own.
And a quick question: once I install Hoard and export the proper preload, does writing "malloc" in a piece of C code then call Hoard? Or does Hoard need its own call? Or is malloc simply resolved to whichever library takes precedence, which would be the preloaded Hoard library?
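On the Hoard question: assuming a typical Linux setup (the library path below is a guess -- point it at wherever your build put libhoard.so), the dynamic linker resolves malloc/free to the preloaded library before libc's versions, so plain malloc() calls in your C code go through Hoard with no source changes and no recompile:

```shell
# Path is hypothetical; adjust to your actual libhoard.so location.
export LD_PRELOAD=/usr/local/lib/libhoard.so
./my_simulation        # every malloc/free in this process now uses Hoard

# Or scoped to a single run, without polluting the environment:
LD_PRELOAD=/usr/local/lib/libhoard.so ./my_simulation
```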
who says? and how do they know? and which C runtime? does that include the overhead of garbage collection in Java?
edit: in lieu of info from the OP, I'll answer my own question: http://www.ibm.com/developerworks/ja...275/index.html
Last edited by dmh2000; 04-30-2013 at 02:24 PM.
Even if you're allocating massive amounts of data, worrying about allocation speed smacks of premature optimization. Have you actually run empirical tests to identify this as a problem? For that matter, a lot of your "optimizations" from this thread and others seem premature. Get a working single-threaded version of your application and profile the code to find where you actually need to optimize, and to give yourself a baseline to start from.
Trying to optimize without even knowing where your code NEEDS optimization is an exercise in futility, and may well make overall performance worse. Almost anything with performance is a tradeoff - it will make some things better and other things worse. Until you know which areas actually need improvements, you can easily make things insignificantly better in one area and significantly worse in another. Find and quantify your bottlenecks first, THEN optimize.
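One concrete way to get the baseline described above, assuming gcc and gprof are available (the source file name is a placeholder): compile with profiling instrumentation, run a representative workload, and read the flat profile to see how much time allocation actually accounts for.

```shell
gcc -O2 -pg -o nbody nbody.c -lm   # -pg adds profiling instrumentation
./nbody                            # writes gmon.out in the working dir
gprof nbody gmon.out | less        # flat profile: % of time per function
```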
Last edited by Cat; 04-30-2013 at 06:12 PM.
You ever try a pink golf ball, Wally? Why, the wind shear on a pink ball alone can take the head clean off a 90 pound midget at 300 yards.
Alright, I re-wrote this entire post because the last draft was me just ranting and swearing 'cause it's my thing to do.
So let's talk about what needs to happen per timestep here.
We have a particle set. We build an initial tetrahedron large enough to encompass the whole particle set. As a side note, Springel and I found the same "Is this point inside this tetrahedron?" routine, which is great because it's all linear algebra operations and those are quick. We then take a particle and 'insert' it into our tetrahedron, which splits our root into 4 daughters. So we have 5 points and 4 tetrahedra. For every inserted particle we then find the tetrahedra that satisfy the Delaunay condition, and once we do that, we make sure they don't overlap and that they fill space properly (which is harder than it sounds, as I'm finding).
This is easy to multithread. Now that we have 4 daughters, let's create a thread for each one, so we go from 1 thread to 4. It's basically a quadtree, to put it simply. Each thread created by a parent is allowed to create any children that fit, because each tetrahedron is independent of the others. Or so I'm assuming; his paper seemed to agree with me on this, and that's good, as independent data sets are independent.
So if we were doing this on a GPU, we'd create 4^n threads and go nuts. And don't worry, he (Springel) locates nearest neighbours by the Peano-Hilbert curve which is fantastic because that at least works for me flawlessly. Mostly 'cause I copy-pasted it lol.
I'm recalculating the Delaunay triangulation every timestep. So let's do the appropriate complexity analysis: the simulation advances in steps of about 1e-07 gigayears up to 3 gigayears, which is roughly 3x10^7 timesteps right there. That's tens of millions of triangulation passes, which is quite a bit given all the tests we have to do, not to mention the physics calculations that come after. That's tens of millions of rounds of allocation for what should be up to millions of particles, i.e. millions of vertices to triangulate, so this code is not light or fast. Okay, it's relatively fast for what it does, but on an absolute scale, the physicists know how to make code heavy.
I will be using many threads to allocate many different children so a multithreaded memory manager is optimal in my case. Unless I'm entirely wrong in which case, that was my bad.
My point on performance is you shouldn't start to think about performance problems until you have started to gather data and determine where the bottlenecks actually are. Out of every second, how many milliseconds are actually being spent on allocation? Until you know that answer, you don't even know if this is worth bothering to try to optimize.
Maybe you're doing all the right things, but without data, you end up with "we think it's getting better but nobody's really sure". Every performance decision is going to be some kind of tradeoff - you will gain something and you will lose something, but until you measure, you can't know if you'll end up ahead. Worse, you won't even know you're focusing on the right thing - you could work for months to make a particular operation take half the time, but if you find out that operation only accounts for 2% of your program's execution time, you've spent months for a 1% overall gain; your time could probably have been better spent elsewhere.
Measure first, then identify areas to focus on, then improve them. Without data, all you really have are educated guesses about where your problems lie, and from a lot of experience in development, educated guesses are often wildly off the mark.
Last edited by Cat; 05-03-2013 at 06:02 PM.
But profilers can also give you very misleading results. Consider a sort routine written as a naive quicksort that takes the first element as the pivot. This gives O(N log N) performance on average, but when it is passed an already-sorted array it degenerates to O(N^2) and uses excessive stack space. The chance of a random array being sorted is 1 in N!, so we could ignore this case -- the computer is far more likely to break first. Unless the test data you ran the profiler on was random, but the data passed by real users is sometimes already sorted. That's not a particularly unlikely scenario.
So you need to do both, do the big O analysis so you know how your algorithm scales in N, know the likely N and the maximum or worst case N, then run the profiler and check that the bottlenecks you've identified are the real ones and there isn't a performance bug in the code.
I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
Visit my website for lots of associated C programming resources.
https://github.com/MalcolmMcLean