Thread: optimizing long if structure

  1. #1 GanglyLamb

    optimizing long if structure

    Is there a way to optimize a long series of if structures? For example, the code might look like:

    Code:
    if(x>0 && x<= 0.1) 
      t = calculation;
    if(x>0.1 && x<=0.2)
      t = calculation2;
    So basically, for each value of x there is only one valid if structure. x can take a value between 0 and 1 and has a uniform probability distribution (or should have, as I am using a Mersenne Twister for random sampling).
    I have around 100 to 1000 if structures, depending on how accurately I want to model a certain mathematical function using a cubic spline.

    So far I have tried the following to no avail:
    * moving the if series into a function and returning from the first if structure that yields true; since every if structure has the same probability of being valid, there is no ordering of the series that makes the function return faster on average.
    * a switch-case structure would be an ideal solution, but it is not possible since I am not matching exact values, only values lying between two other values...

    Note that this piece of code is inside a function which is being called around 10^7 times (which is why I am looking for any optimization, however small). It is part of a raytracer, in case you are wondering why the function is called so many times.

    If anyone has any suggestions you are more than welcome to share them with me.

  2. #2 Registered User
    If there is only one valid branch for each x, then use else if(...). That way it will stop checking conditions after it finds the right one.
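
    A minimal sketch of such a chain (calc1, calc2, ... are placeholders for the actual calculations):

    Code:
    /* else-if: evaluation stops at the first interval that matches */
    if (x > 0.0 && x <= 0.1)
        t = calc1(x);
    else if (x > 0.1 && x <= 0.2)
        t = calc2(x);
    else if (x > 0.2 && x <= 0.3)
        t = calc3(x);
    /* ... on average only half the conditions get evaluated */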

  3. #3 GanglyLamb
    Which works fine as long as I have fewer than ~150 if-else ifs in a row. With 200 of those structures I get a compiler error:

    compiler limit : blocks nested too deeply
    Using if-else rather than plain ifs did take ~20% less execution time, but it is not really an option due to the above compiler error (unless I can somehow raise this limit manually).

  4. #4 C++まいる!Cをこわせ!
    The easiest solution might be to try another compiler if you can, since this limit is specific to the compiler.

  5. #5 Registered User
    Is it always x > n && x <= n+0.1 format?

  6. #6 GanglyLamb
    Quote:
    The easiest solution might be to try another compiler if you can, since this limit is specific to the compiler.
    That would be an option, but not a nice solution. If the mathematical function that I am trying to approximate with a cubic spline changes very rapidly, then I need more samples. The number of if structures equals the number of samples that I need. Right now I can approximate well enough using 200 samples, but who is to say I will not encounter functions for which even 1000 samples are barely enough?

    Quote:
    Is it always x > n && x <= n+0.1 format?
    Yes, except for the last case, which is of the format x>=... && x<=...

  7. #7 Syndacate
    This might be a shot in the dark here - it'll take more memory... but it'll work, I think.

    Concept:
    - Use a pairing data structure: map indexes at fixed increments matching the interval width (e.g. 0.0, 0.1, 0.2, etc.) to function pointers
    - Throw 1000 of these structures in an array
    - Round your x value onto that grid
    - Find it in the list and call the corresponding function pointer.

    In the worst case you're looking at O(N) lookup time; since the keys are decimals, you can't have the constant-time access you'd normally get by indexing an array.

    It will use a bit more RAM to store the lookup table, though. I guess this is the "more RAM, fewer CPU cycles" side of the coin.

    Something like:
    Code:
    typedef struct
    {
       float index;
       double (*calculation)(double); /* ptr to the per-interval calculation */
    } Pair;
    
    int main(void)
    {
       /* blah blah blah, do some stuff to get x and declare t */
       double t;
    
       double x_round = round_to_grid(x); /* math to round x goes in a rounding funct */
    
       Pair lookup_table[1000];
    
       /* Initialize the funct ptr members as you would have in the if blocks for the
          calculation aspects.  You would have had to do this anyway, it simply would
          have been hardcoded into the if statements instead.  calc0..calc999 stand
          for the per-interval functions. */
       lookup_table[0].calculation = calc0;
       lookup_table[1].calculation = calc1;
       ...
       lookup_table[999].calculation = calc999;
    
       /* Load up the lookup table indexes; the step is 1/1000 because x runs
          from 0 to 1 over 1000 entries */
       for (int i = 0; i < 1000; ++i)
       {
          lookup_table[i].index = (float)i / 1000.0f;
       }
    
       /* Find its entry in your pseudo-map */
       for (int i = 0; i < 1000; ++i)
       {
          /* exact == is OK only if round_to_grid() snaps x onto exactly
             these stored values */
          if (lookup_table[i].index == x_round)
          {
             t = lookup_table[i].calculation(x); /* call through the funct ptr */
             break;
          }
       }
    }
    Few things...

    One, as you can probably tell, I'm extremely foggy on how to define/declare/call function pointers - I don't use them too often.

    Secondly, you can almost call the above pseudo-code; it's not an exact drop-in copy/paste, just meant to give you an example/idea of wtf I was talking about.

    Thirdly, this will use more memory (not too much, though, probably about 8KB or so for the lookup table) and will probably take a bit longer than the previous proposition to initialize. On the flip side, it should find the correct function to call rather quickly. This is weighted in favor of CPU cycles, not memory.

    Hope that helps you some, and if not, hope it gives you or somebody else an idea.

    PS:
    Fourthly, it might be easier to think of this as a C++ std::map<float, (funct*)>. A std::set<std::pair<float, (funct*)> > would be even better, as it uses a red/black binary search tree, which takes O(N*log(N)) to build and O(log(N)) to look up. Though you'd have to write that in C++, or write the data structures yourself... or I'm sure somebody has already made and posted them online somewhere. I'm basically just stealing the functionality out of the former, without the RB BST...

    PPS:
    Fifthly, I feel rounding x and storing the result in a temp value is key here, as it will allow you to use any sort of lookup table or similar concept without needing fuzzy double comparisons everywhere (a sketch follows this list). I mean at worst, after the lookup table is created, you're looking at O(N) complexity for each sample, which isn't much compared to a gargantuan if statement.

    Sixthly, as for initializing the function ptr list - you can easily write a script in any scripting language such as PHP, Python, Ruby, or Perl to pump that list into your source file in no time at all... you could do it in any language, obviously, but it'd be easiest in a scripting one.
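
    A sketch of the rounding step the fifth point describes, assuming the 1/1000 grid used in the block above (round_to_grid is the hypothetical helper named there):

    Code:
    /* Snap x in [0,1] onto the same 1/1000 grid stored in the table, so the
       exact == compare against a stored index is reliable. */
    double round_to_grid(double x)
    {
        int i = (int)(x * 1000.0);    /* truncate to a bucket number     */
        if (i > 999) i = 999;         /* clamp x == 1.0 into the last bin */
        return (float)i / 1000.0f;    /* same math as the table init     */
    }

    Note that the bucket number i computed here could also index the lookup table directly, skipping the linear search entirely; the switch approach further down the thread exploits exactly that.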

  8. #8 Salem
    Code:
    if(x>0 && x<= 0.1) 
      t = calculation;
    if(x>0.1 && x<=0.2)
      t = calculation2;
    And what is the relationship between the two calculations?

    If the tests were collapsed, how would that affect your calculations?

  9. #9 iMalc
    This really isn't all that hard to figure out:
    Code:
    switch ((int)(_nextafter(x, -999)*10)) {
        case 0: t = calculation; break;
        case 1: t = calculation2; break;
    }
    The _nextafter bit can be replaced with just x if you aren't that interested in the boundary cases after all. (It nudges x down by one ulp, so a value sitting exactly on an interval's upper bound falls into that interval, matching the x <= 0.1 style of the original tests.)
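
    Since the intervals in the opening example are a uniform width (0.1, i.e. 1/N of the range), the same idea extends to any N without writing the cases out by hand; a sketch, with evaluate_piece standing in for the hypothetical per-interval calculation:

    Code:
    /* Map x in (0,1] to one of N equal-width pieces in O(1). */
    const int N = 1000;                       /* number of pieces       */
    int idx = (int)(_nextafter(x, -999) * N);
    if (idx < 0)  idx = 0;                    /* clamp the boundaries   */
    if (idx >= N) idx = N - 1;
    t = evaluate_piece(idx, x);               /* hypothetical per-piece calc */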

  10. #10 brewbuck
    1. Why do you consider 10^7 to be a big number? It's practically microscopic.
    2. Do you seriously think a few floating point compares are going to be at all noticeable compared to computing a cubic spline?
    3. What are you smoking?

  11. #11 GanglyLamb
    Quote Originally Posted by iMalc View Post
    This really isn't all that hard to figure out:
    Code:
    switch ((int)(_nextafter(x, -999)*10)) {
        case 0: t = calculation; break;
        case 1: t = calculation2; break;
    }
    The _nextafter bit can be replaced with just x if you aren't that interested in the boundary cases after all.
    Except that this is not possible: the values being compared against are not fixed. I am actually generating the C code in Matlab, which I later plug into a DLL which is fed to a ray tracer. This raytracer is used, for example, to determine light distributions of sources (measured using near-field goniophotometers) from which the rays start; the rays then interact with a certain geometry (reflectors) for which I have measured the BSDF (or, more specifically, the BRDF, as I am only dealing with reflections caused by surface scattering).

    Anyhow, for example a BRDF distribution might look like Q(theta_s, phi_s, theta_i, phi_i, lambda) = cos(theta_i). In this case we assume an isotropic surface (hence phi_s is not present and takes a random value between 0 and 2*Pi); theta_i goes from 0 to Pi/2. It is actually a surface which scatters in a Lambertian way.

    Anyhow, with this function it is quite easy to analytically compute the cumulative function, normalise it and invert it. But as soon as the initial function gets more complicated, we can no longer analytically compute the inverse of the normalised cumulative function. This is why I numerically integrate the initial function, normalise it and then flip the x-y values, giving the ICDF (inverse cumulative distribution function). This ICDF takes a uniformly distributed random variable between 0 and 1 (because of the normalisation) and returns a value between 0 and Pi/2 representing the theta_s value according to the initial distribution function.

    Now to get to where I need the if structure. Suppose I numerically calculate the ICDF using 1000 samples; then I could store all these samples in a table and use some kind of interpolation. Right now we are assuming an isotropic function, but when dealing with data from real measurements we will of course no longer have isotropic functions. Furthermore, the BRDF value will also depend on the incoming angles phi_i and theta_i along with lambda. So when dealing with real data we will have to do more than one interpolation to get our result. This is why I am fitting a cubic spline to the ICDF, into which I can plug my variables and get proper results for phi_s and theta_s.

    So, to get to the point: the reason I cannot use fixed intervals is that I do not want to sample the ICDF with equidistant samples along the x-axis (which goes from 0 to 1); I want equidistant samples along the y-axis. If I did not do this, my final results in the raytracer would not be accurate (they would not follow the initial distribution function), as there would be places where the ICDF is undersampled.

    An example of what part of the structure looks like right now (beware, it doesn't look too nice as it is generated by Matlab):
    Code:
    if(x>=0 && x<0.00034947) 
    t = ((double)4986107.6052 * pow(x-0.0000000000e+000,3)) + ((double)-1.9448905812e+004 * pow(x-0.0000000000e+000,2)) + ((double)2.8774584596e+001*(x-0.0000000000e+000)) +0.0000000000e+000;
    else if(x>=0.00034947 && x<0.0013835) 
    t = ((double)4.9861076052e+006 * pow(x-3.4947388154e-004,3)) + ((double)-1.4221362676e+004 * pow(x-3.4947388154e-004,2)) + ((double)1.7007705175e+001*(x-3.4947388154e-004)) +7.8934488784e-003;
    else if(x>=0.0013835 && x<0.0031001) 
    t = ((double)-3.8408721761e+005 * pow(x-1.3835180461e-003,3)) + ((double)1.2462037430e+003 * pow(x-1.3835180461e-003,2)) + ((double)3.5908177960e+000*(x-1.3835180461e-003)) +1.5786897757e-002;
    else if(x>=0.0031001 && x<0.005496) 
    t = ((double)9.9996059239e+004 * pow(x-3.1001367606e-003,3)) + ((double)-7.3179017428e+002 * pow(x-3.1001367606e-003,2)) + ((double)4.4738697550e+000*(x-3.1001367606e-003)) +2.3680346635e-002;
    else if(x>=0.005496 && x<0.0085666) 
    t = .....
    
    //somewhere in the middle of the structure the if structures look like
    else if(x>=0.69757 && x<0.71036) 
    t = ((double)1.3754739284e+000 * pow(x-6.9756733940e-001,3)) + ((double)5.3877102783e-001 * pow(x-6.9756733940e-001,2)) + ((double)6.0992720295e-001*(x-6.9756733940e-001)) +4.5782003495e-001;
    else if(x>=0.71036 && x<0.72285) 
    t = ((double)1.5444259520e+000 * pow(x-7.1035969012e-001,3)) + ((double)5.9155766253e-001 * pow(x-7.1035969012e-001,2)) + ((double)6.2438676399e-001*(x-7.1035969012e-001)) +4.6571348382e-001;
    else if(x>=0.72285 && x<0.73503) 
    t = ....
    I am also attaching some images to clarify things visually. The initial distribution is shown (cos(theta)*sin(theta), since we are in spherical coordinates and need to take the Jacobian into account; the radius is 1, by the way).
    You can also see the ICDF (which is already the cubic spline) obtained by taking 10 and 100 samples from the numerically computed, normalised cumulative distribution function, inverting these values and fitting 10 and 100 cubic splines. Also attached are the histograms of the values returned when feeding in 10^6 uniformly distributed random variables between 0 and 1. It is clear that 10 samples are not enough; 100 seem to work, even though in a real raytracing simulation it seems that this is still not enough (250 samples do the trick, though).
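
    Given breakpoints that are not equidistant, a direct index computation is out, but they are still sorted, so a binary search over the breakpoint array would find the right spline piece in roughly log2(N) compares instead of up to N. A minimal sketch under that assumption (breaks and the coefficient arrays a, b, c, d are hypothetical names for what the Matlab fit produces):

    Code:
    /* breaks[0..n] holds the n+1 piece boundaries in ascending order;
       a[i], b[i], c[i], d[i] are the cubic coefficients valid on
       [breaks[i], breaks[i+1]). */
    int find_piece(const double *breaks, int n, double x)
    {
        int lo = 0, hi = n - 1;          /* candidate piece indices      */
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (x < breaks[mid + 1])
                hi = mid;                /* x is in this piece or below  */
            else
                lo = mid + 1;            /* x is in a higher piece       */
        }
        return lo;                       /* ~10 steps for 1000 pieces    */
    }
    
    /* usage: evaluate the chosen piece with Horner's rule */
    int i = find_piece(breaks, n, x);
    double dx = x - breaks[i];
    t = ((a[i] * dx + b[i]) * dx + c[i]) * dx + d[i];

    As a side note, evaluating each piece with Horner's rule as above, rather than with pow(x-x0,3) and pow(x-x0,2), is likely to save more time than the compares themselves; pow is a comparatively expensive library call.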

  12. #12 GanglyLamb
    Quote Originally Posted by brewbuck View Post
    1. Why do you consider 10^7 to be a big number? It's practically microscopic.
    2. Do you seriously think a few floating point compares are going to be at all noticeable compared to computing a cubic spline?
    3. What are you smoking?
    10^7 is big when you are not only evaluating this function; we are also taking into account Fresnel reflections, phase changes, scattering, etc. Our simulations currently take around a couple of weeks to finish - granted, that is not with 10^7 rays but 10^20 rays (and that is only for one wavelength; simulating the whole visible spectrum at an interval of 5 nm means 80 times this number of rays).

    Secondly, I am not computing the cubic spline each time a ray hits a different surface. The cubic spline is fitted in advance to measured data, as a surface structure and its scatter properties never change in time (at least not during simulations). The floating-point compares do matter, though: using 500 if structures rather than 100 makes a huge difference. Simulations using 10^6 rays take 20 seconds with 100 structures, while 400 structures take around 50 seconds. Note that no ray splitting is allowed during these test simulations, so exactly 10^6 rays are traced; during normal simulations we split a ray until the current flux is around 0.005% of the initial flux of the ray. With highly specular materials where we have a total integrated scatter of 0.97 there is little absorption, so you can calculate how many rays will be traced before the simulation is finished.
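
    A rough sanity check on that: if each interaction keeps 97% of the flux and there are no other losses, the flux drops below 0.005% of its initial value after about ln(5*10^-5)/ln(0.97) ≈ 325 interactions.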

    3) I smoke different things, but that depends on the occasion.
