Quote: Originally posted by lehe
Code:
Vignette_size = Vignette::width * Vignette::height; // 24 * 24. Vignette is a class.
for(every possible tmplt){ // pseudo-code, since the real code is too long. tmplt is of type Vignette*
   generate the tmplt // pseudo-code, since the real code is too long
   for(int k = 0; k < Vignette_size; k++){
      if(tmplt->content[k] == 0){ // tmplt->content is a 1D int array of 24*24
          for(int i = 0; i < nb_vignettes; i++){
              double tmp = _xi[vignettes[i].content[k]]; // _xi is a 1D double array of size 256. vignettes is of type Vignette*
              log_prob_img_t[i] += tmp;
          }
      }
   }
}
What stands out on the _xi[vignettes[i].content[k]] line is the access pattern: with i in the inner loop, every iteration jumps to a different vignette's content array, so inverting the order of the two inner for-loops could improve cache locality. However, the current order is likely faster in other respects, for example because the tmplt->content[k] == 0 test runs once per k rather than once per (i, k) pair. It's a shame you didn't post the real code for the for loop. It doesn't matter if it's a little long; as long as I don't have to scroll my screen, it's fine.
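To make the loop-order point concrete, here is a rough sketch of both orderings side by side. It is only an illustration under assumptions: the Vignette struct, the function names, and the use of std::vector and std::array are stand-ins I made up, since the real class and the surrounding code were not posted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kVignetteSize = 24 * 24;  // Vignette::width * Vignette::height

// Hypothetical stand-in for the real Vignette class (not shown in the post).
struct Vignette {
    std::array<int, kVignetteSize> content{};  // pixel values, assumed to lie in 0..255
};

// Order as posted: k outer, i inner. The template test runs once per k,
// but each inner iteration touches a different vignette's memory.
void accumulate_k_outer(const Vignette& tmplt,
                        const std::vector<Vignette>& vignettes,
                        const std::array<double, 256>& xi,
                        std::vector<double>& log_prob) {
    assert(log_prob.size() == vignettes.size());
    for (std::size_t k = 0; k < kVignetteSize; ++k) {
        if (tmplt.content[k] != 0) continue;
        for (std::size_t i = 0; i < vignettes.size(); ++i)
            log_prob[i] += xi[static_cast<std::size_t>(vignettes[i].content[k])];
    }
}

// Inverted order: i outer, k inner. Each vignette's content is read
// sequentially, at the cost of repeating the template test for every vignette.
void accumulate_i_outer(const Vignette& tmplt,
                        const std::vector<Vignette>& vignettes,
                        const std::array<double, 256>& xi,
                        std::vector<double>& log_prob) {
    assert(log_prob.size() == vignettes.size());
    for (std::size_t i = 0; i < vignettes.size(); ++i) {
        const auto& c = vignettes[i].content;
        double acc = log_prob[i];  // same order of additions as the k-outer version
        for (std::size_t k = 0; k < kVignetteSize; ++k)
            if (tmplt.content[k] == 0) acc += xi[static_cast<std::size_t>(c[k])];
        log_prob[i] = acc;
    }
}
```

Both functions perform the same additions in the same order for each i, so they should produce identical results; which one is faster depends on nb_vignettes and on how many template entries are zero, so it needs to be measured on the real data.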
You're also missing out on any optimisation opportunities in "generate the tmplt" by not showing it. Still, knowing how big Vignette_size and nb_vignettes are, and how many tmplts there typically are, would tell us whether that part is worth optimising too.