Thread: Optimize array access

  1. #1
    Registered User
    Join Date
    Nov 2009
    Posts
    24

    Optimize array access

    Hello all. Forgive me if I do not post any real code snippets just yet. This is my first post, so I'm just mostly saying hello for now. Until I get a chance to get a look at the code I am working on, and get back to you..can anyone post any tips on optimizing C code when there are a lot of cache misses?

    I am new to C code, so I'll give the basic run down. The code I am working on speeding up sends array references to a function, does some calculations on them. This obviously has its own set of loops that I think is as fast as it is going to get. Wrapping the call to that function is a loop that is pulling the arrays out of a structure. The setup of trhis is kind of funny to me, as it seems like the structure has been "wrapped" by typedef statements to declare it as a pointer type and avoid the struct reference. The structure's real data is passed back as void from some function, cast to the correct type, and assigned to the pointer type of the structure. I'm not entirely sure i have that interpretation correct.

    ex:
    Code:
    typedef struct { double *AA } typeName
    
    followed by something like
    
    typedef typeName *typeNamep;
    
    /*then declaring an object type */
    
    typeNamep theObject;
    
    theObject = (typeNamep) some_function_returning_data();
    
    
    //then finally setting some arrays up like
    
    loop: 
    
    AA = theObject->AA; // this line has a lot of samples in my profiler, order of  magnitude more than the others
    
    //then passing AA to a function
    
    arrayOperations(AA);
    
    endloop
    So this is all psuedo-codeish but like I said, I am new to C. I feel like somewhere in all those pointers, there is something slowing me down and causing those misses, but not really sure where to start. Reading the profling tool I am using can only tell me so much. TIA

  2. #2
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    To be honest, typedef doesn't do anything useful aside from saving you typing. For example, instead of typing: 'struct foo x;', you type 'mytype x;' All you really gain is the ability to not have to type 'struct'. That's pretty much it.

    Other than that, any time you have an array, which is passed anywhere, you've already avoided passing by value--arrays are always passed "by reference" so to speak. So you don't really need another pointer some place. Also, in your example 'AA' isn't really an array, just to be clear. I'll assume you know that though and spare you the boring details.

    So it comes down to this: Do you need to wrap things up in a structure? If not, just use a pointer to a double for everything and call it good enough.


    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    Registered User
    Join Date
    Nov 2009
    Posts
    24
    theObject = (typeNamep) some_function_returning_data();

    I did just realize i had a slight error with this line. I will get back to you guys with real code when I get a chance (hard to get anything done around holidays). I think the real code actually passes back some other structure which contains a struct, which is what is cast. Anyhow, perhaps I can somehow skip that middle step, and cast directly to double for the individual arrays of the aforementioned structure. Will this kind of thing actually save my runtime?

  4. #4
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    You should generally avoid casting. The only time you should really need it is if you're using a void * or something like that. If you can post a small-ish working example of what you want to do, and explain why you think you need to do whatever it is you're doing, we could be of better help.


    Quzah.
    Hope is the first step on the road to disappointment.

  5. #5
    Registered User
    Join Date
    Nov 2009
    Posts
    24
    It is quite a large code. I will see if I can work up a compilable example, or at least get the exact layout of what is currently happening posted tomorrow. Sleep for now!

  6. #6
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by DerekC View Post
    Anyhow, perhaps I can somehow skip that middle step, and cast directly to double for the individual arrays of the aforementioned structure. Will this kind of thing actually save my runtime?
    Nope, not one bit.

    Just remember, you might think you have some code that is as fast as it can get, but that assumes that you're better at optimisation than every other person here. If that were true, you probably wouldn't be posting . There's a huge chance that this part you speak of can be optimised furthur, possibly by insane amounts in ways you'd never have dreamed of. I would like to consider myself an optimisation guru, but there are cases where I've been outdone.

    There's always the possibility of dropping into asm for your most speed critical parts. Just the other day I converted the entire scanline loop of my quincunx software antialiasing function to a mere 16 assembly instructions, mostly SSE, and that's including all the prefetching commands. You can do some wicked stuff with it eh!

    One thing's for sure, if you haven't examined the generated assembly already to look for things that the compiler needs help with, you aren't even close to where some of us can get you to .
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  7. #7
    Registered User
    Join Date
    Nov 2009
    Posts
    24
    Quite the opposite actually...I KNOW this code is slow. I think I figured out what the one other problem is. There was a typedef defined that points to the structure. Deep down in a set of loops, there is a line that is like

    Code:
    psi[n] = a[n] * *fd_indx->x_m;
    where fd_indx is the typedef that is a pointer to the real structure. the a[n] values used to be defined as fd_indx->a in that example, but was a constant value. I was able to pull all of those constant values out of the loop, but the x_m values (defined as double *x_m in the structure definition) change as the loops progress, so I havent gotten them out just yet. This seems kind of screwy to me, because you essentially have a pointer to a pointer here, right? Like I said, this is happening in the 3rd interior loop. Would I be any faster if I define an array of pointers (all the *fd_indx) to eliminate one level of misdirection? I can do that outside of all of the loops, in my code, I think. If so, how does one go about sizing/typing that array? Is the array of pointers going to be the same type as fd_indx?

    As it turns out, the location of this code is on another network and I won't be able to post it here for compilable examples. I will try to seek "general help" and go from there..thanks in advance

  8. #8
    Registered User
    Join Date
    Nov 2009
    Posts
    24
    I wanted to post up better psuedo code the reflects what I think is the real problem with this code.


    Code:
    typedef struct{
       double *xp;
       double cp; 
    } SV_FDIndx
    
    typedef SV_FDIndx * SV_pFDIndx
    
    SV_pFDIndx fd_phi;
    
    double *a;
    double *xvals;
    
    //assume all mallocs memsets appropriately done
    
    for (int i = 0; i < get_n_nodes(); i++){
       fd_phi = struct_returning_function(i); //this actually returns it from within another, separate structure not relevant here
       a[i] = fd_phi->cp; //"c" for constant
    }
    
    for(int j = 0; j < 500; j++){
       for(int k = 0; k < 3; k++){
          for(int n = 0; n < get_n_nodes(); n++){
              fd_phi = struct_returning_function(n); 
              psi[n] = a[n] * *fd_phi->xp;
          }
        //assume something may modify the value of xp here
       }
    }

    This seems to be what is happening, at least. I am the one who modified it to include the array a[] within that inner loop. it used to be

    Code:
     
    psi[n] = fd_phi->cp * *fd_phi->xp;
    and once I realized those were constants, it worked fine.

    I have also tried to, in my "pre-load" loop at the beginning, define an array of type SV_pFDIndx, and store all the values returned from struct_returning_function(n); so that I could attempt to write

    Code:
    fd_struct_array[i] = struct_returning_function(i);
    *xvals[i] = *fd_struct_array[i]->xp;
    in the preload loop, and

    Code:
           
    for(int n = 0; n < get_n_nodes(); n++){
         fd_phi = struct_returning_function(n); 
         psi[n] = a[n] * xvals[n];
    }
    in the main loop.

    This didnt seem to work for me, but if I did

    Code:
     
    for(int n = 0; n < get_n_nodes(); n++){
         psi[n] = a[n] * *fd_struct_array[i]->xp;
    }
    it did seem to work.

    This got me thinking about declaring that array of pointers, and how I might be doing that wrong/is it any faster anyway?

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I'm sorry, but it doesn't really make sense to attempt to optimise this hypothetical/mocked up code.
    For example, unless get_n_nodes or struct_returning_function have some side effect, the j & k loops have no effect and could be removed.
    Code:
    for(int j = 0; j < 500; j++){
       for(int k = 0; k < 3; k++){
          for(int n = 0; n < get_n_nodes(); n++){
              fd_phi = struct_returning_function(n); 
              psi[n] = a[n] * *fd_phi->xp;
          }
        //assume something may modify the value of xp here
       }
    }
    Clearly those loops can't be removed from your real code, so any attempt to point out what could be optimised is probably going to be in vain.
    There's all sorts of missing things too like what type is psi? If it were say an int array then there's a lot of slow double to int conversions in there that we wont know about and therefore can't help you with.
    The biggest opporitunity for optimisation is likely to be within struct_returning_function for which you have not posted any code. It's inside the innermost loop making it a prime candidate, and as long as you even have the source code to that function (or maybe even just it's documentation!), then it's highly relevant to the optimisation efforts. Same goes for get_n_nodes.
    Have you confirmed whether the get_n_nodes function is actually getting inlined? Are you sure that you need the accuacy or range of doubles as opposed to just floats? That's about all we can do with what is here.

    To be brutally honest, you're cutting off all avenues to anyone suggesting a useful optimisation at every turn, and only leaving us with what you think is left. If you want real optimisation help with it, it is going to have to be the real unedited code, and if the code cannot be shown because you're not legally allowed to do so (as I suspect may be the case), then you're unfortunately on your own. At the very least it needs to be compileable code that you've confirmed has the same performance characteristics and approximate timings as your real code.
    Last edited by iMalc; 12-02-2009 at 02:12 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User
    Join Date
    Nov 2009
    Posts
    24
    Thanks, I will see what I can do about posting a compilable example with the proper characteristics. There is a lot of other work being done in the outer loops, which is why they cannot be removed. I realized last night I forgot to post the psi declaration, but basically all the values are, and need to be doubles.

    I do appreciate the inputs received so far, and I realize it's kind of tricky for anyone looking at it from the outside and not able to see what I can see exactly. I am pretty new to C, but have done some optimizations in fortran before. I will double check a few of the things brought up so far. Thanks again

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 16
    Last Post: 05-29-2009, 07:25 PM
  2. Access array from class
    By parad0x13 in forum C++ Programming
    Replies: 6
    Last Post: 03-18-2009, 08:12 AM
  3. How to access a dynamic array inside a std::set ?
    By IndioDoido in forum C++ Programming
    Replies: 16
    Last Post: 11-04-2007, 05:27 PM
  4. access violation in int array
    By George2 in forum C Programming
    Replies: 2
    Last Post: 08-02-2007, 11:28 PM
  5. Struct *** initialization
    By Saravanan in forum C Programming
    Replies: 20
    Last Post: 10-09-2003, 12:04 PM