I have a 125x125 array, and each element is 55 bytes. The cache line for the CPU (Cortex-A9) is 32 bytes. Is there any way to do optimizations, like loop tiling, that would make multiplication and other calculations faster?
Any other ideas that I could try?
What does Google show for optimizing code for the Cortex A9?
Why are you (seemingly) starting optimizations at the cache level? Have all the algorithmic and other optimizations been made already? How much improvement in run time are you looking for?
Can you post an example of what your program is doing, and the code you're using to do it? Do you have a test case (10-30 seconds worth of run time), which can be used for testing? Have you checked the generated assembly code of your program? Have you profiled the program yet?
"What does Google show for optimizing code for the Cortex A9?"
I have checked and done some implementation based on that.
"Why are you (seemingly) starting optimizations at the cache level?"
I'm not. All algos have been optimized.
"How much improvement in run time are you looking for?"
Depends on how much time and complexity it takes.
"Do you have a test case (10-30 seconds worth of run time), which can be used for testing? Have you checked the generated assembly code of your program? Have you profiled the program yet?"
Yes.
I am really just asking a specific question about 55-byte elements in arrays. If a cache line is 32 bytes, then I guess nothing can be done?
Thanks.
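For reference, the arithmetic behind the 55-byte / 32-byte question can be checked with a short sketch. It assumes the array starts on a 32-byte boundary, and the helper names `lines_touched` and `worst_case_lines` are made up for illustration. It counts how many cache lines one element can straddle when elements are packed back to back (stride 55) versus padded out to a 64-byte stride.

```c
#include <stddef.h>

#define CACHE_LINE 32u  /* cache line size in bytes, as stated in the thread */

/* Number of cache lines covered by `size` bytes starting at byte `offset`,
 * assuming offset 0 sits on a cache line boundary. */
unsigned lines_touched(size_t offset, size_t size)
{
    size_t first = offset / CACHE_LINE;
    size_t last  = (offset + size - 1) / CACHE_LINE;
    return (unsigned)(last - first + 1);
}

/* Worst case over all n elements of an array laid out with the given stride. */
unsigned worst_case_lines(size_t stride, size_t size, size_t n)
{
    unsigned worst = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned t = lines_touched(i * stride, size);
        if (t > worst)
            worst = t;
    }
    return worst;
}
```

With `n = 125 * 125`, the packed layout gives a worst case of three lines per element and the padded layout gives two, at the cost of 1,000,000 bytes instead of 859,375. Whether that trade is worth it depends on the access pattern, so it would need to be measured.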
I didn't say that. Without seeing the program under test, about all anyone can say is that you want the cache to be flushed and refilled with new data as few times as possible.
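As a rough illustration of that idea, here is a minimal loop tiling sketch for a 125x125 matrix multiply. The real element type and operation were never posted, so `double` is only a stand-in, and `TILE`, `matmul_naive` and `matmul_tiled` are hypothetical names. The tile size of 25 is a guess that happens to divide 125; the right value would have to come from profiling.

```c
#include <stddef.h>

#define N    125  /* matrix dimension from the question */
#define TILE 25   /* hypothetical tile size; tune by measurement */

/* Plain triple loop, kept as a reference for checking results. */
void matmul_naive(const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
}

/* Tiled version: works through TILE x TILE blocks so that the rows of b
 * and c belonging to a block are reused while they are still cached. */
void matmul_tiled(const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < (size_t)N * N; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t kk = 0; kk < N; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE)
                for (size_t i = ii; i < ii + TILE && i < N; i++)
                    for (size_t k = kk; k < kk + TILE && k < N; k++) {
                        double aik = a[i * N + k];  /* read once per inner loop */
                        for (size_t j = jj; j < jj + TILE && j < N; j++)
                            c[i * N + j] += aik * b[k * N + j];
                    }
}
```

Both functions should produce the same matrix, so the naive one can serve as a correctness check while trying different tile sizes under a profiler.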
Another question, can the data be represented using bits, in a more compact form? What is the widest range of the data? Is there a secondary cache, and if so, what size is it? Have you tried multiplication by bit shifting?
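To show what multiplication by bit shifting might look like here, the sketch below computes the byte offset of element `i` for a packed 55-byte stride and for a stride padded to 64 bytes. The function names are invented for the example. An optimizing compiler will usually do this kind of strength reduction on its own for constant multipliers, so treat it as something to compare against the generated assembly rather than a guaranteed win.

```c
#include <stddef.h>

/* Byte offset of element i with a packed 55-byte stride,
 * using 55 = 64 - 8 - 1 so only shifts and subtractions are needed. */
size_t offset_packed55(size_t i)
{
    return (i << 6) - (i << 3) - i;
}

/* Byte offset of element i when each element is padded to 64 bytes:
 * a single shift. */
size_t offset_padded64(size_t i)
{
    return i << 6;
}
```

The padded stride needs one shift instead of a shift-and-subtract sequence, which is one more small argument for a power-of-two element size if the extra memory is acceptable.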
There might be some forum members who can rattle off something specific, but I would have to study every detail of the data, the algorithm, and make several tests just to start at optimizing at this level - too much work for me to take on, "just because".
I'm sorry, it's not possible to ask that we simply trust that you have performed adequate higher-level optimisations and can now only do micro-optimisations.
If you were really that skilled, then you probably wouldn't be the one asking a question here.
Or at the least, reputation is to be earned rather than handed out unconditionally.
Instead, you should try to trust us, and take advantage of what additional wisdom we might impart, about that which you may have overlooked.
In other words you'll need to post some relevant code. If we are half as skilled as many of us here think we are, then we should easily be able to come to the same conclusion as you, if it is indeed correct.
Run your code through cachegrind and then get back to us.
My latest code showed a miss rate of 0.0%. Cachegrind truncates the figure, so the real rate is only somewhere close to zero, but that means it's about as optimized for the cache as it's going to get.
Try to limit your reads too.
Also, read this: "What every programmer should know about memory, Part 1" [LWN.net]
This will literally help you with everything you wanted to do and more. At the bottom they have links to other sections on cache optimization so I think it'd be worth looking over.