Thread: Understanding CPU architecture to write better code

  1. #1
    Registered User
    Join Date
    Dec 2015
    Posts
    112

    Understanding CPU architecture to write better code

    Does anyone have some good articles on understanding the underlying computer architecture, with the end goal of writing faster code?

    For example, how do you make the best use of cache? How do you get functions to run in registers? How do you write code that makes use of sequential memory locations?

    I'll be using Intel processors for the foreseeable future.

    https://people.freebsd.org/~lstewart.../cpumemory.pdf
    Found this link.
    Last edited by CodeSlapper; 03-11-2016 at 03:36 PM.

  2. #2
    Lurking (whiteflags)
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Unless you intend to write something in assembly or some such, the best answer is to do nothing and know almost nothing. The compiler has switches for things it can do on specific architectures, and it is smart enough to turn simple, cleanly written code into optimized code. For example, you can tell the compiler to use SSE, tell it you're compiling for multicore processors, control some floating-point behavior, and so on. Read your compiler's docs.

    For example, how do you make the best use of cache?
    When you use threads, cache misses are an important thing to worry about; otherwise, not so much. See: concurrency - What is a cache hit and a cache miss? Why context-switching would cause cache miss? - Stack Overflow
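    To make the threads case concrete, here is a minimal sketch of one common multithreaded cache problem, false sharing, and the usual fix of padding per-thread data out to a full cache line. It assumes 64-byte cache lines (typical of current Intel cores, but not guaranteed by C), C11's alignas, and hypothetical names (padded_counter, count_items, NUM_WORKERS). Thread creation is left out; in a real program each count_items call would run on its own thread.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, typical of current Intel cores. */
#define CACHE_LINE 64

/* Aligning each counter to a cache-line boundary also pads the struct
   to 64 bytes, so two threads' counters never share one line. Without
   this, a write by one core invalidates the line the other core is
   using even though they touch different variables (false sharing). */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

/* Hypothetical: one slot per worker thread. */
#define NUM_WORKERS 4
static struct padded_counter counters[NUM_WORKERS];

/* Hypothetical per-thread work: thread `id` only ever touches
   counters[id], so no locking is needed while counting. */
void count_items(int id, long n)
{
    for (long i = 0; i < n; i++)
        counters[id].value++;
}

/* Combine the per-thread results once all workers have finished. */
long total_count(void)
{
    long sum = 0;
    for (int i = 0; i < NUM_WORKERS; i++)
        sum += counters[i].value;
    return sum;
}
```

    The cost of the padding is memory (64 bytes per 8-byte counter), which is why it is normally reserved for data that different threads write frequently.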
    How do you get functions to run in registers?
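    There is nothing you can do in code to help with this anymore. Modern optimizing compilers largely ignore the register keyword when allocating registers; in C, its main remaining effect is that you cannot take the address of a register variable.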
    How to write code to make use sequential memory locations?
    Use arrays, or libraries that guarantee contiguous storage. You can also read your compiler's docs about structures and the ways to pack them. Locality of reference is not hard to achieve. malloc and new[] guarantee that the memory they return is contiguous.
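    As a small illustration of locality of reference with a plain array, here is a sketch that sums a row-major matrix in two orders. The function names are hypothetical; both return the same value, but only the first walks memory sequentially.

```c
#include <stddef.h>

/* m points to a rows x cols matrix stored row-major, the same layout C
   uses for int m[rows][cols]: element (r, c) lives at m[r * cols + c]. */

/* The inner loop walks along a row, i.e. consecutive addresses, so every
   byte of each cache line fetched from memory gets used. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}

/* Same result, but the inner loop jumps cols * sizeof(int) bytes per
   step, so for wide rows nearly every access lands on a different
   cache line. */
long sum_column_major(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}
```

    For a matrix much larger than the cache, the row-major version is usually the faster one; how large the gap is depends on the processor and the matrix size, so measure rather than assume.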

  3. #3
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    There are many levels of optimization, from very high level down to very low level like assembly. Optimizing at the assembly level is your last resort, when nothing else can be done. If your end goal is to write faster code, whether you use C++, C# or Java, first focus on following good programming practices and idioms. For example, instead of studying decompiled code, make sure you pass "const std::string&" to functions instead of "std::string". When you intend to return a container, use move semantics if possible. When iterating through a container, use iterators instead of indices. If you call the same function twice in one expression, call it once and store the result in a local variable. The examples could go on.

    Of course, the above examples are not premature optimizations; they are rules of thumb. Given two solutions A and B that cost the same, why would anyone pick the lower-quality one?
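    A C-flavored instance of the "call once and store the result in a local variable" rule is hoisting strlen() out of a loop condition. This is a sketch with hypothetical function names; whether a given compiler hoists the call on its own depends on the optimization level and on what it can prove about the loop body.

```c
#include <ctype.h>
#include <string.h>

/* strlen() sits in the loop condition, so unless the compiler hoists it,
   it rescans the whole string on every iteration: O(n^2) overall. */
size_t count_upper_slow(const char *s)
{
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (isupper((unsigned char)s[i]))
            count++;
    return count;
}

/* strlen() is called once and its result kept in a local variable,
   which the compiler can easily hold in a register: O(n) overall. */
size_t count_upper_fast(const char *s)
{
    size_t count = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (isupper((unsigned char)s[i]))
            count++;
    return count;
}
```

    Both return the same count; the second version just does not rely on the optimizer to remove the repeated work.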

  4. #4
    Lurking (whiteflags)
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    C++ advice on the C forum?

  5. #5
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    I said that it does not matter what language is used. (I admit that I could give more C-like examples though.)

  6. #6
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Iterators and pointers make it harder for compilers to optimize code, just so you know: two pointers may refer to (alias) the same object, which makes it harder for the compiler to figure out what's going on. But it is still good advice. Only if you have determined that it is causing issues and you need to optimize it should you consider other options.
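    The aliasing problem can be shown in a few lines. In this sketch (hypothetical function name), the compiler would like to load *src once and compute *dst += 2 * *src, but it cannot, because the caller is allowed to pass the same address for both pointers.

```c
/* Adds *src to *dst twice. dst and src may point at the same int, so
   after the first store the compiler must reload *src from memory
   instead of reusing the value it already has in a register. */
void add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;
}
```

    With int a = 1, b = 2, add_twice(&a, &b) leaves a == 5. With int c = 1, add_twice(&c, &c) leaves c == 4 (1 + 1, then 2 + 2), whereas the "load once" shortcut would give 3, so the shortcut would be a miscompilation.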
    Quote Originally Posted by Adak
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  7. #7
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    Quote Originally Posted by CodeSlapper
    For example, how do you make the best use of cache?
    By trying to keep the inner loops of algorithms confined to memory regions that fit within the cache. For multi-threaded programs, each core has its own L1 cache, and usually its own L2 cache. L3 cache and main memory are shared between cores.
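    One standard way to keep inner loops confined to a cache-sized region is loop tiling (blocking). The sketch below applies it to a matrix transpose; the function name and the TILE value are assumptions for illustration, and the best tile size should be found by measurement on the target machine.

```c
#include <stddef.h>

/* Assumed tile size: a 32x32 tile of doubles is 8 KiB, so a source tile
   plus a destination tile (16 KiB) fits in a typical L1 data cache. */
#define TILE 32

/* Transposes the n x n row-major matrix src into dst. A naive transpose
   reads src along rows but writes dst down columns, and for large n each
   column write touches a new cache line. Working one TILE x TILE block
   at a time keeps both the rows being read and the lines being written
   resident in cache for the duration of the inner loops. */
void transpose_tiled(double *dst, const double *src, size_t n)
{
    for (size_t ib = 0; ib < n; ib += TILE)
        for (size_t jb = 0; jb < n; jb += TILE) {
            /* Clamp the tile at the matrix edge when n % TILE != 0. */
            size_t iend = ib + TILE < n ? ib + TILE : n;
            size_t jend = jb + TILE < n ? jb + TILE : n;
            for (size_t i = ib; i < iend; i++)
                for (size_t j = jb; j < jend; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

    The result is identical to a naive double loop; only the order of memory accesses changes.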

    Quote Originally Posted by CodeSlapper
    How do you get functions to run in registers?
    Try to limit the number of heavily used variables to what the compiler can keep in registers when it optimizes the code. x86 in 64-bit mode has 16 general-purpose registers, and knowing that, I wrote a 4-way bottom-up merge sort that uses 10 working pointers and 1 integer (for the run size). The pointers are used without indexing or offsets, which gives a slight improvement in performance. When sorting a large array of 32-bit or 64-bit integers, it's about 15% faster than a 2-way bottom-up merge sort, and as fast as or slightly faster than quicksort.
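    For readers who want to see the shape of a 4-way bottom-up merge sort, here is a simplified sketch. It is not rcgldr's code: his version keeps 10 working pointers in registers and avoids indexing, while this one uses small arrays of pointers and a linear scan over the four run heads for brevity, so it should not be expected to reproduce the ~15% figure. The function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Merges up to four adjacent sorted runs [p[k], e[k]) into out.
   Picks the smallest head with a linear scan (at most 3 compares per
   element); on ties the earlier run wins, which keeps the sort stable. */
static void merge4(const int *p[4], const int *e[4], int *out)
{
    for (;;) {
        int best = -1;
        for (int k = 0; k < 4; k++)
            if (p[k] < e[k] && (best < 0 || *p[k] < *p[best]))
                best = k;
        if (best < 0)
            break;                  /* all four runs exhausted */
        *out++ = *p[best]++;
    }
}

/* Bottom-up 4-way merge sort of a[0..n). Returns 0 on success, or -1
   if the scratch buffer cannot be allocated (a is then unchanged). */
int merge_sort4(int *a, size_t n)
{
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    int *src = a, *dst = tmp;
    /* Each pass merges groups of four runs, so run size grows 4x. */
    for (size_t run = 1; run < n; run *= 4) {
        for (size_t lo = 0; lo < n; lo += 4 * run) {
            const int *p[4], *e[4];
            for (int k = 0; k < 4; k++) {
                size_t b = lo + k * run, f = b + run;
                p[k] = src + (b < n ? b : n);   /* clamp at array end */
                e[k] = src + (f < n ? f : n);
            }
            merge4(p, e, dst + lo);
        }
        int *t = src; src = dst; dst = t;      /* ping-pong buffers */
    }
    if (src != a)                              /* odd number of passes */
        memcpy(a, src, n * sizeof *a);
    free(tmp);
    return 0;
}
```

    Each pass moves every element once, and there are about log4(n) passes instead of log2(n), which is where the halved move count in the note comes from.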

    Note: a 4-way merge sort involves the same total number of operations as a 2-way merge sort, except that it does 1.5 times as many compares and 0.5 times as many moves. Since each compare has already read the data, the moves are essentially just writes. The compares are a bit more cache friendly than the moves (writes), which explains the small (~15%) increase in performance.
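    The arithmetic behind the note works out as follows, assuming n is a power of 4, roughly one compare per element moved in a 2-way merge, and up to three compares per element (picking the smallest of four run heads) in a 4-way merge:

```latex
\begin{aligned}
\text{2-way:}\quad & \log_2 n \text{ passes},\qquad
  \text{moves} = n\log_2 n,\qquad
  \text{compares} \approx n\log_2 n \\
\text{4-way:}\quad & \log_4 n = \tfrac12\log_2 n \text{ passes},\qquad
  \text{moves} = \tfrac12\, n\log_2 n,\qquad
  \text{compares} \approx 3n\cdot\tfrac12\log_2 n = 1.5\, n\log_2 n \\
\text{totals:}\quad & n\log_2 n + n\log_2 n = 2n\log_2 n
  \quad\text{vs.}\quad
  \tfrac12\, n\log_2 n + 1.5\, n\log_2 n = 2n\log_2 n
\end{aligned}
```

    So the total operation count is unchanged; the 4-way version trades writes for compares on data that has already been read.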
    Last edited by rcgldr; 03-12-2016 at 02:50 PM.

  8. #8
    Registered User
    Join Date
    Dec 2015
    Posts
    112
    Quote Originally Posted by rcgldr
    By trying to keep the inner loops of algorithms confined to memory spaces that fit within the cache. For multi-threaded programs, each core has its own L1 cache, and usually its own L2 cache. L3 cache and main memory are shared between cores.
    This is the sort of thing I'm looking for. How did you know to try to use the L1 cache as much as possible? Did you learn this in a course, a paper, a website, etc.?

  9. #9
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    I learned all this stuff by actually taking some computer architecture courses.

  10. #10
    Registered User
    Join Date
    Jul 2008
    Posts
    67
    Hi,

    These are good reads:
    Write Great Code, Volume 1: Understanding the Machine
    Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level

    Code Complete is also worth reading.

    Of course Agner Fog's Software optimization resources should have a place in your favourites ...

    Cheers
    Greenhorn__
    Last edited by Greenhorn__; 03-13-2016 at 09:47 AM.

  11. #11
    Registered User (Codeplug)
    Join Date
    Mar 2003
    Posts
    4,981
    >> Iterators and pointers make it harder for compilers to optimize code, just so you know.
    On the plus side, compiler optimizers keep getting better. Things like whole-program optimization give additional insight into how memory locations will actually be accessed, which allows optimizations that earlier generations of optimizers could not assume. I would also add that good const-correctness can help the optimizer, and so can the restrict keyword. There are also compiler-specific hints that help with branch prediction (e.g. the likely()/unlikely() macros in the Linux kernel, which wrap GCC's __builtin_expect).
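    Here is a minimal sketch of the restrict keyword (C99 and later) in use; the function name is hypothetical. Compared with the aliasing problem described earlier in the thread, restrict is the programmer's promise that the arrays do not overlap, which frees the compiler to keep values in registers and, where profitable, vectorize the loop.

```c
#include <stddef.h>

/* restrict promises that, inside this function, the objects reached
   through dst, a and b are accessed only through those pointers, i.e.
   the arrays do not overlap. The compiler may then skip the reloads and
   overlap checks it would otherwise need. Passing overlapping arrays to
   this function is undefined behavior. */
void add_arrays(double *restrict dst, const double *restrict a,
                const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

    Note that restrict is a promise, not a check: nothing stops a caller from passing overlapping arrays, so it belongs on functions whose contract already forbids overlap (as memcpy's does).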

    You'll also want to google "premature optimization".

    gg

  12. #12
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    Quote Originally Posted by Elysia
    Iterators and pointers make it harder for compilers to optimize code, just so you know. It makes it harder for the compiler to figure out what's going on. But still, it is good advice. Only if you have figured out that it's causing issues and you need to optimize it, should you consider other options.
    It depends on which containers and which implementations we are talking about. For std::vector, the difference will be minimal (if any). For the other containers, however, iterators generally do better than access by index (or by key, in the case of map and unordered_map), because they reduce the number of indirections that must otherwise be made (e.g. std::deque), and indirections are something the compiler cannot really optimize out. Also, templates (iterators) have almost no impact on machine code generation; these are different (generally independent) compilation phases.
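    Since this is the C forum, here is the indirection argument sketched in C with a toy segmented array, loosely modeled on how std::deque is commonly implemented (a table of pointers to fixed-size blocks). The struct, the block size and the function names are all hypothetical, and an optimizing compiler may well strength-reduce the index loop too; the sketch only shows where the extra work per access comes from.

```c
#include <stddef.h>

/* Hypothetical block size for the toy container. */
#define BLOCK 8

/* Toy segmented array: a table of pointers to fixed-size blocks. */
struct segarray {
    int **blocks;   /* table of block pointers */
    size_t size;    /* number of elements stored */
};

/* Index access: a divide and a modulo plus two dependent loads
   (block table, then element) on every call. */
static int seg_at(const struct segarray *s, size_t i)
{
    return s->blocks[i / BLOCK][i % BLOCK];
}

long sum_by_index(const struct segarray *s)
{
    long sum = 0;
    for (size_t i = 0; i < s->size; i++)
        sum += seg_at(s, i);
    return sum;
}

/* Iterator-style walk: look up each block pointer once, then step
   through the block with a plain pointer increment. */
long sum_by_cursor(const struct segarray *s)
{
    long sum = 0;
    size_t left = s->size;
    for (size_t b = 0; left > 0; b++) {
        const int *p = s->blocks[b];
        size_t count = left < BLOCK ? left : BLOCK;
        const int *end = p + count;
        for (; p != end; p++)
            sum += *p;
        left -= count;
    }
    return sum;
}
```

    Both functions return the same sum; the cursor version simply pays for the block-table lookup once per block instead of once per element.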
    Last edited by kmdv; 03-13-2016 at 12:45 PM.

  13. #13
    Registered User
    Join Date
    Mar 2016
    Posts
    2
    Think low level and write high level:
    nice strategy.
