Thread: SSE math

  1. #1
    Registered User
    Join Date
    Jun 2009
    Posts
    486

    SSE math

    I have heard that SSE is a great way to optimize repeated math operations, and I have been googling around for a tutorial, but almost all of them dive right into technical jargon I have never heard before, and all of them are based on C++, which I have no experience with at all. Does anybody know of a very basic tutorial where I can start from the bottom up?

    And yes, I am googling more as we speak; I just thought that maybe someone here already knew of something useful.

  2. #2
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    SSE is an instruction set, so standard C++ itself gives you no way to explicitly make use of the new instructions. You can either use assembly, or rely on the compiler to optimize your code for you. SSE has been around since 1999, so it's a good bet that compilers are already making use of these instructions.

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Indeed, some compilers can generate SSE code. In the past, I have tested this out with not-so-great results. But that's not to say that the latest versions aren't working better.

    Generally, using inline assembler is the best way to achieve good results, in my experience. Most compilers generate rather awkward results.

    If you don't mind locking down your code to Intel only, you could use Intel's compiler - it is probably the best for x86. Unfortunately, it means "can not run on AMD" executable files. It is also an expensive compiler, but I believe you can "try it out" for something like 30 days for free. That should be enough to show whether it's improving your code or not.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    There is a standard set of Intel Intrinsics for SSE. This is supported by VC, gcc, and the Intel compiler. It's a step up from writing assembler by hand, but you still remain in control of the exact instruction sequences.
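
    As a rough illustration of what the intrinsics look like in plain C, here is a minimal sketch that adds two float arrays four elements at a time. The helper name add_arrays and the HAVE_SSE macro are made up for this example; _mm_loadu_ps, _mm_add_ps and _mm_storeu_ps are the standard intrinsics from <xmmintrin.h>. It falls back to an ordinary loop when SSE is not available.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ... */
#define HAVE_SSE 1      /* hypothetical macro, local to this sketch */
#else
#define HAVE_SSE 0
#endif

/* add_arrays is a hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    /* Four floats per iteration: one packed add instead of four scalar adds. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* unaligned store of 4 sums */
    }
#endif
    /* Scalar tail, which is also the whole loop when SSE is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

    Each intrinsic maps more or less directly onto one instruction, which is why you stay in control of the instruction sequence while the compiler still handles register allocation.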

    See:

    How to Vectorize Code Using Intrinsics on 32-Bit Intel® Architecture - Intel® Software Network
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by brewbuck View Post
    There is a standard set of Intel Intrinsics for SSE. This is supported by VC, gcc, and the Intel compiler. It's a step up from writing assembler by hand, but you still remain in control of the exact instruction sequences.

    See:

    How to Vectorize Code Using Intrinsics on 32-Bit Intel® Architecture - Intel® Software Network
    And does gcc generate sensible code when using these? Past versions of MS compilers produced rather AWFUL code (mostly because they kept using the same registers for every single intrinsic function, and thus spilling/filling to/from memory for every line of intrinsics). But it may be better in gcc, and it may be better in 2008 - I haven't tried.


  6. #6
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    Yeah, I'm really new - been using C less than two months, so I have no idea what gcc does. I do know that turning on the -O3 flag cuts the runtime by almost an order of magnitude, so gcc must be doing something right ^_^

  7. #7
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Or you are doing something wrong (very pessimized code). I typically only get 2x-3x speed up with -O3 compared to -O0. But of course, it can differ greatly between different scenarios, and fast code (unoptimized) isn't necessarily good. Just make sure you are not pessimizing prematurely.

    And last time I checked, the Intel compiler is free (as in beer) on Linux, and commercial (with a timed trial) on Windows (and requires Visual Studio). Strange eh?

    been using C less than two months, so I have no idea what gcc does
    To find that out, you will probably need to use the -S switch to get gcc to output assembly, and read that (and of course, you need to learn assembly language before that). Leave that to matsp for now, and just focus on C.
    Last edited by cyberfish; 06-17-2009 at 08:25 AM.

  8. #8
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    Haha, no, the code isn't too bad. The runtime all comes down to square root calculations (up to a billion of them per run), so gcc might be replacing the sqrt with something more efficient, which makes a major difference. An order of magnitude may have been an exaggeration - it's actually about a 5-6x speedup.
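
    For a loop dominated by square roots, a sketch of the SSE2 approach might look like the following, assuming the values are doubles stored in an array. The name sqrt_array and the HAVE_SSE2 macro are invented for illustration; _mm_sqrt_pd and _mm_sqrt_sd are the real SSE2 intrinsics from <emmintrin.h>. Whether this beats what gcc already emits at -O3 is something only a measurement can tell.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_sqrt_pd, _mm_sqrt_sd, ... */
#define HAVE_SSE2 1     /* hypothetical macro, local to this sketch */
#else
#include <math.h>       /* portable fallback; may need -lm when linking */
#define HAVE_SSE2 0
#endif

/* sqrt_array is a hypothetical helper: out[i] = sqrt(in[i]) for i in [0, n). */
void sqrt_array(const double *in, double *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE2
    /* Two doubles per iteration with the packed square root. */
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(in + i);          /* unaligned load of 2 doubles */
        _mm_storeu_pd(out + i, _mm_sqrt_pd(v));
    }
    /* At most one element is left over: use the scalar square root. */
    for (; i < n; ++i) {
        __m128d v = _mm_load_sd(in + i);           /* low lane = in[i], high lane = 0 */
        _mm_store_sd(out + i, _mm_sqrt_sd(v, v));  /* sqrt of the low lane only */
    }
#else
    for (; i < n; ++i)
        out[i] = sqrt(in[i]);
#endif
}
```

    If the inputs were floats instead of doubles, the same pattern with _mm_sqrt_ps would handle four values per instruction.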

  9. #9
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by cyberfish View Post
    Or you are doing something wrong (very pessimized code). I typically only get 2x-3x speed up with -O3 compared to -O0. But of course, it can differ greatly between different scenarios,
    Just for reference, with the testing I did in this thread last week on (eg) this basic task:
    Code:
    int check_digit (char c) {
    	return ((c>='0') && (c<='9'));
    }
    The difference between -O3 and no flag was > 400%. Testing ctype.h's isdigit() the difference was even more severe (1200%).
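
    A comparison like this can be reproduced with a small harness along the following lines. This is only a sketch, not the code used for the numbers above: count_digits_timed and its parameters are hypothetical, and clock() measures CPU time rather than wall time. Build it once without flags and once with -O3, then compare the returned times.

```c
#include <time.h>

/* Same function as in the post above. */
int check_digit(char c)
{
    return (c >= '0') && (c <= '9');
}

/* count_digits_timed is a hypothetical harness: it calls check_digit
   `iterations` times, stores the number of hits in *hits, and returns
   the elapsed CPU time in seconds. */
double count_digits_timed(long iterations, long *hits)
{
    volatile long sink = 0;  /* volatile so the optimizer cannot drop the loop */
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
        sink += check_digit((char)('0' + i % 16));  /* cycles through '0'..'?' */
    clock_t end = clock();
    *hits = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

    The input cycles through sixteen characters, ten of which are digits, so the hit count doubles as a sanity check that both builds compute the same thing.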
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  10. #10
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    Yeah I saw that one. I would dearly love to get a good understanding of gcc's inner workings, but I suspect that it is still a fair bit beyond my current experience

  11. #11
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by KBriggs View Post
    Yeah I saw that one. I would dearly love to get a good understanding of gcc's inner workings, but I suspect that it is still a fair bit beyond my current experience
    I just started reading "Compilers: Principles, Techniques, & Tools". I should be thru that in a few years, and ready to start hacking the source big time

  12. #12
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by KBriggs View Post
    Yeah I saw that one. I would dearly love to get a good understanding of gcc's inner workings, but I suspect that it is still a fair bit beyond my current experience
    If you understand x86 assembly language (even a little bit), you can always look at the code gcc is producing. Instead of the -c compile switch, use -S. gcc will produce a .s assembly file instead of a .o object file.

    Adding the -fverbose-asm flag will insert some commentary into the code to make it (a little) easier to figure out what's what. With optimizations turned all the way up, the code can be difficult to map back to the source, but it's possible.

    I do it a lot -- if you need guidance, ask.
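
    To try this out, a small file such as the one below is enough. The file name scale.c and the function scale_sum are made up for the example; the command in the comment uses only the -O3, -S and -fverbose-asm flags discussed above.

```c
/* scale.c: a tiny function to inspect in assembly form.
 *
 *   gcc -O3 -S -fverbose-asm scale.c     (writes scale.s instead of scale.o)
 *
 * scale_sum is a hypothetical example function, not code from this thread.
 */
float scale_sum(const float *v, int n, float k)
{
    float total = 0.0f;
    for (int i = 0; i < n; ++i)
        total += v[i] * k;   /* look for this multiply and add in scale.s */
    return total;
}
```

    On x86-64, where gcc typically does float math in the SSE registers, the multiply usually shows up as mulss (one float) or mulps (four floats at once), which is a quick way to see whether the loop was vectorized.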

  13. #13
    Registered User
    Join Date
    Jun 2009
    Posts
    486
    I don't know any assembly at all, so it probably wouldn't be much help at the moment - but if you know of a good book or online tutorial where I could pick up the basics, let me know.

  14. #14
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by KBriggs View Post
    I don't know any assembly at all,
    Neither do I, but I think it is fairly simple (then gets complicated fast). It is all sets of instructions like

    MOV R1, varx

    where "MOV" is the instruction, R1 is the destination register, and varx is the memory variable it applies to (aside: I guess those are in the symbol table??)

    so it could be:

    ADD R1, R2, varx

    which would be like, add R2+varx and store the result in R1.

    Start with that and google. I'm sure there are some basic tutorials around...
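
    For a concrete taste of this from within C, here is a minimal sketch using GCC-style inline assembly. Note that real x86 instructions take two operands rather than the three shown above, so an add overwrites one of its inputs. The function name add_with_asm is hypothetical, and the asm syntax is a gcc/clang extension, so other compilers and CPUs take the plain C fallback.

```c
/* add_with_asm is a hypothetical helper: returns a + b, using a single
   x86 ADD instruction when built with gcc or clang for x86. */
int add_with_asm(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax puts the source first: "addl src, dst" means dst += src.
       %0 is a (read and written, in a register), %1 is b (read only).
       "cc" tells the compiler the flags register is modified. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;  /* portable fallback */
#endif
}
```

    gcc's own -S output uses the same AT&T operand order, which is the reverse of the Intel-style "MOV destination, source" order used in the example above.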
