Thread: Interview Question

  1. #1
    Registered User
    Join Date
    Feb 2011
    Posts
    6

    Interview Question

    Hey Guys,

    I am new to C board and wish I wasn't since it seems like there is a ton of good information presented here. In any case, I just finished an interview today and I was asked a particular c problem that stumped me and when I thought I was right, I was wrong. But still the interviewers explanation of why he was right does not convince me and that's why to further my understanding and stay true to the C programming language I want to get more experienced C programmer's input as to what is really going on.

    So basically the premise of the question:
    Compare the two code snippets below. What is it that they do and why is one faster than the other?
    Code:
    foo(char *_a, const char *_b, int _count)
    {
    do{
    *_a++ = *_b++;
    }while(--_count > 0);
    }
    Code:
    foo2(char *_a, const char *_b, int _count)
    {
    int n = (_count + 7)/8;
    switch(_count % 8) {
    case 0:  do { *_a++ = *_b++;
    case 7: *_a++ = *_b++;
    case 6: *_a++ = *_b++;
    case 5: *_a++ = *_b++;
    case 4: *_a++ = *_b++;
    case 3: *_a++ = *_b++;
    case 2: *_a++ = *_b++;
    case 1: *_a++ = *_b++;
    }while(--n> 0);
    }
    }
    My reasoning:
    Both functions are essentially doing the same thing: the pointer a is pointing to data stored into data pointed by b.

    The first one is faster because it has a lot less statements and will branch a lot less. Where as the second one deals with a lot of comparisons in the switch statements before it does any of its data manipulation. Therefore the second one implements comparisons/branch instructions and loops before it copies data and the first one just has one while loop without any branching thus is more efficient.

    The interviewers said the second one is much better because it reduces the loop by an order of n/8, where n is the number of loops. I don't really see how this works, I can see how the idea is more efficient, but to me the it seems the second code is working harder for no reason at all and I don't get how it achieves that

    If there is something I am missing here please let me know, cause I still don't see it. Thanks.
    Last edited by nndhawan; 02-17-2011 at 01:30 PM.

  2. #2
    Programming Wraith GReaper's Avatar
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,738
    In a Pentium CPU, we really don't know which one will be faster. Because, while the first does 8 times more iterations, the second does 8 times more comparisons. In worst cases though, when the branch is lost 50% of the time, the latter will be much much slower!
    Devoted my life to programming...

  3. #3
    Registered User
    Join Date
    May 2010
    Location
    Naypyidaw
    Posts
    1,314
    Last edited by Bayint Naung; 02-17-2011 at 01:22 PM.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    I agree with Sipher... that second setup is just plain silly.

    Doing modulus is one heck of a lot slower than counting.
    Using case statements is slower still.

    Plus underscores on variable names ir a bad idea, it risks colision with the decoration used in librarys and object files.

    Your interviewer needs to take some C programming lessons.

    EDIT: Thanks Bayint... I haven't seen Duffs Device in years. I thought it was finally gone...
    Last edited by CommonTater; 02-17-2011 at 01:23 PM.

  5. #5
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    The second one is faster because it does not have to decrease count for each byte copied.
    Every eight copy operations it does do one decrement of n but the rest of the time it does no other work than what it has to. Dropping through case statements is zero overhead.

  6. #6
    Programming Wraith GReaper's Avatar
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,738
    ... Inc or Dec are fast, really fast, you know.
    Devoted my life to programming...

  7. #7
    Programming Wraith GReaper's Avatar
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,738
    Yes, i forgot about jump tables.

    But still, the second version creates a long dependance chain that stalls the execution just as much.
    Devoted my life to programming...

  8. #8
    Registered User
    Join Date
    May 2010
    Location
    Naypyidaw
    Posts
    1,314
    The second version may *not* be faster.
    But that illustrates 2 features of C.
    switch fallthrough and
    switch() allows statements;

  9. #9
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Not to mention the risk of underrunning allocated memory if Count isn't a multiple of 8.

  10. #10
    Programming Wraith GReaper's Avatar
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,738
    Quote Originally Posted by CommonTater View Post
    Not to mention the risk of underrunning allocated memory if Count isn't a multiple of 8.
    No, that is encountered by placing the switches backwards.

    EDIT: But, it is valid C? Can cases be nested inside other repeat structures?
    Devoted my life to programming...

  11. #11
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Sipher View Post
    No, that is encountered by placing the switches backwards.

    EDIT: But, it is valid C? Can cases be nested inside other repeat structures?
    Yes, it is valid. As was mentioned, it's called Duff's device, and it used to be a significant optimization. It's just a loop that's been unrolled by hand instead of the compiler. I say "used to be" because most modern compilers will do this optimization for you automatically.

    I wouldn't challenge somebody on it during an interview, but I'd be surprised if a decent compiler didn't produce almost the exact same code for the two versions.

    EDIT: Another use of Duff's device is to create a loop with more than one entry point. It's good to have the option, but in most cases there are other ways to write it to avoid doing this weird stuff.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  12. #12
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Quote Originally Posted by CommonTater View Post
    Doing modulus is one heck of a lot slower than counting.
    Using case statements is slower still.
    Modulo is super fast. Probably 1 clock cycle like most integer operations. It is only done once. Especially since _count % 8 essentially masks off the lower 3 bits. So it doesn't even require a divide if the compiler optimizes. Whereas if you decrement a counter per byte copy it will impact performance that many times... potentially thousands or millions depending on the amount of data.
    Last edited by nonoob; 02-17-2011 at 01:36 PM.

  13. #13
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Of course you can nest cases inside loops of all kinds... even inside each other (that one's done in windows all the time).

  14. #14
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by nonoob View Post
    Modulo is super fast. Probably 1 clock cycle like most integer operations. It is only done once. Whereas if you need to decrement a counter per byte copy it will impact performance that many times... potentially thousands or millions depending on the amount of data.
    inc and dec are 1 clock cycle CPU instructions as well... part of the microcode, whereas modulo has to be calculated.

    A++ pretty much amounts to INC EAX; in assembler.

  15. #15
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by nonoob View Post
    Modulo is super fast. Probably 1 clock cycle like most integer operations. It is only done once. Whereas if you need to decrement a counter per byte copy it will impact performance that many times... potentially thousands or millions depending on the amount of data.
    Modulo by a power of two, if the compiler can tell it's a power of two, is super fast. In general, modulo is one of the slowest arithmetic operations if not the slowest.

    This is the stuff people shouldn't spend time worrying about. A good compiler will see this code, say "Hey, this is a memcpy" and just call memcpy() for you. memcpy() itself has all these optimizations already built into it.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. c Interview question.
    By bkankur in forum C Programming
    Replies: 3
    Last Post: 12-10-2009, 07:53 AM
  2. SDL buffer channels question
    By TriKri in forum Game Programming
    Replies: 3
    Last Post: 12-09-2009, 05:52 PM
  3. Newbie question, C #
    By mate222 in forum C# Programming
    Replies: 4
    Last Post: 12-01-2009, 06:24 AM
  4. another do while question
    By kbpsu in forum C++ Programming
    Replies: 3
    Last Post: 03-23-2009, 12:14 PM
  5. Question...
    By TechWins in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 07-28-2003, 09:47 PM