Thread: memset vs. for loop

  1. #1
    Registered User
    Join Date
    May 2011
    Posts
    13

    memset vs. for loop

    Hey,

    I was wondering if you guys could explain in detail the time-space differences between memset and a for loop. For example, consider the two pieces of code:

    Code:
    char arr[10];
    for (int i = 0; i < 10; i++)
    arr[i] = '\0';

    Code:
    char arr[10];
    memset( arr, '\0', sizeof(arr) );
    How much faster is the second version versus the first version? I know you save space since you don't need to allocate anything for i or get it's value, but how exactly does memset work? Does it go through the bytes specified and put in a 0, or does it do it more efficiently?

    Thanks.

  2. #2
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Ach1lles View Post
    Hey,

    I was wondering if you guys could explain in detail the time-space differences between memset and a for loop. For example, consider the two pieces of code:

    Code:
    char arr[10];
    for (int i = 0; i < 10; i++)
    arr[i] = '\0';

    Code:
    char arr[10];
    memset( arr, '\0', sizeof(arr) );
    How much faster is the second version versus the first version? I know you save space since you don't need to allocate anything for i or get it's value, but how exactly does memset work? Does it go through the bytes specified and put in a 0, or does it do it more efficiently?

    Thanks.
    It is very likely that memset internally uses a loop... so there's really not that much difference.

    This would be only marginally more efficient than a for() loop...
    Code:
    void *my_memset(void *dst, int c, size_t num)
      { while( num--)
           *(dst+num) = c;
         return dst; }
    But you can also do this to initize your array...
    Code:
    char arr[10] = {0};
    So it is actually cleared automatically...

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by CommonTater
    This would be only marginally more efficient than a for() loop...
    Actually, it would result in a compile error since you can neither do pointer arithmetic on nor dereference a pointer to void.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by laserlight View Post
    Actually, it would result in a compile error since you can neither do pointer arithmetic on nor dereference a pointer to void.
    Oh crap! You're right... My bad.

    Thanks for pointing that out...

  5. #5
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Tater's right about the initializer, it's a nice, easy, clean way, and probably calls memset under the hood. He's also right about memset using a loop internally (it has to). But, in many implementations, memset may be optimized to set entire words at once, instead of individual bytes. This would be a roughly 4x increase on a 32-bit system for a sufficiently large piece of memory. memset should never be slower than your loop, so there's no speed advantage to your loop, at best it would be a tie. Add to that the fact that using memset is good "code reuse", i.e. using a well-tested library instead of reinventing the wheel, and the clear winner is memset.

    Tater's my_memset has a slight problem of needing to cast dst (to a char *) before dereferencing it.

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > How much faster is the second version versus the first version?
    Says who?
    Some compilers may inline a memset, and others may call a function.

    For such a small amount of memory, a function call overhead would be significant.

    > I know you save space since you don't need to allocate anything for i or get it's value,
    Why do you think i would be allocated any space?
    An efficient compiler would just leave it in a register.
    Plus there is the whole "stack frame to call memset, plus whatever local variables it needs".

    > but how exactly does memset work?
    Exactly how you wrote it.

    > Does it go through the bytes specified and put in a 0, or does it do it more efficiently?
    It might do.
    Not all memset implementations take the same approach.

    > So it is actually cleared automatically...
    Well by the compiler, possibly calling memset, automatically anyway.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    Quote Originally Posted by laserlight View Post
    Actually, it would result in a compile error since you can neither do pointer arithmetic on nor dereference a pointer to void.
    Even working around that, there's no telling which would be more efficient in the general case. Compilers are tricky. They can unroll a loop of known size, making the loop in the OP more efficient than the code posted by CommonTater. Or replace the loop with a call to memset. Or inline memset when it's called as a function. Or just put the variable in BSS so it's initialized by the loader and generate no code for it at all.

    I know you save space since you don't need to allocate anything for i or get it's value
    Nope, there's nothing that says any space will be used for i at all. It's perfectly legal for the compiler to unroll the loop completely so i disappears.

    memset should never be slower than your loop, so there's no speed advantage to your loop, at best it would be a tie.
    Could be if the compiler can unroll the loop into 10 byte (or 2 word & 1 half word) stores and that's quicker than the function calling and other overhead associated with a general purpose memcpy() routine.

    Short answer - it's impossible to tell which code option in this thread is more efficient. More importantly - does it matter? Only way to know is to measure it.
    Last edited by KCfromNC; 08-02-2011 at 11:01 AM.

  8. #8
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Here's what you really need to know:
    Using memset you are telling it "what" to do, and using a for-loop, you are telling it "how" to do it.
    The general rule is that if you tell the compiler "what" to do and let it figure out the details of "how", then it will pick the most efficient way that it knows.
    If you tell it "how" to do the job, then it will follow your every command, even if it isn't the best way to do it.

    When the goal is to set all bytes in a consecutive range of memory to a certain value, memset is the best choice, because it gives the compiler the ability to pick the best implementation.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  9. #9
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    I was under the impression that memset and others were implemented in assembly for utmost speed. I realize there are "source" files for library functions but they are mostly for illustration purposes. A well established compiler would have tweaked the standard functions because compiled code is often benchmarked and their performance reflects the quality of the compiler / company. I'd be very shocked and surprised if a programmer's loop can outperform memset.

  10. #10
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by nonoob View Post
    I was under the impression that memset and others were implemented in assembly for utmost speed. I realize there are "source" files for library functions but they are mostly for illustration purposes. A well established compiler would have tweaked the standard functions because compiled code is often benchmarked and their performance reflects the quality of the compiler / company. I'd be very shocked and surprised if a programmer's loop can outperform memset.
    What gives you the idea that assembler code is always faster than C code?
    That hasn't actually been true since the days of interpreters.

    Think about what a compiler does... It tranlates C source in to object code... the same object code an Assembler creates... At that level C and ASM code is indistinguishable. If it were otherwise, the linker couldn't build the executable...

    Yes there are some cases where ASM is faster than C because of human optimizations using obscure op-codes the compiler does not. But as time marches on those cases are becoming fewer and a lot further between.

  11. #11
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    If you initialize a block of memory ahead of time, you can directly assign it if you cast it correctly. You can do it with variable length arrays too, provided you at least do one standard call on your reset block. It's even easier if you have a fixed size array.
    Code:
    static struct cheat
    {
        char array[ YOURSIZE ];
    } omghax;
    ...
    *(struct cheat *) yourarray = omghax;
    You can even do this with VLAs, but you can't use static to memset it the first time for you, and you can't use declaration-time-initialization so you have to actually memset it once (so this really only is helpful if you need to reset multiple times).

    You could do fun things like increment the pointer once if the byte count was odd, after setting the first byte to 0, then while it was a multiple of say 8, cast to an unsigned long long (or whatever is eight on your system), and increment the pointer as a ULL, then when it's a multiple of four, then a multiple of 2...

    There are all sorts of interesting ways to do the same task in C.


    Quzah.
    Hope is the first step on the road to disappointment.

  12. #12
    Registered User
    Join Date
    Mar 2009
    Posts
    344
    Quote Originally Posted by nonoob View Post
    I was under the impression that memset and others were implemented in assembly for utmost speed. I realize there are "source" files for library functions but they are mostly for illustration purposes. A well established compiler would have tweaked the standard functions because compiled code is often benchmarked and their performance reflects the quality of the compiler / company. I'd be very shocked and surprised if a programmer's loop can outperform memset.
    On the one hand, optimized general purpose memset - one which has to deal with different behavior in caching as the size changes, has to worry about the various combinations of cache and page table alignment of source and data arrays, has to worry about which instructions are implemented on various processors (SSE, SSE2, SSE3, SSE4, AVX, etc), needs to handle prefetching without causing cache and TLB misses, and so on to get the best average performance in all cases. On the other hand, you have a simple loop to initialize 10 characters to zero written in C with no function calling overhead. Which is going to be more efficient? Probably neither in this case - performance doesn't matter for something so insignificant. But even in general it's hard to answer.

    Plus in many compilers a simple loop like the one in the original example can be replaced with a few loads and stores right in line with the program - so you get the benefit of assembler code without the difficulty.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Memset
    By DickArmy in forum C Programming
    Replies: 3
    Last Post: 06-30-2009, 04:59 PM
  2. how to use memset for int?
    By manav in forum C Programming
    Replies: 4
    Last Post: 04-12-2008, 07:15 AM
  3. memset
    By l2u in forum C Programming
    Replies: 3
    Last Post: 07-03-2006, 04:16 PM
  4. memset, memcpy...
    By chrismiceli in forum C Programming
    Replies: 2
    Last Post: 08-31-2003, 11:42 PM
  5. memset()
    By Unregistered in forum C Programming
    Replies: 3
    Last Post: 08-11-2002, 09:34 PM