Thread: strcmp question

  1. #1
    Registered User
    Join Date
    Jun 2012
    Posts
    14

    strcmp question

    Here is the explanation for strcmp(), I've highlighted my focus:

    Explanation: Tests the strings for equality. Returns a negative number if string1 is less than string2, returns zero if the two strings are equal, and returns a positive number is string1 is greater than string2

    (I guess I just noticed a typo here: LINK)

    Anyway, I'm just having fun with it and noticing that the values are always -1 or 1 (or zero). Why? I was assuming the actual function would return str1[i] - str2[i] where i is the address where they are unequal (or both null).

  2. #2
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by lowestOne
    Anyway, I'm just having fun with it and noticing that the values are always -1 or 1 (or zero). Why?
    Because your standard library implementation does it that way.

    Quote Originally Posted by lowestOne
    I was assuming the actual function would return str1[i] - str2[i] where i is the address where they are unequal (or both null).
    I think you mean index rather than address. One problem with that subtraction method is that if char is unsigned, you could end up never getting a negative value. Furthermore, there is likely to be a loop, and the way it is written perhaps makes the subtraction not very useful to begin with (unlike say, just comparing two integers).
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Jun 2012
    Posts
    14
    The sign of the char makes no difference, a signed int is returned. Furthermore, of course it's in a loop, that's how I get to i in the first place.

    Code:
    int myStrcmp(const char* str1, const char* str2)
    {
        unsigned int i = 0;
        while (str1[i] != NULL && str2[i] != NULL && str1[i] == str2[i]) i++;
        return str1[i] - str2[i];
    }
    Seems like making it a -1 or 1 adds a little bulk, but I would also guess there was a good reason... would love to hear it.
    Last edited by lowestOne; 06-07-2012 at 10:39 PM.

  4. #4
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by lowestOne
    The sign of the char makes no difference, a signed int is returned.
    Yeah, but I was thinking of the subtraction itself. However, since there will be integral promotion, yeah, this is not a problem here. It can be a problem for larger integer types though, or for signed integer types with extreme negative values.

    Quote Originally Posted by lowestOne
    Furthermore, of course it's in a loop, that's how I get to i in the first place.
    I don't say "of course" because I'm being pedantic: this is implementation defined. I can implement strcmp without using an array index, for example. How would you implement strcmp? Show your code.

  5. #5
    Registered User
    Join Date
    Jun 2012
    Posts
    14
    That's what I just did. Thinking about it though, I guess it could be a bitwise subtraction. I would highly doubt it's a nested if statement. The only tough part is figuring out the four zeros to make a null character.

  6. #6
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by lowestOne
    I would highly doubt it's a nested if statement.
    Maybe it is. Or maybe it is a single if statement after the loop.

    If your standard library implementation's source is available, you could inspect it to see what is the implementation of strcmp.

    EDIT:
    Another thought that comes to mind is that the implementation could be using a set way of comparing two values. So, even though a subtraction is perfectly safe for all values here, they just used the same recipe, e.g., the one I tend to use for "intcmp" would be:
    Code:
    return (x < y) ? -1 : (x > y);
    But other possibilities exist, e.g.,
    Code:
    return (x > y) - (x < y);
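    Wrapped into functions, the two recipes can be checked against each other. This is only a sketch, and the function names are made up for illustration. Both give -1, 0 or 1 for any pair of ints, including pairs where x - y would overflow.

```c
/* Hypothetical names; both return -1, 0 or 1. */

/* Ternary form: -1 if x < y, otherwise the 0/1 result of (x > y). */
int intcmp_ternary(int x, int y)
{
    return (x < y) ? -1 : (x > y);
}

/* Branch-free form: each comparison yields 0 or 1, so the difference
   is -1, 0 or 1 and never overflows. */
int intcmp_branchless(int x, int y)
{
    return (x > y) - (x < y);
}
```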
    Last edited by laserlight; 06-07-2012 at 11:20 PM.

  7. #7
    Programming Wraith GReaper
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,738
    Actually, "str1[i] == str2[i] && str1[i] != '\0'" would be enough.
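    Dropped into the loop from post #3, that condition might look like the sketch below. The name my_strcmp is hypothetical. The casts to unsigned char are not in the original code; they are added here because the C standard says strcmp determines the sign of its result from the characters interpreted as unsigned char.

```c
#include <stddef.h>

/* Hypothetical name; a sketch, not any library's actual strcmp. */
int my_strcmp(const char *s1, const char *s2)
{
    size_t i = 0;

    /* If s1[i] == s2[i] and s1[i] is not the terminator, then s2[i]
       is not the terminator either, so one check is enough. */
    while (s1[i] == s2[i] && s1[i] != '\0')
        i++;

    /* Interpret both characters as unsigned char, then subtract as int.
       Assumes int is wider than char, so the difference cannot overflow. */
    return (int)(unsigned char)s1[i] - (int)(unsigned char)s2[i];
}
```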
    Devoted my life to programming...

  8. #8
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by lowestOne
    The sign of the char makes no difference, a signed int is returned.
    Signedness of the char type does make a difference to your function. If char is an unsigned type, then str1[i]-str2[i] will always yield a non-negative value (since unsigned integral types support modulo arithmetic operations). That value will then be converted to int. So your function will never return a negative value.

    If char is a signed type, then a subtraction that gives a result out of range for the type yields undefined behaviour.

    It is implementation dependent whether char is signed or unsigned. Either way, there is potential your code will work incorrectly.

    Quote Originally Posted by lowestOne
    Seems like making it a -1 or 1 adds a little bulk, but I would also guess there was a good reason... would love to hear it.
    Assuming that your function is required to work whether char is a signed or unsigned type, how would you ensure that it works?

    If by "adds a little bulk" you are worried about brevity or performance of your code, then you are barking up the wrong tree. If you really care about adding bulk, you wouldn't even introduce any new variables within your function
    Last edited by grumpy; 06-07-2012 at 11:41 PM.
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  9. #9
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by grumpy
    Signedness of the char type does make a difference to your function. If char is an unsigned type, then str1[i]-str2[i] will always yield a non-negative value (since unsigned integral types support modulo arithmetic operations). That value will then be converted to int. So your function will never return a negative value.
    Okay, I re-examined this, and yeah, the promotion will be to unsigned int, not int as I thought, so the problem remains. That said, depending on how conversions from unsigned int to signed int are implemented, the function might still return a negative value, but a standard library implementation may be designed to work without such an assumption.
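    One way to settle the promotion question for a particular platform is to test it. The sketch below uses a hypothetical helper, uchar_diff, and assumes int is wider than char, which is the usual case on desktop compilers. Under that assumption the integer promotions convert both unsigned char operands to int, so the difference can come out negative.

```c
/* Hypothetical helper: subtracts two unsigned char values the way
   str1[i] - str2[i] would when plain char is unsigned. */
int uchar_diff(unsigned char a, unsigned char b)
{
    /* Where int can represent every unsigned char value, both operands
       are promoted to int, so this is ordinary signed arithmetic. */
    return a - b;
}
```

    On such a platform uchar_diff(1, 2) gives -1. On an unusual platform where unsigned char is as wide as int, the operands would promote to unsigned int instead and the subtraction would wrap around.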

  10. #10
    - - - - - - - - oogabooga
    Join Date
    Jan 2008
    Posts
    2,808
    Quote Originally Posted by lowestOne
    the values are always -1 or 1 (or zero). Why? I was assuming the actual function would return str1[i] - str2[i] where i is the address where they are unequal (or both null).
    A proper implementation of strcmp would be written in assembly language, presumably using special assembly instructions for doing repetitive operations like this efficiently. Specifying that the return value has to be the difference between the differing characters would put an unnecessary constraint on the implementation.
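    Because only the sign of the result is specified, portable callers test the sign rather than a particular value. A minimal sketch, with hypothetical helper names sign_of and compare_normalized:

```c
#include <string.h>

/* Hypothetical helper: collapse any int to -1, 0 or 1. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Three-way string comparison that gives the same answer whether the
   library's strcmp returns -1/1 or a character difference. */
int compare_normalized(const char *a, const char *b)
{
    return sign_of(strcmp(a, b));
}
```

    Code that needs more than the sign, such as the actual distance between the differing characters, cannot get it portably from strcmp.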
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  11. #11
    Registered User
    Join Date
    Jun 2012
    Posts
    14
    Quote Originally Posted by grumpy
    If by "adds a little bulk" you are worried about brevity or performance of your code, then you are barking up the wrong tree. If you really care about adding bulk, you wouldn't even introduce any new variables within your function
    Here is my deal: I've taken intermediate C++ and Java, and now I'm taking beginner C. In class we're learning about things like "what is an array" and the scope of variables. Some things are interesting and new, but most of the time I have to entertain myself. So, why not replicate string.h?

    myStrcmp() is just the first function whose return values differ from those of string.h's strcmp(). I know strcmp() is "better" than mine, and the only way I can think of to return -1 or 1 is bulky.

    I did look at string.h in notepad, but I can't even tell where the function actually happens

  12. #12
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by lowestOne
    myStrcmp() is just the first function whose return values differ from those of string.h's strcmp(). I know strcmp() is "better" than mine, and the only way I can think of to return -1 or 1 is bulky.
    If you're talking about brevity, I've shown two possible one-liners (in C: the corresponding assembly is a different issue) in post #6.

    Quote Originally Posted by lowestOne
    I did look at string.h in notepad, but I can't even tell where the function actually happens
    That is just the header file, so you're only looking at the function declaration. You need to look at the implementation, which would typically be in a source file.

  13. #13
    Registered User
    Join Date
    Jun 2012
    Posts
    14
    The examples in post 6 have the computer performing the comparison twice. Also opens up the question of what the < operator is doing.

    I was able to find this LINK, which does show the source for a strcmp. Pretty similar to mine, but yeah, no new variables.

    Edit: Also noticing that '\0' is used rather than NULL. Figured this would be a #define somewhere. I'm doing all of this in c++ so I'm using <iostream>, which does have a #define NULL. (My C compiler is on the school's UNIX server, and I don't really feel like using vi in my spare time.)
    Last edited by lowestOne; 06-08-2012 at 12:35 PM.

  14. #14
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by lowestOne
    The examples in post 6 have the computer performing the comparison twice. Also opens up the question of what the < operator is doing.
    You are incorrect if you believe your code does not exhibit the same problems. You are trying to exploit subtraction to avoid comparison, and neglecting the fact that you can only do that because there is a mathematical relationship between equality/inequality comparisons and arithmetic (addition, subtraction) operations.

    There is no law of nature that says a subtraction is more efficient than doing two comparisons. In fact, if you look at the circuits for implementing integral subtractions, you will see they do multiple bitwise operations and comparisons (consider the "borrow" logic of subtraction). So there are more factors in play affecting efficiency than instruction count: the implementation of those instructions by the machine also matters.

    Quote Originally Posted by lowestOne
    Also noticing that '\0' is used rather than NULL. Figured this would be a #define somewhere.
    Use of '\0' is always correct. NULL is more usually employed for pointer comparisons, so your usage of it is bad style. It is defined in several standard headers (for example <stddef.h> and <stdlib.h> in C, <cstddef> and <cstdlib> in C++).
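    A short sketch of the distinction, using a hypothetical helper named safe_length: NULL is compared against the pointer itself, and '\0' against the characters it points to.

```c
#include <stddef.h>

/* Hypothetical helper: length of s, or 0 if s is a null pointer. */
size_t safe_length(const char *s)
{
    size_t n = 0;

    if (s == NULL)          /* pointer comparison: NULL */
        return 0;

    while (s[n] != '\0')    /* character comparison: '\0' */
        n++;

    return n;
}
```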

    Quote Originally Posted by lowestOne
    I'm doing all of this in c++ so I'm using <iostream>, which does have a #define NULL.
    You just got lucky. A lot of implementations leak a definition of NULL into headers like <iostream>, but there is no guarantee that all do.

  15. #15
    Lurking whiteflags
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote Originally Posted by lowestOne
    The examples in post 6 have the computer performing the comparison twice. Also opens up the question of what the < operator is doing.
    In addition to what grumpy said, that was actually source for a completely different function. I don't want to challenge you but if you read all of post 6 it has nothing to do with the function you want to write, apart from the fact that laserlight wanted to point out that you can compute the return result of a comparison function on one line. Doing the actual comparison on string data types takes more lines because it is required that you examine, potentially, the whole string: imagine if you called strcmp("helloW", "helloV");

    Quote Originally Posted by lowestOne
    I was able to find this LINK, which does show the source for a strcmp. Pretty similar to mine, but yeah, no new variables.
    You're wrong: If you are talking about the glibc code, it does introduce local variables. If you're not talking about that code, you're still wrong, because the other sources don't work.

    Quote Originally Posted by lowestOne
    Edit: Also noticing that '\0' is used rather than NULL. Figured this would be a #define somewhere.
    '\0' is a character constant. It's defined in the ASCII table and in other character sets. NULL is a pointer constant defined in <stddef.h>.

    It's always been a pet peeve of mine that people use NULL all the time to compare anything remotely like 0 no matter what type of thing is being compared. In a fit of rage, if I ever get the chance to write a C90 compiler I think I will make the NULL constant 0x18ec5. Then that won't work anymore.

    You know, even old hardware can compute millions of instructions in a second so even my laziest, unsnazziest attempt at strcmp() would work well.
    Last edited by whiteflags; 06-08-2012 at 04:16 PM.
