Thread: Scanset issue in C language

  1. #1
    Registered User
    Join Date
    Oct 2013
    Posts
    24

    Scanset issue in C language

    Hello Everyone!!


    While working through an online C quiz, I got stuck on a problem whose constraint was to accept only strings made up of the letters 'R' and 'S'. I searched online and found a useful resource about scansets: Scansets in C - GeeksforGeeks. However, it did not fully answer my question, because I need to read the input inside a while loop.


    Here is the snippet:

    Code:
    #if 1
    
    int main()
    {
        int loop=10;
        u8 str[122];
        while(loop--)
        {
            scanf("%[^\n]s", str);
            printf("%s\n",str);
        }
        return 0;
    }
    #endif // 1


    This code reads input only the first time and then prints the same string 10 times.
    I am not sure how to read a new string after each call to printf.


    Any help or hint would be much appreciated.

  2. #2
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    Your source has incorrect information. I don't know where people got the idea that a scanset is part of %s. It's not; %[ is a conversion specifier of its own.

    scanf(3): input format conversion - Linux man page
    [

    Matches a nonempty sequence of characters from the specified set of accepted characters; the next pointer must be a pointer to char, and there must be enough room for all the characters in the string, plus a terminating null byte. The usual skip of leading white space is suppressed. The string is to be made up of characters in (or not in) a particular set; the set is defined by the characters between the open bracket [ character and a close bracket ] character. The set excludes those characters if the first character after the open bracket is a circumflex (^). To include a close bracket in the set, make it the first character after the open bracket or the circumflex; any other position will end the set. The hyphen character - is also special; when placed between two other characters, it adds all intervening characters to the set. To include a hyphen, make it the last character before the final close bracket. For instance, [^]0-9-] means the set "everything except close bracket, zero through nine, and hyphen". The string ends with the appearance of a character not in the (or, with a circumflex, in) set or when the field width runs out.
    If that makes sense, then you ought to try "%121[RS]" instead. "%[^\n]s" would match a sequence of characters up to \n, and then expect an 's' in the input (and almost certainly fail there, since the character next in the input is known - '\n').

  3. #3
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    Is the Linux man page giving an example that is implementation-defined behaviour?

    From the C11 Standard (the others are the same):

    If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.

    Based on that I would expect [0-9] or [^0-9] to be implementation-defined even though I've never seen it not work as intended.

  4. #4
    Registered User
    Join Date
    Oct 2013
    Posts
    24
    Quote Originally Posted by whiteflags View Post
    Your source has incorrect information. I don't know where people got the idea that scanset was an %s thing, it's not, it's its own thing.

    scanf(3): input format conversion - Linux man page


    If that makes sense, then you ought to try "%121[RS]" instead. "%[^\n]s" would match a sequence of characters up to \n, and then expect an 's' in the input (and almost certainly fail there, since the character next in the input is known - '\n').
    I tried "%121[RS]", but after scanning the first time it still prints the same string 10 times.
    A few more things I tried:
    If I enter "RSRSY", the output is "RSRS", 10 times. But when I enter "YRSRS", the output is "Q".
    You also added '121', and I don't understand why...

    P.S.

    Additionally, I checked this reference, std::scanf, std::fscanf, std::sscanf - cppreference.com, and it was quite useful. @whiteflags @Hodor

    Changing your suggested scanf("%121[RS]s", str); to scanf(" %[RS]s", str); works.
    Explanation is:

    Whitespace characters: any single whitespace character in the format string consumes all available consecutive whitespace characters from the input (determined as if by calling isspace in a loop). Note that there is no difference between "\n", " ", "\t\t", or other whitespace in the format string.
    I still don't understand two things:
    1) Why did you add 121? If I am correct, is it the limit on how much input scanf takes?
    2) Why do I get irrelevant values when I enter the string "YRSRS"?
    Last edited by shaswat; 12-06-2017 at 02:23 AM.

  5. #5
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    Quote Originally Posted by shaswat View Post
    I tried "%121[RS]", but after scanning the first time it still prints the same string 10 times.
    A few more things I tried:
    If I enter "RSRSY", the output is "RSRS", 10 times. But when I enter "YRSRS", the output is "Q".
    You also added '121', and I don't understand why...
    Yes there is a reason your computer shows you that. When scanf() encounters input that does not match the format, the corresponding variables you passed it will not be changed. This may result in some, or all of the variables not being changed, depending on where the failure was. Additionally, the input that caused the mismatch will be left in the buffer for other input functions to read.

    I think I can show you exactly what I mean with a small program:
    Code:
    C:\Users\jk\Desktop>more scan.c
    #include <stdio.h>
    
    int main(void)
    {
       int leftover; /* int, not char: getchar() returns an int so that EOF stays distinct */
       char input[122];
       int loop = 10;
    
       while (loop--) {
          input[0] = '\0'; /* clear string */
          scanf(" %121[RS]", input);
          printf("\"%s\"\n", input);
       }
    
       printf("*** This was left in the input buffer: ");
       while ((leftover = getchar()) != '\n' && leftover != EOF) {
          putchar(leftover);
       }
    
       putchar('\n');
       return 0;
    }
    
    C:\Users\jk\Desktop>gcc scan.c -o scan
    
    C:\Users\jk\Desktop>scan
    RSRSY
    "RSRS"
    ""
    ""
    ""
    ""
    ""
    ""
    ""
    ""
    ""
    *** This was left in the input buffer: Y
    Hopefully, with this, you can see what scanf() actually read, versus what changes were actually made, and what was left in the input buffer. Similarly, we can run this program with different inputs, and see when scanf() actually reads something.
    Code:
    C:\Users\jk\Desktop>scan
    RSSSRSR
    "RSSSRSR"
    SRSR
    "SRSR"
    RSS
    "RSS"
    YRSSY
    ""
    ""
    ""
    ""
    ""
    ""
    ""
    *** This was left in the input buffer: YRSSY
    Certain inputs will cause scanf() to become stuck, and it won't work again until you clear the "bad" input away. There is a longer explanation of how scanf() is actually a complex function to use located here. I have heard people on here say, seriously, that people shouldn't use scanf() until they've implemented it themselves. I'm not telling you what to do, but please be aware of the substantial work that using scanf() correctly involves. There is a reason that the reference you linked, and the one that I linked, are so long.

    Changing your suggested scanf("%121[RS]s", str); to scanf(" %[RS]s", str); works.
    Note that this still has the problem of the trailing 's' character that I mentioned earlier. Please try to correct this misunderstanding, it may bite you in the future. Scanf is strict about following a format, and an s by itself means a literal s in the input.

    I still don't understand two things:
    1) Why did you add 121? If I am correct, is it the limit on how much input scanf takes?
    It is the maximum number of characters that scanf() will try to put into the string, not counting the terminating null byte, which is why a 122-character array gets a width of 121. Without a width, scanf("%s", str); is as bad as gets(): the input has no limit, and scanf() will happily write past the room you have allocated.

    2) Why do I get irrelevant values when I enter the string "YRSRS"?
    That I don't know, maybe Q was what was stored in the uninitialized string input before. As I said, the variable is left unchanged on failure, so whatever that Q was, it was there before scanf() was called. Initialize your variables.
    Last edited by whiteflags; 12-06-2017 at 05:45 AM.

  6. #6
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    Quote Originally Posted by Hodor View Post
    Is the Linux man page giving an example that is implementation-defined behaviour?

    From the C11 Standard (the others are the same):

    If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.

    Based on that I would expect [0-9] or [^0-9] to be implementation-defined even though I've never seen it not work as intended.
    Yes, it is, but that would be normal for a reference for a specific platform, like man is for Linux. Mind you, I cannot vouch for whether the information presented about Linux's implementation is accurate. It's beyond my scope of knowledge.

  7. #7
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    You should also be checking the return value of scanf.
    Code:
    if ( scanf("%121[RS]", str) == 1 ) 
    {
        // success
    }
    else
    {
        // throw away some input before
        // calling the same scanf again
    }
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  8. #8
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    945
    Quote Originally Posted by Hodor View Post
    Is the Linux man page giving an example that is implementation-defined behaviour?

    From the C11 Standard (the others are the same):

    If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.

    Based on that I would expect [0-9] or [^0-9] to be implementation-defined even though I've never seen it not work as intended.
    I suspect it's implementation-defined because a character range may depend on the character set used by the implementation. It works "as intended" with ASCII because letters and digits are contiguous in that character set, but they're not necessarily contiguous in some character sets like EBCDIC.

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    The C standard specifically states that 0 to 9 are contiguous.
    5.2.1 Character sets
    ...
    In both the source and execution basic character sets, the
    value of each character after 0 in the above list of decimal digits shall be one greater than
    the value of the previous.

  10. #10
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    945
    Quote Originally Posted by Salem View Post
    The C standard specifically states that 0 to 9 are contiguous.
    Oh, nice! I know characters 0 to 9 are contiguous in both ASCII and EBCDIC, but I didn't know that the C standard requires that of the implementation. That makes me wonder why the standard doesn't at least define the character range "[0-9]", even if it leaves other character ranges implementation-defined. Maybe support for character ranges adds too much overhead for some implementations? (Think embedded or other small systems.)

  11. #11
    Banned
    Join Date
    Aug 2017
    Posts
    861
    Are you trying to find the capital R or capital S, to then do with it what you will?
    Code:
    userx@slackwhere:~/bin
    $ ./only_CAP_R_or_S                 
    Enter some test string with or without
    Cap S or Cap R
    Hello how R you
    found R
    userx@slackwhere:~/bin
    $ ./only_CAP_R_or_S
    Enter some test string with or without
    Cap S or Cap R
    hello what are you a Sucker?
    found S
    $ ./only_CAP_R_or_S
    Enter some test string with or without
    Cap S or Cap R
    Toys R Us
    found R
    Something like that, so you know you got one?
    Last edited by userxbw; 12-06-2017 at 03:31 PM.

  12. #12
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    Quote Originally Posted by Salem View Post
    The C standard specifically states that 0 to 9 are contiguous.
    I was aware that 0 to 9 are guaranteed to be contiguous, however this doesn't (in my mind) clear up the range "issue" in the format specifier in scanf because all it says is that it's implementation-defined and no exceptions are specified. The 0 to 9 being contiguous is in another part of the standard unrelated to scanf.

  13. #13
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    To me it's just another area where the standard is purposefully vague in order to make things easier to implement.
