Question related to scanf's implementation.


  1. #1
    Registered User
    Join Date
    Nov 2006
    Location
    Coimbra, Portugal
    Posts
    64

    Question related to scanf's implementation.

    Code:
    scanf("%d", &my_integer);
    How does scanf convert the characters from stdin into an int (in this case, my_integer)? I know it's possible to do this:

    Code:
    //Assume that read_a_string_from(stdin) exists.
    //Also assume that it contains nothing but a positive number; this code won't work properly for negative numbers.
    char* my_string = read_a_string_from(stdin);
    int result = 0;
    size_t i;
    for (i = 0; my_string[i] != '\0'; ++i) {
      //Subtract '0' to turn the character code into the digit's value (0-9).
      int current_digit = my_string[i] - '0';
      result = result*10 + current_digit;
    }
    return result;
    Is this what scanf actually does? If not, how is it implemented to handle integers?

    Thank you in advance... And I hope this makes sense.
    Last edited by Mr_Miguel; 09-19-2007 at 12:38 PM. Reason: Corrected code according to post #3
    Name: Miguel Martins
    Date of birth: 14th August 1987

    "He who hesitates is lost."

  2. #2
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,434
    You could always grab the source code for the GNU glibc implementation.

    But you're partly on the right track.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  3. #3
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,046
    Code:
    result *= 10 + current_digit;
    That's the same as
    Code:
    result = result * (10 + current_digit);
    But you probably want
    Code:
    result = (result * 10) + current_digit;
    So perhaps
    Code:
    result *= 10;
    result += current_digit;
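To see the difference concretely, here is a small sketch that applies both forms to the same values. The function names are made up for illustration; they don't come from any library.

```c
#include <assert.h>

/* Hypothetical helper: what "result *= 10 + current_digit" computes.
   The whole right-hand side is evaluated first, so this is
   result * (10 + current_digit). */
int append_digit_wrong(int result, int current_digit)
{
    result *= 10 + current_digit;
    return result;
}

/* Hypothetical helper: the intended "shift one decimal place, then add". */
int append_digit_right(int result, int current_digit)
{
    assert(current_digit >= 0 && current_digit <= 9);
    result *= 10;
    result += current_digit;
    return result;
}
```

With result = 12 and current_digit = 3, the first version gives 12 * 13 = 156, while the second gives 123, which is what you want when appending a digit.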
    I'm sure if you searched around you'd find lots of code that converts strings to numbers. Here's a clone of atof() I wrote recently: atof: string to float
    Note that I did not test that code, and that instead of converting integers it converts floating-point numbers in general form (I think that's what they're called . . .) and not 1.3E-4. It's not the best example, I just posted it because it's fresh in my memory.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  4. #4
    Registered User
    Join Date
    Nov 2006
    Location
    Coimbra, Portugal
    Posts
    64
    Quote Originally Posted by dwks View Post
    Code:
    result = result * (10 + current_digit);
    Right. My mistake. Fixed.

    Quote Originally Posted by dwks View Post
    I'm sure if you searched around you'd find lots of code that converts strings to numbers.
    I'm aware of that. However, I was interested in scanf's code in particular. It's just out of curiosity.

    I'm currently looking at glibc. scanf's code seems pretty complicated. But I'll do my best to figure it out. Thanks.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Unfortunately, the implementation of GLIBC's scanf isn't entirely straightforward. scanf itself is trivial: it just does some stuff to get hold of the arguments, and then calls the generic version with stdin as one of the parameters. The generic code is definitely not straightforward, though, and it isn't helped by the fact that it is full of #if directives that exclude some bits from being compiled in, depending on which version of scanf is intended (e.g. widechar or "narrow" char), as well as various flags to support varying options of compilers, OS architectures, etc.

    The source is here:
    http://sourceware.org/cgi-bin/cvsweb...&cvsroot=glibc

    And the bit that actually converts the number for %d, etc, looks like this:
    Code:
                {
                  if (flags & NUMBER_SIGNED)
                    num.l = __strtol_internal (wp, &tw, base, flags & GROUP);
                  else
                    num.ul = __strtoul_internal (wp, &tw, base, flags & GROUP);
                }
    But that is many many lines after the bit of code that says "is this %d", and the code in between is related to the input and conversion of the number too.
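As far as I can tell, the public strtol() does essentially the same conversion as the internal call above (without the grouping flag), so you can try the behaviour without reading glibc. A minimal sketch, assuming the input is already in a string; parse_int_prefix is a name I made up, not something from the library:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper around the standard strtol().
   Returns 1 and stores the value in *out if text starts with a decimal
   integer that fits in an int; returns 0 otherwise.
   *rest (if not NULL) is set to the first character that was not consumed. */
int parse_int_prefix(const char *text, int *out, const char **rest)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtol(text, &end, 10);    /* base 10, as %d uses */

    if (rest != NULL)
        *rest = end;
    if (end == text)                   /* no digits were found */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                      /* out of range for int */

    *out = (int)value;
    return 1;
}
```

The end pointer is the interesting part: it tells the caller where the number stopped, which is roughly the information scanf needs in order to carry on with the rest of the format string.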

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Registered User
    Join Date
    Nov 2006
    Location
    Coimbra, Portugal
    Posts
    64
    Quote Originally Posted by matsp View Post
    Code:
                {
                  if (flags & NUMBER_SIGNED)
                    num.l = __strtol_internal (wp, &tw, base, flags & GROUP);
                  else
                    num.ul = __strtoul_internal (wp, &tw, base, flags & GROUP);
                }
    Mats
    Which leads to strtol's implementation, which I can find in glibc's strtol_l.c
    Their implementation seems to be (somewhat) equivalent to mine:

    Code:
    use_long:
      i *= (unsigned LONG int) base;
      i += c;
      //Then they test for errors and whatnot... Then this:
      return negative ? -i : i;
    Assuming, of course, base is 10, which is what the %d conversion expects. Of course, they also have code to handle other bases.
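To make the "other bases" part concrete, here is a heavily simplified sketch of the same loop with a sign and a base parameter. It is my own stand-in, not glibc's code: simple_strtol is a made-up name, and it skips the overflow checks, "0x" prefixes and locale handling that the real function has.

```c
#include <assert.h>
#include <ctype.h>

/* Simplified, hypothetical stand-in for strtol(): no overflow checks,
   no "0x" prefix handling, no locale support.
   Assumes the letters A-Z are contiguous in the character set (true for ASCII). */
long simple_strtol(const char *s, int base)
{
    unsigned long i = 0;
    int negative = 0;

    assert(base >= 2 && base <= 36);

    while (isspace((unsigned char)*s))
        ++s;
    if (*s == '-' || *s == '+') {
        negative = (*s == '-');
        ++s;
    }

    for (; *s != '\0'; ++s) {
        unsigned long c;
        if (isdigit((unsigned char)*s))
            c = (unsigned long)(*s - '0');
        else if (isalpha((unsigned char)*s))
            c = (unsigned long)(toupper((unsigned char)*s) - 'A' + 10);
        else
            break;
        if (c >= (unsigned long)base)   /* digit not valid in this base */
            break;
        i *= (unsigned long)base;       /* same two steps as in the excerpt */
        i += c;
    }
    return negative ? -(long)i : (long)i;
}
```

For example, simple_strtol("ff", 16) gives 255, and simple_strtol("129", 8) stops at the '9' and gives 10 (octal 12).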

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Mr_Miguel View Post
    Which leads to strtol's implementation, which I can find in glibc's strtol_l.c
    Their implementation seems to be (somewhat) equivalent to mine:

    Code:
    use_long:
      i *= (unsigned LONG int) base;
      i += c;
      //Then they test for errors and whatnot... Then this:
      return negative ? -i : i;
    Assuming, of course, base is 10, which is what the %d conversion expects. Of course, they also have code to handle other bases.
    Yes, there aren't that many ways to convert a string to an integer. There are some more convoluted or stupid ways, but essentially you walk over the string, multiplying by the base and adding the current digit, until there are no more digits.

    Float isn't much worse, but after the decimal separator, you need to keep track of a multiplier and divide the multiplier by 10 for each digit, and add "current digit times multiplier" (there are other ways, but that's roughly what it amounts to).


  8. #8
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,235
    The implementation of C standard functions is obviously implementation specific. It only makes sense to ask about specific implementations.

    For all we know, scanf() converts digits to integers by writing them on a card and passing it to a midget in a cedar chest, who computes the answer and delivers it via carrier pigeon.

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by brewbuck View Post
    The implementation of C standard functions is obviously implementation specific. It only makes sense to ask about specific implementations.

    For all we know, scanf() converts digits to integers by writing them on a card and passing it to a midget in a cedar chest, who computes the answer and delivers it via carrier pigeon.
    Yes, that's entirely true. I'm sure that Microsoft's C library code is also quite complex, but it's most likely also quite different from the glibc code, and both of those are different from the C library supplied by Borland for their Turbo C compiler, of course.

    Edit: and as far as I'm aware, none of the three C-libraries use cards, midgets or cedar chests, but one can never really know for sure.


  10. #10
    Registered User
    Join Date
    Nov 2006
    Location
    Coimbra, Portugal
    Posts
    64
    I know what you both mean, but one might question: isn't the algorithm to convert digits the same in all of them, despite the differences in code? Or do those code differences actually lead to different algorithms?

    Personally, I don't know any other algorithm than the one that has been seen on this thread. Regardless, glibc's implementation has somewhat satisfied my curiosity.

  11. #11
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Mr_Miguel View Post
    I know what you both mean, but one might question: isn't the algorithm to convert digits the same in all of them, despite the differences in code? Or do those code differences actually lead to different algorithms?

    Personally, I don't know any other algorithm than the one that has been seen on this thread. Regardless, glibc's implementation has somewhat satisfied my curiosity.
    I would expect that it's fairly similar, but perhaps not identical. There aren't that many different ways to convert a string into a number (excluding methods involving midgets and cedar chests or similar exotic solutions). You could of course find a scanf() where the number conversion isn't implemented as an external function, but as an inlined piece of conversion code, but the essence of it will be "multiply by base, add digit".
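As a rough idea of what such an inlined version might look like, here is a sketch that reads straight from a stream instead of calling strtol. The function name and its exact behaviour are my own assumptions, not taken from any real C library:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical sketch of an inlined "%d"-style reader: skips leading
   whitespace, reads an optional sign and decimal digits from stream,
   and pushes the first non-digit back with ungetc().
   Returns 1 on success, 0 if no digits were found. No overflow checks. */
int read_decimal(FILE *stream, int *out)
{
    int c;
    int negative = 0;
    int seen_digit = 0;
    int result = 0;

    assert(stream != NULL && out != NULL);

    do {
        c = getc(stream);
    } while (c != EOF && isspace(c));

    if (c == '-' || c == '+') {
        negative = (c == '-');
        c = getc(stream);
    }

    while (c != EOF && isdigit(c)) {
        result = result * 10 + (c - '0');   /* multiply by base, add digit */
        seen_digit = 1;
        c = getc(stream);
    }

    if (c != EOF)
        ungetc(c, stream);   /* leave the terminating character unread */

    if (!seen_digit)
        return 0;
    *out = negative ? -result : result;
    return 1;
}
```

Calling read_decimal(stdin, &my_integer) would then play roughly the role of scanf("%d", &my_integer), minus the error handling a real library needs.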

