Thread: scanf "%as" (dynamic char arrays)

  1. #1
    Registered User
    Join Date
    Jan 2010
    Posts
    16

    scanf "%as" (dynamic char arrays)

    I have tried many things to get this to work on several versions of Windows (XP, Vista, Windows 7). It works fine on Linux. I have updated my compilers to the latest versions. I have tried both MinGW and Cygwin, and neither can seem to take a file redirected through stdin. I get no errors, mind you, and no warnings even with -Wall.

    This is my code that does not work:
    Code:
    typedef struct {
      int wordSizeMax;
      int *hashCount;
    } StructCountArray;
    
    StructCountArray buildSCA(void) {
      StructCountArray SCA;
      SCA.wordSizeMax = 10;
      char *wordFoundInFile;
      SCA.hashCount = calloc(SCA.wordSizeMax, sizeof(*SCA.hashCount));
      int wordLength;
    
    
      while(scanf("%as", &wordFoundInFile) != EOF) {
        if((wordLength = strlen(wordFoundInFile)) > SCA.wordSizeMax) {
          SCA = expandArraySize(SCA, wordLength);
        }
        SCA.hashCount[wordLength]++;
        free(wordFoundInFile);
      }
      return SCA;
    }
    This is my code that DOES work but does not implement dynamic char arrays
    Code:
    typedef struct {
      int wordSizeMax;
      int *hashCount;
    } StructCountArray;
    
    StructCountArray buildSCA(void) {
      StructCountArray SCA;
      SCA.wordSizeMax = 10;
      char wordFoundInFile[1024];
      SCA.hashCount = calloc(SCA.wordSizeMax, sizeof(*SCA.hashCount));
      int wordLength;
    
    
      while(scanf("%s", wordFoundInFile) != EOF) {
        if((wordLength = strlen(wordFoundInFile)) > SCA.wordSizeMax) {
          SCA = expandArraySize(SCA, wordLength);
        }
        SCA.hashCount[wordLength]++;
        free(wordFoundInFile);
      }
      return SCA;
    }

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by brightmatter View Post
    This is my code that does not work:

    This is my code that DOES work but does not implement dynamic char arrays
    NEITHER of those involves allocating for a dynamic array. Obviously you've just been getting lucky with one particular compiler.

    Code:
      char *wordFoundInFile;
      SCA.hashCount = calloc(SCA.wordSizeMax, sizeof(*SCA.hashCount));
      int wordLength;
    
      while(scanf("%as", &wordFoundInFile) != EOF) {
    Simply calling scanf() will not allocate space for wordFoundInFile. That's a buffer overflow.

    The usual way to approach this is to read into a sufficiently large buffer in your loop (such as 1024 bytes), then take its strlen() and malloc a separate buffer of that size (plus one for the terminator) to copy the word into.

    Also, this may compile:
    Code:
    char wordFoundInFile[1024];
    free(wordFoundInFile);
    But it will not run, whatever you want to claim.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #3
    Registered User
    Join Date
    Jan 2010
    Posts
    412
    Quote Originally Posted by MK27 View Post
    NEITHER of those involves allocating for a dynamic array. Obviously you've just been getting lucky with one particular compiler.
    scanf(3): input format conversion - Linux man page
    An optional 'a' character. This is used with string conversions, and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialised before the call). The caller should subsequently free(3) this buffer when it is no longer required. This is a GNU extension; C99 employs the 'a' character as a conversion specifier (and it can also be used as such in the GNU implementation).
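
    For anyone trying the allocating conversion today, here is a minimal sketch. It assumes a C library that implements the POSIX.1-2008 "m" modifier (glibc 2.7 or later does; "%ms" is the standardized spelling of the older GNU-only "%as"). It will most likely not work with a runtime that lacks the extension, which is the situation described in this thread. The function name longest_word is made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads whitespace-delimited words from `in` and returns the length of the
 * longest one (0 if there were none). Each word is allocated by fscanf()
 * through the "m" modifier and must be freed by the caller.
 * Hypothetical helper, not part of the original poster's code. */
size_t longest_word(FILE *in)
{
    char *word = NULL;
    size_t longest = 0;

    /* Compare against 1 (one item converted) rather than EOF. */
    while (fscanf(in, "%ms", &word) == 1) {
        size_t len = strlen(word);
        if (len > longest)
            longest = len;
        free(word);   /* the library allocated it, we release it */
        word = NULL;
    }
    return longest;
}
```

    Calling longest_word(stdin) would process a file redirected to the program. Testing the return value against 1 instead of EOF also stops the loop cleanly if a conversion ever fails.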

  4. #4
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Hey, wow I did not know about that and failed to notice and investigate the "a". Sorry brightmatter.

    Anyway, GNU extensions require the GNU libraries, and I would presume that using gcc on Windows still means linking against the MS libraries.

  5. #5
    Registered User
    Join Date
    Jan 2010
    Posts
    16
    So what you are saying is it is not possible to have this useful little gem while running on windows. Is this why Windows has so many buffer overflows? Isn't there any way to use scanf safely? By safe, I mean not having to enter a static buffer value before receiving the string.

  6. #6
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    No, it means it's not possible with a non-GNU compiler + libraries. I don't know if the GNU libraries exist on Windows.
    Anyway, yes, it is possible. I don't know about scanf, but fgets works. You just need a little trickery. You pass fgets the size of your buffer, and if the buffer isn't sufficiently large, the last character read won't be a '\n'. So you allocate more memory and repeat.
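
    The fgets trick described above could look roughly like this. It is only a sketch using standard C: the name read_line and the starting capacity of 16 are arbitrary choices for the example, and it does not handle input containing embedded NUL bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of any length from `in`. Returns a malloc'd string without
 * the trailing newline, or NULL at end of input, on a read error, or if
 * allocation fails. The caller must free() the result.
 * Hypothetical helper name chosen for this example. */
char *read_line(FILE *in)
{
    size_t cap = 16;   /* small on purpose so the growth path gets used */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), in) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';   /* complete line: drop the newline */
            return buf;
        }
        /* No newline yet: the buffer was too small, so grow and continue. */
        size_t new_cap = cap * 2;
        char *tmp = realloc(buf, new_cap);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap = new_cap;
    }

    /* fgets() returned NULL: nothing read at all, or a read error. */
    if (len == 0 || ferror(in)) {
        free(buf);
        return NULL;
    }
    return buf;   /* last line of the input had no trailing newline */
}
```

    Typical use would be a loop such as `while ((line = read_line(stdin)) != NULL) { ...; free(line); }`.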

    The reason there are so many overflows is that C wasn't and isn't designed with security in mind. It gives you the gun, but it lets you use the gun as you please.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  7. #7
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by brightmatter View Post
    So what you are saying is it is not possible to have this useful little gem while running on windows. Is this why Windows has so many buffer overflows? Isn't there any way to use scanf safely? By safe, I mean not having to enter a static buffer value before receiving the string.
    You can limit the input length this way:
    Code:
    scanf("%1023s", string);
    That will read a max of 1023 characters in, to prevent overflow. It should work everywhere.
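
    A small sketch of the width-limited read, wrapped in a function so the buffer size and the width cannot drift apart. The macro names and read_word_bounded are invented for this example; the stringizing trick is plain standard C preprocessor behaviour.

```c
#include <stdio.h>

#define WORD_MAX 1023              /* most characters stored per read */
#define WORD_STR_(x) #x
#define WORD_STR(x) WORD_STR_(x)   /* expands WORD_MAX, then stringizes it */

/* Reads the next whitespace-delimited word from `in` into `buf`, which must
 * have room for WORD_MAX + 1 bytes. Returns 1 if a word was read and 0 at
 * end of input. Hypothetical helper for illustration. */
int read_word_bounded(FILE *in, char buf[WORD_MAX + 1])
{
    /* The format becomes "%1023s". The width counts characters only;
     * fscanf() appends the terminating '\0' on top of that. */
    return fscanf(in, "%" WORD_STR(WORD_MAX) "s", buf) == 1;
}
```

    Note that the buffer needs one byte more than the width, which is why "%1023s" pairs with a 1024-byte array.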

  8. #8
    Registered User
    Join Date
    Jan 2010
    Posts
    16
    So my options are to corrupt (truncate) the data or to have an overflow?? That just seems wrong somehow. Nothing against you; my objection is to the idea of a hard-coded limit like that. Honestly, this function is not making a great first impression. It just seems broken.

  9. #9
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Then perhaps you shouldn't be using C in the first place.
    I agree with you that a lot of C functions are broken. A lot of them are insecure. A lot of them truncate data instead of just allocating the necessary amount...
    But there is nothing you can do except suck it up and live with it. Or switch languages.

  10. #10
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by brightmatter View Post
    So my options are to corrupt(truncates) the data or to have an overflow??
    No. If I set the limit at 12, and the data is 20 bytes, there will be 8 bytes left in the input buffer. That will be the next thing read.
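
    To make the 12-out-of-20 case concrete, here is a sketch that uses sscanf() on a string in place of stdin (the leftover behaviour is the same). The helper name split_at_limit is made up; "%n" is standard and records how many characters have been consumed so far.

```c
#include <stdio.h>

/* Reads one token from `src` in two steps with a 12-character limit.
 * `first` and `rest` must each hold at least 13 bytes.
 * Hypothetical helper used only to show where the leftover goes. */
void split_at_limit(const char *src, char first[13], char rest[13])
{
    int used = 0;

    first[0] = '\0';
    rest[0] = '\0';

    /* First read stops after 12 characters; %n stores the offset reached. */
    if (sscanf(src, "%12s%n", first, &used) == 1) {
        /* Second read picks up exactly where the first one stopped. */
        sscanf(src + used, "%12s", rest);
    }
}
```

    With a 20-character token, `first` ends up holding the first 12 characters and `rest` the remaining 8, so nothing is lost and nothing overflows.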

    So if you are worried, just write a function that reads in a loop with a fixed buffer size and mallocs storage for the result. If the buffer came back completely full (test with strlen), iterate again and realloc the storage until you have the whole string.

    A little tedious, but in reality:
    1) this is almost never the situation. Use a 4 or 8 or (heck) 64 KB buffer.
    2) if you write a simple generic library function to do that, you only have to do it once.
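
    One possible shape for such a generic function, as a sketch only. It reads a whitespace-delimited word in 63-character chunks and grows the result with realloc. Because "%s" skips leading whitespace, a full chunk alone cannot tell you whether the word continues, so this version peeks at the next character with getc()/ungetc(). The name read_word and the chunk size are arbitrary choices for the example.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 63   /* must match the width in the "%63s" formats below */

/* Reads the next whitespace-delimited word of any length from `in`.
 * Returns a malloc'd string (caller frees), or NULL at end of input or
 * if allocation fails. Hypothetical helper for illustration. */
char *read_word(FILE *in)
{
    char chunk[CHUNK + 1];
    char *word = NULL;
    size_t len = 0;

    /* The first read skips leading whitespace, as %s always does. */
    if (fscanf(in, "%63s", chunk) != 1)
        return NULL;

    for (;;) {
        size_t n = strlen(chunk);
        char *tmp = realloc(word, len + n + 1);
        if (tmp == NULL) {
            free(word);
            return NULL;
        }
        word = tmp;
        memcpy(word + len, chunk, n + 1);   /* copies the '\0' as well */
        len += n;

        if (n < CHUNK)
            return word;   /* chunk not full: the word ended here */

        /* Chunk was full. Peek: whitespace or EOF means the word is done. */
        int c = getc(in);
        if (c == EOF)
            return word;
        ungetc(c, in);
        if (isspace(c))
            return word;

        /* The word continues: read the next piece of it. */
        if (fscanf(in, "%63s", chunk) != 1)
            return word;
    }
}
```

    A word-counting loop could then be written as `while ((w = read_word(stdin)) != NULL) { ...; free(w); }` with no fixed upper limit on word length.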

    If you are dealing with strings that could be anywhere from a few characters to megabytes long redirected to stdin, scanf() is the wrong function to be using. Generally, scanf() is most useful for actual keyboard input.

    Used properly, C does not have any limitations.
    Last edited by MK27; 03-05-2010 at 05:31 PM.

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Edit:
    Never mind, MK27 has said everything I was going to say anyway.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
