Thread: help with text input

  1. #1
    Registered User
    Join Date
    Mar 2007
    Posts
    28

    help with text input

    Alright, I'm working on a new program. I need to write something that will read a text file and count the number of alphabetic characters, digits, punctuation characters, and whitespace characters, and then report my findings. This is what I've got so far.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    int main()
    {
          int alpha = 0;
          int digit = 0;
          int punct = 0;
          int wspace = 0;
          FILE* sp1;
          int input;
          while ((input = fgetc(sp1)) != EOF)
                {
                if (isalpha(input))
                   alpha++;
                else if (isdigit(input))
                   digit++;
                else if (ispunct(input))
                   punct++;
                else if (isspace(input))
                   wspace++;
                 }//while
          printf("alphabetic character: %d \n digits: %d \n punctuations: %d \n whitespace characters: %d \n", alpha, digit, punct, wspace);
          system("PAUSE");
          return 0;
    }
    I'm not sure what exactly to do. It compiles, but how do I test it? Do I need to make a file called sp1 in the same directory as the source file? Also, in my programming book input is of class int (which is why I used it), but what I don't get is: if input is an int, how will input pick up punctuation, whitespace, and characters?

    Also, since I don't know how to test it, I don't know if my code is correct. Feel free to nitpick the code as well ^^.

  2. #2
    Registered User divineleft
    Join Date
    Jul 2006
    Posts
    158
    The variable sp1 is just the name of a variable, not the name of the file. To work with the file, you have to open it with the function fopen and close it afterwards with fclose. Here is the modified source code:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    
    int main()
    {
    	int alpha = 0;
    	int digit = 0;
    	int punct = 0;
    	int wspace = 0;
    	FILE* sp1;
    	sp1 = fopen ("<insertfilename>", "r");
    	int input;
    	while ((input = fgetc(sp1)) != EOF)
    	{
    		if (isalpha(input))
    			alpha++;
    		else if (isdigit(input))
    			digit++;
    		else if (ispunct(input))
    			punct++;
    		else if (isspace(input))
    			wspace++;
    	}//while
    	printf("alphabetic character: %d \n digits: %d \n punctuations: %d \n whitespace characters: %d \n", alpha, digit, punct, wspace);
    	fclose (sp1);
    	return 0;
    }

  3. #3
    Registered User Noir
    Join Date
    Mar 2007
    Posts
    218
    You need to open the file first. I think you're not developing the program the right way, so I'll do it and post each version so you can see how to incrementally build a program.

    1 - Start with a skeleton program:
    Code:
    #include <stdio.h>
    
    
    int main( void ) {
      return 0;
    }
    2 - The big thing is the source of the input, so I open and print a test file to make sure that everything works:
    Code:
    #include <stdio.h>
    
    
    int main( void ) {
      FILE *fp = fopen( "test.txt", "r" );
    
      if ( fp ) {
        int ch;
    
        while ( ( ch = fgetc( fp ) ) != EOF ) {
          fputc( ch, stdout );
        }
    
        fclose( fp );
      } else {
        perror( "error opening the file" );
      }
    
      return 0;
    }
    3 - Okay, the file opens and the program reads it the way I want. Now I'll count all of the characters and compare that with the actual file to see if it's accurate:
    Code:
    #include <stdio.h>
    
    
    int main( void ) {
      FILE *fp = fopen( "test.txt", "r" );
    
      if ( fp ) {
        int ch;
        int n = 0;
    
        while ( ( ch = fgetc( fp ) ) != EOF ) {
          ++n;
        }
    
        printf( "total characters: %d\n", n );
        fclose( fp );
      } else {
        perror( "error opening the file" );
      }
    
      return 0;
    }
    4 - Now I'm sure that the file is being input the way I want, so I can start counting character types one at a time, checking each time that the count is right on a test file:
    Code:
    #include <stdio.h>
    #include <ctype.h>
    
    
    int main( void ) {
      FILE *fp = fopen( "test.txt", "r" );
    
      if ( fp ) {
        int ch;
        int nalpha = 0;
        int ndigit = 0;
        int npunct = 0;
        int nspace = 0;
    
        while ( ( ch = fgetc( fp ) ) != EOF ) {
          if ( isalpha( ch ) ) {
            ++nalpha;
          } else if ( isdigit( ch ) ) {
            ++ndigit;
          } else if ( ispunct( ch ) ) {
            ++npunct;
          } else if ( isspace( ch ) ) {
            ++nspace;
          }
        }
    
        printf( "alphabetic characters: %d\n", nalpha );
        printf( "digit characters: %d\n", ndigit );
        printf( "punctuation characters: %d\n", npunct );
        printf( "whitespace characters: %d\n", nspace );
        fclose( fp );
      } else {
        perror( "error opening the file" );
      }
    
      return 0;
    }
    5 - The output is tested, but the file name is still hard-coded, and I want the user to pass a file to the program. main is getting kind of long, so I'll refactor the counting code out into a function and test it all again to make sure it still works. Any change, even something tiny, means retesting:
    Code:
    #include <stdio.h>
    #include <ctype.h>
    
    
    void process_file( FILE *fp );
    
    
    int main( void ) {
      FILE *fp = fopen( "test.txt", "r" );
    
      if ( fp ) {
        process_file( fp );
        fclose( fp );
      } else {
        perror( "error opening the file" );
      }
    
      return 0;
    }
    
    
    void process_file( FILE *fp ) {
      int ch;
      int nalpha = 0;
      int ndigit = 0;
      int npunct = 0;
      int nspace = 0;
    
      while ( ( ch = fgetc( fp ) ) != EOF ) {
        if ( isalpha( ch ) ) {
          ++nalpha;
        } else if ( isdigit( ch ) ) {
          ++ndigit;
        } else if ( ispunct( ch ) ) {
          ++npunct;
        } else if ( isspace( ch ) ) {
          ++nspace;
        }
      }
    
      printf( "alphabetic characters: %d\n", nalpha );
      printf( "digit characters: %d\n", ndigit );
      printf( "punctuation characters: %d\n", npunct );
      printf( "whitespace characters: %d\n", nspace );
    }
    6 - Now I can add the argument handling for taking a file as a command-line parameter without cluttering main too much. I make sure my tests touch every code path, so I'll fake errors to make sure the error cases work the way I want:
    Code:
    #include <stdio.h>
    #include <ctype.h>
    
    
    void process_file( FILE *fp );
    
    
    int main( int argc, char *argv[] ) {
      if ( argc > 1 ) {
        FILE *fp = fopen( argv[1], "r" );
    
        if ( fp ) {
          process_file( fp );
          fclose( fp );
        } else {
          perror( "error opening the file" );
        }
      } else {
        fprintf( stderr, "usage: prog <filename>\n" );
      }
    
      return 0;
    }
    
    
    void process_file( FILE *fp ) {
      int ch;
      int nalpha = 0;
      int ndigit = 0;
      int npunct = 0;
      int nspace = 0;
    
      while ( ( ch = fgetc( fp ) ) != EOF ) {
        if ( isalpha( ch ) ) {
          ++nalpha;
        } else if ( isdigit( ch ) ) {
          ++ndigit;
        } else if ( ispunct( ch ) ) {
          ++npunct;
        } else if ( isspace( ch ) ) {
          ++nspace;
        }
      }
    
      printf( "alphabetic characters: %d\n", nalpha );
      printf( "digit characters: %d\n", ndigit );
      printf( "punctuation characters: %d\n", npunct );
      printf( "whitespace characters: %d\n", nspace );
    }
    7 - Now I can really crank down on the stability of the code by adding defensive cases:
    Code:
    #include <stdio.h>
    #include <ctype.h>
    
    
    int process_file( FILE *fp );
    
    
    int main( int argc, char *argv[] ) {
      if ( argc > 1 ) {
        FILE *fp = fopen( argv[1], "r" );
    
        if ( fp ) {
          if ( !process_file( fp ) ) {
            perror( "error reading from the file" );
          }
    
          fclose( fp );
        } else {
          perror( "error opening the file" );
        }
      } else {
        fprintf( stderr, "usage: prog <filename>\n" );
      }
    
      return 0;
    }
    
    
    int process_file( FILE *fp ) {
      int ch;
      int nalpha = 0;
      int ndigit = 0;
      int npunct = 0;
      int nspace = 0;
      int rc = 0;
    
      if ( fp != NULL ) {
        while ( ( ch = fgetc( fp ) ) != EOF ) {
          if ( isalpha( ch ) ) {
            ++nalpha;
          } else if ( isdigit( ch ) ) {
            ++ndigit;
          } else if ( ispunct( ch ) ) {
            ++npunct;
          } else if ( isspace( ch ) ) {
            ++nspace;
          }
        }
    
        if ( !ferror( fp ) ) {
          printf( "alphabetic characters: %d\n", nalpha );
          printf( "digit characters: %d\n", ndigit );
          printf( "punctuation characters: %d\n", npunct );
          printf( "whitespace characters: %d\n", nspace );
          rc = 1;
        }
      }
    
      return rc;
    }
    8 - Now the code is solid, but there aren't any comments, so I'll go through it and add comments to places that might be confusing and then call it a day. There aren't many because this is a pretty simple program:
    Code:
    /*
      File - prog.c
      Author - D. Burke (Noir)
      
      Count alphabetic, digit, punctuation, and
      whitespace characters in a user supplied file
    */
    #include <stdio.h>
    #include <ctype.h>
    
    
    int process_file( FILE *fp );
    
    
    int main( int argc, char *argv[] ) {
      if ( argc > 1 ) {
        FILE *fp = fopen( argv[1], "r" );
    
        if ( fp ) {
          if ( !process_file( fp ) ) {
            // failure means a stream error or bad file
            perror( "error reading from the file" );
          }
    
          fclose( fp );
        } else {
          perror( "error opening the file" );
        }
      } else {
        fprintf( stderr, "usage: prog <filename>\n" );
      }
    
      return 0;
    }
    
    
    int process_file( FILE *fp ) {
      int ch;
      int nalpha = 0;
      int ndigit = 0;
      int npunct = 0;
      int nspace = 0;
    
      // assume failure
      int rc = 0;
    
      if ( fp != NULL ) {
        while ( ( ch = fgetc( fp ) ) != EOF ) {
          if ( isalpha( ch ) ) {
            ++nalpha;
          } else if ( isdigit( ch ) ) {
            ++ndigit;
          } else if ( ispunct( ch ) ) {
            ++npunct;
          } else if ( isspace( ch ) ) {
            ++nspace;
          }
        }
    
        if ( !ferror( fp ) ) {
          // only produce output if there are no errors
          printf( "alphabetic characters: %d\n", nalpha );
          printf( "digit characters: %d\n", ndigit );
          printf( "punctuation characters: %d\n", npunct );
          printf( "whitespace characters: %d\n", nspace );
          rc = 1;
        }
      }
    
      return rc;
    }
    That's how you should do it too. Start with a skeleton and build the program up bit by bit, making sure to test after every change. It's okay to change the requirements for testing like when I counted all the characters in the file or just printed the file out. It's okay to backtrack and change your mind on stuff too like when I decided to factor the counting code into a function. It's not as much building a program from a blueprint as it is evolving a program from an idea. You get to change your mind and make it better along the way even after you've finished doing it another way.

  4. #4
    Registered User
    Join Date
    Mar 2007
    Posts
    28
    That was an awesome explanation, Noir, thanks.

    What about my integer classification question, can anyone answer that?

  5. #5
    Lurking whiteflags
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    > What about my integer classification question, can anyone answer that?
    > input is of class int
    Your book was just telling you what data type to use for the numerical data.

    Your program will tell whitespace, digits, letters, and punctuation apart if you use the right functions to test for them. All input to a console program is textual in nature: numbers arrive as text, not as binary integers (except in special cases), so if necessary, you would convert a string to an integer. In a counting program such as this one, that isn't necessary.

  6. #6
    Registered User Noir
    Join Date
    Mar 2007
    Posts
    218
    > What about my integer classification question, can anyone answer that?
    Oh, I missed that part. My bad. The reason you have to use an int for fgetc() is EOF. fgetc() returns either a character, as a value in the range of unsigned char, or EOF. unsigned char values are guaranteed to be non-negative and EOF is guaranteed to be negative, so fgetc() can't return an unsigned char: EOF wouldn't fit. It can't return a signed char either, because then a legitimate character (typically code 255) would be indistinguishable from EOF. So fgetc() returns an int, because characters are just small integers (character codes, such as 65 for 'A' in ASCII), and an int can hold the full range of unsigned char plus the negative value of EOF. That's also how your int picks up punctuation and whitespace: isalpha(), ispunct() and friends take a character code, not a special character type.

  7. #7
    Registered User
    Join Date
    Mar 2007
    Posts
    28
    Gotcha. After looking at it closer, divine's code doesn't compile. Looking back at it, though, I don't see a need to structure it like that, because the way Noir has it structured makes a lot more sense. The only thing I didn't understand was the addition of the "argument stuff"; I don't think I'm familiar with those commands. I probably won't split my program into two different blocks, but we'll see.
    So how do I test this? Do I need to make a .txt file in the same directory?

  8. #8
    Registered User Noir
    Join Date
    Mar 2007
    Posts
    218
    > The only thing I didn't understand was the addition of the "argument stuff"; I don't think I'm familiar with those commands.
    You can call your program from the DOS prompt and pass it arguments. Those arguments are available through main's parameters, argc and argv. If you call your program like this:
    Code:
    C:\>prog.exe testfile.txt
    argc will be 2 and argv will be an array of strings that looks like
    Code:
    {"prog.exe","testfile.txt",NULL}
    and you can get to the test file with argv[1]. The best way is to experiment with different ways of calling your program and with different files to see how your OS handles command-line arguments, but the easiest way to get it right the first time is to hard-code an absolute path to the file you want (note the doubled backslashes, since \ starts an escape sequence in a C string literal):
    Code:
    int main( void ) {
      FILE *fp = fopen( "C:\\worker\\testfiles\\test.txt", "r" );
      /* ... rest of main as before ... */
    }
    > Do I need to make a .txt file in the same directory?
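    If you use a relative path, it's resolved against the current working directory, not the directory of the source file. When you start the exe by double-clicking it, the working directory is usually the exe's own directory, so the test file has to sit next to the exe (IDEs may set it differently). If you run the program from a prompt, for example because you've added it to the PATH environment variable, the working directory is whatever directory you're in, so the test file has to be there.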

  9. #9
    Registered User divineleft
    Join Date
    Jul 2006
    Posts
    158
    > Originally posted by Alphawaves:
    > Gotcha, after looking at it closer, divine's code doesn't compile.
    It does on gcc 4.1.2.

    If it's segfaulting, it's probably because the file specified doesn't exist: fopen returns NULL, and passing NULL to fgetc crashes. If you don't want it to segfault, check that fopen didn't return NULL before reading from the stream, or specify a file that actually exists.
