C Board  

Old 04-07-2007, 12:28 PM   #1
Registered User
 
Join Date: Mar 2007
Posts: 28
help with text input

All right, I'm working on a new program. I need to write something that will read a text file and count the number of alphabetic characters, digits, punctuation characters, and whitespace characters, then report my findings. This is what I've got so far.

Code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
      int alpha = 0;
      int digit = 0;
      int punct = 0;
      int wspace = 0;
      FILE* sp1;
      int input;
      while ((input = fgetc(sp1)) != EOF)
            {
            if (isalpha(input))
               alpha++;
            else if (isdigit(input))
               digit++;
            else if (ispunct(input))
               punct++;
            else if (isspace(input))
               wspace++;
             }//while
      printf("alphabetic character: %d \n digits: %d \n punctuations: %d \n whitespace characters: %d \n", alpha, digit, punct, wspace);
      system("PAUSE");
      return 0;
}
I'm not sure exactly what to do. It compiles, but how do I test it? Do I need to make a file called sp1 in the same directory as the source file? Also, in my programming book input is of class int (which is why I put it), but what I don't get is this: if input is an int, how will input pick up punctuation, whitespace, and letters?

Also, since I don't know how to test it, I don't know if my coding is correct. Feel free to nitpick the coding as well ^^.
Alphawaves is offline   Reply With Quote
Old 04-07-2007, 12:38 PM   #2
Registered User
 
divineleft's Avatar
 
Join Date: Jul 2006
Posts: 158
sp1 is the name of the variable, not the name of the file. To work with the file, you have to call "fopen" first and "fclose" when you're done with it. Here is the modified source code:

Code:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main()
{
	int alpha = 0;
	int digit = 0;
	int punct = 0;
	int wspace = 0;
	FILE* sp1;
	sp1 = fopen ("<insertfilename>", "r");
	int input;
	while ((input = fgetc(sp1)) != EOF)
	{
		if (isalpha(input))
			alpha++;
		else if (isdigit(input))
			digit++;
		else if (ispunct(input))
			punct++;
		else if (isspace(input))
			wspace++;
	}//while
	printf("alphabetic character: %d \n digits: %d \n punctuations: %d \n whitespace characters: %d \n", alpha, digit, punct, wspace);
	fclose (sp1);
	return 0;
}
Here is more info
divineleft is offline   Reply With Quote
Old 04-07-2007, 01:16 PM   #3
Registered User
 
Noir's Avatar
 
Join Date: Mar 2007
Posts: 218
You need to open the file first. I think you're not developing the program the right way, so I'll do it and post each version so you can see how to incrementally build a program.

1 - Start with a skeleton program:
Code:
#include <stdio.h>


int main( void ) {
  return 0;
}
2 - The big thing is the source of the input, so I open and print a test file to make sure that everything works:
Code:
#include <stdio.h>


int main( void ) {
  FILE *fp = fopen( "test.txt", "r" );

  if ( fp ) {
    int ch;

    while ( ( ch = fgetc( fp ) ) != EOF ) {
      fputc( ch, stdout );
    }

    fclose( fp );
  } else {
    perror( "error opening the file" );
  }

  return 0;
}
3 - Okay, the file opens and the program reads it like I think I want. Now I'll count all of the characters and compare that with the actual file to see if it's accurate:
Code:
#include <stdio.h>


int main( void ) {
  FILE *fp = fopen( "test.txt", "r" );

  if ( fp ) {
    int ch;
    int n = 0;

    while ( ( ch = fgetc( fp ) ) != EOF ) {
      ++n;
    }

    printf( "total characters: %d\n", n );
    fclose( fp );
  } else {
    perror( "error opening the file" );
  }

  return 0;
}
4 - Now I'm sure that the file is being input the way I want, so I can start counting character types one at a time, checking each time that the count is right on a test file:
Code:
#include <stdio.h>
#include <ctype.h>


int main( void ) {
  FILE *fp = fopen( "test.txt", "r" );

  if ( fp ) {
    int ch;
    int nalpha = 0;
    int ndigit = 0;
    int npunct = 0;
    int nspace = 0;

    while ( ( ch = fgetc( fp ) ) != EOF ) {
      if ( isalpha( ch ) ) {
        ++nalpha;
      } else if ( isdigit( ch ) ) {
        ++ndigit;
      } else if ( ispunct( ch ) ) {
        ++npunct;
      } else if ( isspace( ch ) ) {
        ++nspace;
      }
    }

    printf( "alphabetic characters: %d\n", nalpha );
    printf( "digit characters: %d\n", ndigit );
    printf( "punctuation characters: %d\n", npunct );
    printf( "whitespace characters: %d\n", nspace );
    fclose( fp );
  } else {
    perror( "error opening the file" );
  }

  return 0;
}
5 - Testing is done on producing the output, but the file is still hard coded, and I want the user to pass a file to the program. main is getting kind of long, so I'll refactor the counting code out into a function and test it all again to make sure it still works. Any change, even something tiny, means retesting:
Code:
#include <stdio.h>
#include <ctype.h>


void process_file( FILE *fp );


int main( void ) {
  FILE *fp = fopen( "test.txt", "r" );

  if ( fp ) {
    process_file( fp );
    fclose( fp );
  } else {
    perror( "error opening the file" );
  }

  return 0;
}


void process_file( FILE *fp ) {
  int ch;
  int nalpha = 0;
  int ndigit = 0;
  int npunct = 0;
  int nspace = 0;

  while ( ( ch = fgetc( fp ) ) != EOF ) {
    if ( isalpha( ch ) ) {
      ++nalpha;
    } else if ( isdigit( ch ) ) {
      ++ndigit;
    } else if ( ispunct( ch ) ) {
      ++npunct;
    } else if ( isspace( ch ) ) {
      ++nspace;
    }
  }

  printf( "alphabetic characters: %d\n", nalpha );
  printf( "digit characters: %d\n", ndigit );
  printf( "punctuation characters: %d\n", npunct );
  printf( "whitespace characters: %d\n", nspace );
}
6 - Now I can add the argument stuff for taking a file as a command line parameter without cluttering main up too much. I make sure my tests touch every code path, so I'll fake errors and such to make sure that the error cases work like I want:
Code:
#include <stdio.h>
#include <ctype.h>


void process_file( FILE *fp );


int main( int argc, char *argv[] ) {
  if ( argc > 1 ) {
    FILE *fp = fopen( argv[1], "r" );

    if ( fp ) {
      process_file( fp );
      fclose( fp );
    } else {
      perror( "error opening the file" );
    }
  } else {
    fprintf( stderr, "usage: prog <filename>\n" );
  }

  return 0;
}


void process_file( FILE *fp ) {
  int ch;
  int nalpha = 0;
  int ndigit = 0;
  int npunct = 0;
  int nspace = 0;

  while ( ( ch = fgetc( fp ) ) != EOF ) {
    if ( isalpha( ch ) ) {
      ++nalpha;
    } else if ( isdigit( ch ) ) {
      ++ndigit;
    } else if ( ispunct( ch ) ) {
      ++npunct;
    } else if ( isspace( ch ) ) {
      ++nspace;
    }
  }

  printf( "alphabetic characters: %d\n", nalpha );
  printf( "digit characters: %d\n", ndigit );
  printf( "punctuation characters: %d\n", npunct );
  printf( "whitespace characters: %d\n", nspace );
}
7 - Now I can really crank down on the stability of the code by adding defensive cases:
Code:
#include <stdio.h>
#include <ctype.h>


int process_file( FILE *fp );


int main( int argc, char *argv[] ) {
  if ( argc > 1 ) {
    FILE *fp = fopen( argv[1], "r" );

    if ( fp ) {
      if ( !process_file( fp ) ) {
        perror( "error reading from the file" );
      }

      fclose( fp );
    } else {
      perror( "error opening the file" );
    }
  } else {
    fprintf( stderr, "usage: prog <filename>\n" );
  }

  return 0;
}


int process_file( FILE *fp ) {
  int ch;
  int nalpha = 0;
  int ndigit = 0;
  int npunct = 0;
  int nspace = 0;
  int rc = 0;

  if ( fp != NULL ) {
    while ( ( ch = fgetc( fp ) ) != EOF ) {
      if ( isalpha( ch ) ) {
        ++nalpha;
      } else if ( isdigit( ch ) ) {
        ++ndigit;
      } else if ( ispunct( ch ) ) {
        ++npunct;
      } else if ( isspace( ch ) ) {
        ++nspace;
      }
    }

    if ( !ferror( fp ) ) {
      printf( "alphabetic characters: %d\n", nalpha );
      printf( "digit characters: %d\n", ndigit );
      printf( "punctuation characters: %d\n", npunct );
      printf( "whitespace characters: %d\n", nspace );
      rc = 1;
    }
  }

  return rc;
}
8 - Now the code is solid, but there aren't any comments, so I'll go through it and add comments to places that might be confusing and then call it a day. There aren't many because this is a pretty simple program:
Code:
/*
  File - prog.c
  Author - D. Burke (Noir)
  
  Count alphabetic, digit, punctuation, and
  whitespace characters in a user supplied file
*/
#include <stdio.h>
#include <ctype.h>


int process_file( FILE *fp );


int main( int argc, char *argv[] ) {
  if ( argc > 1 ) {
    FILE *fp = fopen( argv[1], "r" );

    if ( fp ) {
      if ( !process_file( fp ) ) {
        // failure means a stream error or bad file
        perror( "error reading from the file" );
      }

      fclose( fp );
    } else {
      perror( "error opening the file" );
    }
  } else {
    fprintf( stderr, "usage: prog <filename>\n" );
  }

  return 0;
}


int process_file( FILE *fp ) {
  int ch;
  int nalpha = 0;
  int ndigit = 0;
  int npunct = 0;
  int nspace = 0;

  // assume failure
  int rc = 0;

  if ( fp != NULL ) {
    while ( ( ch = fgetc( fp ) ) != EOF ) {
      if ( isalpha( ch ) ) {
        ++nalpha;
      } else if ( isdigit( ch ) ) {
        ++ndigit;
      } else if ( ispunct( ch ) ) {
        ++npunct;
      } else if ( isspace( ch ) ) {
        ++nspace;
      }
    }

    if ( !ferror( fp ) ) {
      // only produce output if there are no errors
      printf( "alphabetic characters: %d\n", nalpha );
      printf( "digit characters: %d\n", ndigit );
      printf( "punctuation characters: %d\n", npunct );
      printf( "whitespace characters: %d\n", nspace );
      rc = 1;
    }
  }

  return rc;
}
That's how you should do it too. Start with a skeleton and build the program up bit by bit, making sure to test after every change. It's okay to change the requirements for testing, like when I counted all the characters in the file or just printed the file out. It's okay to backtrack and change your mind on stuff too, like when I decided to factor the counting code into a function. It's not so much building a program from a blueprint as it is evolving a program from an idea. You get to change your mind and make it better along the way, even after you've finished doing it another way.
Noir is offline   Reply With Quote
Old 04-07-2007, 11:19 PM   #4
Registered User
 
Join Date: Mar 2007
Posts: 28
That was an awesome explanation, Noir, thanks.

What about my integer classification question, can anyone answer that?
Alphawaves is offline   Reply With Quote
Old 04-07-2007, 11:50 PM   #5
Registered User
 
Join Date: Apr 2006
Location: United States
Posts: 3,201
> What about my integer classification question, can anyone answer that?
> input is of class int
Your book was just telling you what data type to use for the numerical data.

Your program will differentiate between whitespace, digits, characters and punctuation if you use the right functions to look for those things. All input to a console program is textual in nature... integers aren't normally sent to the console (except in special cases), so if necessary, you would convert a string to an integer. In a counting program such as this one, it isn't necessary.
whiteflags is offline   Reply With Quote
Old 04-08-2007, 07:34 AM   #6
Registered User
 
Noir's Avatar
 
Join Date: Mar 2007
Posts: 218
Quote:
What about my integer classification question, can anyone answer that?
Oh, I missed that part. My bad. The reason you have to use an int for fgetc() is EOF. fgetc() returns either a character in the range of unsigned char or EOF. unsigned char values are guaranteed to be non-negative, and EOF is guaranteed to be negative, so fgetc() can't return an unsigned char without losing EOF. It also can't return a signed char, or a bunch of legitimate characters wouldn't be returned right. So fgetc() returns an int, because characters are just small integers. int can hold the full range of unsigned char plus the negative value of EOF.
Noir is offline   Reply With Quote
Old 04-08-2007, 03:31 PM   #7
Registered User
 
Join Date: Mar 2007
Posts: 28
Gotcha. After looking at it closer, divineleft's code doesn't compile. Looking back at it, though, I don't see a need to structure it like that, because the way Noir has it structured makes a lot more sense. The only thing I didn't understand was the addition of the "argument stuff"; I don't think I'm familiar with those commands. I probably won't split my program into two different blocks, but we'll see.
So how do I test this? Do I need to make a .txt file in the same directory?
Alphawaves is offline   Reply With Quote
Old 04-08-2007, 04:08 PM   #8
Registered User
 
Noir's Avatar
 
Join Date: Mar 2007
Posts: 218
Quote:
The only thing I didn't understand was the addition of the "argument stuff"; I don't think I'm familiar with those commands.
You can call your program from the DOS prompt and pass it arguments. Those arguments are available through the parameters to main, argc and argv. If you call your program like this:
Code:
C:\>prog.exe testfile.txt
argc will be 2 and argv will be an array of strings that looks like
Code:
{"prog.exe","testfile.txt",NULL}
and you can get to the test file with argv[1]. The best way to learn is to experiment with different ways of calling your program and with different files to see how your OS handles command line arguments, but the easiest way to get it right the first time is to hard code an absolute path to the file you want:
Code:
int main( void ) {
  FILE *fp = fopen( "C:\\worker\\testfiles\\test.txt", "r" );
Quote:
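Do I need to make a .txt file in the same directory?
If you use a relative path, it's resolved against the current working directory, not against wherever the exe lives. When you run the program from its own directory those are the same place, so the file has to sit next to the exe. If you've added the program to a path environment variable and run it from somewhere else, then the current working directory has to be the directory that the test file is in.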
Noir is offline   Reply With Quote
Old 04-08-2007, 04:54 PM   #9
Registered User
 
divineleft's Avatar
 
Join Date: Jul 2006
Posts: 158
Quote:
Originally Posted by Alphawaves View Post
Gotcha. After looking at it closer, divineleft's code doesn't compile.
It does on gcc-4.1.2.

If it's segfaulting, it's because the file specified doesn't exist. If you don't want it to segfault, you need to check that the stream is actually open before reading from it. That, or specify a file that actually exists.
divineleft is offline   Reply With Quote

All times are GMT -6.