Thread: Word frequency and sorting

  1. #1
    Registered User
    Join Date
    Dec 2008
    Posts
    24

    Word frequency and sorting

    Hi, greetings to everyone. I'm new here and would like to ask some questions. I'm a newbie to C programming (I only started learning it two months ago), and I got an assignment to write a program that counts words and sorts them by frequency. I don't need the full source code; I just need the basic logic and the basic functions to:
    - read files (txt files)
    - break the files into words (how do you make a space the delimiter?)
    - and finally group them and sort them

    Well, that's the outline; any help would be appreciated. Again, I don't need the source code, just the basic functions that support the logic, because I am not permitted to outsource the work. Thanks.

  2. #2
    Registered User
    Join Date
    May 2008
    Location
    India
    Posts
    30
    Reading files: you can use the FILE functions fopen and fgetc.

    Space delimiter: you can use the isspace function.

    Grouping and sorting can be done with a tree structure.

  3. #3
    Registered User
    Join Date
    Dec 2008
    Posts
    24
    Thanks for the quick reply, but I have some questions:
    - In C, when I use fopen and set its mode to "r", will it open the file and let the user read the text, or will the program detect the words without opening the txt file? So far, when I use fopen(file_name, "r"), it does not open the file.

    - How do arrays fit in here? Are the words treated as arrays? If you have any deeper info on this, please let me know.

    Oh yeah, just out of curiosity, how long is this program in general? I see it is said to be a beginner's program, so I predict a short to average length. Or is it quite long?
    Last edited by zyxx_66; 12-13-2008 at 07:51 AM.

  4. #4
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    You are using the right function to open the file, so something must be wrong with how you are doing it. Things to check: that file_name holds the proper name, that your absolute or relative path is correct, and that the file exists.

    A program like this can be written in 2 or 3 screens, spread out nicely and easy to read.

    If you are familiar with or have been introduced to the programming concept of a "map", this would be a good application for it. A map is a collection of key/value pairs. Here, your key would be a word, and the value would be the number of times it occurs.
    Mainframe assembler programmer by trade. C coder when I can.

  5. #5
    Registered User
    Join Date
    Dec 2008
    Posts
    24
    So, with the "r" mode, it should open the file? Well, I think I've entered the name right, but I'll check it again.
    As for the "map" concept, I actually haven't heard of it. Can you give me some information on how this concept works?

  6. #6
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    A good coding practice is to check the return value of fopen(), which returns a null pointer if unsuccessful and, on POSIX systems, sets errno to indicate the error. This way you will know whether the call worked or not.

  7. #7
    Registered User
    Join Date
    Dec 2008
    Posts
    24
    OK, I've tested it again, and even used another source code for the file-opening program. What I get is: when the mode is "w", it overwrites the file if it exists and creates a new one if it does not, but I can't edit it? And in "r" mode it doesn't show the text (if it is a text file), even though there is no failure message when opening the file. Is this how it should be? Does the computer just read the file for itself and not for the user?

    Oh, and about the isspace function: for my assignment, I saw on other boards that there are easier functions to use for this logic. Can anyone list them?

  8. #8
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    "w" starts with a clean slate -- so if there was something there, well, too bad.

    "r" makes the file available to the program -- it doesn't open it up in notepad or anything like that. It's your job to process the data in it.

    And isspace is the easy-to-use function, as opposed to checking all the different space-like characters yourself.

  9. #9
    Registered User
    Join Date
    Dec 2008
    Posts
    24
    Oh, so that's what's going on with "w" and "r". Thanks for the info, tabstop.
    Also, I forgot: to count the number of words in a text, do you only need the space as a delimiter, or do you need newlines and tabs as delimiters as well? If you do need those two as well, what functions should I use?
    Sorry for asking so many questions; it's just that I don't understand much of the C language. Thanks for the replies.

  10. #10
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    isspace() is all you need, as it checks for whitespace characters such as spaces, tabs, newlines, etc.

  11. #11
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    When processing a file for input, there are 3 typical stages:

    1) Open the file.
    2) Read the data from the file (most of the time until no more records are left to be read and you hit end-of-file (EOF)).
    3) Close the file.

  12. #12
    Registered User
    Join Date
    Dec 2008
    Posts
    24
    For the isspace function, I still don't fully understand it (since I've never heard of or learned it):
    - How exactly do I scan the words in a file and apply the isspace function to them?
    (Am I supposed to treat the file as an integer, or arrays, or what?)

    - And how do you specify the destination folder for a file? (So far I have moved the file I use into the same directory as the program.)

    I think that's all for now, and I'm sure I'll be asking questions again.

  13. #13
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by zyxx_66 View Post
    For the isspace function, I still don't fully understand it (since I've never heard of or learned it):
    - How exactly do I scan the words in a file and apply the isspace function to them?
    (Am I supposed to treat the file as an integer, or arrays, or what?)
    Once you figure out what this question means, let us know and we'll look at it again. (I mean, a file is just a big pile of characters, after all.)
    Quote Originally Posted by zyxx_66 View Post
    - And how do you specify the destination folder for a file? (So far I have moved the file I use into the same directory as the program.)
    Ditto. If you mean "how do I type in an absolute path so that the right file is found", the answer is "you type in an absolute path."

  14. #14
    Registered User
    Join Date
    Dec 2008
    Posts
    24
    For the first question, I mean: how do you detect words in a file and count them? And must the isspace function be paired with the getchar function?

    And for the second, do you also use fopen to specify the destination file?

  15. #15
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Counting words means taking each char, one at a time, and sending it to isspace(). If you get a return indicating the char is whitespace (a nonzero value) and the previous char was not whitespace, then you are at the end of a word, so increment your count of words. Remember to count a final word that ends at end-of-file rather than at whitespace.

    Exceptions might rear their ugly head - things like blank lines in double-spaced "block" style paragraphs, etc. You may need code so that two newlines in a row don't increment the word count twice, or code that increments it and then subtracts one when the second newline is found.

    Code:
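    //Needs <stdio.h> and <stdlib.h>, and outfile declared as FILE *outfile;
    //Opening the "destination.txt" file in write mode
    //("wt" is a compiler extension; standard C uses plain "w" for text files)
    if ((outfile = fopen("destination.txt", "w")) == NULL) {
       printf("Unable to open destination.txt file\n");
       exit(1);
    }
    //if your code reaches here, you know the file was opened OK

    //rest of your code, here.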
