Thread: Alphabetising a file

  1. #1
    Registered User
    Join Date
    Jan 2003
    Posts
    7

    Searching a text file

    Hi,

    I have created a text file that lists words, all in lower case, but i need to alphabetise them so i can work out how many instances of each word there are.

    I understand the principles of a bubble sort however i have only ever implemented it using integers. I have also looked at qsort but i can work out how to use it in my program. In order to sort the list, will i have to read every word into a string or array in the program to sort them?

    Any help is much appreciated.

    Matt
    Last edited by mpawsey; 01-09-2003 at 02:44 PM.

  2. #2
    Confused Magos's Avatar
    Join Date
    Sep 2001
    Location
    Sweden
    Posts
    3,145
    I suggest you use a sorting algorithm (bubble sort may be fine, though a little ineffective). But instead of using the = < > operators, use the strcmp() function. This function takes two strings and returns 0 if they are equal (=), a negative number if the first string is 'smaller' than the second (<) and a positive number if the first string is 'greater' than the second (>).
    MagosX.com

    Give a man a fish and you feed him for a day.
    Teach a man to fish and you feed him for a lifetime.

  3. #3
    Registered User
    Join Date
    Jan 2003
    Posts
    7
    thanks for the help. I think i should be able work out the number of each word better now, but will i have to read all of the words into strings before working on them? If i had a normal set of strings i could probably do it the problem i am having now is that they are coming from a file. Any further suggestoins?

    Again thanks for all the help!

  4. #4
    Registered User
    Join Date
    Oct 2002
    Posts
    98
    I would approach the problem using a linked list.

    Create a structure like this
    Code:
    struct node{
      char string[64];
      int    count;
    
      struct node *next_node;
      };
    The idea would be to read each string from your file in turn; if the string already exists in your list of nodes, increment the counter for that string, if not, append it to the end of the list.

    You could then bubble sort the list either using the strcmp() method above, or using the count integer, depending on which result you wanted, then write the strings back out to a file.

    This does assume that you have some way of delimiting the strings in your file (are they all followed by a newline '\n' character?).

    If you need any help with how to code this, let me know which bit you're stuck on.

  5. #5
    Registered User
    Join Date
    Jan 2003
    Posts
    7
    that sounds spot on for what i need. I have added '\n' character on the end of each string so that shouldn't be a problem. i have never reallu used linked lists before but i will look it up and give it a go!

    thanks a lot for your help, i may be back

    Matt

  6. #6
    Registered User
    Join Date
    Jan 2003
    Posts
    7
    Oh dear! I must have read 10 different tutorials on linked lists but i still can't grasp how to implement them in this situation. I understand how can make the list, add the first word, get a second word and check it against thr first etc, but then i dont understand how go back to the top of the list to check through to see if words have already been used. Is this the prefered method of sorting a list from a file or does anyone else have any suggestions.

    Thanks for being so helpful.

    Matt

  7. #7
    Registered User
    Join Date
    Jan 2003
    Posts
    7
    i have thought of another way of doing this but still i have a problem. basically i thought i could open the file for append then read in the first word. i used a integer as a pointer as to which word in the list i was on, incremented after each word was read. Before reading another word i used rewind(stream ) to go back to the start then the word was overwritten with "--empty--" The program should read the second word in, then compare if they are the same or not. If so i increment an integer that counts how many times the word has been used and the same process is performed of rewinding the file, looping through using the position integer to get to where i need to go and writing empty.

    This way i figured afer the eof was reached i could write the word and how many times it was used in a new file, then go through again on another word. However i implemented the first section of the process but the file doesn't seem to rewind. instead it writes "--empty--" at the end of the file.

    Can anyone help at all?

  8. #8
    Registered User
    Join Date
    Jan 2003
    Posts
    7
    thanks for the reply salem, i am quite comfertable with searching a list and comparing strings in, and i guessed i could just do that and increment a counter everytime i find another instance of the word, however do you have any ideas as to how i can stop counting the same words twice?

    for instance if i had the list:
    computer
    pc
    computer
    apple
    laptop
    computer

    i could easily read in the first word 'computer' read down the list incremeneting everytime i read in a 'computer', however after checking for how many 'pc's there were, how can i stop it checking for 'computer's because of the second instance of the word? That is why i thought i could replace every word i logged as a repeat with --EMPTY-- but it doesn't seem to rewind and search properly under the append mode. I csn't find any other way of searching a text file

    many thanks for all the help
    Last edited by mpawsey; 01-09-2003 at 02:43 PM.

  9. #9
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
    do
    {
        read a word
        if not at end of file
            see if word is in list of known words
            if not
                add word to list of known words
            else
                increment counter
    } while not at the end of the file
    
    close input file
    
    open file for writing
    for each word in list
        write word to file (and counter)
    
    close file
    Don't bother "sorting", just keep your list alphabatized. IE: Make your "add to list" function add them alphabeticly. Also, use double linked lists. They're infinitely easier than single linked lists, as far as insertion and modification go.

    Quzah.
    Hope is the first step on the road to disappointment.

  10. #10
    Registered User
    Join Date
    Jan 2003
    Posts
    7
    unfortunately i dont know if i can add them alphabetically, i am taking a normal written text file and checking it letter by letter for valid characters. If the character is valic (A-Z,a-z,0-9,-,' ')then i add it to the list. if not (puctuation etc) i print a \n for a new line.

    I am intending to count how many unique words there are in a piece of text, then print out a list of each unique word and how many times it appears in the text as a list.

    at the moment i am completely out of ideas!

    thank you all so much again

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Encryption program
    By zeiffelz in forum C Programming
    Replies: 1
    Last Post: 06-15-2005, 03:39 AM
  2. Unknown Memory Leak in Init() Function
    By CodeHacker in forum Windows Programming
    Replies: 3
    Last Post: 07-09-2004, 09:54 AM
  3. Making a LIB file from a DEF file for a DLL
    By JMPACS in forum C++ Programming
    Replies: 0
    Last Post: 08-02-2003, 08:19 PM
  4. Hmm....help me take a look at this: File Encryptor
    By heljy in forum C Programming
    Replies: 3
    Last Post: 03-23-2002, 10:57 AM
  5. Need a suggestion on a school project..
    By Screwz Luse in forum C Programming
    Replies: 5
    Last Post: 11-27-2001, 02:58 AM