Thread: tokenizing...

  1. #1
    Registered User
    Join Date
    Oct 2001
    Posts
    2

    tokenizing...

    Hi, I'm trying to read an input file and then tokenize each word in it. The user then types in a word, let's say "the", and the output is supposed to be the indexes of each occurrence of that word. I'm on a UNIX platform.

    eg. "this is the input of the file"
    output: the
    10 23
    this
    1
    .....etc etc.

    Numbers, punctuation, and whitespace are ignored.

    My problem is that the program seems to be permanently reading from the same input file. If I try another input file, the output is still from the old file. E.g. the first input file has two "the"s and the second file has none, but the second run still gives me the indexes of the "the"s from the first file.

    Also, after searching for the first word, the pointer pch is "stuck" on the last token. I know it has something to do with the strtok function I'm using, but I'm not sure exactly what the problem is.

    I haven't gotten to the part where I print out the indexes yet. (How would you do that anyway? The way my code is written, I can't even think of a way to keep track... HELP!!!)

    Please help.
    Please forgive me if the code is too messy.

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <iostream.h>
    #include <fstream.h>
    #include <ctype.h>

    void growArray(char *array);

    int main(int argc, char* argv[])
    {
        const int MAX_SIZE = 1000;
        char ch;
        char str[MAX_SIZE];
        char userInput[MAX_SIZE];

        for(int i=0; i<MAX_SIZE; i++){
            str[i]='\0';
            userInput[i]='\0';
        }

        if(argc<1){
            cerr << "USAGE: ./indexer file1" << endl;
            exit(1);
        }

        int counter = 0;

        for (int i = 1; i<argc; i++) {
            ifstream istrm(argv[i]);
            while (istrm.get(ch)){
                if(counter>=MAX_SIZE)
                    growArray(str);
                str[counter]=ch;
                counter++;
            }
        }

        //cin.getline(userInput, MAX_SIZE);
        char * pch;
        pch = strtok(str," \n\t\"\'\\,.");
        do{
            while (pch != NULL){
                if(isspace(*pch) || ispunct(*pch)){
                    pch = strtok(NULL, " \n\t\"\'\\,.");
                }
                else{
                    if(strcmp(pch, userInput) == 0)
                        cout << "Found one!" << endl;
                    pch = strtok(NULL, " \n\t\"\'\\,.");
                    //It seems that the pch pointer never goes back to the beginning of str...
                }
            }
            pch = strtok(str," \n\t\"\'\\,.");
        }while(cin.getline(userInput, MAX_SIZE));

        return 0;
    }


    /**********************************************************************/
    /* THESE ARE MY FUNCTIONS */
    /**********************************************************************/

    void growArray(char *array)
    {
        /*create a dynamic array that is bigger than passed in array; copies everything from passed in array to dynamic array; the effect is that the passed in array "grew"*/
        int size = sizeof array / sizeof array[0];
        char *pTemp = new char[size + 20];

        //transfer all of the object in the array to the objects in the temporary array
        for(int nCount = 0; nCount < size; nCount++)
            pTemp[nCount] = array[nCount];
        array = pTemp;
    }
    Last edited by IngramGc; 11-06-2001 at 04:09 AM.

  2. #2
    William
    Guest
    I've not looked at it all, but there are at least some errors in the growArray function.
    When you use sizeof on a char*, it returns the size of the pointer (4 on a PC). It does not return the size of the array! You must pass in the number of elements yourself.
    Also, your function creates a new array, but the caller cannot access it! The pointer is passed by value, so assigning to array inside the function only changes the function's local copy. You would have to return the new array (and delete the old one, except this won't work because it's an automatic array in your case...).
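
    To make those two points concrete, here is a rough sketch rather than a drop-in fix. The signature (size passed by reference, new buffer returned, an `extra` parameter) is just my assumption about how you might restructure it, and it only works if the buffer came from new[] in the first place, which is not true of your automatic str array.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical replacement for growArray. The caller passes the current
// element count, because sizeof on a char* only yields the pointer size.
// Assumes `array` was allocated with new[] (NOT an automatic array).
char *growArray(char *array, std::size_t &size, std::size_t extra = 20)
{
    char *bigger = new char[size + extra];
    std::memcpy(bigger, array, size);          // copy the old contents
    std::memset(bigger + size, '\0', extra);   // zero the newly added tail
    delete[] array;                            // release the old block
    size += extra;                             // report the new capacity
    return bigger;                             // caller must store this pointer
}
```

    The caller would then write something like `str = growArray(str, size);`, with str created by `new char[size]`. In practice a std::string or std::vector<char> grows itself and saves you all of this.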

    Also, why do you loop over argc? Why not just use argv[1]?
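
    Something along these lines is what I have in mind for reading a single file. It is only a sketch: readFile is a name I made up, and it uses std::string instead of your fixed char array so that nothing has to be grown by hand.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads the whole file at `path` into `out`.
// Returns false if the file could not be opened.
bool readFile(const char *path, std::string &out)
{
    std::ifstream in(path);
    if (!in)
        return false;
    out.clear();               // drop anything left over from an earlier file
    char ch;
    while (in.get(ch))         // same character-by-character loop as the original
        out += ch;
    return true;
}
```

    In main you would call `readFile(argv[1], text)` after checking `argc < 2`. argc normally counts the program name too, so the `argc<1` test in your code will not catch a missing file name.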

    I think there are quite a few other problems (around strtok...). Correct the previous ones, then let us know!
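
    One thing I am fairly sure of on the strtok side: strtok overwrites the delimiter after each token with '\0', so after the first full pass your str effectively ends at its first word, and calling strtok(str, ...) again only ever finds that one token. Below is a sketch of one way around it that never modifies the text and also records where each match starts. The name findWord is made up, and treating every non-letter as a separator is my reading of "numbers, punctuation, and whitespace are ignored".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based start position of every
// occurrence of `word` in `text`. Words are runs of letters; digits,
// punctuation and whitespace all act as separators.
// `text` is never modified, so it can be searched again and again.
std::vector<std::size_t> findWord(const std::string &text, const std::string &word)
{
    std::vector<std::size_t> positions;
    std::size_t i = 0;
    while (i < text.size()) {
        // Skip over separators.
        while (i < text.size() && !std::isalpha(static_cast<unsigned char>(text[i])))
            ++i;
        std::size_t start = i;
        // Consume one word.
        while (i < text.size() && std::isalpha(static_cast<unsigned char>(text[i])))
            ++i;
        // Compare the whole word, not just a prefix of it.
        if (i > start && text.compare(start, i - start, word) == 0)
            positions.push_back(start + 1);   // 1-based index
    }
    return positions;
}
```

    You could call this once per word the user types and print the returned positions. The exact numbers depend on how you count (0-based or 1-based, whether surrounding quotes count), so adjust the `start + 1` to suit.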
