Thread: Searching a file for a String Occurrence

  1. #1
    Registered User
    Join Date
    Dec 2008
    Posts
    65

    Searching a file for a String Occurrence

    Okay, I can't read any more about file I/O.

    I have the following class:
    Code:
    class Student
    {
    private:
        std::string studentName;
        int numberDaysPresent;
        int numberDaysAbsent;
        int totalDaysOfClass;
        double currentGrade;
    };
    I read the data in from a comma-delimited text file and create 543 Student objects that contain that data.

    What I want to do now is search a different data file for all occurrences of Student[x].studentName. But I do not want to load the other file completely into memory, because it contains 11000 lines of data (and takes forever to load) in the following comma-delimited format:
    DATE OF CLASS, STUDENT NAME, CLASS PARTICIPATION GRADE, QUIZ GRADE.

    I have been chewing on this one for several days, and haven't been able to figure it out. Any suggestions would be greatly appreciated.

    Cheers!

    EDIT:
    I then want to grab just the data from the file that relates to the matched student and assign it to the following class:
    Code:
    class OldStudentData
    {
        std::string dateOfClass;
        std::string studentName;
        double participationGrade;
        double quizGrade;
    };
    Last edited by Phyxashun; 01-07-2009 at 01:15 PM.

  2. #2
    Kiss the monkey. CodeMonkey
    Join Date
    Sep 2001
    Posts
    937
    Open the file with ifstream, and chug on through with getline(), setting ',' as the delim. Read and check one field at a time -- or one line at a time.
    Give a larger portion of the file and I can be more specific.
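    For reference, a minimal sketch of the field-at-a-time approach. It assumes one record per line with exactly four comma-separated fields in the order given in the first post, and no commas inside a field. AttendanceRow, trim and findRows are made-up names for illustration, not anything from the poster's program.

```cpp
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type mirroring the four fields described in post #1.
struct AttendanceRow {
    std::string dateOfClass;
    std::string studentName;
    double participationGrade = 0.0;
    double quizGrade = 0.0;
};

// Strips leading/trailing spaces, tabs and carriage returns from a field.
static std::string trim(const std::string &s)
{
    const char *ws = " \t\r";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Reads one line at a time, splits it on ',' with getline(), and keeps only
// the rows whose name field equals `name`. Malformed lines are skipped.
std::vector<AttendanceRow> findRows(std::istream &in, const std::string &name)
{
    std::vector<AttendanceRow> matches;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        AttendanceRow row;
        std::string part, quiz;
        if (!std::getline(fields, row.dateOfClass, ',') ||
            !std::getline(fields, row.studentName, ',') ||
            !std::getline(fields, part, ',') ||
            !std::getline(fields, quiz))
            continue;                       // fewer than four fields
        row.dateOfClass = trim(row.dateOfClass);
        row.studentName = trim(row.studentName);
        if (row.studentName != name)
            continue;                       // not the student we want
        try {
            row.participationGrade = std::stod(part);
            row.quizGrade = std::stod(quiz);
        } catch (const std::exception &) {
            continue;                       // grade was not a number
        }
        matches.push_back(row);
    }
    return matches;
}
```

    Only the current line and the matching rows are held in memory. To use it on a file, pass a std::ifstream that is already open.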
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  3. #3
    and the Hat of Guessing tabstop
    Join Date
    Nov 2007
    Posts
    14,336
    I would go with one line at a time. Read in a line into a string, type the magic word "find", and if successful then process the info.

  4. #4
    Registered User
    Join Date
    Dec 2008
    Posts
    65

    Quote Originally Posted by CodeMonkey View Post
    Open the file with ifstream, and chug on through with getline(), setting ',' as the delim. Read and check one field at a time -- or one line at a time.
    Give a larger portion of the file and I can be more specific.
    Quote Originally Posted by tabstop View Post
    I would go with one line at a time. Read in a line into a string, type the magic word "find", and if successful then process the info.
    Both of those make more sense. I had been loading all of the data into the second class and then comparing the two classes, which takes an eternity. I'll try getline and find.

    Thanks!

  5. #5
    Registered User
    Join Date
    Dec 2008
    Posts
    65

    Hmmm, a dilemma: I just figured out what the problem is, and I don't know how to solve it.

    The second file isn't a comma-delimited text file; it is a tab-delimited binary file with the following struct:
    Code:
    struct ClassData
    {
        char dateOfClass[15];
        char studentName[25];
        double partGrade;
        double quizGrade;
    };
    I thought it was a text file, but I finally found the program that I used to originally write the data file. How would I read the data and compare the class to the binary data?

    Cheers!

  6. #6
    and the Hat of Guessing tabstop
    Join Date
    Nov 2007
    Posts
    14,336
    tab-delimited binary? Really?

    Well, whatever you used to write it out, do the opposite to read it in. (fwrite->fread, etc.) You will have a char[] instead of a string, and then compare on that char array.
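    A minimal sketch of the fread route. It assumes the file is nothing but ClassData structs written back to back with fwrite, by a program built with the same compiler and settings (so the struct padding matches). findRecords is a hypothetical name.

```cpp
#include <cstdio>
#include <cstring>
#include <vector>

// Same layout as the struct in post #5.
struct ClassData {
    char dateOfClass[15];
    char studentName[25];
    double partGrade;
    double quizGrade;
};

// Reads one record at a time from a file opened with "rb" and collects the
// records whose studentName matches `name`.
std::vector<ClassData> findRecords(std::FILE *fp, const char *name)
{
    std::vector<ClassData> matches;
    ClassData rec;
    while (std::fread(&rec, sizeof rec, 1, fp) == 1) {
        // strncmp is bounded by the array size in case the stored name
        // fills all 25 bytes without a terminating '\0'.
        if (std::strncmp(rec.studentName, name, sizeof rec.studentName) == 0)
            matches.push_back(rec);
    }
    return matches;
}
```

    The comparison is on the raw char array, as suggested above. No std::string conversion is needed just to test for a match.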

  7. #7
    Registered User
    Join Date
    Dec 2008
    Posts
    65
    Quote Originally Posted by tabstop View Post
    tab-delimited binary? Really?

    Well, whatever you used to write it out, do the opposite to read it in. (fwrite->fread, etc.) You will have a char[] instead of a string, and then compare on that char array.
    Then I would have to do what I don't want to do: read the entire file into memory then compare the class against the data. That was taking upwards of 30-40 minutes. I will probably just read the binary in and write it as a comma delimited text file.

    One other question to throw out there, how in the world do you use a buffer to read from a file?

    Thanks in advance!

  8. #8
    and the Hat of Guessing tabstop
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Phyxashun View Post
    Then I would have to do what I don't want to do: read the entire file into memory then compare the class against the data.
    I never suggested such a thing. Read in one "line" of data, just like you were planning to do with getline. Nothing actually changes, here.

    Oh, and a buffer is where you put your data when you read it in. You read in one piece of data, and you put it somewhere, and you process it; and when you read the next piece of data in you put it in the same place.

  9. #9
    Registered User
    Join Date
    Dec 2008
    Posts
    65
    Quote Originally Posted by tabstop View Post
    I never suggested such a thing. Read in one "line" of data, just like you were planning to do with getline. Nothing actually changes, here.

    Oh, and a buffer is where you put your data when you read it in. You read in one piece of data, and you put it somewhere, and you process it; and when you read the next piece of data in you put it in the same place.
    Hmmm, guess I need to sleep on this one; maybe I will be able to wrap my head around it tomorrow. Appreciate the help.

    Cheers!

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Phyxashun View Post
    Then I would have to do what I don't want to do: read the entire file into memory then compare the class against the data. That was taking upwards of 30-40 minutes. I will probably just read the binary in and write it as a comma delimited text file.

    One other question to throw out there, how in the world do you use a buffer to read from a file?

    Thanks in advance!
    Just HOW large is this file? (Or are you running this on a 1980s computer?)

    By the way, unless you have some good reason to believe that you will find the data in the first half of the file, you can expect that reading "one line" at a time will still average around 15-20 minutes, since on average you will read half the file, and worst case is still 30-40 minutes, since it will take that long to read the entire file (including when you search for something that you can't find).

    Unless of course you find a more clever way of storing your data - for example, if you are searching for a name, and the file is sorted in name-order, you could (assuming all lines are the same size) use a binary search - which allows you to search 1 million entries in about 20 comparisons.

    Databases use a sorted key<->index table, so that you can search, for example, "first name" for "John" and then know where (at what index) you can find that data in the file. But that requires a fair bit more work.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  11. #11
    Registered User
    Join Date
    Dec 2008
    Posts
    65
    Quote Originally Posted by matsp View Post
    Just HOW large is this file. (Or are you running this on a 1980's computer?).

    By the way, unless you have some good reason to believe that you will find the data in the first half of the file, you can expect that reading "one line" at a time will still average around 15-20 minutes, since on average you will read half the file, and worst case is still 30-40 minutes, since it will take that long to read the entire file (including when you search for something that you can't find).
    Like I said in the beginning, the file is approximately 11000 entries in the specified format, but they are sorted by class date, not by name. I will continue hacking away at the problem; once I find a solution that works for me, I'll post it.

    Thanks!

  12. #12
    Amazingly beautiful user.
    Join Date
    Jul 2005
    Location
    If you knew I'd have to kill you
    Posts
    254
    Can we see the code that you are using to load the entire file and then search it? Even for 11,000 entries, I can see this taking a few seconds, not a few minutes... There might be something wrong with the way you are loading it.
    Programming Your Mom. http://www.dandongs.com/

  13. #13
    Registered User
    Join Date
    Dec 2008
    Posts
    65
    I think I figured it out:
    Code:
    ClassData searchRecords(ClassData &obj, long recordNum)
    {
        ifstream fin(FILE_NAME2, ios::in | ios::binary);
        if(!fin)
        {
            cout << "Cannot open file.\n";
            exit(1);
        }
    
        fin.seekg(sizeof(ClassData) * recordNum, ios::beg);
        fin.read((char *) &obj, sizeof(ClassData));
        fin.close();
    
        if(!fin.good())
        {
            cout << "A file error occurred.\n";
            exit(1);
        }
        return obj;
    }
    I then used a for loop to search through everything read by the above function until I found matches, then stored that data in new member variables that I added to the original Student class.

    Thanks for the help!

  14. #14
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Phyxashun View Post
    I think I figured it out:
    Code:
    ClassData searchRecords(ClassData &obj, long recordNum)
    {
        ifstream fin(FILE_NAME2, ios::in | ios::binary);
        if(!fin)
        {
            cout << "Cannot open file.\n";
            exit(1);
        }
    
        fin.seekg(sizeof(ClassData) * recordNum, ios::beg);
        fin.read((char *) &obj, sizeof(ClassData));
        fin.close();
    
        if(!fin.good())
        {
            cout << "A file error occurred.\n";
            exit(1);
        }
        return obj;
    }
    I then used a for loop to search through everything read by the above function until I found matches then just stored that data in different member variables that I added to the original student class.

    Thanks for the help!
    It would of course help if you didn't open, seek, and close the file for EVERY record. It will work that way, but it is a tad inefficient. It would be better to open the file in the function that calls searchRecords, and close it when you have finished searching. You should also consider not returning the object, since you are already passing it in by reference: why make a copy for the calling function to then copy again?


  15. #15
    Registered User
    Join Date
    Dec 2008
    Posts
    65

    I changed it:
    Code:
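    // Assumes fin is an ifstream opened once in binary mode outside this
    // function (e.g. at file scope) and closed after the search finishes.
    void searchRecords(ClassData &obj, long recordNum)
    {
        fin.seekg(sizeof(ClassData) * recordNum, ios::beg);
        fin.read((char *) &obj, sizeof(ClassData));
    }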
    Thanks for the help!
