Thread: Search text in file

  1. #1
    Village id10t
    Join Date
    May 2008
    Posts
    57

    Search text in file

    OK, I started a new project. Basically it's like the Find function in Word: I want to open a text file and search for an occurrence of a word. Any tips? Should the file first be read into a vector, and the vector then searched for the word? Any hints will really be appreciated...

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    The simplest (but not the most efficient) approach is to search while reading one char at a time from the file: if the char matches, remember the position; otherwise keep reading.

    The complication comes with words that have repeated substrings, where you read past the beginning of the actual match while partially matching earlier text. E.g. searching for "abcab", and we find:
    ababcab
    The search will consume the first "ab", then fail to match the next "a" against "c", but by then we have already read the "a" that starts the real "abcab". [There are longer examples where you read past more than one character].

    This is pretty obscure, but if you want to be complete, you need to "go back" to where you started matching before you search again.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    You could take a look at this page on the Boyer-Moore fast string matching algorithm, as well as the links to other string matching algorithms.
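
    For a rough idea of how this family of algorithms skips ahead, here is a sketch of Horspool's simplified variant of Boyer-Moore (not the full algorithm from the linked page). The name horspool_find is hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Boyer-Moore-Horspool: compare the window right to left and, on a
// mismatch, shift by a distance looked up from the window's last character.
// Returns the index of the first match, or std::string::npos.
std::size_t horspool_find(const std::string& text, const std::string& pattern)
{
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m == 0) return 0;
    if (m > n) return std::string::npos;

    // Default shift is the whole pattern length; characters that occur in
    // the pattern (except its last position) allow only a shorter shift.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= n) {
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) --j;
        if (j == 0) return pos;  // every character in the window matched
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return std::string::npos;
}
```

    Because the shift is usually larger than one, many positions in the text are never examined at all, which is where the speed-up over the char-by-char approach comes from.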
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Village id10t
    Join Date
    May 2008
    Posts
    57
    Code:
    while (in_stream>> input ) 
    {
        text.push_back(input);
        
    }
    Will this read character by character or word by word?

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    It depends on what "input" is declared as: if it's a char, it reads character by character; if it's a std::string, it reads word by word.


  6. #6
    Village id10t
    Join Date
    May 2008
    Posts
    57
    Code:
    vector<char> text;
    
    
    
    while (in_stream>> input ) 
    {
        text.push_back(input);
        
    }
    
    cout<<text.size()<<endl;
    
    for (counter=0;counter<text.size();counter++)
    {
        cout<<text[counter]<<" ";
    }    
    
    
    
    return 0;
    }
    OK, this is the backbone so far. I tell it to read from a file containing random words, but text.size() remains zero; for some reason the text is not being read into the vector. Any suggestions?

    PS: I tested it with a text file containing numbers, and it works well...
    Last edited by MarlonDean; 05-16-2008 at 03:09 AM.

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Right, so what type is the variable "input"?


  8. #8
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Maybe the file wasn't opened successfully?

    Or maybe you forgot to change the type of input to char?

    By the way, in_stream >> skips all whitespace characters by default. If you want to preserve them, you might try in_stream.get(input).

    Edit: or apply the noskipws format flag to in_stream first.

    By the way, you can also construct the vector from the input stream in the first place:
    Code:
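    #include <fstream>
    #include <iterator> // for istreambuf_iterator
    #include <vector>
    using namespace std;

        ifstream fin("input.txt");
        fin >> noskipws; // only affects operator>>; istreambuf_iterator never skips whitespace
        vector<char> v((istreambuf_iterator<char>(fin)), istreambuf_iterator<char>());
        // v now contains all the characters in "input.txt"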
    Last edited by anon; 05-16-2008 at 03:35 AM.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  9. #9
    Village id10t
    Join Date
    May 2008
    Posts
    57
    Yes, yes! I had declared input as type int instead of type char. Now it works much better, and exactly as anon predicted, it throws away all my whitespace characters. Since I want the end program to search for words, whitespace is crucial. I am still green at C++. Why does it do that? Doesn't a space count as a character?

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    It does, but in many applications skipping whitespace is exactly what you want, because it saves you from writing code to skip it in every program that reads the input. As described in anon's post, you can easily apply "noskipws" to your input stream, so it's no big deal.


  11. #11
    Village id10t
    Join Date
    May 2008
    Posts
    57
    OK, fantastic! I used the following and it captures the file exactly:

    Code:
    while (in_stream>>noskipws>> input ) 
    {
        text.push_back(input);
        
    }
    Now for the search function...
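
    One possible starting point for that search, assuming the file is already in a vector<char> as above. This sketch leans on std::search from <algorithm>, which handles the backing-up internally; the name find_all is made up for the example:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the starting index of every occurrence of `word` in `text`.
// Note: this matches substrings, so "the" is also found inside "there".
std::vector<std::size_t> find_all(const std::vector<char>& text,
                                  const std::string& word)
{
    std::vector<std::size_t> positions;
    if (word.empty()) return positions;

    auto it = text.begin();
    while ((it = std::search(it, text.end(), word.begin(), word.end()))
           != text.end()) {
        positions.push_back(static_cast<std::size_t>(it - text.begin()));
        ++it;  // step one character so overlapping matches are found too
    }
    return positions;
}
```

    Restricting the result to whole words would need an extra check that the characters before and after each match are not letters; that is left out here.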
