Thread: Split string up into single words

  1. #16
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by prog-bman
    You can also use getline on the stream to build a series of words.
    However, getline() only accepts a single char as the delimiter. If several different chars need to act as delimiters, C_ntua's suggestion with my proposed modification would be more applicable, or anon's suggestion of boost::split could be used instead.
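
    To illustrate the single-delimiter limitation, here is a minimal sketch of splitting with getline() on a std::istringstream. The function name split_on_char is made up for this example and is not from anyone's code in the thread.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on ONE delimiter character using std::getline.
// Empty pieces (produced by consecutive delimiters) are skipped.
std::vector<std::string> split_on_char(const std::string& text, char delim)
{
    std::vector<std::string> words;
    std::istringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, delim)) {
        if (!piece.empty())
            words.push_back(piece);
    }
    return words;
}
```

    Splitting "Oh, this" on ' ' yields "Oh," with the comma still attached, which is why punctuation-heavy input needs a set of delimiters rather than one char.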
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  2. #17
    Sweet
    Join Date
    Aug 2002
    Location
    Tucson, Arizona
    Posts
    1,820
    Indeed.

    I am sure someone could roll their own split function from getline or use the options you mentioned above.
    Woop?

  3. #18
    The larch
    Join Date
    May 2006
    Posts
    3,573
    There's also Boost.Tokenizer which might be even more appropriate, since you don't want to get a list of words but each word individually to determine what to do with it.

    Sample usage:

    Code:
    #include <iostream>
    #include <string>
    #include <boost/foreach.hpp>
    #include <boost/tokenizer.hpp>
    
    void add_index_entry(const std::string& s, unsigned page_n)
    {
        std::cout << "adding: " << s << ": page " << page_n << '\n';
    }
    
    int main(){
        std::string line("Oh, this line - unfortunately - contains some punctuation...");
        unsigned page_n = 1;
    
        boost::tokenizer<> tokens(line);
        BOOST_FOREACH(const std::string& word, tokens) {
            if (word.size() > 3)
                add_index_entry(word, page_n);
        }
    }
    If the aim is not to implement everything yourself, Boost is rather helpful for string processing, among other things (splitting, case-insensitive comparisons, etc.).
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  4. #19
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Using boost is cheating

  5. #20
    Registered User
    Join Date
    Apr 2008
    Posts
    122
    Yeah sorry I can't use boost.

  6. #21
    Registered User
    Join Date
    Apr 2008
    Posts
    122
    Okay I found something. How could I use this:

    Code:
    /* strtok example */
    #include <stdio.h>
    #include <string.h>
    
    int main ()
    {
      char str[] ="- This, a sample string.";
      char * pch;
      printf ("Splitting string \"%s\" into tokens:\n",str);
      pch = strtok (str," ,.-");
      while (pch != NULL)
      {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
      }
      return 0;
    }
    This is my attempt at making it work but it doesn't:

    Code:
    void Index::addWord(string word, int pageNumber)
    {
        cout << word;
        char * pch;
        pch = &word[0];
        pch = strtok (pch," :(#;[]""\"()!`'?,.-");
        while (pch != NULL)
        {
            //printf ("%s\n",pch);
            pch = strtok (NULL, " :(#;[]""\"()!`'?,.-");
            data.push_back(pch);
        }
    }
    I need to take the passed string, split it into a character array, then store the split words into the vector.

  7. #22
    The larch
    Join Date
    May 2006
    Posts
    3,573
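    Perhaps the problem is that you are adding data after a call to strtok without checking what it returned. As written, the first token is never stored, and the NULL that strtok returns at the end is passed to push_back. data.push_back should occur where your printf statement is.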

    Still, there is something immoral about gaining non-constant access to a string's internal buffer.
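
    A rough sketch of the reordered loop, for what it's worth. It copies the line into a separate writable buffer so strtok never touches the string's own storage. The name split_with_strtok and the returned vector are assumptions for illustration; in your class you would push into data instead.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize `line` with strtok and collect the words.
std::vector<std::string> split_with_strtok(const std::string& line, const char* delims)
{
    // strtok writes '\0' into its input, so work on a null-terminated copy.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::vector<std::string> words;
    // Store each token only after checking it is not NULL, then ask for the next one.
    for (char* token = std::strtok(&buffer[0], delims);
         token != NULL;
         token = std::strtok(NULL, delims)) {
        words.push_back(token);
    }
    return words;
}
```

    Note that strtok keeps hidden static state, so this should not be interleaved with another strtok loop.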

    Another approach is to use the standard algorithms from <algorithm>. IMO, the iterator interface is a bit more convenient than the index interface of std::string (no special npos to check for), and you can use predicates:

    Code:
    #include <iostream>
    #include <string>
    #include <functional>
    #include <algorithm>
    #include <cctype>
    using namespace std;
    void addWords(const string& line, int )
    {
        cout << line << '\n';
        std::string::const_iterator word_start = line.begin(), word_end = line.begin();
        
        while (
            word_start = std::find_if(word_end, line.end(), std::ptr_fun<int, int>(std::isalnum)),
            word_end = std::find_if(word_start, line.end(), std::not1(std::ptr_fun<int, int>(std::isalnum))),
            word_start != word_end
            ) {
            std::cout << std::string(word_start, word_end) << '\n';
        }
    }
    
    int main()
    {
        std::string s("- This is an example string (which contains punctuation)! -");
        addWords(s, 10);
    }
    (Pardon the somewhat unorthodox use of the comma operator.)
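
    For compilers with C++11 or later, the same idea might be sketched with a lambda and std::find_if_not instead of std::ptr_fun and std::not1 (both of which were removed in C++17). The name extract_words is an assumption for this example, and it returns the words rather than printing them.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: collect every maximal run of alphanumeric characters.
std::vector<std::string> extract_words(const std::string& line)
{
    // Cast to unsigned char first: std::isalnum is undefined for negative values.
    auto is_word_char = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };

    std::vector<std::string> words;
    auto word_start = std::find_if(line.begin(), line.end(), is_word_char);
    while (word_start != line.end()) {
        // The word ends at the first non-alphanumeric character (or end of line).
        auto word_end = std::find_if_not(word_start, line.end(), is_word_char);
        words.emplace_back(word_start, word_end);
        word_start = std::find_if(word_end, line.end(), is_word_char);
    }
    return words;
}
```

    This trades the comma-operator loop condition for an explicit loop, which is a little longer but easier to step through.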
