Split string up into single words


  1. #16
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    20,955
    Quote Originally Posted by prog-bman
    You can also use getline on the stream to build a series of words.
    However, getline() only allows a single char as the delimiter, so if several chars need to act as delimiters, C_ntua's suggestion with my proposed modification would be more applicable. Alternatively, anon's suggestion of boost::split could be used.
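
    C_ntua's suggestion is on the previous page and is not reproduced here. As a rough sketch of the general idea (an assumption, not necessarily what was proposed), std::string::find_first_of and find_first_not_of accept a whole set of delimiter chars. The name split_any is hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): splits `text` on any character
// found in `delims`, skipping empty tokens.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims)
{
    std::vector<std::string> words;
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = text.find_first_of(delims, start);
        // substr clamps the count, so end == npos takes the rest of the string.
        words.push_back(text.substr(start, end - start));
        // Searching from npos returns npos, which ends the loop.
        start = text.find_first_not_of(delims, end);
    }
    return words;
}
```

    Calling split_any(line, " ,.-") would then return the words of a line without the listed punctuation.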

  2. #17
    Sweet
    Join Date
    Aug 2002
    Location
    Tucson, Arizona
    Posts
    1,801
    Indeed.

    I am sure someone could roll their own split function from getline, or use the options you mentioned above.
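
    A minimal sketch of what such a home-made split built on getline might look like, assuming a single delimiter char is enough. The name split_getline is hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on one delimiter character using getline.
std::vector<std::string> split_getline(const std::string& text, char delim)
{
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string word;
    while (std::getline(in, word, delim)) {
        if (!word.empty())   // consecutive delimiters produce empty fields; skip them
            words.push_back(word);
    }
    return words;
}
```

    Punctuation other than the chosen delimiter stays attached to the words, which is the limitation laserlight points out above.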
    Woop?

  3. #18
    The larch
    Join Date
    May 2006
    Posts
    3,573
    There's also Boost.Tokenizer which might be even more appropriate, since you don't want to get a list of words but each word individually to determine what to do with it.

    Sample usage:

    Code:
    #include <iostream>
    #include <string>
    #include <boost/foreach.hpp>
    #include <boost/tokenizer.hpp>
    
    void add_index_entry(const std::string& s, unsigned page_n)
    {
        std::cout << "adding: " << s << ": page " << page_n << '\n';
    }
    
    int main(){
        std::string line("Oh, this line - unfortunately - contains some punctuation...");
        unsigned page_n = 1;
    
        boost::tokenizer<> tokens(line);
        BOOST_FOREACH(const std::string& word, tokens) {
            if (word.size() > 3)
                add_index_entry(word, page_n);
        }
    }
    If the aim is not to implement everything yourself, Boost is rather helpful for string processing, among other things (splitting, case-insensitive comparisons, etc.).
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  4. #19
    Registered User C_ntua
    Join Date
    Jun 2008
    Posts
    1,853
    Using boost is cheating

  5. #20
    Registered User
    Join Date
    Apr 2008
    Posts
    122
    Yeah sorry I can't use boost.

  6. #21
    Registered User
    Join Date
    Apr 2008
    Posts
    122
    Okay I found something. How could I use this:

    Code:
    /* strtok example */
    #include <stdio.h>
    #include <string.h>
    
    int main ()
    {
      char str[] ="- This, a sample string.";
      char * pch;
      printf ("Splitting string \"%s\" into tokens:\n",str);
      pch = strtok (str," ,.-");
      while (pch != NULL)
      {
        printf ("%s\n",pch);
        pch = strtok (NULL, " ,.-");
      }
      return 0;
    }
    This is my attempt at making it work but it doesn't:

    Code:
    void Index::addWord(string word, int pageNumber)
    {
        cout << word;
        char * pch;
        pch = &word[0];
        pch = strtok (pch," :(#;[]""\"()!`'?,.-");
        while (pch != NULL)
        {
            //printf ("%s\n",pch);
            pch = strtok (NULL, " :(#;[]""\"()!`'?,.-");
            data.push_back(pch);
        }
    }
    I need to take the passed string, split it into a character array, then store the split words into the vector.

  7. #22
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Perhaps the problem is that you are adding data after a call to strtok without checking what it returned. data.push_back should occur where your printf statement is.

    Still, there is something immoral about gaining non-constant access to a string's internal buffer.
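
    One way to address both points, sketched under assumptions: run strtok on a private, null-terminated copy of the string, and push each token only after checking it is non-null. Here add_words is a hypothetical free-function stand-in for Index::addWord, and the data parameter stands in for the class's vector member:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical free-function variant of the poster's Index::addWord;
// `data` stands in for the class's vector member.
void add_words(const std::string& line, std::vector<std::string>& data)
{
    static const char delims[] = " :(#;[]\"()!`'?,.-";

    // strtok writes '\0' into its input, so work on a null-terminated copy
    // instead of the string's own buffer.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    for (char* pch = std::strtok(&buffer[0], delims);
         pch != NULL;
         pch = std::strtok(NULL, delims)) {
        data.push_back(pch);   // only reached when pch points at a valid token
    }
}
```

    Note that strtok keeps hidden state between calls, so this sketch is not suitable for nested or concurrent tokenizing.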

    Another approach is to use the standard algorithms from <algorithm>. IMO, the iterator interface is a bit more convenient than the index-based interface of std::string (no special npos to check for), and you can use predicates:

    Code:
    #include <iostream>
    #include <string>
    #include <functional>
    #include <algorithm>
    #include <cctype>
    using namespace std;
    void addWords(const string& line, int )
    {
        cout << line << '\n';
        std::string::const_iterator word_start = line.begin(), word_end = line.begin();
        
        while (
            word_start = std::find_if(word_end, line.end(), std::ptr_fun<int, int>(std::isalnum)),
            word_end = std::find_if(word_start, line.end(), std::not1(std::ptr_fun<int, int>(std::isalnum))),
            word_start != word_end
            ) {
            std::cout << std::string(word_start, word_end) << '\n';
        }
    }
    
    int main()
    {
        std::string s("- This is an example string (which contains punctuation)! -");
        addWords(s, 10);
    }
    (Pardon the somewhat unorthodox use of the comma operator.)
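
    For readers on a newer compiler: std::ptr_fun was removed in C++17 and std::not1 in C++20, so the code above may not build as written there. A possible C++11 rewrite of the same idea, using a lambda and std::find_if_not, is sketched below. The name extract_words is hypothetical, and it returns the words instead of printing them:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: collects each run of alphanumeric characters in `line`.
std::vector<std::string> extract_words(const std::string& line)
{
    // Cast to unsigned char first: std::isalnum is undefined for negative values.
    auto is_word_char = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };

    std::vector<std::string> words;
    auto word_start = std::find_if(line.begin(), line.end(), is_word_char);
    while (word_start != line.end()) {
        // The word ends at the first character that fails the predicate.
        auto word_end = std::find_if_not(word_start, line.end(), is_word_char);
        words.emplace_back(word_start, word_end);
        word_start = std::find_if(word_end, line.end(), is_word_char);
    }
    return words;
}
```

    This version also avoids the comma operator by doing the first search before the loop.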
