Extract Words from String


  1. #1
    Registered User
    Join Date
    Dec 2007
    Posts
    383

    Extract Words from String

    I have a string like this:
    Code:
    std::string Words = "One(1), Two(2)";
    What I want to do now is put "One" and "Two" into two other separate std::strings. In other words, I want to extract every word that appears before a "(".
    An important point is that I don't know in advance how many such words there are. There can be at most 6; in this case there are 2, "One" and "Two".
    So if there are, for example, 4 words, I will put each of those 4 into a separate string.

    What technique is used for this? I believe "find" is used in some way?

  2. #2
    Registered User
    Join Date
    Jan 2005
    Posts
    7,318
    You can use find to look for the '(' character and extract everything before it. You'll also have to account for the comma and the space after it. If there is always ", " between entries then it shouldn't be too hard.

    Use substr() to extract the strings out once you know where the start and end of them are.

  3. #3
    Registered User
    Join Date
    Dec 2007
    Posts
    383
    I am trying to do something like this:

    Code:
    std::string Words = "One(1), Two(2)";
    std::string One, Two;
    
    if ( Words.find( "(" ) )
    {
    			
    Two = Words.substr(Words.rfind("("));
    
    }
    
    
     File << Two << '\n';
    The output of this is "(2)". I understand why that happens given what I wrote, but I really don't know how to capture "Two" and "One". So far I have only used substr().
    The way I have done it, substr() removes everything before the "(".
    Last edited by Coding; 03-06-2008 at 01:12 PM.

  4. #4
    Registered User
    Join Date
    Jan 2005
    Posts
    7,318
    find returns the position at which the string was found (or std::string::npos if it was not found). It does not return a bool. You should save the return value of find; it will be the index one past the last character of the substring you want.

    The first character of the first substring is the first character in the string, so that is easy.

    The version of substr you want takes the index of the first character and the number of characters to extract. All you need is a little math to work out how many characters to put in the substring, based on the index of the "(" (the result of find) and the index of the first character.
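
    To illustrate the point about the return value, here is a minimal sketch. The helper names hasParen and naiveCheck are made up for this example and do not appear in the thread; the only library facts relied on are that find returns an index and that it returns std::string::npos when nothing is found.

```cpp
#include <string>

// Hypothetical helper (name is illustrative, not from the thread):
// returns true only when '(' actually occurs in text.
bool hasParen(const std::string& text)
{
    // find returns an index, or std::string::npos when nothing is found.
    std::string::size_type pos = text.find('(');
    return pos != std::string::npos;
}

// Mirrors what "if ( Words.find( "(" ) )" really tests: the index is
// converted to bool, so index 0 (found at the very start) counts as false,
// and npos (not found) is a large non-zero value, so it counts as true.
bool naiveCheck(const std::string& text)
{
    return text.find('(') != 0;
}
```

    With these definitions, hasParen("One") is false while naiveCheck("One") is true, which is why the result of find should be compared against std::string::npos rather than used directly as a condition.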

  5. #5
    Registered User
    Join Date
    Dec 2007
    Posts
    383
    This is the first time I have extracted part of a string. I more or less understand what to do, but I don't know the details.
    When you say "total number of characters", do you mean Words.length()?
    I really don't know where to begin here. I am still stuck on my code example.

  6. #6
    Registered User
    Join Date
    Jan 2005
    Posts
    7,318
    In that sentence, "total number of characters" referred to the number of characters in the substring you are extracting. The first time through the string in this example, that will be 3 because "One" has three characters.

    You have to figure out the math that calculates the number of characters in the substring for a generic word. The tools you have are the index of the first letter in that word, and the index of the "(" which is after the last letter in the word. So what's the formula for getting the total number of characters in that word?
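
    One possible way to write that calculation down is sketched below. The helper wordBeforeParen is a hypothetical name used only for illustration, and it uses the two-argument form of find so that the search begins at the word's first letter.

```cpp
#include <string>

// Hypothetical helper: returns the word that begins at index start and
// ends just before the next '('. Returns an empty string if no '(' follows.
std::string wordBeforeParen(const std::string& text,
                            std::string::size_type start)
{
    std::string::size_type paren = text.find('(', start);
    if (paren == std::string::npos)
        return "";

    // The '(' sits one past the last letter of the word, so the length is
    // (index of '(') minus (index of the first letter).
    return text.substr(start, paren - start);
}
```

    For "One(1), Two(2)", the first word starts at index 0 and its "(" is at index 3, giving a length of 3 - 0 = 3. The second word starts at index 8 and its "(" is at index 11, again giving 3.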

  7. #7
    Registered User
    Join Date
    Jan 2008
    Posts
    11
    I would use a for loop to read every letter gone through into a new string until a whitespace is encountered. Then begin to read into another string until another whitespace is found, and so on until the end of the initial string.

  8. #8
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,725
    Quoting post #7: "I would use a for loop to read every letter gone through into a new string until a whitespace is encountered. Then begin to read into another string until another whitespace is found, and so on until the end of the initial string."
    It would be easier to use the find() member function of std::string. Remember, the number of chars between the space and the previous word varies as the length of the number in the parentheses varies.

    But yeah, use a loop, and to make that easier, replace the n separate variables with a std::vector<std::string>.
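
    A rough sketch of such a loop follows. The function name extractWords is hypothetical, and the code assumes that entries are always separated by ", " as in the original example; input in any other shape would need different handling.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects every word that precedes a '(' in input
// shaped like "One(1), Two(2)". Assumes entries are separated by ", ".
std::vector<std::string> extractWords(const std::string& text)
{
    std::vector<std::string> words;
    std::string::size_type start = 0;

    while (start < text.size())
    {
        std::string::size_type paren = text.find('(', start);
        if (paren == std::string::npos)
            break;                          // no further "(...)" entries
        words.push_back(text.substr(start, paren - start));

        // Jump past the next ", " separator to the start of the next word.
        std::string::size_type sep = text.find(", ", paren);
        if (sep == std::string::npos)
            break;                          // that was the last entry
        start = sep + 2;
    }
    return words;
}
```

    Because the results live in a vector, the same code handles 2 words or 6 without any extra variables; words.size() reports how many were found.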
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  9. #9
    The larch
    Join Date
    May 2006
    Posts
    3,573
    You should also be able to split up strings using stringstreams and getline (which lets you specify which character would be treated as "ending" the line).
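
    As a sketch of that idea, the following splits on ',' with std::getline and then keeps the part of each piece before the "(". The name splitWords is made up for this example, and the approach assumes that commas only ever appear as separators between entries.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on ',' using std::getline, then keeps
// the part of each piece that comes before '('.
std::vector<std::string> splitWords(const std::string& text)
{
    std::vector<std::string> words;
    std::istringstream stream(text);
    std::string entry;

    // The third argument tells getline which character ends each piece.
    while (std::getline(stream, entry, ','))
    {
        // Skip the leading space left over from the ", " separator.
        std::string::size_type first = entry.find_first_not_of(' ');
        if (first == std::string::npos)
            continue;                       // piece was empty or all spaces

        std::string::size_type paren = entry.find('(', first);
        if (paren == std::string::npos)
            continue;                       // no '(' in this piece, skip it

        words.push_back(entry.substr(first, paren - first));
    }
    return words;
}
```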
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  10. #10
    Registered User
    Join Date
    Dec 2007
    Posts
    383
    find() seems to be a good way to do it.
    For the string "Number(1234)", I want to extract both "1234" and "Number".
    I understand that I need to know the positions of, for example, "(" and ")".
    The code I have so far looks like this:
    Code:
    std::string Value10 = "Number(1234)";
    
    size_t found1;
    size_t found2;
    
    found1 = Value10.find("(");
    found2 = Value10.find(")");
    So "(" is at position 6 and ")" is at position 11. Knowing these positions, what is the next step to extract "1234"?

  11. #11
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,725
    You can still use substr(), of course. By the way, you may want to use the version of find() that takes a second argument, namely the position from which to start searching. This would allow you to ignore the part of the string that has already been processed.
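
    A minimal sketch of how the second argument to find can be used is shown below. The function name extractValues is hypothetical; each search starts where the previous one stopped, so text that has already been handled is never scanned again.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects the text inside each "(...)" pair.
std::vector<std::string> extractValues(const std::string& text)
{
    std::vector<std::string> values;
    std::string::size_type pos = 0;         // where the next search begins

    while (true)
    {
        // Second argument: the index from which to start searching.
        std::string::size_type open = text.find('(', pos);
        if (open == std::string::npos)
            break;
        std::string::size_type close = text.find(')', open + 1);
        if (close == std::string::npos)
            break;

        // Characters strictly between '(' and ')'.
        values.push_back(text.substr(open + 1, close - open - 1));
        pos = close + 1;                    // skip the part already processed
    }
    return values;
}
```

    For "One(1), Two(2)" this collects "1" and "2"; for "Number(1234)" it collects the single value "1234".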

  12. #12
    Registered User
    Join Date
    Dec 2007
    Posts
    383
    I am not sure what the syntax looks like for the find() that takes a second argument.
    If I write:
    Code:
    Value10.substr(found1);
    the output is "(1234)". In this case the parentheses are still there. I could write:
    Code:
    Value10.substr(found1 + 1);
    and get "1234)", but then the ")" is still left.

  13. #13
    Registered User
    Join Date
    Dec 2007
    Posts
    383
    I think I found a way to do it:

    Code:
    std::string Value10 = "Number(1234)";

    // Positions of the opening and closing parentheses.
    size_t found1 = Value10.find("(");
    size_t found2 = Value10.find(")");

    std::ofstream exitfile;
    exitfile.open("C:\\exitfile.txt");

    // Start one past "(" and take the characters up to, but not including, ")".
    std::string extracted = Value10.substr(found1 + 1, found2 - found1 - 1);

    exitfile << extracted << '\n';
    output: "1234"
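
    Since the goal stated earlier was to get both "Number" and "1234", the same idea can be extended as sketched here. The Entry struct and the parseEntry function are hypothetical names introduced for illustration; the checks against std::string::npos guard against input that lacks one of the parentheses.

```cpp
#include <string>

// Hypothetical result type; the names are illustrative only.
struct Entry
{
    bool valid;         // false if a parenthesis is missing or out of order
    std::string name;   // text before '('
    std::string value;  // text between '(' and ')'
};

// Hypothetical helper: splits "Number(1234)" into "Number" and "1234".
Entry parseEntry(const std::string& text)
{
    Entry result = { false, "", "" };

    std::string::size_type open = text.find('(');
    if (open == std::string::npos)
        return result;

    // Search for ')' only after the '(' that was just found.
    std::string::size_type close = text.find(')', open + 1);
    if (close == std::string::npos)
        return result;

    result.valid = true;
    result.name = text.substr(0, open);
    result.value = text.substr(open + 1, close - open - 1);
    return result;
}
```

    Returning a valid flag is just one design choice; it lets the caller distinguish malformed input from an entry whose name or value happens to be empty.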
