Thread: Pointer vs shared_ptr vs ?

  1. #1
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393

    Pointer vs shared_ptr vs ?

    I'm building a MyString class that has a `split' member function which should return a list of matches based on the argument given. To represent the list I am using the container std::vector<std::string>. The question is how to return this in a nice way. My first try loooked like this:

    Code:
    #include <iostream>
    #include <string>
    #include <vector>
    
    namespace tut {
        using std::vector;
        using std::string;
        class MyString {
          private:
            string data;
          public:
            MyString(string str) : data(str) {}
            vector<string> *split(string regexp) {
                (void)regexp;  // ignore for now
                vector<string> *vec = new vector<string>;
                // Push some example elements
                vec->push_back("one");
                vec->push_back("two");
                vec->push_back("three");
                return vec;
            }
        };
    }
    
    int main()
    {
        using namespace std;
        using namespace tut;
        {
            vector<string> *list = MyString("one, two, three").split(", ?");
            vector<string>::iterator it;
            for (it = list->begin(); it != list->end(); it++) {
                cout << *it << endl;
            }
            delete list;
        } 
        /* ... */
    }
    The idea is that list should be deallocated at the point /* ... */ is reached. As you can see, manually deleting list is required. Then I thought to use a smart pointers as follows:

    Code:
    namespace tut {
        using std::vector;
        using std::string;
        using std::shared_ptr;
        typedef shared_ptr<vector<string> > strvec_ptr;
        class MyString {
          private:
            string data;
          public:
            MyString(string str) : data(str) {}
            strvec_ptr split(string regexp) {
                (void)regexp;  // ignore for now
                strvec_ptr vec (new vector<string>);
                // Push some example elements
                vec->push_back("one");
                vec->push_back("two");
                vec->push_back("three");
                return vec;
            }
        };
    }
    Now the user of the class can use a strvec_ptr for the list, and the list should be freed automatically when it goes out of scope.
    Code:
        {
            strvec_ptr list = MyString("one, two, three").split(", ?");
            vector<string>::iterator it;
            for (it = list->begin(); it != list->end(); it++) {
                cout << *it << endl;
            }
        } 
        /* ... */
    Question 1: is this correct use of the shared_ptr?
    Question 2: Is there a better alternative I am overlooking?

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    One approach is to just return a std::vector<std::string>. The worry is that it could result in unnecessary copying, but return value optimisation can eliminate this, and then the use of move semantics introduced in C++11 could also make this no longer a concern.

    Another approach is to use an out parameter: the std::vector<std::string> is passed by reference.

    As for your std::shared_ptr approach: I think it is fine, if you really want to return a pointer, though std::unique_ptr might be more appropriate since you don't actually need a shared_ptr.

    EDIT:
    Come to think of it, have you considered an interface where instead of dealing directly with a std::vector<std::string>, you allow the caller to provide an output iterator. You then write each token to the output iterator as you go. This way, the caller is free to use say, a std::list<std::string> if he/she prefers, without having to transform the std::vector into a std::list. Furthermore, it sidesteps the fact that std::vector has more template parameters than just the element type.
    Last edited by laserlight; 01-26-2013 at 01:14 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    Ok, thanks for the suggestion. Here is what I came up with

    Splitter.hxx:
    Code:
    #pragma once
    #include <string>
    #include <sstream>
    #include "pcre/pcrecpp.h"
    
    namespace tut {    
        using std::string;
        using std::ostringstream;
        using pcrecpp::StringPiece;
        using pcrecpp::RE;
    
        template <class T>
        class Splitter {
          private:
            T all_matches;
            typename T::iterator *split_it = NULL;
            bool split_it_is_begun = false;
    
          public:
            Splitter(string s, std::string delim, typename T::iterator *it) {
                split_it = it;
                split_it_is_begun = false; 
                StringPiece input(s);
                RE re(string("(.*?)") + delim);
    
                // mat is an element which comes before delim
                string mat;
                while (re.Consume(&input, &mat)) {
                    all_matches.push_back(mat);
                }
                
                // input has the final element that is after the last delim
                ostringstream oss;
                oss << input;
                all_matches.push_back(oss.str());
            }
            bool next(void) {
                if (!split_it_is_begun) {
                    split_it_is_begun = true;
                    *split_it = all_matches.begin();
                }else{
                    (*split_it)++;
                }
                if (*split_it == all_matches.end())
                    return false;
                return true;
            }
        };
    }
    ExtractWords.cpp (example program):
    Code:
    #include <iostream>
    #include <vector>
    #include <string>
    #include "Splitter.hxx"
    
    // Print words line by line. Delimeter is whitespace, comma or period
    int main()
    {
        using namespace std;
        using namespace tut;
        while (!cin.eof()) {
            string s;
            getline(cin, s);
            vector<string>::iterator it;
            Splitter<vector<string> > splitter(s, "[\\s,.]+", &it);
            while (splitter.next()) {
                if (*it == "")
                    continue;
                cout << *it << endl;
            }
        }
    }
    It works with at least vector<string> and list<string>, but it is currently kind of brain-dead because it constructs the entire list in memory in the constructor, so it can't currently be used to handle huge input. I think this should be done incrementally in the next member function.

  4. #4
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    Hmm, I think I prefer using the iterator as a type instead of a container type.

    Code:
    template <typename OutputIterator>
    OutputIterator Split (string chain, string link, OutputIterator out);
    And if the Split function deduces a type that does not meet the specifications of an output iterator, then it won't compile. I'm sure this can be adapted into a class if you want to, but in that case, having the results in a member container is not awful.

  5. #5
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I'd actually consider taking a "lazy evaluation" approach, depending on what I was going to use it for...
    In other words split would return a pair of a custom iterator class, classified as an output_iterator. The iterators would keep a pointer to the original container, and hence can access the string, as well as some notion of where it is up to in the string. Upon incrementing one of these iterators, perform a regexp match on the remaining portion of the string. Dereferencing the iterator gives you the current match. The container has the rule that modifying the string invalidates these iterators.

    This might be a bit too flash for what you want, but if you want to try the idea then we can help you through it.

    Edit: Or you could do as whiteflags suggests while I was busy writing this.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by c99tutorial
    ExtractWords.cpp (example program):
    You made the classic mistake of using eof() to control a loop incorrectly. I understand that you're probably more familiar with C, but C has a similiarly common beginner mistake with respect to feof() that you are presumably aware of. Rather, use the return value of the getline call to control the loop.

    Quote Originally Posted by c99tutorial
    It works with at least vector<string> and list<string>, but it is currently kind of brain-dead because it constructs the entire list in memory in the constructor, so it can't currently be used to handle huge input.
    That is why I suggested that you "write each token to the output iterator as you go". Now, the problem is that you want to push_back elements to the std::vector, std::list, std::deque, etc. This can be done by using the std::back_inserter iterator adapter to get an output iterator that is an insert iterator, i.e., when you write to this kind of iterator, it inserts into the container. whiteflags' post #4 has an example of an interface that I had in mind.

    At the same time, the caller might be using say, an array of fixed size, and it able to guarantee that the number of tokens will not exceed the array size (or perhaps better yet, you could provide a interface to limit the number of tokens written). Then, the same write to an output iterator process would still work, even though arrays do not have a push_back member function.

    By the way, for objects that can be expensive to copy, we generally pass by const reference instead of passing by value. Then, iterators are generalisation of pointers, so we would not pass a pointer to an iterator, but rather pass the iterator itself, unless you really do intend to use call by reference to pass an iterator, yet have the argument be optional (by passing a null pointer).
    Last edited by laserlight; 01-26-2013 at 11:19 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    Thanks for the suggestions. I should say that my goal is to use a regexp engine to process strings but to actually hide the regexp interface behind something simplified. Then I realized I don't really need an iterator or a vector at all. I just need two public member functions, `hasNext' and `getNext', and the constructor should take in a const reference to the string which will be splitted and the delimiter.

    Here is the simplified main program

    ExtractWords.cpp
    Code:
    #include <iostream>
    #include <string>
    #include "Splitter1.hxx"
    
    // Print words line by line. Delimeter is whitespace, comma or period
    int main()
    {
        using namespace std;
        using namespace tut;
        while (!cin.eof()) {
            string s;
            getline(cin, s);
            Splitter splitter(s, "[\\s,.]+");
            while (splitter.hasNext()) {
                string t = splitter.getNext();
                if (t == "")
                    continue;
                cout << t << endl;
            }
        }
    }
    Now I can no longer see why main should need an iterator. The class does not need a vector or any kind of container, either.

    Splitter1.hxx
    Code:
    #pragma once
    #include <string>
    #include <sstream>
    #include <exception>
    #include "pcre/pcrecpp.h"
    
    namespace tut {    
        using std::string;
        using std::ostringstream;
        using std::exception;
        using pcrecpp::StringPiece;
        using pcrecpp::RE;
        
        class NoMoreMatches : public exception {};
        class InvalidDelimiter : public exception {};
        
        class Splitter {
          private:
            string *mat_in;
            string *mat_out;
            RE *re;
            StringPiece *input;
            bool hasNext_ = false;
            bool isLast_ = false;
            
            void swap_pointers(string *&a, string *&b) {
                string *tmp = a;
                a = b;
                b = tmp;
            }
            void update_mat_in(void) {
                // consume an element which comes before delim
                if (re->Consume(input, mat_in)) {
                    isLast_ = false;
                    hasNext_ = true;
                    return;
                }
                isLast_ = true;
                hasNext_ = true;
    
                // get the final element that is after the last delim
                ostringstream oss;
                oss << *input;
                mat_in->assign(oss.str());        
            }
            
          public:
            Splitter(const string &s, string delim) {
                if (delim == "")
                    throw new InvalidDelimiter;
                input = new StringPiece(s);
                re = new RE(string("(.*?)") + delim);
                mat_in = new string();
                mat_out = new string();
                update_mat_in();
            }
            ~Splitter() {
                delete input;
                delete re;
                delete mat_in;
                delete mat_out;
            }
            bool hasNext(void) {
                return hasNext_;
            }
            string getNext(void) {
                if (!hasNext_)
                    throw new NoMoreMatches;
                swap_pointers(mat_in, mat_out);
                if (isLast_) {
                    hasNext_ = false;
                    return *mat_out;
                }
                update_mat_in();
                return *mat_out;
            }
        };
    }

  8. #8
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by c99tutorial
    Then I realized I don't really need an iterator or a vector at all. I just need two public member functions, `hasNext' and `getNext', and the constructor should take in a const reference to the string which will be splitted and the delimiter.
    Well, that is an iterator with a different interface (hasNext and getNext) from the one inspired by pointer syntax (designate a one past the end iterator and use ++iterator) that is used in the C++ standard library.

    Quote Originally Posted by c99tutorial
    Here is the simplified main program
    You really should fix the eof() thing, in case it creeps into your actual code.

    By the way, your Splitter class probably does not need all those pointer member variables. Take advantage of RAII instead, then you don't need to worry about manual memory management because std::string already handles memory management for you. As it stands, your class is missing a copy constructor and copy assignment operator, but if you did not have those pointers with manual memory management, you would not need them and could get rid of the destructor definition too.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  9. #9
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    Quote Originally Posted by laserlight View Post
    By the way, your Splitter class probably does not need all those pointer member variables. Take advantage of RAII instead
    Ok, I think I'm getting this...

    ExtractWords.cpp
    Code:
    #include <iostream>
    #include <string>
    #include <exception>
    #include "Splitter2.hxx"
    
    class IOError : public std::exception {};
    class UnknownError : public std::exception {};
    
    // Print words line by line. Delimeter is whitespace, comma or period
    int main()
    {
        using namespace std;
        using namespace tut;
        string s;
        while (getline(cin, s)) {
            Splitter splitter(s, "[\\s,.]+");
            while (splitter.hasNext()) {
                string t = splitter.getNext();
                if (t == "")
                    continue;
                cout << t << endl;
            }
        }
        if (cin.bad())
            throw new IOError;
        else if (!cin.eof())
            throw new UnknownError;
    }
    Splitter2.hxx
    Code:
    #pragma once
    #include <string>
    #include <sstream>
    #include <exception>
    #include "pcre/pcrecpp.h"
    
    namespace tut {    
        using std::string;
        using std::ostringstream;
        using std::exception;
        using pcrecpp::StringPiece;
        using pcrecpp::RE;
        
        class NoMoreMatches : public exception {};
        class InvalidDelimiter : public exception {};
        
        class Splitter {
          private:
            StringPiece input;
            RE re;
            string mat_in;
            string mat_out;
            bool hasNext_ = false;
            bool isLast_ = false;
            
            void update_mat_in(void) {
                // consume an element which comes before delim
                if (re.Consume(&input, &mat_in)) {
                    isLast_ = false;
                    hasNext_ = true;
                    return;
                }
                isLast_ = true;
                hasNext_ = true;
    
                // get the final element that is after the last delim
                ostringstream oss;
                oss << input;
                mat_in.assign(oss.str());        
            }
            
          public:
            Splitter(const string &s, string delim) :
                input(s),
                re(string("(.*?)") + delim)
            {
                if (delim == "")
                    throw new InvalidDelimiter;
                update_mat_in();
            }
            bool hasNext(void) {
                return hasNext_;
            }
            string getNext(void) {
                if (!hasNext_)
                    throw new NoMoreMatches;
                mat_in.swap(mat_out);
                if (isLast_) {
                    hasNext_ = false;
                    return mat_out;
                }
                update_mat_in();
                return mat_out;
            }
        };
    }
    No more pointers. Also taking advantage of std::string's swap member function.

  10. #10
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Allow me some notes

    Doing a quick search, there seems to be a FindAndConsume function if it does what it says I would assume you can just do
    Code:
            string getNext(void) {
                if (!hasNext_)
                    throw new NoMoreMatches;
                string output;
                re.FindAndConsume(&input, &output);
                return output; //EDIT on this
            }
    EDIT: Sorry, I realized it is not the simple, the delimiter will remain so you won't get a split. But you know how long the delimiter is so you can get the substring. Second error of mine was that it doesn't check for the very last part which could easily be solved by checking the return of FindAndConsume and returning input on false (or something between these lines)

    Note that there is a double copy, one internally and one when you return. A simplified function means that a return value optimization scheme would have more chances to be performed so this is a critical point to optimize in general. I am not actually familiar with this header but there has to be a way to do this without needing a stringstream and swapping in/out strings and stuff like that...

    Code:
    if (delim == "")
        throw new InvalidDelimiter;
    What is the behavior of having delim to "" ? If it just returns the whole string but it doesn't crash, then you shouldn't throw an error as you would make it worse. Either you will crash the entire program, either you will require the use of catching the exception which is costly and adds more lines, either someone will just check the delimiter before passing it to the Splitter constructor, in which case it is checked twice. For an utility class it is better not to have exceptions in such cases but let the programmer that will use this class decide the error checking.

    The same could be said for getNext(). You check the hasNext() to call getNext() and then internally inside getNext() and throw an error. Why check it twice? The iterator way is to get the next result with hasNext() and store it in a private field and getNext() just returns that field. This allows you to avoid the double checking. But essentially if you use the getNext() scheme just do something like this to simplify
    Code:
    string result;
    while (splitter.getNext(&result)) {
                if (result == "")
                    continue;
                cout << result << endl;
    }
    of course, if you don't like the "" in general as a result, you can internally skip it in getNext(). Then you can then even use it to indicate ending
    Code:
    while (true) {
               string t = splitter.getNext()
               if (t == "")
                    break;
                cout << t << endl;
    }
    In C++11 you could optimize:
    A. Move semantics (though probably not a big difference)
    Code:
    Splitter(const string &s, string&& delim)
    B. Use <regex> and then you can use regex_search which can do what you want
    Last edited by C_ntua; 01-27-2013 at 11:16 PM.

  11. #11
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Not that it really matters, but in case you find it interesting...
    In C++, and indeed many other languages also, it's more efficient to check for a string's length being zero, than it is to construct a new string from an empty string literal and then compare them.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  12. #12
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    Thanks for the suggestions. The purpose of this was not to make it the most efficient. It was to make using the function convenient. For example, I tried <regex> from C++11 and couldn't get it to work properly. At least with g++ it doesn't; maybe it's not stable yet. For an empty delimiter the result is undefined (until I can come up with a reasonable way to define it), which is why I thought throwing an exception would be convenient.

    This function is provided by Java and allows an empty delimiter -- in this case it splits the string into character-by-character. However, this result doesn't really make any kind of sense, because valid delimiters like "," should always produce some empty strings for some input such as

    "a,b,c,,,,d,e,f,"

    Maybe a reasonable definition of splitting on a delimiter of "" could return an endless sequence of empty strings. This would make more sense to me but would not be useful at all. In fact, tt'd probably be better to crash the program in this case since it's probable a programmer's error.

  13. #13
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    By the way, the Boost.StringAlgo library might be of interest to you.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. std::tr1::shared_ptr question
    By StainedBlue in forum C++ Programming
    Replies: 13
    Last Post: 07-18-2010, 11:48 AM
  2. RCPtr ?= shared_ptr
    By MarkZWEERS in forum C++ Programming
    Replies: 1
    Last Post: 03-19-2009, 09:14 AM
  3. When to use shared_ptr?
    By hpesoj in forum C++ Programming
    Replies: 15
    Last Post: 07-22-2008, 04:33 AM
  4. Is shared_ptr dangerous?
    By Elysia in forum C++ Programming
    Replies: 19
    Last Post: 04-07-2008, 02:02 AM
  5. boost::shared_ptr
    By IfYouSaySo in forum C++ Programming
    Replies: 2
    Last Post: 01-30-2006, 12:00 AM