Thread: Tiny thing in a simple parser - C++

  1. #1
    Registered User
    Join Date
    Nov 2019
    Posts
    135

    Tiny thing in a simple parser - C++

    Hello everyone.

    The staff recommended that we use Boost to parse the file, but I've installed it and haven't managed to implement anything with it.

    I have to parse a CSV file line by line, where each line has two columns separated by a comma. Each column holds a single digit. I have to take the integer values of these two digits and use them to construct my Fractal objects at the end.

    The first problem: a line may start with whitespace, which my solution handles, but a line may also end with whitespace (that is, after the second digit), and my parser treats that as invalid input.

    The second problem: the file can look like this, for example:
    Code:
    1,1
    <HERE WE HAVE A NEWLINE>
    <HERE WE HAVE A NEWLINE>
    This file format is okay, but my solution prints the corresponding 1,1 fractal 3 times, where a correct solution prints it only once.

    The third problem: the file can look like this:
    Code:
    1,1
    <HERE WE HAVE A NEWLINE>
    1,1

    This is supposed to be invalid input, but my solution treats it as valid and just skips over the NEWLINE in the middle.

    Maybe you can guide me on how to fix these issues. It would really help, as I've been struggling with this exercise for 3 days from morning to evening.

    This is my code:

    Code:
    #include <iostream>
    #include "Fractal.h"
    #include <fstream>
    #include <stack>
    #include <sstream>
    #include <string>
    #include <cstdlib>
    
    
    const char *usgErr = "Usage: FractalDrawer <file path>\n";
    
    
    const char *invalidErr = "Invalid input\n";
    
    
    const char *VALIDEXT = "csv";
    
    
    const char EXTDOT = '.';
    
    
    const char COMMA = ',';
    
    
    const char WHITESPACE = ' ';
    
    
    const char NEWLINE = '\n';
    
    
    const int MINTYPE = 1;
    
    
    const int MAXTYPE = 3;
    
    
    const int MINDIM = 1;
    
    
    const int MAXDIM = 6;
    
    
    const int NUBEROFARGS = 2;
    
    
    int main(int argc, char *argv[])
    {
    
    
        if (argc != NUBEROFARGS)
        {
            std::cerr << usgErr;
            std::exit(EXIT_FAILURE);
        }
    
    
        std::stack<Fractal *> resToPrint;
        std::string filepath = argv[1]; // Can be a relative/absolute path
    
    
        if (filepath.substr(filepath.find_last_of(EXTDOT) + 1) != VALIDEXT)
        {
            std::cerr << invalidErr;
            exit(EXIT_FAILURE);
        }
    
    
        std::stringstream ss; // Treat it as a buffer to parse each line
        std::string s; // Use it with 'ss' to convert char digit to int
    
    
        std::ifstream myFile; // Input file stream (not yet associated with a file)
        myFile.open(filepath); // Open the CSV file
    
    
        if (!myFile) // If failed to open the file
        {
            std::cerr << invalidErr;
            exit(EXIT_FAILURE);
        }
    
    
        int type = 0;
        int dim = 0;
    
    
        while (myFile.peek() != EOF)
        {
            getline(myFile, s, COMMA); // Read up to the comma (the type of fractal) and store it in s
            ss << s << WHITESPACE; // Save the number in ss, followed by ' ', so both values can be extracted together
            s.clear(); // Discard this number from s so it isn't reused by mistake
            getline(myFile, s, NEWLINE); // Read up to NEWLINE (the dim of the fractal)
            ss << s;
            ss >> type >> dim; // Extract both values
            s.clear(); // Discard this number from s so it isn't reused by mistake
    
    
            if (ss.peek() != EOF || type < MINTYPE || type > MAXTYPE || dim < MINDIM || dim > MAXDIM) 
            {
                std::cerr << invalidErr;
                std::exit(EXIT_FAILURE);
            }
    
    
            resToPrint.push(FractalFactory::factoryMethod(type, dim));
            ss.clear(); // Clear the buffer to update new values of the next line at the next iteration
        }
    
    
        while (!resToPrint.empty())
        {
            std::cout << *(resToPrint.top()) << std::endl;
            resToPrint.pop();
        }
    
    
        myFile.close();
    
    
        return 0;
    }
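
    A side note on the ss.clear() call at the end of the loop above: on a std::stringstream, clear() only resets the error-state flags; it does not discard the characters already stored in the stream. Emptying the buffer takes ss.str(""). Whether this accounts for any of the three symptoms is not established in the thread. A minimal sketch of the difference, where leftover_after_reset is a hypothetical helper name used only for illustration:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: writes two numbers into a stringstream, reads them back,
// then "resets" the stream either with clear() alone or with clear() + str("").
// Returns whatever text the stream still stores afterwards.
std::string leftover_after_reset(bool also_empty_buffer)
{
    std::stringstream ss;
    ss << "1 1";
    int a = 0, b = 0;
    ss >> a >> b;          // consumes both numbers and sets eofbit

    ss.clear();            // resets the state flags only
    if (also_empty_buffer)
    {
        ss.str("");        // actually discards the stored characters
    }
    return ss.str();       // the text still held by the stream
}
```

    Calling leftover_after_reset(false) shows that the old text is still stored after clear() alone, while leftover_after_reset(true) returns an empty string.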

  2. #2
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    Maybe something like this:
    Code:
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
     
    const int MaxType = 6;
    const int MaxDim  = 4;
     
    struct Row {
        int type;
        int dim;
        Row(int t, int d) : type{t}, dim{d} {}
    };
     
    void err(const char *msg, const std::string& line, int lineno) {
        std::cerr << "Line " << lineno << ": " << msg << ": " << line << '\n';
    }
     
    int main() {
        //std::ifstream in("input_file");    
        std::istream& in = std::cin; // test with std::cin
     
        std::vector<Row> input;
     
        int lineno = 0;
        for (std::string line; std::getline(in, line); ) {
            std::istringstream sin(line);
            ++lineno;
     
            int type = 0, dim = 0;
            char ch = 0;
     
            if (!(sin >> ch)) // blank line
                continue;
            sin.putback(ch);
     
            if (!(sin >> type) || type < 1 || type > MaxType) {
                err("bad type", line, lineno);
                continue; // or exit
            }
     
            if (!(sin >> ch) || ch != ',') {
                err("missing comma", line, lineno);
                continue; // or exit
            }
     
            if (!(sin >> dim) || dim < 1 || dim > MaxDim) {
                err("bad dim", line, lineno);
                continue; // or exit
            }
     
            input.push_back(Row(type, dim));
        }
     
        for (const auto& row: input)
             std::cout << row.type << " : " << row.dim << '\n';
    }
    Last edited by john.c; 01-04-2020 at 05:33 PM.
    A little inaccuracy saves tons of explanation. - H.H. Munro

  3. #3
    Registered User
    Join Date
    Nov 2019
    Posts
    135
    @john.c

    Thank you for this solution!

    It now seems that the first two problems are fixed, but the third problem remains.

    Another problem also shows up with your suggestion.
    Lines like:
    Code:
    1,1  something
    or
    Code:
    1,1something
    should be considered invalid input, but the program outputs the corresponding fractal without any error.

    This is the updated code:

    Code:
    #include <iostream>
    #include "Fractal.h"
    #include <fstream>
    #include <stack>
    #include <sstream>
    #include <string>
    #include <cstdlib>
    
    
    const char *usgErr = "Usage: FractalDrawer <file path>\n";
    
    
    const char *invalidErr = "Invalid input\n";
    
    
    const char *VALIDEXT = "csv";
    
    
    const char EXTDOT = '.';
    
    
    const char COMMA = ',';
    
    
    const char MINTYPE = 1;
    
    
    const char MAXTYPE = 3;
    
    
    const int MINDIM = 1;
    
    
    const int MAXDIM = 6;
    
    
    const int NUBEROFARGS = 2;
    
    
    int main(int argc, char *argv[])
    {
    
    
        if (argc != NUBEROFARGS)
        {
            std::cerr << usgErr;
            std::exit(EXIT_FAILURE);
        }
    
    
        std::string filepath = argv[1];
    
    
        if (filepath.substr(filepath.find_last_of(EXTDOT) + 1) != VALIDEXT)
        {
            std::cerr << invalidErr;
            exit(EXIT_FAILURE);
        }
    
    
        std::ifstream in(filepath);
    
    
        std::stack<Fractal *> resToPrint;
    
    
        int lineno = 0;
        for (std::string line; std::getline(in, line);)
        {
            std::istringstream sin(line);
            ++lineno;
    
    
            int type = 0, dim = 0;
            char ch = 0;
    
    
            if (!(sin >> ch)) // blank line
            {
                continue;
            }
            sin.putback(ch);
    
    
            if (!(sin >> type) || type < 1 || type > 3)
            {
                std::cerr << invalidErr;
                exit(EXIT_FAILURE);
            }
    
    
            if (!(sin >> ch) || ch != ',')
            {
                std::cerr << invalidErr;
                exit(EXIT_FAILURE);
            }
    
    
            if (!(sin >> dim) || dim < 1 || dim > 6)
            {
                std::cerr << invalidErr;
                exit(EXIT_FAILURE);
            }
    
    
            resToPrint.push(FractalFactory::factoryMethod(type, dim));
        }
    
    
        while (!resToPrint.empty())
        {
            std::cout << *(resToPrint.top()) << std::endl;
            resToPrint.pop();
        }
    
    
        in.close();
    
    
        return 0;
    }

  4. #4
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    To treat extra chars at the end as invalid, put this before the push_back line:
    Code:
            if (sin >> ch) {
                err("extra characters at end", line, lineno);
                continue; // or exit
            }
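
    The reason this check works is that operator>> skips leading whitespace before reading a char, so the extraction succeeds only when some non-whitespace character is left on the line. A minimal sketch that folds the whole line check into one function; parse_row is a hypothetical name, and the range checks on type and dim are left out to keep it short:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true only if `line` is "<int>,<int>" with
// optional whitespace around the tokens and nothing else after them.
bool parse_row(const std::string& line, int& type, int& dim)
{
    std::istringstream sin(line);
    char ch = 0;

    if (!(sin >> type)) return false;              // first integer
    if (!(sin >> ch) || ch != ',') return false;   // separating comma
    if (!(sin >> dim)) return false;               // second integer

    // operator>> skips whitespace, so this extraction succeeds only
    // when a non-whitespace character is still left on the line.
    if (sin >> ch) return false;                   // trailing junk

    return true;
}
```

    With this sketch, "1,1" and "  2,3   " are accepted, while "1,1  something" and "1,1something" are rejected.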
    I don't understand question 3. You say it's invalid but you don't say why. Is it because there are two blank lines in a row? Or because of duplicate input (1,1 was entered twice)? Or what?

  5. #5
    Registered User
    Join Date
    Nov 2019
    Posts
    135
    @john.c

    As for the extra chars - amazing!

    As for the last problem: we can have multiple empty lines, but any non-empty line that comes after an empty line makes the input invalid. That's the whole issue.

    That is, once we encounter an empty line, the rest of the file must contain only empty lines.

  6. #6
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    So blank lines are only allowed at the end of the file?
    That seems very strange to me.
    You should check to make sure that's correct.
    However, I'll post some code to do that in a few minutes ... or more ... depending on bugs ....

  7. #7
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    How about this:
    Code:
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
     
    const int MaxType = 6;
    const int MaxDim  = 4;
     
    struct Row {
        int type;
        int dim;
        Row(int t, int d) : type{t}, dim{d} {}
    };
     
    void err(const char *msg, const std::string& line, int lineno) {
        std::cerr << "Error on line " << lineno << ": " << msg << ": " << line << '\n';
    }
     
    int main() {
        //std::ifstream in("input_file");    
        std::istream& in = std::cin; // test with std::cin
     
        std::vector<Row> input;
     
        int lineno = 0;
        bool blank_line_seen = false;
        for (std::string line; std::getline(in, line); ) {
            std::istringstream sin(line);
            ++lineno;
     
            char ch = 0;
            if (!(sin >> ch)) {
                blank_line_seen = true;
                continue;
            }
            else if (blank_line_seen) {
                err("non-blank lines after blank line(s)", line, lineno);
                break; // or exit
            }
     
            sin.putback(ch);
     
            int type = 0, dim = 0;
     
            if (!(sin >> type) || type < 1 || type > MaxType) {
                err("bad type", line, lineno);
                continue; // or exit
            }
     
            if (!(sin >> ch) || ch != ',') {
                err("missing comma", line, lineno);
                continue; // or exit
            }
     
            if (!(sin >> dim) || dim < 1 || dim > MaxDim) {
                err("bad dim", line, lineno);
                continue; // or exit
            }
     
            if (sin >> ch) {
                err("extra characters at end", line, lineno);
                continue; // or exit
            }
     
            input.push_back(Row(type, dim));
        }
     
        for (const auto& row: input)
             std::cout << row.type << " : " << row.dim << '\n';
    }

  8. #8
    Registered User
    Join Date
    Nov 2019
    Posts
    135
    @john.c

    Yea... That's how their idiotic solution works - but I'll verify it again.

    You're amazing - thank you so much john!!!!!!!!!!!!!!!!!!!

    It's like 4am now so I'll try to understand your solution step by step tomorrow.

    Again, really appreciate your huge help.

  9. #9
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    Okay. I'll try to add some comments to the code to help you out. However, I'm very lazy when it comes to comments. But I'll try.

  10. #10
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Usually, the recommendation is to use a CSV parsing library because CSV has many variants and the formats can be complex and fickle. In this case, though, I agree that writing the parser yourself will do. Because the format is rather straightforward, it may be even better to express it as a regular expression, since the pattern will likely be readable rather than one that requires considerable deciphering. For example, drawing on john.c's code in post #2:
    Code:
    #include <iostream>
    #include <regex>
    #include <string>
    #include <vector>
    
    const int MaxType = 6;
    const int MaxDim  = 4;
    
    struct Row
    {
        int type;
        int dim;
        Row(int t, int d) : type{t}, dim{d} {}
    };
    
    void err(const char *msg, const std::string& line, int lineno)
    {
        std::cerr << "Line " << lineno << ": " << msg << ": " << line << '\n';
    }
    
    int main()
    {
        const std::regex pattern{"\\s*(\\d),(\\d)\\s*"};
    
        //std::ifstream in("input_file");    
        std::istream& in = std::cin; // test with std::cin
    
        std::vector<Row> input;
    
        int lineno{0};
        bool found_blank{false};
        std::string line;
        while (getline(in, line))
        {
            ++lineno;
    
            if (line.empty())
            {
                found_blank = true;
                continue;
            }
    
            if (found_blank)
            {
                err("input encountered after blank line", line, lineno);
                continue;
            }
    
            std::smatch matches;
            if (!(regex_match(line, matches, pattern) && matches.size() == 3))
            {
                err("invalid input", line, lineno);
                continue;
            }
    
            auto type{stoi(matches.str(1))};
            if (type < 1 || type > MaxType)
            {
                err("bad type", line, lineno);
                continue;
            }
    
            auto dim{stoi(matches.str(2))};
            if (dim < 1 || dim > MaxDim)
            {
                err("bad dim", line, lineno);
                continue;
            }
    
            input.emplace_back(type, dim);
        }
    
        for (const auto& row: input)
        {
            std::cout << row.type << " : " << row.dim << '\n';
        }
    }
    The advantage of this approach is that if you want to allow multiple digits for type or dim, you can just change \\d to, say, \\d+, and adjust the maximums accordingly. Likewise, allowing whitespace before or after the comma just requires a change to the regex pattern rather than the addition of any whitespace parsing logic.
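
    A minimal sketch of both changes together, assuming multi-digit values and whitespace around the comma should be accepted (the thread does not settle whether the assignment allows either). match_row is a hypothetical name, and the pattern is written as a raw string literal so the backslashes need no doubling:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: matches "<digits>,<digits>" with optional whitespace
// around the numbers and around the comma; fills type/dim on success.
bool match_row(const std::string& line, int& type, int& dim)
{
    // \d+ accepts one or more digits; \s* on each side of the comma accepts spaces there.
    static const std::regex pattern{R"(\s*(\d+)\s*,\s*(\d+)\s*)"};

    std::smatch matches;
    if (!std::regex_match(line, matches, pattern))
    {
        return false;
    }
    // Assumption: the values fit in an int; std::stoi throws std::out_of_range otherwise.
    type = std::stoi(matches.str(1));
    dim = std::stoi(matches.str(2));
    return true;
}
```

    The range checks against MaxType and MaxDim would still follow a successful match, as in the program above.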
    Last edited by laserlight; 01-04-2020 at 08:11 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  11. #11
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    I didn't even consider a regex. Duh! (And I used to use Perl quite a bit, so I love regexes.)

    Quote Originally Posted by laserlight
    Likewise, allowing whitespace before or after the comma just requires a change to the regex pattern rather than the addition of any whitespace parsing logic.
    My code allows whitespace on either side of the comma or integers. I assumed that to be what was desired, but I see now that it is ambiguous whether or not that is allowed.

    @HelpMeC, Note the use of emplace_back, which is potentially more efficient than the push_back that I used since it constructs the object in place instead of constructing it and then copying it.

    @laserlight, But does push_back sometimes do something like that anyway?
    Also, line.empty() may not be correct, although again I am assuming something: that a "blank" line may contain spaces but nothing else. Specification details are a bugger.
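
    If whitespace-only lines should count as blank (an assumption, since the specification is unclear on this), line.empty() could be swapped for a check along these lines. is_blank is a hypothetical name:

```cpp
#include <string>

// Hypothetical helper: true if the line is empty or holds only whitespace
// (including a stray '\r' left over from Windows line endings).
bool is_blank(const std::string& line)
{
    return line.find_first_not_of(" \t\r\n\v\f") == std::string::npos;
}
```

    In the regex version, `if (line.empty())` would then become `if (is_blank(line))`.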

  12. #12
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by john.c
    @HelpMeC, Note the use of emplace_back, which is potentially more efficient than the push_back that I used since it constructs the object in place instead of constructing it and then copying it.

    @laserlight, But does push_back sometimes do something like that anyway?
    Yes, push_back has an overload that takes an rvalue reference so that a move can be done. Creating a temporary object to pass to push_back is therefore roughly as efficient as using emplace_back with the arguments needed to create that temporary object through perfect forwarding, except that emplace_back saves you from repeating yourself by typing out the constructor invocation when the destination type is already known.

    In this case, though, Row's move semantics are the same as its copy semantics, but it is still roughly as efficient: with push_back we get two ints copied to create the Row, then two ints copied to copy the Row, whereas with emplace_back we get two ints passed by reference (which costs about the same as copying them), then two ints copied to create the Row. So, emplace_back ends up being just syntactic sugar.
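
    The difference in what gets called can be made visible with an instrumented type. A minimal sketch, where Counted, reset_counts, via_push_back and via_emplace_back are hypothetical names that do not appear in the thread's code:

```cpp
#include <vector>

// Hypothetical instrumented type: counts how each object came into being.
struct Counted
{
    static int constructed, copied, moved;
    int type, dim;
    Counted(int t, int d) : type{t}, dim{d} { ++constructed; }
    Counted(const Counted& o) : type{o.type}, dim{o.dim} { ++copied; }
    Counted(Counted&& o) noexcept : type{o.type}, dim{o.dim} { ++moved; }
};
int Counted::constructed = 0;
int Counted::copied = 0;
int Counted::moved = 0;

void reset_counts()
{
    Counted::constructed = Counted::copied = Counted::moved = 0;
}

// push_back with a temporary: one construction, then one move into the vector.
void via_push_back()
{
    std::vector<Counted> v;
    v.reserve(1);                 // keep reallocation out of the counts
    v.push_back(Counted(1, 1));
}

// emplace_back: the element is constructed directly in the vector's storage.
void via_emplace_back()
{
    std::vector<Counted> v;
    v.reserve(1);
    v.emplace_back(1, 1);
}
```

    After via_push_back() the counters read one construction and one move; after via_emplace_back() they read one construction only. For a type as small as Row the extra move is negligible, which matches the "syntactic sugar" conclusion above.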

  13. #13
    Registered User
    Join Date
    Dec 2017
    Posts
    1,626
    The "syntactic sugar" (mmmmmm, sugar) is reason enough to use it. Good ol' perfect-forwarding. Gotta love it.

    But is there ever a reason to use push_back instead? Should we always use emplace_back?

  14. #14
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Yes, i.e., sometimes you really want to copy an existing object to push a copy to the back of the container, rather than construct a new object to append to the container. That said, there may be cases where emplace_back really is more efficient, e.g., the object is both expensive to copy and move, but cheap to construct, whereas if the object is expensive to construct, then emplace_back should be at least as efficient as push_back (unless you're copying an existing object and somehow copying is cheaper than constructing a new object, but that sounds like a contrived case).

  15. #15
    Registered User
    Join Date
    Nov 2019
    Posts
    135
    @john.c - Two questions:
    1. Is there any real difference between doing this:
    Code:
    std::istringstream sin(line);
    or this:

    Code:
    std::istringstream sin;
    sin >> line;
    2. What is the return value of a statement like this?

    Code:
    sin>> ch
    And what does this statement mean? Suppose sin contains the data "John". After this call, is ch = 'n'?
    I ask because I can't fully understand what happens here: sin.putback(ch)
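
    For reference, a minimal sketch of what these calls do. Constructing the stream from line gives it a copy of the text to read from, whereas a default-constructed stream is empty, so `sin >> line` has nothing to extract and fails. `sin >> ch` returns the stream itself, which converts to false when the extraction failed; it skips leading whitespace and reads one character, so for "John" it yields 'J', not 'n'. putback(ch) then makes that character the next one to be read again. peek_then_read and extract_from_empty_stream are hypothetical names used only for illustration:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads the first non-whitespace character of `text`,
// puts it back, then reads the first whitespace-delimited word.
// Returns the character, a '|', and the word.
std::string peek_then_read(const std::string& text)
{
    std::istringstream sin(text);   // the stream starts with a copy of text
    char ch = 0;
    if (!(sin >> ch))               // operator>> returns the stream; false if nothing was read
    {
        return "";                  // empty or whitespace-only input
    }
    sin.putback(ch);                // ch becomes the next character to be read again

    std::string word;
    sin >> word;                    // sees the character that was put back
    return std::string(1, ch) + "|" + word;
}

// A default-constructed stream holds no text, so the extraction fails.
bool extract_from_empty_stream()
{
    std::istringstream sin;
    std::string line = "1,1";
    return static_cast<bool>(sin >> line);
}
```

    Here peek_then_read("John") gives "J|John", and extract_from_empty_stream() gives false.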


    @laserlight - We actually learnt regex last semester in Java, but I had no time to cover the topic back then.
    Could you just explain how you would use this regex to define the maxima?

    Another question is regarding that:

    Code:
    auto dim{stoi(matches.str(2))};
    I don't think I'll use this solution because, as I said, I'm not comfortable with regexes, nor with things like stoi (which requires us to handle exceptions), but I'd be glad to understand the syntax and semantics of this line (what's going on there?)
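
    For reference: matches.str(2) returns the text of the second capture group as a std::string, stoi is std::stoi (found through argument-dependent lookup because its argument is a std::string), and `auto dim{...}` declares dim with the type of the initializer, which is int under the C++17 deduction rules. std::stoi throws std::invalid_argument when no digits can be read and std::out_of_range when the value does not fit in an int. A minimal sketch of wrapping those exceptions, where to_int is a hypothetical helper name:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical wrapper: converts `text` to int and reports failure through
// the return value instead of letting std::stoi's exceptions escape.
bool to_int(const std::string& text, int& out)
{
    try
    {
        std::size_t used = 0;
        int value = std::stoi(text, &used);   // throws on bad or out-of-range input
        if (used != text.size())
        {
            return false;                     // trailing characters such as "1x"
        }
        out = value;
        return true;
    }
    catch (const std::invalid_argument&)      // no digits at all
    {
        return false;
    }
    catch (const std::out_of_range&)          // value does not fit in an int
    {
        return false;
    }
}
```

    In the regex version the pattern already guarantees that each capture is a single digit, so the conversion there cannot throw; the wrapper only matters once the pattern is loosened to \\d+.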

    Another thing: since I'm using a stack and a clone method, I don't think I can replace the push call with something like emplace_back.

    BTW, I also have to validate that the path I receive (which can be relative or absolute) points to a CSV file.
    Is my validation here sufficient?

    Code:
    std::string filepath = argv[1];
    
    
        if (filepath.substr(filepath.find_last_of(EXTDOT) + 1) != VALIDEXT)
        {
            std::cerr << invalidErr;
            exit(EXIT_FAILURE);
        }
    
    
        std::ifstream in(filepath);
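
    One edge case in the check above: when the path contains no dot at all, find_last_of returns std::string::npos, npos + 1 wraps around to 0, and substr(0) is the whole path, so a path that is literally "csv" passes. A minimal sketch of a suffix comparison that avoids this; has_csv_extension is a hypothetical name, the comparison is case-sensitive, and rejecting a bare ".csv" is an assumption rather than a stated requirement:

```cpp
#include <string>

// Hypothetical helper: true if `path` ends with ".csv" and has at least
// one character before the dot. The comparison is case-sensitive.
bool has_csv_extension(const std::string& path)
{
    const std::string ext = ".csv";
    if (path.size() <= ext.size())
    {
        return false;                          // too short: "csv", ".csv"
    }
    // Compare the last ext.size() characters of path with ".csv".
    return path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}
```

    This only inspects the file name; it says nothing about whether the contents are valid CSV, and a check that the stream opened successfully is still needed after constructing the ifstream.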

    Thank you so much friends!
    Last edited by HelpMeC; 01-05-2020 at 06:47 AM.
