Thread: reading s-expressions in C++ - problem with improper lists

  1. #1
    Registered User
    Join Date
    Feb 2022
    Posts
    45

    reading s-expressions in C++ - problem with improper lists

    I am writing the (rather simplistic) reader code for `L, a Lisp dialect written in C++, and have managed to get it working in all of the tests I've written, except for one: the case of an improper list (a Lisp list in which the last pair is a an ordered pair - that is to say, the CDR of the last pair is an atom rather than a null pointer) with more than two members, such as


    Code:
    (foo bar . quux)
    In the test results, it is coming out as simply
    Code:
    (foo bar   quux)

    While this is a minor issue, it is still something I would like to resolve, and getting more eyes on a problem is always a good idea.


    Code:
    #ifndef ATOM_H
    #define ATOM_H
    
    
    #include <string>
    #include <cstdint>
    
    
    
    
    class Atom
    {
    public:
        Atom() {};
        virtual ~Atom() {};
        virtual std::string to_string() {return "";};
    };
    
    
    
    
    class Symbol: public Atom
    {
    private:
        std::string literal;
    
    
    
    
    public:
        Symbol(std::string s): literal(s) {};
        std::string value() {return literal;};
        virtual std::string to_string() {return literal;};
    };
    
    
    class Dot: public Atom
    {
    private:
    
    
    public:
        Dot() {};
    };
    
    
    
    
    class RightParen: public Atom
    {
    private:
    
    
    public:
        RightParen() {};
    };
    
    
    
    
    class Number: public Atom
    {
    private:
    
    
    public:
        Number() {};
    };
    
    
    
    
    class Integer64: public Number
    {
    private:
        long long literal;
    public:
        Integer64(long long i): literal(i) {};
        long long value() {return literal;};
        virtual std::string to_string() {return std::to_string(literal);};
    };
    
    
    
    
    class FP_Double: public Number
    {
    private:
        double literal;
    public:
        FP_Double(double i): literal(i) {};
        double value() {return literal;};
        virtual std::string to_string() {return std::to_string(literal);};
    };
    
    
    
    
    class Pair: public Atom
    {
    protected:
        Atom *car, *cdr;
    public:
        Pair(): car(nullptr), cdr(nullptr) {};
        Pair(Atom* head): car(head), cdr(nullptr) {};
        Pair(Atom* head, Atom* tail): car(head), cdr(tail) {};
        virtual ~Pair();
        virtual std::string to_string();
        void set_car(Atom* a) {car = a;};
        void set_cdr(Atom* a) {cdr = a;};
        Atom* get_car() {return car;};
        Atom* get_cdr() {return cdr;};
    };
    
    
    #endif



    Code:
    #include <iostream>
    #include <string>
    #include <typeinfo>
    #include "atom.h"
    
    
    Pair::~Pair()
    {
        if (car != nullptr)
        {
            delete car;
            car = nullptr;
        }
        if (cdr != nullptr)
        {
            delete cdr;
            cdr = nullptr;
        }
    }
    
    
    
    
    std::string to_string_list_helper(Pair *p)
    {
    
    
        std::string midpoint = " . ";
    
    
        if (p == nullptr)
        {
            return "";
        }
    
    
        if (p->get_car() == nullptr)
        {
            if (p->get_cdr() == nullptr)
            {
                return "";
            }
            else
            {
                return "()" + midpoint + p->get_cdr()->to_string();
            }
        }
        else if (p->get_cdr() == nullptr)
        {
            return p->get_car()->to_string();
        }
        else
        {
            if (typeid(*(p->get_cdr())) == typeid(Pair))
            {
                return p->get_car()->to_string()
                       + " " + to_string_list_helper(dynamic_cast<Pair *>(p->get_cdr()));
            }
            else
            {
                return p->get_car()->to_string() + midpoint + p->get_cdr()->to_string();
            }
        }
    }
    
    
    
    
    std::string Pair::to_string()
    {
        return "(" + to_string_list_helper(dynamic_cast<Pair *>(this)) + ")";
    }



    Code:
    #include <fstream>
    #include <ios>
    #include <iostream>
    #include <string>
    #include <sstream>
    #include <typeinfo>
    #include "atom.h"
    #include "read.h"
    
    
    
    
    void read_src_file(std::stringstream& src, std::ifstream& src_file)
    {
        while (src_file)
        {
            src << src_file.rdbuf();
        }
    }
    
    
    
    
    
    
    Atom* read_expression(std::stringstream& src)
    {
        if (src.eof())
        {
            throw new unexpected_end_of_file_exception();
        }
    
    
        char ch;
        src >> ch;
    
    
        if (std::iswspace(ch))
        {
            return read_expression(src);
        }
        else if (ch == '.')
        {
            return new Dot();
        }
        else if (ch == '(')
        {
            return read_list(src);
        }
        else if (ch == ')')
        {
            return new RightParen();
        }
        else if (std::isdigit(ch))
        {
            return read_number(ch, src);
        }
        else
        {
            return (read_symbol(ch, src));
        }
    }
    
    
    
    
    Atom* read_list(std::stringstream& src)
    {
        Atom* head = read_expression(src);
    
    
        if (head == nullptr)
        {
            return nullptr;
        }
    
    
        if(typeid(*head) == typeid(RightParen))
        {
            return nullptr;
        }
    
    
        Atom* tail = read_expression(src);
    
    
        if (typeid(*tail) == typeid(Dot))
        {
            return new Pair(head, read_expression(src));
        }
        else if(typeid(*tail) == typeid(RightParen))
        {
            return new Pair(head);
        }
        else
        {
            return new Pair(head, new Pair(tail, read_list(src)));
        }
    }
    
    
    
    
    
    
    Atom* read_symbol(char start_ch, std::stringstream& src)
    {
        char ch = start_ch;
        std::string ostr = "";
    
    
        while (!src.eof())
        {
            ostr += ch;
            char temp = src.peek();
    
    
            if (std::iswspace(temp) || temp == '(' || temp == ')')
            {
                break;
            }
            src >> ch;
        }
        return new Symbol(ostr);
    }
    
    
    
    
    
    
    Atom* read_number(char start_ch, std::stringstream& src)
    {
        char ch = start_ch;
        std::string ostr = "";
        bool fp = false;
    
    
        while (!src.eof())
        {
            if (ch == '.')
            {
                if (fp)
                {
                    throw new invalid_numeric_value_exception();
                }
                else
                {
                    fp = true;
                }
            }
            ostr += ch;
            char temp = src.peek();
            if (std::isdigit(temp) || temp == '.')
            {
                src >> ch;
            }
            else
            {
                break;
            }
        }
    
    
        if (fp)
        {
            return new FP_Double(std::strtod(ostr.c_str(), nullptr));
        }
        else
        {
            return new Integer64(std::strtol(ostr.c_str(), nullptr, 10));
        }
    }

    The Doctest test in question is:


    Code:
        TEST_CASE("improper list of three elements")
        {
            std::stringstream src;
            src << "(foo bar . quux)";
            Atom* test = read_expression(src);
            Pair* test_list = dynamic_cast<Pair*>(test);
            CHECK(typeid(*(test_list->get_car())) == typeid(Symbol));
            Symbol* test_car = dynamic_cast<Symbol*>(test_list->get_car());
            CHECK(test_car->value() == "foo");
            CHECK(typeid(*(test_list->get_cdr())) == typeid(Pair));
            Pair* test_cdr = dynamic_cast<Pair*>(test_list->get_cdr());
            Atom* test_cadr = test_cdr->get_car();
            CHECK(typeid(*test_cadr) == typeid(Symbol));
            Symbol* test_symbol = dynamic_cast<Symbol*>(test_cadr);
            CHECK(test_symbol->value() == "bar");
            Atom* test_cddr = test_cdr->get_cdr();
            test_symbol = dynamic_cast<Symbol*>(test_cddr);
            std::cout << typeid(*test_cddr).name() << std::endl;
            CHECK(typeid(*test_cddr) == typeid(Symbol));
            CHECK(test->to_string() == "(foo bar . quux)");
            delete test;
        }
    The Doctest output is:
    Code:
    $ bin/test-expr-reading [doctest] doctest version is "2.4.9"
    [doctest] run with "--help" for options
    4Pair
    ===============================================================================
    tests/test-expr-reading.cpp:274:
    TEST SUITE: reading s-expressions and converting them to strings
    TEST CASE:  improper list of three elements
    
    
    tests/test-expr-reading.cpp:291: ERROR: CHECK( typeid(*test_cddr) == typeid(Symbol) ) is NOT correct!
      values: CHECK( {?} == {?} )
    
    
    tests/test-expr-reading.cpp:294: ERROR: CHECK( test->to_string() == "(foo bar . quux)" ) is NOT correct!
      values: CHECK( (foo bar  quux) == (foo bar . quux) )
    
    
    ===============================================================================
    [doctest] test cases: 25 | 24 passed | 1 failed | 0 skipped
    [doctest] assertions: 73 | 71 passed | 2 failed |
    [doctest] Status: FAILURE!

  2. #2
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    I was able to fix the faults, and even some additional issues which cropped up later. While this is only the first step in the project, it is nice to see that I was able to get it working.

  3. #3
    Registered User
    Join Date
    Dec 2017
    Posts
    1,628
    Does your new code properly handle:
    "()"
    "(())"
    "((()))"
    "(()())"
    "(().())"
    etc.

    How about:
    "((a . b) (c . d))"
    "((a . b) . (c . d))"

    What about erroneous input, like (a . . b) ?

    For future reference, especially for a program as small and simple as this, you should put the code together in a single file as a runnable program so that it's easy for us to test. And don't bother with doctest. A sufficient main could be:
    Code:
    int main() {
        std::stringstream src;
        src << "(foo bar . quux)";
        Atom* test = read_expression(src);
        std::cout << test->to_string() << '\n';
        delete test;
    }
    A little inaccuracy saves tons of explanation. - H.H. Munro

  4. #4
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    Quote Originally Posted by john.c View Post
    Does your new code properly handle:
    "()"
    "(())"
    "((()))"
    "(()())"
    "(().())"
    etc.
    I hadn't tried those, and indeed they failed when I added them. I will work on those.

    Quote Originally Posted by john.c View Post
    How about:
    "((a . b) (c . d))"
    "((a . b) . (c . d))"
    Those cases are tested for, and working.

    Quote Originally Posted by =john.c View Post
    What about erroneous input, like (a . . b) ?
    I still need to add additional tests for those cases, though I have at least one such case successfully tested:

    Code:
        TEST_CASE("malformed improper list")    {
            std::stringstream src;
            src << "(foo bar . baz quux)";
            REQUIRE_THROWS(read_expression(src));
        }
    Quote Originally Posted by =john.c View Post
    For future reference, especially for a program as small and simple as this, you should put the code together in a single file as a runnable program so that it's easy for us to test. And don't bother with doctest. A sufficient main could be:
    Code:
    int main() {
        std::stringstream src;
        src << "(foo bar . quux)";
        Atom* test = read_expression(src);
        std::cout << test->to_string() << '\n';
        delete test;
    }
    I have to disagree on this part, as this is only the first stage of a much longer project, and I find using Doctest useful in organizing the tests.

  5. #5
    Registered User
    Join Date
    Dec 2017
    Posts
    1,628
    I have to disagree on this part, as this is only the first stage of a much longer project, and I find using Doctest useful in organizing the tests.
    You are completely missing the point.
    I'm not saying that's the way you should structure your actual program.
    I'm saying that if you want someone to help you then you need to make it as easy as possible.
    If possible, we should be able to copy/paste the code and run it.
    You didn't post something that could be run easily.
    A little inaccuracy saves tons of explanation. - H.H. Munro

  6. #6
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    Ah, I did miss your point, sorry about that.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Beginner's Calculator: Problem with expressions
    By SpudCake in forum C++ Programming
    Replies: 8
    Last Post: 10-12-2011, 05:36 AM
  2. detab---improper spacing
    By dgoodmaniii in forum C Programming
    Replies: 2
    Last Post: 11-15-2009, 08:15 AM
  3. Regular Expressions (regex.h) small problem
    By _Marcel_ in forum C Programming
    Replies: 0
    Last Post: 03-31-2009, 05:13 AM
  4. Improper use of typedef
    By aama100 in forum C++ Programming
    Replies: 3
    Last Post: 01-27-2008, 04:50 PM
  5. improper pointer/ integer combination?
    By pinkpenguin in forum C Programming
    Replies: 4
    Last Post: 11-16-2005, 03:47 PM

Tags for this Thread