Thread: Potential of a compiler that creates the executable at once

  1. #1
    Registered User
    Join Date
    Oct 2021
    Posts
    138

    Potential of a compiler that creates the executable at once

    A couple of months ago, I found out about a language called [Vox](GitHub - MrSmith33/vox: Vox language compiler. AOT / JIT / Linker. Zero dependencies), which uses a design I haven't seen in any other compiler: instead of creating object files and then linking them together, it always creates the executable in one go. This means that every time we change something in our code, we have to recompile the whole thing. Naturally, you will say that this is a huge problem because we will have to wait a long time after every small change to our project, but here is the thing... With this design, the compilation times can become really, really fast (of course, the design of the compiler matters too)!

    About 3 months ago, the creator of the language said that at that point, Vox could compile 1.2M LoC/s, which is really, really fast and a level that 99% of projects will never reach, so your project will always compile in less than a second no matter what! What is even more impressive is that Vox is single-threaded, so we could get a much bigger performance boost when parsing the files for symbols and errors if we had multithreading support!

    Of course, not creating object files and then linking them means that we don't have to write a lot of object files and then combine them all into a big executable, but can instead start creating this executable and add everything to it as we go. You can see how this can save a lot of time! And CPUs are so fast these days that we can compile millions of lines of code in less than a second with multi-thread support, so even the rare, huge projects would compile very fast.

    What's even more impressive is that Vox is not even the fastest compiler out there. TCC is even faster (about 4-5 times)! I have personally tried to see how fast TCC can compile on my CPU, a Ryzen 5 2400G. I was able to compile 4M LoC in 700ms! Yeah, the speeds are crazy! And my CPU is an average one; if you were to build a PC now, you would get something at least 20% faster with at least 2 more threads!

    However, this is not the best test. The files contained only one-line functions that produced the same assembly code, without any preprocessing or linked libraries, so I don't know if this played any role, but that was 8 files using 8 threads, and the speed is just unreal! And TCC DOES create object files and then links them. How much faster could it be if it used the same design Vox uses (and how much slower would Vox be if it used the design regular compilers use)?
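
    In case anyone wants to try something similar, here is a sketch of the kind of test-file generator I mean (an illustration with made-up numbers, not my exact setup):

    Code:
    /* gen.c - emits one big C file full of identical one-line functions,
       so you can time how fast TCC chews through it. */
    #include <stdio.h>
    
    int main(void) {
        /* 500,000 one-line functions is roughly 500K LoC per file;
           8 such files gives about 4M LoC. */
        for (int i = 0; i < 500000; i++)
            printf("int f%d(int x) { return x + 1; }\n", i);
        return 0;
    }
    Redirect its output to test1.c through test8.c and time something like `tcc -c test1.c` on each file (one per thread).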

    Of course, TCC doesn't produce optimized code, but still, even compared with GCC's "-O0", it generates code 4-7 times faster than GCC, so if TCC could optimize code as well as GCC and used the design Vox uses, I can see it being able to compile around 1-1.5M LoC/s!

    I am personally really interested in and inspired by this design to make my own compiler. This design also solves a lot of problems that we would have to take into account with the classic method. One thing I thought of was the ability to also export your project as a library (mostly shared/dynamic), so in case you have something really huge like 10+M LoC (Linux kernel, I'm talking to you!), you could split it into "sub-projects" that would be libraries and then link them all together.

    Another idea would be to check the type of the files that are passed to the compiler and, if they are source files, not create object files, as they would not be kept anyway. So the following would apply:

    Code:
    my_lang -c test3.lang // Compile mode! Outputs the object file "test3.o"
    
    my_lang test1.lang test2.lang test3.o -o=TEST // Create executable.
    "test1.lang" and "test2.lang" are source files, so we won't create object files for them
    but will instead go straight to creating a binary out of them.
    "test3.o" is an object file, so we will "copy-paste" its symbols
    into the final binary file.
    This is probably the best of both worlds!
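
    A rough sketch of what the driver logic could look like (my_lang doesn't exist, so all names here are made up):

    Code:
    /* Hypothetical driver: source files are compiled straight into the
       in-progress executable, while .o files only get their symbols merged. */
    #include <stdio.h>
    #include <string.h>
    
    static int is_object_file(const char *path) {
        const char *dot = strrchr(path, '.');
        return dot && strcmp(dot, ".o") == 0;
    }
    
    int main(int argc, char **argv) {
        for (int i = 1; i < argc; i++) {
            if (is_object_file(argv[i]))
                printf("%s: copy its symbols into the final binary\n", argv[i]);
            else
                printf("%s: compile directly, no .o written to disk\n", argv[i]);
        }
        return 0;
    }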

    So I thought about sharing this and see what your thoughts are!

  2. #2
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    To paraphrase myself, designing a compiler is comparable to developing an operating system or an RDBMS - while developing a simple compiler isn't especially difficult (either using lexer and parser generators, or writing a recursive-descent parser by hand), a production-grade one is a major undertaking. Designing a new language is itself a similarly difficult undertaking - a simple/minimalistic language is generally a lot easier to design than a complex one. Designing a standard library for the new language is a similarly complex task whose difficulty scales with one's ambition.

    This isn't to discourage you - I am in a similar situation, though I have been stuck for several years due to motivation problems. You may not face that difficulty, in which case it is entirely within your grasp, if you put in the time and effort, and understand that you will probably go down several garden paths and blind alleys before settling on a final design.

    That having been said, do you need any advice? My first suggestion, aside from the obvious ones (e.g., use version control software and an offsite repo, things like that), is to read up on the subject as much as you can. To paraphrase myself again, few if any topics in computer science are as well-studied as compiler design. The literature on the topic is virtually endless.

    There are countless books and tutorials on crafting a compiler, some of which are free and others readily available at university libraries if you have access to one. One which I have been meaning to read is Crafting Interpreters by Robert Nystrom, though I cannot say how good it is. The classic textbook on the topic is, of course, the Dragon Book, Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman. However, even the most recent edition is rather old now, and it isn't the easiest textbook to read. I personally prefer Modern Compiler Design by Grune et al., as it is more approachable and covers topics that weren't current when the Dragon Book was last updated.

    For a different, and somewhat simpler, approach than those, you could try reading "An Incremental Approach to Compiler Construction" by Abdulaziz Ghuloum, which gives an alternative method of writing a simple compiler; the focus is on the Scheme language as both the implementation language and the language being compiled, but much of it is applicable more generally (and I would honestly recommend taking a look at the Lisp family of languages anyway, as a way of broadening your views on what can and cannot make a good language). While it is not a final step, it may be a worthwhile side trip on your journey.

    As for how TCC does it, my guess is threefold: first, it is probably a single-target compiler, which simplifies the overall design by cutting out the need for separate front and back ends. Second, it probably has a more tuned lexer, which, aside from optimization, is the most time-consuming part of a compiler. Third, it presumably compiles directly to an executable, whereas GCC compiles to assembly code and by default pipes the output directly to GAS and from there to LD.
    Last edited by Schol-R-LEA-2; 02-10-2022 at 12:19 PM.

  3. #3
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Schol-R-LEA-2 View Post
    To paraphrase myself, designing a compiler is comparable to developing an operating system or an RDBMS - while developing a simple compiler isn't especially difficult (either using lexer and parser generators, or writing a recursive-descent parser by hand), a production-grade one is a major undertaking. Designing a new language is itself a similarly difficult undertaking - a simple/minimalistic language is generally a lot easier to design than a complex one. Designing a standard library for the new language is a similarly complex task whose difficulty scales with one's ambition.
    Believe it or not, my motivation for making a language is to create the perfect language, but the reason compilation speeds matter is that I want to create an OS with it at some point. I was inspired by the *BSDs and Gentoo-based distros that compile everything from source, so I want a language that compiles very, very fast so that this can become the norm for everyone, with all the advantages it brings!

    Quote Originally Posted by Schol-R-LEA-2 View Post
    This isn't to discourage you - I am in a similar situation, though I have been stuck for several years due to motivation problems. You may not face that difficulty, in which case it is entirely within your grasp, if you put in the time and effort, and understand that you will probably go down several garden paths and blind alleys before settling on a final design.
    Well, motivation is my problem too. I kind of try to find stuff on my own and write the frontend from scratch, figuring out how to do it as I go. Though it seems that this results in my code being harder to debug, and in me getting stressed out a lot and wanting to give up. Would you like to tell me about the design you have come up with so far? I could share mine too! If you are interested, you can give me your email so I can reach out to you. Of course, if you prefer another way, let me know!

    Quote Originally Posted by Schol-R-LEA-2 View Post
    That having been said, do you need any advice? My first suggestion, aside from the obvious ones (e.g., use version control software and an offsite repo, things like that), is to read up on the subject as much as you can. To paraphrase myself again, few if any topics in computer science are as well-studied as compiler design. The literature on the topic is virtually endless.
    I'm glad that this topic is so well-studied, but reading and theory are my weak points. I just wish there were a book that showed you directly how to implement a VERY SIMPLE language in assembly and showed the code from beginning to end. I mean, theory is just as important, but come on! Who wants to read 10 pages of theory to get 10 lines of code out of it? I like writing code, reading small-to-medium functions, and understanding what they do. I need and like theory, but not in the amounts these books offer. That's also why I don't pay money for any book. Honestly, I could work out the theory by myself after learning the general structure. I just need to see the implementation. I need to learn how to write code. Things that are common will also transfer to other areas. If I learn how to write a good parser (for example), I will be able to write a parser for a compiler, for a file format, for my own little domain-specific language, and for anything in general.

    Yet the idea behind a parser is so, so simple. It should be the easiest thing to do, something even a stupid person could manage, but guess what... My code has bugs!!! You wouldn't expect that, right? So yeah, I'm just losing hope, and I'm also turning away from programming in general, as I can't find any source that talks about practice and shows the source code. I have to figure out for myself how to turn theory into practice, and if I have to figure out things by myself, then why do I read a book? Especially if I paid to buy it...

    I have been burned so many times by reading stuff and spending hours just to realize that it leads to nothing in the end and I'm just wasting my time. At this point I'm just very picky and afraid to commit to anything, because I would rather relax and listen to some music than read a book that will not offer me anything in the end. Like you said in your comment (in the paraphrase), I'm hurting myself by trying to figure things out alone, but I can't seem to find anything that makes me feel like I'm making progress.

    Quote Originally Posted by Schol-R-LEA-2 View Post
    There are countless books and tutorials on crafting a compiler, some of which are free and others readily available at university libraries if you have access to one. One which I have been meaning to read is Crafting Interpreters by Robert Nystrom, though I cannot say how good it is. The classic textbook on the topic is, of course, the Dragon Book, Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman. However, even the most recent edition is rather old now, and it isn't the easiest textbook to read. I personally prefer Modern Compiler Design by Grune et al., as it is more approachable and covers topics that weren't current when the Dragon Book was last updated.

    For a different, and somewhat simpler, approach than those, you could try reading "An Incremental Approach to Compiler Construction" by Abdulaziz Ghuloum, which gives an alternative method of writing a simple compiler; the focus is on the Scheme language as both the implementation language and the language being compiled, but much of it is applicable more generally (and I would honestly recommend taking a look at the Lisp family of languages anyway, as a way of broadening your views on what can and cannot make a good language). While it is not a final step, it may be a worthwhile side trip on your journey.
    Thank you! "Crafting Interpreters" is amazing because it is actually a book that shows the code but the thing is that I want to write a compiler and not an interpreter. The problem with that the first chapter (in the C version) talks about the VM and I don't know if the following chapters that talk about the frontend build on it so I don't know if it is the best thing to read. I just checked "modern compiler design" and it seems to be amazing! It is modern so I suppose good competitive practices and it seems to explain things greatly and there is the code (in C!!) that turns the theory into practice it seems to be my dream book! I will read it and I hope everything goes well! Thanks a lot!

    Quote Originally Posted by Schol-R-LEA-2 View Post
    As for how TCC does it, my guess is threefold: first, it is probably a single-target compiler, which simplifies the overall design by cutting out the need for separate front and back ends. Second, it probably has a more tuned lexer, which, aside from optimization, is the most time-consuming part of a compiler. Third, it presumably compiles directly to an executable, whereas GCC compiles to assembly code and by default pipes the output directly to GAS and from there to LD.


    Yeah, even if TCC and GCC had the same backend, the fact that GCC does 2 extra steps drastically worsens the compilation times.

    TCC does: `Source code -> Object file -> Linking`
    GCC does: `Source code -> IR -> Assembly -> Object file -> Linking`
    Vox does: `Source code -> Executable (not actual linking, as there is nothing to link)`

    And TCC can compile and link code faster than "NASM/GAS" and "LD" can assemble and link, which adds to it even more! Not generating assembly can also be faster because the compiler reads less text. One line of C can translate to 20 lines of assembly (OK, maybe I'm exaggerating), so the compiler can emit the instructions (in binary) directly rather than having to read back generated assembly files that contain 10 times more text than the original source file.

    Of course, this is not the only reason (hence Vox is not faster than TCC, and keep in mind that the Vox language has a design that allows for better compilation times), but it plays a role in why TCC is 4-7 times (or even more?) faster than GCC even when GCC doesn't do any optimizations.

    Just my two cents here! See? Told you I'm good with theory!
    Last edited by rempas; 02-11-2022 at 08:14 AM.

  4. #4
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    Quote Originally Posted by rempas View Post

    Would you like to talk me about the design you have come so far? I could also share mine too! If you are interested, you can give me your email so I could reach to you. Of course if you prefer another way, let me know!
    I'd be willing to discuss it here, a bit, though honestly I am not sure how much my own experiences will help you. Most of my interest is in the Lisp family of languages, and the language which I have been trying to design is intended as a system language with an s-expression syntax (the runtime support would be in the form of various libraries; the use of macros would allow the compiler itself to be extended suitably, or at least that is my intention). Unfortunately, I have repeatedly developed and then scrapped various ideas of how to do this, with very little long-term progress.

    I've considered many other sorts of languages over the years, but the over-all results were less than promising.

    I did write a simple compiler once, for a course on the topic about 14 years ago, but it was written in Python and implemented a subset of the ancient Algol-60 language. I never even finished the compiler, getting only the most basic syntax implemented. I am frankly somewhat embarrassed by some of the parts of it, especially how I misunderstood how to effectively implement a Deterministic Finite State Automaton (the design I used works after a fashion, but is completely impractical for a serious compiler).
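
    For what it is worth, the conventional approach I should have used is a table-driven DFA, something along these lines in C (a toy that recognizes unsigned decimal integers, not code from that old project):

    Code:
    #include <ctype.h>
    
    /* States: 0 = start, 1 = in number (accepting), 2 = dead. */
    static int is_integer(const char *s) {
        static const int next[3][2] = {
            /*        digit  other */
            /* 0 */ {   1,     2  },
            /* 1 */ {   1,     2  },
            /* 2 */ {   2,     2  },
        };
        int state = 0;
        for (; *s != '\0'; s++)
            state = next[state][isdigit((unsigned char)*s) ? 0 : 1];
        return state == 1;   /* accept only if we ended in state 1 */
    }
    The whole automaton lives in the transition table, which is what lets the technique scale to a real lexer.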

    I will try to answer whatever questions you have as best I can, but I can only do so much I am afraid.
    Last edited by Schol-R-LEA-2; 02-11-2022 at 05:24 PM.

  5. #5
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Schol-R-LEA-2 View Post
    I'd be willing to discuss it here, a bit, though honestly I am not sure how much my own experiences will help you. Most of my interest is in the Lisp family of languages, and the language which I have been trying to design is intended as a system language with an s-expression syntax (the runtime support would be in the form of various libraries; the use of macros would allow the compiler itself to be extended suitably, or at least that is my intention). Unfortunately, I have repeatedly developed and then scrapped various ideas of how to do this, with very little long-term progress.

    I've considered many other sorts of languages over the years, but the over-all results were less than promising.

    I did write a simple compiler once, for a course on the topic about 14 years ago, but it was written in Python and implemented a subset of the ancient Algol-60 language. I never even finished the compiler, getting only the most basic syntax implemented. I am frankly somewhat embarrassed by some of the parts of it, especially how I misunderstood how to effectively implement a Deterministic Finite State Automaton (the design I used works after a fashion, but is completely impractical for a serious compiler).

    I will try to answer whatever questions you have as best I can, but I can only do so much I am afraid.
    Actually, I wasn't thinking of asking questions but of showing you my ideas for the design of my new language, so I was just curious whether you wanted to see where I am now with the design. However, as it seems that you are mostly interested in the Lisp family of languages, you may not be interested in seeing it. So yeah, I thought about having a talk, and if you are interested in seeing my ideas and telling me your thoughts and suggestions, I would be really glad to set up a file demonstrating what I have come up with so far. I could also give you the syntax file I have created for Neovim. Let me know if you are interested! If not, thank you for everything, and I wish you the best!

    P.S. I don't think you should be embarrassed, because your project is still something, compared to me not having done anything yet. Keep it up, man!!!

  6. #6
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    I would love to see your work, thank you. I may be focusing on Lisps, but I have a general interest in new programming languages.

  7. #7
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Schol-R-LEA-2 View Post
    I would love to see your work, thank you. I may be focusing on Lisps, but I have a general interest in new programming languages.
    Thank you! I've read about Lisp-like languages, and I've seen a post quoting Richard Stallman saying that if you don't know Lisp, you don't know what it is like to be able to express yourself and solve your problem the way you want. I don't remember his exact words, but it made me interested in Lisp. I ended up not giving it a lot of attention, though, because I was lazy at the time (and I want to believe and hope that I'm not anymore). I also looked at Haskell. Tbh, I found the classic programming language structure and syntax to be a little bit problematic, so I may look at functional language syntax in the future, as I want something unique and different for my language.

    Now, for the current syntax, I will write a file and upload it as soon as I can. I see that we can start a conversation here so I will keep you updated when I'm ready!

  8. #8
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Here is the link to the file, hope you like it!

    Upload files for free - LANG_PRESENT - ufile.io

  9. #9
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    OK, let me make a few comments right away.


    • Since you seem to have a lot of the basic design set, I would suggest that you start working on both a lexer grammar and a more complete syntax grammar for the language. This is a necessary step in implementation, and could help clarify some of your ideas.
    • For the base numeric types, perhaps you could follow the approach taken by Ada and PL/1 and have primary int and float types which take a size argument, e.g., int<32>, float<32>, etc. This would allow you to specify non-byte sizes which the compiler could enforce. You could even add optional lower and upper bounds, which would allow for arbitrary ranges. For example, int<12, -2000, 2000> would give an integer value with the range (-2000, 2000). The compiler could then determine the best fit in terms of bytes from that (see the sketch after this list).
    • If you follow the previous suggestion, then you will want to be able to have user-defined integer and float sub-types.
    • While you may find them confusing, unsigned types are pretty crucial for systems programming, as many of the fields needed in, e.g., page tables have to be unsigned. Fortunately, the previous suggestion gives a simple solution to that - you could simply define a type with a starting index of 0 for any unsigned types.
    • I would recommend using UTF-8 characters rather than strict ASCII. While this does mean handling multi-byte characters, limiting your character set to ASCII is likely to come back to bite you later. Better still, you might want to allow the character model to be defined by the program, with a system default being available.
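
    To illustrate the "best fit" computation from the second bullet, here is a sketch in C (the function names are just for illustration; it assumes lo < hi and a span that fits in 64 bits):

    Code:
    #include <stdint.h>
    
    /* Smallest number of bits that can hold every value in [lo, hi]. */
    static unsigned bits_for_range(int64_t lo, int64_t hi) {
        uint64_t span = (uint64_t)hi - (uint64_t)lo;  /* distinct values - 1 */
        unsigned bits = 1;
        while (span >> bits)
            bits++;
        return bits;
    }
    
    /* Round up to whole bytes for storage; e.g. int<12, -2000, 2000>
       needs 12 bits and so would be stored in 2 bytes. */
    static unsigned storage_bytes(int64_t lo, int64_t hi) {
        return (bits_for_range(lo, hi) + 7) / 8;
    }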



    I will take a deeper dive into the document later when I have some more time.

  10. #10
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Thank you for the suggestions man!!!


    Quote Originally Posted by Schol-R-LEA-2 View Post
    • Since you seem to have a lot of the basic design set, I would suggest that you start working on both a lexer grammar and a more complete syntax grammar for the language. This is a necessary step in implementation, and could help clarify some of your ideas.
    I already did something, actually, but it turned out to be hard to work with, so I'm starting over. I'm reading the "Modern Compiler Design" book, so I suppose this will help me! Again, that's thanks to you



    Quote Originally Posted by Schol-R-LEA-2 View Post
    • For the base numeric types, perhaps you could follow the approach taken by Ada and PL/1 and have primary int and float types which take a size argument, e.g., int<32>, float<32>, etc. This would allow you to specify non-byte sizes which the compiler could enforce. You could even add optional lower and upper bounds, which would allow for arbitrary ranges. For example, int<12, -2000, 2000> would give an integer value with the range (-2000, 2000). The compiler could then determine the best fit in terms of bytes from that.
    • If you follow the previous suggestion, then you will want to be able to have user-defined integer and float sub-types.
    I will have a look at that one. If I understand it correctly, this is like "bitfields" from C. I will see how this can be implemented in low-level machine code, but this may not be a feature in the first release.
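
    For reference, this is the C feature I mean (plain standard bitfields, nothing from my language yet):

    Code:
    /* The compiler packs these fields into fewer bits than full ints;
       the whole struct typically fits in a single 32-bit word. */
    struct page_entry {
        unsigned present : 1;   /* 1 bit   */
        unsigned level   : 3;   /* 3 bits  */
        unsigned frame   : 20;  /* 20 bits */
    };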



    Quote Originally Posted by Schol-R-LEA-2 View Post
    • While you may find them confusing, unsigned types are pretty crucial for systems programming, as many of the fields needed in, e.g., page tables have to be unsigned. Fortunately, the previous suggestion gives a simple solution to that - you could simply define a type with a starting index of 0 for any unsigned types.
    Yeah, I should probably support that no matter how confusing it seems. The problem is that I don't (yet) understand how it works under the hood. The machine doesn't distinguish between "positive" and "negative" values; the machine just sees an array of "0"s and "1"s. If you look at the assembly output of a compiler (say GCC) for unsigned numbers, you will understand what I'm saying. For example, try this code:

    Code:
    int main() {
        unsigned char val = 250;
        val = 10;
    }
    Compile with `gcc -S test.c` and you will see that "250" has been emitted as "-6". Cool, huh?



    Quote Originally Posted by Schol-R-LEA-2 View Post
    • I would recommend using UTF-8 characters rather than strict ASCII. While this does mean handling multi-byte characters, limiting your character set to ASCII is likely to come back to bite you later. Better still, you might want to allow the character model to be defined by the program, with a system default being available.
    Using "UTF-8" where exactly? In the parser or in the data types (char, string)? If you mean the latter, from what I know, the Unicode encoding is chosen from the operating system. The machine just sees an array of bytes and then they OS decides how it will display the characters. If you are interested, see the post and replies here: How to print unicode characters (no library)? - D Programming Language Discussion Forum

    Quote Originally Posted by Schol-R-LEA-2 View Post
    I will take a deeper dive into the document later when I have some more time.
    Take your time! Thanks a lot for everything!

  11. #11
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    Quote Originally Posted by rempas View Post
    Yeah, I should probably support that no matter how confusing it seems. The problem is that I don't (yet) understand how it works under the hood. The machine doesn't distinguish between "positive" and "negative" values; the machine just sees an array of "0"s and "1"s. If you look at the assembly output of a compiler (say GCC) for unsigned numbers, you will understand what I'm saying. For example, try this code:

    Code:
    int main() {
        unsigned char val = 250;
        val = 10;
    }
    Compile with `gcc -S test.c` and you will see that "250" has been emitted as "-6". Cool, huh?
    OK, I am not certain I can fully explain the various methods used to represent signed numbers in general, but I can tell you that nearly every modern CPU architecture uses a method called "two's complement" to represent signed integer values, while floating-point numbers generally use an approach called "signed magnitude" as defined by certain IEEE standards.

    In 2's complement formats, for a given bit-width, positive numbers are represented by a range covering all but the most significant bit (MSB); thus, for one byte, a positive value would be one where the 8th bit is clear:

    0XXXXXXX - where a valid positive number is between 0 and 127

    Only negative numbers use the 8th bit, making it easy to determine if a number is negative or not.

    Now, the naive approach to this might be to simply use the 8th bit as a sign bit, meaning that the absolute value of a given number has the same representation whether positive or negative. However, this approach requires additional hardware support, and also has a problem with having both +0 and -0, as well as other flaws which I don't think I understand well enough to explain.

    The next solution, known as one's complement, is to take the complement of the positive number and use that as the negative, the equivalent of applying (n XOR 11111111) to the value (for byte values). For example,

    01101011

    would become

    10010100

    This retains the ability to determine sign by checking the MSB, but doesn't require special hardware for handling sign (though most instruction sets do have a 'set to negative' instruction or standard sequence for producing a negative value). The problem with this is that it still has both a +0 and a -0.

    The solution to this is to take the complement and then add 1 to the result, meaning that the value above in 2's complement becomes:

    10010101

    The advantage here is that not only does zero have a single representation (since 11111111 + 1 wraps around to become 00000000), but due to some mathematical magic I don't properly understand, the exact same addition and subtraction hardware works regardless of whether the values are signed or unsigned (multiplication and division do still need separate signed and unsigned instructions).
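
    You can see all of this for yourself in a few lines of C (assuming a two's-complement machine, which covers practically everything today):

    Code:
    #include <stdio.h>
    
    int main(void) {
        unsigned char x    = 0x6B;     /* 01101011 */
        unsigned char ones = ~x;       /* 10010100 - one's complement */
        unsigned char twos = ~x + 1;   /* 10010101 - two's complement */
        printf("%02X %02X %02X\n", x, ones, twos);
    
        /* The ordinary adder negates: (-x) + x wraps around to 0. */
        printf("%d\n", (unsigned char)(twos + x));
        return 0;
    }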

  12. #12
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Schol-R-LEA-2 View Post
    OK, I am not certain I can fully explain the various methods used to represent signed numbers in general, but I can tell you that nearly every modern CPU architecture uses a method called "two's complement" to represent signed integer values, while floating-point numbers generally use an approach called "signed magnitude" as defined by certain IEEE standards.

    In 2's complement formats, for a given bit-width, positive numbers are represented by a range covering all but the most significant bit (MSB); thus, for one byte, a positive value would be one where the 8th bit is clear:

    0XXXXXXX - where a valid positive number is between 0 and 127

    Only negative numbers use the 8th bit, making it easy to determine if a number is negative or not.

    Now, the naive approach to this might be to simply use the 8th bit as a sign bit, meaning that the absolute value of a given number has the same representation whether positive or negative. However, this approach requires additional hardware support, and also has a problem with having both +0 and -0, as well as other flaws which I don't think I understand well enough to explain.

    The next solution, known as one's complement, is to take the complement of the positive number and use that as the negative, the equivalent of applying (n XOR 11111111) to the value (for byte values). For example,

    01101011

    would become

    10010100

    This retains the ability to determine sign by checking the MSB, but doesn't require special hardware for handling sign (though most instruction sets do have a 'set to negative' instruction or standard sequence for producing a negative value). The problem with this is that it still has both a +0 and a -0.

    The solution to this is to take the complement and then add 1 to the result, meaning that the value above in 2's complement becomes:

    10010101

    The advantage here is that not only does zero have a single representation (since 11111111 + 1 wraps around to become 00000000), but due to some mathematical magic I don't properly understand, the exact same addition and subtraction hardware works regardless of whether the values are signed or unsigned (multiplication and division do still need separate signed and unsigned instructions).
    Thank you! I'm actually aware of 2's complement. I did my research, and it seems that there are instructions that treat numbers as unsigned and others that treat numbers as signed. So this is probably how it works! I will play with it a little and find out! Thanks a lot!
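
    Here is a little test showing the same bit pattern giving different results depending on signedness (x86-64 has separate instructions for the two interpretations, e.g., DIV vs IDIV for division and JA vs JG for comparisons):

    Code:
    #include <stdio.h>
    
    int main(void) {
        unsigned char u = 0xFA;               /* bits 11111010, read as 250 */
        signed char   s = (signed char)0xFA;  /* same bits, read as -6      */
        printf("%d %d\n", u, s);              /* prints: 250 -6 */
        printf("%d %d\n", u / 2, s / 2);      /* prints: 125 -3 */
        return 0;
    }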

  13. #13
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    BTW, aside from C, C++, Python, D, and Rust, what languages do you have experience with (even if only a little bit)? I am specifically wondering if you are familiar with the Wirth languages (particularly Pascal, Modula-2, and Oberon), or the closely related Ada. While these sorts of languages are out of favor today, it is still worth knowing something about them.

    Also, are you at all familiar with either Forth, Smalltalk, or Prolog? They take very different directions from conventional languages, and even a limited familiarity with them might be worth having.

    As I've said before, my own interests mainly lie with Scheme, and one of the major advantages of Scheme is that the core language is very easy to learn in a short time, but at the same time it is an amazingly flexible and powerful language in several ways (though it has several problems as well, most of which relate to the rather minimalist standard library and the incompatibilities in the various implementations). You might want to take a few days to familiarize yourself with some Scheme implementation or dialect thereof (Racket is probably the most practical choice, since that seems to be the best supported lately) just to fill in the blanks on the topic a bit.

  14. #14
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by Schol-R-LEA-2 View Post
    BTW, aside from C, C++, Python, D, and Rust, what languages do you have experience with (even if only a little bit)? I am specifically wondering if you are familiar with the Wirth languages (particularly Pascal, Modula-2, and Oberon), or the closely related Ada. While these sorts of languages are out of favor today, it is still worth knowing something about them.

    Also, are you at all familiar with either Forth, Smalltalk, or Prolog? They take very different directions from conventional languages, and even a limited familiarity with them might be worth having.

    As I've said before, my own interests mainly lie with Scheme, and one of the major advantages of Scheme is that the core language is very easy to learn in a short time, but at the same time it is an amazingly flexible and powerful language in several ways (though it has several problems as well, most of which relate to the rather minimalist standard library and the incompatibilities in the various implementations). You might want to take a few days to familiarize yourself with some Scheme implementation or dialect thereof (Racket is probably the most practical choice, since that seems to be the best supported lately) just to fill in the blanks on the topic a bit.
    I have not seen any of the languages you mentioned (except for some Lisps and Racket, and even those only a little bit). Other than that, I have seen a lot of languages. In random order:

    Vox, Lua, JS, Java, Emacs Lisp, Crystal, Ruby, Haskell, Scala, TypeScript, Go, V, Nim, Vala, Zig, ReScript, Kotlin, C#, OCaml, x86_64 assembly (LOL!!!) and probably others that I don't even remember....

    I got away from most of them, mostly because of slow compilation times (anything slower than GCC) or slow runtime performance (anything that runs more than 2 times slower than GCC -O3), and from others for other reasons. The big problem is that I haven't found something specific to do with each language. I was always searching for the best programming language™️ for general purpose, and I'm starting to realize that even if I were to find it, I wouldn't have anything to do with it. So I need libraries, frameworks and tools to get interested and work with. That, or programming is not for me, BUT there were times when I wrote code and felt a BEAUTIFUL feeling, so I really, really, really don't want to leave programming. I think I must find what I enjoy and stop overthinking, but stopping overthinking is, in general, the hardest thing for me. Do you think that there is a psychological condition for that and that I need help? No, seriously now, I'm asking this unironically...

    I will check out all the languages you mentioned, especially Racket! I think I will also give another chance (or rather the only chance, since I left some languages for stupid reasons) to some of the languages I mentioned, most notably JS/TypeScript (for the library collection), Lua, Nim and Zig.

  15. #15
    Registered User
    Join Date
    Feb 2022
    Posts
    45
    I've come up with a partial token grammar for the language as you've described it, in EBNF notation, though some of it will need to be expanded upon or changed as the language design progresses. This should give you a leg up on defining the lexical analyzer when the time comes to implement it.

    Code:
    token ::= <keyword> | <number> | <char> | <string> | <identifier> |
              <assignment-op> | <arithmetic-op> | <bitwise-op> | <logical-op> | 
              <paren> | <colon> | <right-arrow> | <angle-bracket> | <brace> | 
              <comma> | <period> 
    keyword ::= <base-type> | "if" | "elif" | "else" | 
                "loop" | "while" | "once" | "for" | 
                "fn" | "let" | "mut" | "return" | "object" | 
                "import" | "as" | "alias" |
                "and" | "or" | "not"
    base-type ::= "i8"| "i16" | "i32" | "i64" |"f32" | "f64" | 
                  "bool" | "char" | "string" | "ptr" | "Array" | "Vector"
    identifier ::= <alpha><alphanum>*
    alpha ::= "A" | "a" | "B" | "b" | "C" | "c" ... | "Z" | "z"
    alphanum ::= <alpha> | <digit>
    bit ::= "0" | "1"
    octal-digit ::= <bit> | "2" | "3" | "4" | "5" | "6" | "7" 
    digit ::= <octal-digit> | "8" | "9"
    hex-digit ::= <digit> | "A" | "a" |"B" | "b" | "C" | "c" | "D" | "d" | "E" | "e" | "F" | "f"
    number ::= <integer> | <float>
    integer ::= <digit>+ | "0b" <bit>+ | "0" <octal-digit>+ | "0x" <hex-digit>+
    float ::= <integer> <period> <integer> [("E"|"e") <integer>]
    char ::= <quote> <printable-character> <quote>
    string ::= <double-quote> {<printable-character> | '\"'} <double-quote>
    assignment-op ::= "="
    arithmetic-op ::= "+" | "-" | "*" | "/"
    bitwise-op ::= "<<" | ">>" | "&" | "|"
    logical-op ::= "==" | "<" | ">" | "<=" | ">="
    paren ::= <lparen> | <rparen>
    lparen ::= "("
    rparen ::= ")"
    angle-bracket ::= <langle> | <rangle>
    langle ::= "<"
    rangle ::= ">"
    brace ::= <lbrace> | <rbrace>
    lbrace ::= "{"
    rbrace ::= "}"
    colon ::= ":"
    right-arrow ::= "->"
    comma ::= ","
    period ::= "."
    quote ::= "'"
    double-quote ::= '"'
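
    To hint at how the grammar maps to code, a fragment of a hand-written lexer for just the <identifier> and <integer> rules might look like this in C (illustrative only; the real thing needs the full token set):

    Code:
    #include <ctype.h>
    
    typedef enum { TOK_IDENTIFIER, TOK_INTEGER, TOK_OTHER, TOK_EOF } token_kind;
    
    /* Scan one token starting at *p and advance *p past it. */
    static token_kind next_token(const char **p) {
        while (isspace((unsigned char)**p))
            (*p)++;
        if (**p == '\0')
            return TOK_EOF;
        if (isalpha((unsigned char)**p)) {           /* <alpha><alphanum>* */
            while (isalnum((unsigned char)**p))
                (*p)++;
            return TOK_IDENTIFIER;
        }
        if (isdigit((unsigned char)**p)) {           /* <digit>+ */
            while (isdigit((unsigned char)**p))
                (*p)++;
            return TOK_INTEGER;
        }
        (*p)++;                   /* operators and punctuation elided here */
        return TOK_OTHER;
    }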
    Last edited by Schol-R-LEA-2; 02-15-2022 at 08:52 AM.
