Thread: Transpile to C or create a full compiler (with an assembler and linker)?

  1. #1
    Registered User
    Join Date
    Oct 2021
    Posts
    138

    I'm thinking about creating my own "programming language" and I came up with two ways of doing it:

    The first way is to create a transpiler. It will take a text file and spit out C source code, which can then be compiled with an existing C compiler to produce the final executable/library.
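
As a rough illustration of the transpiler idea, here is a minimal sketch that turns one statement of a made-up toy language into one line of C. The two statement forms (`let` and `print`) and the function name `transpile_line` are assumptions invented for this example; a real transpiler would parse the input properly instead of copying expressions through as text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy language with two statement forms:
 *   let NAME = EXPR   ->  long NAME = EXPR;
 *   print EXPR        ->  printf("%ld\n", (long)(EXPR));
 * EXPR is copied through verbatim, so it must already be valid C.
 * The input line is assumed to have no trailing newline.
 * Returns 0 on success, -1 on an unknown statement or a too-small buffer. */
int transpile_line(const char *src, char *out, size_t out_size)
{
    assert(src != NULL && out != NULL && out_size > 0);

    int n;
    if (strncmp(src, "let ", 4) == 0) {
        n = snprintf(out, out_size, "long %s;", src + 4);
    } else if (strncmp(src, "print ", 6) == 0) {
        n = snprintf(out, out_size, "printf(\"%%ld\\n\", (long)(%s));", src + 6);
    } else {
        return -1; /* unknown statement */
    }

    /* snprintf returns the length it needed; reject truncated output. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The emitted lines would still need to be wrapped in a `main` function with `#include <stdio.h>` before being handed to tcc or gcc.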

    The second way is to create a complete toolchain. It will include an assembler and a linker, so it will have no dependencies. This was inspired by the Tiny C Compiler (tcc); I will look at its source code to figure out how to do that. I will also have to learn assembly first, because with my current knowledge I can't do a lot.

    Of course, both methods have advantages and disadvantages, and I will list them below. Note that the advantages of one are the disadvantages of the other, so I won't duplicate the text.

    TRANSPILER:
    1. Easier to create. It will be just the parser, which will "translate" the input to the equivalent C source code. The other option would take me weeks (if not years) just to understand how it works.
    2. The C compiler will do the final work, so the backend will be able to produce better (faster at runtime) machine code than I will probably be able to produce myself.
    3. Being able to use any C compiler means that we can use something that compiles fast (tcc) when performance doesn't matter (or won't make a noticeable difference) and mix it with something that produces faster code (gcc -Ofast) for the source files that contain code that can be optimized.

    COMPILER:
    1. Faster compilation times as the code will go directly from text to binary with nothing in the middle (see how fast tcc compiles!!).
    2. No dependencies. This is mostly for "flexing", but it may even be practically useful if someone wants to create their own OS (not a Linux distro, an OS), as there would be no need to port other tools.
    3. Full control over everything: I'll be the one deciding how simple or how complex each thing will be, so in general less overhead. It may even allow me to have more "features" that translate better and produce faster code!?

    This is all I could think of. What are your thoughts?

  2. #2
    Registered User
    Join Date
    Apr 2021
    Posts
    138
    It is not always possible to write a transpiler for a programming language. For example, dynamic languages such as Python or Lisp generally operate on a machine model that is not compatible with the C machine model.

    In those cases, you generally have to write either an interpreter, or a compiler to some form of intermediate code such as Python's bytecode.

    If your new language is intended to compile down to "bare metal" like C++ or C or Rust, then you're looking at writing a transpiler plus whatever support library it needs. The alternative might be to use the LLVM library and compile down past C into LLVM's intermediate representation (IR). This is what compilers like Rust's have done successfully.

    I'd say go for it, so long as you are sure the target environments are harmonious.

  3. #3
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    I would suggest using binutils instead of writing your own assembler.

    Binutils - GNU Project - Free Software Foundation


    Tim S.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are, in short, a perfect match.." Bill Bryson

  4. #4
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by aghast
    It is not always possible to write a transpiler... dynamic languages... operate on a machine model that is not compatible with the C machine model.
    Thanks! I'm aware of that. The language I have in mind would transpile to C fine, so that's not a problem for me. My question is whether I should prefer writing a compiler, assembler and linker from scratch instead. What are your thoughts on that?

    Quote Originally Posted by aghast
    I'd say go for it, so long as you are sure the target environments are harmonious.
    Oh! What do you mean by the term "harmonious"? I know what the word means, but what would an environment that is not harmonious look like? Can you please give an example so I can properly understand?

  5. #5
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by stahta01
    I would suggest using binutils instead of writing your own assembler.

    Tim S.
    Thanks a lot for the suggestion, Tim! The thing is, binutils is slow and I really care about compilation speed, so if I'm not going to create a transpiler, I'm going to go all the way!

  6. #6
    Registered User
    Join Date
    Apr 2021
    Posts
    138
    Quote Originally Posted by rempas
    Oh! What do you mean by the term "harmonious"? I know what the word means, but what would an environment that is not harmonious look like? Can you please give an example so I can properly understand?
    If you're transpiling to C, the output will be C code, no matter how complicated or awkwardly-written.

    However, as I pointed out, there are languages with features that you just cannot effectively "write" in C. You can model them using virtual machines, but you can't write them directly. Consider garbage collection. You can model some kinds of GC using reference counting, but that is fairly deterministic and doesn't completely deal with cycles. So if your language is heavily dependent on GC, I'd suggest that you not try to solely transpile your code. Writing an interpreter or a vm + compiler might be a better approach.
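
To make the point about cycles concrete, here is a minimal sketch of reference counting in C. The `Obj` type, the helper names and the `live_objects` counter are all hypothetical, invented for this illustration. A simple chain is freed completely, while two objects that point at each other never reach a count of zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical runtime object: a reference count plus one outgoing reference. */
typedef struct Obj {
    int refcount;
    struct Obj *next;
} Obj;

static int live_objects = 0; /* allocated but not yet freed */

Obj *obj_new(void)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL) abort();
    o->refcount = 1; /* the creator holds the first reference */
    o->next = NULL;
    live_objects++;
    return o;
}

void obj_retain(Obj *o) { if (o != NULL) o->refcount++; }

void obj_release(Obj *o)
{
    if (o == NULL) return;
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        obj_release(o->next); /* drop the reference this object held */
        free(o);
        live_objects--;
    }
}

void obj_set_next(Obj *o, Obj *target)
{
    obj_retain(target);   /* retain first, in case target == o->next */
    obj_release(o->next);
    o->next = target;
}

/* a -> b with no cycle: returns how many objects are leaked (none). */
int demo_chain(void)
{
    int before = live_objects;
    Obj *a = obj_new(), *b = obj_new();
    obj_set_next(a, b);
    obj_release(b); /* b is now kept alive only by a */
    obj_release(a); /* frees a, which releases and frees b */
    return live_objects - before;
}

/* a -> b -> a: returns how many objects are leaked (both). */
int demo_cycle(void)
{
    int before = live_objects;
    Obj *a = obj_new(), *b = obj_new();
    obj_set_next(a, b);
    obj_set_next(b, a);
    obj_release(a); /* count drops to 1, held by b */
    obj_release(b); /* count drops to 1, held by a */
    return live_objects - before;
}
```

Dealing with the second case takes something beyond plain counting, such as weak references or a separate cycle collector.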

    Similar situations pertain for things like co-routines and exceptions: depending on what your new language's specification says about them, you may be able to implement them or model them using C + a support library, or you may find that you need a more supportive implementation. If your language's model of these things is not "harmonious" with C, then transpiling directly to C would be a lose.
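
As one example of "C + a support library", a very simple exception model can be lowered onto `setjmp`/`longjmp` from the standard library. This is only a sketch under assumptions: `rt_throw`, `checked_div` and `div_or` are hypothetical names, and the model shown has no cleanup handlers or typed exceptions, which is exactly where a richer language specification would stop being harmonious with it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical support-library state: the innermost active handler. */
static jmp_buf *current_handler = NULL;

/* Jump to the innermost handler. The code must be non-zero. */
void rt_throw(int code)
{
    assert(current_handler != NULL && code != 0);
    longjmp(*current_handler, code);
}

/* What generated code for a checked "x / y" might call. */
int checked_div(int x, int y)
{
    if (y == 0) rt_throw(1);
    return x / y;
}

/* What "try { x / y } catch { fallback }" might be lowered to. */
int div_or(int x, int y, int fallback)
{
    jmp_buf here;
    jmp_buf *saved = current_handler; /* set before setjmp, never modified after */
    int result;

    if (setjmp(here) == 0) {
        current_handler = &here;
        result = checked_div(x, y);
    } else {
        result = fallback; /* reached via longjmp from rt_throw */
    }
    current_handler = saved; /* restore the enclosing handler */
    return result;
}
```

Note that `longjmp` skips over the intermediate C frames without running anything in them, so any resources those frames own have to be tracked by the support library itself.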

    In particular, be wary of stack frames. Modern C compilers can be quite aggressive about eliminating all sorts of stack baggage when even low levels of optimization are turned on. If your "solution" for your language's features depends on being able to unwind the C runtime stack, it will be a fragile solution at best. And if you end up implementing your own "stack frames" as generated local variables or function calls, then you're well on your way to not transpiling any more...
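
For a sense of what implementing your own "stack frames" can look like in generated C, here is a minimal sketch of a shadow stack: the generated code registers the addresses of its pointer locals in an explicit array, so the runtime never has to inspect the real C stack. All names here (`shadow_push`, `shadow_pop`, `count_live_roots`, `generated_function`) are hypothetical, and the "collector" is just a stand-in that counts non-NULL roots.

```c
#include <assert.h>
#include <stddef.h>

#define SHADOW_MAX 256

/* Hypothetical shadow stack: addresses of the generated code's pointer locals. */
static void **shadow_stack[SHADOW_MAX];
static size_t shadow_top = 0;

void shadow_push(void **slot)
{
    assert(shadow_top < SHADOW_MAX);
    shadow_stack[shadow_top++] = slot;
}

void shadow_pop(size_t count)
{
    assert(count <= shadow_top);
    shadow_top -= count;
}

/* Stand-in for a collector's root scan: count the non-NULL roots. */
size_t count_live_roots(void)
{
    size_t n = 0;
    for (size_t i = 0; i < shadow_top; i++)
        if (*shadow_stack[i] != NULL) n++;
    return n;
}

/* What a transpiled function with two pointer locals might look like. */
size_t generated_function(void)
{
    int value = 42;
    void *a = &value; /* a live reference */
    void *b = NULL;   /* declared but not yet assigned */

    shadow_push(&a);
    shadow_push(&b);
    size_t seen = count_live_roots(); /* sees a but not b */
    shadow_pop(2);                    /* unregister before returning */
    return seen;
}
```

Every generated function pays for the push and pop, and taking the address of each local stops the C compiler from keeping it in a register, which is part of the cost being warned about here.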

  7. #7
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by aghast
    If you're transpiling to C, the output will be C code, no matter how complicated or awkwardly-written. [...]
    Thanks, though I don't want to create my own language anymore. Have an amazing day, my friend!
