Thread: Writing a language compiled to / on top of C

  1. #1
    Registered User
    Join Date
    May 2010
    Posts
    120

    Writing a language compiled to / on top of C

    Not sure if this is the right place to ask this, but let's give it a try.

    I want to create a language that is built on top of C: it will be compiled to C code and then use the C compiler normally, so it would be a preprocessor on steroids. I have some questions though:

    One thing that worries me is the maximum length of identifiers in C. Since the language is built on top of C, many features will revolve around name mangling, which means that identifier lengths in the final C source might be a problem. What is generally the maximum length of identifiers? Is that standardized? Any suggestions on how to tackle this problem?

    Do you think this could work, or are the C compiler and debuggers too specialized to give coherent error messages for a very different syntax? I know you can change the way the compiler/debugger sees lines with the #line directive, but would it be reasonable to have these every two lines or so throughout the whole file? As for errors, I can do the error checking in my own compiler to help with this problem.
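    To make the #line idea concrete, here is a minimal sketch of what generated C might look like. The file name "demo.src" and the line number 42 are made up for illustration; the point is only that __LINE__ and __FILE__ (and therefore compiler diagnostics) follow the directive rather than the position in the generated file. How well a given debugger follows these positions depends on the toolchain.

```c
/* Stand-in for generated C. "demo.src" and 42 are hypothetical values
   that a translator would copy from the original source position. */
#line 42 "demo.src"
int reported_line(void) { return __LINE__; }          /* this line counts as 42 */
const char *reported_file(void) { return __FILE__; }  /* presumed file is "demo.src" */
```

    Emitting a directive only where the mapping between generated and original lines changes, rather than on every other line, should be enough, since line numbers keep counting up from the last directive.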

    I want this to work with every standard C compiler if possible, so are there any other suggestions you might want to give?

    Thank you in advance.

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    What is generally the maximum length of identifiers? Is that standardized?
    Yes, it is standardized. In C11 for example, the minimum limit of significant characters is 63 for internal identifiers and 31 for external identifiers.
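    As a rough illustration of how a translator could act on those numbers, the sketch below checks a generated name against the two minimums quoted above. The helper name_is_portable is hypothetical, and real compilers usually accept far longer names; the check is only about what the standard guarantees.

```c
#include <stddef.h>
#include <string.h>

/* Minimum numbers of significant characters quoted above for C11. */
enum { MIN_SIG_INTERNAL = 63, MIN_SIG_EXTERNAL = 31 };

/* Hypothetical helper: returns 1 if every character of `name` is
   guaranteed to be significant for the given linkage, 0 otherwise. */
int name_is_portable(const char *name, int is_external)
{
    size_t limit = is_external ? MIN_SIG_EXTERNAL : MIN_SIG_INTERNAL;
    return strlen(name) <= limit;
}
```

    A translator could call this on every name it emits and warn, or switch to a shorter encoding, when the answer is 0.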

    Do you think this could work, or are the C compiler and debuggers too specialized to give coherent error messages for a very different syntax?
    That would certainly be an issue, and if the language is vastly different you'd probably want to pursue a dedicated debugger front end rather than relying on existing ones.

    I want this to work with every standard C compiler if possible, so are there any other suggestions you might want to give?
    It's a "simple" matter of making sure your output conforms to the C standard and doesn't rely on any implementation quirks.
    My best code is written with the delete key.

  3. #3
    Registered User
    Join Date
    May 2010
    Posts
    120
    Quote Originally Posted by Prelude View Post
    Yes, it is standardized. In C11 for example, the minimum limit of significant characters is 63 for internal identifiers and 31 for external identifiers.
    Hmm, that doesn't give me a lot of space... Are there any "hacks" that I can use to get by this limit, or am I out of luck?

  4. #4
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    Hmm, that doesn't give me a lot of space...
    Just how much metadata are you adding? If it's so much that you anticipate hitting an implementation limit, it would probably be better to manage the metadata in separate objects of the symbol table rather than directly in the identifier string. This will complicate variable objects considerably in the output code, but at least you'd move the limit from the back-end C compiler to your own.

  5. #5
    Registered User
    Join Date
    May 2010
    Posts
    120
    Quote Originally Posted by Prelude View Post
    Just how much metadata are you adding?
    Lots. I decided that if I want even a simple feature like function overloading, I would have to add the types of every argument to the name of the function. I don't see many ways around this, since it is the only way the object can be referenced from other translation units. I also thought about handling these in internal tables, but that's out of the question because the data has to be visible to the exterior. I thought about hashing, but that doesn't seem like such a good idea.
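    For reference, the hashing option could look roughly like the sketch below: the full signature is hashed with 32-bit FNV-1a and the hex digest is appended to a truncated base name, so the result never exceeds 31 characters. The function mangle_hashed and the "base_hash" layout are assumptions for illustration. The drawbacks are real: two different signatures can collide, and the generated name no longer tells a human what the argument types were.

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a over a NUL-terminated string. */
uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;        /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}

/* Hypothetical mangler: at most 22 characters of `base`, an underscore
   and 8 hex digits, so the result is never longer than 31 characters.
   Returns the length written, as reported by snprintf. */
int mangle_hashed(const char *base, const char *signature,
                  char *out, size_t out_size)
{
    return snprintf(out, out_size, "%.22s_%08lx",
                    base, (unsigned long)fnv1a32(signature));
}
```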

    Maybe I could handle all the translation units of a project at once, this way I could retain a name table. Or even have an external name table file in the source folder. Then I could add functionality to make names external to the "project", which the programmer would use to give actual C compatible names to the identifiers to "export". I could use just about anything for names internal to the project this way. Yeah, I'll figure it out.
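    A minimal sketch of such a name table, assuming the translator sees every translation unit of the project: each full name (of any length) is interned once and referred to in the generated C by a short, numbered identifier. The struct layout, the fixed sizes and the "n_<index>" scheme are all assumptions; writing the table to a file in the source folder is left out.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 64    /* arbitrary capacity for the sketch */
#define MAX_FULL  128   /* longest full name stored, including the NUL */

struct name_table {
    char full[MAX_NAMES][MAX_FULL];
    int count;
};

/* Returns the index of `full`, adding it if it is new.
   Returns -1 if the table is full or the name does not fit. */
int name_table_intern(struct name_table *t, const char *full)
{
    int i;
    for (i = 0; i < t->count; i++)
        if (strcmp(t->full[i], full) == 0)
            return i;
    if (t->count == MAX_NAMES || strlen(full) >= MAX_FULL)
        return -1;
    strcpy(t->full[t->count], full);
    return t->count++;
}

/* Writes the short C identifier for a table index, e.g. "n_1". */
int name_table_short(int index, char *out, size_t out_size)
{
    return snprintf(out, out_size, "n_%d", index);
}
```

    Names meant to be visible outside the project would bypass the table and keep whatever C-compatible name the programmer gives them.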

    Thanks for the help.

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > I want to create a language that is built on top of C: it will be compiled to C code and then use
    > the C compiler normally, so it would be a preprocessor on steroids.
    It's talk like this that wound up giving us C++
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    May 2010
    Posts
    120
    Don't worry, I'll be careful... MOM...

  8. #8
    Registered User
    Join Date
    Mar 2015
    Location
    BE
    Posts
    66
    I am curious: is that the way PHP was created? Is it built on top of C and some components of C++?

  9. #9
    Registered User
    Join Date
    May 2010
    Posts
    120
    Quote Originally Posted by Carnotter View Post
    I am curious: is that the way PHP was created? Is it built on top of C and some components of C++?
    That doesn't seem right. PHP is not a compiled language, it's an interpreter. The interpreter might be built in C, but PHP code is transformed into HTML. Maybe there are different implementations, but I highly doubt that is a thing.

  10. #10
    Programming Wraith GReaper's Avatar
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,738
    Well, I don't see why there's a need for long (as in non-standard) identifiers. For example, for simple overloading you could do:
    Code:
    // Input code
    void print(int);
    void print(int, int, int);
    void print(char, short, unsigned, float);
    // Output code
    void print_i(int);
    void print_iii(int, int, int);
    void print_csuf(char, short, unsigned, float);
    As you can see, there's no significant increase in the size of the identifiers.
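    The scheme above could be generated by something like the following sketch. The one-letter codes are taken from the example; the enum, the function name mangle and its signature are assumptions made for illustration.

```c
#include <string.h>

/* One-letter type codes following the example above. */
enum type_code {
    T_CHAR = 'c', T_SHORT = 's', T_INT = 'i', T_UNSIGNED = 'u', T_FLOAT = 'f'
};

/* Builds "<base>_<codes>" in `out`. Returns the length written,
   or -1 if the buffer is too small. */
int mangle(const char *base, const enum type_code *args, size_t nargs,
           char *out, size_t out_size)
{
    size_t len = strlen(base);
    size_t i;
    if (len + 1 + nargs + 1 > out_size)   /* base, '_', codes, NUL */
        return -1;
    memcpy(out, base, len);
    out[len++] = '_';
    for (i = 0; i < nargs; i++)
        out[len++] = (char)args[i];
    out[len] = '\0';
    return (int)len;
}
```

    Each built-in argument costs a single character, so a function with 20 arguments and a 10-character base name would still fit within 31 characters.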
    Devoted my life to programming...

  11. #11
    Registered User
    Join Date
    May 2010
    Posts
    120
    The problem comes when we start to use structures. In that case there is no option but to include the whole structure name in the identifier.
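    One way to embed a structure name without ambiguity is to prefix it with its length, as sketched below for a single structure argument. The function mangle_struct_arg and the "<base>_<length><name>" layout are assumptions for illustration. It shows why the identifier grows with the structure name, which is exactly the length problem being discussed.

```c
#include <stdio.h>
#include <string.h>

/* Builds "<base>_<length><struct_name>", e.g. "print_5Point".
   Returns the length written, or -1 if it does not fit in `out`. */
int mangle_struct_arg(const char *base, const char *struct_name,
                      char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s_%lu%s", base,
                     (unsigned long)strlen(struct_name), struct_name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

    The length prefix lets a reader (or a demangler) tell where the structure name ends, so it can sit next to one-letter codes for built-in types.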

  12. #12
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by shiroaisu View Post
    That doesn't seem right. PHP is not a compiled language, it's an interpreter. The interpreter might be built in C, but PHP code is transformed into HTML. Maybe there are different implementations, but I highly doubt that is a thing.
    I'm trying to look up "does PHP transform to HTML" but I'm not seeing anything about that. Do you have any sources on that? I'm curious what PHP would look like in HTML considering HTML isn't a programming language but is just a set of markup.

  13. #13
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MutantJohn
    I'm trying to look up "does PHP transform to HTML" but I'm not seeing anything about that. Do you have any sources on that?
    PHP code is not transformed into HTML as part of the stages of compilation/interpretation, but rather HTML is a typical (but not the only possible) output of a PHP script. I believe PHP is normally compiled to some kind of bytecode/opcode that is then interpreted, though unless you use a bytecode/opcode cache this compilation will happen repeatedly.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  14. #14
    Registered User MutantJohn's Avatar
    Join Date
    Feb 2013
    Posts
    2,665
    Quote Originally Posted by laserlight View Post
    PHP code is not transformed into HTML as part of the stages of compilation/interpretation, but rather HTML is a typical (but not the only possible) output of a PHP script. I believe PHP is normally compiled to some kind of bytecode/opcode that is then interpreted, though unless you use a bytecode/opcode cache this compilation will happen repeatedly.
    Right. I was gonna say, PHP actually does stuff, you know?

  15. #15
    Registered User MacNilly's Avatar
    Join Date
    Oct 2005
    Location
    CA, USA
    Posts
    466
    Quote Originally Posted by MutantJohn View Post
    I'm trying to look up "does PHP transform to HTML" but I'm not seeing anything about that. Do you have any sources on that? I'm curious what PHP would look like in HTML considering HTML isn't a programming language but is just a set of markup.
    I think he means the output of a PHP script is HTML. Obviously PHP cannot be transformed to HTML because HTML is not a programming language.
