Thread: String parser

  1. #1
    ---
    Join Date
    May 2004
    Posts
    1,379

    String parser

    I'm writing a code formatter to use on the board (similar to Lucky's but not C++). All is going well but I have a few problems which you will probably notice straight away.
    1. Only one keyword is formatted per line.
    This is because the keyword checks are a chain of if-else-if statements, so once one keyword matches, the rest are skipped.
    2. Keywords are being found inside other words (for example, the int in printf).
    3. Single-line comments currently only work when they are on a line by themselves, but I haven't really looked into that properly yet.
    Here is a sample of some formatted code:

    Code:
    #include <iostream>
    #include <cstdio>
    // This is a comment
    int main(int argc, char **argv){ // <- problem 1
      int numbers[20], total;
      for(int i = 0; i <= 19; i++){ 
        numbers[i] = i;
        total += i;
      }
    
      printf("This is a string\n");  // <- problem 2
      return 0;
    }
    Anyway, the most important one for me to fix right now is 1. Here is a sample of my code:
    Code:
    void parse_string(char *str, FILE **out){
      /* constants used for strings, comments and preprocessor parsing */
      const int ON   = 1;
      const int OFF  = 0;
      const int WAIT = 11;
    
      /* ptr to keyword in the string */
      char *ptr = NULL;
      int key_size = 0;
      int is_prepro = 0;
      int is_string = 0;
      int is_sline_com = 0;
      int i,j;
    
      for(i=0;i<strlen(str);i++){
        /* Check keywords */
        if(ptr = strstr(str,"char")){
          key_size = 4;
        }
        else if(ptr = strstr(str,"int")){
          key_size = 3;
        }
        else if(ptr = strstr(str,"return")){
          key_size = 6;
        }
        /* continues on... */
    
        /* skipping preprocessor, comment and string parsing code */
    
        /* If the address of str[i] is at the keyword detected... */    
        if(&str[i] == ptr){
          /* If single line comment is OFF */
          if(is_sline_com == OFF){
            /* Enter blue label */
            fprintf(*out,"<blue label here>");
            /* print the keyword */
            for(j=0;j<key_size;j++,i++){
              fputc(str[i],*out);
            }
            /* print end label */
            fprintf(*out,"<end label>");
          }
        }
        /* If no key, print the next character */
        fputc(str[i],*out);
      }
    }
    Last edited by sand_man; 08-04-2005 at 08:16 PM.

  2. #2
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    You need a while loop which operates on the remainder of what strstr returned.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  3. #3
    Registered User Mortissus's Avatar
    Join Date
    Dec 2004
    Location
    Brazil, Porto Alegre
    Posts
    152
    I believe this might solve your problem number 1.
    Code:
    #include <stdio.h>
    #include <string.h>
    #define KEY_TABLE_SIZE 3
    
    static char* keywords[KEY_TABLE_SIZE]={"char", "int", "return"};
    
    void parse_string(char *str, FILE **out){
      /* constants used for strings, comments and preprocessor parsing */
      const int ON   = 1;
      const int OFF  = 0;
      const int WAIT = 11;
    
      /* ptr to keyword in the string */
      char *ptr = NULL;
      char *next_ptr = NULL;
      int key_size = 0;
      int is_prepro = 0;
      int is_string = 0;
      int is_sline_com = 0;
      int i,j;
      int key_index = 0;
      int str_index = 0;
      int last_str_index = 0;
      int str_len = strlen(str);
    
      while(str_index < str_len){
        /* The smallest ptr for now is the last position '\0' */
        ptr = &str[str_len];
        /* Search for smallest ptr */
        for(key_index=0;key_index<KEY_TABLE_SIZE;key_index++){
          next_ptr = strstr(&str[str_index], keywords[key_index]);
          if(next_ptr != NULL){
            if(next_ptr < ptr){
              ptr = next_ptr;
              key_size = strlen(keywords[key_index]);
            }
          }
        }
    
        str_index = ptr-str;
    
        /* continues on... */
    
        /* skipping preprocessor, comment and string parsing code */
        printf("Keysize is %d\n", key_size);
        printf("Printing from %d to %d\n", last_str_index, str_index);
        /* print not keyword characters */
        for(j=last_str_index;j<str_index;j++){
          fputc(str[j],*out);
        }
        last_str_index = str_index + key_size;	
    
        /* if we found a keyword */
        if( str_index < str_len ){
          /* If single line comment is OFF */
          if(is_sline_com == OFF){
            /* Enter blue label */
            fprintf(*out,"<blue label here>");
            /* print the keyword */
            for(j=0;j<key_size;j++,str_index++){
              fputc(str[str_index],*out);
            }
            /* print end label */
            fprintf(*out,"<end label>");
          }
        }    
      }	
    }
    
    int main(int argc, char argv**)
    {
      char line[200];
      FILE* in = fopen("input.cpp", "r");
      FILE* out = fopen("out.cpp", "w+");
      while( fgets(line, 200, in) ){
        puts(line);
        parse_string(line, &out);
      }
    
      return 0;
    }
    I have tested the code and it seems to work fine. Note, however, that it is a brute-force algorithm and highly inefficient: every keyword is searched for again from the current position on each pass. Just for some geek fun, I could try to implement a better algorithm that I know. Anyway, for small source files this should be quick. If you have any problem with this code, let me know.
    Last edited by Mortissus; 08-04-2005 at 09:29 PM.

  4. #4
    Registered User Mortissus's Avatar
    Join Date
    Dec 2004
    Location
    Brazil, Porto Alegre
    Posts
    152
    Also, the second problem is easy to fix: check for whitespace immediately before and after the keyword you found. Besides, I am sure you know the strtok function, which might help you.

  5. #5
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    This is my half-assed and still broken version. It needs work. Who'll finish it for me?

    BTW, I use it like this:
    Code:
    H:\Projects\Toolkit\stdc\test>colorize < colorize.c
    #include <stdio.h>
    #include <string.h>
    
    struct sFormat
    {
       const char  *start;
       const char  *end;
    };
    
    struct sKeyword
    {
       const char            *value;
       const struct sFormat  *format;
    };
    
    static const struct sFormat Format[] =
    {
       { "",   ""}, /* keywords */
       { "",    ""}, /* functions */
       { "",  ""}, /* preprocessor */
       { "", NULL},           /* comment start */
       { NULL,                ""},     /* comment end */
       { "",    ""},     /* standard macros */
       { "",      ""},     /* typedefs */
    };
    
    static const struct sKeyword Keyword[] =
    {
       { "#include", &Format[2]},
       { "#define",  &Format[2]},
       { "#if",      &Format[2]},
       { "#endif",   &Format[2]},
       { "#else",    &Format[2]},
       { "defined",  &Format[2]},
       { "#error",   &Format[2]},
       { "#file",    &Format[2]},
       { "#line",    &Format[2]},
       { "##",       &Format[2]},
       { "#",        &Format[2]},
    
       { "main",     &Format[1]},
       { "sprintf",  &Format[1]},
       { "fprintf",  &Format[1]},
       { "printf",   &Format[1]},
       { "sscanf",   &Format[1]},
       { "fscanf",   &Format[1]},
       { "scanf",    &Format[1]},
       { "fgets",    &Format[1]},
       { "fgetc",    &Format[1]},
       { "getchar",  &Format[1]},
       { "putchar",  &Format[1]},
       { "fputs",    &Format[1]},
       { "fputc",    &Format[1]},
       { "malloc",   &Format[1]},
       { "calloc",   &Format[1]},
       { "realloc",  &Format[1]},
       { "free",     &Format[1]},
       { "atoi",     &Format[1]},
       { "strtol",   &Format[1]},
       { "strtoul",  &Format[1]},
       { "strtod",   &Format[1]},
       { "strlen",   &Format[1]},
       { "strcpy",   &Format[1]},
       { "strcmp",   &Format[1]},
       { "strcat",   &Format[1]},
       { "strncpy",  &Format[1]},
       { "strncmp",  &Format[1]},
       { "strncat",  &Format[1]},
       { "memcpy",   &Format[1]},
       { "memcmp",   &Format[1]},
       { "memset",   &Format[1]},
    
       { "auto",     &Format[0]},
       { "break",    &Format[0]},
       { "case",     &Format[0]},
       { "char",     &Format[0]},
       { "const",    &Format[0]},
       { "continue", &Format[0]},
       { "default",  &Format[0]},
       { "double",   &Format[0]},
       { "do",       &Format[0]},
       { "else",     &Format[0]},
       { "enum",     &Format[0]},
       { "extern",   &Format[0]},
       { "float",    &Format[0]},
       { "for",      &Format[0]},
       { "goto",     &Format[0]},
       { "if",       &Format[0]},
       { "int",      &Format[0]},
       { "long",     &Format[0]},
       { "register", &Format[0]},
       { "return",   &Format[0]},
       { "short",    &Format[0]},
       { "signed",   &Format[0]},
       { "sizeof",   &Format[0]},
       { "static",   &Format[0]},
       { "struct",   &Format[0]},
       { "switch",   &Format[0]},
       { "typedef",  &Format[0]},
       { "union",    &Format[0]},
       { "unsigned", &Format[0]},
       { "void",     &Format[0]},
       { "volatile", &Format[0]},
       { "while",    &Format[0]},
    
       { "/*",       &Format[3]},
       { "*/",       &Format[4]},
    
       { "EOF",      &Format[5]},
       { "NULL",     &Format[5]},
       { "BUFSIZ",   &Format[5]},
       { "stdin",    &Format[5]},
       { "stdout",   &Format[5]},
       { "stderr",   &Format[5]},
       { "size_t",   &Format[5]},
    
       { "0",        &Format[6]},
    };
    
    int main(void)
    {
       char *ch, buffer [ BUFSIZ ];
       while ( fgets(buffer, sizeof buffer, stdin) )
       {
          for ( ch = buffer; *ch; ++ch )
          {
             size_t i;
             for ( i = 0; i < sizeof Keyword / sizeof *Keyword; ++i )
             {
                size_t len = strlen(Keyword[i].value);
                if ( strncmp(ch, Keyword[i].value, len) == 0 )
                {
                   if ( Keyword[i].format->start )
                   {
                      fputs(Keyword[i].format->start, stdout);
                   }
                   fputs(Keyword[i].value, stdout);
                   if ( Keyword[i].format->end )
                   {
                      fputs(Keyword[i].format->end, stdout);
                   }
                   ch += len;
                   break;
                }
             }
             fputc(*ch, stdout);
          }
       }
       return 0;
    }
    As you can see, it too is far from complete.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  6. #6
    ---
    Join Date
    May 2004
    Posts
    1,379
    Yours looks a lot nicer than mine. Thanks for the ideas everyone. I'll take another look at it tomorrow.

  7. #7
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Code:
    int main(int argc, char argv**)
    Is that valid? Or is this meant:
    Code:
    int main(int argc, char **argv)

  8. #8
    Registered User Mortissus's Avatar
    Join Date
    Dec 2004
    Location
    Brazil, Porto Alegre
    Posts
    152
    Quote Originally Posted by dwks
    Code:
    int main(int argc, char argv**)
    Is that valid? Or is this meant:
    Code:
    int main(int argc, char **argv)
    Anh... sorry????

  9. #9
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    I liked this idea so much I wrote my own code formatter.

  10. #10
    ---
    Join Date
    May 2004
    Posts
    1,379
    Well, show us what you've got.

  11. #11
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Code:
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    /* what goes before and after everything else in the file */
    const char startend[2][100] = {
        "[code]\n",
        "[/code]\n"
    };
    
    /* what surrounds each keyword */
    const char surround[3][100] = {
        "[color=",   /* before keyword*/
        "]",         /* after colour */
        "[/color]",  /* after keyword */
    };
    
    /* the valid keywords with their colours */
    const char keyword[][100] = {
        "void", "blue",
        "char", "blue",
        "short", "blue",
        "int", "blue",
        "long", "blue",
        "float", "blue",
        "double", "blue",
        "unsigned", "blue",
        "signed", "blue",
        "sizeof", "blue",
        "const", "blue",
        "volatile", "blue",
        "while", "blue",
        "do", "blue",
        "for", "blue",
        "if", "blue",
        "else", "blue",
        "continue", "blue",
        "break", "blue",
        "return", "blue",
        "switch", "blue",
        "case", "blue",
        "default", "blue",
        "goto", "blue",
        "struct", "blue",
        "union", "blue",
        "typedef", "blue",
        "#define", "green",
        "#include", "green",
        "#ifdef", "green",
        "#ifndef", "green",
        "#if", "green",
        "#endif", "green",
        "#undef", "green",
        "#error", "green",
        "#line", "green",
        "#elif", "green",
        "#else", "green",
        ""
    };
    
    const char slcomment[][100] = { "//", "green" };
    const char mlcomment[][100] = { "/*", "*/", "green" };
    const char stringcol[100] = { "red" };
    const char numbercol[100] = { "black" };
    const char newline[100] = { "\n" };
    
    FILE *get_input_file(void);
    FILE *get_output_file(void);
    FILE *open_input_file(char *s);
    FILE *open_output_file(char *s);
    void parse_file(FILE *fp, FILE *out);
    void parse_line(char *s, FILE *out);
    int is_usage(char *s);
    int print_usage(void);
    void chomp(char *s);
    
    int main(int argc, char *argv[]) {
        char s[100];
        FILE *fp, *out = stdout;
    
        if(argc == 1) {
            if(!(fp=get_input_file())) return 1;
            if(!(out=get_output_file())) return 1;
        }
        else {
            if(is_usage(argv[1])) return print_usage();
            if(!(fp=open_input_file(argv[1]))) return 1;
            if(argc == 3 && !(out=open_output_file(argv[2]))) return 1;
        }
    
        parse_file(fp, out);
        fclose(fp);
    
        return 0;
    }
    
    FILE *get_input_file(void) {
        char s[100];
        FILE *fp;
    
        printf("\nEnter the name of the input file:\n");
        fgets(s, sizeof(s), stdin);
        chomp(s);
    
        if(!(fp=open_input_file(s))) return 0;
    
        return fp;
    }
    
    FILE *get_output_file(void) {
        char s[100];
        FILE *fp;
    
        printf("\nEnter the name of the output file (blank for screen):\n");
        fgets(s, sizeof(s), stdin);
        chomp(s);
    
        if(!*s) return stdout;
    
        if(!(fp=open_output_file(s))) return 0;
    
        return fp;
    }
    
    FILE *open_input_file(char *s) {
        FILE *fp;
    
        if((fp=fopen(s, "rt")) == NULL) {
            fprintf(stderr, "Can't open input file\n");
            return 0;
        }
    
        return fp;
    }
    
    FILE *open_output_file(char *s) {
        FILE *fp;
    
        if((fp=fopen(s, "rt")) != NULL) {
            fclose(fp);
            fprintf(stderr, "Output file already exists. Overwrite? (Y/N) ");
            if(tolower(getchar()) != 'y') return 0;
        }
    
        if((fp=fopen(s, "wt")) == NULL) {
            fprintf(stderr, "Can't open output file\n");
            return 0;
        }
    
        return fp;
    }
    
    void parse_file(FILE *fp, FILE *out) {
        char s[1000], *p;
        int x, y, hasnl;
    
        fputs(startend[0], out);
    
        while(!feof(fp)) {
            *s = 0;
            fgets(s, sizeof(s), fp);
            hasnl = (strchr(s, '\n')?1:0);
            chomp(s);
            if(!*s) {
                if(hasnl) fputs(newline, out);
                continue;
            }
    
            parse_line(s, out);
    
            if(hasnl) fputs(newline, out);
        }
    
        fputs(startend[1], out);
    }
    
    void parse_line(char *s, FILE *out) {
        int x, before = 1;
        char *p, *t;
        static enum {NORMAL, STRING, MULTILINE} mode = NORMAL;
        size_t len;
    
        while(*s) {
            if(mode == MULTILINE) {
                if(p=strstr(s, mlcomment[1])) {
                    for(t = s; t != p+2; fputc(*t++, out)) ;
                    fputs(surround[2], out);
                    s = t;
                    mode = NORMAL;
                    continue;
                }
                else {
                    fputs(s, out);
                    break;
                }
            }
            else if(mode == STRING) {
                if(*s && *(s+1)) {
                    p = s;
                    do {
                        p = strchr(p==s?p:p+1, '"');
                    } while(p && *(p-1) == '\\');
                }
                else break;
    
                if(!p) {
                    fputs(s, out);
                    break;
                }
    
                for(t = s; t != p+1; fputc(*t++, out)) ;
                fputs(surround[2], out);
                s = t;
                mode = NORMAL;
                continue;
            }
    
            if(*s == '"') {
                fputs(surround[0], out);
                fputs(stringcol, out);
                fputs(surround[1], out);
                fputc(*s++, out);
                mode = STRING;
                continue;
            }
    
            if(strncmp(s, mlcomment[0], 2) == 0) {
                fputs(surround[0], out);
                fputs(mlcomment[2], out);
                fputs(surround[1], out);
                mode = MULTILINE;
                continue;
            }
    
            for(x = 0; keyword[x*2][0]; x ++) {
                len = strlen(keyword[x*2]);
                if(strncmp(s, keyword[x*2], len) == 0
                    && before && (!*(s+len) || !isalnum(*(s+len)))) {
    
                    fputs(surround[0], out);
                    fputs(keyword[x*2+1], out);
                    fputs(surround[1], out);
                    fputs(keyword[x*2], out);
                    fputs(surround[2], out);
                    s += len;
                    break;
                }
            }
    
            if(keyword[x*2][0]) continue;
    
            if(strncmp(s, slcomment[0], 2) == 0) {
                fputs(surround[0], out);
                fputs(slcomment[1], out);
                fputs(surround[1], out);
                fputs(s, out);
                fputs(surround[2], out);
                break;
            }
    
            if(before && isdigit(*s)) {
                fputs(surround[0], out);
                fputs(numbercol, out);
                fputs(surround[1], out);
                while(isalnum(*s)) fputc(*s++, out);
                fputs(surround[2], out);
            }
    
            before = !isalnum(*s);
            fputc(*s++, out);
        }
    }
    
    int is_usage(char *s) {
        if(strcmp(s, "-?") == 0
            || strcmp(s, "-h") == 0
            || strcmp(s, "-v") == 0) return 1;
    
        return 0;
    }
    
    int print_usage(void) {
        printf("\nusage: codeform [infile [outfile]]\n");
        printf("\ncodeform formats your code in nice colours.\n");
    
        return 0;
    }
    
    void chomp(char *s) {
        while(*s && *s != '\n') s++;
    
        *s = 0;
    }
    Formatted with itself.

  12. #12
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    You know what I really wish more syntax highlighters had? Different colors for different levels of nested parentheses. e.g.:
    if(a || (b && c))

    Sometimes I hate counting parentheses to make sure I closed everything at the right place or to see if I have enough or too many. That example doesn't really make it look useful, but for more complex statements it would be a boon.
    If you understand what you're doing, you're not learning anything.

  13. #13
    ---
    Join Date
    May 2004
    Posts
    1,379
    I know what you mean; it is a pretty good idea.
    Dave's and dwks's code looks a lot cleaner than my own. Maybe I should restart. (not that I have even touched this since I posted it)

  14. #14
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Here is my version 1.03:

    Code:
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define VERSION 103
    
    #define STRINGLEN 100
    #define LINELEN 1000
    
    /* what goes before and after everything else in the file */
    const char startend[2][STRINGLEN] = {
        "[ c o d e ]\n",
        "[ / c o d e ]\n"
    };
    
    /* what surrounds each keyword */
    const char surround[3][STRINGLEN] = {
        "[color=",  /* before keyword*/
        "]",             /* after colour */
        "[/color]",      /* after keyword */
    };
    
    /* the valid keywords with their colours */
    const char keyword[][STRINGLEN] = {
        "void", "blue",
        "char", "blue",
        "short", "blue",
        "int", "blue",
        "long", "blue",
        "float", "blue",
        "double", "blue",
        "unsigned", "blue",
        "signed", "blue",
    
        "sizeof", "blue",
    
        "const", "blue",
        "volatile", "blue",
        "static", "blue",
        "auto", "blue",
    
        "while", "blue",
        "do", "blue",
        "for", "blue",
        "if", "blue",
        "else", "blue",
        "continue", "blue",
        "break", "blue",
    
        "return", "blue",
    
        "switch", "blue",
        "case", "blue",
        "default", "blue",
        "goto", "blue",
    
        "struct", "blue",
        "union", "blue",
        "enum", "blue",
        "typedef", "blue",
    
        ""
    };
    
    /* like keywords, except don't worry about these in the middle of a word */
    const char unkeyword[][STRINGLEN] = {
        "[", "pink",
        "]", "pink",
        ""
    };
    
    /* characters to search for (and replace with) */
    const char sreplace[][STRINGLEN] = {
        "{", "{",
        "}", "}",
        "#", "#",
        "[", "[",
        "]", "]",
        "{", "{",
        "}", "}",
        "[", "[",
        "]", "]",
        "#", "#",
        "\", "\\",
        "^", "^",
        "|", "|",
        "~", "~",
        "&lt;", "&lt;",
        "&gt;", "&gt;",
        "[ c o d e ]", "[ c o d e ]",
        "[ / c o d e ]", "[ / c o d e ]",
        ""
    };
    
    /* the valid single-line comments */
    const char slcomment[][STRINGLEN] = {
        "//", "green",
        "#", "green",
        ""
    };
    
    /* the multi-line comment[s] */
    const char mlcomment[][STRINGLEN] = {
        "/*", "*/", "green",
        ""
    };
    
    /* the strings (same as multiline-comments, but with backslash checking) */
    const char mlstring[100][STRINGLEN] = {
        "\"", "\"", "red",
        "'", "'", "red",
        ""
    };
    
    /* colour of numbers */
    const char numbercol[STRINGLEN] = {
        "darkblue"
    };
    
    /* what a newline is replaced with */
    const char newline[STRINGLEN] = {
        "\n"
    };
    
    enum mode_t {
        MODE_NORMAL,
        MODE_STRING,
        MODE_MULTILINE,
    };
    
    enum loop_t {
        LOOP_NORMAL,
        LOOP_CONTINUE,
        LOOP_BREAK
    };
    
    FILE *get_input_file(void);
    FILE *get_output_file(void);
    FILE *open_input_file(char *s);
    FILE *open_output_file(char *s);
    void parse_file(FILE *fp, FILE *out);
    void parse_line(char *s, FILE *out);
    enum loop_t mode_multiline(char **s, FILE *out, enum mode_t *mode, int whichml);
    enum loop_t mode_string(char **s, FILE *out, enum mode_t *mode, char *start, int whichml);
    enum loop_t find_keywords(char **s, FILE *out, int before);
    enum loop_t find_unkeywords(char **s, FILE *out);
    int is_usage(char *s);
    int print_usage(void);
    void chomp(char *s);
    
    int main(int argc, char *argv[]) {
        FILE *fp, *out = stdout;
    
        if(argc == 1) {
            if(!(fp=get_input_file())) return 1;
            if(!(out=get_output_file())) return 1;
        }
        else {
            if(is_usage(argv[1])) return print_usage();
    
            if(!(fp=open_input_file(argv[1]))) return 1;
    
            if(argc == 3 && !(out=open_output_file(argv[2]))) return 1;
        }
    
        parse_file(fp, out);
        fclose(fp);
    
        return 0;
    }
    
    FILE *get_input_file(void) {
        char s[STRINGLEN];
        FILE *fp;
    
        printf("\nEnter the name of the input file:\n");
        fgets(s, sizeof(s), stdin);
        chomp(s);
    
        if(!(fp=open_input_file(s))) return 0;
    
        return fp;
    }
    
    FILE *get_output_file(void) {
        char s[STRINGLEN];
        FILE *fp;
    
        printf("\nEnter the name of the output file (blank for screen):\n");
        fgets(s, sizeof(s), stdin);
        chomp(s);
    
        if(!*s) return stdout;
    
        if(!(fp=open_output_file(s))) return 0;
    
        return fp;
    }
    
    FILE *open_input_file(char *s) {
        FILE *fp;
    
        if((fp=fopen(s, "rt")) == NULL) {
            fprintf(stderr, "Can't open input file\n");
            return 0;
        }
    
        return fp;
    }
    
    FILE *open_output_file(char *s) {
        FILE *fp;
    
        if((fp=fopen(s, "rt")) != NULL) {
            fclose(fp);
            fprintf(stderr, "Output file already exists. Overwrite? (Y/N) ");
            if(tolower(getchar()) != 'y') return 0;
        }
    
        if((fp=fopen(s, "wt")) == NULL) {
            fprintf(stderr, "Can't open output file\n");
            return 0;
        }
    
        return fp;
    }
    
    void parse_file(FILE *fp, FILE *out) {
        char s[LINELEN], t[LINELEN], *p, *tp;
        int x, y, hasnl;
        size_t len;
    
        fputs(startend[0], out);
    
        while(!feof(fp)) {
            s[0] = 0;
            fgets(s, sizeof(s), fp);
            hasnl = (strchr(s, '\n')?1:0);
            chomp(s);
            if(!*s) {
                if(hasnl) fputs(newline, out);
                continue;
            }
    
            for(x = 0; sreplace[x*2][0]; x ++) {
                if(p=strstr(s, sreplace[x*2])) {
                    strcpy(t, p+strlen(sreplace[x*2]));
                    strncpy(p, sreplace[x*2+1], strlen(sreplace[x*2+1]));
    
                    for(y = p-s+strlen(sreplace[x*2+1]), tp = t; *tp; y ++) {
                        s[y] = *tp++;
                    }
    
                    s[y] = 0;
    
                    x = 0; continue;
                }
            }
    
            parse_line(s, out);
    
            if(hasnl) fputs(newline, out);
        }
    
        fputs(startend[1], out);
    }
    
    void parse_line(char *s, FILE *out) {
        int x;
        char *start = s;
        static enum mode_t mode = MODE_NORMAL;
        static int whichml, before = 1;
    
        while(*s) {
            if(mode == MODE_MULTILINE) {
                if(mode_multiline(&s, out, &mode, whichml) == LOOP_CONTINUE) continue;
                else break;
            }
            else if(mode == MODE_STRING) {
                if(mode_string(&s, out, &mode, start, whichml) == LOOP_CONTINUE) continue;
                else break;
            }
    
            for(x = 0; mlstring[x*3][0]; x ++) {
                if(strncmp(s, mlstring[x*3], strlen(mlstring[x*3])) == 0) {
                    fputs(surround[0], out);
                    fputs(mlstring[x*3+2], out);
                    fputs(surround[1], out);
                    fputc(*s++, out);
                    mode = MODE_STRING;
                    whichml = x;
                    break;
                }
            }
    
            if(mlstring[x*3][0]) continue;
    
            for(x = 0; mlcomment[x*3][0]; x ++) {
                if(strncmp(s, mlcomment[x*3], strlen(mlcomment[x*3])) == 0) {
                    fputs(surround[0], out);
                fputs(mlcomment[x*3+2], out);
                    fputs(surround[1], out);
                    mode = MODE_MULTILINE;
                    whichml = x;
                    break;
                }
            }
    
            if(mlcomment[x*3][0]) continue;
    
            if(find_unkeywords(&s, out) == LOOP_CONTINUE) continue;
            if(find_keywords(&s, out, before) == LOOP_CONTINUE) continue;
    
            for(x = 0; slcomment[x*2][0]; x ++) {
                if(strncmp(s, slcomment[x*2], strlen(slcomment[x*2])) == 0) {
                    fputs(surround[0], out);
                    fputs(slcomment[x*2+1], out);
                    fputs(surround[1], out);
                    fputs(s, out);
                    fputs(surround[2], out);
                    break;
                }
            }
    
            if(slcomment[x*2][0]) break;
    
            if(before && isdigit(*s)) {
                fputs(surround[0], out);
                fputs(numbercol, out);
                fputs(surround[1], out);
    
                while(isalnum(*s)) fputc(*s++, out);
    
                fputs(surround[2], out);
                continue;
            }
    
            before = !isalnum(*s);
            fputc(*s++, out);
        }
    }
    
    enum loop_t mode_multiline(char **s, FILE *out, enum mode_t *mode, int whichml) {
        char *p, *t;
    
        if(p=strstr(*s, mlcomment[whichml*3+1])) {
            for(t = *s; t != p+2; fputc(*t++, out)) ;
    
            fputs(surround[2], out);
            *s = t;
            *mode = MODE_NORMAL;
    
            return LOOP_CONTINUE;
        }
        else fputs(*s, out);
    
        return LOOP_BREAK;
    }
    
    enum loop_t mode_string(char **s, FILE *out, enum mode_t *mode, char *start, int whichml) {
        char *p, *t;
    
        if(**s) {
            p = *s;
            do {
                p = strstr(p==*s?p:p+1, mlstring[whichml*3+1]);
            } while(p && *(p-1) == '\\' && !(p == start ||
                (*(p-1) != '\\' || (p-1 != start && *(p-2) == '\\'))) );
        }
        else return LOOP_BREAK;
    
        if(!p) {
            fputs(*s, out);
            return LOOP_BREAK;
        }
    
        for(t = *s; t != p+1; fputc(*t++, out)) ;
    
        fputs(surround[2], out);
        *s = t;
        *mode = MODE_NORMAL;
    
        return LOOP_CONTINUE;
    }
    
    enum loop_t find_keywords(char **s, FILE *out, int before) {
        int x;
        size_t len;
    
        for(x = 0; keyword[x*2][0]; x ++) {
            len = strlen(keyword[x*2]);
    
            if(strncmp(*s, keyword[x*2], len) == 0
                && before && (!*(*s+len) || !isalnum(*(*s+len)))) {
    
                fputs(surround[0], out);
                fputs(keyword[x*2+1], out);
                fputs(surround[1], out);
                fputs(keyword[x*2], out);
                fputs(surround[2], out);
    
                *s += len;
                break;
            }
        }
    
        return keyword[x*2][0] ? LOOP_CONTINUE : LOOP_NORMAL;
    }
    
    enum loop_t find_unkeywords(char **s, FILE *out) {
        int x;
        size_t len;
    
        for(x = 0; unkeyword[x*2][0]; x ++) {
            len = strlen(unkeyword[x*2]);
    
            if(strncmp(*s, unkeyword[x*2], len) == 0) {
                fputs(surround[0], out);
                fputs(unkeyword[x*2+1], out);
                fputs(surround[1], out);
                fputs(unkeyword[x*2], out);
                fputs(surround[2], out);
    
                *s += len;
                break;
            }
        }
    
        return unkeyword[x*2][0] ? LOOP_CONTINUE : LOOP_NORMAL;
    }
    
    int is_usage(char *s) {
        if(strcmp(s, "-?") == 0
            || strcmp(s, "-h") == 0
            || strcmp(s, "-v") == 0
            || strcmp(s, "--help") == 0
            || strcmp(s, "--version") == 0) return 1;
    
        return 0;
    }
    
    int print_usage(void) {
        printf("\ncodeform v%i.%02i by DWK\n", VERSION/100, VERSION%100);
        printf("\nusage: codeform [infile [outfile]]\n");
        printf("\nIf no arguments are specified, you are prompted for both.");
        printf("\nIf no outfile is specified, the screen is used instead.\n");
        printf("\ncodeform formats your code in nice colours.\n");
    
        return 0;
    }
    
    void chomp(char *s) {
        while(*s && *s != '\n') s++;
    
        *s = 0;
    }
    It's not fully modularized yet. I can get it to produce HTML output just by changing the vars at the top.

    And of course, [ c o d e ] means without the spaces.
