Thread: General question about undefined behavior

  1. #1
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22

    General question about undefined behavior

    Does compiler based undefined behavior exist, under the following
    condition...

    If a C program compiles without errors or warnings (with a compiler that complies with the ANSI C standard) and executes without run-time errors, but exhibits strange behavior that a debugger such as gdb traces directly to the compiler's library, would this constitute compiler based undefined behavior?

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    What do you mean by "compiler based undefined behavior"? It sounds like you are just describing a case where there is a bug in a standard library implementation.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Actually it sounds like you have a bug in your code that manifests when a standard library function is called. Nothing wrong with your compiler. You can probably do a backtrace to find which of the functions you wrote is causing the crash. Hopefully the bug is near there.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  4. #4
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    The only broken library function I'm aware of is the ASCII (8-bit) version of _cgets() in Visual Studio (2005 or later, and perhaps earlier) (the unicode (16-bit) version _cgetws() works). This was reported to Microsoft years ago, but was never fixed (at least not in VS 2010). It works fine with Visual C / C++ 4.1 and older versions. I'm not sure when it was first broken.

    Normally undefined behavior refers to statements like j = j++ + j++; where j is modified more than once without an intervening sequence point.
    Last edited by rcgldr; 06-10-2013 at 11:51 PM.
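
    A minimal sketch of the point above, not taken from any poster's code: j = j++ + j++; is undefined because j is modified more than once with no sequence point in between, so the standard places no requirement on the result. The helper name add_twice_and_advance is hypothetical; it picks one plausible intended meaning and gives every read and write its own statement, which makes the outcome well defined.

```c
/* Undefined in C: j is modified twice with no intervening sequence point.
 *     j = j++ + j++;
 * The helper below (a hypothetical name) picks one plausible meaning and
 * states each step explicitly, so every conforming compiler must agree. */
int add_twice_and_advance(int *j)
{
    int first;
    int second;

    first = *j;   /* value of the first operand */
    *j += 1;      /* first increment, finished before the next read */
    second = *j;  /* value of the second operand */
    *j += 1;      /* second increment */

    return first + second;
}
```

    Starting from j = 3, this returns 3 + 4 = 7 and leaves j at 5.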

  5. #5
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    Let me rephrase my question...

    Probably what I mean by compiler based undefined behavior is actually what is called implementation defined behavior.

    Let's say that not every compiler implementation can recognize problems outside the scope of what the ANSI C standard defines as an error, and that not all forms of undefined behavior can be traced to their origin. What I am talking about is if you write some code that compiles fine but exhibits a behavior that has nothing to do with your code and it is not even undefined behavior, but is part of how the compiler decides at each part of the source code, to develop what becomes the final binary executable. What I am saying, is that there is a gray area of how compilers are designed which allow a behavior to occur. For example, a program exhibits a behavior that you think is undefined, but actually it's not undefined behavior, since that is how the compiler implementation decides (most likely every time in that circumstance) that your program is going to become a binary executable. This is what I call a compiler based undefined behavior, since all compilers are programs too.

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by King Mir
    Actually it sounds like you have a bug in your code that manifests when a standard library function is called.
    Heh, I interpreted it as a hypothetical question such that the program is known to be correct and kjwilliams just wants to know if the standard library bug constitutes "compiler based undefined behavior".

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by kjwilliams
    What I am talking about is if you write some code that compiles fine but exhibits a behavior that has nothing to do with your code and it is not even undefined behavior, but is part of how the compiler decides at each part of the source code, to develop what becomes the final binary executable. What I am saying, is that there is a gray area of how compilers are designed which allow a behavior to occur.
    Yes, that would indeed be implementation defined behaviour, and it is not undefined behaviour. Relying on such behaviour means a potential bug in your code, not in the compiler or standard library implementation.
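
    As a rough illustration of the distinction (a sketch, not anything from the original poster's program): implementation-defined behaviour is a choice the standard leaves to each implementation, which must make that choice and document it. The two helper names below are hypothetical; the properties they report, the signedness of plain char and the storage size of int, are standard examples of such choices.

```c
#include <limits.h>

/* Whether plain char is signed is implementation-defined: each compiler
 * picks one and documents it. Returns 1 if signed, 0 if unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The storage size of int is also implementation-defined. The standard
 * only guarantees that int can hold at least 16 bits' worth of range. */
int int_storage_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

    Code that assumes one particular answer from either function is relying on implementation-defined behaviour and may break on another compiler, which is different from invoking undefined behaviour.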

  8. #8
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    Well, the thing is that this is not a hypothetical question. I am writing a console program using DJGPP (a port of GCC) for MS-DOS. For the past month, even after upgrading the compiler to the most recent bug fixes, the behavior still occurs. Today I found a way around the problem, but I am still wondering what was causing it, since I can't tell what it is. My program is actually an interpreter: I am designing a programming language for MS-DOS, intended to be used from a bootable USB flash drive in PCs. My program uses non-standard C functions that are part of the DJGPP implementation. Well, I found what the undefined behavior was, which the compiler was allowing.

    In textmode(C4350) ( of conio.h ) .... using clrscr() ( of conio.h ) ..inhibits strtok() (of string.h ) from reaching the last token
    of a string ( in my programming language statement ) that is passed to it. So what I want to do is write a smaller program which will produce the same undefined behavior and post it on the usenet forum for DJGPP, to see what other posters say about it.
    I still have to do that...

    That's why I wanted to know in the first place if there is such a thing as compiler based undefined behavior or implementation defined behavior.

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by kjwilliams
    Well, I found what the undefined behavior was, which the compiler was allowing.
    Please don't use "undefined behaviour" as a synonym for "behaviour that I did not expect" or "behaviour that I consider to be a bug in a library that I am using".

    Quote Originally Posted by kjwilliams
    In textmode(C4350) ( of conio.h ) .... using clrscr() ( of conio.h ) ..inhibits strtok() (of string.h ) from reaching the last token
    of a string ( in my programming language statement ) that is passed to it.
    I note that strtok cannot be used to tokenise two strings at the same time (with interleaved usage, under normal circumstances), so if clrscr calls strtok for some reason, this could be a problem.
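
    To make the interleaving problem concrete, here is a sketch of a tokeniser that keeps its position in a caller-supplied pointer instead of in hidden static state. The name next_token and its interface are assumptions made for this illustration; it uses only strspn and strcspn from standard C, and it is not a claim about how DJGPP implements strtok.

```c
#include <stddef.h>
#include <string.h>

/* Returns the next token in *cursor, or NULL when none remain.
 * The scan position lives in *cursor, so two strings can be tokenised
 * in an interleaved fashion without disturbing each other. */
char *next_token(char **cursor, const char *delims)
{
    char *start;
    char *end;

    if (*cursor == NULL) {
        return NULL;
    }

    start = *cursor + strspn(*cursor, delims);  /* skip leading delimiters */
    if (*start == '\0') {
        *cursor = NULL;                         /* nothing left to scan */
        return NULL;
    }

    end = start + strcspn(start, delims);       /* find the end of the token */
    if (*end != '\0') {
        *end = '\0';                            /* terminate the token in place */
        *cursor = end + 1;
    } else {
        *cursor = end;                          /* reached the end of the string */
    }
    return start;
}
```

    Each string gets its own cursor variable, initialised to the start of a writable buffer, so a call made on behalf of one string cannot reset the scan of another.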

    Quote Originally Posted by kjwilliams
    So what I want to do is write a smaller program which will produce the same undefined behavior and post it on the usenet forum for DJGPP, to see what other posters say about it.
    I still have to do that...
    Good idea. Do that, but stop calling it "undefined behaviour", unless you're going to qualify it as "undefined behaviour with respect to the conio.h library documentation". After all, the contents of conio.h are undefined with respect to the standard, and that is what we refer to when we talk about "undefined behaviour".

  10. #10
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    Undefined behaviour <=> Expect the Unexpected.

  11. #11
    Registered User ledow's Avatar
    Join Date
    Dec 2011
    Posts
    435
    In general, using strtok while passing out to other functions is generally a bad idea. It basically uses a "global variable" to keep track of its state, so any other code using strtok will wipe out your usage of it in the meantime. Tokenize the string, get the information you want, then act on those tokens without using strtok any more.

    Additionally, conio.h is DOS-specific and nothing to do with C standards. It's "unexpected" behaviour you're experiencing.

    That said, there's no reason why clrscr should interfere with strtok (I can't think of a sensible reason that clrscr would call it, but it might well be trashing RAM that affects strtok) but if you aren't properly initialising the conio functions, then you're likely to run into problems.

    I would suggest you post the parts of your code that are having the problem. Don't "edit" them, don't show us something that doesn't error, show us the code that does something unexpected and your workaround. Chances are, almost certainly, that you weren't using strtok properly in the first place and that it's nothing to do with conio, clrscr or anything else.
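
    The "tokenize first, act afterwards" advice could look roughly like the sketch below. The function name split_tokens and its parameters are hypothetical; the idea is simply that every strtok call for one string happens inside a single loop, with no other library or user function called until all the token pointers have been collected.

```c
#include <stddef.h>
#include <string.h>

/* Splits text in place and stores up to max_parts token pointers in parts.
 * Returns the number of tokens stored. No other function runs between
 * the strtok calls, so nothing can disturb strtok's internal state. */
size_t split_tokens(char *text, const char *delims,
                    char *parts[], size_t max_parts)
{
    size_t count = 0;
    char *p;

    p = strtok(text, delims);
    while (p != NULL && count < max_parts) {
        parts[count++] = p;
        p = strtok(NULL, delims);
    }
    return count;
}
```

    After the call, the tokens can be printed, passed to clrscr-style screen code, or split again with a different delimiter, because strtok is no longer in the middle of a scan.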

    - Compiler warnings are like "Bridge Out Ahead" warnings. DON'T just ignore them.
    - A compiler error is something SO stupid that the compiler genuinely can't carry on with its job. A compiler warning is the compiler saying "Well, that's bloody stupid but if you WANT to ignore me..." and carrying on.
    - The best debugging tool in the world is a bunch of printf()'s for everything important around the bits you think might be wrong.

  12. #12
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    @ledow

    Code:
    short int string_parser(char *userstring, char *target, char magic, short int word)
    {
    
      static char subject[PARSE_SIZE_LIMIT]; subject[0] = '\0';
    
    
      char *w = &magic;//assign the address of magic to w
    
      char a = 0,b = 0,c = 0,x = 0,z = 0;//temp variables
    
      short int length = 0,orig = 0;// get the length of the string
      short int offset = 0,y = 1;//segment control
      char *p;//used for assigning pointer address returned by strtok()
    
    
      //** format subject array of 'natural' garbage with spaces & reset x **
      //for (x = 0;x < PARSE_SIZE_LIMIT; x++) { subject[x] = ' '; } x = 0;
    
    
      orig = strlen(userstring);//get size length target array
    
    
      //copy and reformat the userstring into, 0 - PARSE_SIZE_LIMIT ; format.
    
      //for lines larger or equal than the size limit....
      if (orig >= (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= (PARSE_SIZE_LIMIT - 1); x++)
        { subject[x] = userstring[x]; }
      }
    
      //for lines less than the size limit....
      if (orig < (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= orig; x++)
        { subject[x] = userstring[x]; }
      }
    
      subject[x+1] = '\0';//add terminating null character
    
      //if (word == -1) { printf("<%s> ",subject); }//temp - keep
    
      length = strlen(subject);//get string length of subject[];
    
      //count where all the "magic" characters are
      for(x = 0;x < length;x++)
      {
         //if the subject character = "magic" character then count it
         if(subject[x] == magic)
         {
           a = x;//set a to x , to match where x is currently at.
    
           // check for no characters between "magic" characters if x > 0
           // if x = 0 then do nothing....
           if(x > 0)
           {
             c = a - b;// c = the new count (of x) - the old count (of x)
    
            /*
    
             a magic character must have a difference between it's last
             position (if there is one) and its new position ( which is greater
             than 1 ) to be counted as magic character. Therefore, if "#" is a
             magic character and a string is, "##magic:" it would be discounted.
             However, a string such as,"# #magic:" would be counted as two
             magic characters
    
            */
    
             if(c > 1) { y++; } /* if c > 1 then add 1 to y */
           }
    
           b = a;//set b to a , to save the previous count.
         }
      }
    
      //if the original length of the userstring is (PARSE_SIZE_LIMIT - 1)
      //then subtract 1
      if (orig >= (PARSE_SIZE_LIMIT - 1)) { y -= 1; }
    
    
      //the value of word finalizes the work of string_parser
    
    
      //return the number parts in the string
      if(word == -1) { return y; }
    
      //return the modified user string
      if(word == 0) { strcpy(target,subject); }
    
    
      //** this is where strtok() is used **
    
    
      //copy the first word in the string (must be done to find the next words)
      if(word == 1)
      {
        p = strtok(subject,w);
        strcpy(target,p);
      }
    
      //copy all the next words (sequencially) in the string
      if(word > 1)
      {
        p = strtok(NULL,w);
        strcpy(target,p);
      }
    
      return 0;
    }

  13. #13
    Registered User
    Join Date
    May 2012
    Posts
    1,066
    Code:
      //for lines larger or equal than the size limit....
      if (orig >= (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= (PARSE_SIZE_LIMIT - 1); x++)
        { subject[x] = userstring[x]; }
      }
    
      //for lines less than the size limit....
      if (orig < (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= orig; x++)
        { subject[x] = userstring[x]; }
      }
    
      subject[x+1] = '\0';//add terminating null character
    You have a buffer overflow in both cases. Use a debugger to step through your loops to find out the value of "x" after both loops.
    You really need to be more careful if you use <= in for-loops.
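
    One way to avoid both overflows is to fold the two cases into a single bounded copy that always writes the terminator inside the buffer. This is only a sketch under the assumption that truncating over-long input is acceptable; copy_bounded is a hypothetical name, not part of the posted program.

```c
#include <stddef.h>

/* Copies at most dst_size - 1 characters from src into dst and always
 * terminates dst. Returns the number of characters copied. The loop uses
 * '<', so the index never goes past the last valid element. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t i;

    if (dst_size == 0) {
        return 0;               /* no room even for the terminator */
    }

    for (i = 0; i < dst_size - 1 && src[i] != '\0'; i++) {
        dst[i] = src[i];
    }
    dst[i] = '\0';              /* i is at most dst_size - 1 here */

    return i;
}
```

    With a buffer declared as char subject[PARSE_SIZE_LIMIT], a call such as copy_bounded(subject, sizeof subject, userstring) would replace both loops and the separate subject[x+1] = '\0' assignment.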

    Code:
      //** this is where strtok() is used **
    
      //copy the first word in the string (must be done to find the next words)
      if(word == 1)
      {
        p = strtok(subject,w);
        strcpy(target,p);
      }
    
      //copy all the next words (sequencially) in the string
      if(word > 1)
      {
        p = strtok(NULL,w);
        strcpy(target,p);
      }
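    When word > 1, strtok(NULL, w) only continues a scan that an earlier call with word == 1 started, so it returns garbage if that call never happened or if anything else has called strtok in between. It also returns NULL once there are no more tokens, and passing NULL to strcpy is undefined behaviour.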

    Furthermore, "w" is not a C-string but a pointer to a single character.
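
    A sketch of what the delimiter fix might look like, assuming the goal is to split on one character: strtok expects a null-terminated string of delimiters, so passing &magic lets it read past the single char into whatever happens to follow it in memory. That really is undefined behaviour, and it may be why unrelated changes to the program alter the result. The name first_token and its signature are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copies the first token of subject (split on the single character magic)
 * into target. Returns 0 on success, -1 if there is no token or no room. */
int first_token(char *subject, char magic, char *target, size_t target_size)
{
    char delims[2];
    char *p;

    delims[0] = magic;   /* build a real C string: one delimiter ... */
    delims[1] = '\0';    /* ... followed by the terminating null */

    p = strtok(subject, delims);
    if (p == NULL || target_size == 0) {
        return -1;       /* never pass NULL on to a string function */
    }

    strncpy(target, p, target_size - 1);
    target[target_size - 1] = '\0';
    return 0;
}
```

    The same two-element delims array can be passed to the follow-up strtok(NULL, delims) calls, again checking the result for NULL before copying.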

    Bye, Andreas

  14. #14
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    Quote Originally Posted by ledow View Post
    In general, using strtok while passing out to other functions is generally a bad idea. It basically uses a "global variable" to keep track of its state, so any other code using strtok will wipe out your usage of it in the meantime. Tokenize the string, get the information you want, then act on those tokens without using strtok any more.

    Additionally, conio.h is DOS-specific and nothing to do with C standards. It's "unexpected" behaviour you're experiencing.

    That said, there's no reason why clrscr should interfere with strtok (I can't think of a sensible reason that clrscr would call it, but it might well be trashing RAM that affects strtok) but if you aren't properly initialising the conio functions, then you're likely to run into problems.

    I would suggest you post the parts of your code that are having the problem. Don't "edit" them, don't show us something that doesn't error, show us the code that does something unexpected and your workaround. Chances are, almost certainly, that you weren't using strtok properly in the first place and that it's nothing to do with conio, clrscr or anything else.
    Well, before I integrate code into a larger program, I test it on its own. In this case, it is called from other functions like this:

    Code:
     
    
    #define PARSE_SIZE_LIMIT 81
    
    // in a function call 
    
    char x[PARSE_SIZE_LIMIT]; x[0] = '\0';
    char y[PARSE_SIZE_LIMIT]; y[0] = '\0';
    unsigned short int counta;
    
    //lets assume x now has some string of text, in the char array and were going to pass it to string_parser
    
    counta = string_parser(x,y,'#',-1);//find out how many '#' there are ... 
    
    //based on counta  you will have 0,1, or more parts .......
    If your text string were:

    #cabbage: red green #gofish:

    it would find two string parts:
    #cabbage: red green
    #gofish:

    Back to the code, to show how the string is parsed:
    Code:
    string_parser(x,y,'#',1);//word now = 1 to initialize the first token search
    string_parser(x,y,'#',2);//word now is greater than 1 to get the next token search
    I can do this again and again as long as there is another '#' left to find in the string; that's what counta is for.

    Let me post part 2 of this here to show you how it works.

  15. #15
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    Part 2 :

    @ledow, et al

    This is an ANSI C program that is portable, so you can compile it if you wish.

    Code:
    //re-testing parse_text for WATT
    
    #include <stdio.h>
    #include <string.h>
    
    // original design setting:
    #define PARSE_SIZE_LIMIT 81
    
    short int string_parser(char *, char *, char, short int);//TEXT_PARSER
    short int kfgets(char *);//KFGETS
    short int ynreturn(void);//yes/no option
    
    int main (void)
    {
       char target[PARSE_SIZE_LIMIT]; target[0] = '\0';
       char worda[PARSE_SIZE_LIMIT]; worda[0] = '\0';
       char uword[PARSE_SIZE_LIMIT]; uword[0] = '\0';
       short int count,countb,x,y;
    
       do
       {
    
        do
        {
          printf("\n#>");
          kfgets(target);
    
    
    
          count = string_parser(target,worda,'#',-1);
    
          if(count == 0) { printf("you must enter something using \'#\'\n"); }
    
        } while (count == 0);
    
        string_parser(target,worda,'#',0);
        printf("you entered : %s\n",worda);
    
    
        //count # parts
    
        printf("found %d parts by \'#\'\n\n",count);
    
        for(x = 1;x <= count;x++)
        {
          string_parser(worda,uword,'#',x);
    
          printf("part#%d : %s\n",x,uword);
        }
    
    
        //count spaced parts
    
        count = string_parser(worda,uword,' ',-1);
    
        printf("found %d parts by \' \'\n\n",count);
    
        for(x = 1;x <= count;x++)
        {
          string_parser(worda,uword,' ',x);
    
          printf("part#%d : %s\n",x,uword);
        }
    
        printf("\ntest again (y/n)?");
    
       } while((ynreturn()) == 1);
    
    
    
       return 0;
    }
    
    
    short int string_parser(char *userstring, char *target, char magic, short int word)
    {
      //subject must be static
      static char subject[PARSE_SIZE_LIMIT];//stores 80 chars +1 for the null character
    
      char *w = &magic;//assign the address of magic to w
    
      char a = 0,b = 0,c = 0,x = 0,z = 0;//temp variables
    
      short int length = 0,orig = 0;// get the length of the string
      short int offset = 0,y = 1;//segment control
      char *p;//used for assigning pointer address returned by strtok()
    
      //** format subject array of 'natural' garbage **
    
      for (x = 0;x < PARSE_SIZE_LIMIT; x++) { subject[x] = ' '; }
      x = 0;
    
      orig = strlen(userstring);//get size length target array
    
      //offset the copy of the userstring by 1 if the first character is not magic
    
      //if(userstring[0] != magic)
      //{
         //printf("offset detected\r\n");//temp
         //offset = 1;
         //subject[0] = magic;
      //}
    
      //copy and reformat the userstring into, 0 - PARSE_SIZE_LIMIT ; format.
    
      //for lines larger or equal than the size limit....
      if (orig >= (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= (PARSE_SIZE_LIMIT - 1); x++)
        { subject[x] = userstring[x]; }
      }
    
      //for lines less than the size limit....
      if (orig < (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= orig; x++)
        { subject[x] = userstring[x]; }
      }
    
      subject[x+1] = '\0';//add terminating null character
    
      //if (word == -1) { printf("<%s> ",subject); }//temp - keep
    
      length = strlen(subject);//get string length of subject[];
    
      //count where all the "magic" characters are
      for(x = 0;x < length;x++)
      {
         //if the subject character = "magic" character then count it
         if(subject[x] == magic)
         {
           a = x;//set a to x , to match where x is currently at.
    
           // check for no words between spaces if x > 0
           // if x = 0 then do nothing....
           if(x > 0)
           {
             c = a - b;// c = the new count (of x) - the old count (of x)
    
             //a magic character must have a difference between it's last
             //position (if there is one) and its new position greater
             //than 1 character to be counted as magic character.
             //Therefore, if "#" is a magic character and a string is,
             //"##magic:" it would be discounted. A string such "# #magic:"
             //would be counted as two magic characters
    
             if(c > 1) { y++; } /* if c > 1 then add 1 to y */
           }
    
           b = a;//set b to a , to save the previous count.
         }
      }
    
      //if the original length of the userstring is (PARSE_SIZE_LIMIT - 1)
      //then subtract 1
      if (orig >= (PARSE_SIZE_LIMIT - 1)) { y -= 1; }
    
      //return the number parts in the string
      if(word == -1)
      {
        //printf("= %d\r\n",y);//temp - keep
        return y;
      }
    
      //return the modified user string
      if(word == 0) { strcpy(target,subject); }
    Before I had DJGPP, I was using a really old (obsolete) compiler, Borland Turbo C++ for DOS v3.0, which used this code as an example of how to use strtok(); it works even under DJGPP.
    So, to continue with the rest of my function listing...

    Code:
      //copy the first word in the string (must be done to find the next words)
      if(word == 1)
      {
        p = strtok(subject,w);
        strcpy(target,p);
      }
    
      //copy all the next words (sequencially) in the string
      if(word > 1)
      {
        p = strtok(NULL,w);
        strcpy(target,p);
      }
    
      return 0;
    }
    
    //generic string prompt - size was eliminated for TC++
    short int kfgets(char *target)
    {
       short int a;//temp. variables
       char line[81];//temp string storage
    
       //user must provide prompt for whatever information wanted from the user
    
       //format the strings
       for(a = 0;a < 81; a++) { line[a] = '\0';}
    
       fgets(line , 81, stdin);
       a = strlen(line);
       line[a-1] = '\0';//get rid of the newline character added by fgets
    
       strcpy(target,line);// copy line to the target array
    
       return 0;
    }
    
    short int ynreturn(void)
    {
    
        short int result = 0;
        char x[3];
    
        while (fgets(x,3,stdin) != NULL && x[1] != '\n');
    
        switch(x[0])
        {
           case('Y'):
           case('y'):
           {
              result = 1;
              break;
           }
    
           case('N'):
           case('n'):
           {
              result = 0;
              break;
           }
    
           default:
           {
              result = 0;
              break;
           }
        }
    
        return result;
    }
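
    A side note on kfgets above, offered as a sketch: line[a-1] = '\0' writes before the start of the array when fgets stores nothing (a == 0), and it deletes a real character when the input line has no trailing newline. The hypothetical helper strip_newline below uses strcspn from standard C to find the newline, if there is one, and is safe in both situations.

```c
#include <string.h>

/* Removes the first newline in line, if any. strcspn returns the index of
 * the newline, or the index of the terminating null when no newline is
 * present, so the assignment always lands inside the string. */
void strip_newline(char *line)
{
    line[strcspn(line, "\n")] = '\0';
}
```

    In kfgets this would replace the two lines that compute a = strlen(line) and assign line[a-1].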
    so what I am trying to do in my bigger program is break up strings by '#' and then by ' '.....

    so in my big program a text line such as :

    #clear:

    ...causes

    #newline: 1

    ...to be parsed as :

    #newline:

    using the '#' as the token
    before its parsed with ' ' as the token

    The reason that I started this post as undefined behavior is that DJGPP was compiling my big program to allow parsing of the text line

    #newline: 1

    as

    #newline: 1

    Other compilations via DJGPP would show the previous behavior, even though the only change I made was commenting out cprintf statements that showed me the values involved during the process, as a means of debugging my larger program.

    So the way my bigger program works is that, in a text file, #newline: 1 is parsed as #newline: 1 if it comes before #clear:, which is parsed as #clear using '#' as the token. If the two lines are switched around, the problem occurs.

    In the end, my bigger program compiles fine no matter what, which makes it troublesome to find what IS causing the undefined behavior.

    I'm going to post my bigger program's source code on SourceForge as soon as I finish understanding git. I actually have a functioning program that I saved, compiled with the Borland compiler, that works.
