Thread: General question about undefined behavior

  1. #1
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22

    General question about undefined behavior

    Does compiler-based undefined behavior exist under the following
    condition?

    Suppose a program compiles with a C compiler (compliant with the ANSI C standard) without errors or warnings and executes without run-time errors, but exhibits strange behavior that, when diagnosed with a tool such as gdb, points directly to a fault in the compiler's library. Would this count as compiler-based undefined behavior?

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    What do you mean by "compiler based undefined behavior"? It sounds like you are just describing a case where there is a bug in a standard library implementation.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Actually it sounds like you have a bug in your code that manifests when a standard library function is called. Nothing wrong with your compiler. You can probably do a backtrace to find what function that you wrote is causing the crash. Hopefully the bug is near there.
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  4. #4
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    The only broken library function I'm aware of is the ASCII (8-bit) version of _cgets() in Visual Studio (2005 or later, and perhaps earlier) (the unicode (16-bit) version _cgetws() works). This was reported to Microsoft years ago, but was never fixed (at least not in VS 2010). It works fine with Visual C / C++ 4.1 and older versions. I'm not sure when it was first broken.

    Normally undefined behavior refers to statements like j = j++ + j++; where j is modified more than once with no sequence point in between, so the standard imposes no requirement on the result.
    Last edited by rcgldr; 06-10-2013 at 11:51 PM.
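
    A minimal sketch of the point about j = j++ + j++, assuming the intent was "add the two successive values of j and store the sum back". The helper name add_twice_then_store is hypothetical. It only shows that the behaviour becomes defined once each modification sits in its own statement; the original expression has no single meaning that this could be called equivalent to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one possible well-defined reading of the intent
   behind "j = j++ + j++". Every change to *j is a separate statement,
   so each modification is separated by a sequence point. */
static int add_twice_then_store(int *j)
{
    int first;
    int second;

    assert(j != NULL);
    first = *j;            /* value seen by the first operand */
    *j += 1;
    second = *j;           /* value seen by the second operand */
    *j += 1;
    *j = first + second;   /* the final store happens last, unambiguously */
    return *j;
}
```

    Starting from j equal to 3, this version always stores 3 + 4. A compiler handed the one-line original is free to do anything at all.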

  5. #5
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by King Mir
    Actually it sounds like you have a bug in your code that manifests when a standard library function is called.
    Heh, I interpreted it as a hypothetical question such that the program is known to be correct and kjwilliams just wants to know if the standard library bug constitutes "compiler based undefined behavior".

  6. #6
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    Let me rephrase my question...

    Probably what I mean by compiler-based undefined behavior is actually what is called implementation-defined behavior.

    Let's say that not all compiler implementations can recognize problems outside the scope of what the ANSI C standard defines as an error, and that not all forms of undefined behavior can be traced to their origin. What I am talking about is code that compiles fine but exhibits behavior that has nothing to do with your code and is not even undefined behavior, but is part of how the compiler decides, at each part of the source code, what becomes the final binary executable. What I am saying is that there is a gray area in how compilers are designed that allows such behavior to occur. For example, a program exhibits behavior that you think is undefined, but it is actually not undefined behavior, since that is how the compiler implementation decides (most likely every time in that circumstance) to turn your program into a binary executable. This is what I call compiler-based undefined behavior, since all compilers are programs too.

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by kjwilliams
    What I am talking about is code that compiles fine but exhibits behavior that has nothing to do with your code and is not even undefined behavior, but is part of how the compiler decides, at each part of the source code, what becomes the final binary executable. What I am saying is that there is a gray area in how compilers are designed that allows such behavior to occur.
    Yes, that would indeed be implementation defined behaviour, and it is not undefined behaviour. Relying on such behaviour means a potential bug in your code, not in the compiler or standard library implementation.
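
    As a rough illustration of implementation-defined behaviour: the standard lets each implementation choose whether plain char is signed, and <limits.h> reports the choice. The helper names below are hypothetical; the sketch only shows how code can avoid depending on that choice.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   The standard leaves the choice to the implementation, which must
   document it; CHAR_MIN reflects whichever choice was made. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: turn a char into a non-negative table index
   regardless of which choice the implementation made. */
static unsigned int char_to_index(char c)
{
    unsigned int index = (unsigned char)c;

    assert(index <= UCHAR_MAX);   /* always within 0..UCHAR_MAX */
    return index;
}
```

    Code that indexes a table with a raw char works on one compiler and breaks on another; going through unsigned char removes the dependency.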

  8. #8
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    Well, the thing is that this is not a hypothetical question. I am writing a console program using DJGPP (a port of GCC) for MS-DOS. For the past month, even after upgrading the compiler to pick up the most recent bug fixes, the behavior still occurs. Today I found a way around the problem, but I am still wondering what was causing it, since I can't tell what it is. My program is actually an interpreter: I am designing a programming language for MS-DOS, intended to be run from a bootable USB flash drive on PCs. My program uses non-standard C functions that are part of the DJGPP implementation. Well, I found what the undefined behavior was, which the compiler was allowing.

    In textmode(C4350) (of conio.h), using clrscr() (of conio.h) inhibits strtok() (of string.h) from reaching the last token of a string (in my programming language statement) that is passed to it. So what I want to do is write a smaller program which will produce the same undefined behavior and post it on the Usenet forum for DJGPP, to see what other posters say about it. I still have to do that...

    That's why I wanted to know in the first place if there is such a thing as compiler-based undefined behavior or implementation-defined behavior.

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by kjwilliams
    Well, I found what the undefined behavior was, which the compiler was allowing.
    Please don't use "undefined behaviour" as a synonym for "behaviour that I did not expect" or "behaviour that I consider to be a bug in a library that I am using".

    Quote Originally Posted by kjwilliams
    In textmode(C4350) (of conio.h), using clrscr() (of conio.h) inhibits strtok() (of string.h) from reaching the last token of a string (in my programming language statement) that is passed to it.
    I note that strtok cannot be used to tokenise two strings at the same time (with interleaved usage, under normal circumstances), so if clrscr calls strtok for some reason, this could be a problem.

    Quote Originally Posted by kjwilliams
    So what I want to do is write a smaller program which will produce the same undefined behavior and post it on the Usenet forum for DJGPP, to see what other posters say about it. I still have to do that...
    Good idea. Do that, but stop calling it "undefined behaviour", unless you're going to qualify it as "undefined behaviour with respect to the conio.h library documentation". After all, the contents of conio.h are undefined with respect to the standard, and the standard is what we refer to when we talk about "undefined behaviour".

  10. #10
    Registered User ledow's Avatar
    Join Date
    Dec 2011
    Posts
    435
    In general, using strtok while calling out to other functions is a bad idea. It basically uses a "global variable" to keep track of its state, so any other code using strtok will wipe out your usage of it in the meantime. Tokenize the string, get the information you want, then act on those tokens without using strtok any more.
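
    One way around the shared state, sketched with standard C only: keep the scan position in a variable the caller owns. The name next_token and its interface are made up for illustration; they are not part of DJGPP or the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reentrant tokenizer built from strspn/strcspn.
   *cursor holds the scan position, so every caller owns its own state.
   Like strtok, it writes '\0' into the buffer being scanned. */
static char *next_token(char **cursor, const char *delims)
{
    char *start;
    char *end;

    assert(cursor != NULL && delims != NULL);
    if (*cursor == NULL) {
        return NULL;                              /* already exhausted */
    }
    start = *cursor + strspn(*cursor, delims);    /* skip leading delimiters */
    if (*start == '\0') {
        *cursor = NULL;                           /* nothing left */
        return NULL;
    }
    end = start + strcspn(start, delims);         /* find the end of the token */
    if (*end != '\0') {
        *end = '\0';                              /* terminate the token */
        *cursor = end + 1;                        /* resume after the delimiter */
    } else {
        *cursor = NULL;                           /* this was the last token */
    }
    return start;
}
```

    With one cursor per string, two strings can be tokenized in interleaved fashion, and nothing another function does with strtok can disturb either of them.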

    Additionally, conio.h is DOS-specific and nothing to do with C standards. It's "unexpected" behaviour you're experiencing.

    That said, there's no reason why clrscr should interfere with strtok (I can't think of a sensible reason that clrscr would call it, but it might well be trashing RAM that affects strtok) but if you aren't properly initialising the conio functions, then you're likely to run into problems.

    I would suggest you post the parts of your code that are having the problem. Don't "edit" them, don't show us something that doesn't error, show us the code that does something unexpected and your workaround. Chances are, almost certainly, that you weren't using strtok properly in the first place and that it's nothing to do with conio, clrscr or anything else.

    - Compiler warnings are like "Bridge Out Ahead" warnings. DON'T just ignore them.
    - A compiler error is something SO stupid that the compiler genuinely can't carry on with its job. A compiler warning is the compiler saying "Well, that's bloody stupid but if you WANT to ignore me..." and carrying on.
    - The best debugging tool in the world is a bunch of printf()'s for everything important around the bits you think might be wrong.

  11. #11
    Registered User
    Join Date
    Sep 2011
    Location
    Athens , Greece
    Posts
    357
    Undefined behaviour <=> Expect the Unexpected.

  12. #12
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    @ledow

    Code:
    short int string_parser(char *userstring, char *target, char magic, short int word)
    {
    
      static char subject[PARSE_SIZE_LIMIT]; subject[0] = '\0';
    
    
      char *w = &magic;//assign the address of magic to w
    
      char a = 0,b = 0,c = 0,x = 0,z = 0;//temp variables
    
      short int length = 0,orig = 0;// get the length of the string
      short int offset = 0,y = 1;//segment control
      char *p;//used for assigning pointer address returned by strtok()
    
    
      //** format subject array of 'natural' garbage with spaces & reset x **
      //for (x = 0;x < PARSE_SIZE_LIMIT; x++) { subject[x] = ' '; } x = 0;
    
    
      orig = strlen(userstring);//get size length target array
    
    
      //copy and reformat the userstring into, 0 - PARSE_SIZE_LIMIT ; format.
    
      //for lines larger or equal than the size limit....
      if (orig >= (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= (PARSE_SIZE_LIMIT - 1); x++)
        { subject[x] = userstring[x]; }
      }
    
      //for lines less than the size limit....
      if (orig < (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= orig; x++)
        { subject[x] = userstring[x]; }
      }
    
      subject[x+1] = '\0';//add terminating null character
    
      //if (word == -1) { printf("<%s> ",subject); }//temp - keep
    
      length = strlen(subject);//get string length of subject[];
    
      //count where all the "magic" characters are
      for(x = 0;x < length;x++)
      {
         //if the subject character = "magic" character then count it
         if(subject[x] == magic)
         {
           a = x;//set a to x , to match where x is currently at.
    
           // check for no characters between "magic" characters if x > 0
           // if x = 0 then do nothing....
           if(x > 0)
           {
             c = a - b;// c = the new count (of x) - the old count (of x)
    
            /*
    
             a magic character must have a difference between it's last
             position (if there is one) and its new position ( which is greater
             than 1 ) to be counted as magic character. Therefore, if "#" is a
             magic character and a string is, "##magic:" it would be discounted.
             However, a string such as,"# #magic:" would be counted as two
             magic characters
    
            */
    
             if(c > 1) { y++; } /* if c > 1 then add 1 to y */
           }
    
           b = a;//set b to a , to save the previous count.
         }
      }
    
      //if the original length of the userstring is (PARSE_SIZE_LIMIT - 1)
      //then subtract 1
      if (orig >= (PARSE_SIZE_LIMIT - 1)) { y -= 1; }
    
    
      //the value of word finalizes the work of string_parser
    
    
      //return the number parts in the string
      if(word == -1) { return y; }
    
      //return the modified user string
      if(word == 0) { strcpy(target,subject); }
    
    
      //** this is where strtok() is used **
    
    
      //copy the first word in the string (must be done to find the next words)
      if(word == 1)
      {
        p = strtok(subject,w);
        strcpy(target,p);
      }
    
      //copy all the next words (sequencially) in the string
      if(word > 1)
      {
        p = strtok(NULL,w);
        strcpy(target,p);
      }
    
      return 0;
    }

  13. #13
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Hmm... I notice:
    Code:
      //copy all the next words (sequencially) in the string
      if(word > 1)
      {
        p = strtok(NULL,w);
        strcpy(target,p);
      }
    shouldn't this be or contain a while loop that terminates when strtok returns a null pointer?
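
    For reference, a minimal sketch of the usual strtok loop. The helper split_tokens and its parameters are hypothetical; the point is only that the first call passes the buffer, later calls pass NULL, and the loop stops on a null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split buf in place and store up to max_tokens
   token pointers in out. Returns the number of tokens stored. */
static size_t split_tokens(char *buf, const char *delims,
                           char **out, size_t max_tokens)
{
    size_t count = 0;
    char *p;

    assert(buf != NULL && delims != NULL && out != NULL);
    /* The loop ends when strtok returns a null pointer or out is full,
       so a null pointer is never stored or copied from. */
    for (p = strtok(buf, delims);
         p != NULL && count < max_tokens;
         p = strtok(NULL, delims)) {
        out[count++] = p;
    }
    return count;
}
```

    All tokens are collected in one pass, so the caller can pick word N out of the array afterwards instead of calling back into strtok later.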

  14. #14
    Registered User
    Join Date
    May 2012
    Posts
    1,066
    Code:
      //for lines larger or equal than the size limit....
      if (orig >= (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= (PARSE_SIZE_LIMIT - 1); x++)
        { subject[x] = userstring[x]; }
      }
    
      //for lines less than the size limit....
      if (orig < (PARSE_SIZE_LIMIT - 1))
      {
        for (x = 0; x <= orig; x++)
        { subject[x] = userstring[x]; }
      }
    
      subject[x+1] = '\0';//add terminating null character
    You have a buffer overflow in both cases. Use a debugger to step through your loops to find out the value of "x" after both loops.
    You really need to be more careful if you use <= in for-loops.
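
    To spell it out: the first loop exits with x equal to PARSE_SIZE_LIMIT, so subject[x+1] writes past the end of the array. The second loop exits with x equal to orig + 1, so the write lands at orig + 2, which is past the end when orig is PARSE_SIZE_LIMIT - 2. A possible replacement for both loops is sketched below; copy_truncated is a hypothetical name, not something from your code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for the two copy loops: copy at most
   dst_size - 1 characters and always terminate inside the buffer.
   Returns the number of characters copied, excluding the terminator. */
static size_t copy_truncated(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL && dst_size > 0);
    n = strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;      /* truncate to what fits */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';             /* n <= dst_size - 1, so always in bounds */
    return n;
}
```

    In string_parser this would be called as copy_truncated(subject, PARSE_SIZE_LIMIT, userstring), and the return value can replace the later strlen(subject).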

    Code:
      //** this is where strtok() is used **
    
      //copy the first word in the string (must be done to find the next words)
      if(word == 1)
      {
        p = strtok(subject,w);
        strcpy(target,p);
      }
    
      //copy all the next words (sequencially) in the string
      if(word > 1)
      {
        p = strtok(NULL,w);
        strcpy(target,p);
      }
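    When word > 1, strtok(NULL, w) depends entirely on state left behind by an earlier call with word == 1, because that is the only place you give it a string to tokenize. If that call never happened, or anything else has used strtok since, it returns garbage or NULL, and strcpy is then handed whatever came back.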

    Furthermore, "w" is not a C-string but a pointer to a single character.
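
    strtok reads the delimiter argument until it finds a terminating '\0', so passing the address of a lone char makes it read whatever happens to follow magic in memory. A minimal sketch of the fix, with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the first token of buf delimited by the
   single character magic, or NULL if there is none. The character is
   copied into a two-element array so that strtok receives a properly
   null-terminated delimiter string instead of &magic. */
static char *first_token(char *buf, char magic)
{
    char delims[2];

    assert(buf != NULL);
    delims[0] = magic;
    delims[1] = '\0';
    return strtok(buf, delims);
}
```

    The same two-element array can be passed to the follow-up strtok(NULL, ...) calls.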

    Bye, Andreas

  15. #15
    Registered User ledow's Avatar
    Join Date
    Dec 2011
    Posts
    435
    Gosh, so when I said:

    "Chances are, almost certainly, that you weren't using strtok properly in the first place and that it's nothing to do with conio, clrscr or anything else."

    we find two buffer overflows, a strtok that doesn't get initialised, and a strtok that is used on the assumption that there are always X amount of words in it.

    Not to be too mean, but if you'd just posted the code first-off, we could have saved a lot of time. And you were at the point of trying to blame DJGPP etc. rather than spot them?

    P.S. "array subscript 'x' has type char" (GCC will "warn if an array subscript has type char. This is a common cause of error, as programmers often forget that this type is signed on some machines. This warning is enabled by -Wall.").
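
    A small sketch of the usual fix for that warning: use size_t for anything that indexes an array. The helper count_char is hypothetical and simply stands in for the "count the magic characters" loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count occurrences of ch in the first len bytes
   of text. The index is a size_t, so it can never be negative and does
   not run out of range at 127 the way a signed char index does. */
static size_t count_char(const char *text, size_t len, char ch)
{
    size_t i;
    size_t hits = 0;

    assert(text != NULL);
    for (i = 0; i < len; i++) {
        if (text[i] == ch) {
            hits++;
        }
    }
    return hits;
}
```

    With a char index, any PARSE_SIZE_LIMIT above 127 would already be a problem on machines where char is signed.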

