General question about undefined behavior

**kjwilliams** · 06-10-2013

Does compiler based undefined behavior exist, under the following condition...

If a program compiles with a C compiler (compliant with the ANSI C standard) without error or warning and executes without run-time errors, but exhibits strange behavior that, when diagnosed with a tool such as gdb, points directly to the compiler's library, would this constitute compiler based undefined behavior?

**laserlight** · 06-10-2013

What do you mean by "compiler based undefined behavior"? It sounds like you are just describing a case where there is a bug in a standard library implementation.

**King Mir** · 06-10-2013

Actually it sounds like you have a bug in your code that manifests when a standard library function is called. Nothing wrong with your compiler. You can probably do a backtrace to find what function that you wrote is causing the crash. Hopefully the bug is near there.

**rcgldr** · 06-10-2013

The only broken library function I'm aware of is the ASCII (8-bit) version, _cgets(), in Visual Studio (2005 or later, and perhaps earlier); the Unicode (16-bit) version, _cgetws(), works. This was reported to Microsoft years ago, but was never fixed (at least not in VS 2010). It works fine with Visual C / C++ 4.1 and older versions. I'm not sure when it was first broken.

Normally undefined behavior refers to statements like j = j++ + j++; where j is modified more than once between sequence points, so the standard places no requirement on the result.
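To make that concrete, here is a minimal sketch (the helper name `sum_then_increment_twice` is made up for illustration). An expression such as `j = j++ + j++` modifies `j` more than once without an intervening sequence point, so nothing can be said about its result. Splitting the work into separate statements gives one well-defined reading, because each statement ends at a sequence point:

```c
/* Hypothetical helper: one well-defined reading of "j++ + j++".
 * Every statement ends at a sequence point, so each read and
 * write of *j is fully ordered. */
int sum_then_increment_twice(int *j)
{
    int first, second;

    first = *j;    /* value the "first" j++ would yield  */
    (*j)++;
    second = *j;   /* value the "second" j++ would yield */
    (*j)++;
    return first + second;
}
```

Starting from `j == 1`, this returns 3 and leaves `j == 3`. Whether a given compiler produces the same numbers for the one-line version is not something the standard promises.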

**kjwilliams** · 06-11-2013

Let me rephrase my question...

Probably what I mean by compiler based undefined behavior is actually what is called implementation defined behavior.

Let's say that not all compiler implementations can recognize problems outside the scope of what the ANSI C standard defines as an error, and that not all forms of undefined behavior can be traced to their origin. What I am talking about is if you write some code that compiles fine but exhibits a behavior that has nothing to do with your code and it is not even undefined behavior, but is part of how the compiler decides at each part of the source code, to develop what becomes the final binary executable. What I am saying, is that there is a gray area of how compilers are designed which allow a behavior to occur. For example, a program exhibits a behavior that you think is undefined, but actually it's not undefined behavior, since that's the way the compiler implementation decides (most likely every time in that circumstance) how your program is going to become a binary executable. This is what I call a compiler based undefined behavior, since all compilers are programs too.

**laserlight** · 06-11-2013

Originally Posted by King Mir

Actually it sounds like you have a bug in your code that manifests when a standard library function is called.

Heh, I interpreted it as a hypothetical question such that the program is known to be correct and kjwilliams just wants to know if the standard library bug constitutes "compiler based undefined behavior".

**laserlight** · 06-11-2013

Originally Posted by kjwilliams

What I am talking about is if you write some code that compiles fine but exhibits a behavior that has nothing to do with your code and it is not even undefined behavior, but is part of how the compiler decides at each part of the source code, to develop what becomes the final binary executable. What I am saying, is that there is a gray area of how compilers are designed which allow a behavior to occur.

Yes, that would indeed be implementation defined behaviour, and it is not undefined behaviour. Relying on such behaviour means a potential bug in your code, not in the compiler or standard library implementation.
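For contrast with undefined behaviour, here is a small sketch of something the standard genuinely leaves to the implementation: whether plain `char` is signed. The helper name `plain_char_is_signed` is invented for illustration; `CHAR_MIN` comes from `<limits.h>`.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
 * The standard lets the implementation choose, and CHAR_MIN in
 * <limits.h> reflects that choice. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

A program that only works when this returns 1 (or only when it returns 0) is relying on implementation-defined behaviour, and that reliance is where the potential bug sits.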

**kjwilliams** · 06-11-2013

Well, the thing is that this is not a hypothetical question. I am writing a console program using DJGPP (a port of GCC) for MS-DOS. For the past month, even after upgrading the compiler to pick up the most recent bug fixes, the behavior still occurs. Today I found a way around the problem, but I am still wondering what was causing it, since I can't tell what it is. My program is actually an interpreter: I am designing a programming language for MS-DOS, which is intended to be used from a bootable USB flash drive in PCs. My program uses non-standard C functions that are part of the DJGPP implementation. Well, I found what the undefined behavior was, which the compiler was allowing.

In textmode(C4350) ( of conio.h ) .... using clrscr() ( of conio.h ) ..inhibits strtok() (of string.h ) from reaching the last token
of a string ( in my programming language statement ) that is passed to it. So what I want to do is write a smaller program which will produce the same undefined behavior and post it on the usenet forum for DJGPP, to see what other posters say about it.
I still have to do that...

That's why I wanted to know in the first place whether there is such a thing as compiler based undefined behavior or implementation defined behavior.

**laserlight** · 06-11-2013

Originally Posted by kjwilliams

Well, I found what the undefined behavior was, which the compiler was allowing.

Please don't use "undefined behaviour" as a synonym for "behaviour that I did not expect" or "behaviour that I consider to be a bug in a library that I am using".

Originally Posted by kjwilliams

In textmode(C4350) ( of conio.h ) .... using clrscr() ( of conio.h ) ..inhibits strtok() (of string.h ) from reaching the last token
of a string ( in my programming language statement ) that is passed to it.

I note that strtok cannot be used to tokenise two strings at the same time (with interleaved usage, under normal circumstances), so if clrscr calls strtok for some reason, this could be a problem.
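A minimal sketch of that interleaving problem, assuming nothing beyond standard `strtok` (the function name `next_after_interleave` is made up for the example):

```c
#include <string.h>

/* Starts tokenising `first`, then starts tokenising `second`, then asks
 * for "the next token". strtok keeps a single hidden position, so the
 * continuation call resumes in `second`, not in `first`.
 * Both buffers must be writable: strtok overwrites delimiters with '\0'. */
const char *next_after_interleave(char *first, char *second)
{
    strtok(first, ",");        /* hidden state now points into first   */
    strtok(second, ",");       /* hidden state now points into second  */
    return strtok(NULL, ",");  /* continues second; first is abandoned */
}
```

With `first` holding `"a,b,c"` and `second` holding `"x,y,z"`, the returned token is `"y"`, not `"b"`. Any library routine that happened to call strtok internally would disturb the caller's tokenising in the same way.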

Originally Posted by kjwilliams

So what I want to do is write a smaller program which will produce the same undefined behavior and post it on the usenet forum for DJGPP, to see what other posters say about it.
I still have to do that...

Good idea. Do that, but stop calling it "undefined behaviour", unless you're going to qualify it as "undefined behaviour with respect to the conio.h library documentation". After all, the contents of conio.h are undefined with respect to the standard, and that is what we refer to when we talk about "undefined behaviour".

**Mr.Lnx** · 06-11-2013

Undefined behaviour <=> Expect the Unexpected.

**ledow** · 06-11-2013

Using strtok while calling out to other functions is generally a bad idea. It basically uses a "global variable" to keep track of its state, so any other code using strtok will wipe out your usage of it in the meantime. Tokenize the string, get the information you want, then act on those tokens without using strtok any more.
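As a rough sketch of "tokenize first, act later" (the names `split_line` and `MAX_TOKENS` are illustrative, not from the code under discussion), all strtok calls can be confined to one loop that collects the token pointers before any other function runs:

```c
#include <string.h>

#define MAX_TOKENS 16  /* illustrative limit, not from the thread */

/* Splits `line` in place on the characters in `delims` and stores up to
 * MAX_TOKENS token pointers in `tokens`. Returns the number stored.
 * Every strtok call happens here, before any other code runs. */
int split_line(char *line, const char *delims, char *tokens[MAX_TOKENS])
{
    int count = 0;
    char *p = strtok(line, delims);

    while (p != NULL && count < MAX_TOKENS) {
        tokens[count++] = p;
        p = strtok(NULL, delims);
    }
    return count;
}
```

For a writable buffer holding `#cabbage: red green #gofish:` and the delimiter string `"#"`, this stores two tokens. After it returns, screen functions or anything else can be called freely, because strtok's hidden state is no longer needed.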

Additionally, conio.h is DOS-specific and has nothing to do with the C standards. It's "unexpected" behaviour you're experiencing.

That said, there's no reason why clrscr should interfere with strtok (I can't think of a sensible reason that clrscr would call it, but it might well be trashing RAM that affects strtok) but if you aren't properly initialising the conio functions, then you're likely to run into problems.

I would suggest you post the parts of your code that are having the problem. Don't "edit" them, don't show us something that doesn't error, show us the code that does something unexpected and your workaround. Chances are, almost certainly, that you weren't using strtok properly in the first place and that it's nothing to do with conio, clrscr or anything else.

**kjwilliams** · 06-11-2013

@ledow

Code:

short int string_parser(char *userstring, char *target, char magic, short int word)
{

  static char subject[PARSE_SIZE_LIMIT]; subject[0] = '\0';


  char *w = &magic;//assign the address of magic to w

  char a = 0,b = 0,c = 0,x = 0,z = 0;//temp variables

  short int length = 0,orig = 0;// get the length of the string
  short int offset = 0,y = 1;//segment control
  char *p;//used for assigning pointer address returned by strtok()


  //** format subject array of 'natural' garbage with spaces & reset x **
  //for (x = 0;x < PARSE_SIZE_LIMIT; x++) { subject[x] = ' '; } x = 0;


  orig = strlen(userstring);//get size length target array


  //copy and reformat the userstring into, 0 - PARSE_SIZE_LIMIT ; format.

  //for lines larger or equal than the size limit....
  if (orig >= (PARSE_SIZE_LIMIT - 1))
  {
    for (x = 0; x <= (PARSE_SIZE_LIMIT - 1); x++)
    { subject[x] = userstring[x]; }
  }

  //for lines less than the size limit....
  if (orig < (PARSE_SIZE_LIMIT - 1))
  {
    for (x = 0; x <= orig; x++)
    { subject[x] = userstring[x]; }
  }

  subject[x+1] = '\0';//add terminating null character

  //if (word == -1) { printf("<%s> ",subject); }//temp - keep

  length = strlen(subject);//get string length of subject[];

  //count where all the "magic" characters are
  for(x = 0;x < length;x++)
  {
     //if the subject character = "magic" character then count it
     if(subject[x] == magic)
     {
       a = x;//set a to x , to match where x is currently at.

       // check for no characters between "magic" characters if x > 0
       // if x = 0 then do nothing....
       if(x > 0)
       {
         c = a - b;// c = the new count (of x) - the old count (of x)

        /*

         a magic character must have a difference between it's last
         position (if there is one) and its new position ( which is greater
         than 1 ) to be counted as magic character. Therefore, if "#" is a
         magic character and a string is, "##magic:" it would be discounted.
         However, a string such as,"# #magic:" would be counted as two
         magic characters

        */

         if(c > 1) { y++; } /* if c > 1 then add 1 to y */
       }

       b = a;//set b to a , to save the previous count.
     }
  }

  //if the original length of the userstring is (PARSE_SIZE_LIMIT - 1)
  //then subtract 1
  if (orig >= (PARSE_SIZE_LIMIT - 1)) { y -= 1; }


  //the value of word finalizes the work of string_parser


  //return the number parts in the string
  if(word == -1) { return y; }

  //return the modified user string
  if(word == 0) { strcpy(target,subject); }


  //** this is where strtok() is used **


  //copy the first word in the string (must be done to find the next words)
  if(word == 1)
  {
    p = strtok(subject,w);
    strcpy(target,p);
  }

  //copy all the next words (sequencially) in the string
  if(word > 1)
  {
    p = strtok(NULL,w);
    strcpy(target,p);
  }

  return 0;
}

**AndiPersti** · 06-12-2013

Code:

  //for lines larger or equal than the size limit....
  if (orig >= (PARSE_SIZE_LIMIT - 1))
  {
    for (x = 0; x <= (PARSE_SIZE_LIMIT - 1); x++)
    { subject[x] = userstring[x]; }
  }

  //for lines less than the size limit....
  if (orig < (PARSE_SIZE_LIMIT - 1))
  {
    for (x = 0; x <= orig; x++)
    { subject[x] = userstring[x]; }
  }

  subject[x+1] = '\0';//add terminating null character

You have a buffer overflow in both cases. Use a debugger to step through your loops to find out the value of "x" after both loops.
You really need to be more careful if you use <= in for-loops.
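To spell out the indices: in the first branch the loop exits with x equal to PARSE_SIZE_LIMIT, so subject[x+1] writes two elements past the end of the array. In the second branch the loop exits with x equal to orig + 1, so the write lands at subject[orig + 2], which is past the end when orig is PARSE_SIZE_LIMIT - 2. A bounded copy could look roughly like this sketch, where `copy_bounded` and `BUF_SIZE` are names chosen for the example and `BUF_SIZE` stands in for PARSE_SIZE_LIMIT:

```c
#include <string.h>

#define BUF_SIZE 81  /* stands in for PARSE_SIZE_LIMIT */

/* Copies at most BUF_SIZE - 1 characters of src into dst and always
 * terminates dst inside the buffer. Returns the number of characters
 * copied, not counting the terminator. */
size_t copy_bounded(char dst[BUF_SIZE], const char *src)
{
    size_t n = strlen(src);

    if (n > BUF_SIZE - 1) {
        n = BUF_SIZE - 1;   /* truncate overlong input */
    }
    memcpy(dst, src, n);
    dst[n] = '\0';          /* highest index written is BUF_SIZE - 1 */
    return n;
}
```

The single length check replaces both loops, and the terminator is written at index n rather than at x + 1.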

Code:

  //** this is where strtok() is used **

  //copy the first word in the string (must be done to find the next words)
  if(word == 1)
  {
    p = strtok(subject,w);
    strcpy(target,p);
  }

  //copy all the next words (sequencially) in the string
  if(word > 1)
  {
    p = strtok(NULL,w);
    strcpy(target,p);
  }

When word > 1 strtok returns garbage or NULL because you never initialise it with a string to tokenize. You only do that when word == 1.

Furthermore, "w" is not a C-string but a pointer to a single character.
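strtok reads its second argument as a null-terminated string of delimiter characters, so a pointer to one lone char makes it read past that char. A minimal sketch of one way to handle this, using a made-up helper `first_token`: build a two-element delimiter array, and check for NULL before calling strcpy.

```c
#include <string.h>

/* Copies the first token of `text` (split on the single character
 * `magic`) into `out`, which must hold at least strlen(text) + 1 bytes.
 * Returns 1 if a token was found, 0 otherwise.
 * `text` is modified in place by strtok. */
int first_token(char *text, char magic, char *out)
{
    char delims[2];
    char *p;

    delims[0] = magic;   /* the delimiter character ...       */
    delims[1] = '\0';    /* ... followed by a real terminator */

    p = strtok(text, delims);
    if (p == NULL) {     /* no token: do not hand NULL to strcpy */
        out[0] = '\0';
        return 0;
    }
    strcpy(out, p);
    return 1;
}
```

With `text` holding `#newline: 1` and `magic` set to `'#'`, `out` receives `newline: 1`. With a string made only of `#` characters, strtok returns NULL and the function reports 0 instead of copying from a null pointer.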

Bye, Andreas

**kjwilliams** · 06-12-2013

Originally Posted by ledow

Using strtok while calling out to other functions is generally a bad idea. It basically uses a "global variable" to keep track of its state, so any other code using strtok will wipe out your usage of it in the meantime. Tokenize the string, get the information you want, then act on those tokens without using strtok any more.

Additionally, conio.h is DOS-specific and has nothing to do with the C standards. It's "unexpected" behaviour you're experiencing.

That said, there's no reason why clrscr should interfere with strtok (I can't think of a sensible reason that clrscr would call it, but it might well be trashing RAM that affects strtok) but if you aren't properly initialising the conio functions, then you're likely to run into problems.

I would suggest you post the parts of your code that are having the problem. Don't "edit" them, don't show us something that doesn't error, show us the code that does something unexpected and your workaround. Chances are, almost certainly, that you weren't using strtok properly in the first place and that it's nothing to do with conio, clrscr or anything else.

Well, the way that I design code before I put it into a larger program is that I test it alone. In this case, the way it is set up to be called from other functions is like this:

Code:

 

#define PARSE_SIZE_LIMIT 81

// in a function call 

char x[PARSE_SIZE_LIMIT]; x[0] = '\0';
char y[PARSE_SIZE_LIMIT]; y[0] = '\0';
unsigned short int counta;

//lets assume x now has some string of text, in the char array and were going to pass it to string_parser

counta = string_parser(x,y,'#',-1);//find out how many '#' there are ... 

//based on counta  you will have 0,1, or more parts .......

If your text string was like:

#cabbage: red green #gofish:

it would find two string parts:
#cabbage: red green
#gofish:

Back to the code, to show how the string is parsed:

Code:

string_parser(x,y,'#',1);//word now = 1 to initialize the first token search
string_parser(x,y,'#',2);//word now is greater than 1 to get the next token search

I can do this again and again as long as there is another '#' in the string to find, which is what counta is for.

Let me post part 2 here to show you how it works.

**kjwilliams** · 06-12-2013

Part 2 :

@ledow, et al

This is an ANSI C program that is portable, so you can compile it if you wish.

Code:

//re-testing parse_text for WATT

#include <stdio.h>
#include <string.h>

// original design setting:
#define PARSE_SIZE_LIMIT 81

short int string_parser(char *, char *, char, short int);//TEXT_PARSER
short int kfgets(char *);//KFGETS
short int ynreturn(void);//yes/no option

int main (void)
{
   char target[PARSE_SIZE_LIMIT]; target[0] = '\0';
   char worda[PARSE_SIZE_LIMIT]; worda[0] = '\0';
   char uword[PARSE_SIZE_LIMIT]; uword[0] = '\0';
   short int count,countb,x,y;

   do
   {

    do
    {
      printf("\n#>");
      kfgets(target);



      count = string_parser(target,worda,'#',-1);

      if(count == 0) { printf("you must enter something using \'#\'\n"); }

    } while (count == 0);

    string_parser(target,worda,'#',0);
    printf("you entered : %s\n",worda);


    //count # parts

    printf("found %d parts by \'#\'\n\n",count);

    for(x = 1;x <= count;x++)
    {
      string_parser(worda,uword,'#',x);

      printf("part#%d : %s\n",x,uword);
    }


    //count spaced parts

    count = string_parser(worda,uword,' ',-1);

    printf("found %d parts by \' \'\n\n",count);

    for(x = 1;x <= count;x++)
    {
      string_parser(worda,uword,' ',x);

      printf("part#%d : %s\n",x,uword);
    }

    printf("\ntest again (y/n)?");

   } while((ynreturn()) == 1);



   return 0;
}


short int string_parser(char *userstring, char *target, char magic, short int word)
{
  //subject must be static
  static char subject[PARSE_SIZE_LIMIT];//stores 80 chars +1 for the null character

  char *w = &magic;//assign the address of magic to w

  char a = 0,b = 0,c = 0,x = 0,z = 0;//temp variables

  short int length = 0,orig = 0;// get the length of the string
  short int offset = 0,y = 1;//segment control
  char *p;//used for assigning pointer address returned by strtok()

  //** format subject array of 'natural' garbage **

  for (x = 0;x < PARSE_SIZE_LIMIT; x++) { subject[x] = ' '; }
  x = 0;

  orig = strlen(userstring);//get size length target array

  //offset the copy of the userstring by 1 if the first character is not magic

  //if(userstring[0] != magic)
  //{
     //printf("offset detected\r\n");//temp
     //offset = 1;
     //subject[0] = magic;
  //}

  //copy and reformat the userstring into, 0 - PARSE_SIZE_LIMIT ; format.

  //for lines larger or equal than the size limit....
  if (orig >= (PARSE_SIZE_LIMIT - 1))
  {
    for (x = 0; x <= (PARSE_SIZE_LIMIT - 1); x++)
    { subject[x] = userstring[x]; }
  }

  //for lines less than the size limit....
  if (orig < (PARSE_SIZE_LIMIT - 1))
  {
    for (x = 0; x <= orig; x++)
    { subject[x] = userstring[x]; }
  }

  subject[x+1] = '\0';//add terminating null character

  //if (word == -1) { printf("<%s> ",subject); }//temp - keep

  length = strlen(subject);//get string length of subject[];

  //count where all the "magic" characters are
  for(x = 0;x < length;x++)
  {
     //if the subject character = "magic" character then count it
     if(subject[x] == magic)
     {
       a = x;//set a to x , to match where x is currently at.

       // check for no words between spaces if x > 0
       // if x = 0 then do nothing....
       if(x > 0)
       {
         c = a - b;// c = the new count (of x) - the old count (of x)

         //a magic character must have a difference between it's last
         //position (if there is one) and its new position greater
         //than 1 character to be counted as magic character.
         //Therefore, if "#" is a magic character and a string is,
         //"##magic:" it would be discounted. A string such "# #magic:"
         //would be counted as two magic characters

         if(c > 1) { y++; } /* if c > 1 then add 1 to y */
       }

       b = a;//set b to a , to save the previous count.
     }
  }

  //if the original length of the userstring is (PARSE_SIZE_LIMIT - 1)
  //then subtract 1
  if (orig >= (PARSE_SIZE_LIMIT - 1)) { y -= 1; }

  //return the number parts in the string
  if(word == -1)
  {
    //printf("= %d\r\n",y);//temp - keep
    return y;
  }

  //return the modified user string
  if(word == 0) { strcpy(target,subject); }

Before I had DJGPP, I was using a really old (obsolete) compiler, Borland Turbo C++ for DOS v3.0, which used this code as an example of how to use strtok(), but it works even under DJGPP. To continue with the rest of my function listing:

Code:

  //copy the first word in the string (must be done to find the next words)
  if(word == 1)
  {
    p = strtok(subject,w);
    strcpy(target,p);
  }

  //copy all the next words (sequencially) in the string
  if(word > 1)
  {
    p = strtok(NULL,w);
    strcpy(target,p);
  }

  return 0;
}

//generic string prompt - size was eliminated for TC++
short int kfgets(char *target)
{
   short int a;//temp. variables
   char line[81];//temp string storage

   //user must provide prompt for whatever information wanted from the user

   //format the strings
   for(a = 0;a < 81; a++) { line[a] = '\0';}

   fgets(line , 81, stdin);
   a = strlen(line);
   line[a-1] = '\0';//get rid of the newline character added by fgets

   strcpy(target,line);// copy line to the target array

   return 0;
}

short int ynreturn(void)
{

    short int result = 0;
    char x[3];

    while (fgets(x,3,stdin) != NULL && x[1] != '\n');

    switch(x[0])
    {
       case('Y'):
       case('y'):
       {
          result = 1;
          break;
       }

       case('N'):
       case('n'):
       {
          result = 0;
          break;
       }

       default:
       {
          result = 0;
          break;
       }
    }

    return result;
}

So what I am trying to do in my bigger program is break up strings by '#' and then by ' '.

In my big program, a text line such as:

#clear:

...causes

#newline: 1

...to be parsed as :

#newline:

using '#' as the delimiter, before it is parsed with ' ' as the delimiter.

The reason I started this thread as a question about undefined behavior is that some DJGPP compilations of my big program parsed the text line

#newline: 1

as

#newline: 1

Other compilations via DJGPP would show the previous behavior, even though the only change I made was commenting out cprintf statements that I had added to show me what was going on during the process, as a means of debugging my larger program.

The way my bigger program works is that, in a text file, #newline: 1 is parsed as #newline: 1 if it comes before #clear:, which is parsed as #clear using '#' as the delimiter. If the two lines are switched around, the problem occurs.

In the end, my bigger program compiles fine no matter what, which makes finding what IS causing the undefined behavior troublesome.

I'm going to post my bigger program's source code on SourceForge as soon as I finish understanding git. I actually have a saved, functioning program that was compiled with the Borland compiler, and that one works.
