Thread: General question about undefined behavior

  1. #31
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    from a previous post...
    ISO/IEC 9899:1999 (E) ©ISO/IEC

    7.21.5.8 The strtok function
    Synopsis
    1 #include <string.h>
    char *strtok(char * restrict s1,const char * restrict s2);

    Description
    2 A sequence of calls to the strtok function breaks the string pointed
    to by s1 into a sequence of tokens, each of which is delimited by a
    character from the string pointed to by s2. The first call in the
    sequence has a non-null first argument; subsequent calls in the
    sequence have a null first argument.

    ......
    w1 and w2 are both c strings that observe the standard explained as s2, which is defined in the underlined next sentence. That's how I understand it.
    Last edited by kjwilliams; 06-16-2013 at 02:33 PM. Reason: addendum

  2. #32
    Registered User
    Join Date
    Aug 2005
    Location
    Austria
    Posts
    1,990
    w2 is not a c-string.
    Kurt
    EDIT: you seem to confuse null-pointer with nul-character

  3. #33
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    @phantomotap


    I suggest you get used to the idea of strings as a protocol and not a type.
    C strings are neither a protocol or a type, they are arrays - as defined in the C standard

  4. #34
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    That's how I understand it.
    O_o

    As has been explained, your understanding is simply wrong.

    A "C string" is a specific thing. A single character, unless that character is null, is not a "C string".

    Besides, the `strtok' documentation is talking about passing a null pointer as the first argument on subsequent calls; a null pointer is never valid for the second argument.

    C strings are neither a protocol or a type, they are arrays - as defined in the C standard.
    Yes, a "C string" is an array, but an array is not necessarily a "C string".

    You have to follow the protocol of ending a "C string" with a null-terminator.

    If you don't follow the protocol, as you haven't, you just have an array.

    Soma
    Last edited by phantomotap; 06-16-2013 at 02:51 PM.

  5. #35
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    Quote Originally Posted by kjwilliams View Post
    from a previous post...


    w1 and w2 are both c strings that observe the standard explained as s2, which is defined in the underlined next sentence. That's how I understand it.
    Firstly, this is talking about s1 (which is the first argument), not s2.

    There's also a world of difference between a null ARGUMENT and a string ending in a null character.

    Code:
       char * s1 = NULL; // This is a null pointer
       char * s2 = "-";    // This points to a string which (like every C-string) ends in a null character
    When they say null argument, they mean a null pointer.
    You ever try a pink golf ball, Wally? Why, the wind shear on a pink ball alone can take the head clean off a 90 pound midget at 300 yards.

  6. #36
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    Quote Originally Posted by kjwilliams View Post
    C strings are neither a protocol or a type, they are arrays - as defined in the C standard
    And a square is a rectangle, but that doesn't mean all rectangles are squares.

    C strings are indeed arrays of characters, but they have additional restrictions. To be a string, somewhere before the end of the array there must be a \0 character, and any data that is beyond the \0 character is not considered part of the string, though it would still be part of the array.

    For example:

    Code:
       char buffer[20];
       strcpy(buffer,"Hello");
    At this point in the code, buffer contains a valid C string, of length 5. Six of the 20 characters of the array are used ('H', 'e', 'l', 'l', 'o', '\0'). The other fourteen characters of the array are not part of the string.

    Code:
       buffer[5] = '!' ;
    At this point buffer is no longer guaranteed to contain a valid string. Buffer is still a fully correct character array which begins with Hello!, but it cannot safely be used as a string because the null terminator has been overwritten and the rest of the array was never initialized.

    Code:
       buffer[4] = '\0';
    Now, buffer again contains a valid string, of length 4 and contents "Hell". The exclamation point at buffer[5] still exists, but it's not part of the string.

  7. #37
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by kjwilliams
    C strings are neither a protocol or a type, they are arrays - as defined in the C standard
    I do not think that you actually read the standard though, e.g.,
    Quote Originally Posted by C99 Clause 7.1.1 Paragraph 1
    A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.
    Notice that, "as defined in the C standard", the word "array" does not even appear in the definition of a "string". Of course, you might argue that "contiguous sequence of characters" is synonymous with "array", but:
    Quote Originally Posted by C99 Clause 6.2.5 Paragraph 20a
    Any number of derived types can be constructed from the object, function, and incomplete types, as follows:
    — An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T, the array type is sometimes called ‘‘array of T’’. The construction of an array type from an element type is called ‘‘array type derivation’’.
    Thus, the string length is defined with respect to the first null character, but the number of elements in the array does not depend on any special value in its content. Indeed, an array can be used to store a string, but that does not mean that every array of char is a string.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #38
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    Ok.... All of you who have been telling me my program is wrong in part or another way, again... have been right!

    And I will have to, AGAIN, redesign my string_parser program.

    Let me explain something. For the majority of my C programming, I used an old (obsolete) compiler called Borland Turbo C++, and when I committed a memory violation at run time it would say, "null pointer error". On GCC in Linux it would say, "segmentation fault". Same thing. DJGPP is kind of a new compiler that I have been learning to understand. And finally the light has cracked into my skull as to what DJGPP implements as the same message.

    it looks like:
    Exiting due to signal SIGSEGV
    General Protection Fault at eip=000093a3
    eax=49420001 ebx=0001b500 ecx=00000000 edx=00000006 esi=0001b5b4 edi=0009d5c4
    ebp=0009d508 esp=0009d4a0 program=C:\DJGPP\PROGRAMS\WATT.EXE
    cs: sel=0187 base=8395b000 limit=000bffff
    ds: sel=018f base=8395b000 limit=000bffff
    es: sel=018f base=8395b000 limit=000bffff
    fs: sel=0167 base=000113b0 limit=0000ffff
    gs: sel=019f base=00000000 limit=0010ffff
    ss: sel=018f base=8395b000 limit=000bffff
    App stack: [0009d7f0..0001d7f0] Exceptn stack: [0001d74c..0001b80c]

    Call frame traceback EIPs:
    0x000093a3
    0x000065c5
    0x00005d0e
    0x0000bd68
    You see, when I programmed string_parser again in DJGPP's GCC for Windows/MS-DOS and tested it, DJGPP would not say that *nice* message (above) to me. But when I implemented it in my bigger program called WATT, DJGPP's GCC would *sometimes* blast that message at me. And for a long time, up until now, the message was a mystery. Now the question is: why didn't DJGPP's GCC blast that message at me when I rewrote it and tested it, before I implemented it into my bigger program?

    Well, Now I've figured it out - because strtok() is sending back a NULL pointer that I am not testing for. Which means, I am not properly using it .... so... again - I got to redesign that function again.

    The ISO C standard didn't help at all - whoever wrote that part about strtok (especially the first paragraph) was not being clear enough for me, no matter how many times I read it.

    P.S. there is one thing that also bugs me about DJGPP: it converts MS-DOS/Windows style paths that use backslashes '\' to the forward slashes '/' used in Linux (Unix). Whenever I use

    Code:
    printf("%s\n",__FILE__);
    it shows those forward slashes in my program's path.
    Last edited by kjwilliams; 06-17-2013 at 01:47 AM.

  9. #39
    Registered User ledow's Avatar
    Join Date
    Dec 2011
    Posts
    435
    Quote Originally Posted by kjwilliams View Post
    For the majority of my C programming, I used an old (obsolete) compiler called Borland Turbo C++, and when I committed a memory violation at run time it would say, "null pointer error". On GCC in Linux it would say, "segmentation fault". Same thing. DJGPP is kind of a new compiler that I have been learning to understand. And finally the light has cracked into my skull as to what DJGPP implements as the same message.
    When you do an out-of-bounds access or dereference a bad pointer? Yep, that's what happens. It's variously called a segmentation fault (SIGSEGV, i.e. a signal that a segmentation violation happened; the term comes from trying to access a "segment" of memory that you're not supposed to be able to look at) or a memory violation. It happens everywhere you go and is NOT something you should be ignoring, no matter how rare. On some OSes you can happily trash data and memory by doing that if you're not careful (admittedly most modern OSes will protect you most of the time, but there's nothing to stop you trashing your own program/data in memory and then potentially writing that back out or acting upon it).

    If you want to catch these things, use a proper debugger to reproduce them and find their origin. As a "quick check" you can run your program through an analysis tool (e.g. things like valgrind etc.) that will spot anything dodgy happening when you run the program and warn you about it. But given that you probably aren't doing any sort of proper debugging on your code, it's a step extra to learn to do (and by proper debugging, I mean you had a problem with strtok, but didn't even narrow it down to the point where you checked what you were passing strtok and whether it was valid or not, or what your pointers were pointing at, let alone anything else - if you'd run it through a debugger, you may have noticed such problems much earlier before trying to lay the blame at a well-respected C compiler / standard library).

    And the forward/back slashes? I think you'll find that's just how DJGPP works - it was written back in the era of DOS and came from a *nix-based compiler so - like MinGW and other similar tools - attempts to help you translate such paths without you needing to do anything specific (otherwise code that does "#include <SDL/SDL.h>", for instance, would have failed on DOS). Why should it ever matter, except as a minor inconvenience if you want to copy/paste paths output by your program?

    - Compiler warnings are like "Bridge Out Ahead" warnings. DON'T just ignore them.
    - A compiler error is something SO stupid that the compiler genuinely can't carry on with its job. A compiler warning is the compiler saying "Well, that's bloody stupid but if you WANT to ignore me..." and carrying on.
    - The best debugging tool in the world is a bunch of printf()'s for everything important around the bits you think might be wrong.

  10. #40
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    I recently posted a message like this in comp.os.msdos.djgpp and I am waiting for someone there to respond to see what they say.

    I think it is the null pointer returned from strtok() in my string_parser() function that is triggering the General Protection Fault message in my large program (WATT.exe), but the bigger question is: why am I not seeing this same message with my parstext.c program?

    So I posted my question , my code, the error message ... on that newsgroup...................

    Again, I'm thinking that this is some weird implementation-defined behavior in DJGPP (as I originally started this post with). I would prefer that any DJGPP-compiled program tears my head off whenever I violate memory boundaries.

    @ledow - I use (or have) Linux on another machine (it needs updating or a different distro), and path styles can get me into trouble if I confuse them. I use RHIDE to do my programming, and just seeing the forward slashes in the window's file path display makes me wonder: should I use forward slashes in my paths, or backslashes as MS-DOS uses them?

    oh yes... from your sig block,

    - The best debugging tool in the world is a bunch of printf()'s for everything important around the bits you think might be wrong.
    That's actually how I debug my programs.... that and commenting out code using /* ....*/ . When I debugged my code using that method, I was being led to my string_parser() function, but it took me a long time to find out what it was, because the behavior of my two programs that used string_parser() was confusing to me. It's like having two different cars that use the same engine and are wired up and configured identically: one car starts up fine and is ready to go, the other car doesn't start at all. When actually neither car should start at all.

  11. #41
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by kjwilliams
    oh yes... from your sig block,
    - The best debugging tool in the world is a bunch of printf()'s for everything important around the bits you think might be wrong.
    That's actually how I debug my programs.... that and commenting out code using /* ....*/ .
    I don't really agree that it is the best though: a proper debugger would be better where applicable as it avoids the possibility of failing to remove debugging printf statements. However, a debugger might not be available. Commenting out code is also one approach, but the #if 0 approach may sometimes be better than actually introducing comments.

  12. #42
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    After you learn to use the debugger, I suggest looking at the assert function/macro.

    Even if you decided to ignore the debugger advice, assert function/macro should be looked at.

    assert - C++ Reference

    Tim S.
    Last edited by stahta01; 06-17-2013 at 11:38 AM.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are,in short, a perfect match.." Bill Bryson

  13. #43
    Registered User
    Join Date
    Aug 2005
    Location
    Austria
    Posts
    1,990
    Quote Originally Posted by kjwilliams View Post
    it took me a long time to find out what it was, because the behavior of my two programs that used string_parser() was confusing to me. It's like having two different cars that use the same engine and are wired up and configured identically: one car starts up fine and is ready to go, the other car doesn't start at all.
    That's the very nature of undefined behaviour.
    Anything can happen. The programs may even appear to work correctly.

    Kurt

  14. #44
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    Quote Originally Posted by kjwilliams View Post
    Again, I'm thinking that this is some weird implementation-defined behavior in DJGPP (as I originally started this post with). I would prefer that any DJGPP-compiled program tears my head off whenever I violate memory boundaries.
    It will - depending on what memory boundaries you violate, and how the variables are arranged with respect to each other in memory.

    For example, on many compilers this will 'work' (after a fashion):

    Code:
      char x[3] = {'a', 'b', 'c'} ;
      char y = 'd';
      char z = 'e';
      char * p = x;
      p += 4;  // Undefined - walked off the end of the array
      printf("%c", *p); // Might print 'e'.
    On some implementations, that will print 'e', because x, y, and z happen to be laid out contiguously in the stack frame. What happens when you use p is undefined because you've walked off the end of the array, but the runtime is often not able to catch that. If the stack frame was created such that &z == &x[0] + 4, then as far as the hardware is concerned p simply holds the address of z.

    Some more sophisticated runtime bounds checking could potentially catch this, but generally, you only will see a runtime error if you dereference a pointer whose address could not be read by the program. If your pointer is invalid but still points to a region of memory you could read from, the runtime will generally happily dereference that address for you.

    The moral of the story is that you should never expect your tools to always catch out-of-bounds access to variables; like with much of C, the burden of doing things correctly lies with the programmer.

  15. #45
    Registered User
    Join Date
    May 2013
    Location
    United States
    Posts
    22
    Quote Originally Posted by laserlight View Post
    I don't really agree that it is the best though: a proper debugger would be better where applicable as it avoids the possibility of failing to remove debugging printf statements. However, a debugger might not be available. Commenting out code is also one approach, but the #if 0 approach may sometimes be better than actually introducing comments.
    Well, in certain cases you can use a built-in debugger with breakpoints; in my case, RHIDE for DJGPP has one. GDB is better, but I am out of practice with it. Anyway, I found out what was causing my program to not work....
