Thread: Researching how to best import CSV and separate content

  1. #31
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Provided your compiler allows you to; otherwise you have to cast to (intptr_t) or similar to stop it complaining about pointer arithmetic (as with GCC). With an index you need neither the preprocessor nor typecasting.

  2. #32
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by awsdert
    Provided your compiler allows you to; otherwise you have to cast to (intptr_t) or similar to stop it complaining about pointer arithmetic (as with GCC). With an index you need neither the preprocessor nor typecasting.
    The result of pointer subtraction is of type ptrdiff_t, a signed integer type. An index would normally be of type size_t, an unsigned integer type. Conversion from a signed integer type to an unsigned integer type is always well defined, and since the value here is the result of a pointer subtraction where the subtrahend is not greater than the minuend, it is non-negative and it is reasonable to assume that it will be in the range of size_t.
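
    A minimal sketch of that conversion, assuming both pointers refer to the same string. The helper names index_of and find_index are hypothetical and do not appear elsewhere in this thread:

```c
#include <stddef.h>  /* ptrdiff_t, size_t */
#include <string.h>  /* strchr */

/* Hypothetical helper: index of `pos` within the string starting at `start`.
   Both pointers must point into (or one past the end of) the same array,
   and `pos` must not be before `start`. */
size_t index_of(const char *start, const char *pos)
{
    ptrdiff_t diff = pos - start;  /* signed result of pointer subtraction */
    return (size_t)diff;           /* non-negative here, so the value is preserved */
}

/* Hypothetical helper: index of the first occurrence of c in s,
   or (size_t)-1 if c does not occur. */
size_t find_index(const char *s, int c)
{
    const char *p = strchr(s, c);
    return (p != NULL) ? index_of(s, p) : (size_t)-1;
}
```

    No cast to intptr_t is involved: the subtraction is done on the pointers themselves, and only the resulting integer is converted.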

    So no, an index is not more versatile. A pointer allows you to immediately use the token instead of doing pointer arithmetic. Considering that in many tokenisation tasks, one wants to use the token, not the index thereof, I do not see an advantage in returning an index instead of a pointer.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #33
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Instead of us arguing about this, how about I just agree to use this, which fits both our arguments:
    Code:
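    #include <stddef.h> /* size_t */
    #include <string.h> /* strcspn */

    typedef struct STR_POS_
    {
      size_t i;   /* index of the current position in the source string */
      char  *p;   /* pointer to that same position */
    } STR_POS;

    /* Copies the next token into dst and returns the position
       at which the next call should resume. */
    STR_POS strntok (
      char    *dst,
      size_t  size,
      STR_POS  tok,
      char const *delim )
    {
      size_t i = 0;
      size_t tokLen = strcspn( tok.p, delim );
    
      /* copy until the token ends or dst is full, advancing the position */
      while ( i < tokLen && i < size )
      {
        dst[i++] = *tok.p;
        ++tok.i;
        ++tok.p;
      }
    
      /* not at the end of the string: move past the current character */
      if ( *tok.p )
      {
        ++tok.i;
        ++tok.p;
      }
    
      /* terminate dst, overwriting the last character if dst is full */
      dst[ ( i < size ) ? i : --i ] = '\0';
    
      return tok;
    }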
    After all our arguing I finally saw a way to simplify my original version.
    Edit: I forgot to move the pointer/index forward when it doesn't stop at the terminating null character.
    Last edited by awsdert; 02-07-2015 at 06:15 AM.

  4. #34
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by awsdert
    Instead of us arguing about this, how about I just agree to use this, which fits both our arguments:
    In post #20, when I provided my example of how I might implement your strntok function, I showed not only the implementation of the function, but also an example of how one might use strntok to parse "Example;1;2;3" into tokens in a loop, with ";" as the delimiter string. I suggest that you provide a similar example for your version with STR_POS.

    Consider what happens if there is a token that is too large to be stored in the destination array. What happens when you call strntok again to parse the next token? In my proposed implementation, only the first n characters of each token will be stored. In your most recent implementation, parsing will continue in the middle of the token rather than at the start of the next token.

    That said, I note that there is a problem with my claim concerning pointer arithmetic to obtain the index: my argument trivially holds true for strtok, since strtok returns a pointer to the first character of the current token. However, it does not hold true for my proposed strntok, since strntok stores the token in a separate array and returns a pointer to the start of the next parse. In other words, we can compute the index of the next token, but not the current token, with pointer arithmetic involving that pointer. (Of course, this isn't so bad: we already know that the index of the first token, if it exists, is 0, and after that we can obtain the subsequent indices.)
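
    To make the truncation behaviour concrete, here is a minimal sketch of a tokeniser that stores only the first n - 1 characters of an oversized token and still resumes at the start of the next token. It is not the code from post #20, which is not shown here; the name strntok_sketch and its exact signature are assumptions made for illustration:

```c
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* strcspn, memcpy */

/* Hypothetical sketch: copies at most n - 1 characters of the next token
   into dst, always null-terminates dst, and returns a pointer to where the
   next parse should start, or NULL once the input is exhausted. */
const char *strntok_sketch(char *dst, size_t n, const char *src, const char *delim)
{
    if (src == NULL || n == 0)
        return NULL;

    size_t len = strcspn(src, delim);            /* length of the whole token */
    size_t copy = (len < n - 1) ? len : n - 1;   /* how much of it fits in dst */

    memcpy(dst, src, copy);
    dst[copy] = '\0';

    src += len;                                  /* skip the unstored tail as well */
    return (*src != '\0') ? src + 1 : NULL;      /* step over the delimiter, or stop */
}
```

    Called repeatedly on "Example;1;2;3" with a 4-byte destination and ";" as the delimiter, this stores "Exa", "1", "2" and "3" in turn. The index of the next token is the returned pointer minus the start of the line, which matches the point above: the pointer gives the next token's index, not the current one's.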

  5. #35
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Well, for an example, the idea is this:
    Code:
    char line[80] = "0123 4567 89AB CDEF";
    STR_POS tok = {0,line};
    char linePiece[5] = "";
    while ( (tok = strntok( linePiece, 5, tok, " " ) ).p )
    {
      printf( "Line Index #%zu: %s\n", tok.i, linePiece );
    }
    size_t lineLen = tok.i;
    // Continue with more stuff
    Though looking at it now, I should've made the test check for the delimiter before forcefully incrementing.

  6. #36
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Have you tried writing a program to test?

  7. #37
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Not yet, but the application I'm working on needs it during startup at least, so the next time I run it I will see if it needs any further work (in which case I'll open my experimenter project instead).

  8. #38
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    Remind me why you're not using plain strtok()? It has its issues, but I don't think your solution improves on it. Don't misunderstand -- your code is fine. There are things about it that I'd change, but they're more to do with futureproofing and potential problems than "arrrrg nooo"s.

    The issues with standard strtok are:
    • It's not thread-safe.
    • You can't process two different strings at the same time.
    • It modifies the original string.
    • You have to call it once with the string then repeat with NULL.
    • You have to do some pointer arithmetic to get the element index.


    As far as I can tell, your version only eliminates the last two issues.
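
    The third issue is easy to see in a small sketch. The helper name count_tokens_destructively is hypothetical; it only calls the standard strtok and counts what comes back:

```c
#include <string.h>  /* strtok, size_t */

/* Hypothetical helper: tokenises buf in place with strtok and returns the
   number of tokens found. After the call, each delimiter that ended a token
   has been overwritten with '\0', so buf no longer holds the original text. */
size_t count_tokens_destructively(char *buf, const char *delim)
{
    size_t count = 0;

    /* first call passes the string, later calls pass NULL (hidden static state) */
    for (char *tok = strtok(buf, delim); tok != NULL; tok = strtok(NULL, delim))
        ++count;

    return count;
}
```

    Given a writable array holding "a;b;c", the function returns 3 and leaves the array reading as just "a", because both semicolons have been replaced by null characters. Passing a string literal would be an error for the same reason.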

    Here's how easy it is to get the index of an element, given the start of the array:
    Code:
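    #include <stddef.h> /* ptrdiff_t */

    /* Both pointers must point into the same array. */
    ptrdiff_t GetIndex(const char* start, const char* element)
    {
    	return element - start;
    }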
    IMPORTANT
    Quote Originally Posted by laserlight
    That said, I note that there is a problem with my claim concerning pointer arithmetic to obtain the index: my argument trivially holds true for strtok, since strtok returns a pointer to the first character of the current token. However, it does not hold true for my proposed strntok, since strntok stores the token in a separate array and returns a pointer to the start of the next parse. In other words, we can compute the index of the next token, but not the current token, with pointer arithmetic involving that pointer. (Of course, this isn't so bad: we already know that the index of the first token, if it exists, is 0, and after that we can obtain the subsequent indices.)
    You mentioned a 2D array in your first post. A 2D array full of pointers.... THAT is where your compiler will start moaning about pointer subtraction. As stated above, it's safe and legal within the same object or array -- but getting the types right is a bit trickier than with a single array. And DO NOT try to do straightforward subtraction like above on two different arrays. IIRC, it's undefined behaviour. At best you'll get the actual distance between the two arrays, which is absolutely meaningless.
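
    A minimal sketch of getting the types right for a 2D array of char pointers, assuming a fixed column count. The names COLS, column_of and row_of are hypothetical; each subtraction stays within a single array:

```c
#include <stddef.h>  /* ptrdiff_t */

enum { COLS = 3 };   /* assumed number of columns per row */

/* Column index of `cell` within `row`; both must point into the same row. */
ptrdiff_t column_of(char *const *row, char *const *cell)
{
    return cell - row;   /* counts char* elements, not bytes */
}

/* Row index of `row` within `table`; both must point into the same table. */
ptrdiff_t row_of(char *(*table)[COLS], char *(*row)[COLS])
{
    return row - table;  /* counts whole rows of COLS pointers */
}
```

    With char *fields[2][COLS], column_of(fields[1], &fields[1][2]) gives 2 and row_of(fields, &fields[1]) gives 1. The element type differs in each case (a char pointer versus a whole row), which is why the parameter types differ too.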

  9. #39
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    To those who prefer to use strtok(), but dislike its drawbacks:

    You can use strtok_r() on all POSIX.1-2001 systems (just about every OS except Windows), and strtok_s() in Microsoft Visual Studio (available since 2005, at least). Both have the exact same interface: the additional third parameter is a pointer to a char pointer. This char pointer is all the state the function needs, so these versions of the function are thread-safe and can be used on multiple strings at the same time.

    If you want portable code, then I suggest you replace the #include <string.h> at the top of your program with
    Code:
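    #if _MSC_VER >= 1400
    /* Visual Studio 2005 or later: use strtok_s under the name strtok_r */
    #include <string.h>
    #define strtok_r strtok_s
    #else
    /* Elsewhere: request the POSIX declarations before including the header */
    #if _POSIX_C_SOURCE < 1
    #define _POSIX_C_SOURCE 200809L
    #endif
    #include <string.h>
    #endif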
    and then just use strtok_r() as documented.
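
    As a rough sketch of what the extra state parameter buys you, the following tokenises records and the fields inside each record at the same time, which plain strtok() cannot do. It assumes a POSIX system; the helper name count_fields and the ';' and ',' separators are made up for the example:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request strtok_r on POSIX systems */
#endif
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* strtok_r */

/* Hypothetical helper: counts the ','-separated fields in a buffer of
   ';'-separated records. Two tokenisations are in progress at once, each
   with its own saved state. The buffer is modified in place. */
size_t count_fields(char *buf)
{
    size_t fields = 0;
    char *rec_state = NULL;  /* state for the outer (record) tokenisation */
    char *fld_state = NULL;  /* state for the inner (field) tokenisation */

    for (char *rec = strtok_r(buf, ";", &rec_state); rec != NULL;
         rec = strtok_r(NULL, ";", &rec_state))
    {
        for (char *fld = strtok_r(rec, ",", &fld_state); fld != NULL;
             fld = strtok_r(NULL, ",", &fld_state))
        {
            ++fields;
        }
    }
    return fields;
}
```

    For a writable buffer holding "a,b;c;d,e,f" this counts 6 fields. Like strtok(), strtok_r() skips empty tokens, so it is not a drop-in CSV parser when empty fields matter.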

    I know Microsoft claims POSIX is not a standard. For example, the Security Features in the CRT page says, "Another source of deprecation warnings, unrelated to security, is the POSIX functions. Replace POSIX function names with their standard equivalents (for example, change access to _access)", which is blatantly untrue, and the reason I hate to venture to MSDN. The truth remains that POSIX is a standard, one MS itself supported for quite a while, and MS is just one vendor; the one who refuses to comply with any standards. Rant over. Ahem.

    The way this works is that if you are using MS Visual Studio, strtok_r is defined as a macro, so that the compiler actually sees strtok_s instead. Otherwise, the _POSIX_C_SOURCE macro is defined before including the header, so that these POSIX.1-2001 functions should be provided by the standard C library. (Actually, I set the value so that up to POSIX.1-2008 should be available, just in case.)

    I have verified that this works in non-Windows systems, but I cannot test it with MS Visual Studio. Would anyone care to confirm?

  10. #40
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    * Some posts in this thread were moved to awsdert's utility library *

    Quote Originally Posted by Nominal Animal
    To those who prefer to use strtok(), but dislike its drawbacks:

    You can use strtok_r() on all POSIX.1-2001 systems (just about every OS except Windows), and strtok_s() in Microsoft Visual Studio (available since 2005, at least). Both have the exact same interface: the additional third parameter is a pointer to a char pointer. This char pointer is all the state the function needs, so these versions of the function are thread-safe and can be used on multiple strings at the same time.
    Yeah, it appears in the manual entry alongside strtok on my Linux distro, so I was aware of it.

    Quote Originally Posted by Nominal Animal
    I know Microsoft claims POSIX is not a standard (for example on the Security Features in the CRT page, "Another source of deprecation warnings, unrelated to security, is the POSIX functions. Replace POSIX function names with their standard equivalents (for example, change access to _access)", which is blatantly untrue, and the reason I hate to venture to MSDN.)
    POSIX is a standard, but it is not the C standard, so in that sense Microsoft is right, though it is poorly phrased: they are saying that as the authors of the implementation, they are providing these POSIX functions under names reserved to them by the C standard (C99 Clause 7.1.3 paragraph 1: "All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces.")

  11. #41
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by laserlight
    POSIX is a standard, but it is not the C standard, so in that sense Microsoft is right, though it is poorly phrased
    Okay, I guess that's true.
    Quote Originally Posted by laserlight
    C99 Clause 7.1.3 paragraph 1
    It *is* kind of funny, though, considering Microsoft's reluctance to implement said C99 standard. I'm not exactly sure if it is fair to assume your phrasing is interpreted based on a standard you have publicly stated you will not comply with -- even if some allowances have been made recently. (As far as I can see, ANSI C/C89/C90 does not contain a similar clause; the ones I found seem to only refer to reserved macro names.)

    Anyway, that was just my small rant. I'm sure I could keep such rants to myself, if there just was a sub-board to move or split POSIX-related C questions to. Many programming examples and algorithms -- including this one (using getline() and strtok_r() and/or opportunistic parsing via sscanf()) -- would be so much more robust and often simpler that way. It's not like writing a header file to provide the most useful functions in non-POSIX systems is at all difficult, making it easy to write partially-POSIXy C on any system, but I'm not sure if it would be received here as purely a tool for useful portability features. (It would be fun to fine-tune such a header, as a group effort, though.)

    I do feel I am being .. antagonistic? .. when I post this kind of POSIX-related stuff here, and that's not good. I'm a colleague, not an opponent. I really just want to help others write good, robust code (and not just code that works on their own machine), but it seems to backfire every now and then.

    Yes, this message is #2 in my ongoing series of pointing out why a POSIX C sub-board alongside the existing C and C++ ones would be especially nice and useful. If one were set up, however, I now publicly promise to write examples for the most common use cases and gotchas.

  12. #42
    Registered User
    Join Date
    Mar 2010
    Posts
    583
    Heh - a few years ago I heard on the grapevine that Microsoft had finally cracked and decided to support C11 (I don't mean C++11). It was a most reliable friend, so I went on the net and found pages of articles about why they resisted, why they changed their mind, what they would and wouldn't support and why. Interesting stuff.

    Search for it now though, and there is not a trace. Not a news article, not a blog or forum post -- nothing!! I don't feel strongly about C11, but the total disappearance makes me wonder if I was dreaming. I wasn't!!

    I have no major issues with Microsoft's compiler. Their OS API is pure evil though, stay far away. Their saving grace there is that they publish a lot of their source code, which is the only way to work out some of the nonsense.

    Come to think of it, it's nighttime now so I might do the huge download of Visual Studio. It is crashing all the time and seems to have lost half its DLLs and headers. Did I mention I have no gripes with Microsoft?

  13. #43
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Nominal Animal
    (As far as I can see, ANSI C/C89/C90 does not contain a similar clause; the ones I found seem to only refer to reserved macro names.)
    It does contain a similar clause. Refer to 4.1.2 paragraph 1: "All external identifiers that begin with an underscore are reserved."

    I do know an obvious example where Microsoft has adopted a C99 or later feature: // style comments

  14. #44
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by laserlight
    It does contain a similar clause. Refer to 4.1.2 paragraph 1: "All external identifiers that begin with an underscore are reserved."
    Right you are.

    I still don't understand Microsoft's reasoning for this wording, nor the deprecation warnings. In particular, they don't seem to have the warnings or use the underscore prefix for their *own* additions. After all, ANSI C allows any function that starts with str or mem to be defined in string.h. Yet, strtok_r() is somehow a nonstandard deprecated function, whereas strtok_s() is a "secure" version of a standard function. Both work exactly the same, the only difference being the name.

    No, I think this wording is just another Microsoft attempt at vendor lock-in.

  15. #45
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Nominal Animal
    In particular, they don't seem to have the warnings or use the underscore prefix for their *own* additions. (...) Yet, strtok_r() is somehow a nonstandard deprecated function, whereas strtok_s() is a "secure" version of a standard function.
    I think the reason is that some parties in Microsoft were pushing for standardisation of the _s variety of functions, and since they did manage to get their way (e.g., strtok_s is an optional part of the C11 standard library), there was then no need to deprecate them.

    Quote Originally Posted by Nominal Animal
    After all, ANSI C allows any function that starts with str or mem to be defined in string.h.
    Yes, but this rule is evidently for "future library directions".

    Quote Originally Posted by Nominal Animal
    Both work exactly the same, the only difference being the name.
    Not quite: strtok_r does not have the size parameter that is found in the C11 strtok_s (Microsoft's own strtok_s, however, takes the same three parameters as strtok_r).
