Thread: adjusting character counts for utf8

  1. #16
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    I suppose you could always use the -E flag to make sure strcasestr is actually there (as this is the warning I would expect if the prototype was missing). (Are you using -std or -ansi flags?)

  2. #17
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by tabstop View Post
    I suppose you could always use the -E flag to make sure strcasestr is actually there (as this is the warning I would expect if the prototype was missing). (Are you using -std or -ansi flags?)
    Oh it's there. This works:
    Code:
    #include <string.h>
    #include <stdio.h>
    
    int main() {
    	char *ptr=strstr("this and that","and");
    	printf("%p %s\n",ptr,ptr);
    	ptr=strstr("this and that","and");
    	printf("%p %s\n",ptr,ptr);
    	return 0;
    }
    But obviously it's useless for illustrating my point. I have the same problem on my other computer, which is running a slightly older glibc/gcc (FC7-32 vs. FC10-64).

    I'm not using the flags you mention, but I see no good reason for strcasestr to cause a segfault like that.

    What do you think about what I was trying to say about the pointer addresses? How could two pointers with two values which appear different to me effectively point to the same place (like I said, the first find would be fine, but then the subsequent search would go awry because of the address returned from the first find -- meaning that pointer was no good for arithmetic)??
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  3. #18
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Maybe I got lost with what you were asking on the pointer question -- just because two pointers point to the same content doesn't mean they point to the same place.

    I can't find anything weird about strcasestr; I get normal things, but I'm not using utf8 characters.

  4. #19
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by tabstop View Post
    Maybe I got lost with what you were asking on the pointer question -- just because two pointers point to the same content doesn't mean they point to the same place.
    !!
    huh...I had presumed that there is one copy of the buffer in memory, and that any pointer into that content must point into somewhere in that one copy.

    Quote Originally Posted by tabstop View Post
    I can't find anything weird about strcasestr; I get normal things, but I'm not using utf8 characters.
    So the incredibly simple example above, ptr=strcasestr(haystack,needle), does not segfault for you? There are not utf8 characters in that.

    I'd be curious to know, since I'm writing for other linux systems, and I would say FC7 and FC10 are very common. If it works on other systems, then our strcasestr is a dud.

  5. #20
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    I was using BSD/Mac OSX, and the "this and that" example worked fine.

    I have no idea how (your version of) strcasestr is implemented -- I would expect it to point to the original piece of memory, but I don't think it's guaranteed. Maybe it is. Completely wild guess -- does strcasestr make a copy so that it can tolower/toupper the entire string and then use that memory? (I wouldn't think so, but hey.)

  6. #21
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> test.c:8: warning: assignment makes pointer from integer without a cast
    That's bad news. Compile with '-Wimplicit'. I think tabstop was right-on about the prototype being missing. Some online man pages say _GNU_SOURCE is required.

    gg
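For anyone following along, here is a minimal sketch of what the man pages seem to be describing, assuming glibc: the feature-test macro has to be defined before the first #include, otherwise <string.h> may not declare strcasestr at all. The helper name find_nocase is made up for illustration and is not part of any library.

```c
/* Must come before any #include so that glibc's headers expose GNU extensions. */
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first case-insensitive match of
   needle in haystack, or -1 if there is no match. */
ptrdiff_t find_nocase(const char *haystack, const char *needle)
{
    /* With the prototype visible, the return value is a real char *,
       so subtracting haystack from it gives a valid offset. */
    const char *hit = strcasestr(haystack, needle);
    return hit ? hit - haystack : -1;
}
```

Calling find_nocase("this AND that", "and") should give the offset of the 'A'. If the compiler still warns about an implicit declaration here, the macro is probably being defined too late, or the C library is not exposing the function under that macro.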

  7. #22
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Well, I tried _GNU_SOURCE, -Wimplicit, even include <ctype.h> etc. No dice. Then I just added my own prototype at the top:
    Code:
    char *strcasestr (const char *haystack, const char *needle);
    and it works. I'm kind of surprised; this has obviously been the case for several years.

    Anyway, I won't be using strcasestr because, as tabstop suggests, it might return a pointer into a copy of (part of) haystack rather than into haystack itself.

  8. #23
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    >> it may return a pointer into a copy
    I highly doubt it. All documentation suggests otherwise. Not to mention, there's no reason what-so-ever to make a copy.

    gg
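One way to settle the "copy" question empirically rather than by guessing is to check whether the returned pointer falls inside the original buffer. This is only a sketch, assuming glibc with _GNU_SOURCE; points_into and match_is_inside_haystack are hypothetical names invented for this example.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: true if p points at one of the characters of s
   (the terminating NUL is excluded). Addresses are compared as integers
   because relational comparison of pointers into different objects is
   not defined by the C standard. */
bool points_into(const char *s, const char *p)
{
    uintptr_t start = (uintptr_t)s;
    uintptr_t end = start + strlen(s);
    uintptr_t addr = (uintptr_t)p;
    return p != NULL && addr >= start && addr < end;
}

/* True only if strcasestr found a match AND the match lies inside haystack.
   A failed search (NULL) also yields false. */
bool match_is_inside_haystack(const char *haystack, const char *needle)
{
    return points_into(haystack, strcasestr(haystack, needle));
}
```

If match_is_inside_haystack ever returned false for a needle that is clearly present, that would support the copy theory. With a proper declaration in scope I would expect it to return true.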

  9. #24
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Codeplug View Post
    >> it may return a pointer into a copy
    I highly doubt it. All documentation suggests otherwise. Not to mention, there's no reason what-so-ever to make a copy.

    gg
    Yes, sorry, I meant "it may" as in "as far as I know, it's possible", not "as far as I know, it does".

    We didn't expect -Wimplicit to stop the segfault -- we expected -Wimplicit to give you a line that says "hey, I don't know what this strcasestr thing is" when you compiled.

  10. #25
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Codeplug View Post
    >> it may return a pointer into a copy
    I highly doubt it. All documentation suggests otherwise. Not to mention, there's no reason what-so-ever to make a copy.
    gg
    So would I. But then, I would include a prototype if I was writing a function for the C library! So who knows?

    Plus there is this, tho it may be just hearsay:
    Quote Originally Posted by me
    What do you think about what I was trying to say about the pointer addresses? How could two pointers with two values which appear different to me effectively point to the same place (like I said, the first find would be fine, but then the subsequent search would go awry because of the address returned from the first find -- meaning that pointer was no good for arithmetic)??
    In fact, I had to put a control in so that if the return pointer was beyond the end of the search, the search would end (otherwise there was another seg fault for submitting this mystery pointer back to strcasestr). There's a toggle on the GUI to determine the case sensitivity, and lots of debugging to the console, so I could sit and just toggle the button on and off and hit return to do the exact same search on the exact same buffer, over and over. In both cases, the result in the GUI was the same -- str(case)str found the first term. But the pointer value in the debug output was, as I said, outside the buffer for strcasestr. Because I was just substituting one function for another, there was nothing in my code which could account for the discrepancy.
    Perhaps intentionally leaving the prototype out is a coy way of indicating something...

  11. #26
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    An implicitly declared function is treated as returning int. That would be bad news on a 64bit system.

    gg
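To make the truncation concrete, here is a rough model of what happens to an address that gets squeezed through an int. It assumes a 32-bit int, 64-bit pointers, and sign extension when the int is widened back, which is typical of x86-64 Linux but is not something the C standard promises. The function name through_int and the sample addresses mentioned below are invented for illustration.

```c
#include <stdint.h>

/* Models a 64-bit address that is returned as a 32-bit int and then
   converted back to a pointer-sized value. */
uint64_t through_int(uint64_t addr)
{
    uint64_t low = addr & 0xFFFFFFFFULL;   /* only the low 32 bits survive */
    if (low & 0x80000000ULL)               /* "negative" int: sign-extended */
        low |= 0xFFFFFFFF00000000ULL;      /* when widened back to 64 bits */
    return low;
}
```

Under these assumptions a high address such as 0x00007f3a12345678 comes back as 0x12345678, which points somewhere else entirely, while a low address such as 0x08049f10 survives unchanged. That would be consistent with the same code appearing to work in some builds and segfaulting in others.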

  12. #27
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by MK27 View Post
    So would I. But then, I would include a prototype if I was writing a function for the C library! So who knows?
    But ... but ... when I specifically asked whether a prototype was included in the preprocessor output (what you see with -E) you said "Oh it's there." So which is it?

  13. #28
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by tabstop View Post
    But ... but ... when I specifically asked whether a prototype was included in the preprocessor output (what you see with -E) you said "Oh it's there." So which is it?
    What you actually wrote, being a smart, but perhaps a little inscrutable or cryptic person, who occasionally seems too lazy to completely explain him or herself to ignoramuses like me, is:
    I suppose you could always use the -E flag to make sure strcasestr is actually there (as this is the warning I would expect if the prototype was missing).
    Using the -E flag just made me roll my eyes. If I knew what it was for, I suppose that might have meant the same thing as what you just wrote. Sorry! You know there is just too much information, and people like myself sometimes have to wear blinders so we don't get distracted by too many new things.

    Anyway, I was of the belief that if a function call actually works, then the function must be somewhere. That someone could include a function in a library somewhere without a prototype (as I guess is the case) wouldn't have occurred to me, especially since I don't think gcc would let me do that.

    But thanks for your help, tabstop. I can live with "somewhat cryptic".

  14. #29
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Codeplug View Post
    An implicitly declared function is treated as returning int. That would be bad news on a 64bit system.
    gg
    Could this explain the pointer weirdness? I would think the value would have to be the same anyway...although I notice on my 64 bit system, a pointer is actually twice as big as an int. Hmmm. Only with a very large buffer. Hmmm.

    If I can find an older version of the code (way back when I was still using strcasestr) I will try it on the 32.

  15. #30
    In my head happyclown's Avatar
    Join Date
    Dec 2008
    Location
    In my head
    Posts
    391
    Quote Originally Posted by MK27 View Post
    What you actually wrote, being a smart, but perhaps a little inscrutable or cryptic person, who occasionally seems too lazy to completely explain him or herself...
    You almost got it 100% right.

    But that's what makes tabstop 85%-90% awesome.
    OS: Linux Mint 13(Maya) LTS 64 bit.
