Thread: strstr and utf-8

  1. #1
    MK27 (spurious conceit)
    Join Date: Jul 2008 | Location: segmentation fault | Posts: 8,300

    strstr and utf-8

    If I want to use strstr() on an unsigned char string (the "haystack") containing UTF-8 characters, and the "needle" (the string to search for) is a signed char string, will that matter?

    With normal ASCII characters I know it does not, but I don't use multi-byte alphabets myself, so it's hard for me to verify that a search involving UTF-8 in both the unsigned and the signed string will match. The haystack is unsigned because that makes it easier to count the extra UTF-8 bytes (for another purpose), but I don't need to do that with the needle, which is actually a (signed) gchar string returned by an API function. So I'd rather leave them both as they are, if that's okay. I suspect it is, but figured I'd better ask.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  2. #2
    Codeplug (Registered User)
    Join Date: Mar 2003 | Posts: 4,981
    Should be OK. The danger comes from sign extension, and an implementation would be pretty dumb to allow that in any of its char* string functions.

    gg

  3. #3
    MK27 (spurious conceit)
    Yay.
