Thread: How to work with strings containing possibly several NULLs

  1. #1
    Registered User
    Join Date
    Jan 2004
    Posts
    13

    How to work with strings containing possibly several NULLs

    I am working on a mail environment which requires me to retrieve mails from a POP3 server. For everyone unfamiliar with how POP works:

    You issue CRLF-terminated commands and retrieve answers of unknown physical length containing ASCII chars, terminated by a CRLF.CRLF

    If I now want to work on parts of that retrieved message - say extract information from the mail header - I'd have to use string-dependent functions - like strstr() or some regexing stuff. But, as I mentioned before, the answer may contain every ASCII char - even the '\0'/NULL character.

    Hence, how would you propose I should handle this data? Is there a string function which depends on a given length rather than on the NULL-byte? Or do I have to write some code which will iterate through all the chars of the answer, replace every NULL-byte with a sequence of my choice and later reverse that replacement so the integrity isn't destroyed?

    Sloede

  2. #2
    .
    Join Date
    Nov 2003
    Posts
    307
    memmove() moves (copies) memory from one place to another, based on the number of bytes.
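
    The same length-based idea carries over to searching: the standard mem*() functions (memchr(), memcmp(), memcpy(), memmove()) all take a byte count and never stop at '\0'. Below is a minimal sketch of a length-based replacement for strstr() built from memchr() and memcmp(). The name find_bytes() is made up for illustration; it is not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of needle (nlen bytes)
   in hay (hlen bytes). Returns a pointer to the match, or NULL if there
   is none. Embedded '\0' bytes are treated like any other byte. */
static const char *find_bytes(const char *hay, size_t hlen,
                              const char *needle, size_t nlen)
{
    assert(hay != NULL && needle != NULL);
    if (nlen == 0)
        return hay;
    while (hlen >= nlen) {
        /* Only look at start positions where a full match still fits. */
        const char *p = memchr(hay, needle[0], hlen - nlen + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, needle, nlen) == 0)
            return p;
        /* Skip past the failed candidate and keep searching. */
        hlen -= (size_t)(p - hay) + 1;
        hay = p + 1;
    }
    return NULL;
}
```

    The caller has to carry the buffer length around (for example the byte count returned by recv()) instead of relying on strlen().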

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > But, as I mentioned before, the answer may contain every ASCII char - even the '\0'/NULL character.
    Huh?
    I thought POP3 was a text protocol

    http://www.rfc-editor.org/rfc/rfc1939.txt
    Section 11
    11. Message Format

    All messages transmitted during a POP3 session are assumed to conform
    to the standard for the format of Internet text messages [RFC822].
    Anything which isn't printable will be encoded using one of several encoding mechanisms.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User
    Join Date
    Jan 2004
    Posts
    13
    Where did you find that anything which is not printable will be encoded?

    I just found that in RFC 822:

    The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon). The field-body may be composed of any ASCII characters, except CR or LF.

    [...]

    The body is simply a sequence of lines containing ASCII characters.
    That means to me that NULLs can very well be part of the header-field-contents or the message body, or am I getting this totally wrong?

  5. #5
    Obsessed with C chrismiceli's Avatar
    Join Date
    Jan 2003
    Posts
    501
    null in ascii is 0, that is not between 33 and 126, so it mustn't be allowed. Null is also nonprintable, meaning it isn't printable, so there is another reason it shouldn't be allowed.
    Help populate a c/c++ help irc channel
    server: irc://irc.efnet.net
    channel: #c

  6. #6
    Registered User
    Join Date
    Jan 2004
    Posts
    13
    You're right, but that is only true for field-names. The field-body may contain any ASCII char except CR/LF, and the message body may contain ALL ASCII chars - and that's the problem I have:

    How to work with string functions on ASCII char-data if the data may contain NULL-bytes?

  7. #7
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    >>How to work with string functions on ASCII char-data if the data may contain NULL-bytes?
    You need to identify another method for recognising the end of the string, in this case, that'll be the CRLF I presume. Just write your own parsing routines to work on that basis.
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]
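
    A parsing routine that treats CRLF as the terminator, as suggested above, could look roughly like this. It is only a sketch: line_length() is a hypothetical name, and it assumes the whole response is already sitting in one buffer of known length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the length of the line that starts at buf,
   not counting its terminating CRLF, or (size_t)-1 if no CRLF occurs
   within the first len bytes. '\0' bytes inside the line are allowed. */
static size_t line_length(const char *buf, size_t len)
{
    assert(buf != NULL);
    const char *p = buf;
    size_t left = len;
    while (left >= 2) {
        /* Search for CR, leaving room for the LF that must follow it. */
        const char *cr = memchr(p, '\r', left - 1);
        if (cr == NULL)
            return (size_t)-1;
        if (cr[1] == '\n')
            return (size_t)(cr - buf);
        /* A lone CR: skip it and continue. */
        left -= (size_t)(cr - p) + 1;
        p = cr + 1;
    }
    return (size_t)-1;
}
```

    To walk a whole response, call it repeatedly and advance by the returned length plus 2 each time, stopping when it returns (size_t)-1.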

  8. #8
    Registered User
    Join Date
    Jan 2004
    Posts
    13
    But how would you suggest I rewrite the regexing functions? Go through the source code and change everything which doesn't fit?

    Seems like a hell of a lot of work

    But if you state that this is my solution - why not try it?


    Peace,

    Sloede

  9. #9
    Registered User
    Join Date
    Oct 2003
    Posts
    49
    I don't believe that there can be NULL bytes in an email, but if there really are, just replace them with some non-ASCII byte before processing.
    Every byte with a value of more than 127 is not an ASCII character. So there can't be any bytes bigger than 127 in an email. That means you can e.g. replace all '\0' with 0xFF.
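
    A rough sketch of that substitution, assuming the message is held in a buffer with a known length. The names contains_byte() and replace_byte() are invented for this example. The swap can only be undone cleanly if 0xFF does not already occur in the data, so the sketch includes a check for that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if value occurs in the first len bytes. */
static int contains_byte(const unsigned char *buf, size_t len,
                         unsigned char value)
{
    assert(buf != NULL);
    return memchr(buf, value, len) != NULL;
}

/* Hypothetical helper: replace every byte equal to from with to,
   in place. Returns the number of bytes changed. */
static size_t replace_byte(unsigned char *buf, size_t len,
                           unsigned char from, unsigned char to)
{
    assert(buf != NULL);
    size_t changed = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == from) {
            buf[i] = to;
            changed++;
        }
    }
    return changed;
}
```

    Typical use would be: check contains_byte(buf, len, 0xFF) is false, call replace_byte(buf, len, 0x00, 0xFF), add a terminating '\0' and run the usual string functions, then call replace_byte(buf, len, 0xFF, 0x00) to restore the original bytes.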

  10. #10
    Registered User
    Join Date
    Jan 2004
    Posts
    13
    That makes sense! Especially when there's no obvious reason for a NULL-byte, there shouldn't be a great loss of information by changing it to some other byte. I'll probably choose some non-printable char as well, as that wouldn't even change the appearance of the text (like the ACK byte for example)
