Thread: Url query encoding/decoding

  1. #1
    Registered User
    Join Date
    Mar 2005
    Location
    Juneda
    Posts
    291

    Url query encoding/decoding

    Hello, I have been developing a simple server that can receive data via the POST method, but there is something about URL encoding/decoding that I can't get clear. A script about 'ISO-8859-1 (ISO-Latin) Character Set Encoding - Decoding' says the following (this is only one line; if someone wants to look at the entire script I can post it):

    Code:
    32-47 -> Reserved Characters -> ' '!?#$%&'()*+,-./ -> Unsafe
    That means the characters from ASCII 32 to 47 are unsafe, so they should be encoded before sending and decoded after receiving. OK.

    My question is about the '+' character and the URL query. That same script also explains the encode/decode keys, and the entry for '+' is

    Code:
    + 	Indicates a space (spaces cannot be used in a URL)  %20
    If browsers follow those encoding rules, why does my nsn7 send something like this?

    Code:
    url?name=my+name
    According to the rules, shouldn't it send 'my%20name' to encode 'my name'? Or is the query encoded using different rules? I suppose not, because all the other characters (well, I haven't tested every unsafe character) are encoded using those rules. So why is the space character encoded using an unsafe character instead of its hex equivalent?

    Thanks in advance
    Niara

  2. #2
    Officially An Architect brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Niara
    but why is the space character encoded using an unsafe character instead of its hex equivalent?
    The reserved characters aren't "unsafe," they are reserved for special uses. In this case, the '+' is special because it is reserved to mean "space."

  3. #3
    Registered User
    Join Date
    Mar 2005
    Location
    Juneda
    Posts
    291
    Hello brewbuck, thanks for your time and help.

    But is the query encoded differently from the rest of the URL? I mean that a URL pointing to my PC, for example

    C:/My documents/mypage.html?name=my name

    will be encoded as

    C:/My%20documents/mypage.html?name=my+name

    so the (let me call it) 'directory part' of the URL encodes the space with its ASCII hex representation, but the query part does not, because if I use a query like

    two words

    will be encoded as

    two+words

    but if I use

    two"words

    will be encoded as

    two%22words

    as it was in the URL part. That is what I can't get clear: why can the space be encoded with a '+' in the query part, while in the URL part it is encoded with the ASCII hex value?

    Thanks again in advance.
    Niara

  4. #4
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Spaces should be encoded as "%20", but the use of "+" instead is discussed as "application/x-www-form-urlencoded type" in the Wikipedia entry on "Percent-encoding".

  5. #5
    Registered User
    Join Date
    Mar 2005
    Location
    Juneda
    Posts
    291
    Hello laserlight, thanks for your help and time.

    I hadn't realized there was an "application/x-www-form-urlencoded" type, and that is exactly what I'm using (the browser's default form encoding).

    Thanks to both brewbuck and laserlight.
    Niara

  6. #6
    Lean Mean Coding Machine KONI
    Join Date
    Mar 2007
    Location
    Luxembourg, Europe
    Posts
    444
    PHP distinguishes between urlencode() (where spaces are encoded as '+') and rawurlencode() (where spaces are encoded as '%20' using the percent method). The documentation for urlencode() says:
    Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
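
    The two behaviours described above can be sketched in C roughly as follows. This is only an illustration, not PHP's actual implementation: the function name url_encode and the space_as_plus flag are hypothetical, and the set of characters left unencoded (ASCII letters, digits and -_.) is taken from the quoted description.

```c
#include <stddef.h>

/* ASCII-only check: letters, digits and "-_." are left unencoded. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.';
}

/*
 * Percent-encode src into dst (hypothetical helper, for illustration).
 * If space_as_plus is nonzero, a space is written as '+' (form style);
 * otherwise it is written as "%20" like any other encoded byte.
 * dst must hold at least 3 * strlen(src) + 1 bytes.
 * Returns the length of the encoded string.
 */
size_t url_encode(char *dst, const char *src, int space_as_plus)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (; *src != '\0'; ++src) {
        unsigned char c = (unsigned char)*src;

        if (is_unreserved(c)) {
            dst[n++] = (char)c;
        } else if (c == ' ' && space_as_plus) {
            dst[n++] = '+';
        } else {
            dst[n++] = '%';
            dst[n++] = hex[c >> 4];   /* high nibble */
            dst[n++] = hex[c & 0x0F]; /* low nibble */
        }
    }
    dst[n] = '\0';
    return n;
}
```

    With space_as_plus set, "my name" comes out as "my+name"; with it cleared, as "my%20name". In both modes a literal '+' is written as "%2B", which is what lets a decoder tell it apart from an encoded space.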

  7. #7
    Registered User
    Join Date
    Mar 2005
    Location
    Juneda
    Posts
    291
    Hey KONI, thanks for your time and help (yes, I always use the same thank-you comment, but I think it's correct: 'for the help' is self-evident, and 'for the time' because you have spent some time reading, thinking, searching your knowledge db and posting).

    So the best solution will be to implement the decoder so that it translates both the '%hex' values and the '+' characters; that way it will work whether an application encodes the space as '+' or as '%20'.

    Thanks again
    Niara
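
    A decoder along the lines described in this post might look like the sketch below. The name form_urldecode is hypothetical, and copying a malformed escape through unchanged is just one possible policy (a stricter server could reject the request instead). Note that the '+' rule applies to form/query data only, not to the path part of a URL.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller guarantees isxdigit(c) is true. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/*
 * Decode an application/x-www-form-urlencoded string from src into dst.
 * Both '+' and "%20" become a space; any other "%XX" becomes that byte.
 * A '%' that is not followed by two hex digits is copied unchanged.
 * dst must hold at least strlen(src) + 1 bytes.
 * Returns the decoded length, which matters if the input contains "%00".
 */
size_t form_urldecode(char *dst, const char *src)
{
    size_t n = 0;

    while (*src != '\0') {
        if (*src == '+') {
            dst[n++] = ' ';
            src += 1;
        } else if (*src == '%' &&
                   isxdigit((unsigned char)src[1]) &&
                   isxdigit((unsigned char)src[2])) {
            dst[n++] = (char)(hex_value((unsigned char)src[1]) * 16 +
                              hex_value((unsigned char)src[2]));
            src += 3;
        } else {
            dst[n++] = *src;
            src += 1;
        }
    }
    dst[n] = '\0';
    return n;
}
```

    Both "my+name" and "my%20name" decode to "my name", while "a%2Bb" decodes to "a+b", so a literal plus sign survives the round trip as long as the sender encoded it.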
