Thread: How to convert string in url encoding to html encoding?

  1. #1
    Registered User
    Join Date
    Nov 2011
    Posts
    4

    How to convert string in url encoding to html encoding?

    Hello, I am using a testing tool that uses C code. It has a function that converts from HTML to URL or PLAIN, but not from URL to HTML. Does anyone have code that would convert the following

    Code:
    "3%2635%3Dsaptest15001%2C8P%26S5%2C88%2663%3DsecEnterprise%2C8P%264F%3D462265%2C8P%264E%3D642037JfUyOifEckq6pX62%2C8P%26pa%2C8P%26Tm%3D3650%2C83%261%2C8P%262r%3Dcgar-bobjqapp2%3A6400%2C8P%26Tn%3D{3%26.2%3D{3%26O%3DPersonalCategory%2C0P%262%3D462327%2C03}%2C2z%26.1%3D{3%26O%3DFavoritesFolder%2C0P%262%3D462326%2C03}%2C2z%26.3%3D{3%26O%3DInbox%2C0P%262%3D462328%2C03}%2C2z%26U%3D3%2C03}%2C%3Fz%263k%3D%40BOBJ_QA%2C8P%265U%3D642038JhpJ4sSqGXzP9rYn642037JfUyOifEckq6pX62%2C8P"
    
    to
    
    "3&amp;35=saptest15001,8P&amp;S5,88&amp;63=secEnterprise,8P&amp;4F=462265,8P&amp;4E=642037JfUyOifEckq6pX62,8P&amp;pa,8P&amp;Tm=3650,83&amp;1,8P&amp;2r=cgar-bobjqapp2:6400,8P&amp;Tn={3&amp;.2={3&amp;O=PersonalCategory,0P&amp;2=462327,03},2z&amp;.1={3&amp;O=FavoritesFolder,0P&amp;2=462326,03},2z&amp;.3={3&amp;O=Inbox,0P&amp;2=462328,03},2z&amp;U=3,03},?z&amp;3k=@BOBJ_QA,8P&amp;5U=642038JhpJ4sSqGXzP9rYn642037JfUyOifEckq6pX62,8P"

    I've run out of ideas on how to accomplish this task. I would like to be able to reuse the code in the future as I am sure I will see this need again.

  2. #2
    Registered User
    Join Date
    Aug 2010
    Posts
    231
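    It's very easy with string functions like:
    Code:
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>
    
    /* Prints s to stdout with %XX sequences decoded; characters listed in
       pre are written as the matching HTML entity from ent. */
    void transformed(const char *s)
    {
      const char *ent[]={"&amp;","&lt;","&gt;","&quot;","&nbsp;"},
      *pre="&<>\" ", *p;
      char c[2]={0};
      unsigned x;
      while( *s )
        /* decode only a '%' followed by exactly two hex digits */
        if( *s=='%' && isxdigit((unsigned char)s[1])
            && isxdigit((unsigned char)s[2]) && sscanf(s+1,"%2x",&x)==1 )
        {
          s+=3;
          p=x?strchr(pre,(int)x):NULL; /* strchr(pre,0) would match the terminator */
          printf("%s",p?ent[p-pre]:(*c=(char)x,c));
        }
        else
          putchar(*s++);
    }
    
    transformed("3%2635%3Dsaptest15001%2C8P%26S5%2C88%2663%3DsecEnterprise%2C8P...");
    Changing the code to write into a string instead of stdout is left to you.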
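
    A minimal sketch of that change, assuming the caller supplies the output buffer. The name url_to_html and its signature are made up for illustration, and only & < > " are escaped; a decoded %00 is dropped rather than written.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of any library: decodes %XX sequences in
   `in` and writes the result to `out`, escaping & < > " as named HTML
   entities. `cap` is the size of `out` including the terminator.
   Returns the output length, or -1 if `out` is too small. */
int url_to_html(const char *in, char *out, size_t cap)
{
    size_t j = 0;
    assert(in != NULL && out != NULL);

    while (*in) {
        char ch = *in;
        char one[2] = {0};
        const char *piece = one;
        size_t n;

        /* Decode only a '%' followed by two hex digits. */
        if (ch == '%' && isxdigit((unsigned char)in[1])
                      && isxdigit((unsigned char)in[2])) {
            char hex[3] = { in[1], in[2], '\0' };
            ch = (char)strtol(hex, NULL, 16);
            in += 3;
        } else {
            in++;
        }

        switch (ch) {
        case '&': piece = "&amp;";  break;
        case '<': piece = "&lt;";   break;
        case '>': piece = "&gt;";   break;
        case '"': piece = "&quot;"; break;
        default:  one[0] = ch;      break;
        }

        n = strlen(piece);
        if (j + n >= cap)          /* keep one byte for the terminator */
            return -1;
        memcpy(out + j, piece, n);
        j += n;
    }
    if (j >= cap)
        return -1;
    out[j] = '\0';
    return (int)j;
}
```

    For example, passing "3%2635%3Dsaptest15001%2C8P" with a large enough buffer yields 3&amp;35=saptest15001,8P. Since the worst case expands one input byte to six output bytes, a buffer of 6 * strlen(in) + 1 bytes is always enough.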

  3. #3
    Registered User
    Join Date
    Nov 2011
    Posts
    4
    First, thank you for the reply. I am always willing to learn, but I don't understand the logic. How do you convert 3%2635%3Dsaptest15001%2C8P%26S5%2C88%2663%3DsecEnterprise%2C8P to 3&amp;35=saptest15001,8P&amp;S5,88&amp;63=secEnterprise,8P? It appears your code is going the other way, as "transformed" is what I am converting from. Thx.

  4. #4
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    HTML will also accept decimal codes, e.g., &amp; is equivalent to &#38;. The number here corresponds to the ASCII value of the character.

    URL encodings also correspond to ASCII values, except they are in hexadecimal. So:

    '&' = ASCII decimal 38, hexadecimal 26.
    &#38; is an HTML entity for &.
    %26 is the URL encoding for &.
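
    To make the relationship concrete, here is a small sketch that formats one character both ways from its ASCII value. The helper names are invented for this example; both just wrap snprintf and return the number of characters written.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes c as a URL escape, i.e. '%' followed by
   the ASCII value as two hexadecimal digits ('&' gives %26). */
int url_escape(unsigned char c, char *out, size_t cap)
{
    assert(out != NULL);
    return snprintf(out, cap, "%%%02X", (unsigned)c);
}

/* Hypothetical helper: writes c as a numeric HTML entity, i.e. the same
   ASCII value in decimal between "&#" and ";" ('&' gives the entity 38). */
int html_numeric_entity(unsigned char c, char *out, size_t cap)
{
    assert(out != NULL);
    return snprintf(out, cap, "&#%u;", (unsigned)c);
}
```

    Both functions take the same input value; only the base and the surrounding punctuation differ.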

    If you apply that, it's easy to come up with functions that translate the entire range of restricted characters. E.g., here's one I've used for turning URLs into plain text:

    Code:
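    #include <stdlib.h>
    #include <string.h>
    
    /* Decodes URL-encoded str into copy, which needs room for
       strlen(str)+1 bytes. Returns the decoded length. */
    int URIdecode (const char *str, char *copy) {
            int len = strlen(str), i, j = 0;
            char hex[3] = {0};
    
            for (i = 0; i < len; i++) {
                    if (str[i] == '%' && i < len-2) {
                            /* skip the '%' and convert the two hex digits after it */
                            i++;
                            strncpy(hex, &str[i++], 2);
                            copy[j] = strtol(hex, NULL, 16);
                    } else if (str[i] == '+') copy[j] = ' ';  /* '+' encodes a space */
                    else copy[j] = str[i];
                    j++;
            }
            copy[j] = '\0';
    
            return j;
    }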
    Tweak that a bit and it will put out (numerical) HTML entities instead.
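
    One possible shape for that tweak, as a sketch only. URItoHTML is a made-up name, and the choice to emit entities just for & < > " ' (and to drop a decoded %00) is an assumption; everything else is copied through as in URIdecode.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of URIdecode: decodes %XX and '+', then writes
   HTML-special characters as decimal numeric entities. `cap` is the
   size of `copy`. Returns the output length, or -1 if it does not fit. */
int URItoHTML(const char *str, char *copy, size_t cap)
{
    size_t len, i, j = 0;
    assert(str != NULL && copy != NULL);
    len = strlen(str);

    for (i = 0; i < len; i++) {
        int ch = (unsigned char)str[i];
        int n;

        if (ch == '%' && i + 2 < len
                && isxdigit((unsigned char)str[i + 1])
                && isxdigit((unsigned char)str[i + 2])) {
            char hex[3] = { str[i + 1], str[i + 2], '\0' };
            ch = (int)strtol(hex, NULL, 16);
            i += 2;                 /* the loop's i++ skips the last digit */
        } else if (ch == '+') {
            ch = ' ';
        }
        if (ch == 0)                /* drop %00 instead of cutting the string */
            continue;
        if (j >= cap)
            return -1;

        /* snprintf writes at most cap - j bytes, terminator included. */
        if (strchr("&<>\"'", ch) != NULL)
            n = snprintf(copy + j, cap - j, "&#%d;", ch);
        else
            n = snprintf(copy + j, cap - j, "%c", ch);
        if (n < 0 || (size_t)n >= cap - j)
            return -1;              /* output was truncated */
        j += (size_t)n;
    }
    if (j >= cap)
        return -1;
    copy[j] = '\0';
    return (int)j;
}
```

    With this sketch, "a%26b%3Dc+d" comes out as a, the numeric entity for & (decimal 38), then b=c d.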
    Last edited by MK27; 11-05-2011 at 03:51 PM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  5. #5
    Registered User
    Join Date
    Nov 2011
    Posts
    4
    This is the code I was working on, trying to get S_Cookie1 to convert to S_CORRECT. I was going through each URL-encoded sequence and trying to convert it to HTML, but it was overwriting itself and the code was choking the compiler.

    Code:
    lr_save_string("3%2635%3Dsaptest15001%2C8P%26S5%2C88%2663%3DsecEnterprise%2C8P%264F%3D462265%2C8P%264E%3D642037JfUyOifEckq6pX62%2C8P%26pa%2C8P%26Tm%3D3650%2C83%261%2C8P%262r%3Dcgar-bobjqapp2%3A6400%2C8P%26Tn%3D%7B3%26.2%3D%7B3%26O%3DPersonalCategory%2C0P%262%3D462327%2C03%7D%2C2z%26.1%3D%7B3%26O%3DFavoritesFolder%2C0P%262%3D462326%2C03%7D%2C2z%26.3%3D%7B3%26O%3DInbox%2C0P%262%3D462328%2C03%7D%2C2z%26U%3D3%2C03%7D%2C%3Fz%263k%3D%40BOBJ_QA%2C8P%265U%3D642038JhpJ4sSqGXzP9rYn642037JfUyOifEckq6pX62%2C8P","S_Cookie1");
     lr_save_string("3&amp;35=saptest15001,8P&amp;S5,88&amp;63=secEnterprise,8P&amp;4F=462265,8P&amp;4E=642037JfUyOifEckq6pX62,8P&amp;pa,8P&amp;Tm=3650,83&amp;1,8P&amp;2r=cgar-bobjqapp2:6400,8P&amp;Tn={3&amp;.2={3&amp;O=PersonalCategory,0P&amp;2=462327,03},2z&amp;.1={3&amp;O=FavoritesFolder,0P&amp;2=462326,03},2z&amp;.3={3&amp;O=Inbox,0P&amp;2=462328,03},2z&amp;U=3,03},?z&amp;3k=@BOBJ_QA,8P&amp;5U=642038JhpJ4sSqGXzP9rYn642037JfUyOifEckq6pX62,8P","S_CORRECT");
     
     sprintf(c_workSpaceBefore,"%s",lr_eval_string("{S_Cookie1}"));
     f_STR_replaceSubString(c_workSpaceAfter,c_workSpaceBefore,"%26","&amp;"); /* replaces %26 with &amp;  */
     //strrepl(c_workSpaceAfter,c_workSpaceBefore,"%26","&amp;");
     lr_error_message("c_workSpaceAfter %s", c_workSpaceAfter);
     lr_error_message("c_workSpaceBefore %s", c_workSpaceBefore);
     sprintf(c_workSpaceBefore,"%s", c_workSpaceAfter); // avoid overwrite workspaceafter
    
     //f_STR_replaceSubString(c_workSpaceAfter,c_workSpaceBefore,"%2C",","); /* replaces %2C with ,  */
     strrepl(c_workSpaceAfter,c_workSpaceBefore,"%2C",",");
     lr_error_message("c_workSpaceAfter %s", c_workSpaceAfter);
     lr_error_message("c_workSpaceBefore %s", c_workSpaceBefore);
     sprintf(c_workSpaceBefore,"%s", c_workSpaceAfter); // avoid overwrite workspaceafter
    
     //f_STR_replaceSubString(c_workSpaceAfter,c_workSpaceBefore,"%3D","="); /* replaces %3D with =  */
     strrepl(c_workSpaceAfter,c_workSpaceBefore,"%3D","=");
     lr_error_message("c_workSpaceAfter %s", c_workSpaceAfter);
     lr_error_message("c_workSpaceBefore %s", c_workSpaceBefore);
     sprintf(c_workSpaceBefore,"%s", c_workSpaceAfter); // avoid overwrite workspaceafter
     //f_STR_replaceSubString(c_workSpaceAfter,c_workSpaceBefore,"%40","@"); /* replaces %40 with @  */
     strrepl(c_workSpaceAfter,c_workSpaceBefore,"%40","@");
     sprintf(c_workSpaceBefore,"%s", c_workSpaceAfter); // avoid overwrite workspaceafter
     lr_error_message("c_workSpaceAfter %s", c_workSpaceAfter);
     lr_error_message("c_workSpaceBefore %s", c_workSpaceBefore);
     //f_STR_replaceSubString(c_workSpaceAfter,c_workSpaceBefore,"%3A",":"); /* replaces %3A with :  */
     strrepl(c_workSpaceAfter,c_workSpaceBefore,"%3A",":");
     sprintf(c_workSpaceBefore,"%s", c_workSpaceAfter); // avoid overwrite workspaceafter
     lr_error_message("c_workSpaceAfter %s", c_workSpaceAfter);
     lr_error_message("c_workSpaceBefore %s", c_workSpaceBefore);
     //f_STR_replaceSubString(c_workSpaceAfter,c_workSpaceBefore,"%7B","{"); /* replaces %7B with {  */
     strrepl(c_workSpaceAfter,c_workSpaceBefore,"%7B","{");
     sprintf(c_workSpaceBefore,"%s", c_workSpaceAfter); // avoid overwrite workspaceafter
     lr_error_message("c_workSpaceAfter %s", c_workSpaceAfter);
     lr_error_message("c_workSpaceBefore %s", c_workSpaceBefore);
     //f_STR_replaceSubString(c_workSpaceAfter,c_workSpaceBefore,"%7D","}"); /* replaces %7D with }  */
     strrepl(c_workSpaceAfter,c_workSpaceBefore,"%7D","}");
     lr_save_string(c_workSpaceAfter,"S_Cookie1_html");
     lr_error_message("S_Cookie1_html %s", lr_eval_string("{S_Cookie1_html}"));
     lr_error_message("S_CORRECT %s", lr_eval_string("{S_CORRECT}"));

  6. #6
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Jerel2k11
    This is code I was working on trying to get S_Cookie1 to convert to S_CORRECT. I was going through each URL encoded and trying to convert to HTML but was overwriting itself and the code was choking the compiler.
    That is a very awkward way to do it IMO.

    I had to re-edit my last post because of a vBulletin glitch with numerical entities, so you may want to re-read it now that it is fixed. The point is, it is easiest to work with ASCII values when encoding URLs and HTML, rather than treating every single possibility as a special case.

    Of course, the fact that HTML also uses labels for some entities makes decoding more awkward (you need a few special cases), but that is not an issue with URL-encoding.
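
    To illustrate what "a few special cases" looks like in the HTML-decoding direction, here is a rough sketch. The function name and the four-entry table are assumptions for the example; real HTML defines many more named entities, and anything not recognized here is copied through unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes decimal numeric entities by value and a
   short table of named entities as special cases. The output is never
   longer than the input, so `out` needs strlen(in) + 1 bytes.
   Returns the decoded length. */
size_t html_decode(const char *in, char *out)
{
    static const struct { const char *name; char ch; } named[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' }, { "&quot;", '"' }
    };
    const size_t count = sizeof named / sizeof named[0];
    size_t j = 0, k;
    assert(in != NULL && out != NULL);

    while (*in) {
        if (in[0] == '&' && in[1] == '#' && isdigit((unsigned char)in[2])) {
            /* Numeric form: the digits are the ASCII value in decimal. */
            char *end;
            long v = strtol(in + 2, &end, 10);
            if (*end == ';' && v > 0 && v < 128) {
                out[j++] = (char)v;
                in = end + 1;
                continue;
            }
        } else if (in[0] == '&') {
            /* Named form: each label has to be listed explicitly. */
            for (k = 0; k < count; k++) {
                size_t n = strlen(named[k].name);
                if (strncmp(in, named[k].name, n) == 0) {
                    out[j++] = named[k].ch;
                    in += n;
                    break;
                }
            }
            if (k < count)
                continue;
        }
        out[j++] = *in++;           /* ordinary or unrecognized text */
    }
    out[j] = '\0';
    return j;
}
```

    The numeric branch needs no table at all, which is the contrast with URL decoding: there, every escape is numeric.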
    Last edited by MK27; 11-05-2011 at 04:04 PM.

  7. #7
    Registered User
    Join Date
    Nov 2011
    Posts
    4
    MK27, I understand your comments; however, my concern is that the converted string needs to be passed back to the server in the S_CORRECT format. It gets passed back in a SOAP envelope. I also agree that the method we were trying to get working is awkward, but we were grasping at straws trying to come up with something. I can't remember if I mentioned this, but our test tool has a function that will convert from HTML to URL or HTML to PLAIN, but not the other way around. This code would be used in a test script, so I would have to account for any delay the conversion adds.
