Thread: Simple URL splitter and sscanf questions

  1. #1
    Registered User
    Join Date
    Nov 2010
    Posts
    21

    Simple URL splitter and sscanf questions

    Dear all,
    I need to parse the http_proxy environment variable in a Linux C program.
    Of course I know how to retrieve the variable, but I am not sure that I am splitting it into parts correctly. What I am interested in is the domain (or IP) part of the variable and the port.
    So, my code (simplified, of course) is:
    Code:
    unsigned int prxPort;
    char *prxProtocol, *prxUsername, *prxPass, *prxHostname, *envProxySrv, *envSproxySrv;
    
    envProxySrv=getenv("http_proxy");
    parsePrxSrv(envProxySrv);
    
    void parsePrxSrv(char *srv){
    if(strstr(srv, "http")){
            prxProtocol="http";
            sscanf(srv, "http://%[^:]:%99d", prxHostname, &prxPort);
            printf("Hostname: %s\n", prxHostname);
            printf("Port: %d\n", prxPort);
        }
    }
    In theory, that code should extract the hostname and the port from the given URL. However, it prints (null) and 0:
    Code:
    Hostname: (null)
    Port: 0
    OK, hostname is: (null)Port: 0
    Now, my questions:
    1.
    According to the sscanf(3) man page, prxHostname should be passed by reference, so the "correct" code should be:
    Code:
    sscanf(srv, "http://%[^:]:%99d", &prxHostname, &prxPort);
    But when I run the program (the compiler gives neither warnings nor errors), I get a "segmentation fault" when I reach the sscanf line. I believe the reason is that I have not initialized it correctly, is that right?
    So, I added these initializations:
    Code:
    unsigned int prxPort;
    char *prxProtocol, *prxUsername, *prxPass, *prxHostname, *envProxySrv, *envSproxySrv;
    
    envProxySrv=getenv("http_proxy");
    prxUsername="";
    prxHostname="";
    prxPass="";
    prxProtocol="";
    prxPort=8080;
    
    parsePrxSrv(envProxySrv);
    
    void parsePrxSrv(char *srv){
    if(strstr(srv, "http")){
            prxProtocol="http";
            sscanf(srv, "http://%[^:]:%99d", prxHostname, &prxPort);
            printf("Hostname: %s\n", prxHostname);
            printf("Port: %d\n", prxPort);
        }
    }
    Now, I had the segmentation fault either with the &prxHostname, or not. So, how do I correctly initialize it, if that is the reason nothing gets parsed?

    2. Why on earth doesn't that code work?

    PS: I ran the program in debug mode and set a watch on the envProxySrv variable. The value is not the "pure" text that I would expect (http://192.168.0.1:3128), but the following:
    0x7fffffffe6d3 "http://192.168.0.1:3128"

    That is that hex at the start? Is this the reason why the parser is not working?
    Last edited by tpe; 11-17-2010 at 02:09 PM.

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    But when I run the program (the compiler gives neither warnings nor errors), I get a "segmentation fault" when I reach the sscanf line. I believe the reason is that I have not initialized it correctly, is that right?
    Nope. The seg fault is because you declared char *prxHostname instead of char prxHostname[256]. You then point the char * at a string literal, "". The compiler locates string literals in a read-only memory segment, so prxHostname contains a read-only memory address, and that address is where sscanf tries to write the data it scanned. Trying to write to a read-only segment is a segmentation violation.

    Now, I had the segmentation fault either with the &prxHostname, or not.
    Passing &prxHostname is passing the address of a char *, the address of prxHostname, where sscanf can start writing the string it matched. prxHostname is only a few bytes (sizeof(char *)), so there isn't much room to put a whole host name, and it overflows into whatever is around it. Who knows what else the compiler put around there. Presumably your other prx variables, and maybe some string literals. You could generate the assembly code and check that out if you're really curious. Anyhow, again, when sscanf tries to write to that address, it again crosses some segment boundary.

    2. Why on earth doesn't that code work?
    Try fixing the declaration of any strings you're going to sscanf into, and see if that gets you any closer.
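    A minimal sketch of what that fix could look like, assuming a 256-byte buffer is acceptable. The name parse_host_port and the HOST_MAX constant are made up for illustration and are not part of the original program. The host name lands in a caller-owned array instead of a string literal, and the %255[^:] width keeps sscanf from writing past the end of it:

```c
#include <stdio.h>

/* Illustrative sketch: parse_host_port and HOST_MAX are assumptions,
   not names from the original program. */
#define HOST_MAX 256

/* Parses "http://host:port" into host[] and *port.
   Returns the number of fields sscanf converted (2 on success). */
static int parse_host_port(const char *srv, char host[HOST_MAX], unsigned int *port)
{
    /* host is writable storage owned by the caller, not a string literal.
       %255[^:] stops at ':' or after 255 characters, leaving room for '\0'.
       %5u matches the unsigned int destination (the original used %d). */
    return sscanf(srv, "http://%255[^:]:%5u", host, port);
}
```

    Called with "http://192.168.0.1:3128", a HOST_MAX-sized array, and the address of an unsigned int, this should leave 192.168.0.1 in the array and 3128 in the port.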

    That is that hex at the start? Is this the reason why the parser is not working?
    I assume you mean "What is that hex...". It looks like your debugger is showing you the value of the pointer envProxySrv (the address it holds) followed by the string stored at that address. I can't say for sure without knowing your debugger and how you specified your watch command, but it's not the cause of your problem.
    Last edited by anduril462; 11-17-2010 at 02:14 PM.

  3. #3
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Because you
    a) never check the return result of sscanf()
    b) never allocate any space for your strings.

    Simply declaring a char* and hoping for the best doesn't cut it. You have to do more than simply declare variables of a compatible type.
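    To illustrate point (a), here is a small sketch of what checking the result might look like. The function name scan_fields is hypothetical; the buffer is a real array, which also covers point (b). sscanf returns the number of fields it converted (or EOF), so anything other than 2 means the host, the port, or both are missing:

```c
#include <stdio.h>

/* Hypothetical helper: reports how many of the two fields (host, port)
   sscanf managed to convert from an "http://host:port" string. */
static int scan_fields(const char *srv)
{
    char host[256];          /* real storage for the host name */
    unsigned int port;
    int n = sscanf(srv, "http://%255[^:]:%5u", host, &port);

    /* n is 2 on full success, 1 if only the host matched,
       0 or EOF if nothing could be converted. */
    return n < 0 ? 0 : n;
}
```

    A caller would treat any result other than 2 as a parse failure and leave its previous host and port values alone.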
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User
    Join Date
    Nov 2010
    Posts
    21
    Guys, thank you for your answers. I suspected as much, but I have been away from C programming for years, and PHP and other scripting languages are very forgiving in that respect.
    So, the issue was indeed the sizes. I added the following local variable in the function:
    Code:
    	char pHost[256]="";
    and the sscanf is now:
    Code:
    	        sscanf(srv, "http://%[^:]:%99d", pHost, &prxPort);
    Of course, the question now is how to copy that to the global variable (prxHostname). With strncpy? strcpy? Something else? I am asking because I want to do it correctly.

  5. #5
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Why make it global?
    Why not just make some additional parameters to your split function, for where you want the answer stored.

  6. #6
    Registered User
    Join Date
    Nov 2010
    Posts
    21
    I am not sure that I understand your second question...
    About the global variable: from a design point of view, I don't want to mess with the rest of the code. In reality, the program is not mine; I am just modifying it to run in our new environment, which requires proxy servers. So I need a global variable (or a typedef, I don't know for sure right now which option is the most elegant and correct...) to store that data.
    Anyway, I have found a way to copy it, so I only need you to tell me whether it is:
    a. safe
    b. the correct way to do it.

    Code:
            sscanf(srv, "http://%[^:]:%99d", pHost, &prxPort);
    	prxHostname=malloc(sizeof pHost+1);
    	strncpy(prxHostname, pHost, sizeof pHost);
    One more thing:
    The fact that pHost is char[256] bothers me. Should I use a char[] instead? Or is it the same from a security point of view? I want to avoid buffer overflows, as you understand...
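    For comparison, a hedged sketch of a copy that allocates only what the string needs. dup_string is a hypothetical helper name; it assumes its argument is already NUL-terminated (which holds for pHost only if the sscanf above did not overflow it). The snippet above always allocates sizeof pHost+1 bytes regardless of the actual length and never checks malloc's result:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src sized to its actual
   length, or NULL if malloc fails. The caller must free() the result. */
static char *dup_string(const char *src)
{
    size_t len = strlen(src) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, src, len);     /* copies the '\0' as well */
    return copy;
}
```

    With that, the assignment would read prxHostname = dup_string(pHost); followed by a NULL check. On POSIX systems strdup() does the same job.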

  7. #7
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well you already do this
    parsePrxSrv(envProxySrv);


    So why not
    char host[256];
    parsePrxSrv(envProxySrv,host);

  8. #8
    Registered User
    Join Date
    Nov 2010
    Posts
    21
    Yes, that would be an option. But in that case, I have to assume that the maximum length can only be 255 characters. OK, I know the RFCs etc., but since we are talking about user input, I would like to keep it as safe as possible (not to mention that I am not sure passing a variable as an argument would do the trick in my case)...
    Anyway, I am open to suggestions. Still, I have to learn how to safely store the parsed string in the 255-character char array, because a malicious user could easily set a very large http_proxy variable and crash the program... How do I avoid that?

  9. #9
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    What function are you using to get your input? It better not be gets...

  10. #10
    Registered User
    Join Date
    Nov 2010
    Posts
    21
    envProxySrv=getenv("http_proxy");

  11. #11
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    You write your function the same way that fgets() is specified.

    As well as the output array pointer, you also pass the maximum length.
    void parsePrxSrv(const char *srv, char *host, size_t hostlen);

    Then you do something about your scanf call, which doesn't limit your hostname at all at the moment.
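    One possible shape for such a function, as a sketch rather than a drop-in replacement. It returns int instead of void so the caller can see failures (an assumption on top of the prototype above), and it sidesteps sscanf by measuring the host with strcspn before copying. It does not handle a user:password@ part:

```c
#include <string.h>

/* Sketch only: copies the host part of "http://host[:port]" into host,
   never writing more than hostlen bytes. Returns 0 on success, -1 on
   failure (the int return value is an assumption, the prototype in the
   post returns void). */
static int parsePrxSrv(const char *srv, char *host, size_t hostlen)
{
    static const char prefix[] = "http://";
    size_t plen = sizeof prefix - 1;
    size_t n;

    if (srv == NULL || host == NULL || hostlen == 0)
        return -1;
    if (strncmp(srv, prefix, plen) != 0)
        return -1;                  /* not an http:// URL */
    srv += plen;
    n = strcspn(srv, ":/");         /* host ends at ':', '/' or end of string */
    if (n == 0 || n >= hostlen)
        return -1;                  /* empty, or too long for the buffer */
    memcpy(host, srv, n);
    host[n] = '\0';
    return 0;
}
```

    The caller passes the buffer together with its size, for example parsePrxSrv(envProxySrv, host, sizeof host), exactly as it would with fgets().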

  12. #12
    Registered User
    Join Date
    Nov 2010
    Posts
    21
    I know. That's my problem, the sscanf.

  13. #13
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well since the bulk of the string will be the extracted host name, you can crudely do
    if ( strlen(srv) > hostlen ) // bail out

    Once past that, you can safely sscanf any hostname.

    But string overflow isn't your only problem.
    sscanf can't detect numeric overflow either.

  14. #14
    Registered User
    Join Date
    Nov 2010
    Posts
    21
    Numeric overflow?

  15. #15
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    A regular int on a 32-bit machine is 32 bits, meaning it can hold values from −2,147,483,648 to 2,147,483,647. What happens if I prompt for a number and somebody enters 987654321987654321? That number is too big to store in an int, so I get a numeric overflow.
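    A sketch of one way to catch that for the port, assuming it is parsed separately from the host. parse_port is a hypothetical helper; it uses strtol, which reports overflow through errno, and then rejects anything outside 1..65535:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to a TCP port number.
   Returns 0 on success, -1 if text is not a number in 1..65535. */
static int parse_port(const char *text, unsigned int *port)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;                  /* no digits at all, or trailing junk */
    if (errno == ERANGE || value < 1 || value > 65535)
        return -1;                  /* overflowed long, or not a valid port */
    *port = (unsigned int)value;
    return 0;
}
```

    Unlike sscanf with %d, this rejects 987654321987654321 instead of storing an unpredictable value.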
