Thread: Parse a URL

  1. #1
    Registered User
    Join Date
    Aug 2006
    Posts
    10

    Parse a URL

    Hello, I'm trying to extract a URL from a string. The string is an HTTP header that contains a URL, and I want to copy that URL into another variable so I can use it on its own. Is there a function, or some other way, to do this?

    PS: I use Dev-C++.

    Thanks, and sorry about my English.

  2. #2
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Look at strncpy().
    If you understand what you're doing, you're not learning anything.

  3. #3
    Registered User
    Join Date
    Aug 2006
    Posts
    10
    Thanks for the answer, itsme86, but how can I use strncpy() in this case? The URL isn't always the same length, and it isn't at the beginning of the string either: "GET" generally (really, always) comes before the URL.

    Thanks for your time.

  4. #4
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Use strchr(), strstr() to locate the ends of the URL.
    The difference between two pointers will give you the length of the URL.
    Use strncpy() to copy that much.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Registered User OnionKnight
    Join Date
    Jan 2005
    Posts
    555
    First you'll need to find out where the URL starts, then where it ends. After that you copy that section into some buffer and return it.
    Here, a URL starts with http:// and ends at the next whitespace. This should get you going; the functions mentioned above can help.

    edit: Since it's an HTTP header, it's probably not an absolute URL. Look for a '/' after the "GET" to find the beginning of the URL.
    Last edited by OnionKnight; 08-15-2006 at 01:54 PM.

  6. #6
    Registered User
    Join Date
    Aug 2006
    Posts
    10
    Thanks for all your answers. I wrote this code, but it doesn't work properly. First, the URL is displayed twice even though I call printf only once. Second, a strange symbol is sometimes displayed at the end of the URL.

    Here is the code:

    Code:
    if ((begin = strstr(headers, "http")) != NULL)
       ;
    else if ((begin = strchr(headers, '/')) != NULL)
       ;
    else {
       printf("URL not found\n");
       return 1;
     }
    end = strchr(begin, ' ');
    length = strlen(begin) - strlen(end);
    strncpy(url, begin, length);
    printf("%s\n", url);
    Does anybody know where the problem is?

    Thanks again.

  7. #7
    Just Lurking Dave_Sinkula
    Join Date
    Oct 2002
    Posts
    5,005
    How about posting a simple, complete, and compilable example, including an input text string?

    [edit] I find something like this easy to work with.
    Code:
    #include <stdio.h>
    
    int main(void)
    {
       const char text[] = "http://cboard.cprogramming.com/online.php";
       char url[512];
       if ( sscanf(text, "http://%511[^/\n]", url) == 1 )
       {
          printf("url = \"%s\"\n", url);
       }
       return 0;
    }
    
    /* my output
    url = "cboard.cprogramming.com"
    */
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  8. #8
    Registered User
    Join Date
    Aug 2006
    Posts
    10
    Dave_Sinkula, the code you posted is very good. I used it in my program; I just had to use a pointer to where the "http" begins. The problem is that your code only prints the host, but I need the whole URL (cboard.cprogramming.com/online.php).

    Here is my code, ready to compile:

    Code:
    #include <stdio.h>
    #include <string.h>
    
    main()
    {
       char *begin, *end, url[200];
       int length;
       char *headers = "GET http://www.google.com/intl/en/about.html HTTP/1.1\nHost: www.google.com\nAccept: text/html\nAccept: video/mpg\nAccept: image/jpg\nUser-Agent: Mozilla/5.0\n";
    
       if ((begin = strstr(headers, "http")) != NULL)
          ;
       else if ((begin = strchr(headers, '/')) != NULL)
          ;
       else {
          printf("URL not found\n");
          return 1;
       }
       end = strchr(begin, ' ');
       length = strlen(begin) - strlen(end);
       strncpy(url, begin, length);
       printf("%s\n", url);
       
       return 0;
    }
    /* my output is: http://www.google.com/intl/en/about.htmlç~ */
    Thanks for your help.

  9. #9
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
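    #define TOFIND "GET "
    char *b = NULL, *e = NULL;
    ...
    /* buffer must be writable (an array, not a string literal) */
    if( (b = strstr( buffer, TOFIND )) )
    {
        b += strlen( TOFIND );          /* b now points at the URL */
        /* strchr, not strrchr: the last space in a multi-line header
           block lies far beyond the end of the URL */
        if( (e = strchr( b, ' ' )) )
            *e = '\0';                  /* terminate the URL in place */
    }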
    Something like that?

    Quzah.
    Hope is the first step on the road to disappointment.

  10. #10
    Just Lurking Dave_Sinkula
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by smithx
    Dave_Sinkula, [...] The problem is that your code only prints the host, but I need the whole URL (cboard.cprogramming.com/online.php).
    Code:
    #include <stdio.h>
    
    int main(void)
    {
       const char text[] = 
       "GET http://www.google.com/intl/en/about.html HTTP/1.1\n"
       "Host: www.google.com\n"
       "Accept: text/html\n"
       "Accept: video/mpg\n"
       "Accept: image/jpg\n"
       "User-Agent: Mozilla/5.0\n";
       char url[512];
       puts(text);
       if ( sscanf(text, "GET http://%511s", url) == 1 )
       {
          printf("url = \"%s\"\n", url);
       }
       return 0;
    }
    
    /* my output
    GET http://www.google.com/intl/en/about.html HTTP/1.1
    Host: www.google.com
    Accept: text/html
    Accept: video/mpg
    Accept: image/jpg
    User-Agent: Mozilla/5.0
    
    url = "www.google.com/intl/en/about.html"
    */

  11. #11
    and Nothing Else Matters
    Join Date
    Jul 2006
    Location
    Philippines
    Posts
    117
    I've noticed that most problems posted to these boards can be solved with sscanf. Can anyone point me to where I can read a comprehensive discussion about sscanf? Thanks.

  12. #12
    Just Lurking Dave_Sinkula
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by sangken
    Can anyone point me to where I can read a comprehensive discussion about sscanf?
    Perhaps the man/help pages and your own creativity. It's not a real regex, but a number of things tickle my fancy.

    Try something wacky and post questions/curiosities -- in a separate thread.

  13. #13
    Registered User
    Join Date
    Aug 2006
    Posts
    10
    Sorry, I forgot to reply. I'm using Dave_Sinkula's code, which works perfectly.

    Thanks to all.
