Thread: Domain name from URL

  1. #1
    Registered User
    Join Date
    Nov 2007
    Posts
    2

    Domain name from URL

    Hello,

    I'm fairly new to C programming and I am looking for a method to extract the domain name from a web page URL.

    Example: the user provides http://www.google.com/index.html

    I want to extract google.com from this string.

    I know I can do this using regular expressions, but C doesn't seem to support regex. Could someone give me an idea of how to handle this?

    Thanks,
    Kyle

  2. #2
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
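    Get a regex library or parse it yourself.

    One possible solution:

    Code:
    #include <stdio.h>

    char domain[64];

    /* Skip the literal "http://", then copy everything up to the next '/' into domain. */
    sscanf("http://google.com/index.html", "http://%[^/]", domain);
    Or something.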
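
    A slightly more defensive sketch of the same idea, in case the URL comes from user input rather than a literal. The helper names extract_host and strip_www are made up for illustration, and the 64-byte buffer size is an assumption carried over from the snippet above. The field width keeps sscanf from writing past the buffer, and the return value shows whether the pattern matched at all.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the host part of an http:// URL into out.
   Returns 1 if a host was found, 0 otherwise. */
int extract_host(const char *url, char out[64])
{
    /* %63[^/] reads at most 63 characters that are not '/',
       leaving room for the terminating '\0'. */
    return sscanf(url, "http://%63[^/]", out) == 1;
}

/* Hypothetical helper: skips a leading "www." if there is one.
   Returns a pointer into host; nothing is copied. */
const char *strip_www(const char *host)
{
    return strncmp(host, "www.", 4) == 0 ? host + 4 : host;
}
```

    With the URL from the first post, extract_host fills the buffer with www.google.com, and strip_www then points at google.com. Dropping "www." is only a rough stand-in for finding the registered domain; other subdomains are left alone.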

  3. #3
    Registered User
    Join Date
    Nov 2007
    Posts
    2
    Thanks a ton, this will work perfectly. Could you explain the format string "http://%[^/]" for me? I looked at the man pages for sscanf, but I think your format is a bit more advanced. This is a very powerful tool and I would love to be able to use it.

    Thanks again,
    Kyle

  4. #4
    Registered User ssharish2005's Avatar
    Join Date
    Sep 2005
    Location
    Cambridge, UK
    Posts
    1,732
    What that specifies is this:

    Code:
    http://google.com/index.html
    http:// ==> Match these literal characters against the input string, but don't store them in the domain string
    %[^/]   ==> Read every character that is not '/' and store what was read in the domain string
    So sscanf reads "http://", matches it, and discards it. It then keeps reading until it hits a '/' (or the end of the string). At that point the conversion ends and sscanf returns.
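
    The same scanset idea can be pushed a little further. As a rough sketch (split_host_port is a hypothetical name, and the 64-byte host buffer is an assumption), adding ':' to the excluded set makes the read stop before a port number, and the return value of sscanf tells how many fields were filled in.

```c
#include <stdio.h>

/* Hypothetical helper: splits "http://host[:port]/..." into host and port.
   Returns the number of fields filled in: 0, 1 (host only) or 2 (host and port). */
int split_host_port(const char *url, char host[64], int *port)
{
    /* %63[^:/] stops at the first ':' or '/', so the port is not swallowed
       into host. The literal ':' must then match before %d is tried. */
    int n = sscanf(url, "http://%63[^:/]:%d", host, port);

    /* sscanf returns EOF if the input ends before the first conversion. */
    return n < 0 ? 0 : n;
}
```

    For http://google.com/index.html this fills in only the host, because the '/' after the host does not match the literal ':' and the scan stops there, leaving port untouched.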


    ssharish

  5. #5
    Registered User
    Join Date
    Nov 2010
    Posts
    21
    OK, I know that this is a very late "answer", but in my case it is not working.
    I have the following data:
    Code:
    char *srv;
    char *prxHostname;
    
    srv=getenv("http_proxy");
    sscanf(strcat(srv, "/"), "http://%[^/]", prxHostname);
    printf("Hostname: %s\n", prxHostname);
    The http_proxy variable is: http://192.168.0.10:3128
    Now, whatever I do, the prxHostname is always NULL!
    Any ideas why?

  6. #6
    Registered User ssharish2005's Avatar
    Join Date
    Sep 2005
    Location
    Cambridge, UK
    Posts
    1,732
    There are a few issues with your code. When you concatenate the string with "/", have you made sure that the srv string has enough space? And you're trying to store the fetched value in prxHostname. Have you allocated enough memory before doing that? It's just a pointer, not a string that can hold the fetched value, is it?
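
    One way those points could be addressed, as a sketch only: the helper names parse_proxy_host and proxy_host_from_env and the 256-byte buffer size are assumptions, not anything from the posted code. The string returned by getenv must not be modified, so the strcat is dropped; it is not needed anyway, since %[^/] also stops at the end of the string. The destination is a real array supplied by the caller rather than an uninitialized pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copies the host[:port] part of srv into out.
   Returns 1 on success, 0 if srv is NULL or does not start with "http://". */
int parse_proxy_host(const char *srv, char out[256])
{
    if (srv == NULL)
        return 0;
    /* The width 255 leaves room for the terminating '\0' in out. */
    return sscanf(srv, "http://%255[^/]", out) == 1;
}

/* Hypothetical wrapper: reads http_proxy from the environment.
   getenv returns NULL when the variable is not set. */
int proxy_host_from_env(char out[256])
{
    return parse_proxy_host(getenv("http_proxy"), out);
}
```

    A caller would declare something like char prxHostname[256]; and only print it when proxy_host_from_env(prxHostname) returns 1. With http_proxy set to http://192.168.0.10:3128 the buffer ends up holding 192.168.0.10:3128.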

    ssharish

    EDIT: You need to open a new thread for this kind of issue instead of reopening a thread which is 3 years old!
    Last edited by ssharish2005; 11-17-2010 at 07:57 AM.

  7. #7
    Registered User
    Join Date
    Nov 2010
    Posts
    21
    OK, I will open a new thread then
