Thread: A C Function clone of PHP's file_get_contents()

  1. #1
    Registered User
    Join Date
    Nov 2008
    Posts
    3

    A C Function clone of PHP's file_get_contents()

    I've been developing in PHP for a very long time now and have only messed with C on occasion. I have therefore become far too used to simple functions such as file_get_contents() for retrieving the HTML or data stored on an external HTTP server.

    I am looking for a simple method, preferably a single function in C, that does the same thing as the PHP function without using a library such as libcurl, which needs compiling etc. I'm using gcc to compile and have only half managed to write one, which involves linking with -lsocket and -lnsl.

    Thank you all

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    OK, so you don't want to use ANY library. Then I suggest you stick to PHP, as that is the only way you won't need to worry about the libraries. Of course, PHP will ALSO use socket libraries, since there is no other (realistic [1]) way to fetch content from a web page on a different machine. Whether you also need further libraries depends on how much code you want to write.

    My suggestion, if you want to achieve this in C, would be to use libcurl. This of course assumes that you actually have a working libcurl for the system you are trying to do this on. If you don't, then you are stuck with [1] below.

    [1] Of course, what is done in the socket libraries isn't magical: the socket library contains C code, and if one person can come up with a socket library, someone else can write a tekcos library that does exactly the same thing but with different code. In doing so, you'll basically walk the wrong way around the block to get next door [actually, probably more like walking a full circle around a mid-sized town to get next door].

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Nov 2008
    Posts
    3
    Thanks for your response, matsp. It's not that I don't want to use libraries; rather, my uni provides a restricted shell and I'm failing miserably to get libcurl compiled / installed on there.

    I can obviously put a header file such as curl.h into the directory, but it doesn't work without the compiled library.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    And your university wants you to do this? If so, I expect you'd be able to talk to one of the sysadmins to install the relevant libraries on your system.

    If you are not supposed to do this, then I can't expect that you'd get much sympathy from the sysadmins.


  5. #5
    Registered User
    Join Date
    Nov 2008
    Posts
    3
    Not exactly supposed to. I just wanted to add a feature to download a data file from within my software, rather than manually fetching the data file and SCPing it across.

    As I think I said, I half got something working, which I found at coding.debuntu.org/system/files/htmlget.c. The problem is that it will retrieve only the root, i.e. www.google.com, not www.google.com/this_directory/this_file.dat

    If you (or anyone) could help me get this working in the latter case, then a function built on these libraries could be used.

    Code:
    #include <stdio.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <stdlib.h>
    #include <netdb.h>
    #include <string.h>
    #include <unistd.h>   /* close() */
    int create_tcp_socket();
    char *get_ip(char *host);
    char *build_get_query(char *host, char *page);
    void usage();
    
    #define HOST "coding.debuntu.org"
    #define PAGE "/"
    #define PORT 80
    #define USERAGENT "HTMLGET 1.0"
    
    int main(int argc, char **argv)
    {
      struct sockaddr_in *remote;
      int sock;
      int tmpres;
      char *ip;
      char *get;
      char buf[BUFSIZ+1];
      char *host;
      char *page;
    
      if(argc == 1){
        usage();
        exit(2);
      }  
      host = argv[1];
      if(argc > 2){
        page = argv[2];
      }else{
        page = PAGE;
      }
      sock = create_tcp_socket();
      ip = get_ip(host);
      fprintf(stderr, "IP is %s\n", ip); 
      /* allocate the whole struct, not just the size of a pointer to it */
      remote = (struct sockaddr_in *)malloc(sizeof(struct sockaddr_in));
      remote->sin_family = AF_INET;
      tmpres = inet_pton(AF_INET, ip, (void *)(&(remote->sin_addr.s_addr)));
      if( tmpres < 0)  
      {
        perror("Can't set remote->sin_addr.s_addr");
        exit(1);
      }else if(tmpres == 0)
      {
        fprintf(stderr, "%s is not a valid IP address\n", ip);
        exit(1);
      }
      remote->sin_port = htons(PORT);
    
      if(connect(sock, (struct sockaddr *)remote, sizeof(struct sockaddr_in)) < 0){
        perror("Could not connect");
        exit(1);
      }
      get = build_get_query(host, page);
      fprintf(stderr, "Query is:\n<<START>>\n%s<<END>>\n", get);
      
      //Send the query to the server
      int sent = 0;
      while(sent < strlen(get))
      { 
        tmpres = send(sock, get+sent, strlen(get)-sent, 0);
        if(tmpres == -1){
          perror("Can't send query");
          exit(1);
        }
        sent += tmpres;
      }
      //now it is time to receive the page
      memset(buf, 0, sizeof(buf));
      int htmlstart = 0;
      char * htmlcontent;
      while((tmpres = recv(sock, buf, BUFSIZ, 0)) > 0){
        if(htmlstart == 0)
        {
          /* Under certain conditions this will not work.
          * If the \r\n\r\n part is splitted into two messages
          * it will fail to detect the beginning of HTML content
          */
          htmlcontent = strstr(buf, "\r\n\r\n");
          if(htmlcontent != NULL){
            htmlstart = 1;
            htmlcontent += 4;
          }
        }else{
          htmlcontent = buf;
        }
        if(htmlstart){
          fprintf(stdout, htmlcontent);
        }
     
        memset(buf, 0, tmpres);
      }
      if(tmpres < 0)
      {
        perror("Error receiving data");
      }
      free(get);
      free(remote);
      free(ip);
      close(sock);
      return 0;
    }
    
    void usage()
    {
      fprintf(stderr, "USAGE: htmlget host [page]\n\
    \thost: the website hostname. ex: coding.debuntu.org\n\
    \tpage: the page to retrieve. ex: index.html, default: /\n");
    }
    
    
    int create_tcp_socket()
    {
      int sock;
      if((sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0){
        perror("Can't create TCP socket");
        exit(1);
      }
      return sock;
    }
    
    
    char *get_ip(char *host)
    {
      struct hostent *hent;
      int iplen = 15; //XXX.XXX.XXX.XXX
      char *ip = (char *)malloc(iplen+1);
      memset(ip, 0, iplen+1);
      if((hent = gethostbyname(host)) == NULL)
      {
        herror("Can't get IP");
        exit(1);
      }
      /* the size argument is the full buffer size, including the ending \0 */
      if(inet_ntop(AF_INET, (void *)hent->h_addr_list[0], ip, iplen+1) == NULL)
      {
        perror("Can't resolve host");
        exit(1);
      }
      return ip;
    }
    
    char *build_get_query(char *host, char *page)
    {
      char *query;
      char *getpage = page;
      char *tpl = "GET /%s HTTP/1.0\r\nHost: %s\r\nUser-Agent: %s\r\n\r\n";
      if(getpage[0] == '/'){
        getpage = getpage + 1;
        fprintf(stderr,"Removing leading \"/\", converting %s to %s\n", page, getpage);
      }
      // -5 is to consider the %s %s %s in tpl and the ending \0
      query = (char *)malloc(strlen(host)+strlen(getpage)+strlen(USERAGENT)+strlen(tpl)-5);
      sprintf(query, tpl, getpage, host, USERAGENT);
      return query;
    }
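
    For the www.google.com/this_directory/this_file.dat case, one option is to split the combined string into a host and a path before calling get_ip() and build_get_query(). The following is only a sketch under that assumption: split_url() and its parameters are hypothetical names, not part of htmlget.c, and it handles nothing beyond an optional http:// prefix (no port numbers, no https).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of htmlget.c: split "host/path" (with an
 * optional "http://" prefix) into a host and a path. Returns 0 on success,
 * -1 if the host is empty or either output buffer is too small. */
int split_url(const char *url, char *host, size_t host_len,
              char *path, size_t path_len)
{
    assert(url != NULL && host != NULL && path != NULL);

    const char *prefix = "http://";
    if (strncmp(url, prefix, strlen(prefix)) == 0)
        url += strlen(prefix);

    /* The host ends at the first '/', or at the end of the string. */
    const char *slash = strchr(url, '/');
    size_t hlen = slash ? (size_t)(slash - url) : strlen(url);
    const char *p = slash ? slash : "/";   /* no path means the root */

    if (hlen == 0 || hlen >= host_len || strlen(p) >= path_len)
        return -1;

    memcpy(host, url, hlen);
    host[hlen] = '\0';
    strcpy(path, p);   /* safe: length checked against path_len above */
    return 0;
}
```

    The resulting host would go to get_ip(), and host and path together to build_get_query(). Note that the posted main() already accepts the page as an optional second argument (see usage()), so passing the host and the page separately on the command line may be enough without any splitting.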

  6. #6
    Registered User
    Join Date
    May 2009
    Posts
    1
    I get a crash with your code when printing the downloaded html if it contains "%" characters. The fix is to change:
    Code:
        if(htmlstart){
          fprintf(stdout, htmlcontent);
        }
    to:
    Code:
        if(htmlstart){
          fprintf(stdout, "%s", htmlcontent);
        }
    -- Geoff
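
    The "%s" version stops the downloaded data being interpreted as a format string, but it still treats the buffer as a \0-terminated string, so a data file containing zero bytes would be cut short. As a rough sketch, assuming the byte count returned by recv() is at hand, the body could be written with fwrite() instead. write_body() is a hypothetical name, not something from the posted code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write exactly len bytes of body data to out.
 * Unlike fprintf, fwrite does not interpret '%' and does not stop at
 * a '\0' byte. Returns 0 on success, -1 on a short write. */
int write_body(FILE *out, const char *data, size_t len)
{
    assert(out != NULL && data != NULL);
    return fwrite(data, 1, len, out) == len ? 0 : -1;
}
```

    In the receive loop, len would be tmpres for a chunk that is all body, and tmpres minus the offset of htmlcontent within buf for the chunk in which the headers end.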

  7. #7
    Woof, woof!
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Plus that code is terrible, given headers often exceed BUFSIZ bytes.
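
    One way around this, and around the case flagged in the code's own comment where \r\n\r\n is split across two recv() calls, is to carry the match state from one chunk to the next instead of running strstr() on each buffer separately. The following is a minimal sketch of that idea; find_body_start() and its matched counter are hypothetical names, not part of the posted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: feed each received chunk in order. *matched holds
 * how many bytes of "\r\n\r\n" have been seen so far and must start at 0.
 * Returns the offset in buf where the body starts, or -1 if the headers
 * do not end within this chunk. Stop calling it once it has returned a
 * non-negative offset; every later chunk is body data. */
long find_body_start(const char *buf, size_t len, int *matched)
{
    static const char end[] = "\r\n\r\n";
    assert(buf != NULL && matched != NULL);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == end[*matched]) {
            if (++*matched == 4)
                return (long)(i + 1);
        } else {
            /* A mismatching '\r' can still begin a new terminator. */
            *matched = (buf[i] == '\r') ? 1 : 0;
        }
    }
    return -1;
}
```

    Because only a small counter survives between calls, the headers can be any length and the terminator can fall anywhere relative to the BUFSIZ-sized reads.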
