Thread: Internet Socket Returning Extra data

  1. #1
    Registered User
    Join Date
    Jun 2008
    Posts
    7

    Internet Socket Returning Extra data

    Hi,

    I have a basic spider that goes and downloads a webpage. It works fine except it returns the complete webpage and then an extra bit.

    I enclosed recv inside of a while statement:

    Code:
    while(numbytes = recv(sockfd, buf, MAXDATASIZE - 1, 0))
           {
           printf(buf);
           }

    Is this the right way to do this?

  2. #2
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    I believe you should be checking for SOCKET_ERROR if you're on Windows and -1 if you're on *nix to signify errors. Zero is an indication of a closed connection, so that part you have correct.

    http://www.opengroup.org/onlinepubs/...ions/recv.html

    http://msdn.microsoft.com/en-us/libr...21(VS.85).aspx

  3. #3
    Registered User
    Join Date
    Apr 2007
    Location
    Sydney, Australia
    Posts
    217
    Shouldn't the while loop look like this:

    Code:
    while((numbytes = recv(sockfd, buf, MAXDATASIZE - 1, 0)))
           {
           printf(buf);
           }

  4. #4
    Registered User
    Join Date
    Jun 2008
    Posts
    7
    It returns the page fine, so that all seems to work. I've been reading about Internet sockets all weekend, and I think the problem is with the charset of the web page I am accessing. Do I need to convert the charset of the page I am downloading?

    It is unix I am developing on if that helps.

  5. #5
    Registered User
    Join Date
    Apr 2007
    Location
    Sydney, Australia
    Posts
    217
    Wait, is it still printing extra stuff? If it is and you haven't tried my solution, then you should try it, because I think it might be the problem. It seems to go through the loop one too many times.

  6. #6
    Registered User
    Join Date
    Jun 2008
    Posts
    7
    39ster:

    It works using your loop too, but I'm having the same problem. Perhaps it's an issue with the way printf is printing to my console?

    It prints out the page with no problem, and then after the closing html it echoes out another two lines. These lines are not on the original webpage.

  7. #7
    Registered User
    Join Date
    Apr 2007
    Location
    Sydney, Australia
    Posts
    217
    You haven't null-terminated the string.

    Code:
    while((numbytes = recv(sockfd, buf, MAXDATASIZE - 1, 0)))
    {
           buf[numbytes] = 0;
           printf(buf);
    }

  8. #8
    Registered User
    Join Date
    Jun 2008
    Posts
    7
    Still the same problem.

  9. #9
    Registered User
    Join Date
    Jun 2008
    Posts
    7
    You are right. The last loop prints about 20 - 50 characters of the previous loop, then it exits, and this is the extra data. I am thinking maybe I have to detect an EOF.

  10. #10
    Registered User
    Join Date
    Jan 2008
    Posts
    45
    You have to look at the HTTP header: check whether Content-Length or Transfer-Encoding: chunked is present, and handle each case correctly.

    I guess you are downloading from a site that uses chunked transfer encoding (e.g. Google).
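For anyone following along, here is a rough sketch of what handling Transfer-Encoding: chunked can look like. It assumes the whole response body (everything after the blank line that ends the headers) is already sitting in memory, which is not how the loop in this thread works. The function name decode_chunked and its parameters are made up for illustration. Each chunk is a hexadecimal size line, CRLF, that many bytes of data, CRLF; a size of 0 marks the end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): decode a complete chunked body
 * held in memory. Returns the decoded length, or -1 if the input is
 * malformed or truncated, or if `out` is too small.
 * Chunk extensions are skipped and trailers are ignored. */
long decode_chunked(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t pos = 0, written = 0;

    for (;;) {
        /* Locate the CRLF that ends the chunk-size line. */
        size_t eol = pos;
        while (eol + 1 < in_len && !(in[eol] == '\r' && in[eol + 1] == '\n'))
            eol++;
        if (eol + 1 >= in_len)
            return -1;                      /* no CRLF found */

        /* Parse the hexadecimal chunk size; stop at the first non-hex
         * character (for example ';', which starts a chunk extension). */
        size_t size = 0, digits = 0, i;
        for (i = pos; i < eol; i++) {
            int c = (unsigned char)in[i], v;
            if (c >= '0' && c <= '9')      v = c - '0';
            else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
            else break;
            size = size * 16 + (size_t)v;
            digits++;
        }
        if (digits == 0 || digits > 8)
            return -1;                      /* missing or implausibly large size */
        pos = eol + 2;

        if (size == 0)
            return (long)written;           /* last chunk reached */

        /* The chunk data plus its trailing CRLF must fit in the input,
         * and the data must fit in the output buffer. */
        if (size > in_len - pos || in_len - pos - size < 2)
            return -1;
        if (size > out_cap - written)
            return -1;

        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;

        if (in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;                      /* chunk data not followed by CRLF */
        pos += 2;
    }
}
```

A real client also has to cope with chunk boundaries that fall in the middle of a recv() call, so treat this as a starting point for reading the HTTP/1.1 spec rather than a finished decoder.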

  11. #11
    Registered User
    Join Date
    Jun 2008
    Posts
    7
    Yes, the encoding is chunked. At least now I can start working in the right direction.

  12. #12
    Registered User
    Join Date
    Jan 2008
    Posts
    45

  13. #13
    Registered User
    Join Date
    Jun 2008
    Posts
    7
    Looks like I will learn a lot more about HTTP, which probably isn't a bad thing.

  14. #14
    Registered User
    Join Date
    Apr 2007
    Location
    Sydney, Australia
    Posts
    217
    Code:
    while((numbytes = recv(sockfd, buf, MAXDATASIZE - 1, 0)) > 0)
    {
           buf[numbytes] = 0;
           printf(buf);
    }
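    Only 0 is false. Anything else, including negative values, would pass the loop test, so the -1 that recv returns on error would keep the original loop running. Hence the > 0 check.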
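Putting the points from this thread together, a receive loop might be sketched as below. The helper name drain_socket and the 4096-byte MAXDATASIZE are assumptions (the thread never shows the real value), and the sketch swaps in fwrite for printf: fwrite writes exactly numbytes bytes, so no terminator is needed and stale data left in the buffer from an earlier pass is never printed.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

#define MAXDATASIZE 4096 /* assumed size; the thread does not show the real one */

/* Hypothetical helper: copy everything the peer sends to `out` until the
 * connection is closed. Returns the total number of bytes received, or -1
 * if recv() failed. */
long drain_socket(int sockfd, FILE *out)
{
    char buf[MAXDATASIZE];
    long total = 0;

    for (;;) {
        ssize_t numbytes = recv(sockfd, buf, sizeof buf, 0);

        if (numbytes == 0)
            break;                  /* peer closed the connection */
        if (numbytes < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal; try again */
            return -1;              /* genuine error */
        }

        /* Write only the bytes that this call actually received. */
        fwrite(buf, 1, (size_t)numbytes, out);
        total += numbytes;
    }
    return total;
}
```

Calling drain_socket(sockfd, stdout) after sending the request would dump the raw response, headers included, to the console.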

  15. #15
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Code:
    printf(buf);
    to
    Code:
    printf("%s", buf);
    or
    Code:
    fputs(buf, stdout);
    See this.
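The reason this matters: the first argument to printf is a format string, so any '%' in the downloaded page would be interpreted as a conversion specifier with no matching argument, which is undefined behavior. A small sketch of the safe form follows; print_received is a made-up name, and it assumes buf has already been NUL-terminated as in the earlier posts.

```c
#include <stdio.h>

/* Hypothetical helper: write a NUL-terminated chunk of received data
 * verbatim. Returns what fprintf returns: the number of characters
 * written, or a negative value on error. */
int print_received(FILE *out, const char *buf)
{
    /* buf is passed as data, never as the format string, so any '%'
     * sequences inside it are copied through literally. */
    return fprintf(out, "%s", buf);
}
```

With this in place, print_received(stdout, buf) inside the recv loop behaves the same as printf("%s", buf).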
