Thread: Winsock HTML Page Source Dump...

  1. #1
    Registered User
    Join Date
    Nov 2007
    Posts
    17

    Winsock HTML Page Source Dump...

    I have a basic socket that connects to a webpage and dumps the page source into a text file, so I can check whether there are new posts on a forum. But I almost never get the source the way it appears when I view it in a web browser. For example, here's a complete line:

    Right: <td class="lc"><a href="index.php?showuser=276033">LostPeon</a><img height=10 width=10 src="/images/i2.gif"><br><span class="desc">Thu, Nov 1 2007, 09:40pm</span></td>

    Wrong: <td class="lc"><a href="index.php?showuser=276033">LostPeon</a><img height=10 wi


    You can see where it cuts off. This can happen to more than one line: almost every check has some cut lines, though not every line is affected. Is there some technique web browsers use to always get the complete page source? Anyway, thanks for any help.

  2. #2
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Can you post your code? Are you calling recv more than once? It might not return all the data in a single call.

  3. #3
    Registered User
    Join Date
    Nov 2007
    Posts
    17
    Yeah, of course. It's in a loop:

    Code:
    for (;recv(FORUM, dbuf, 512, 0) > 0;) { write << dbuf; }
    It's not just the forum page either. Lines get cut off on any website. So I really need some information on either why it's doing this or how web browsers do it.

  4. #4
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Can you post a small compilable example?

  5. #5
    Registered User
    Join Date
    Nov 2007
    Posts
    17
    Code:
    #include <iostream>
    #include <string>
    #include <fstream>
    #include <stdio.h>
    #include <string.h>
    #include <winsock.h>
    
    using namespace std;
    
    void stopic (void);
    char* cauth (char*);
    void dSend (int D2JSP, char* request) {
         
         if (send(D2JSP, request, strlen(request), 0) == INVALID_SOCKET) {
                         cout<<"Failed to send request to D2JSP... " << WSAGetLastError() << endl;
                         shutdown(D2JSP, 2); closesocket(D2JSP); WSACleanup();
                         }
                         cout<< request << endl;
    }
    
    int main()
    {  
        WSADATA WsaDat;
        SOCKET D2JSP;
        sockaddr_in D2;
        
        hostent* dHost;
        
        char *dIP, dRecv, *request, dbuf[512];
        unsigned short dline = 0;
        request = "GET /index.php?showforum=168 HTTP/1.1\r\nHost: forums.d2jsp.org\r\n\r\n";
        
        if (WSAStartup(MAKEWORD(2, 0), &WsaDat) != 0) {
                                   cout<<"WSAStartup failed to initalize...\n";
                                   }
        D2JSP = socket(AF_INET, SOCK_STREAM, 0);
        if (D2JSP == INVALID_SOCKET) {
                  cout<<"Failed to make D2JSP socket... " << WSAGetLastError() << endl;
                  WSACleanup();
                  }
        
        dHost = gethostbyname("forums.d2jsp.org");
        dIP = inet_ntoa (*(in_addr*) dHost->h_addr);
        
        D2.sin_family = AF_INET;
        D2.sin_addr.s_addr = inet_addr (dIP);
        D2.sin_port = htons (80);
        
        if (connect(D2JSP, (sockaddr*) &D2, sizeof(D2)) == INVALID_SOCKET) {
                           cout<<"Failed to connect to D2JSP socket... " << WSAGetLastError() << endl;
                           shutdown(D2JSP, 2); closesocket(D2JSP); WSACleanup();
                           }
      
        dSend(D2JSP, request);
        
        dRecv = recv(D2JSP, dbuf, 512, 0);
        if (dRecv == INVALID_SOCKET) {
                        cout<<"Failed to recieve data through D2JSP... " << WSAGetLastError() << endl;
                        shutdown(D2JSP, 2); closesocket(D2JSP); WSACleanup();
                        }
    
                        ofstream dparse ("d2jsp.txt");
                        for (;recv(D2JSP, dbuf, 512, 0) > 0;) {
                            if (dparse.is_open()) { dparse << dbuf; }
                            }
                            dparse.close();
                            shutdown(D2JSP, 2); closesocket(D2JSP); WSACleanup();
       
        system("pause");
        return 0;
    }

  6. #6
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    AFAIK, send never returns INVALID_SOCKET; on failure it returns SOCKET_ERROR. Check out MSDN:

    http://msdn2.microsoft.com/en-us/library/ms740149.aspx
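
    To illustrate the return-value check (a sketch, not the thread's code): send reports failure with SOCKET_ERROR, and on success it returns the number of bytes it accepted, which can be less than the number requested. SendFn and send_all are hypothetical names, and kSocketError mirrors Winsock's SOCKET_ERROR value of -1 so the example builds without Winsock.

    ```cpp
    #include <cstring>
    #include <functional>

    // Winsock defines SOCKET_ERROR as -1; mirrored here so the sketch builds anywhere.
    const int kSocketError = -1;

    // Hypothetical stand-in for a call such as send(sock, data, len, 0):
    // returns the number of bytes accepted, or kSocketError on failure.
    using SendFn = std::function<int(const char* data, int len)>;

    // Sends the whole request, looping because a single call may accept
    // fewer bytes than were offered. Returns false on the first failure.
    bool send_all(const SendFn& send_fn, const char* request) {
        const char* p = request;
        int remaining = static_cast<int>(std::strlen(request));
        while (remaining > 0) {
            int sent = send_fn(p, remaining);
            if (sent == kSocketError || sent == 0) {
                return false;  // the caller should stop here, not carry on
            }
            p += sent;
            remaining -= sent;
        }
        return true;
    }
    ```

    In dSend from post #5, the lambda passed as send_fn would wrap the real send call, and a false result would be the signal to return instead of continuing to recv.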

  7. #7
    Registered User
    Join Date
    Nov 2007
    Posts
    17
    Thanks, but do you have any idea about the cut lines?

  8. #8
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Quote Originally Posted by http://msdn2.microsoft.com/en-us/library/ms738564.aspx
    The inet_ntoa function takes an Internet address structure specified by the in parameter and returns a NULL-terminated ASCII string that represents the address in "." (dot) notation as in "192.168.16.0", an example of an IPv4 address in dotted-decimal notation. The string returned by inet_ntoa resides in memory that is allocated by Windows Sockets. The application should not make any assumptions about the way in which the memory is allocated. The string returned is guaranteed to be valid only until the next Windows Sockets function call is made within the same thread. Therefore, the data should be copied before another Windows Sockets call is made.
    http://msdn2.microsoft.com/en-us/library/ms738564.aspx
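
    A small sketch of the "copy it before the next call" advice from the quote, under a couple of assumptions: ip_to_string is a hypothetical helper name, and the #ifdef is only there so the same code builds outside Windows (on Windows it needs the Winsock library linked and WSAStartup called first, and newer SDKs may warn that inet_ntoa is deprecated).

    ```cpp
    #ifdef _WIN32
    #include <winsock2.h>
    #else
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #endif
    #include <string>

    // Copies the dotted-decimal text out of inet_ntoa's internal buffer
    // immediately, so later socket calls cannot overwrite it.
    std::string ip_to_string(in_addr addr) {
        const char* text = inet_ntoa(addr);
        return text ? std::string(text) : std::string();
    }
    ```

    With this, dIP in post #5 would become a std::string holding its own copy rather than a pointer into memory owned by the sockets library.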

    [edit]Why do you throw away your first call to recv?[/edit]
    Code:
        dRecv = recv(D2JSP, dbuf, 512, 0);
        if (dRecv == INVALID_SOCKET) {
    Last edited by robwhit; 11-16-2007 at 11:40 PM.

  9. #9
    Registered User
    Join Date
    Nov 2007
    Posts
    17
    I'm not sure why. I never really thought about it, and I'm new to all of this, so I just stick to the examples and that's what they show. I'll change it though and see what happens.

    It looks like it's just packet information, but I still get incomplete lines. I just don't know how to get the whole page source. What am I supposed to do? Anyway, thanks for your time and effort. Hopefully you or someone else can figure it out.
    Last edited by blake_; 11-16-2007 at 11:54 PM.

  10. #10
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Code:
        ofstream dparse ("d2jsp.txt");
        if (!dparse.is_open())
        { shutdown(D2JSP, 2); closesocket(D2JSP); WSACleanup();}
    
        dSend(D2JSP, request);
        
        while ((dRecv = recv(D2JSP, dbuf, 512, 0)) > 0) {
            dparse << dbuf;
        }
    
        if (dRecv == SOCKET_ERROR) {
            cout<<"Failed to recieve data through D2JSP... " << WSAGetLastError() << endl; /* i before e except after c */
        }
    
        dparse.close();
        shutdown(D2JSP, 2); closesocket(D2JSP); WSACleanup();
    Something to note: whenever you encounter an error, you call the cleanup functions but then carry on as if nothing went wrong, using a socket or data that is no longer valid. You should stop or recover at that point, otherwise the error handling serves no purpose.
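
    One way to get "clean up and then actually stop" is an early return combined with a guard object that does the cleanup. This is only a sketch of the pattern: Cleanup and fetch_page are hypothetical names, connect_ok and send_ok are placeholders for the results of the real connect and send calls, and the log vector just records which steps ran.

    ```cpp
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    // Runs a cleanup action when it goes out of scope, so an early return
    // cannot skip it. (Hypothetical helper, not part of Winsock.)
    class Cleanup {
    public:
        explicit Cleanup(std::function<void()> fn) : fn_(std::move(fn)) {}
        ~Cleanup() { if (fn_) fn_(); }
        Cleanup(const Cleanup&) = delete;
        Cleanup& operator=(const Cleanup&) = delete;
    private:
        std::function<void()> fn_;
    };

    // Each push_back marks where a real call (socket, connect, send, recv)
    // would go; the function returns as soon as one step fails.
    bool fetch_page(bool connect_ok, bool send_ok, std::vector<std::string>& log) {
        log.push_back("socket");
        Cleanup close_socket([&log] { log.push_back("closesocket"); });

        if (!connect_ok) return false;  // stop; the guard still closes the socket
        log.push_back("connect");

        if (!send_ok) return false;     // never reaches recv on a dead connection
        log.push_back("send");

        log.push_back("recv");
        return true;
    }
    ```

    In the posted program the guard's action would be the shutdown/closesocket/WSACleanup line, written once instead of after every check.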
    Last edited by robwhit; 11-17-2007 at 12:53 AM.

  11. #11
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Another mistake is assuming that recv will somehow append a \0 to make your string output well-defined.

    It doesn't add anything at all. You need to use the return result to tell you exactly how much data is present.
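
    A minimal sketch of what that looks like, with the assumptions flagged: FakeSocket and fake_recv are hypothetical stand-ins for a connected socket and recv (so the example builds without Winsock), and read_all is an illustrative name. The part that carries over is appending exactly n bytes per call instead of streaming the buffer as a C string. Note that n is an int; dRecv in post #5 is declared as char, which cannot hold a count of 512.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <cstring>
    #include <string>

    // Hypothetical stand-in for a connected socket: hands out `data` in small
    // pieces, the way recv may split a page at arbitrary byte offsets.
    struct FakeSocket {
        std::string data;
        std::size_t pos = 0;
        std::size_t max_chunk = 7;  // deliberately small and awkward
    };

    // Same contract as recv: copies up to `len` bytes into `buf` and returns
    // the number of bytes copied, or 0 once everything has been delivered.
    // It never writes a terminating '\0'.
    int fake_recv(FakeSocket& s, char* buf, int len) {
        std::size_t n = std::min({static_cast<std::size_t>(len),
                                  s.max_chunk,
                                  s.data.size() - s.pos});
        std::memcpy(buf, s.data.data() + s.pos, n);
        s.pos += n;
        return static_cast<int>(n);
    }

    // Reads until the "connection" closes, appending exactly the bytes received.
    std::string read_all(FakeSocket& s) {
        std::string page;
        char buf[512];
        int n;  // int, not char: a single call can return up to 512
        while ((n = fake_recv(s, buf, static_cast<int>(sizeof buf))) > 0) {
            page.append(buf, static_cast<std::size_t>(n));  // length-bounded, no '\0' needed
        }
        return page;
    }
    ```

    With the ofstream from the thread, the equivalent change would be dparse.write(dbuf, dRecv) in place of dparse << dbuf.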
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Registered User
    Join Date
    Nov 2007
    Posts
    17
    bump

  13. #13
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Bump it again without adding something substantive to show that you've been paying attention and trying some stuff and I'll close it. Read the rules - no bumping.

  14. #14
    Registered User
    Join Date
    Nov 2007
    Posts
    17
    I tried everything you guys mentioned. Come on now, why would I ask for help and not take your suggestions? If they had worked I wouldn't be here, but they don't, so I need help with why I keep getting cut lines.

  15. #15
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Can you post the modified code?

    If you don't understand something, ask a specific question.
