-
All your attempts so far fail to hit the last byte of the array with a \0. In one way or another, you wrote past the end of the array.
It is an obvious bug which needs to be fixed before even beginning to discuss what else may (or may not) need to be looked at.
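To make the rule concrete, here is a minimal sketch of the check involved. The helper name terminateBuffer is made up for illustration and is not part of Winsock. It only shows the bounds logic: ask recv() for at most sizeof buf - 1 bytes, then write the \0 at the index recv() returned.

```cpp
#include <cstddef>

// Hypothetical helper (not a Winsock call): writes the terminating '\0'
// after `received` bytes, but only if that index is inside the array.
// `received` is the value recv() returned, `capacity` is sizeof the array.
bool terminateBuffer(char *buf, std::size_t capacity, int received) {
    if (received < 0) {
        return false;                   // recv() reported an error
    }
    if (static_cast<std::size_t>(received) >= capacity) {
        return false;                   // no room left for the '\0'
    }
    buf[received] = '\0';               // last valid index is capacity - 1
    return true;
}
```

If recv() is called with sizeof buf - 1 as the length, the second check can never fail, which is the whole point of the "- 1".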
-
Okay, it's added as you showed. Now what?
-
I dunno, what are you seeing in the rest of the code (observations on what is, or isn't happening), that kind of thing.
Rather than a single massive main(), consider a more layered approach where each function performs one specific task (managing the connection, managing the transfer). A bit like this.
Code:
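#include <winsock2.h>   // link with ws2_32.lib
#include <iostream>
#include <fstream>
using namespace std;

// dSend() is your existing send helper; it is assumed here to accept a const char*.

// Sends the request and writes everything the server returns to d2jsp.txt.
void doTransfer( SOCKET D2JSP, const char *request ) {
    char dbuf[513];
    int dRecv;
    ofstream dparse ("d2jsp.txt");
    if (!dparse.is_open()) {
        cout<<"Failed to open d2jsp.txt...\n";
        return;
    }
    dSend(D2JSP, request);
    do {
        // Ask for one byte less than the array holds, leaving room for the \0.
        dRecv = recv(D2JSP, dbuf, sizeof dbuf - 1, 0);
        if (dRecv == SOCKET_ERROR) {
            cout<<"Failed to receive data through D2JSP... " << WSAGetLastError() << endl;
            break;
        }
        dbuf[dRecv] = '\0';
        dparse << dbuf;
    } while (dRecv > 0);    // recv() returns 0 once the server closes the connection
    dparse.close();
}

// Creates the socket, resolves the host, connects, then hands off to doTransfer().
void doConnection ( ) {
    SOCKET D2JSP;
    sockaddr_in D2;
    hostent* dHost;
    char *dIP;
    const char *request;
    unsigned short dline = 0, cwrite = 0;
    request = "GET /index.php?showforum=168 HTTP/1.1\r\nHost: forums.d2jsp.org\r\n\r\n";
    D2JSP = socket(AF_INET, SOCK_STREAM, 0);
    if (D2JSP == INVALID_SOCKET) {
        cout<<"Failed to make D2JSP socket... " << WSAGetLastError() << endl;
        return;
    }
    dHost = gethostbyname("forums.d2jsp.org");
    if (dHost == NULL) {    // the lookup can fail, so check before dereferencing
        cout<<"Failed to resolve forums.d2jsp.org... " << WSAGetLastError() << endl;
        closesocket(D2JSP);
        return;
    }
    dIP = inet_ntoa (*(in_addr*) dHost->h_addr);
    cout<<"D2JSP IP: " << dIP << endl;
    D2.sin_family = AF_INET;
    D2.sin_addr.s_addr = inet_addr (dIP);
    D2.sin_port = htons (80);
    // connect() returns SOCKET_ERROR on failure, not INVALID_SOCKET.
    if (connect(D2JSP, (sockaddr*) &D2, sizeof(D2)) == SOCKET_ERROR) {
        cout<<"Failed to connect to D2JSP socket... " << WSAGetLastError() << endl;
        shutdown(D2JSP, 2);
        closesocket(D2JSP);
        return;
    }
    doTransfer( D2JSP, request );
    shutdown(D2JSP, 2);     // 2 == SD_BOTH
    closesocket(D2JSP);
}

int main()
{
    WSADATA WsaDat;
    if ( WSAStartup(MAKEWORD(2, 0), &WsaDat) != 0 ) {
        cout<<"WSAStartup failed to initialize...\n";
    } else {
        doConnection();
        WSACleanup();
    }
    return 0;
}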
-
Thanks for the info but I'll show you what usually happens when I dump the html source into a text file...
Original Line: <td><a href="index.php?showuser=387288">Villaloboos</a><br><span class="desc">Mon, Nov 19 2007, 10:04pm</span></td>
Cut Line:
Line 1: <td><a href="index.php?showuser=387288">Villaloboos</a><br><span class=
Line 2: 35e
Line 3: "desc">Mon, Nov 19 2007, 10:04pm</span></td>
Line 2 is always some random-looking mix of digits and letters, inserted before line 3. And when this happens, lines 2 and 3 are placed right under line 1. Have you ever seen or encountered this before?
-
I've no idea, but if it's any consolation, I see the same effect.
If I do this though
dparse << dbuf << "--\n--\n";
I can see that the random data has nothing to do with the boundaries of the buffer (it always seems to be in the middle somewhere).
I've seen the same page in Firefox, and there's no sign of those extra chars.
Try tracing the communications with Wireshark and comparing what your code sends and receives with what a standard browser does.
-
Do you know what encoding the server is sending back? If I had to guess, I'd say it's sending back chunked encoding, and that's what you're seeing. (Firefox has already decoded it by the time you hit View Source.) You'll have to look it up in the HTTP RFC and decode it. (Chunked encoding has to be the #1 reason why I usually opt for libcurl when doing HTTP work.)
You should be able to verify whether you're getting chunked encoding: it'll show up in the response headers as "Transfer-Encoding: chunked\r\n". See this
As an aside, I think servers can send back other encodings too, like gzip compression (that one is announced in the "Content-Encoding" header).
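For what it's worth, the "35e" in the dump above fits this: a chunk starts with its size in hexadecimal on a line of its own, and 0x35e is 862 bytes. Below is a rough sketch of the check and the decoding. The names isChunked and decodeChunked are made up for illustration. It assumes you have already collected the whole response and split it at the first blank line ("\r\n\r\n") into headers and body, since a chunk can straddle two recv() calls. It ignores chunk extensions and trailers, so treat it as a starting point rather than a complete implementation of the RFC.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Returns true if the header block contains a Transfer-Encoding line that
// mentions "chunked". Header names are case-insensitive, so compare lowercase.
bool isChunked(std::string headers) {
    for (std::string::size_type i = 0; i < headers.size(); ++i) {
        headers[i] = static_cast<char>(
            std::tolower(static_cast<unsigned char>(headers[i])));
    }
    std::string::size_type pos = headers.find("transfer-encoding:");
    if (pos == std::string::npos) {
        return false;
    }
    std::string::size_type eol = headers.find("\r\n", pos);
    return headers.substr(pos, eol - pos).find("chunked") != std::string::npos;
}

// Strips the chunk framing from a complete chunked body.
// Each chunk is: <size in hex>\r\n<size bytes of data>\r\n
// A chunk of size 0 marks the end of the body.
std::string decodeChunked(const std::string &body) {
    std::string out;
    std::string::size_type pos = 0;
    for (;;) {
        std::string::size_type eol = body.find("\r\n", pos);
        if (eol == std::string::npos) {
            break;                              // truncated input
        }
        // strtoul stops at the first non-hex character, so anything after
        // the size on that line (a ";extension") is simply ignored here.
        unsigned long size =
            std::strtoul(body.substr(pos, eol - pos).c_str(), 0, 16);
        if (size == 0) {
            break;                              // last chunk (or a bad size line)
        }
        pos = eol + 2;                          // first byte of the chunk data
        out.append(body, pos, size);            // append clamps if data is short
        pos += size + 2;                        // skip the data and its CRLF
    }
    return out;
}
```

In the program above, that would mean appending each dbuf to a std::string instead of writing it straight to the file, then running the body through decodeChunked before saving it.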
-
Thanks for the response Cactus_Hugger.. Yes it is in fact chunked and your response helps me understand more of my problem which I was looking for.. Thanks to the guys for the help on cleaning up my code also.. I'll have to look into this and check it out.. Thanks again.