| | #1 |
| Registered User Join Date: Dec 2008
Posts: 99
| winsock splitting msg In my application, I request a file from an HTTP server in order to parse a certain type of information. The problem is that, since I have a char[] buffer of 512 elements, my application sometimes receives only part of the string I need to parse, while I need the WHOLE string. The worst thing is that the server does not always split the string that I need to parse in the same place. Sometimes it splits it in half, sometimes at a quarter, etc. This prevents me from predicting where the string will be split. My current solution is very, very, very nasty. I have a char[] buffer of 80k elements; that way, the server has no need to split the body into parts and I can receive the whole string I need to parse. This makes my application very slow and a HUGE memory hog. Any solutions that come to mind? Thank you, abraham2119 |
| abraham2119 is offline | |
| | #2 |
| Registered User Join Date: Apr 2008
Posts: 299
| You can choose how much you want to receive every time. Just change the argument of recv() where you specify how many bytes you want to receive. |
| carrotcake1029 is offline | |
| | #3 |
| and the hat of vanishing Join Date: Aug 2001 Location: The edge of the known universe
Posts: 21,214
| > The worst thing is that the server does not always split the string that I need to parse in the same place. It's nothing to do with the server, it's all about the nature of a TCP/IP connection. It is a stream protocol. Messages can be fragmented on transmission as well as reception. And it's your job to deal with that at both ends (depending on whether you're the transmitter and/or receiver).
__________________ If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut. Up to 8Mb PlusNet broadband from only £5.99 a month! |
| Salem is offline | |
| | #4 | |
| Registered User Join Date: Dec 2008
Posts: 99
| Quote:
However, since I don't know how much I need to receive for the data not to be split (I want my application to work with various HTTP servers, not just one), receiving 80k bytes was the only solution in my eyes. Any other solutions? EDIT: Salem, I am aware of that, and that is why I came here: to get help finding a correct design for my application. | |
| abraham2119 is offline | |
| | #5 |
| and the hat of vanishing Join Date: Aug 2001 Location: The edge of the known universe
Posts: 21,214
| What's special about 80K? You have 2 buffers, large enough to contain a single record (say a line) of the information you want to parse. One buffer holds a complete line. The other buffer holds a fragment of a line. When the record boundary is found, the first part of the buffer becomes a whole line buffer, and the tail end of it becomes a fragment for the next line.
| Salem is offline | |
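A minimal sketch of the line-reassembly idea described above, assuming records end in a newline. It is a single-accumulator variant rather than a literal two-buffer implementation, and `struct line_acc`, `line_acc_feed`, and `RECORD_MAX` are hypothetical names chosen for illustration. The receive buffer can stay small (512 bytes is fine) because the fragment carried over between calls lives in the accumulator.

```c
#include <stddef.h>

#define RECORD_MAX 1024   /* assumed upper bound on one line */

struct line_acc {
    char   buf[RECORD_MAX];
    size_t len;       /* bytes of the current partial line held in buf */
    int    overflow;  /* set if a line did not fit in buf */
};

/* Consume bytes from chunk until one line is complete or the chunk is used up.
   Returns how many bytes were consumed. *complete is 1 when acc->buf holds a
   whole NUL-terminated line (terminator stripped); the next call starts a
   fresh line. */
size_t line_acc_feed(struct line_acc *acc, const char *chunk, size_t n,
                     int *complete)
{
    size_t i = 0;
    *complete = 0;
    while (i < n) {
        char c = chunk[i++];
        if (c == '\n') {
            if (acc->len > 0 && acc->buf[acc->len - 1] == '\r')
                acc->len--;                 /* accept CRLF as well as LF */
            acc->buf[acc->len] = '\0';
            acc->len = 0;                   /* next call begins a new line */
            *complete = 1;
            break;
        }
        if (acc->len + 1 < RECORD_MAX)
            acc->buf[acc->len++] = c;
        else
            acc->overflow = 1;              /* line too long: extra bytes dropped */
    }
    return i;
}

/* Typical use after each recv() that returned n > 0 bytes into chunk:
     for (off = 0; off < n; ) {
         off += line_acc_feed(&acc, chunk + off, n - off, &done);
         if (done) parse_line(acc.buf);    // parse_line is hypothetical
     }
*/
```

It does not matter where TCP happens to cut the data: a line split across two or more recv() calls is only handed to the parser once its terminator arrives.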
| | #6 |
| Registered User Join Date: Sep 2004 Location: California
Posts: 2,845
| The correct way is to call recv() in a loop until you have received all of the data. The protocol you are using (in this case it appears to be HTTP) should tell you how much data you need to receive, so you just keep calling recv() until you have gotten that amount. In the case of HTTP, the data size is usually stored in the "Content-Length" header. This is not always the case, though: if the server uses chunked encoding, there will be no Content-Length header, and you need to decode the chunked encoding instead. Also keep in mind that Content-Length does not include the HTTP header in the size, so you have to parse the header out first (it should always end with a double CRLF). None of this is trivial, which is why people usually use a library like libcurl to do HTTP transactions. |
| bithub is offline | |
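The two header-related steps above can be sketched as pure buffer functions, independent of the socket code. This is an illustration under stated assumptions, not a complete HTTP parser: `http_header_end` and `http_content_length` are hypothetical helper names, the response is assumed to be accumulated into one growing buffer across recv() calls, and overflow of the length value is not checked.

```c
#include <stddef.h>
#include <string.h>
#include <ctype.h>

/* Returns the offset of the first body byte, or 0 if the blank line
   ("\r\n\r\n") that ends the header has not arrived yet. */
size_t http_header_end(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    return 0;
}

/* Scans the header block line by line for Content-Length (header names are
   case-insensitive). Returns its value, or -1 if the header is absent, as it
   is when the server uses chunked encoding. */
long http_content_length(const char *hdr, size_t hdr_len)
{
    static const char name[] = "content-length:";
    const size_t name_len = sizeof name - 1;
    const char *end = hdr + hdr_len;
    size_t line = 0;                       /* offset of the current line start */

    while (line < hdr_len) {
        size_t k = 0;
        while (k < name_len && line + k < hdr_len &&
               tolower((unsigned char)hdr[line + k]) == name[k])
            k++;
        if (k == name_len) {
            const char *p = hdr + line + name_len;
            long value = 0;
            int digits = 0;
            while (p < end && (*p == ' ' || *p == '\t'))
                p++;                       /* optional whitespace after ':' */
            while (p < end && isdigit((unsigned char)*p)) {
                value = value * 10 + (*p++ - '0');
                digits = 1;
            }
            return digits ? value : -1;
        }
        const char *nl = memchr(hdr + line, '\n', hdr_len - line);
        if (nl == NULL)
            break;
        line = (size_t)(nl - hdr) + 1;     /* start of the next header line */
    }
    return -1;
}

/* Typical use (sketch): after each recv() appends bytes to buf,
     body = http_header_end(buf, total);
     if (body) need = http_content_length(buf, body);
   then keep calling recv() until total - body == need. */
```

Keeping these as plain functions over a byte buffer means the recv() loop only has to append data and re-check, whatever chunk size it reads with.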
| | #7 | |
| Registered User Join Date: Dec 2008
Posts: 99
| Quote:
This would be a good solution IF I knew where the string would be split, which I don't. Imagine this was the line I am looking to parse: Code: <Test>Test</Test> Sometimes the line IS sent in one piece, and sometimes it is not. Note: the server sends more than one of the lines I need to parse, so I have to parse several lines separately. EDIT: bithub, the server sends the data using chunked encoding. The total size of the data to be received is 80k bytes. That is why I had a buffer which could hold 80k bytes. This is not really the way I wanted to do it, though. The application runs much faster when receiving 512 bytes (for example) at a time than when receiving ~80k. This is because I do a lot of string manipulation with the received data, and manipulating a string of ~80k characters is slower than manipulating one of 512. Last edited by abraham2119; 06-09-2009 at 10:56 AM. | |
| abraham2119 is offline | |
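Since the server in question uses chunked encoding, the body has to be de-chunked before it is parsed. Below is a rough sketch of an incremental decoder that accepts input in pieces of any size, so it works with a 512-byte receive buffer. The names `struct chunk_dec` and `chunk_dec_feed` are hypothetical; chunk extensions are skipped, trailers after the final chunk are ignored, and validation is minimal.

```c
#include <stddef.h>

enum chunk_state {
    CH_SIZE,      /* reading the hexadecimal chunk size */
    CH_SIZE_EXT,  /* skipping a chunk extension after ';' */
    CH_SIZE_LF,   /* expecting '\n' after the size line's '\r' */
    CH_DATA,      /* copying chunk payload */
    CH_DATA_CR,   /* expecting '\r' after the payload */
    CH_DATA_LF,   /* expecting '\n' after the payload */
    CH_DONE,      /* zero-length chunk seen */
    CH_ERROR
};

struct chunk_dec {
    enum chunk_state state;   /* zero-initialise: starts in CH_SIZE */
    unsigned long remaining;  /* payload bytes left in the current chunk */
};

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Feeds n raw body bytes and writes decoded payload to out, which must have
   room for n bytes (decoded output is never longer than the input).
   Returns the number of payload bytes written. */
size_t chunk_dec_feed(struct chunk_dec *d, const char *in, size_t n, char *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        char c = in[i];
        switch (d->state) {
        case CH_SIZE:
            if (hex_val(c) >= 0)
                d->remaining = d->remaining * 16 + (unsigned long)hex_val(c);
            else if (c == '\r')
                d->state = CH_SIZE_LF;
            else if (c == ';')
                d->state = CH_SIZE_EXT;
            else
                d->state = CH_ERROR;
            break;
        case CH_SIZE_EXT:
            if (c == '\r')
                d->state = CH_SIZE_LF;
            break;
        case CH_SIZE_LF:
            if (c != '\n')
                d->state = CH_ERROR;
            else
                d->state = d->remaining ? CH_DATA : CH_DONE;
            break;
        case CH_DATA:
            out[w++] = c;
            if (--d->remaining == 0)
                d->state = CH_DATA_CR;
            break;
        case CH_DATA_CR:
            d->state = (c == '\r') ? CH_DATA_LF : CH_ERROR;
            break;
        case CH_DATA_LF:
            d->state = (c == '\n') ? CH_SIZE : CH_ERROR;
            break;
        case CH_DONE:
        case CH_ERROR:
            return w;             /* nothing more to decode */
        }
    }
    return w;
}
```

The decoded bytes from each call can then be passed straight to whatever incremental parser handles the `<Test>` lines, so the full 80k body never needs to be held in memory at once.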
| | #8 |
| and the hat of vanishing Join Date: Aug 2001 Location: The edge of the known universe
Posts: 21,214
| So what's the difference between Code:
<Test>Test</Test>
<Test>
Test</Test>
<Test>
Test
</Test>
| Salem is offline | |
| | #9 |
| Registered User Join Date: Dec 2008
Posts: 99
| Ugh, that is not the point. I need to PARSE something from the HTML. In this case, I'd have to parse whatever is between the <Test> tags. My whole problem is getting the WHOLE string.. |
| abraham2119 is offline | |
| | #10 |
| and the hat of vanishing Join Date: Aug 2001 Location: The edge of the known universe
Posts: 21,214
| What about a buffer-less approach that uses a state machine? Code:
while ( (ch=getNextChar()) ) {
    switch ( ch ) {
    case '<':
        tagStringLen = 0;                // a new tag starts: discard the old one
        tagString[tagStringLen++] = ch;
        state = inTag;
        break;
    case '>':
        tagString[tagStringLen++] = ch;
        tagString[tagStringLen] = '\0';  // terminate so it can be used as a string
        process( tagString ); // set some state on seeing <test>, clear it on seeing </test>
        state = outTag;
        break;
    // and so on
    }
}
More generally, you would compare both 'state' and 'ch' to determine what 'newstate' should be, and perform any additional processing along the way. Adding detection of, say, comments is pretty easy.
| Salem is offline | |
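One way to flesh out the state-machine outline above so that it survives arbitrary splits is to keep the parser state in a struct and feed it each received chunk. This is only a sketch under assumptions: `struct tag_parser`, `tag_parser_feed`, `TAG_MAX`, and `TEXT_MAX` are hypothetical names, tag matching is case-sensitive, and attributes, comments, and nesting are not handled.

```c
#include <stddef.h>
#include <string.h>

#define TAG_MAX  32    /* assumed limits for this sketch */
#define TEXT_MAX 256

enum parse_state { OUT_TAG, IN_TAG };

struct tag_parser {
    enum parse_state state;   /* zero-initialise: starts in OUT_TAG */
    int    capturing;         /* 1 while between <Test> and </Test> */
    char   tag[TAG_MAX];
    size_t tag_len;
    char   text[TEXT_MAX];
    size_t text_len;
};

/* Consumes bytes until one <Test>...</Test> element is complete or the chunk
   ends. Returns the bytes consumed; *found is 1 when p->text holds the
   element's text as a NUL-terminated string. */
size_t tag_parser_feed(struct tag_parser *p, const char *chunk, size_t n,
                       int *found)
{
    size_t i = 0;
    *found = 0;
    while (i < n && !*found) {
        char ch = chunk[i++];
        switch (p->state) {
        case OUT_TAG:
            if (ch == '<') {
                p->state = IN_TAG;
                p->tag_len = 0;                    /* start a new tag name */
            } else if (p->capturing && p->text_len + 1 < TEXT_MAX) {
                p->text[p->text_len++] = ch;       /* text inside <Test> */
            }
            break;
        case IN_TAG:
            if (ch != '>') {
                if (p->tag_len + 1 < TAG_MAX)
                    p->tag[p->tag_len++] = ch;
                break;
            }
            p->tag[p->tag_len] = '\0';
            p->state = OUT_TAG;
            if (strcmp(p->tag, "Test") == 0) {
                p->capturing = 1;
                p->text_len = 0;
            } else if (strcmp(p->tag, "/Test") == 0 && p->capturing) {
                p->text[p->text_len] = '\0';
                p->capturing = 0;
                *found = 1;                        /* one element is ready */
            }
            break;
        }
    }
    return i;
}
```

Because all state lives in the struct, a tag or its text can be cut at any byte between two recv() calls and the parser simply resumes where it left off.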
| | #11 |
| Registered User Join Date: Jul 2009
Posts: 3
| You can receive as much data as you want. So, you could receive the data one byte at a time if you like. Not very efficient, but you can parse the data as it comes in and stop the receive calls when you wish. You are not locked into receiving a preset amount of data. As a side note (as mentioned before), TCP/IP does not preserve message boundaries. UDP does, but you are not guaranteed to receive the data at all. More information on this can be found here - Winsock |
| lonewolff is offline | |