Parsing website data: intermittent rubbish characters retrieved
Hi, I am writing a program to parse data from a website. To do that, I first need to download the page.
Step 1: download the file
Code:
CString Data;
//CString Buffer;
DeleteUrlCacheEntry(url); // delete the old stupid cache
HINTERNET IntOpen = ::InternetOpen("Sample", LOCAL_INTERNET_ACCESS, NULL, 0, 0);
HINTERNET handle = ::InternetOpenUrl(IntOpen, url, NULL, NULL, NULL, NULL);
HANDLE hFile = ::CreateFile("c:\\index.txt", GENERIC_WRITE, NULL, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
char Buffer[1024];
DWORD dwRead = 0;
while(::InternetReadFile(handle, Buffer, sizeof(Buffer), &dwRead) == TRUE)
{
    if (dwRead == 0)
        break;
    DWORD dwWrite = 0;
    ::WriteFile(hFile, Buffer, dwRead, &dwWrite, NULL);
    Data += Buffer;
}
::CloseHandle(hFile);
::InternetCloseHandle(handle);
After the loop, the CString "Data" contains the page as plain text.
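One thing worth checking in the download loop: InternetReadFile writes dwRead raw bytes into Buffer and does not add a terminating '\0', so appending Buffer as a C string can pick up stale bytes left over from an earlier, longer read. That would fit the symptom of occasional extra characters in the middle of a value, although it is only a likely cause, not a confirmed one. The sketch below shows the length-based append in portable C++; std::string stands in for CString, and AppendChunk is a hypothetical helper name, not part of WinINet or MFC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append exactly bytesRead bytes from a receive buffer
// that is NOT null-terminated. std::string is used here in place of CString.
void AppendChunk(std::string& data, const char* buffer, std::size_t bytesRead)
{
    assert(buffer != NULL || bytesRead == 0);
    if (bytesRead == 0)
        return;                      // nothing was read, nothing to append
    data.append(buffer, bytesRead);  // copy only the bytes that were read
}
```

In the MFC loop the corresponding change would be to replace the append with `Data += CString(Buffer, dwRead);`, or to read at most `sizeof(Buffer) - 1` bytes and set `Buffer[dwRead] = '\0'` before appending.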
Step 2: parse the data using brackets
Because much of the data sits between <> brackets, counting brackets can be used to locate the desired value.
Code:
// This function looks for "item" in file_string, skips "bracket_distance" closing
// brackets (>), and returns the text up to the next opening bracket (<).
// e.g. file_string = "dsfsd<><><><>6.35<>", item = "dsfsd", bracket_distance = 4
CString Mydialog::Parse_Backets(CString file_string, CString item, int bracket_distance)
{
    file_string.ReleaseBuffer();
    int start_index;
    int end_index;
    start_index = file_string.Find(item);
    if(start_index == -1)
    {
        CString error_string = "Error";
        error_flag = 1;
        return error_string;
    }
    for(int i = 0; i < bracket_distance; i++)
    {
        start_index = file_string.Find(">", start_index) + 1;
    }
    end_index = file_string.Find("<", start_index) - 1;
    file_string = file_string.Mid(start_index, end_index - start_index + 1);
    return file_string;
}
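For comparison, here is a minimal portable sketch of the same bracket-counting idea, assuming std::string in place of CString. ParseBrackets and its bool return value are hypothetical choices for illustration. The one difference from the function above is that it stops as soon as a bracket search fails, instead of letting a -1 result wrap the index back to the start of the text.

```cpp
#include <string>

// Hypothetical portable counterpart of Parse_Backets.
// Finds "item", skips bracketDistance '>' characters after it, and stores the
// text up to the next '<' in "out". Returns false if any search step fails.
bool ParseBrackets(const std::string& text, const std::string& item,
                   int bracketDistance, std::string& out)
{
    std::string::size_type pos = text.find(item);
    if (pos == std::string::npos)
        return false;                       // item not present

    for (int i = 0; i < bracketDistance; ++i)
    {
        pos = text.find('>', pos);
        if (pos == std::string::npos)
            return false;                   // fewer brackets than expected
        ++pos;                              // continue just past this '>'
    }

    std::string::size_type end = text.find('<', pos);
    if (end == std::string::npos)
        return false;                       // no '<' closes the value

    out = text.substr(pos, end - pos);
    return true;
}
```

With the example from the comment above, `ParseBrackets("dsfsd<><><><>6.35<>", "dsfsd", 4, out)` succeeds and leaves "6.35" in `out`.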
Now the problem: once in a while I get rubbish characters. For example, when the actual value shown in the browser is 0.55, I get 0.aj5m5, or even 0.1595.
The website is http://stquote.sgx.com/live/st/STStock.asp?stk=G
Does anyone know how to solve this problem?
Using:
- MFC
- VC6.0