Thread: Downloading a webpage onto the disk?

  1. #1
    Registered User
    Join Date
    Apr 2003
    Posts
    7

    Downloading a webpage onto the disk?

    Hi.

    I would like to make a program that downloads a webpage, saves it to the hard disk, and then interprets the file and alerts me if necessary.

    For example, it will download a webpage (let's say a page of stock prices) onto the disk as an HTML file, which my program can read and interpret, alerting the user on certain conditions (such as a stock price falling below a certain level).

    I looked in the Temporary Internet Files folder and noticed that the HTML file containing the code I want to interpret is there, so I could probably do a system("start www.website.com/webpage.html") followed by reading the file from that folder. The problem with this solution is that IE will pop up with the webpage (disturbing the user).

    Is there a more elegant way to directly download a webpage onto the disk? I am making a simple Console Program using Visual C++ 6.0.

    Thanks in advance.

  2. #2
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Here is some code which can be used to get the html from a webpage. You can then write the html to disk, parse it, or whatever you want.
    Code:
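    #include <winsock2.h>
    #include <windows.h>
    #include <stdio.h>   /* for sprintf */
    #include <string.h>  /* for strlen, memset, memcpy */
    #pragma comment(lib, "ws2_32.lib")
    
    /* LPCTSTR strings are passed to char-based functions, so an ANSI (non-UNICODE) build is assumed. */
    void DisplayError(LPCTSTR lpszError, int errornum);
    
    /**
     * This function sends the http request to the specified server.  The response is then
     * copied into a passed buffer.
     *	lpszServer - The webserver address
     *	lpszHttp - The http request
     *	data - buffer to hold the response (http headers followed by the html)
     *	datasize - the size of the data buffer
     */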
    BOOL SendHttp(LPCTSTR lpszServer, LPCTSTR lpszHttp, LPTSTR data, UINT datasize)
    {
    	SOCKET s;
    	WSADATA wsaData;
    	struct sockaddr_in hostaddr;
    	struct hostent *serverent;
    	char *serverip;
    	char buff[1024];
    	int i,bytes;
    
    	/* Initialize sockets; WSAStartup returns its error code directly */
    	if((i = WSAStartup(MAKEWORD(2,2),&wsaData)) != 0)
    	{
    		DisplayError("Error Initializing Sockets",i);
    		return FALSE;
    	}
    	
    	/* create a socket descriptor */
    	s = socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);
    	if(s == INVALID_SOCKET)
    	{
    		DisplayError("Error creating socket.",WSAGetLastError());
    		WSACleanup();
    		return FALSE;
    	}
    
    	/* Get a hostent structure from the domain name */
    	if(!(serverent = gethostbyname(lpszServer)))
    	{
    		DisplayError("Could not resolve host name.",WSAGetLastError());
    		closesocket(s);
    		WSACleanup();
    		return FALSE;
    	}
    
    	/* Get the ip address from the hostent structure */
    	if(!(serverip = inet_ntoa(*(struct in_addr *)*serverent->h_addr_list)))
    	{
    		DisplayError("Call to inet_ntoa failed",0);
    		closesocket(s);
    		WSACleanup();
    		return FALSE;
    	}
    
    	memset(&hostaddr,0,sizeof(struct sockaddr_in));
    	hostaddr.sin_family = AF_INET;
    	hostaddr.sin_addr.s_addr = inet_addr(serverip);
    	hostaddr.sin_port = htons(80);
    
    	/* Connect to the server */
    	if(connect(s,(struct sockaddr*)&hostaddr,sizeof(struct sockaddr)))
    	{
    		DisplayError("Unable to connect to server.",WSAGetLastError());
    		closesocket(s);
    		WSACleanup();
    		return FALSE;
    	}
    
    	/* Send the http request */
    	if(send(s,lpszHttp,(int)strlen(lpszHttp),0) == SOCKET_ERROR)
    	{
    		DisplayError("Error Sending HTTP data.",WSAGetLastError());
    		closesocket(s);
    		WSACleanup();
    		return FALSE;
    	}
    
    	/* Receive the response until the server closes the connection
    	   or the buffer is full (one byte is kept for the terminator) */
    	i = 0;
    	while(1)
    	{
    		bytes = recv(s,buff,sizeof(buff),0);
    		if(bytes <= 0) break;
    		if( (UINT)(bytes + i + 1) > datasize) break; /* don't overflow the buffer */
    		memcpy(data + i, buff,bytes);
    		i += bytes;
    	}
    	data[i] = 0;
    
    	closesocket(s);
    	WSACleanup();
    
    	return TRUE;
    }
    
    void DisplayError(LPCTSTR lpszError, int errornum)
    {
    	char szError[256];
    
    	/* %.200s caps the message length so szError cannot overflow */
    	if(errornum)
    		sprintf(szError,"%.200s\n\nError Number: %d",lpszError,errornum);
    	else
    		sprintf(szError,"%.200s",lpszError);
    	MessageBox(NULL,szError,"ERROR",MB_OK | MB_ICONERROR);
    }
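    The http request might look something like the following. The first line names the page to fetch, and "Connection: close" asks the server to close the connection after the response, which is what ends the recv loop above:
    GET /webpage.html HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
    Accept-Language: en-us
    Content-Type: application/x-www-form-urlencoded
    User-Agent: Mozilla/4.0
    Host: website_server.com
    Content-Length: 0
    Connection: close
    Cache-Control: no-cache
    \r\n\r\n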

    This should be enough to get you started.
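
    As a rough illustration of the request string and the "write the html to disk" step mentioned above, here is a minimal sketch in portable C. BuildGetRequest, FindHttpBody and SaveHttpBody are hypothetical helper names, not part of the code above, and the host and path are placeholders. It assumes a C99 snprintf (on Visual C++ 6.0 the closest equivalent is _snprintf, which behaves differently on truncation), and it does not handle chunked transfer encoding, which an HTTP/1.1 server may use.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds a minimal GET request for `path` on `host`.
   Returns the request length, or -1 if `out` is too small. */
int BuildGetRequest(char *out, size_t outsize, const char *host, const char *path)
{
	int n = snprintf(out, outsize,
		"GET %s HTTP/1.1\r\n"
		"Host: %s\r\n"
		"User-Agent: Mozilla/4.0\r\n"
		"Connection: close\r\n"
		"\r\n",
		path, host);
	return (n < 0 || (size_t)n >= outsize) ? -1 : n;
}

/* Hypothetical helper: returns a pointer to the body of a raw HTTP response
   (the text after the first blank line), or NULL if no blank line is found. */
const char *FindHttpBody(const char *response)
{
	const char *sep = strstr(response, "\r\n\r\n");
	return sep ? sep + 4 : NULL;
}

/* Hypothetical helper: writes the body of `response` to `filename`.
   Returns 1 on success, 0 on failure. */
int SaveHttpBody(const char *response, const char *filename)
{
	const char *body = FindHttpBody(response);
	FILE *fp;
	size_t len;
	int ok;

	if (!body) return 0;
	fp = fopen(filename, "wb");
	if (!fp) return 0;
	len = strlen(body);
	ok = (fwrite(body, 1, len, fp) == len);
	if (fclose(fp) != 0) ok = 0;
	return ok;
}
```

    With the SendHttp function above, the request built by BuildGetRequest would be passed as lpszHttp, and the filled data buffer would then be handed to SaveHttpBody together with a file name of your choice.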

  3. #3
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    You can also use wininet to download web pages. This sample downloads a web page into memory. My apologies for the excessive use of goto.

    Code:
    #include <windows.h>
    #include <wininet.h>
    #include <stdlib.h>  /* for realloc */
    #pragma comment(lib, "wininet.lib")
    
    
    /* 
     * realloc wrapper that frees the original memory if the new memory can not be allocated.
     * Returns the new memory or NULL on failure.
     */
    void* ReallocOrFree(void* original_ptr, size_t new_size)
    {
    	void* temp = realloc(original_ptr, new_size);
    	if (!temp) free(original_ptr);
    	return temp;
    }
    
    
    /* 
     * Get the contents of an http, https, ftp or gopher file.
     * This is a blocking function that will not return until the file is completely loaded.
     * This may be a lengthy operation and this function must not be called from a GUI thread.
     * The download is abandoned (and NULL returned) if the file is larger than cbMaxSize bytes.
     * If lpcbActualSize is not NULL it is used to return the total size of the returned file.
     * The result must be released with free() when no longer needed.
     */
    char* GetInternetFile(LPCTSTR szURL, size_t cbMaxSize, size_t* lpcbActualSize)
    {
    	HINTERNET hNet         = NULL;
    	HINTERNET hUrlFile     = NULL;
    	char*     buffer       = NULL;
    	DWORD     cbBytesRead  = 0;
    	SIZE_T    cbBytesTotal = 0;
    	BOOL      bResult      = FALSE;
    	const DWORD cbReadSize = 0x4000;
    
    	if (!(hNet = InternetOpen(TEXT("CBOARD Downloader"), PRE_CONFIG_INTERNET_ACCESS, NULL, NULL, 0)))
    		goto cleanup;
    
    	if (!(hUrlFile = InternetOpenUrl(hNet, szURL, NULL, 0, INTERNET_FLAG_RESYNCHRONIZE, 0)))
    		goto cleanup;
    
    	do
    	{
    		if (!(buffer = (char*) ReallocOrFree(buffer, cbBytesTotal + cbReadSize)))
    			goto cleanup;
    
    		if (!InternetReadFile(hUrlFile, buffer + cbBytesTotal, cbReadSize, &cbBytesRead))
    			goto cleanup;
    
    		cbBytesTotal += cbBytesRead;
    
    		/* Max size check and size_t overflow check */
    		if (cbBytesTotal > cbMaxSize || ((((size_t) -1) - cbReadSize) - 1) < cbBytesTotal)
    			goto cleanup;
    
    	} while (cbBytesRead > 0);
    
    	if (!(buffer = (char*) ReallocOrFree(buffer, cbBytesTotal + 1)))
    		goto cleanup;
    
    	buffer[cbBytesTotal] = '\0';
    	bResult = TRUE;
    
    cleanup:
    	if (hUrlFile) InternetCloseHandle(hUrlFile);
    	if (hNet)     InternetCloseHandle(hNet);
    	if (!bResult) free(buffer);
    	if (lpcbActualSize) *lpcbActualSize = (bResult ? cbBytesTotal : 0);
    
    	return (bResult ? buffer : NULL);
    }
    
    
    
    #if 1
    #include <stdio.h>
    
    int main(void)
    {
    	size_t sz;
    	char* file_contents = GetInternetFile(TEXT("http://google.com/ie"), 100000, &sz);
    
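    	if (file_contents)
    		printf("%s\n", file_contents);
    	else
    		printf("Unable to retrieve file.\n");
    
    	/* size_t is not an int; cast to a type printf has a portable format for */
    	printf("Size is %lu.\n", (unsigned long) sz);
    	free(file_contents);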
    
    	getchar();
    	return 0;
    }
    #endif
    In theory, you could also use the URLDownloadToFile() function, but it seems to be rather flaky.
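
    Once the page is in memory (for example, the buffer returned by GetInternetFile() above), the check described in the first post, alerting when a price falls below a certain level, could be sketched along these lines. This is only a sketch under assumptions: ExtractNumberAfter and PriceBelow are hypothetical helper names, and the marker text depends entirely on the HTML of the real page, so you would have to look at the page source to pick one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds `marker` in `html`, parses the number that
   follows it and stores it in *value.
   Returns 1 on success, 0 if the marker or the number is missing. */
int ExtractNumberAfter(const char *html, const char *marker, double *value)
{
	const char *p = strstr(html, marker);
	char *end;
	double v;

	if (!p) return 0;
	p += strlen(marker);
	v = strtod(p, &end);
	if (end == p) return 0;   /* nothing numeric followed the marker */
	*value = v;
	return 1;
}

/* Hypothetical helper: returns 1 if the number after `marker` is below
   `threshold`, 0 if it is not, and -1 if no number could be extracted. */
int PriceBelow(const char *html, const char *marker, double threshold)
{
	double price;

	if (!ExtractNumberAfter(html, marker, &price)) return -1;
	return price < threshold ? 1 : 0;
}
```

    A call such as PriceBelow(file_contents, "id=\"price\">", 40.0) would then decide whether to alert the user. The marker shown here is made up for illustration. Plain string searching like this breaks as soon as the page layout changes, so treat it as a starting point and not a robust HTML parser.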

  4. #4
    Registered User
    Join Date
    Nov 2004
    Posts
    12
    Thanks!

    Well, they want to fetch the emails from the eBay page so that they can inform the people who order goods from them over eBay about the status of their order.
