Thread: My program needs to read from a synchronized html file

  1. #1
    Registered User
    Join Date
    Nov 2005
    Posts
    9

    Hi everyone. I am trying to write a C program that checks a website for new data. My plan is to have Windows XP Home Edition synchronize the site every hour or so, so that my program can tell whether anything has been added. However, when I make a page available offline, it isn't saved as HTML. Instead, I just get something like a link to the offline page, which my program can't read. So the questions are:

    1. Does anyone know how to synchronize HTML files, not "offline web pages", with XP Home?
    2. If that is not possible, can anyone tell me how, or refer me to an online resource that explains how, to get my C program to fetch the updated info itself?

    I've got Bloodshed Dev-C++ and I plan on making this a command line app, if that helps. Thanks a lot.

  2. #2
    carry on JaWiB's Avatar
    Join Date
    Feb 2003
    Location
    Seattle, WA
    Posts
    1,972
    You can probably get the actual file name for the cached page using GetUrlCacheEntryInfo (see the WinINet functions).

    Then you just have to set up a schedule to synchronize the pages, which you can do through Internet Explorer by right-clicking your favorites item and clicking "make available offline" (that's one way, at least).
    "Think not but that I know these things; or think
    I know them not: not therefore am I short
    Of knowing what I ought."
    -John Milton, Paradise Regained (1671)

    "Work hard and it might happen."
    -XSquared

  3. #3
    Registered User
    Join Date
    Nov 2005
    Posts
    9
    Thanks for the reply. These functions do look useful, but I'm kind of new to the WinINet functions. I think I got the point that this is a BOOL function that returns true or false (I'm not sure what to do with that result) and that I'm supposed to pass in a structure that the function fills with the information it gets. But I'm getting some compile errors, and I'm wondering if someone could show me generally how the program should be structured, or point me to a website that explains it, so I know I'm doing it right. Also, I'm not 100% sure how to access the variables inside the struct, and what are the necessary headers? Thanks in advance guys, and hopefully soon I'll be able to get a grasp on this internet stuff.
    Last edited by istheman5; 11-28-2005 at 03:14 PM.
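
A rough sketch of the pattern being asked about, in plain C. It does not call WinINet: `ENTRY_INFO`, `get_entry_info`, `lookup_file_name`, and the made-up path are hypothetical stand-ins so that the example compiles anywhere. The real function is declared in `<wininet.h>`, which is included after `<windows.h>`, and the program has to be linked against the WinINet library. The idea is that the function returns zero (FALSE) on failure and nonzero on success, so the result is tested in an `if`; a first call with no buffer reports how many bytes are needed; and because the info variable is a pointer, its members are read with `->`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for INTERNET_CACHE_ENTRY_INFO: a fixed header
   followed by variable-length data that a pointer member refers to. */
typedef struct {
    unsigned long struct_size;   /* total bytes used: header + string */
    char *local_file_name;       /* points into the same allocation */
} ENTRY_INFO;

/* Hypothetical stand-in for GetUrlCacheEntryInfo.
   Returns nonzero on success. If the buffer is missing or too small,
   stores the required size in *size and returns 0. */
int get_entry_info(const char *url, ENTRY_INFO *info, unsigned long *size)
{
    const char *name = "C:\\cache\\page.htm";   /* made-up result */
    unsigned long needed =
        (unsigned long)(sizeof(ENTRY_INFO) + strlen(name) + 1);

    (void)url;   /* the stand-in ignores the URL */
    if (info == NULL || *size < needed) {
        *size = needed;
        return 0;
    }
    info->struct_size = needed;
    info->local_file_name = (char *)(info + 1);  /* string sits right after the header */
    strcpy(info->local_file_name, name);
    *size = needed;
    return 1;
}

/* Two-call pattern: ask for the size, allocate, then call again.
   Returns a malloc'd copy of the file name, or NULL on failure. */
char *lookup_file_name(const char *url)
{
    unsigned long size = 0;
    ENTRY_INFO *info;
    char *result = NULL;

    if (get_entry_info(url, NULL, &size))
        return NULL;   /* succeeding with no buffer would be unexpected */

    info = (ENTRY_INFO *)malloc(size);
    if (info == NULL)
        return NULL;

    if (get_entry_info(url, info, &size)) {
        /* info is a pointer, so its members are reached with -> */
        result = (char *)malloc(strlen(info->local_file_name) + 1);
        if (result != NULL)
            strcpy(result, info->local_file_name);
    }
    free(info);
    return result;
}
```

Unlike this stand-in, the real call reports why it failed through GetLastError(), so the size query should only be retried when that value is ERROR_INSUFFICIENT_BUFFER.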

  4. #4
    carry on JaWiB's Avatar
    Join Date
    Feb 2003
    Location
    Seattle, WA
    Posts
    1,972
    I've only used a couple of the WinINet functions before, but I found a page on MSDN that gives an example of GetUrlCacheEntryInfo (and other cache functions).
    I went ahead and tried it out myself. This seems to work all right for me:
    Code:
      // Requires <windows.h> and <wininet.h>; link with the WinINet library.
      const char *url = "http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/system_error_codes__0-499_.asp";
      LPINTERNET_CACHE_ENTRY_INFO cei = NULL;
      DWORD dwSize = 0;
      // first call passes no buffer: it is expected to fail and store the required size in dwSize
      if(!GetUrlCacheEntryInfo(url,NULL,&dwSize))
      {
        if(GetLastError()==ERROR_INSUFFICIENT_BUFFER)
        {
          //just need to allocate correct buffer size
          cei = (LPINTERNET_CACHE_ENTRY_INFO)new char[dwSize];
          cei->dwStructSize = dwSize;
          //try again
          if(!GetUrlCacheEntryInfo(url,cei,&dwSize))
            MessageBox(NULL,"Could not retrieve cache entry info","Error",MB_OK);//handle error
          else //if successful, print out filename
            MessageBox(NULL,cei->lpszLocalFileName,"Cache entry file:",MB_OK);
          delete[] (char*)cei; //release the buffer allocated above
        }
        else //any other error, e.g. the URL is not in the cache
          MessageBox(NULL,"Could not retrieve cache entry info","Error",MB_OK);//handle error
      }

  5. #5
    Registered User
    Join Date
    Nov 2005
    Posts
    9
    Thanks for your help thus far. I'm still having a couple of problems.
    First of all, my compiler didn't like
    Code:
    cei = (LPINTERNET_CACHE_ENTRY_INFO)new char[dwSize];
    It said that new must be declared first, or something like that.
    So I removed that line. Then, after putting together some code (slightly different from yours for no real reason), I realized that it wasn't even getting past the first if statement:
    Code:
    #include <stdio.h>
    #include "C:\Dev-Cpp\include\windows.h"
    #include "C:\Dev-Cpp\include\wininet.h"
    
    int main(){
        LPINTERNET_CACHE_ENTRY_INFO info;
        DWORD dwSize = 0;
        if(!GetUrlCacheEntryInfo("http://www.yahoo.com/",NULL,&dwSize)){
            if(GetLastError()==ERROR_INSUFFICIENT_BUFFER){
                info->dwStructSize = dwSize;
                if(!GetUrlCacheEntryInfo("http://www.yahoo.com/",info,&dwSize)){
                    printf("Error 1.\n");
                }
                else{
                    printf("Filename is %s\n", info->lpszLocalFileName);
                }
            }
            else{
                printf("Error 2.\n");
            }
        }
    }
    I kept getting Error 2. Since I'm only an average C programmer, I accept that I am probably making stupid mistakes without realizing it. JaWiB, thanks so far, and hopefully this problem will be solved.

  6. #6
    carry on JaWiB's Avatar
    Join Date
    Feb 2003
    Location
    Seattle, WA
    Posts
    1,972
    My only guess is that the website is stored in the cache under a different name.

    You might try using InternetOpen, InternetOpenUrl, and InternetReadFile instead. I believe that InternetOpenUrl will actually open the cached file anyways if you specify the correct flag.

  7. #7
    Registered User
    Join Date
    Nov 2005
    Posts
    9
    OK. Nice to know there is an alternative. But before I start a new approach, I'd like to make sure that when we talk about the "cache", we are referring to C:\WINDOWS\Offline Web Pages on my XP OS.

    Also, it seems that whenever I specify the URL of a site that isn't in my cache, I get the appropriate error message, and whenever I specify a URL that is in my cache, I get a stupid XP error popup, the one that asks you to send an error report. Strange.

    Anyway, concerning this new approach: I believe I have InternetOpen and InternetOpenUrl down (except for lpszAgent in InternetOpen; it is NULL right now, is that OK?), but InternetReadFile still confuses me. How exactly does the buffer (lpBuffer) work? How does the whole "bytes to read" and "bytes read" thing work? And what exactly is this function going to read for me?

    Sorry for asking so many questions. Every day I am getting more of a feel for this stuff. Thanks in advance.
    Last edited by istheman5; 11-29-2005 at 08:19 PM.

  8. #8
    carry on JaWiB's Avatar
    Join Date
    Feb 2003
    Location
    Seattle, WA
    Posts
    1,972
    The problem with your code is that you don't allocate info anywhere. If you are compiling this as C code then you can use malloc:
    Code:
    cei = (LPINTERNET_CACHE_ENTRY_INFO)new char[dwSize];
    //replace with (malloc is declared in <stdlib.h>)
    cei = (LPINTERNET_CACHE_ENTRY_INFO)malloc(dwSize);
    //and call free(cei) once you are done with the entry info
    As far as InternetReadFile goes, you can do something like this:
    Code:
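    char buf[512];
    DWORD bytesread=1; //InternetReadFile expects a DWORD*, not an int*
    //hUrl is the handle returned by InternetOpenUrl
    while(bytesread)//if bytesread is 0, you've reached the end of the file
      {
        if(InternetReadFile(hUrl,buf,sizeof(buf),&bytesread))
        {
          //the first bytesread bytes of buf now hold the next chunk of the file (not null-terminated)
        }
        else
        {
          //failed to read from file
          break; //stop here, otherwise bytesread may stay nonzero and the loop never ends
        }
      }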

  9. #9
    Registered User
    Join Date
    Nov 2005
    Posts
    9
    Niiice.. malloc() worked. I guess I could've avoided a lot of questions if I had told you what language I was using! Getting the path of the HTML file: done. But here's what I'm trying to do. This program will (hopefully) read this HTML file and find out if a certain image is present in it. I wrote some code to read the file and, apparently, it's not working:
    Code:
    void CheckForPic(){
      int i = 1;
      char curChar;
      char testString[13];
      while(!feof(ofile)){
        curChar = fgetc(ofile);
        if(curChar == 'd'){
          testString[0] = 'd';
          for(i; i < 12; i++){
            testString[i] = fgetc(ofile);
            printf("%s\n", testString); // I put this here to try to debug
          }
          if(testString == "devilsnow.gif"){
            printf("Its there\n");
            system("PAUSE");
            exit(1);
          }
        }
      }
      printf("Its not\n");
      system("PAUSE");
      exit(1);
    }
    It's supposed to just read the file until it finds a 'd', and then read 12 characters after the 'd' all to one string, and see if it matches the image name. If it doesn't it should repeat the process and find another 'd'. Here's my HTML file if it helps:
    Code:
    <html>
    <body>
    <img src="images/devilsnow.gif"></img>
    </body>
    </html>
    And, when all seemed well, here's the output I got:
    Code:
    dy"
    dy>
    dy>
    4ÿ"
    dy>
    <ÿ"
    dy>
    <i"
    dy>
    <im
    dy>
    <imgàÿ"
    dy>
    <img ÿ"
    dy>
    <img s"
    dy>
    <img sr
    dy>
    <img src°>ÃwxDÁwÿÿÿÿû¾Ãd
    Its not
    So the real problem is solved. But perhaps you can answer this one too?

    Thank you very much so far

    EDIT: Never mind, got it working with fread(). Thanks for all the help, JaWiB. Later.
    Last edited by istheman5; 12-01-2005 at 04:04 PM.
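
The question in post #9 went unanswered in the thread, so here is a minimal sketch of the fread() route the poster mentions; it is not their actual code. In the loop as posted, `testString == "devilsnow.gif"` compares two addresses rather than the characters, `testString` is never null-terminated before it is printed (hence the garbage in the output), the 13-character name plus its terminator needs 14 bytes, the inner loop only reads 11 more characters, and `i` is never reset to 1 for the next 'd'. The helper name `file_contains` and the 64 KB limit are assumptions for illustration. It reads the file with one fread() call and lets strstr() do the matching, which assumes the page fits in the buffer and contains no NUL bytes.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `needle` occurs anywhere in the file at `path`,
   0 if it does not, and -1 if the file cannot be opened.
   Only the first 64 KB (minus one byte) of the file is examined. */
int file_contains(const char *path, const char *needle)
{
    static char buf[65536];   /* static: too large to put on the stack comfortably */
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return -1;

    /* fread returns how many bytes it actually read, which may be
       fewer than requested when the file is shorter than the buffer */
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';            /* terminate so the string functions can be used */

    /* strstr compares characters; == on two char pointers would not */
    return strstr(buf, needle) != NULL;
}
```

Called as `file_contains(path, "devilsnow.gif")`, where `path` is the local file name obtained from the cache entry, it returns 1 when the image name appears in the page.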
