Thread: scanning text files

  1. #1

    scanning text files

    Hey all,
    Is there a way to find a specific set of characters in a file and write that to another file?
    What I'm trying to do is to search an html file for any " . " regardless of length, for the purposes of finding links. Then write all references found to a single text file.


  2. #2
    Magically delicious LuckY's Avatar
    Join Date
    Oct 2001
    So you want to look for every occurrence of " . " (that is: space-period-space) and write them to another file? Regardless of length? The length of that is 3, always. What you're asking doesn't seem to make sense.

    You should list an example or two of what exactly you hope to accomplish.

  3. #3
    Registered User deleeuw's Avatar
    Join Date
    Aug 2001


    Sorry for the lack of examples.
    Suppose I have an html file that has two links:

    <img src="vacation.jpg">
    <a href="introduction.html">

    I need to be able to search through that file to find both
    "vacation.jpg" and "introduction.html", obviously being different lengths, but having in common:


    When those matches are found, it writes them to a file, named "links.txt".
    When, therefore, I open up links.txt, I find:

    vacation.jpg
    introduction.html


    I hope that's clearer.

    Deum solum fidentia est

  4. #4
    If the only double quotes in the file are used in this context, then search char by char for the first double quote. Place every char after it in a char buffer until you find the second double quote. When you find it, put a null char at the end of the buffer and send the string to the new file. Repeat until end of file.

    If the double quotes are just for emphasis in your post, then use get(buffername, buffersize, ' ') as the input stream method for reading the source, so that each read stops at a space. Note that get() leaves the delimiter in the stream, so skip it with ignore() before the next read. After each word is read, determine the length of the buffer with strlen(). Use a loop to look for the period char. If the element after the period char is not a null char, then save the buffer to the new file.
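    A rough sketch of the period test on a char buffer, for illustration only. It swaps two things from the description above: operator>> with setw reads the words instead of get(), because it also splits on newlines and skips the delimiter by itself, and strchr replaces the hand-written loop. The names hasInteriorPeriod and copyDottedTokens and the 256-char limit are assumptions.

```cpp
#include <cstring>
#include <iomanip>
#include <istream>
#include <ostream>

// Hypothetical helper: true if the first period in `word` is followed by
// at least one more character (so "vacation.jpg" passes, "end." does not).
bool hasInteriorPeriod(const char* word) {
    const char* dot = std::strchr(word, '.');
    return dot != nullptr && dot[1] != '\0';
}

// Reads whitespace-separated words from `in` and writes those that pass
// the period test to `out`, one per line.
void copyDottedTokens(std::istream& in, std::ostream& out) {
    char word[256];  // assumed maximum word length, including the null char
    // setw caps how many chars operator>> may store, preventing overflow
    while (in >> std::setw(static_cast<int>(sizeof word)) >> word) {
        if (hasInteriorPeriod(word)) {
            out << word << '\n';
        }
    }
}
```

    Abbreviations such as "e.g." also pass this test, so expect some false positives on ordinary prose.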

    If you are familiar with STL strings you can scan in one word at a time with operator>> for the string class, then use find() or maybe find_first_of() to locate the period character. Then look at the index that find() returns. If the period is the last character in the string, then it's a period at the end of a sentence. If something follows it, then it is the period in an address.

  5. #5
    Registered User deleeuw's Avatar
    Join Date
    Aug 2001

    forgot one thing

    forgot to add:
    The idea is that one can enter any html file to be searched, and that the links within may be of any number, and thus may not be known by the person doing the searching (hence, the search!)

    Hope that doesn't confuse the matter too much!
    Deum solum fidentia est

  6. #6
    Magically delicious LuckY's Avatar
    Join Date
    Oct 2001
    Ah. That makes much more sense ;P
    With that in mind, don't you think it would be easier to search for "<img" and "<a" instead of just a period? I mean, because a period can be in any part of the document.

    First you have to determine the maximum length of a line, or if you don't want to, use a string object to dynamically read from the file so length doesn't matter. Here's a start using the former:
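    char szBuf[MAX_LEN+1];
    // read one line at a time; the loop ends when getline fails (end of file)
    while (fin.getline(szBuf, sizeof szBuf)) {
      if (strstr(szBuf, "<a")) {
        // find "href=\"" then the enclosing "\""
      }
      else if (strstr(szBuf, "<img")) {
        // find "src=\"" then its enclosing "\""
      }
      // fout << the filename here
    }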
    Just a start to help u in the right direction. I'd write more, but don't have time now.
    Hope it helps some.
