Thread: strstr woes

  1. #1
    Registered User
    Join Date
    Apr 2005
    Posts
    12

    strstr woes

    My goal is to read a file until a specific string is found and then output what was read into another file. I've been working on this for a few days now and this is all I've been able to come up with:

    Code:
    #include <cstring>
    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;

    int main()
    {
        char fileRead[256];
        char * buffitUP;
        int bufferlength;
        char needle[] = "</script>";

        cout << "enter name of the html file to open and edit: ";
        cin.getline(fileRead, 256);

        // Find the file's length by seeking to the end.
        ifstream oFile;
        oFile.open (fileRead);
        oFile.seekg (0, ios::end);
        bufferlength = oFile.tellg();
        oFile.seekg (0, ios::beg);
        buffitUP = new char [bufferlength];

        // Read the whole file and search it for the needle.
        oFile.read (buffitUP, bufferlength);
        if (strstr (buffitUP, needle) != NULL)
            cout << "found it";
        return 0;
    }
    I've hit a brick wall after I've located the specific string. Any help would be greatly appreciated.

  2. #2
    Registered User Mortissus's Avatar
    Join Date
    Dec 2004
    Location
    Brazil, Porto Alegre
    Posts
    152
    Please tell me more about your problem. Why don't you just create an ofstream and print the result of strstr? As you probably know, strstr returns a pointer to the first occurrence of needle in buffitUP, or NULL if it is not found.

  3. #3
    Registered User
    Join Date
    Apr 2003
    Posts
    2,663
    1) When you read chars into a char array, you generally don't want to read anything into the last spot in the array. That spot is usually reserved for a '\0' character, which lets you treat the array as a C-style string. That in turn lets you use the cstring functions on the array, and output the array using just the array name:
    Code:
    cout<<my_charArray;
    When you do this:
    Code:
    cin.getline(fileRead, 256);
    your program will read in at most 255 chars, so there is always room left for a '\0' character. cin.getline() is defined to store a '\0' after the last char it reads. However, when you do this:

    oFile.read(buffitUP, bufferlength);

    the last spot in the array is not reserved for a '\0' character. Instead, the last spot is filled with a character from the file, so you no longer have a C-style string. If you try to use cstring functions on the array, you may get errors from going out of bounds (i.e. past the end of the array), and you can't output the array using just the array name. All the cstring functions rely on the '\0' character to tell them where the string ends. Your buffitUP array does not have that '\0' at the end, which could create problems if, for instance, the string "</script>" was not present in the file: strstr() would keep searching past the end of the array.

    So, you need to create a char array with a size that is one bigger than bufferlength, and then you need to assign a '\0' character to the last spot in the array yourself. Even then, if you output the whole file with cout<< after you read it in, there may be a bunch of junk characters at the end of the displayed file. I believe that is because there can be more characters in the file than C++ produces when it reads the file. In Windows, when you hit Return at the end of a line in your file, two invisible characters are entered at the end of the line: \r\n. When C++ reads the file in text mode, it converts those two characters to just '\n'. So, the number of characters in the file is not the same as the number of characters that C++ produces when it reads the file. When you did that seeking and telling, you got the total number of characters in the file, not the total number of characters C++ produces from the file.

    That means the char array you created was too big, and there will be some random junk characters at the end of the array because those spots in the array never get initialized or assigned a value. For that reason, I don't think seekg() and tellg() are recommended for text files.

    There are other ways to read in the whole file, for instance since your bufferlength is larger than or equal to the total number of chars C++ produces when it reads the text file, you could do something like this:
    Code:
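    // Assumes buffitUP was allocated with bufferlength + 1 chars and that
    // '*' does not occur in the file, so one call reads the whole file
    // and stores a '\0' after the last char.
    oFile.getline(buffitUP, bufferlength + 1, '*');

    cout<<buffitUP<<endl;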
    The delimiter, which is the third parameter to getline() and has a default value of '\n' (the newline character), can be set to some char that won't be found in the file. Also, getline() will automatically tack on a '\0' after the last char read in, which means buffitUP will be a C-style string. You could also just forgo the seeking and telling, and declare buffitUP as a fixed-size array with some maximum size, like 5000.

    If there isn't a char that definitely won't be in the file which you can use as a delimiter, then I'm not sure of an easy way to read in the whole file in one shot. I think you might have to read in each line into a temp variable, and then copy that temp variable into a char array that is being used to accumulate the whole file. That would require keeping track of the position in the array of the last char read in. Maybe someone else can comment on the easiest way to read in a whole file into a char array.

    2) strstr() will return a pointer to the first occurrence of "</script>", which means it will return a pointer to the '<' char. If you name that pointer 'end', then using pointer arithmetic you can add 8 to end and get a pointer to the '>' char. Then, once again using pointer arithmetic, you can get the difference between buffitUP and end, which will give you the number of chars in the char array from the start of the array to the '>' character in the </script> tag (add one to include the '>' itself). Then, using a for loop and the difference as the loop conditional, you can copy each char from buffitUP to another char array that holds the result, and finally tack on a '\0' character after the last character. Whew!

    3) But, why go to all that trouble when you can use the std::string type and its find() and substr() functions? You can read in the file line by line and add each new line to a string variable, accumulating the result by just using the '+' sign. You could also use find() on each string you read in, and when you find the </script> tag, you could break out of the loop, and not read the rest of the file.
    Last edited by 7stud; 04-02-2005 at 01:45 AM.

  4. #4
    Registered User
    Join Date
    Apr 2005
    Posts
    12
    Okay, I think I am getting closer. This is what I have so far:

    Code:
    #include <cstring>
    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;

    int main()
    {
        char fileRead[256];
        char * haystack;
        int bufferlength;
        char needle[] = "</script>";
        char fileNew[] = "newhtml.htm";
        char delemitter = 177;

        cout << "enter name of the html file to open and edit: ";
        cin.getline(fileRead, 256);
        ifstream inputFile(fileRead);
        inputFile.seekg (0, ios::end);
        bufferlength = inputFile.tellg();
        inputFile.seekg (0, ios::beg);
        haystack = new char [bufferlength];
        ofstream outputFile(fileNew);
        while(!inputFile.eof())
        {
            inputFile.getline(haystack, bufferlength, '¥');
            if(!strstr(haystack, needle) == NULL)
            {
                inputFile.close();
                cout << "finished";
                break;
            } else
            if(strstr(haystack, needle) == NULL)
            {
                cout << haystack;
                outputFile << haystack;
            }
        }
        outputFile.close();
        return 0;
    }
    However, the only thing the while loop really accomplishes is 'cout << "finished";'. The program exits and I am left with a blank newhtml.htm file.

  5. #5
    Registered User
    Join Date
    Apr 2003
    Posts
    2,663
    Look at this conditional:
    Code:
    if(!strstr(haystack, needle) == NULL)
    {
        inputFile.close();
        cout << "finished";
        break;
    }
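    Suppose "</script>" is found in the file. Then strstr(haystack, needle) returns a non-NULL pointer. Because ! binds more tightly than ==, the conditional is evaluated as:

    (!strstr(haystack, needle)) == NULL

    The ! turns the non-NULL pointer into false, and false == NULL evaluates to true, so the if block is executed. In the if block, you close the file, output "finished", and break out of the loop without writing anything to the output file. I don't think that is what you want to do when you find "</script>" in the file. If you mean "the string was found", it is clearer to write strstr(haystack, needle) != NULL.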
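    One way to convince yourself how that conditional behaves is to spell out the order of operations. In the sketch below (both function names are made up for illustration), the second function mirrors the conditional step by step: ! is applied to the pointer first, and only then is the result compared with zero. It comes out true exactly when the needle is found, the same as the plainer test in the first function.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: the plain way to ask "was the needle found?".
bool containsNeedle(const char* haystack, const char* needle)
{
    return std::strstr(haystack, needle) != NULL;
}

// Hypothetical helper: mirrors  !strstr(haystack, needle) == NULL
// with each step made explicit.
bool originalCondition(const char* haystack, const char* needle)
{
    bool notFound = !std::strstr(haystack, needle);   // '!' is applied first
    return notFound == false;                          // then compared with zero
}
```

    Since the two functions agree, the form in containsNeedle() is the one to prefer; it says what it means without relying on operator precedence.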

    When you get that straightened out, you still aren't quite grasping the complexity of what you have to do. This is what your program does now:
    Code:
    haystack = new char [bufferlength];
    ...	
    inputFile.getline(haystack, bufferlength, '¥');	
    ...
    cout << haystack;
    outputFile << haystack;
    After using the getline() function, the variable haystack will contain your whole file, and then you write the whole file to outputFile. If you are intent on using char arrays, then this is what you have to do:
    2) strstr() will return a pointer to the first occurrence of "</script>", which means it will return a pointer to the '<' char.
    But, you want a marker that points to the end of the </script> tag, so you need to move the pointer to the end of the tag:
    If you name that pointer 'end', then using pointer arithmetic you can add 8 to end and get a pointer to the '>' char.
    That gives you a marker to the spot in the char array buffitUP, namely the closing '>' character of the </script> tag, that is the end of the data you want to extract.
    Then, once again using pointer arithmetic, you can get the difference between buffitUP and end, which will give you the number of chars in the char array from the start of the array to the '>' character in the </script> tag (add one to include the '>' itself).
    buffitUP is a pointer to the first char in the array, and after doing the above steps, the variable end will be a pointer to the '>' character in the </script> tag. If you subtract those pointers, the difference is an integer that is the index of the '>' character, so the difference plus one is the total number of characters you want to copy from the char array buffitUP.
    Then, using a for loop and the difference as the loop conditional, you can copy each char from buffitUP to another char array that holds the result, and finally tack on a '\0' character after the last character. Whew!
    That means you need to create a for-loop starting at 0 and ending once the specified number of characters has been copied. You also need to create a char array that is going to hold the chars you copy from buffitUP. Then, you just loop through buffitUP, which is an array, so you can use subscript notation to pick out the chars, and then you can assign them to the result array, e.g.:

    result[35] = buffitUP[35];

    Once you get the proper portion of buffitUP copied to result, and it displays properly using cout<<, you can write result to an output file.
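    Put together, the steps above might look something like the sketch below. The function name copyThroughNeedle and its parameters are made up for this example, and it takes the output array from the caller rather than allocating one. Note that the pointer difference is the index of the '>' char, so the number of chars to copy is one more than the difference.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies haystack from its first char through the last
// char of the first occurrence of needle into out, then adds a '\0'.
// Returns false if needle is empty or absent, or if out is too small.
bool copyThroughNeedle(const char* haystack, const char* needle,
                       char* out, std::size_t outSize)
{
    if (*needle == '\0')
        return false;

    const char* start = std::strstr(haystack, needle);    // points at the '<'
    if (start == NULL)
        return false;

    const char* end = start + std::strlen(needle) - 1;    // points at the '>'
    std::size_t count = static_cast<std::size_t>(end - haystack) + 1;

    if (count + 1 > outSize)                               // room for the '\0'?
        return false;

    for (std::size_t i = 0; i < count; ++i)
        out[i] = haystack[i];
    out[count] = '\0';
    return true;
}
```

    For the needle "</script>", strlen(needle) - 1 is the 8 mentioned above. Once the function returns true, out can be displayed with cout<< or written to an output file.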

    I suggest you attempt the following before writing the whole program. If you can't do the following, you won't be able to do the steps outlined above. Here is a char array:

    char data[] = "hello world";

    1) Use strstr() to get a pointer to the 'w'. Then, subtract the pointer to 'h', which is data, from the pointer to 'w' to get the number of chars between the two pointers. To be clear, you want:

    pointer to 'w' - pointer to 'h'

    not the other way around. Then, use a for loop and that integer number to end the loop. Inside the for-loop copy the characters in data to another char array.

    2) Next, write a program that uses the pointer to 'w' to get a pointer to the char 'd' in data. Then, get the difference between that pointer and a pointer to the first char (i.e. data) and use a for loop to copy the characters in data up to and including the 'd' into another array.

    When you are trying to write a program with concepts you've never tried before, or don't quite understand, it is better to first attempt a smaller practice program that eliminates the file I/O, and focuses on doing just the core operations.
    Last edited by 7stud; 04-02-2005 at 04:19 PM.

  6. #6
    Registered User
    Join Date
    Apr 2005
    Posts
    12
    I think I got it, but before I post the finished product I will post 7stud's exercise just in case anyone else is having the same problem I had.

    Code:
    #include <cstring>
    #include <iostream>

    using namespace std;

    int main()
    {
        char data[] = "hello world";
        char * pointerw = strstr (data, "w");
        char * pointerh = strstr (data, "h");
        char * pointerd = strstr (data, "d");

        // Part 1: copy everything before the 'w'.
        int wh = pointerw - pointerh;
        char * newdata = new char [wh + 1];       // +1 for the '\0'
        for (int i = 0; i < wh; i++)
        {
            newdata[i] = data[i];
        }
        newdata[wh] = '\0';
        cout << newdata << endl;
        cout << wh << endl;

        // Part 2: copy everything up to and including the 'd'.
        int dw = pointerd - pointerw;
        cout << dw << endl;
        int dh = dw + wh;                         // index of 'd' in data
        cout << dh << endl;
        char * newdataTwo = new char [dh + 2];    // dh + 1 chars plus the '\0'
        for (int a = 0; a < dh + 1; a++)
        {
            newdataTwo[a] = data[a];
        }
        newdataTwo[dh + 1] = '\0';
        cout << newdataTwo << endl;

        delete [] newdata;
        delete [] newdataTwo;
        return 0;
    }
    Here is the finished product:
    Code:
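    #include <cstring>
    #include <iostream>
    #include <fstream>
    using namespace std;

    int main()
    {
        char needle[] = "// -->";
        char fileNew[] = "newhtml.htm";
        char fileRead[256];

        cout << "enter name of the html file to open and edit: ";
        cin.getline(fileRead, 256);

        ifstream inputFile(fileRead);
        if (!inputFile)
        {
            cout << "could not open " << fileRead << "\n";
            return 1;
        }

        // The file size is an upper bound on the number of chars read.
        inputFile.seekg (0, ios::end);
        int bufferlength = inputFile.tellg();
        inputFile.seekg (0, ios::beg);
        char * haystack = new char [bufferlength + 1];   // +1 for the '\0'

        // A text file should not contain '\0', so this reads the whole file,
        // and getline() null-terminates the buffer.
        inputFile.getline(haystack, bufferlength + 1, '\0');

        char * pointer1 = strstr (haystack, needle);   // start of "// -->"
        char * pointer2 = strstr (haystack, "<");      // first tag in the file
        if (pointer1 == NULL || pointer2 == NULL || pointer1 - pointer2 < 2)
        {
            cout << "could not find the markers in " << fileRead << "\n";
            delete [] haystack;
            return 1;
        }

        // Copy from the first '<' up to two chars before the needle.
        int counter = pointer1 - pointer2;
        char * toCopy = new char [counter];
        for (int i = 0; i < counter - 2; i++)
        {
            toCopy[i] = pointer2[i];
        }
        toCopy[counter - 2] = '\0';

        ofstream outputFile(fileNew);
        outputFile << toCopy;
        outputFile << "\n index = 0; \n for (i = 0; i < ansMap.length; ++i) { document.write (TranslateAnswer (ansMap[i], i)); \n } \n // --> \n </script></head></html>";

        cout << "Open newhtml.htm in your favorite webbrowser and enjoy\n";

        delete [] toCopy;
        delete [] haystack;
        return 0;
    }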
    Last edited by magis; 04-03-2005 at 08:11 AM.

  7. #7
    Registered User
    Join Date
    Apr 2003
    Posts
    2,663
    Hey, well done!

    You skipped one thing I was trying to encourage you to try: getting the pointer to 'd' when all you have is the pointer to 'w'. In other words, I thought you should try getting the pointer to 'd' without directly using strstr(). So, if you haven't already tried it, use strstr() to get the pointer to 'w', but then use pointer arithmetic to get the pointer to 'd'. In your program, the way I envisioned it, you are not going to be able to use strstr() to get the '>' in the </script> tag because there are going to be many other '>' characters in the file.

    Of course if in the finished product you already figured out a way to get a pointer to the '>' char in the </script> tag, then...nice work.
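    For anyone following along, here is one way the pointer-arithmetic step could look. It is only a sketch, and the function name lastCharOf is made up for this example. Instead of searching for the single char, it searches for the whole word and then steps forward by the word's length minus one:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a pointer to the last char of the first
// occurrence of word in text, or NULL if word is empty or not found.
const char* lastCharOf(const char* text, const char* word)
{
    if (*word == '\0')
        return NULL;

    const char* first = std::strstr(text, word);   // e.g. the 'w' of "world"
    if (first == NULL)
        return NULL;

    // Step forward from the first char of the match to its last char.
    return first + (std::strlen(word) - 1);
}
```

    With data holding "hello world", lastCharOf(data, "world") points at the 'd', and lastCharOf(data, "world") - data is 10. The same call with "</script>" as the word lands on that tag's own '>', no matter how many other '>' chars come earlier in the file.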

    edit: now that I think about it, have you thought about what will happen in any of these cases:
    Code:
    </   script>
    
    </s cript>
    
    </script       >
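    A rough sketch of one way to cope with the first and third cases, assuming all you need is to tolerate whitespace after the "</" and before the ">". The function name findScriptClose is made up for this example, the match is case-sensitive, and the middle case, where the tag name itself is split, is deliberately not matched:

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper: finds a closing script tag that may have whitespace
// after "</" or before ">". Returns a pointer to the tag's '>' char, or
// NULL if there is no such tag.
const char* findScriptClose(const char* text)
{
    const char* p = text;
    while ((p = std::strstr(p, "</")) != NULL)
    {
        const char* q = p + 2;
        while (std::isspace(static_cast<unsigned char>(*q)))
            ++q;                                   // skip spaces after "</"

        if (std::strncmp(q, "script", 6) == 0)
        {
            q += 6;
            while (std::isspace(static_cast<unsigned char>(*q)))
                ++q;                               // skip spaces before '>'
            if (*q == '>')
                return q;
        }
        p += 2;                                    // keep searching after this "</"
    }
    return NULL;
}
```

    The returned pointer can be used the same way as the pointer to '>' discussed earlier: subtract the start of the array from it and add one to get the number of chars to copy.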
    Last edited by 7stud; 04-02-2005 at 10:15 PM.
