I have to write a program to find certain phrases in different files. What's the best approach to doing this? Would it be better to do it in C or in a .bat file?
Any pointers on how to get started?
TIA.
Download Grep for Windows and write a batch file
Which is best depends on the specifics, and on the skills you have in writing bat files and C programs.
In C there is a function called strstr() (#include <string.h>) that searches a string for a substring, which is right up your alley.
The algorithm could be as simple as:
Code:
while you have unsearched files in your list of files to be searched {
    open the next text file in the list
    while (there are more rows of text in the file) {
        put the row in a char array
        if (strstr() test is true)
            code to handle a found target string
    }
    close that file
    print a message that file <name or number> has been searched
}
You already have a "find" command in Windows. If you put each file name (and path, if applicable) into a text file, you can redirect the names automatically into a C program, into "find", or into your own bat file. "find" accepts wildcards.
findstr /s /c:"targetString" *.* searches all subdirectories as well (the /s switch belongs to "findstr"; plain "find" does not have it).
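For the C route, here is a minimal sketch of the pseudocode above. The names search_file, search_files and MAX_LINE are made up for illustration, and the 1024-character buffer is an assumption: a longer line is read in pieces, so the line numbers drift and a target that straddles two pieces is missed.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024  /* assumed buffer size; longer lines arrive in pieces */

/* Search one text file line by line for target.
   Prints every matching line with its line number.
   Returns the number of matching lines, or -1 if the file cannot be opened. */
int search_file(const char *path, const char *target)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char line[MAX_LINE];
    int line_no = 0;
    int hits = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        line_no++;
        /* strstr() returns NULL when target does not occur in line */
        if (strstr(line, target) != NULL) {
            printf("%s:%d: %s", path, line_no, line);
            hits++;
        }
    }

    fclose(fp);
    return hits;
}

/* Outer loop of the pseudocode: search every file in a list.
   Returns the total number of matching lines in the files that could be opened. */
int search_files(const char *paths[], int count, const char *target)
{
    int total = 0;
    for (int i = 0; i < count; i++) {
        int hits = search_file(paths[i], target);
        if (hits < 0)
            printf("could not open %s\n", paths[i]);
        else
            total += hits;
        printf("file %s has been searched\n", paths[i]);
    }
    return total;
}
```

You would call search_files() from main() with the file names taken from argv or read from your list file.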
Last edited by Adak; 10-25-2012 at 01:42 PM.
If I were to do this for one file at a time, would this design work?
Create a structure with:
    an array of strings to look for
    a pass/fail array
Open the file
Loop:
    Search the file for each string in the array
    If a string is found, set the corresponding value in the pass/fail array to 1 and increment the number of passes
Print the number of passes out of the total
Print the failures (strings that were not found)
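A rough sketch of that design in C might look like the following. The names struct search_job, run_job, print_report, MAX_TARGETS and MAX_LINE are invented for this example, and it assumes each string fits on a single line of the file.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TARGETS 16    /* assumed upper limit on strings to look for */
#define MAX_LINE    1024  /* assumed buffer size for one line */

struct search_job {
    const char *targets[MAX_TARGETS]; /* strings to look for */
    int found[MAX_TARGETS];           /* pass/fail array: 1 = found, 0 = not found */
    int count;                        /* number of targets in use */
};

/* Search one file for every target in the job.
   Returns the number of targets found (passes), or -1 if the file cannot be opened. */
int run_job(struct search_job *job, const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    for (int i = 0; i < job->count; i++)
        job->found[i] = 0;

    char line[MAX_LINE];
    int passes = 0;

    /* Stop early once every target has been found. */
    while (passes < job->count && fgets(line, sizeof line, fp) != NULL) {
        for (int i = 0; i < job->count; i++) {
            /* Skip targets that already passed so each is counted only once. */
            if (!job->found[i] && strstr(line, job->targets[i]) != NULL) {
                job->found[i] = 1;
                passes++;
            }
        }
    }

    fclose(fp);
    return passes;
}

/* Print the number of passes out of the total, then the failures. */
void print_report(const struct search_job *job, int passes)
{
    printf("%d of %d strings found\n", passes, job->count);
    for (int i = 0; i < job->count; i++)
        if (!job->found[i])
            printf("not found: %s\n", job->targets[i]);
}
```

Checking found[i] before counting matters: without it, a string that appears on several lines would bump the number of passes more than once.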
Yes, in any case, the particulars can always be changed. The basics have to be there: opening the file/s, reading from them, strstr() and testing the return it gives, then taking actions based on that info, and closing that file.
"certain phrases" is not quite as simple as the suggestions above. You must ensure that the words which are contiguous in the phrases occur similarly in the file.
Phrases can span multiple lines in the text file. That means words may be separated by one or more spaces, punctuation, or new-line characters.
More complications:
Can a period interrupt a phrase which may span what appears to be multiple sentences? That could give a false positive. Do phrases contain periods also?
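One possible way to handle phrases that span lines is to collapse every run of whitespace to a single space before calling strstr(). The sketch below assumes the whole file has already been read into one string (for example with fread()), and the names collapse_whitespace, copy_string and contains_phrase are made up for illustration. It deliberately leaves punctuation alone, so a period between two words still prevents a match; whether that is right depends on how you answer the questions above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Collapse each run of whitespace (spaces, tabs, newlines) into one space, in place. */
void collapse_whitespace(char *s)
{
    char *out = s;
    int in_space = 0;

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            if (!in_space)
                *out++ = ' ';
            in_space = 1;
        } else {
            *out++ = *s;
            in_space = 0;
        }
    }
    *out = '\0';
}

/* Return a heap copy of s, or NULL if allocation fails. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Returns 1 if phrase occurs in text once whitespace differences are ignored,
   0 if it does not, and -1 if memory could not be allocated. */
int contains_phrase(const char *text, const char *phrase)
{
    char *t = copy_string(text);
    char *p = copy_string(phrase);
    int result = -1;

    if (t != NULL && p != NULL) {
        collapse_whitespace(t);
        collapse_whitespace(p);
        result = (strstr(t, p) != NULL);
    }

    free(t);
    free(p);
    return result;
}
```

With this, a phrase such as "quick brown fox" is found even if the file has a line break between "quick" and "brown". The comparison is still case-sensitive.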
^^^^ Good point. The computer will need explicit strings to search for, and you may need to include repeated searches of the file, or some "fuzzy" logic, if you want to search for anything more generic.