# Thread: Simple Array or Not?

1. ## Simple Array or Not?

Here's what I'm having trouble with:

Large text file: I must pull out the rows that belong to one of 4 groups and write them to a file called results.all. The group code's location in each row is constant at (302,2).

Then I have to exclude from that file (results.all) every row whose keycode is in a list. The keycodes are two-digit numbers; some are in sequence and some are not. Their location in the row is constant at (502,2).

First question: could I use a struct, or should I just use brute force? I need it to be as fast as possible. There are 28 keycodes, so should I make an array like keycode[28]? If so, how do I associate each code with its position in the array? And is there an easy way to handle the ones that are in sequence, i.e. [20-34]?

Any help would be great. Thanks again.
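For the array question, one possible approach, sketched under the assumption that every keycode is exactly two decimal digits (00-99): instead of a 28-element array of codes, use a 100-entry table of bools indexed by the code's numeric value. The "member associated with each location" is then simply whether table[code] is set, and a run like [20-34] becomes a single loop. `KeycodeTable`, `add`, `addRange` and `contains` are hypothetical names, not something from this thread.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the thread): a lookup table for
// two-digit keycodes 00-99. table[n] is true when keycode n is in the
// set, so a membership check is one array access, not a list search.
struct KeycodeTable {
    bool table[100];

    KeycodeTable() {
        for (int i = 0; i < 100; ++i) table[i] = false;
    }

    // Mark a single code, e.g. add(5) for "05".
    void add(int code) {
        if (code >= 0 && code < 100) table[code] = true;
    }

    // Mark an inclusive range, e.g. addRange(20, 34) for "[20-34]".
    void addRange(int first, int last) {
        for (int code = first; code <= last; ++code) add(code);
    }

    // Check a two-character field extracted from a record.
    // Anything that is not exactly two digits is treated as "not in the set".
    bool contains(const std::string& field) const {
        if (field.size() != 2 ||
            !std::isdigit(static_cast<unsigned char>(field[0])) ||
            !std::isdigit(static_cast<unsigned char>(field[1])))
            return false;
        int code = (field[0] - '0') * 10 + (field[1] - '0');
        return table[code];
    }
};
```

After `excluded.addRange(20, 34)`, a per-line test like `excluded.contains(field)` costs one array access, however many codes are listed.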

2. Chances are, reading the text file by itself will be the most expensive part of the exercise (disks are really slow by comparison to processors and memory).

So I'd go for something obvious: easy to write, and easy to see that it's correct.

Personally, I'd use PERL for such tasks, since it has a large number of built-in functions and data types for field extraction, pattern matching and such like.

You wrote:
> I'd use PERL for such tasks,

Well, my only option is C++ 6.0. But thanks.

I'm new to most of this. In past programs I used a struct, but I'm unsure how to apply it to this case.

Here are the structs I'm looking at now:
Code:
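```cpp
#include <string>
using namespace std;

// Predicate: does the 2-character field of record `a` match `value`?
// The original tested an undeclared `x`; here that choice is a member,
// `firstField` (an illustrative name). The unary_function/binary_function
// bases were dropped: they are deprecated in C++11 and removed in C++17.
struct Search
{
    string value;
    bool firstField;   // true: field at (302,2); false: field at (502,2)
    Search(const string& val, bool first = true) : value(val), firstField(first) {}
    bool operator()(const string& a) const
    {
        if (firstField)
            return a.compare(302, 2, value, 0, 2) == 0;
        else
            return a.compare(502, 2, value, 0, 2) == 0;
    }
};

// Orders records by the 12-character phone-number field at offset 258.
struct compare
{
    bool operator()(const string& a, const string& b) const
    {
        return a.compare(258, 12, b, 258, 12) < 0;
    }
};
```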
At first this code was meant to search for phone numbers and remove repeats and do-not-call numbers. Now the Search struct is given each line of the main file ("a") along with a two-character string holding a keycode such as 01, 05, 10 or 20, and the loop moves on until it has searched through all of the codes. The problem is the compare struct: how should I change it so it doesn't erase repeats?
All I want is to remove the rows that have specific keycodes; there are 28 of them.
Any help would be great. Sorry for the long post.
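Dropping rows by keycode may not need the `compare` struct at all: an ordering like that only comes into play when the lines are sorted and passed through something like std::unique, which is what collapses repeats. A minimal sketch using std::remove_if instead, assuming the records are already loaded into a std::vector<std::string> and the keycodes to drop are in a std::set. `HasExcludedCode` and `removeExcluded` are made-up names, and `offset` is whatever 0-based position the keycode field sits at.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical predicate: true when the record's 2-character keycode
// field (starting at 0-based `offset`) is one of the excluded codes.
struct HasExcludedCode {
    const std::set<std::string>* codes;
    std::string::size_type offset;

    HasExcludedCode(const std::set<std::string>& c, std::string::size_type off)
        : codes(&c), offset(off) {}

    bool operator()(const std::string& record) const {
        if (record.size() < offset + 2) return false;  // too short: keep it
        return codes->count(record.substr(offset, 2)) != 0;
    }
};

// Hypothetical helper: erase every record whose keycode is excluded.
// Nothing is sorted or uniqued, so order and duplicate records survive.
void removeExcluded(std::vector<std::string>& records,
                    const std::set<std::string>& excluded,
                    std::string::size_type offset)
{
    records.erase(std::remove_if(records.begin(), records.end(),
                                 HasExcludedCode(excluded, offset)),
                  records.end());
}
```

Because nothing is sorted, duplicate records and the original order both survive; only rows whose keycode is in the set are erased.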

4. C++ looking code moved to the C++ board.

You wrote:

> C++ looking code moved to the C++ board.

I don't understand your comment. Did you move me over to the C++ board? If so, thanks; if not, this is where I am, and I still don't understand your statement.

Anyway, does anyone have answers, or even suggestions on how I can get off the ground?

6. > return a.compare(302,2,value,0,2)
You're complicating the issue.

1. read a line from a file
2. use a substring function to extract part of that line, say y = input.substr(302,2);
3. compare that with whatever you're looking for, and act appropriately
4. repeat.
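The four steps above, written out as a sketch in C++ (std::string's member function is substr, and its offsets are 0-based). It assumes one record per line; the function name `copyMatching` and its parameters are illustrative.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copy each line of inName whose 2-character field
// at 0-based `offset` equals `wanted` into outName.
// Returns the number of lines copied.
int copyMatching(const std::string& inName, const std::string& outName,
                 std::string::size_type offset, const std::string& wanted)
{
    std::ifstream in(inName.c_str());
    std::ofstream out(outName.c_str());
    std::string line;
    int copied = 0;
    while (std::getline(in, line)) {                 // 1. read a line
        if (line.size() < offset + 2) continue;      //    too short to hold the field
        std::string y = line.substr(offset, 2);      // 2. extract the field
        if (y == wanted) {                           // 3. compare and act
            out << line << '\n';
            ++copied;
        }
    }                                                // 4. repeat until end of file
    return copied;
}
```

Each call makes exactly one pass over the input file; matching against several codes at once is where a lookup structure such as a map or set comes in.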

7. Code:
```cpp
#include <fstream>
#include <string>
using namespace std;

int Disposition(string processed)
// Does the whole thing over again
// with a new set of codes at a different location.
{
    ifstream in("excluded.all");   // holds the keycodes to be kept
    string row;
    string record;
    ofstream out("finished.all");

    while (getline(in, row))
    {
        // Re-opens and re-reads the whole processed file once per keycode.
        ifstream pro(processed.c_str());
        while (getline(pro, record))
        {
            // Size check: substr throws if the record is too short.
            if (record.size() >= 503 && record.substr(501, 2) == row)
            {
                out << record << endl;
            }
        }
    }
    return 0;
}

int Annotate(string main)   // main file
/* Searches the given file for the group codes listed in group.all
   and writes the matching lines to results.all */
{
    string row;
    string line;            // for data lines
    string filename = "results.all";
    string group = "group.all";
    ofstream fout(filename.c_str());
    ifstream in(group.c_str());
    while (getline(in, row))
    {
        // Re-opens and re-reads the whole main file once per group code.
        ifstream input(main.c_str());
        while (getline(input, line))
        {
            if (line.size() >= 303 && line.substr(301, 2) == row)
                fout << line << endl;
        }
    }
    fout.close();            // make sure results.all is complete before it is read

    Disposition(filename);   // sends the processed results.all

    return 0;
}
```
I wrote this and it works, but I need it to go much faster, and I thought a different approach would be much faster. I had to change excluded.all from listing the codes I didn't want to listing the codes I do want, so that file went from 28 elements to 72, which means the whole file is read through that many times. Any suggestions?
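Since a set or map lookup costs about the same whether it holds 28 codes or 72, one option is to keep excluded.all listing only the 28 unwanted codes and invert the test, while still reading the processed file only once. A sketch under those assumptions, using std::set (a swap for the nested loops above, not the original code) and the 0-based offset 501 that Disposition uses; `loadCodes` and `writeAllExcept` are hypothetical names.

```cpp
#include <fstream>
#include <set>
#include <string>

// Hypothetical helper: load one code per line (e.g. excluded.all) into a set.
std::set<std::string> loadCodes(const std::string& fileName)
{
    std::set<std::string> codes;
    std::ifstream in(fileName.c_str());
    std::string row;
    while (std::getline(in, row))
        if (!row.empty()) codes.insert(row);
    return codes;
}

// Hypothetical helper: one pass over `processed`, writing every record
// whose keycode (2 characters at 0-based offset 501) is NOT excluded.
// Returns the number of records written.
int writeAllExcept(const std::string& processed, const std::string& outName,
                   const std::set<std::string>& excluded)
{
    std::ifstream pro(processed.c_str());
    std::ofstream out(outName.c_str());
    std::string record;
    int written = 0;
    while (std::getline(pro, record)) {
        if (record.size() < 503) continue;                 // no keycode field
        if (excluded.count(record.substr(501, 2)) == 0) {  // not excluded
            out << record << '\n';
            ++written;
        }
    }
    return written;
}
```

Records too short to contain the keycode field are skipped rather than risking std::out_of_range from substr.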

8. So how big are these input files?

For example, if the file opened by ifstream in(group.c_str()); is small, then I'd read the whole thing into a map (pseudocode):
Code:
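```cpp
#include <istream>
#include <map>
#include <ostream>
#include <string>

// Wrapped in a function so it compiles; the name filterLines is illustrative.
void filterLines(std::istream& in, std::istream& input, std::ostream& fout)
{
    std::string row, line;

    // map a string to a bool
    std::map<std::string, bool> mymap;

    // read the first file into a map
    while (std::getline(in, row)) {
        mymap[row] = true;
    }

    // then process the main file ONCE as well
    while (std::getline(input, line)) {
        // test the key part, to see if it's in the map
        // (lines too short to contain the field are skipped; substr would throw)
        if (line.size() >= 303 && mymap[line.substr(301, 2)]) {
            // yes, so output the line ('\n' avoids flushing on every line)
            fout << line << '\n';
        }
    }
}
```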
Looking things up in a map is pretty efficient, and more importantly the main file is scanned only once, which is far cheaper than scanning it once per code.

Also, you're not closing the files when you're done with each loop.
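One detail about the `mymap[...]` test in the pseudocode above: std::map::operator[] inserts a default-constructed value (false, for bool) whenever the key is missing, so each unmatched field value adds an entry to the map. The answer is still correct, but find() (or count()) checks for the key without modifying the map. A small illustration; both function names are made up for the demo.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical demo: how many entries does the map hold after looking up
// `key` with operator[]? A missing key gets inserted as key -> false.
std::size_t sizeAfterBracketLookup(std::map<std::string, bool> codes,
                                   const std::string& key)
{
    bool wanted = codes[key];   // inserts `key` if it was missing
    (void)wanted;
    return codes.size();
}

// Hypothetical read-only membership test: find() never inserts.
bool isWanted(const std::map<std::string, bool>& codes, const std::string& key)
{
    std::map<std::string, bool>::const_iterator it = codes.find(key);
    return it != codes.end() && it->second;
}
```

With a std::set<std::string> in place of the map, `codes.count(key) != 0` expresses the same membership test directly.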

You wrote:
> So how big are these input files?

Well, the group file (group.all) contains only 4 two-character elements. I converted Annotate over to the map you suggested. But in Disposition, excluded.all has about 72 elements; is that too big for a map too? The main file can be very big, 50 MB and up. Right now it's only 361 KB, and it sickens me that it runs this slowly: about two minutes! In Disposition the codes run from 01-99; there are 28 that can't be in the file and thus 72 that should be. In each line of the processed file, the field at (501,2) has to match one of the 72 elements for the line to go into finished.all. I'll go try to implement another map for Disposition.
Thanks

10. > But Disposition the "excluded.all" has like 72 elements is that too big for a map also?
No.
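Applying the same idea to Disposition, and going one step further, both filters can be checked in a single pass over the main file, so results.all no longer has to be written and re-read. A sketch assuming the group codes and the kept keycodes have already been loaded into std::set containers (one code per line in group.all and excluded.all, as in the thread) and the 0-based offsets 301 and 501 from Annotate and Disposition; `filterOnePass` is a hypothetical name.

```cpp
#include <fstream>
#include <set>
#include <string>

// Hypothetical single-pass version of Annotate + Disposition: a record is
// written to outName only if its group code (offset 301) is in `groups`
// AND its keycode (offset 501) is in `kept`. Both offsets are 0-based and
// taken from the code earlier in the thread. Returns the records written.
int filterOnePass(const std::string& mainName, const std::string& outName,
                  const std::set<std::string>& groups,
                  const std::set<std::string>& kept)
{
    std::ifstream input(mainName.c_str());
    std::ofstream out(outName.c_str());
    std::string line;
    int written = 0;
    while (std::getline(input, line)) {
        if (line.size() < 503) continue;   // too short to hold both fields
        if (groups.count(line.substr(301, 2)) != 0 &&
            kept.count(line.substr(501, 2)) != 0) {
            out << line << '\n';           // '\n' avoids flushing every line
            ++written;
        }
    }
    return written;
}
```

If results.all is still needed as an intermediate report, the group test can write to it separately inside the same loop.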

11. Everything worked great; the run time was cut to pieces. Pseudocode rocks.