Array or Pointer (The Better Way?)

**ajb268** · 12-28-2004

Okay, for a while now I have been working on a file sorter of sorts that takes a large text file (approx. 3-20 MB). The program finds the phone number on each line and writes those numbers to a file, then checks for duplicate phone numbers and writes a new file of the contacts' info. Program below:

Code:

#include "stdafx.h"
#include "stdlib.h"
#include "iostream.h"
#include "fstream.h"
#include "string.h"
#include <io.h>
#include <string>
#include <stdio.h>
#include <iomanip.h>
#include <strstrea.h>

char phonenumber[9];
int j=0;
char phonenumber2[9];
const int lineSize = 501;
int i=0;

int phonelist(const char *phonefilename,const char *filename)
{	
	char dataLine[lineSize];
	ofstream fout2;
	ifstream filename2(filename, ios::in);
	fout2.open(phonefilename,fstream::in | fstream::out);//|fstream::app);
	i=0;
	while (filename2.getline(dataLine, lineSize, '\n')) 
	{	
		char *phonestr=0;
		phonestr = strtok (dataLine+258," ");//finds the phone number
		if ((i%2)==0)//for some reason it stays on the same line twice
		fout2<<phonestr<<endl;	
		i++;
	}	
	fout2.close();
	
	return 0;
}


int DupRemover(const char *filename,const char *nodup)
{/* this basically makes the phone file and then removes duplicates from "filename" and makes a new file without the dupes*/
	char phonefilename[10]="phone.all";
	j=0;
	ofstream fout3;
	char stupidLine[lineSize];
	
	
	phonelist(phonefilename,filename);
	
	
	fout3.open(nodup,fstream::in | fstream::out);//| fstream::app); 
	ifstream phonefile(phonefilename,ios::in);
	int h=0;
	i=0;
	while (phonefile.getline(phonenumber,12,'\n'))
	{
	
		ifstream filename21(filename, ios::in);
		j=0;
		int d=0;
		while (filename21.getline(stupidLine, lineSize)) 
		{	
			j++;
			if (d==2)//it does it twice had to stop that
				d=0;
			if (d==0)
			{
				//fout5<<"inner loop "<<phonenumber<<" "<<j<<" "<<h<<endl;
			//	fout5<<stupidLine+258<<endl;
				if ((strstr(stupidLine+258, phonenumber) !=0))//Only searches the first 270 columns 
				{	
					if (j<=h)
						break;
					fout3<<stupidLine<<endl;//Writes the whole line and returns to the next line
					
					h=j;
					break;
				}
			}
			d++;
		}
	
		filename21.close();	
	}
	
	fout3.close();
	return 0;

}



int main (void)
{

	char filename[12];
	char nodup[12];
	cout<<"Enter your processed file: ";
	cin.getline (filename,11);
	strncat (filename, ".all", 6);
	cout << "Your File Name is " << filename << ".\n";
	cout<<"Enter the new file name: ";
	cin.getline(nodup,11);
	strncat (nodup, ".all",11);
	DupRemover(filename,nodup);
	return(0);



}

Sorry for the long post. Each row of the file is 501 characters long. Currently an 8 MB file can take upwards of 40 minutes to process, and with the possibility of 20 MB files I need a new way. I'm a beginner in programming, so please bear with me. Any help would be great.

**ajb268** · 12-28-2004

Sorry again that the post was so long. My main question is how to load an array from a file. In the code above, the strings are read from the text file one at a time, and it takes too long. I thought maybe somebody would know a way to use arrays or pointers to expedite things, or to make my current program faster. Thanks again.

**sean** · 12-28-2004

Ultimately, it all has to be handled piece by piece. The way to make code go faster is simply to reduce the number of steps required, and organize what's left the best way possible, but you can't change the fact that you have to use an array. http://www.aihorizon.com has some good articles on making efficient algorithms (they say they focus on AI, but I find them better suited for this purpose). There's no one way to make this kind of program go faster, but read some of the articles on search and sort algorithms, etc... and then take another look at how you designed this program. Try and apply what you learned to think up a better way to read through the arrays. I doubt however, that you will get this program to go much more than a few minutes faster.

**Prelude** · 12-28-2004

The first order of business is to not use a quadratic algorithm that requires disk I/O. That's a large portion of your time right there. Start by thinking in terms of how to best structure your data for efficient processing. Here's one quick example off the top of my head:

Code:

#include <cstring>
#include <fstream>
#include <functional>
#include <set>
#include <string>

using namespace std;

struct compare: binary_function<string, string, bool> {
  bool operator()(const string& a, const string& b)
  {
    return strncmp(a.c_str() + 258, b.c_str() + 258, 12) < 0;
  }
};

void DupRemover(const char *filename, const char *nodup)
{
  ifstream in(filename);
  string line;
  set<string, compare> rem_dup;

  while (getline(in, line))
    rem_dup.insert(line);

  ofstream out(nodup);
  set<string, compare>::const_iterator it = rem_dup.begin();

  while (it != rem_dup.end())
    out<< *it <<endl;
}

Though this will work (if I understand your file's formatting), you may find that paging data in and out of virtual memory is still too slow. In that case you may have to resort to less elegant methods. But don't go there until you're sure that it's still too slow for your needs.

**ajb268** · 12-28-2004

I realize the code is hideous looking; it was a product of blunt force and no style. Thanks for your input. I'll check out that link and hopefully I can make some headway.
One more thing to make things clearer: right now, to check for repeats, it has to compare one phone number at a time. Is there a way to eliminate the duplicate #'s all at once? Or, if I had a separate file of Do Not Call numbers, is there any way of searching the whole file against the whole list of DNC numbers at once, and then getting rid of a row if its number is found?
I just can't figure out why this should take so long when something like SQL can look at a column of numbers designated as the primary key and yell at you if there are repeating fields....
Oh well, off to research some more. Thanks again.

**Prelude** · 12-28-2004

>Is there a way to eliminate the duplicate #'s all at once
No, the best you can do is handle them one at a time as quickly as possible.

>I just can't figure why this should take so long if in something like sql
Relational databases are implemented with clever algorithms and structuring by brilliant programmers such that queries are uberfast. That's quite a difference when compared to your brute-force-and-ignorance algorithm written by a C++ beginner. Don't feel bad, your problem is a common one.

**ajb268** · 12-29-2004

Well, I tried to implement your code, but all heck breaks loose when I do. Some of the errors are:

Code:

Compiling...
arraytest.cpp
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,compare,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,compare,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information

etc.....

Do you have any suggestions on what is going on? Any help is awesome.

**ajb268** · 12-29-2004

The line of code it points to is in the Set file for the warning
typedef _Imp::size_type size_type;

The line of code it points to in the prog for the error is
set<string, compare> rem_dup;

Hope this helps
Thanks again

**elad** · 12-29-2004

warning C4786

That is not an error. It is a warning, and a very annoying one at that. Basically it's telling you that in debug mode an identifier is limited to 255 characters. In my use of STL containers with MSVC 6 I get this warning quite often. I ignore it and everything works out OK. I have heard you can tell the compiler to ignore irritating warnings like this, but I haven't tried it myself.
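One commonly used way to silence C4786 is a `#pragma warning` directive placed before the first standard-library include. The sketch below assumes Visual C++ (hence the `_MSC_VER` guard, so other compilers skip the pragma); `count_unique` is a hypothetical function that exists only to instantiate a `std::set`. On Visual C++ 6.0 the pragma is reported not to take effect in every case, so treat this as something to try rather than a guaranteed fix.

```cpp
// Assumption: the pragma has to appear before any STL header is included.
#ifdef _MSC_VER
#pragma warning(disable : 4786) // identifier truncated in the debug information
#endif

#include <cstddef>
#include <set>
#include <string>

// Hypothetical example: counts the distinct strings in the range [first, last).
std::size_t count_unique(const std::string* first, const std::string* last)
{
    std::set<std::string> unique(first, last);
    return unique.size();
}
```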

**Stack Overflow** · 12-29-2004

Hello,

You may find the following links useful for ignoring compiler warning C4786:

Info About Microsoft Visual C++ 6.0 [Important: Subsection Warning Messages]
FIX: C4786 Warning Is Not Disabled with #pragma Warning

Also, you say an error occurs with the line of code using set? If so, what is the error? I'm not very familiar with this subject, though you can find some useful information here.

- Stack Overflow

**ajb268** · 12-29-2004

Well, the error was due to a missing header file. My bad...
Anyway, I'm still working on Prelude's code. It does seem to go faster, but I get erroneous output: it just outputs the same thing over and over again.

The struct in particular seems to compare the phone numbers, but returns the comparison result when one string is greater than the other. I changed the < to !=, <=, and >=, but each one just picks another row to print out and repeats it in a loop: < prints the 9th record, > prints the 33rd record, != prints the last record, and == prints the first record.

Can anyone figure that out? Again, I'm trying to get it to pick out the repeated numbers/strings and remove them from the file.

Thanks so far for everything you've all been very helpful.

**ajb268** · 12-29-2004

Prelude, first, thanks for the code. Second, could you add some comments to the code you put up? It seems to go into an infinite loop and prints only one line over and over. I need each iteration to advance to the next line, but the code is confusing to me.
Thanks again

**Prelude** · 12-29-2004

I wrote it off the top of my head, so it could be wrong. Also, I can't test it because I don't have an example of the file you're using. I was going off of your description and your code. I'll be happy to comment it once it's been tested.

**ajb268** · 12-30-2004

I attached the code and a sample text file, which I made shorter so it can be processed. Note that each line is now approx. 269 or 270 characters long. The last character of the phone # is at column 269.
I renamed the test file to test.txt; in my program I add an .all extension.

Thanks Again!!

**ajb268** · 12-30-2004

Code:

  set<string, compare> rem_dup;

  while (getline(in, line))
    rem_dup.insert(line);

  ofstream out(nodup);
  set<string, compare>::const_iterator it = rem_dup.begin();

  while (it != rem_dup.end())
    out<< *it <<endl;
}

Prelude, thanks again for the code. "it" just needed to be incremented to go to the next line of the file. For a while there it was just printing the same line over and over. Thanks again.
