Thread: Array or Pointer (The Better Way?)

  1. #1
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    Array or Pointer (The Better Way?)

    Okay, for a while now I have been working on a file sorter of sorts that
    takes a large text file (approx. 3-20 MB). The program finds the phone number in each record and writes those numbers to a file; it then checks for duplicate phone #'s and writes a new file of the contacts' info without the dupes. Program below:

    Code:
    #include "stdafx.h"
    #include "stdlib.h"
    #include "iostream.h"
    #include "fstream.h"
    #include "string.h"
    #include <io.h>
    #include <string>
    #include <stdio.h>
    #include <iomanip.h>
    #include <strstrea.h>
    
    char phonenumber[9];
    int j=0;
    char phonenumber2[9];
    const int lineSize = 501;
    int i=0;
    
    int phonelist(const char *phonefilename,const char *filename)
    {	
    	char dataLine[lineSize];
    	ofstream fout2;
    	ifstream filename2(filename, ios::in);
    	fout2.open(phonefilename,fstream::in | fstream::out);//|fstream::app);
    	i=0;
    	while (filename2.getline(dataLine, lineSize, '\n')) 
    	{	
    		char *phonestr=0;
    		phonestr = strtok (dataLine+258," ");//finds the phone number
    		if ((i%2)==0)//for some reason it stays on the same line twice
    		fout2<<phonestr<<endl;	
    		i++;
    	}	
    	fout2.close();
    	
    	return 0;
    }
    
    
    int DupRemover(const char *filename,const char *nodup)
    {/* this basically makes the phone file and then removes duplicates from "filename" and makes a new file without the dupes*/
    	char phonefilename[10]="phone.all";
    	j=0;
    	ofstream fout3;
    	char stupidLine[lineSize];
    	
    	
    	phonelist(phonefilename,filename);
    	
    	
    	fout3.open(nodup,fstream::in | fstream::out);//| fstream::app); 
    	ifstream phonefile(phonefilename,ios::in);
    	int h=0;
    	i=0;
    	while (phonefile.getline(phonenumber,12,'\n'))
    	{
    	
    		ifstream filename21(filename, ios::in);
    		j=0;
    		int d=0;
    		while (filename21.getline(stupidLine, lineSize)) 
    		{	
    			j++;
    			if (d==2)//it does it twice had to stop that
    				d=0;
    			if (d==0)
    			{
    				//fout5<<"inner loop "<<phonenumber<<" "<<j<<" "<<h<<endl;
    			//	fout5<<stupidLine+258<<endl;
    				if ((strstr(stupidLine+258, phonenumber) !=0))//Only searches the first 270 columns 
    				{	
    					if (j<=h)
    						break;
    					fout3<<stupidLine<<endl;//Writes the whole line and returns to the next line
    					
    					h=j;
    					break;
    				}
    			}
    			d++;
    		}
    	
    		filename21.close();	
    	}
    	
    	fout3.close();
    	return 0;
    
    }
    
    
    
    int main (void)
    {
    
    	char filename[12];
    	char nodup[12];
    	cout<<"Enter your processed file: ";
    	cin.getline (filename,11);
    	strncat (filename, ".all", 6);
    	cout << "Your File Name is " << filename << ".\n";
    	cout<<"Enter the new file name: ";
    	cin.getline(nodup,11);
    	strncat (nodup, ".all",11);
    	DupRemover(filename,nodup);
    	return(0);
    
    
    
    }
    Sorry for the long post. Each row in the file is 501 characters long. Currently, processing an 8 MB file can take upwards of 40 minutes, and with the possibility of 20 MB files I need a new way. I'm a beginner in programming, so please bear with me. Any help would be great.

  2. #2
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    Simpler Post

    Sorry again that the post was so long. My main question is how to load an array from a file. In the code above, the strings are loaded from the text file and it takes too long. I thought maybe somebody would know a way to use arrays or pointers to expedite things, or to make my current program faster. Thanks again.

  3. #3
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    Ultimately, it all has to be handled piece by piece. The way to make code go faster is simply to reduce the number of steps required and organize what's left in the best way possible, but you can't change the fact that you have to use an array. http://www.aihorizon.com has some good articles on making efficient algorithms (they say they focus on AI, but I find them better suited for this purpose). There's no one way to make this kind of program go faster, but read some of the articles on search and sort algorithms, etc., and then take another look at how you designed this program. Try to apply what you learned to think up a better way to read through the arrays. I doubt, however, that you will get this program to go much more than a few minutes faster.

  4. #4
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    The first order of business is to not use a quadratic algorithm that requires disk I/O. That's a large portion of your time right there. Start by thinking in terms of how to best structure your data for efficient processing. Here's one quick example off the top of my head:
    Code:
    #include <cstring>
    #include <fstream>
    #include <functional>
    #include <set>
    #include <string>
    
    using namespace std;
    
    struct compare: binary_function<string, string, bool> {
      bool operator()(const string& a, const string& b)
      {
        return strncmp(a.c_str() + 258, b.c_str() + 258, 12) < 0;
      }
    };
    
    void DupRemover(const char *filename, const char *nodup)
    {
      ifstream in(filename);
      string line;
      set<string, compare> rem_dup;
    
      while (getline(in, line))
        rem_dup.insert(line);
    
      ofstream out(nodup);
      set<string, compare>::const_iterator it = rem_dup.begin();
    
      while (it != rem_dup.end())
        out<< *it <<endl;
    }
    Though this will work (if I understand your file's formatting), you may find that paging data in and out of virtual memory is still too slow. In that case you may have to resort to less elegant methods. But don't go there until you're sure that it's still too slow for your needs.
    My best code is written with the delete key.

  5. #5
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    Much thanks

    I realized that the code was hideous looking; it was a product of blunt force and no style. Thanks for your input. I'll check out that link, and hopefully I can make some headway.
    One more thing to make things clearer: right now, to check for repeats, it has to compare the phone numbers one at a time. Is there a way to eliminate the duplicate #'s all at once? Or, if I had a separate file of Do Not Call (DNC) numbers, is there any way of searching the whole file against the whole list of DNC numbers all at once, and then getting rid of a row if its number is found?
    I just can't figure why this should take so long if in something like sql it could look at a column of numbers and yell at you if there are repeating fields in a column designated as the primary key....
    Oh well, off to research some more. Thanks again.

  6. #6
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Is there a way to eliminate the duplicate #'s all at once
    No, the best you can do is handle them one at a time as quickly as possible.

    >I just can't figure why this should take so long if in something like sql
    Relational databases are implemented with clever algorithms and structuring by brilliant programmers such that queries are uberfast. That's quite a difference when compared to your brute-force-and-ignorance algorithm written by a C++ beginner. Don't feel bad, your problem is a common one.

  7. #7
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    Still have problems

    Well, I tried to implement your code, but all heck breaks loose when I do. Some of the errors look like this:
    Code:
    Compiling...
    arraytest.cpp
    c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,compare,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,compare,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information
    
    etc.....
    Do you have any suggestions on what is going on? Any help is awesome.

  8. #8
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    Additional Problem

    For the warning, the line of code it points to is in the set header file:
    typedef _Imp::size_type size_type;

    For the error, the line of code it points to in my program is:
    set<string, compare> rem_dup;

    Hope this helps
    Thanks again

  9. #9
    Registered User
    Join Date
    Mar 2002
    Posts
    1,595
    warning C4786


    That is not an error. It is a warning, and a very annoying one at that. Basically, it's telling you that in debug mode an identifier is limited to 255 characters. In my use of STL containers with MSVC 6 I get this warning quite often. I ignore it and everything works out OK. I have heard you can tell the compiler to ignore irritating warnings like this, but I haven't tried it myself.
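
    A minimal sketch of silencing it, assuming Visual C++: `#pragma warning(disable : 4786)` is the MSVC directive for turning off a numbered warning, and it is usually placed before the standard headers are included. The `_MSC_VER` guard and the `count_unique` function are only there for illustration and are not part of anyone's code in this thread:

```cpp
// Placed ahead of the includes so it is already in effect when the
// STL templates are instantiated.
#ifdef _MSC_VER
#pragma warning(disable : 4786)  // identifier was truncated to '255' characters
#endif

#include <cstddef>
#include <set>
#include <string>

// Hypothetical example that instantiates std::set<std::string>,
// the kind of code that triggers C4786 in debug builds.
std::size_t count_unique(const char* const* words, std::size_t n)
{
  std::set<std::string> seen;
  for (std::size_t i = 0; i < n; ++i)
    seen.insert(std::string(words[i]));
  return seen.size();
}
```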

  10. #10
    Hello,

    You may find the following links useful pertaining to ignoring compiler warning C4786:

    Also, you say an error occurs with the line of code using set? If so, what is the error? I'm not very familiar with this subject, though you can find some useful information here.


    - Stack Overflow
    Segmentation Fault: I am an error in which a running program attempts to access memory not allocated to it and core dumps with a segmentation violation error. This is often caused by improper usage of pointers, attempts to access a non-existent or read-only physical memory address, re-use of memory if freed within the same scope, de-referencing a null pointer, or (in C) inadvertently using a non-pointer variable as a pointer.

  11. #11
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    Implementation of Code

    Well, the error was due to a missing header file. My bad...
    Anyway, I'm still working on Prelude's code. It does seem to go faster, but I get erroneous output: it just outputs the same thing over and over again.

    The struct code in particular seems to compare the phone numbers, but it returns the result of the compare when one string is greater than the other. I changed this to !=, <, <= and >=, but each one just picks another row to print and repeats it in a loop: < prints the 9th record, > prints the 33rd record, != prints the last record and == prints the first record.

    Can anyone figure that out? Again, I'm trying to get it to pick out the repeated numbers/strings and remove them from the file.

    Thanks so far for everything; you've all been very helpful.
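
    For reference, a minimal sketch of how std::set uses its comparator. The set treats two records as duplicates when neither one compares less than the other, so the comparator has to be a strict "less than"; !=, <=, >= and == are not valid orderings for std::set. The `by_key` comparator below is a shortened, hypothetical stand-in that keys on column 4 instead of column 258, and `add_record` is a made-up helper:

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the thread's comparator: records are ordered
// by everything from 0-based column 4 onward (empty for shorter records).
struct by_key {
  static std::string key(const std::string& rec)
  {
    return rec.size() > 4 ? rec.substr(4) : std::string();
  }
  bool operator()(const std::string& a, const std::string& b) const
  {
    return key(a) < key(b);  // strict "less than" on the key only
  }
};

// Returns true when `rec` was stored, false when a record with an
// equivalent key was already in the set (insert() leaves the set unchanged).
bool add_record(std::set<std::string, by_key>& records, const std::string& rec)
{
  return records.insert(rec).second;
}
```

    With this comparator, "Ann 555-1234" and "Bob 555-1234" count as the same record because their keys match, so only the first one inserted is kept.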

  12. #12
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    I need help

    Prelude, first, thanks for the code. Second, could you add some comments to the code you put up? It seems to go into an infinite loop, and it prints only one line over and over. I need to advance to the next line on each iteration, but the code is confusing to me.
    Thanks again

  13. #13
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    I wrote it off the top of my head, so it could be wrong. Also, I can't test it because I don't have an example of the file you're using. I was going off of your description and your code. I'll be happy to comment it once it's been tested.

  14. #14
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    Code and Test File

    I attached the code and a sample text file, which I made shorter so it can be processed. Note that each line is now approx. 269 or 270 characters long; the last character of the phone # is at position 269.
    I renamed the test file to test.txt; in my program I add an .all extension.


    Thanks Again!!

  15. #15
    Registered User
    Join Date
    Dec 2004
    Posts
    45
    Code:
      set<string, compare> rem_dup;
    
      while (getline(in, line))
        rem_dup.insert(line);
    
      ofstream out(nodup);
      set<string, compare>::const_iterator it = rem_dup.begin();
    
      while (it != rem_dup.end())
        out<< *it <<endl;
    }
    Prelude, thanks again for the code. "it" just needed to be incremented to go to the next line of the file. For a while there it was just printing the same line over and over. Thanks again.
