Thread: Max Line Size and Structs

  1. #1
    Registered User
    Join Date
    Dec 2004
    Posts
    45

    Max Line Size and Structs

    Here's what I'm trying to do:

    for each item in N file:
        for each item in M file:
            see if this N list item is the same as this M list item

    To do this I'm using the code below, but I have a few problems:
    Code:
    void DupRemover( const string& dontcall)
    {
    	ifstream in("tested.all");
     	ofstream trial("newfile.all");
    	ifstream dnc("dnc.all");
    	ofstream nodup( "dup.all");
    	string line;
    	string number;
    	set<string, compare> strSet;
    	
    	while (getline(in, line))
    		{
    			/* When I write this out to the trial file ("newfile.all") it shortens the line to 64-65 characters,
    			   which screws up my code from then on. The line size is 501 characters per row. */
    			strSet.insert(line);
    		}
    	set<string, compare>::iterator tic;
    copy(strSet.begin(),strSet.end(),ostream_iterator<string>(trial,"/n"));
    
    while (getline(dnc,number,'\n'))//;tic != strSet.end())	
    		{	  
    		
    		tic = find_if(strSet.begin(),strSet.end(),Search(number));
    		if( tic != strSet.end() )
    			{
    			trial<<*tic <<endl;
    			strSet.erase(tic);
                            }
    
    
            	}
    	copy(strSet.begin(),strSet.end(),ostream_iterator<string>(nodup,"\n"));
    }
    Do you have any ideas? The program doesn't really output anything, and it seems to hang in spots. Is strSet.erase actually erasing the data at position tic when the data matches up?

    Thanks again for all your future and past help.

  2. #2
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Code:
    copy(strSet.begin(),strSet.end(),ostream_iterator<string>(trial,"/n"));
    ...
    copy(strSet.begin(),strSet.end(),ostream_iterator<string>(nodup,"\n"));
    One of these things is not like the other. I'll leave it to you to determine which it should be. Of course, maybe it's a typo?
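
    As a hint, here is a minimal sketch of what each delimiter string actually writes. joinWith is a hypothetical helper made up for this illustration, not something from the program above, and it uses a plain set<string> with an in-memory stream instead of your files:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: writes every element of the set to an in-memory
// stream, followed by `delim`, the same way the two copy() calls do.
std::string joinWith(const std::set<std::string>& items, const char* delim)
{
    std::ostringstream out;
    std::copy(items.begin(), items.end(),
              std::ostream_iterator<std::string>(out, delim));
    return out.str();
}
```

    With "\n" each record ends in a real newline. With "/n" each record is followed by the two ordinary characters '/' and 'n', so everything lands on one long line.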


    Quote Originally Posted by ajb268
    the program doesn't really output anything
    No output will get sent to the console if that's what you mean by "no output". It does look like there should be output sent to the files however. After running the program:

    newfile.all (trial) - You are writing to this file twice; I don't know if that is the intention. The first time, you use the copy template function to output the entire contents of the set to the file. The second time, you write each record from the set that matched a number read from the dnc.all (dnc) file, and those matched records are then removed from the set. In the end, this file may contain more than you want it to.

    dup.all (nodup) - This file should contain the final list of records, minus any duplicates and minus records that matched numbers from the dnc.all file.

    Quote Originally Posted by ajb268
    Do you have any ideas,
    It may help for you to show us your Search function object so we can make sure you got the lengths and offsets correct.

    Also, you could try sprinkling some debug couts in a couple places, maybe before/after the two copy functions to give you some feedback.

    Quote Originally Posted by ajb268
    Is the strSet.erase actually erasing the data at position tic when the data matches up?
    Yes, provided the find_if function finds a match, the record you have stored in the set at the position indicated by the iterator tic should be erased from the set.
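
    A minimal sketch of that find-then-erase pattern, assuming a plain set<string> and a made-up StartsWith predicate standing in for your Search and compare types (both names here are hypothetical):

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical predicate: true when a record begins with the given number.
struct StartsWith
{
    std::string value;
    explicit StartsWith(const std::string& v) : value(v) {}
    bool operator()(const std::string& record) const
    {
        return record.compare(0, value.size(), value) == 0;
    }
};

// Erases the first record that matches `number`.
// Returns true if a record was found and erased, false otherwise.
bool eraseFirstMatch(std::set<std::string>& records, const std::string& number)
{
    std::set<std::string>::iterator it =
        std::find_if(records.begin(), records.end(), StartsWith(number));
    if (it == records.end())
        return false;      // no match: the iterator must not be used
    records.erase(it);     // removes exactly the element `it` points to
    return true;
}
```

    Note that this erases only the first match for each number. If several records could carry the same number, the find/erase would have to be repeated until find_if returns end().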
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  3. #3
    Registered User
    Join Date
    Dec 2004
    Posts
    45
    This is my adjusted Search code, and this is what I think it should do.
    tested.all is the main file; the phone number starts at the 259th character of each row. The search takes a row from the dnc file (called number)
    and checks whether its phone number is found in tested.all. In the dnc file the phone number is the first thing on the row, hence a.compare(258,12,value,0,12).
    Code:
    struct Search : unary_function<string, bool>
    {
        string value;
        Search(const string& val) : value (val) {}
        bool operator()(const string& a)
        {
            return a.compare(258,12,value,0,12) == 0;
        }
    };
    In this section I want it to write to nodup ("dup.all") at the end, without the numbers from dnc in it. I kept the code the way I had it so you can see what is wrong with it.

    I get a row from dnc.all and put it in number, which goes into Search. When the if statement (tic != strSet.end()) is true, it is supposed to write the row to trial and erase tic; afterwards the set is copied to nodup. It seems that the if statement is never true, because nothing is ever sent to trial. If I change the if to ==, the program errors out at runtime.
    Code:
    {
    	ifstream in("tested.all");
     	ofstream trial("newfile.all");
    	ifstream dnc("dnc.all");
    	ofstream nodup( "dup.all");
    	string line;
    	string number;
    	set<string, compare> strSet;
    	
    	while (getline(in, line))	 
    		strSet.insert(line);
    	set<string, compare>::iterator tic;
    	while (getline(dnc,number,'\n'))
    		{	  
    		
    		tic = find_if(strSet.begin(),strSet.end(),Search(number));
    		if( tic != strSet.end() )
    			{
    			trial<<*tic <<endl;
    			strSet.erase(tic);
    			}
    		}
    	copy(strSet.begin(),strSet.end(),ostream_iterator<string>(nodup,"\n"));
    }
    If you need anything else, let me know. I'll go do some cout testing now.

  4. #4
    Registered User
    Join Date
    Dec 2004
    Posts
    45
    Quick question: why do I get a "cannot read" error when I try
    cout<<*tic<<endl; ? I put it between the find_if and the if statement (marked "here" below):
    Code:
    tic = find_if(strSet.begin(),strSet.end(),Search(number));
    	here
    if( tic != strSet.end() )
    			{
    			trial<<*tic <<endl;
    			strSet.erase(tic);
    			}

  5. #5
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Quote Originally Posted by ajb268
    Quick question: why do I get a "cannot read" error when I try
    cout<<*tic<<endl; ? I put it between the find_if and the if statement (marked "here" below):
    Code:
    tic = find_if(strSet.begin(),strSet.end(),Search(number));
    	here
    if( tic != strSet.end() )
    {
        trial<<*tic <<endl;
        strSet.erase(tic);
    }

    The find_if function will return an iterator equivalent to strSet.end() if it does not find a match. You should not attempt to dereference the "tic" iterator unless you first know that it points to a valid element. If nothing is found and you try to display the contents of the invalid iterator by dereferencing it (with the * operator) you will get an error. I think this is probably what is happening. If you want to add some debug code here to output the string pointed to by the iterator, it must be inside the if brackets because at that point in the program we know we have a valid element:

    Code:
    tic = find_if(strSet.begin(),strSet.end(),Search(number));
    
    if( tic != strSet.end() )
    {
        cout << *tic << endl;
        trial<<*tic <<endl;
        strSet.erase(tic);
    }
    else
    {
        cout << "Could not find phone number: " << number << " in set." << endl;
    }
    Quote Originally Posted by ajb268
    This is my adjusted Search code, and this is what I think it should do.
    tested.all is the main file; the phone number starts at the 259th character of each row. The search takes a row from the dnc file (called number)
    and checks whether its phone number is found in tested.all. In the dnc file the phone number is the first thing on the row, hence a.compare(258,12,value,0,12).

    Code:
    struct Search : unary_function<string, bool>
    {
        string value;
        Search(const string& val) : value (val) {}
        bool operator()(const string& a)
        {
            return a.compare(258,12,value,0,12) == 0;
        }
    };
    That looks good, I think it should be doing what you want it to do. Depending on how big your test data is, you might consider altering that temporarily to show the two strings that it is comparing as the find_if function tries to find a match:

    Code:
    struct Search : unary_function<string, bool>
    {
        string value;
        Search(const string& val) : value (val) {}
        bool operator()(const string& a)
        {
            bool temp = a.compare(258,12,value,0,12) == 0;
            cout << "Comparing " << a.substr(258,12) << " from tested.all with "
                 << value << " from dnc.all, result is " << boolalpha << temp << endl;
            cin.get();
            return temp;
        }
    };
    This should let you see the two strings and the result of the comparison as they are compared against each other. The boolalpha manipulator is declared in <ios>, which <iostream> already pulls in, so no extra header is needed for it. I would try this with a small set of data or else the output will become quite large.

    Alternately, instead of writing to cout, you can open another ofstream object ("log.txt" for example) and then change all of your debug couts to use this new log file instead. Then you can run the program (you wouldn't see any output doing it this way), but at the end of the program you could then browse the debug statements in the log file at your leisure using whatever text editor you have handy.
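
    A rough sketch of that log-file variant. LoggingSearch and its log stream parameter are made-up names for this illustration. The length check is an extra assumption on my part: string::compare throws std::out_of_range when the starting position is past the end of the string, so lines shorter than 270 characters are logged and skipped rather than compared.

```cpp
#include <fstream>
#include <string>

// Hypothetical variant of Search that records every comparison in a log
// stream instead of printing to cout.
struct LoggingSearch
{
    std::string value;
    std::ostream* log;   // not owned; must outlive the functor

    LoggingSearch(const std::string& val, std::ostream& logStream)
        : value(val), log(&logStream) {}

    bool operator()(const std::string& a) const
    {
        const std::string::size_type offset = 258, length = 12;

        // Assumption: a line too short to hold the phone field cannot match.
        if (a.size() < offset + length)
        {
            *log << "Skipping line of length " << a.size() << '\n';
            return false;
        }

        const bool match = a.compare(offset, length, value, 0, length) == 0;
        *log << "Comparing " << a.substr(offset, length) << " with "
             << value.substr(0, length) << ": "
             << (match ? "match" : "no match") << '\n';
        return match;
    }
};
```

    To use it, open something like ofstream log("log.txt") once before the dnc loop and pass LoggingSearch(number, log) to find_if in place of Search(number). The functor keeps a pointer to the stream because find_if takes its predicate by value and streams cannot be copied.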
    Last edited by hk_mp5kpdw; 01-12-2005 at 12:00 PM.

  6. #6
    Registered User
    Join Date
    Dec 2004
    Posts
    45
    Cool, thanks for all of your help. Your explanation really helped me understand.
    It makes sense now. Thinking it over, I believe the problem has to be the lengths and offsets, because I added dnc numbers into the file and it still doesn't get past that if, so I'll mess with that for a while.

    Thanks again

  7. #7
    Registered User
    Join Date
    Dec 2004
    Posts
    45
    I have a separate issue now. Before I strip the duplicates and do-not-call numbers from the main file, I need to group the file by ID key code. How would I take the duplicate removal out of this program so that it just searches for lines with the same key string and puts them in a file? My code so far for this is:
    Code:
    #include "stdafx.h"
    #include <iostream>
    #include <fstream>
    #include <functional>
    #include <set>
    #include <string>
    #include <algorithm>
    #include <iterator>
    
    
    using namespace std;
    
    
    int i=0;
    
    
    struct Search : unary_function<string, bool>
    {
        string value;
        Search(const string& val) : value (val) {}
        bool operator()(const string& a)
        {
            
    		return a.compare(17,10,value,0,4) == 0;
        }
    };
    
    struct compare: binary_function<string, string, bool> {
      bool operator()(const string& a, const string& b)
      {
    	
    	  return a.compare(18,10,b,18,10)>0;
    	
      }
    };
    
    
    
    
    void Annotate ( const string& inSource2)
    /* Searches given file with user entered keycodes and appends
    to the user given file with the extension of .all*/
    
    
    {
    	
    	string line;
    	string filename="box";
    	string keyCode;
    	int knum=2;
    	set<string,compare> strSet;
    	set<string,compare>::iterator it;
    	ifstream in(inSource2.c_str());
    	
    	cout<<"Using "<<inSource2<<" as master file."<<endl;
    	
            //cout<< "Enter your new file name: ";
    	//getline(cin,filename);          hard coded for testing
    	
            filename+=".all";
    	cout << "Your File Name is " << filename <<endl;
    	
            //cout<<"Enter number of key codes: ";
    	//cin>>knum;            Hard coded for testing
    	
            cout<<"Thinking..."<<endl;
    	
    	
    	
    	ofstream out(filename.c_str());
    	
    	
    	
    	while (getline(in,line))
    		strSet.insert(line);
    	while (knum!=0)
    	{
                    cout<<"Enter a Key Code: ";
    		getline(cin,keyCode);
    		cin.ignore(1,'\n');
    		it = find_if(strSet.begin(),strSet.end(),Search(keyCode));
    		if (it != strSet.end())
    			{
    			
    			}
    		knum--;
    	}	
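    	// write whatever is in the set to the output file (ostream_iterator is declared in <iterator>)
    	copy(strSet.begin(),strSet.end(),ostream_iterator<string>(out,"\n"));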
    }
    
    
    
    int main (void)
    {
    	int Groupnum;
    	cout<<"How many files? ";
    	cin>> (Groupnum);
    	cout<<"Your number is "<<Groupnum<<endl<<endl;
    	while (Groupnum !=0)//allows you to enter multiple files 
    	{
    	Annotate("tested.all");
    	--Groupnum;
    	}
    	return(0);
    
    
    
    }
    In my last program I didn't have a problem with the getline function after I fixed the string bug using the fix in the stdafx.h file. But in this one it does some funky things: at times it acts as if Enter had been pressed, without letting the user type the filename (when it is not hard coded). The program should let you pick multiple files to make and, for each of those files, add multiple key codes, pulling the matching records from the tested.all file. I changed the offset, but I still need to get rid of the duplicate killer. Any suggestions?
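
    One common cause of getline seeming to "press Enter" by itself is a cin >> read earlier in the program (here, cin >> Groupnum in main): >> stops before the newline, and the next getline then reads that leftover empty line. A minimal sketch of one way around it, with readCount and demoKeyAfterCount as made-up helper names and an in-memory stream standing in for cin:

```cpp
#include <iostream>
#include <limits>
#include <sstream>
#include <string>

// Reads an int with >>, then discards the rest of that input line so a
// following getline() does not pick up the leftover newline.
bool readCount(std::istream& in, int& count)
{
    if (!(in >> count))
        return false;      // input was not a number
    // Throw away everything up to and including the end of this line.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    return true;
}

// Small demonstration using an in-memory stream in place of cin:
// the user "types" a count, then a key code on the next line.
std::string demoKeyAfterCount()
{
    std::istringstream input("2\nABCD\n");
    int count = 0;
    std::string keyCode;
    if (readCount(input, count))
        std::getline(input, keyCode);   // reads the key code, not an empty line
    return keyCode;
}
```

    In the posted code the same idea would apply right after cin >> Groupnum. The cin.ignore(1,'\n') that follows getline(cin,keyCode) is probably worth removing as well: getline has already consumed the newline, so that call throws away the first character of whatever is typed next.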
