Thread: vector duplicates removal

  1. #1
    Registered User
    Join Date
    Oct 2010
    Posts
    1

    vector duplicates removal

    Hello everyone. This is just an example, which is why I have used variable names like a, x, y, z. They are strings because in the actual assignment I am reading from a file that has more than 60000 records (lines). Each field is a combination of letters and digits. MyPred is a struct, and vPred is a vector of that struct.

    Could anyone please help me complete this assignment? The first and last string elements of each line (in that file of 60000 records) are the start date/time and end date/time respectively. I have to find the duplicates in the original vector, where two lines count as duplicates if all their string elements are the same except the first one, i.e. the start date/time. One line of the file can have several duplicates, e.g. 2 or even 3. For each group of duplicates I have to find the difference between the minutes of the start date/time and the minutes of the end date/time, and push_back to the original vector only the element whose time difference is the smallest. For example:

    Start date/time of element '1' is 2006/06/01 16:34:43
    End date/time of that element is 2006/06/01 16:55:51

    Start date/time of the duplicate of element '1' is 2006/06/01 16:24:43
    End date/time of that duplicate is 2006/06/01 16:55:51 (same as '1')

    I now have to find the difference between 34 (the minutes of element '1's start date/time) and 55 (the minutes of its end date/time), which is 21,

    and the difference between 24 (the minutes of the duplicate's start date/time) and 55 (the minutes of the duplicate's end date/time), which is 31.

    Since 21 is less than 31, I will push_back the element with the time difference of 21 to the original vector and discard all its duplicates.

    Likewise, I have to compare the time differences of all the duplicates of element '1', as it can have many duplicates and not just one.
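
    Note that subtracting only the minutes fields gives a wrong answer as soon as the start and end fall in different hours (16:55 to 17:05 would come out as -50). A minimal sketch of one way around that, assuming every timestamp has the form YYYY/MM/DD HH:MM:SS and ignoring time zones: convert each timestamp to a count of seconds and subtract. The names daysFromCivil, toSeconds and durationSeconds are made up for this illustration and are not part of the assignment.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Days since 1970/01/01 for a Gregorian calendar date (pure arithmetic, no time zones).
long long daysFromCivil(int y, int m, int d)
{
    y -= m <= 2; // treat January and February as months of the previous year
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);           // year of era
    const unsigned doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1; // day of year, March-based
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          // day of era
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Seconds since 1970/01/01 00:00:00 for a "YYYY/MM/DD HH:MM:SS" string (assumed format).
long long toSeconds(const std::string& stamp)
{
    int y, mo, d, h, mi, s;
    if (std::sscanf(stamp.c_str(), "%d/%d/%d %d:%d:%d", &y, &mo, &d, &h, &mi, &s) != 6)
        throw std::invalid_argument("bad timestamp: " + stamp);
    return daysFromCivil(y, mo, d) * 86400LL + h * 3600LL + mi * 60LL + s;
}

// Elapsed seconds from start to end; divide by 60 for whole minutes.
long long durationSeconds(const std::string& start, const std::string& end)
{
    return toSeconds(end) - toSeconds(start);
}
```

    With the two records above, durationSeconds gives 1268 seconds (21 whole minutes) for element '1' and 1868 seconds (31 whole minutes) for its duplicate, so the comparison comes out the same way as the minutes-only arithmetic, but it also stays correct across hour and day boundaries.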

    Could anyone please help me do this? I hope I have explained it well. For the time being I have put random int+string combinations into the struct's first (start date/time) and last (end date/time) member variables instead of proper date/times, so that I can find the difference between the int parts of those variables.

    Code:
    #include <string>
    #include <vector>
    #include <iostream>
    #include <algorithm>
    
    struct MyPred
    {
    	std::string a;
    	std::string x;
    	std::string y;
    	std::string z;
    
    	MyPred(const std::string& a, const std::string& x, const std::string& y, const std::string& z): a(a), x(x), y(y), z(z) {}
    
    	// Two records are duplicates when everything except the start (a) matches
    	bool operator==(const MyPred& p) const
    	{
    		return x == p.x && y == p.y && z == p.z;
    	}
    
    	// Ordering used by std::sort and std::equal_range; a is deliberately
    	// ignored so that duplicates end up next to each other
    	bool operator<(const MyPred& p) const
    	{
    		if(x != p.x) return x < p.x;
    		if(y != p.y) return y < p.y;
    		return z < p.z;
    	}
    };
    
    
    int main()
    {
    	// Plain vectors are enough here; no new/delete needed
    	std::vector<MyPred> vPred;
    	vPred.push_back(MyPred("a2c", "1Gak", "c", "d4f"));
    	vPred.push_back(MyPred("j4h", "b", "c", "j87h"));
    	vPred.push_back(MyPred("d4f", "1Gak", "c", "d4f"));
    	vPred.push_back(MyPred("n7s", "1Gak", "c", "d4f"));
    	vPred.push_back(MyPred("l9m", "b", "c", "j87h"));
    	vPred.push_back(MyPred("p24a", "x", "c", "p43a"));
    	vPred.push_back(MyPred("q56r", "l", "m", "q90r"));
    	vPred.push_back(MyPred("g11v", "8f", "h", "g63v"));
    	vPred.push_back(MyPred("u3w", "v", "d", "u11w"));
    	vPred.push_back(MyPred("k76l", "x", "c", "p43a"));
    	vPred.push_back(MyPred("p24a", "g", "z", "p43a"));
    
    	// The values need to be in order for equal_range() to work
    	std::sort(vPred.begin(), vPred.end());
    
    	std::vector<MyPred> uPred; // values that were always unique
    	std::vector<MyPred> dPred; // values that were duplicated
    
    	std::pair<std::vector<MyPred>::iterator, std::vector<MyPred>::iterator> ret;
    
    	for(std::vector<MyPred>::iterator i = vPred.begin(); i != vPred.end(); i = ret.second)
    	{
    		// [ret.first, ret.second) is the run of elements equivalent to *i
    		ret = std::equal_range(i, vPred.end(), *i);
    
    		if(ret.second - ret.first > 1) // duplicates
    		{
    			// put each duplicate onto a new vector
    			dPred.insert(dPred.end(), ret.first, ret.second);
    		}
    		else
    		{
    			uPred.push_back(*i);
    		}
    	}
    
    	std::cout << "vPred: Sorted input\n";
    	for(std::vector<MyPred>::iterator i = vPred.begin(); i != vPred.end(); ++i)
    	{
    		std::cout << "[" << i->a << ", " << i->x << ", " << i->y << ", " << i->z << "]" << '\n';
    	}
    
    	std::cout << "dPred: Only the values that were duplicated\n";
    	for(std::vector<MyPred>::iterator i = dPred.begin(); i != dPred.end(); ++i)
    	{
    		std::cout << "[" << i->a << ", " << i->x << ", " << i->y << ", " << i->z << "]" << '\n';
    	}
    
    	std::cout << "uPred: Only the values that were unique\n";
    	for(std::vector<MyPred>::iterator i = uPred.begin(); i != uPred.end(); ++i)
    	{
    		std::cout << "[" << i->a << ", " << i->x << ", " << i->y << ", " << i->z << "]" << '\n';
    	}
    
    	// Wait for a key so the console window stays open
    	char a;
    	std::cin >> a;
    }
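
    The code above only separates unique records from duplicated ones; it does not yet pick a winner within each group. One possible way to finish the job, sketched here under assumptions: sort so that the shortest record comes first inside every group, then let std::unique drop the rest, since it keeps the first element of each run. TimedPred, its minutes field (assumed to be computed when the line is parsed) and the three function names are hypothetical and not part of the posted code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical variant of MyPred that also carries a precomputed duration.
struct TimedPred
{
    std::string a;       // start date/time
    std::string x, y, z; // remaining fields; z is the end date/time
    int minutes;         // assumed: end minus start, filled in while parsing
};

// Duplicates are records that match in everything except the start.
bool sameExceptStart(const TimedPred& l, const TimedPred& r)
{
    return l.x == r.x && l.y == r.y && l.z == r.z;
}

// Groups duplicates together and puts the shortest duration first in each group.
bool groupThenShortest(const TimedPred& l, const TimedPred& r)
{
    if (l.x != r.x) return l.x < r.x;
    if (l.y != r.y) return l.y < r.y;
    if (l.z != r.z) return l.z < r.z;
    return l.minutes < r.minutes;
}

// Leaves exactly one record per group: the one with the smallest duration.
void keepShortestPerGroup(std::vector<TimedPred>& v)
{
    std::sort(v.begin(), v.end(), groupThenShortest);
    v.erase(std::unique(v.begin(), v.end(), sameExceptStart), v.end());
}
```

    After keepShortestPerGroup(v) the vector holds one record per group, in sorted order rather than file order. If the original order matters, an index field would have to be added and sorted on afterwards.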

  2. #2
    Registered User
    Join Date
    May 2010
    Posts
    4,633
    Cross-posted to at least 2 other forums.

    Jim

  3. #3
    Registered User
    Join Date
    Aug 2005
    Posts
    1,267
    Do you need all 60000 rows of the file in the vector at the same time? If not, then why not check for duplicates when the file is read? e.g. Read a line, check if it already exists in the vector. If it does then update the vector, if not then add a new item to the vector.
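
    A rough sketch of that read-time check, assuming C++11 and assuming each parsed line arrives as a Record with its duration already worked out. A std::map keyed on every field except the start makes the "does it already exist" test a logarithmic lookup rather than a scan of the vector. Record, Key, BestByKey, addRecord and toVector are names invented for this example.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical parsed line; durationSec is assumed to be filled in by the reader.
struct Record
{
    std::string start;
    std::string x;
    std::string y;
    std::string end;
    long long durationSec;
};

// Everything except the start date/time identifies a group of duplicates.
typedef std::tuple<std::string, std::string, std::string> Key;
typedef std::map<Key, Record> BestByKey;

// Call once per line read: keeps only the shortest record seen so far per key.
void addRecord(BestByKey& best, const Record& r)
{
    const Key key(r.x, r.y, r.end);
    BestByKey::iterator it = best.find(key);
    if (it == best.end())
        best.insert(std::make_pair(key, r)); // first record with this key
    else if (r.durationSec < it->second.durationSec)
        it->second = r;                      // shorter duplicate replaces the stored one
}

// Copies the surviving records out, ordered by key.
std::vector<Record> toVector(const BestByKey& best)
{
    std::vector<Record> out;
    for (BestByKey::const_iterator it = best.begin(); it != best.end(); ++it)
        out.push_back(it->second);
    return out;
}
```

    The reading loop would call addRecord for each line and toVector once at the end, so the full 60000 rows never have to sit in memory together.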

  4. #4
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    If you don't want duplicates, why not use a std::set instead of a vector?
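
    One caveat with a plain std::set: insert() silently keeps whichever duplicate arrived first, which is not necessarily the one with the smallest time difference. A small sketch of a workaround, assuming each element carries a precomputed minutes value; Entry, IgnoreStart, EntrySet and insertShortest are hypothetical names for this example.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical element type with a precomputed duration in minutes.
struct Entry
{
    std::string a;       // start date/time
    std::string x, y, z; // remaining fields; z is the end date/time
    int minutes;         // assumed: end minus start
};

// Orders entries by every field except the start, so duplicates compare equivalent.
struct IgnoreStart
{
    bool operator()(const Entry& l, const Entry& r) const
    {
        if (l.x != r.x) return l.x < r.x;
        if (l.y != r.y) return l.y < r.y;
        return l.z < r.z;
    }
};

typedef std::set<Entry, IgnoreStart> EntrySet;

// Inserts e, replacing an existing duplicate only when e has the smaller duration.
void insertShortest(EntrySet& s, const Entry& e)
{
    std::pair<EntrySet::iterator, bool> res = s.insert(e);
    if (!res.second && e.minutes < res.first->minutes)
    {
        // Set elements are const, so replace by erasing and inserting again
        s.erase(res.first);
        s.insert(e);
    }
}
```

    Calling insertShortest for every record leaves one entry per group of duplicates, and that entry is the one with the smallest minutes value.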
    "I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008

    "the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010
