Thread: compare structures

  1. #1
    Apprentice
    Join Date
    Oct 2006
    Location
    LA
    Posts
    53

    compare structures

    Hi,

    I have defined a structure as

    Code:
    typedef struct ps_report_rows
    {
        std::string cp_item_id; 
        std::string cp_object_string;
        std::string mfg_object_string;
        std::string mor_item_id; 
    } report_row;
    I have 2 different lists containing the report rows structure.

    Code:
    list<report_row> objectList1; 
    list<report_row> objectList2;
    Some report rows in the two lists would be common and some unique. I want to find all the unique report rows that exist in objectList1 and store them in another list.

    This is the code I have in mind

    Code:
    list < report_row > ::iterator listIter1;
    list < report_row > ::iterator listIter2;
    bool isDuplicate = false;
    list<report_row> finalList;
    
    finalList.clear();
    
    for (listIter1 = objectList1.begin(); listIter1 != objectList1.end() ; listIter1++)
    {
    	report_row primaryRow = *listIter1;
    
    	for (listIter2 = objectList2.begin(); listIter2 != objectList2.end() ; listIter2++)
    	{
    		isDuplicate = false;
    		report_row secondaryRow = *listIter2;
    
    		check_if_duplicate_row(primaryRow, secondaryRow, isDuplicate);
    
    		if(isDuplicate == true)
    			break;
    	}
    	if(isDuplicate == false)
    	{
    			finalList.push_back(primaryRow);
    	}
    }
    
    void check_if_duplicate_row(report_row primaryRow, report_row secondaryRow, bool& isDuplicate)
    {
    	if(primaryRow->cp_item_id!=secondaryRow->cp_item_id)
    	{
    		isDuplicate = false;
    		break;
    	}
    	if(primaryRow->cp_object_string!=secondaryRow->cp_object_string)
    	{
    		isDuplicate = false;
    		break;
    	}
    	if(primaryRow->mfg_object_string!=secondaryRow->mfg_object_string)
    	{
    		isDuplicate = false;
    		break;
    	}
    	if(primaryRow->mor_item_id!=secondaryRow->mor_item_id)
    	{
    		isDuplicate = false;
    		break;
    	}
    	isDuplicate = true;
    }
    Does this code look good enough? Is there a more efficient way to achieve this?

  2. #2
    The larch
    Join Date
    May 2006
    Posts
    3,573
    You can overload operator== for report_row and use std::find instead of the inner loop.

    Code:
    void check_if_duplicate_row(report_row primaryRow, report_row secondaryRow, bool& isDuplicate)
    This is very awkward. Why not return bool instead? Also, have you heard of the && operator (and)?

    If you want to make it more efficient, you might sort the list first. Then you'll only need to check pairs of neighbours to find out if they are duplicates or not.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  3. #3
    Apprentice
    Join Date
    Oct 2006
    Location
    LA
    Posts
    53
    use std::find instead of the inner loop.
    Will this work fine?

    Code:
    list < report_row > ::iterator listIter1;
    
    list<report_row> finalList;
    
    finalList.clear();
    
    for (listIter1 = objectList1.begin(); listIter1 != objectList1.end() ; listIter1++)
    {
    	report_row primaryRow = *listIter1;
    
    	if(std::find(objectList2.begin(), objectList2.end(), primaryRow) != objectList2.end())
    	{
    		finalList.push_back(primaryRow);
    	}
    
    }
    Don't I need to compare individual elements in the structure?

    If you want to make it more efficient, you might sort the list first. Then you'll only need to check pairs of neighbours to find out if they are duplicates or not.
    All the report_row elements in objectList1 are unique whereas objectList2 may contain duplicate report rows. So, don't think sorting the list would help.

    You can overload operator== for report_row
    Yes, will read up on operator overloading and try this.

  4. #4
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    Quote Originally Posted by lazyme
    All the report_row elements in objectList1 are unique whereas objectList2 may contain duplicate report rows. So, don't think sorting the list would help.
    Describe an algorithm that makes a single pass over the first list and at most a single pass over the second list to determine the elements that are present in the first list but not in the second list.

    Sounds tough? If both the lists are sorted, then you can avoid implementing the algorithm yourself by using std::set_difference. Another algorithm to accomplish the same task, of comparable efficiency (if you were using a container with random access instead of std::list), involves sorting the second list only, then making a single pass over the first list while binary searching the second list to determine if the current element of the first list is present in the second.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #5
    Apprentice
    Join Date
    Oct 2006
    Location
    LA
    Posts
    53
    Ok. I got your point.

    Will this work fine?

    Code:
    list < report_row > ::iterator listIterEnd, listIter;
    
    list<report_row> finalList;
    
    finalList.clear();
    
    objectList1.sort();
    objectList2.sort();
    
    listIterEnd = set_difference(objectList1.begin(), objectList1.end(), objectList2.begin(), objectList2.end(), finalList.begin());
    I assume finalList will contain the unique report rows from objectList1. Another person will write a function and will pass me the 2 lists so I cant test with actual data now.

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    "Describe an algorithm that makes a single pass over the first list and at most a single pass over the second list to determine the elements that are present in the first list but not in the second list."

    Easy. Create a third and fourth list and... ^_^;

    Soma

  7. #7
    Apprentice
    Join Date
    Oct 2006
    Location
    LA
    Posts
    53
    Ok. What about the code I posted. Should that work fine?

  8. #8
    Apprentice
    Join Date
    Oct 2006
    Location
    LA
    Posts
    53
    objectList1.sort() gives me an error

    Code:
    bool std::operator <(const std::vector<_Ty,_Alloc> &,const std::vector<_Ty,_Alloc> &)' : could not deduce template argument for 'const std::vector<_Ty,_Alloc> &' from 'ps_report_rows
    Will I have to write my own sorting logic?

    Also, how can I use set_difference for my structure. I tried the code I posted above but that doesnt work.

  9. #9
    The larch
    Join Date
    May 2006
    Posts
    3,573
    I have no idea where the vector came from. Do you have operator< defined for ps_report_rows? What does it look like (might be tricky if you want to compare more than one field in a struct)?

    Code:
    bool operator< (const ps_report_rows& lhv, const ps_report_rows& rhv)
    {
         //comparison logic here
    }
    About using set_difference, I'm quite sure that you need to use std::back_inserter. You just can't assign values to the finalList (since it is empty) but you'll need to push_back results to it (which back_inserter does).

    Code:
    #include <iterator>
    ...
    list<report_row> finalList;
    
    //finalList.clear(); //it's already empty
    
    objectList1.sort();
    objectList2.sort();
    
    listIterEnd = set_difference(objectList1.begin(), objectList1.end(), objectList2.begin(), objectList2.end(), std::back_inserter(finalList));
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  10. #10
    Apprentice
    Join Date
    Oct 2006
    Location
    LA
    Posts
    53
    No, I don't have the operator < defined.

    I dont necessarily need to compare more than one field in the struct. Can I use set_difference if my lists are sorted by any field in the struct?

  11. #11
    The larch
    Join Date
    May 2006
    Posts
    3,573
    You'll either need to overload operator<, or provide a comparison function (that is the same for both sort and set_difference).

    The overloaded operator could be implemented like this:

    Code:
    bool operator< (const ps_report_rows& lhv, const ps_report_rows& rhv)
    {
         return lhv.cp_item_id < rhv.cp_item_id;
    }
    However, this means that only the cp_item_id will be considered when set_difference determines which items are the same or not. If this field is equal in two items then it won't matter whether the other fields are the same or not - according to this ordering these two items will be seen as equivalent. (In your original code you are checking whether all 4 fields are equal or not.)
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  12. #12
    Apprentice
    Join Date
    Oct 2006
    Location
    LA
    Posts
    53
    I want set_difference to determine the items are same only if all the four fields match.

    Can I achieve this with

    Code:
    bool operator< (const ps_report_rows& lhv, const ps_report_rows& rhv)
    {
         if(lhv.cp_item_id!=rhv.cp_item_id)
             return lhv.cp_item_id < rhv.cp_item_id;
    
         if(lhv.cp_object_string!=rhv.cp_object_string)
             return lhv.cp_object_string< rhv.cp_object_string;
    
         if(lhv.mfg_object_string!=rhv.mfg_object_string)
             return lhv.mfg_object_string < rhv.mfg_object_string;
       
         return lhv.mor_item_id < rhv.mor_item_id;
    }

  13. #13
    The larch
    Join Date
    May 2006
    Posts
    3,573
    It looks correct.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  14. #14
    Apprentice
    Join Date
    Oct 2006
    Location
    LA
    Posts
    53
    Thanks anon. I have written my code around this and it works fine. The structure contains 13 elements now.

    I am posting my complete code below. Please let me know if you can foresee any errors(mainly in the boolean operator < function) I cant see at the moment and also any suggestions to improve this code.

    Code:
    std::list<pn_report_row_t> finalDeltaList;
    finalDeltaList.clear();
    
    objectList1.sort();
    
    objectList2.sort();
    
    set_difference(objectList1.begin(), objectList1.end(), objectList2.begin(), objectList2.end(), std::back_inserter(finalDeltaList));
    Code:
    std::string siemens_trim(std::string inputStr)
    {
        if (!inputStr.empty())
        {
            /*get the first white space in the string*/
            std::string::size_type startPos = inputStr.find_first_not_of("\n\t ");
            if (startPos != std::string::npos)
            {
                /*get the last white space in the string*/
                std::string::size_type endPos = inputStr.find_last_not_of("\n\t ");
                /*trim the white spaces*/
                inputStr = inputStr.substr(startPos, endPos - startPos + 1);
            }
            else
            {
                inputStr.clear();
            }
    		return inputStr;
        }
    }
    Code:
    bool operator< ( pn_report_row_t& primaryRow,  pn_report_row_t& secondaryRow)
    {
    	string cpItemId = siemens_trim(primaryRow.cp_item_id);
    	string secCpItemId = siemens_trim(secondaryRow.cp_item_id);
    
    	string cpObjStr = siemens_trim(primaryRow.cp_object_string);
    	string secCpObjStr = siemens_trim(secondaryRow.cp_object_string);
    
    	string mfgObjStr = siemens_trim(primaryRow.mfg_object_string);
    	string secMfgObjStr = siemens_trim(secondaryRow.mfg_object_string);
    
    	string morItemId = siemens_trim(primaryRow.mor_item_id);
    	string secMorItemId = siemens_trim(secondaryRow.mor_item_id);
    
    	string morObjStr = siemens_trim(primaryRow.mor_object_string);
    	string secMorObjStr = siemens_trim(secondaryRow.mor_object_string);
    
    	string sorItemId = siemens_trim(primaryRow.sor_item_id);
    	string secSorItemId = siemens_trim(secondaryRow.sor_item_id);
    
    	string sorObjStr = siemens_trim(primaryRow.sor_object_string);
    	string secSorObjStr = siemens_trim(secondaryRow.sor_object_string);
    
    	string manufacturerType = siemens_trim(primaryRow.manufacturer_type);
    	string secManufacturerType = siemens_trim(secondaryRow.manufacturer_type);
    
    	string orderDesc = siemens_trim(primaryRow.order_desc);
    	string secOrderDesc = siemens_trim(secondaryRow.order_desc);
    
    	string addnOrderDesc = siemens_trim(primaryRow.additional_order_desc);
    	string secAddnOrderDesc = siemens_trim(secondaryRow.additional_order_desc);
    
    	string objStr = siemens_trim(primaryRow.object_string);
    	string secObjStr = siemens_trim(secondaryRow.object_string);
    
    	string lastOrdDt = siemens_trim(primaryRow.last_order_date);
    	string secLastOrdDt = siemens_trim(secondaryRow.last_order_date);
    
    	string howImpact = siemens_trim(primaryRow.how_impacted);
    	string secHowImpact = siemens_trim(secondaryRow.how_impacted);
    
    	if(cpItemId.compare(secCpItemId)!=0)
    	{
    		return cpItemId < secCpItemId;
    	}
    	
    	if(cpObjStr.compare(secCpObjStr)!=0)
    	{
    		return cpObjStr < secCpObjStr;
    	}
    	
    	if(mfgObjStr.compare(secMfgObjStr)!=0)
    	{
    		return mfgObjStr < secMfgObjStr;
    	}
    	
    	if(morItemId.compare(secMorItemId)!=0)
    	{
    		return morItemId < secMorItemId;
    	}
    	
    	if(morObjStr.compare(secMorObjStr)!=0)
    	{
    		return morObjStr < secMorObjStr;
    	}
    	
    	if(sorItemId.compare(secSorItemId)!=0)
    	{
    		return sorItemId < secSorItemId;
    	}
    	
    	if(morItemId.compare(secMorItemId)!=0)
    	{
    		return morItemId < secMorItemId;
    	}
    	
    	if(sorObjStr.compare(secSorObjStr)!=0)
    	{
    		return sorObjStr < secSorObjStr;
    	}
    	
    	if(manufacturerType.compare(secManufacturerType)!=0)
    	{
    		return manufacturerType < secManufacturerType;
    	}
    	
    	if(orderDesc.compare(secOrderDesc)!=0)
    	{
    		return orderDesc < secOrderDesc;
    	}
    	
    	if(objStr.compare(secObjStr)!=0)
    	{
    		return objStr < secObjStr;
    	}
    	
    	if(lastOrdDt.compare(secLastOrdDt)!=0)
    	{
    		return lastOrdDt < secLastOrdDt;
    	}
    	
    	return howImpact < secHowImpact;
    }

  15. #15
    The larch
    Join Date
    May 2006
    Posts
    3,573
    The comparison function looks a bit heavy. If it is not important for some reasons that the stored strings keep their trailing and leading whitespace, you could make sure that the strings are trimmed when you store them in a pn_report_row_t.

    Another thing to reduce code might be to store the strings in an array instead of individual variables. You can provide an enum, so each index has a descriptive name (yours aren't very descriptive).

    A small fault is that comparison operators should not modify what is being compared and therefore they should take the arguments by const reference.

    There's also a small problem with the trimming function. The return statement should be moved to the end of the function, otherwise it won't return anything if the string is empty. (The test whether the string is empty is not very important: find_first_not_of would simply return npos and it is OK to clear an empty string.)

    Code:
    std::string siemens_trim(std::string inputStr)
    {
        if (!inputStr.empty())
        {
            /*get the first white space in the string*/
            std::string::size_type startPos = inputStr.find_first_not_of("\n\t ");
            if (startPos != std::string::npos)
            {
                /*get the last white space in the string*/
                std::string::size_type endPos = inputStr.find_last_not_of("\n\t ");
                /*trim the white spaces*/
                inputStr = inputStr.substr(startPos, endPos - startPos + 1);
            }
            else
            {
                inputStr.clear();
            }
        }
        return inputStr;
    }
    
    class pn_report_row_t
    {
    public:
        enum
        {
            cpItemId, cpObjStr, mfgObjStr, morItemId, morObjStr, sorItemId, sorObjStr,
            manufacturerType, orderDesc, addnOrderDesc, objStr, lastOrdDt, howImpact,
            maxField
        };
        const std::string& get(unsigned index) const
        {
            assert(index < maxField);
            return fields[index];
        }
        void set(const std::string& value, unsigned index)
        {
            assert(index < maxField);
            fields[index] = siemens_trim(value);
        }
    private:
        std::string fields[maxField];
    };
    
    bool operator< (const pn_report_row_t& primaryRow, const pn_report_row_t& secondaryRow)
    {
        for (unsigned i = 0; i != pn_report_row_t::maxField; ++i) {
            if (primaryRow.get(i) != secondaryRow.get(i)) {
                return primaryRow.get(i) < secondaryRow.get(i);
            }
        }
        return false;
    }
    You could also optimize the trimming function a bit by avoiding some unnecessary copying. (A single exit point in a function is a good thing, but it isn't that important in C++ where clean-up can be managed by objects.)

    Code:
    std::string siemens_trim(const std::string& inputStr)
    {
        if (!inputStr.empty())
        {
            /*get the first white space in the string*/
            std::string::size_type startPos = inputStr.find_first_not_of("\n\t ");
            if (startPos != std::string::npos)
            {
                /*get the last white space in the string*/
                std::string::size_type endPos = inputStr.find_last_not_of("\n\t ");
                /*trim the white spaces*/
                return inputStr.substr(startPos, endPos - startPos + 1);
            }
        }
        return std::string();
    }
    Last edited by anon; 05-27-2009 at 08:24 AM.
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 2
    Last Post: 07-11-2008, 07:39 AM
  2. Structures within Structures
    By Misko82 in forum C Programming
    Replies: 2
    Last Post: 08-27-2007, 12:25 AM
  3. Classes and Structures.
    By jrahhali in forum C++ Programming
    Replies: 6
    Last Post: 03-28-2004, 05:03 PM
  4. How do I compare structures??
    By aspand in forum C Programming
    Replies: 4
    Last Post: 06-09-2002, 10:47 AM
  5. Methods for Sorting Structures by Element...
    By Sebastiani in forum C Programming
    Replies: 9
    Last Post: 09-14-2001, 12:59 PM