Case insensitive string compare...?

  1. #1
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545

    Hi,
    In Item 35 of "Effective STL", Scott Meyers says to "Implement simple case-insensitive string comparisons via mismatch or lexicographical_compare".

    I can see how the mismatch version would work, but when I tried creating a string comparison function with lexicographical_compare(), it isn't working because lexicographical_compare() only returns true "if the range of elements [start1,end1) is lexicographically less than the range of elements [start2,end2)".

    I tried adding a not2() around the predicate, but that didn't help.

    I have almost the same code as the book (except I'm using the C++ tolower() function)...
    Code:
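    	#include <algorithm>	// std::lexicographical_compare
    	#include <functional>	// std::unary_function, std::binary_function
    	#include <locale>	// std::locale, std::tolower( ch, loc )
    	#include <string>	// std::basic_string

    	/** ToLowerChar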
    	 *  This is a functor that wraps the std::tolower() function.
    	 */
    	template <typename E>
    	class ToLowerChar	:	public std::unary_function<E, E>
    	{
    	public:
    		/** ToLowerChar()
    		 *  A constructor for the ToLowerChar functor that sets the Locale to use when converting
    		 *  the characters to lower-case.
    		 *
    		 *  @param loc [IN] - The locale to use for character conversions.
    		 */
    		ToLowerChar( const std::locale&  loc = std::locale() )
    		:	m_Locale( loc ) {}
    
    		/** operator()
    		 *  This operator will return a lower-case version of the character passed to it.
    		 *
    		 *  @param ch [IN] - This is the character to convert to lower-case.
    		 *
    		 *  @return E - The lower-case version of the character that was passed in.
    		 */
    		E operator()( E  ch ) const
    		{
    			return std::tolower( ch, m_Locale );
    		}
    
    	private:
    		std::locale	m_Locale;	/**< The locale to use for character conversions. Stored by value: a reference to the default-argument temporary would dangle. */
    	};
    
    	/** CharLessNoCase
    	 *  A functor to compare 2 characters without regard to case.
    	 */
    	template <typename E>
    	class CharLessNoCase	:	public std::binary_function<E, E, bool>
    	{
    	public:
    		/** CharLessNoCase()
    		 *  A constructor for the CharLessNoCase functor that sets the Locale to use when converting
    		 *  the characters to lower-case.
    		 *
    		 *  @param loc [IN] - The locale to use for character conversions.
    		 */
    		CharLessNoCase( const std::locale&  loc = std::locale() )
    		:	m_Locale( loc ) {}
    
    		/** operator()
    		 *  First converts both characters to lower-case, then returns true if the
    		 *  first one sorts before the second one, or false otherwise.
    		 *
    		 *  @param c1 [IN] - The first character to compare.
    		 *
    		 *  @param c2 [IN] - The second character to compare.
    		 *
    		 *  @return bool - true if c1 is less than c2 (ignoring case),
    		 *			otherwise false (including when they are the same).
    		 */
    		bool operator()( E  c1, E  c2 ) const
    		{
    			return (ToLowerChar<E>( m_Locale )( c1 ) < ToLowerChar<E>( m_Locale )( c2 ));
    		}
    
    	private:
    		std::locale	m_Locale;	/**< The locale to use for character conversions. Stored by value: a reference to the default-argument temporary would dangle. */
    	};
    
    	/** StringCompareNoCase
    	 *  This functor compares 2 STL strings without regard to case.
    	 */
    	template <typename E,
    			  typename T = std::char_traits<E>,
    			  typename A = std::allocator<E> >
    	class StringCompareNoCase	:	public std::binary_function<const std::basic_string<E, T, A>&,
    									    const std::basic_string<E, T, A>&,
    									    bool>
    	{
    	public:
    		/** StringCompareNoCase()
    		 *  A constructor for the StringCompareNoCase functor that sets the Locale to use when converting
    		 *  the characters to lower-case.
    		 *
    		 *  @param loc [IN] - The locale to use for character conversions.
    		 */
    		StringCompareNoCase( const std::locale&  loc = std::locale() )
    		:	m_Locale( loc ) {}
    
    		/** operator()
    		 *  Compares 2 STL strings without regard to case.
    		 *
    		 *  @param str1 [IN] - The first string to compare.
    		 *
    		 *  @param str2 [IN] - The second string to compare.
    		 *
    		 *  @return bool - true if str1 sorts before str2 (ignoring case),
    		 *			otherwise false (including when they are the same).
    		 */
    		bool operator()( const std::basic_string<E, T, A>&  str1,
    						 const std::basic_string<E, T, A>&  str2 ) const
    		{
    			return std::lexicographical_compare( str1.begin(), str1.end(),
    							     str2.begin(), str2.end(),
    							     CharLessNoCase<E>( m_Locale ) );
    		}
    
    	private:
    		std::locale	m_Locale;	/**< The locale to use for character conversions. Stored by value: a reference to the default-argument temporary would dangle. */
    	};
    Am I doing something wrong, or did I completely misunderstand what Scott was talking about?
    Last edited by cpjust; 02-20-2008 at 02:45 PM.

  2. #2
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,893
    I think this sentence is missing a key element to understand your issue:
    it isn't working because lexicographical_compare() only "if the range of elements [start1,end1) is lexicographically less than the range of elements [start2,end2)"
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Portugal
    Posts
    7,412
    Is it possible that you are misunderstanding the function results?
    The function returns true if the first sequence is lexicographically (carpal tunnel alert!) less than the second sequence. Consequently, it will return false if it is greater or equal to the second sequence. You always obtain a good result.

    Unless you want to see if they are equal... in that case you apply !(seq1 < seq2) && !(seq2 < seq1); the parentheses matter, because ! binds tighter than <. Naturally you only want to do this if you are using 2 sequences of a different type.
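
The equality test described above can be sketched roughly as follows. This is only an illustration, not code from the book: the names CharLessNoCase, ciStringLess, ciStringEqual and checkCiStringEqual are made up for this example, and it assumes plain std::string with the default (global) locale.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: case-insensitive "less than" for single characters.
struct CharLessNoCase {
    std::locale loc;  // stored by value so it cannot dangle

    explicit CharLessNoCase(const std::locale& l = std::locale()) : loc(l) {}

    bool operator()(char a, char b) const {
        // Two-argument std::tolower from <locale>.
        return std::tolower(a, loc) < std::tolower(b, loc);
    }
};

// True only if s1 sorts strictly before s2, ignoring case.
inline bool ciStringLess(const std::string& s1, const std::string& s2) {
    return std::lexicographical_compare(s1.begin(), s1.end(),
                                        s2.begin(), s2.end(),
                                        CharLessNoCase());
}

// Equivalence: neither string sorts before the other.
inline bool ciStringEqual(const std::string& s1, const std::string& s2) {
    return !ciStringLess(s1, s2) && !ciStringLess(s2, s1);
}

// Small self-check showing the intended behaviour.
inline void checkCiStringEqual() {
    assert(ciStringLess("apple", "Banana"));
    assert(!ciStringLess("Hello", "hello"));  // equivalent, so not "less"
    assert(ciStringEqual("Hello", "hELLO"));
    assert(!ciStringEqual("Hello", "Help"));
}
```

Note that this walks the strings twice when they are equivalent, so a mismatch-based comparison is probably the better fit if equality is all you need.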
    The programmer’s wife tells him: “Run to the store and pick up a loaf of bread. If they have eggs, get a dozen.”
    The programmer comes home with 12 loaves of bread.


    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  4. #4
    and the hat of sweating
    Originally Posted by CornedBee:
    I think this sentence is missing a key element to understand your issue:
    Crap, somehow the "returns true" got deleted when I was applying formatting...
    It should make sense now.

  5. #5
    and the hat of sweating
    I'm thinking now that the lexicographical_compare version is only useful for sorting rather than testing:
    Code:
    if (str1 == str2) // except ignoring case.
    Maybe Scott just chose a confusing name for the function in his example?
    ciStringLess() might have been a better choice than ciStringCompare().
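
For what it's worth, here is a rough sketch of the "useful for sorting" case. CiStringLess and checkCiStringLess are hypothetical names for this example only, and it uses the one-argument tolower() from <cctype> rather than the locale-aware one in the code above, purely to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical strict-weak-ordering functor for use with sorted containers.
struct CiStringLess {
    static bool charLess(char a, char b) {
        // Cast to unsigned char first: passing a negative value to the
        // <cctype> std::tolower is undefined behaviour.
        return std::tolower(static_cast<unsigned char>(a)) <
               std::tolower(static_cast<unsigned char>(b));
    }

    bool operator()(const std::string& s1, const std::string& s2) const {
        return std::lexicographical_compare(s1.begin(), s1.end(),
                                            s2.begin(), s2.end(),
                                            charLess);
    }
};

// Small self-check: a set ordered and de-duplicated without regard to case.
inline void checkCiStringLess() {
    std::set<std::string, CiStringLess> names;
    names.insert("Zebra");
    names.insert("apple");
    names.insert("APPLE");  // equivalent to "apple", so it is not inserted

    assert(names.size() == 2);
    assert(*names.begin() == "apple");
    assert(names.count("zebra") == 1);  // lookup ignores case too
}
```

The same functor could be passed as the third argument to std::sort(), which is where a "less" style predicate is exactly what the algorithm wants.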

  6. #6
    Cat without Hat CornedBee's Avatar
    The name is simply confusing. Refer to the earlier part of the item, where he implements ciStringCompare() with the interface you expected. Then he implements a function of the same name with a different interface. Not smart. He then goes on to spread further confusion by directly comparing lexicographical_compare() with strcmp().
    Well, nobody's perfect.
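
To make the difference between the two interfaces concrete, a strcmp()-style (negative / zero / positive) comparison built on mismatch() might look roughly like this. It is a sketch written for this thread, not the code from the book; ciCharEqual, ciStringCompare3 and checkCiStringCompare3 are hypothetical names, and the <cctype> tolower() is assumed to be good enough.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Hypothetical helper: true when two characters match, ignoring case.
inline bool ciCharEqual(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// strcmp-style result: negative, zero, or positive.
inline int ciStringCompare3(const std::string& s1, const std::string& s2) {
    // The three-iterator std::mismatch assumes the second range is at least
    // as long as the first, so swap the arguments and flip the sign when s1
    // is the longer string.
    if (s1.size() > s2.size()) return -ciStringCompare3(s2, s1);

    typedef std::string::const_iterator Iter;
    std::pair<Iter, Iter> p =
        std::mismatch(s1.begin(), s1.end(), s2.begin(), ciCharEqual);

    if (p.first == s1.end()) {
        // s1 is a prefix of s2, or the two are equivalent.
        return p.second == s2.end() ? 0 : -1;
    }

    // Otherwise the first differing pair of characters decides the order.
    int c1 = std::tolower(static_cast<unsigned char>(*p.first));
    int c2 = std::tolower(static_cast<unsigned char>(*p.second));
    return c1 < c2 ? -1 : 1;
}

// Small self-check showing the three possible outcomes.
inline void checkCiStringCompare3() {
    assert(ciStringCompare3("Hello", "hello") == 0);
    assert(ciStringCompare3("abc", "ABD") < 0);
    assert(ciStringCompare3("abcd", "ABC") > 0);
    assert(ciStringCompare3("", "a") < 0);
}
```

Testing for equality is then just ciStringCompare3(a, b) == 0, which is the interface the original question was expecting.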

  7. #7
    and the hat of sweating
    Originally Posted by CornedBee:
    The name is simply confusing. Refer to the earlier part of the item, where he implements ciStringCompare() with the interface you expected. Then he implements a function of the same name with a different interface. Not smart. He then goes on to spread further confusion by directly comparing lexicographical_compare() with strcmp().
    Well, nobody's perfect.
    I think I E-mailed him about that same thing a couple years ago, but he just blew me off saying there's no problem there and everything is correct.

  8. #8
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,046
    Very pedantic note:
    (except I'm using the C++ tolower() function)
    tolower() is actually a C function that was inherited into C++. Not that it matters in any case.

    If you're interested in formatting, have you checked out codeform? Here's an online version: dwks.theprogrammingsite.com/myprogs/cfonline.htm

    If you don't like codeform, there are a few other highlighters around. maxorator wrote one in PHP, and Martin somebody wrote a closed-source, buggy but Win32 one in C++, apparently. There were a few other attempts as well.

    Or you can do what everyone else does and write your own!
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  9. #9
    and the hat of sweating
    Originally Posted by dwks:
    Very pedantic note:

    tolower() is actually a C function that was inherited into C++. Not that it matters in any case.
    The one in <cctype> is from C, but the one in <locale> is a template: http://msdn2.microsoft.com/en-us/lib...xy(VS.80).aspx
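
A quick side-by-side of the two, as a sketch: lowerC, lowerLocale and checkToLower are hypothetical wrapper names for this example, and the checks assume the default "C" global locale.

```cpp
#include <cassert>
#include <cctype>   // int std::tolower(int), inherited from C
#include <locale>   // template <class charT> charT std::tolower(charT, const std::locale&)

// C-inherited version: takes and returns int, so cast on the way in and out.
inline char lowerC(char ch) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// <locale> template version: works for any character type that the
// locale has a ctype facet for (char and wchar_t at least).
template <typename E>
E lowerLocale(E ch, const std::locale& loc = std::locale()) {
    return std::tolower(ch, loc);
}

// Small self-check covering both overloads.
inline void checkToLower() {
    assert(lowerC('Q') == 'q');
    assert(lowerC('7') == '7');          // non-letters are left alone
    assert(lowerLocale('Q') == 'q');
    assert(lowerLocale(L'Q') == L'q');   // same template, wide characters
}
```

The number of arguments is what selects the overload, so including both headers in the same file does not cause an ambiguity here.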

  10. #10
    Frequently Quite Prolix dwks's Avatar
    Hmm, you're right. I didn't know that.
