Thread: String hashing

  1. #1
    Registered User
    Join Date
    Jan 2009
    Posts
    71

    String hashing

    Is there any way to design a hash function to detect strings of type:

    abcdef, defabc

    Notice that one is formed just by rotation of the other .. is it possible to make a hash function H such that

    H("abcdef")=H("defabc")

    Thank you

  2. #2
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Yes but do you know the axis of rotation which splits the array into two parts which are then swapped.

  3. #3
    Registered User
    Join Date
    Jan 2009
    Posts
    71

    Re

    Hmm..thanks for your reply

    But cant if be generalized ? As in

    H("abc")=H("bca")=H("cba")=... etc

    ??

  4. #4
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    Sure, it CAN be generalized, but that doesn't mean it would do what you need. You need to be more specific.

    For example, you could just add up the numerical value of each character in the string and consider that the "checksum". Then H("abcdef") == H("defabc") would be true. However, that wouldn't create a UNIQUE sum for each combination of letters*, and the sum would vary depending on the character set used of the host machine**. So again - it depends on exactly what you want to do.

    * H("bc") == H("ad") would also be true, to cite just one example

    ** If one used ASCII, and the other didn't

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well simple addition works, if you can tolerate "abcd" being the same as "aadd"

    Failing that, sort the characters and then hash, then you remove the internal ordering.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User
    Join Date
    Jan 2009
    Posts
    71

    sorting wont work

    Hmm..sorting wont work because it will give same result for string like "abad" & "aabd"

  7. #7
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    So what you're saying is that the string has to be exactly the same except for one rotation at any arbitrary location in the string? You need to be way more specific - we can't guess what you're trying to do. Come up with the exact rules, and then put that into pseudocode, and then put that into C.

  8. #8
    Registered User
    Join Date
    Jan 2009
    Posts
    71

    Re

    sorry for being so vague...
    Here is exactly what i want to implement

    If two strings are gives say A,B i want to know if B can be obtained just by pure rotation for charecters in A. I know alternate ways to do this such as brute force or using factoradics ... etc
    But some one told me that hashing is one the ways to do this efficienlty

    ex: given "abcde" .. possible rotations are "bcdea","cdeab","deabc","eabcd'.

    So given any two strings from the set S={"abcde", "bcdea","cdeab","deabc","eabcd"} i need a function that will return true
    H("abcde")==H("deabc") ...etc
    H("abcde")!=H("eadbc") since eadbc cannot be obtained by rotating "abcde"

    hope u get it

    thank u
    Last edited by jack_carver; 09-10-2009 at 12:00 PM.

  9. #9
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    So if I get it the only strings that matches "abcde" in set S is "cdeab" and "bcdea" and "deabc" obtained with a single rotation of the original substring?
    Last edited by itCbitC; 09-10-2009 at 12:09 PM.

  10. #10
    Registered User
    Join Date
    Jan 2009
    Posts
    71

    Re

    the number of rotations do not matter ...

    it should return "equal" for any two strings in that set

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Are you always limited to a small alphabet, with a limited string length?
    I doubt you'll get a unique function that will work on any length of string (lets say a DNA fragment).

    One simple way would be to strcat() a copy of one string to itself, then use strstr() on the other.

    Say
    Code:
    strcpy(temp,"cdeab");
    strcat(temp,"cdeab");
    // find cdeabcdeab
    if ( strstr( temp, "abcde" ) ) {
      // match
    }
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Registered User
    Join Date
    Sep 2001
    Posts
    4,912
    Hmm..sorting wont work because it will give same result for string like "abad" & "aabd"
    Unless I'm misunderstanding rotations, can't you derive one of those strings by rotations on the other?

  13. #13
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Quote Originally Posted by jack_carver View Post
    the number of rotations do not matter ...

    it should return "equal" for any two strings in that set
    IMHO what you're trying to do is break up the string into two (un)equal parts and switch those around.
    Now you want to test if this flipped-around string is the same as the original one. Isn't that correct?

  14. #14
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by jack_carver View Post
    If two strings are gives say A,B i want to know if B can be obtained just by pure rotation for charecters in A. I know alternate ways to do this such as brute force or using factoradics ... etc
    But some one told me that hashing is one the ways to do this efficienlty
    That's correct, it will help the process to be efficient. However what the hash does is tell you if of the two strings one definitely cannot be rotated to turn one into the other. It does not tell you if one can be rotated to be the other.
    In other words, you use it as a fast rejection method, and in the rare cases where a hash match is found, you do the same hard work you were going to without hashing.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  15. #15
    Registered User
    Join Date
    Jan 2009
    Posts
    71
    @ iMalc
    However what the hash does is tell you if of the two strings one definitely cannot be rotated to turn one into the other. It does not tell you if one can be rotated to be the other.
    How can i make such a function ?

    Thank u

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. char Handling, probably typical newbie stuff
    By Neolyth in forum C Programming
    Replies: 16
    Last Post: 06-21-2009, 04:05 AM
  2. Replies: 4
    Last Post: 03-03-2006, 02:11 AM
  3. Classes inheretance problem...
    By NANO in forum C++ Programming
    Replies: 12
    Last Post: 12-09-2002, 03:23 PM
  4. creating class, and linking files
    By JCK in forum C++ Programming
    Replies: 12
    Last Post: 12-08-2002, 02:45 PM
  5. Warnings, warnings, warnings?
    By spentdome in forum C Programming
    Replies: 25
    Last Post: 05-27-2002, 06:49 PM