Thread: repeating strings

  1. #1
    Registered User
    Join Date
    Dec 2005
    Location
    Canada
    Posts
    267

    Unhappy repeating strings

    if i was given a long text file completely made of random characters, how would i find portions of it that repeats itself?

    Ive spent about 2 weeks looking through this forum, googling and trying to find a solution myself but still haven't gotten anywhere

    OS: Windows 7, XUbuntu 11.10, Arch Linux
    IDE: CodeBlocks
    Compiler: GCC

  2. #2
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    How would you do it by hand?

  3. #3
    Registered User
    Join Date
    Dec 2005
    Location
    Canada
    Posts
    267
    id loop through it searching for every possibility but is there a simpler way?

    OS: Windows 7, XUbuntu 11.10, Arch Linux
    IDE: CodeBlocks
    Compiler: GCC

  4. #4
    The larch
    Join Date
    May 2006
    Posts
    3,573
    How do you define "portions that repeats itself"?

    Suppose you had "hfgdkgF". Does g repeat itself?

    Or do you mean "abcdeabcdeabcde" where the file consists of "abcde" repeated?

  5. #5
    Registered User
    Join Date
    Dec 2005
    Location
    Canada
    Posts
    267
    the latter

    OS: Windows 7, XUbuntu 11.10, Arch Linux
    IDE: CodeBlocks
    Compiler: GCC

  6. #6
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Does the repeated pattern start from the beginning of the file, and does the file end with the full sequence? That is, are strings like "qwertyabcdeabcde" and "abcdeabcdeabcd" excluded?

    If so, one approach would rely on the fact that the length of the repeating sequence must be evenly divisible by the length of the file. So you can check only those sequences that satisfy that condition.

    "abacaabacaabacaabaca"
    1) length (20) divisible by 2: check "ab"
    2) length divisible by 4: check "abac"
    3) length divisible by 5: check "abaca" - OK

    Or, you might find the second occurrence of the first character. Then you can check if the string from first character up to the second occurrence of the first character is the repeating sequence. If not, find the third occurrence and repeat. etc.

    "abacaabacaabacaabaca"
    1) 2-nd occurrence is char #3, check "ab"
    2) 3-rd occurrence is char #5, check "abac"
    3) 4-th occurrence is char #6, check "abaca" - HEUREKA!
    Last edited by anon; 11-28-2006 at 06:22 PM.

  7. #7
    Registered User
    Join Date
    Dec 2005
    Location
    Canada
    Posts
    267
    >>a long text file completely made of random characters

    like in "sjhgkdapgbjsomgvjtgdjthcgvjdrshhdapgskd"
    where "gvj" and "dapg" repeats twice

    OS: Windows 7, XUbuntu 11.10, Arch Linux
    IDE: CodeBlocks
    Compiler: GCC

  8. #8
    Registered User
    Join Date
    Jan 2005
    Posts
    7,366
    So "gv", "vj", "dap", "apg", "da", "ap", "pg" also all repeat as well, right? Do you have to find all those? What about the single letter strings that repeat, is there a minimum string size of two, three or something else? Is there a maximum?

  9. #9
    Registered User
    Join Date
    Dec 2005
    Location
    Canada
    Posts
    267
    minimum of 2 characters appearing at least twice. I know how to find single characters that repeat

    OS: Windows 7, XUbuntu 11.10, Arch Linux
    IDE: CodeBlocks
    Compiler: GCC

  10. #10
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    I know how to find single characters that repeat
    Then maybe you should check the next character in the first and second occurence and decide if this is the sequence of two characters that repeates?
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Strings Program
    By limergal in forum C++ Programming
    Replies: 4
    Last Post: 12-02-2006, 03:24 PM
  2. Programming using strings
    By jlu0418 in forum C++ Programming
    Replies: 5
    Last Post: 11-26-2006, 08:07 PM
  3. Problems with strings as key in STL maps
    By all_names_taken in forum C++ Programming
    Replies: 3
    Last Post: 01-17-2006, 11:34 AM
  4. Reading strings input by the user...
    By Cmuppet in forum C Programming
    Replies: 13
    Last Post: 07-21-2004, 06:37 AM
  5. menus and strings
    By garycastillo in forum C Programming
    Replies: 3
    Last Post: 04-29-2002, 11:23 AM