# Thread: repeating strings

1. ## repeating strings

If I were given a long text file made entirely of random characters, how would I find the portions of it that repeat themselves?

I've spent about two weeks looking through this forum, googling, and trying to find a solution myself, but I still haven't gotten anywhere.

2. How would you do it by hand?

3. I'd loop through it, searching for every possibility, but is there a simpler way?

4. How do you define "portions that repeats itself"?

Suppose you had "hfgdkgF". Does g repeat itself?

Or do you mean "abcdeabcdeabcde" where the file consists of "abcde" repeated?

5. The latter.

6. Does the repeated pattern start from the beginning of the file, and does the file end with the full sequence? That is, are strings like "qwertyabcdeabcde" and "abcdeabcdeabcd" excluded?

If so, one approach relies on the fact that the length of the file must be evenly divisible by the length of the repeating sequence. So you only need to check candidate sequences whose length divides the file length.

"abacaabacaabacaabaca"
1) length (20) divisible by 2: check "ab"
2) length divisible by 4: check "abac"
3) length divisible by 5: check "abaca" - OK
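The divisor check above can be sketched in Python roughly like this. The function name `period_by_divisors` is hypothetical, the whole file is assumed to be already read into one string, and unlike the walkthrough it also tries a block of length 1:

```python
def period_by_divisors(s):
    """Return the shortest block that, repeated, spells out all of s.

    Only block lengths that divide len(s) evenly are tried.
    Returns s itself when no shorter block works.
    """
    n = len(s)
    for k in range(1, n // 2 + 1):
        if n % k != 0:
            continue  # a block of this length cannot tile the file exactly
        block = s[:k]
        if block * (n // k) == s:
            return block
    return s
```

For example, `period_by_divisors("abacaabacaabacaabaca")` returns `"abaca"`, while `"abcdeabcdeabcd"` (which does not end on a full block) comes back unchanged.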

Or, you might find the second occurrence of the first character. Then check whether the substring from the first character up to (but not including) that second occurrence is the repeating sequence. If not, find the third occurrence and try again, and so on.

"abacaabacaabacaabaca"
1) 2nd occurrence is char #3, check "ab"
2) 3rd occurrence is char #5, check "abac"
3) 4th occurrence is char #6, check "abaca" - HEUREKA!
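A possible Python sketch of the first-character approach (the name `period_by_first_char` is hypothetical). Each later occurrence of the first character is tried as a cut point; as an assumption on top of the description, the divisibility test from the previous method is reused as a cheap filter before comparing the whole string:

```python
def period_by_first_char(s):
    """Shortest repeating block, trying cut points at each later
    occurrence of the first character. Returns s if none works."""
    if not s:
        return s
    first = s[0]
    pos = s.find(first, 1)  # 0-based index of the 2nd occurrence
    while pos != -1:
        block = s[:pos]  # from the first character up to, not including, pos
        # The block must tile the whole string exactly.
        if len(s) % pos == 0 and block * (len(s) // pos) == s:
            return block
        pos = s.find(first, pos + 1)  # move on to the next occurrence
    return s
```

On `"abacaabacaabacaabaca"` it tries cuts at indexes 2, 4 and 5 (chars #3, #5 and #6 in the 1-based numbering above) and returns `"abaca"`.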

7. > a long text file completely made of random characters

Like in "sjhgkdapgbjsomgvjtgdjthcgvjdrshhdapgskd",
where "gvj" and "dapg" each appear twice.

8. So "gv", "vj", "dap", "apg", "da", "ap", and "pg" all repeat as well, right? Do you have to find all of those? What about single-letter strings that repeat: is there a minimum length of two, three, or something else? Is there a maximum?

9. A minimum of 2 characters, appearing at least twice. I know how to find single characters that repeat.

10. > I know how to find single characters that repeat

Then maybe you should check the character that follows the first and second occurrences, and decide whether that two-character sequence repeats?
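That "check the next character" idea can be pushed further into a seed-and-extend search. Below is a minimal Python sketch (the name `find_repeats` and its `min_len` parameter are hypothetical): it groups positions by character, then for each pair of positions holding the same character keeps comparing the following characters until they differ. To avoid also reporting every shorter piece ("gv", "vj", "da", ...), it only reports matches that cannot be extended further to the left or right; every substring of length 2 or more inside a reported string also repeats.

```python
from collections import defaultdict


def find_repeats(text, min_len=2):
    """Return {substring: sorted start positions} for maximal repeated
    substrings of at least min_len characters (overlaps allowed)."""
    positions = defaultdict(list)  # character -> every index where it occurs
    for idx, ch in enumerate(text):
        positions[ch].append(idx)

    found = defaultdict(set)  # repeated substring -> start positions
    for idx_list in positions.values():
        for a in range(len(idx_list)):
            for b in range(a + 1, len(idx_list)):
                i, j = idx_list[a], idx_list[b]  # i < j, same character
                # Skip pairs that extend to the left: the longer match is
                # reported from the pair starting one character earlier.
                if i > 0 and text[i - 1] == text[j - 1]:
                    continue
                # Extend to the right while both copies still agree.
                k = 0
                while j + k < len(text) and text[i + k] == text[j + k]:
                    k += 1
                if k >= min_len:
                    found[text[i:i + k]].update((i, j))
    return {s: sorted(starts) for s, starts in found.items()}
```

On the example string from post 7 this reports `"dapg"` and `"gvj"`, but also `"jt"` and `"kd"`, which each occur twice as well. Comparing every pair of equal characters is quadratic in the number of occurrences per character, so for a very large file a suffix array or suffix tree is the usual faster tool.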
