Thread: Incremental alphabet from charset

  1. #1
    Registered User bradszy's Avatar
    Join Date
    Jan 2008
    Posts
    114

    Incremental alphabet from charset

    I've been trying to do this for a while, with no luck.
    I've got a charset,
    Code:
    char charset[1024];
    That always contains 2 or more characters.
    I've been trying to take the chars from the charset and make words from them, incrementally to a certain length.
    It's hard to explain.
    I have an array, AKA
    Code:
    char charset[1024];
    Lets say it contains
    Code:
    abc
    and the max length is 2, ill get:
    Code:
    a
    b
    c
    aa
    ab
    ac
    ba
    bb
    bc
    ca
    cb
    cc
    ----------------------------------------
    I've had no luck.
    Could somebody shed some light on this?

    EDIT:
    By the way, i'm not asking for code, just a few tips.
    Last edited by bradszy; 06-28-2008 at 02:57 AM.
    OS: Windows XP Home Edition SP3, Windows 7 Ultimate Beta Build 7000
    LANGUAGES: C++, VB6
    SKILL: Novice/Intermediate

  2. #2
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Create a string of length 1. Store the first character. Repeat this 1024 times.

    Create a string of length 2. Store the first character. Treat this character as the "main" character for this loop. Then put another character into the string. Repeat the above 1024-1 (or better 1024+1-length). Each time you repeat you keep the main character of course the same.
    Create a string of length 2. Store the second character. Etc etc.

    Create a string of length of 3. Store the first character. Treat this character as the "main" character for this loop. Then put another character into the string. Treat it as second "main". Then put another character into the string. Repeat the above 1024-2 (or better 1024+1-length). Each time you repeat you keep the main characters of course the same.
    Create a string of length 3. Store the second character. Etc etc.

    Do this until max length and you will have every possible sequence of characters.

    You will need about O(n!) time for this.

    Now this is kind of obvious. I suppose you have the idea but don't use the appropriate code? So you want to know how the above can be written in C/C++?

  3. #3
    Registered User bradszy's Avatar
    Join Date
    Jan 2008
    Posts
    114
    I suppose you have the idea but don't use the appropriate code? So you want to know how the above can be written in C/C++?
    Yeah, but I definitely do not want a full source.
    Just an idea of how to start it.
    Last edited by bradszy; 06-28-2008 at 03:26 AM.
    OS: Windows XP Home Edition SP3, Windows 7 Ultimate Beta Build 7000
    LANGUAGES: C++, VB6
    SKILL: Novice/Intermediate

  4. #4
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Well, not much time to think about it, a lot of exams await me . I m sure someone here already knows such an algorithm. It is a common think after all.
    I ll get to this tomorrow maybe

  5. #5
    Registered User bradszy's Avatar
    Join Date
    Jan 2008
    Posts
    114
    Thanks mate!
    Good luck with your exams.
    OS: Windows XP Home Edition SP3, Windows 7 Ultimate Beta Build 7000
    LANGUAGES: C++, VB6
    SKILL: Novice/Intermediate

  6. #6
    Chinese pâté foxman's Avatar
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404
    I don't see how to implement C_ntua solution without having to write code specific to each string length. That's it, you would have to write some code for generating all the string of length 1, then write some other code for generating all the string of length 2, etc, until you are bored. That makes me think there should be a better way/algorithm for doing this -- but maybe there's a clever implementation of C_ntua I just don't see.

    I see quite different ways to resolve your problem "elegantly", but most have poor spatial complexity. Let's say N is the maximum length of your generated strings and M the number of characters in your "charset". In the example you gave, N would be 2 and M would be 3. Well, the first way I tough off was about building a tree of depth N, where every node has M childs, the root node being kind of special. So, staying with your example, we would build a tree that would looks like this (sorry for the bad looking ASCII-art

    Code:
               root
           /    |    \
          /     |     \
         /      |      \
       a        b        c
     / | \    / | \    / | \
    a  b  c  a  b  c  a  b  c
    Once built, to generate every "permutations", you would basically only had to do a breadth-first search, and on every node, print the path from the root to the current node. The problem is, even if it's not really complex to implement if you know trees, it's still a bit "overkill" and will take a good time to write, plus the (really) bad complexity. This just can't be a great solution.

    So, I came up with another idea. How about using numbers (with a logical representation in base M), where each digit would be associated with a character from your set. So, to generates every possible combination, you would only have to increment your number, then "split" his digits and print the associated character. Staying with our example:

    Code:
    Str  Number  Number
         base 3  base 10
    a    0       0
    b    1       1
    c    2       2
    aa   00      0
    ab   01      1
    ac   02      2
    ba   10      3
    bb   11      4
    bc   12      5
    ca   20      6
    cb   21      7
    cc   22      8
    In our example, a is associated with the digit 0, b is associated with the digit 1 and c is associated with the digit 2. So, from a number in base M, you can "derive" a string.

    But there's a little problem. How do we know if 0 should yield to the string "a" or "aa" or "aaaaaa". Well, we can't, but this isn't a big problem anyway, so let's not care.

    Knowing that if you want to generate every possible strings of length I with M characters, you will get M^I different strings, you can easily calculate where your "number" has to stop for every length of string. This solution has a really good spatial complexity, is fairly easy to implement and will work whatever the length of the string you want to generate. Just be careful about the possible overflow of the integer you chose to hold the "number"; e.g, if you chose an unsigned int for holding this number, you need that UINT_MAX > M ^ N.

    Anyway, I tested it, and it worked, like I tough. I'm giving the solution (everyone will hate me on this board ), but --

    -- well, in fact, no, I'll not be giving the solution. Except if you ask.

    Also, if you can think of another solution, try it, there's numerous of them.
    I hate real numbers.

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    This is how you do it:

    Start off with the empty string.
    Repeat until you reach the maximum length desired:
    For ALL strings produced so far:
    Produce more strings with the new char in position 0 to strlen(x)
    End repeat


    So initially you have ""
    The new char is 'a'. We put this in the only position that exists in the empty string, and produce "a".
    Now we have two strings: "" and "a".

    We now repeat with 'b'
    Inserting 'b' into "" is trivial and gives us "b"
    Then we insert 'b' into all possible placs in "a". This gives us "ab" and "ba"
    We now have 5 strings: "", "a", "b", "ab", "ba"

    We now repeat with 'c'
    Inserting 'c' into "" is again trivial and gives us "c"
    Then we insert 'c' into all possible placs in "a". This gives us "ac" and "ca"
    Then we insert 'c' into all possible placs in "b". This gives us "bc" and "cb"
    Then we insert 'c' into all possible placs in "ab". This gives us "abc", "acb", and "cab"
    Then we insert 'c' into all possible placs in "ba". This gives us "bac", "bca", and "cba"
    We now have 15 strings:
    "", "a", "b", "ab", "ba", "ac", "ca", "bc", "cb", "abc", "acb", "cab", "bac", "bca", "cba".

    You could now proceed to insert 'd' into every string produced so far, in every possible place.
    You can sort the results however you wish. The above is simply the order they were created.
    Last edited by iMalc; 06-28-2008 at 03:38 PM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  8. #8
    Chinese pâté foxman's Avatar
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404
    Here's another way to do it:

    Start with an empty string or with every different strings of length 1. Push them in a queue.
    While the queue isn't empty
    • Pop the next value of the queue. Do whatever you want with the generated string (print/store).
    • Push in the queue the string you just popped, concatenated with every characters, if the original string is shorter than the max length.


    Example. You start with a queue with the string "a", "b", "c".
    You pop the string "a". Let's say you print it. Than you push "aa", "ab", "ac"
    You pop the string "b". You print it. Than you push "ba", "bb", "bc"
    You pop the string "c". You print it. Than you push "ca", "cb", "cc"
    You pop the string "aa". You print it. Since "aa".length == 2 and you don't want to look for string longer than 2 characters, you don't push any new strings.
    Etc...
    I hate real numbers.

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Oh I missed that you wanted to allow duplicate letters. In that case it's trivial; foxman has the answer.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User bradszy's Avatar
    Join Date
    Jan 2008
    Posts
    114
    Thanks for your help everyone!
    I got it made, I used for loops, indexes, and 3 counter variables.
    OS: Windows XP Home Edition SP3, Windows 7 Ultimate Beta Build 7000
    LANGUAGES: C++, VB6
    SKILL: Novice/Intermediate

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. How to determine a DSL in an Alphabet using C codes
    By sonamjha in forum C Programming
    Replies: 4
    Last Post: 05-19-2008, 05:34 AM
  2. shorter way to represent alphabet
    By cdkiller in forum C++ Programming
    Replies: 5
    Last Post: 09-30-2006, 12:38 PM
  3. [help]count alphabet
    By PUI in forum C Programming
    Replies: 12
    Last Post: 09-25-2006, 07:17 AM
  4. incremental save
    By paludez in forum C++ Programming
    Replies: 4
    Last Post: 05-25-2005, 09:47 PM
  5. i,j, n,m, x,y,z ... We dont know the alphabet
    By lightatdawn in forum A Brief History of Cprogramming.com
    Replies: 11
    Last Post: 11-20-2002, 05:46 AM