Hello,

I'm trying to rewrite an old (circa 1994) function called reg_assoc in a program. Part of the rewrite is switching the function from the old regexp engine to the external PCRE library, which I have already done successfully. However, I believe the function would benefit from a complete rewrite, and this is where I need some assistance.

The program is a VM. In the interpreted language (LPC) the function takes a string (the subject), an array of strings (the patterns), an array of mixed values (the tokens), and an optional mixed value (the no-match token). It works by matching each pattern against the subject and splitting ("exploding") the subject string at every match; each matched piece is paired with the token of the pattern that matched it.

Function application (in LPC): Reg assoc - Desolation
C source Code (last function): http://svn2.assembla.com/svn/aml/tru...s2.018/array.c

Trivial working example of the function
The string "abcdefgh" is matched against the regular expressions ({ "bc", "e", "gh" }). The token array is ({ "bc-match", "e-match", "gh-match" }). The fourth, optional argument is the string "no-match". The function returns an array with 2 sub-arrays: the first holds all the exploded string pieces, the second holds the corresponding tokens, e.g. arr1[n] has the token arr2[n].

Code:
reg_assoc("abcdefgh", ({ "bc", "e", "gh" }), ({ "bc-match", "e-match", "gh-match"}), "no-match");
             ^                  ^                                 ^                       ^
          subject             regexps                          tokens                nomatch token
Steps (with LPC datatypes)
  1. Match "abcdefgh" against "bc":
    Code:
    ({ ({ "a", "bc", "defgh"}), ({ "no-match", "bc-match", "no-match"}) })
  2. Match "e" against every non-matching piece from the previous step:
    Code:
     ({ ({ "a", "bc", "d", "e", "fgh"}), ({ "no-match", "bc-match", "no-match", "e-match", "no-match"}) })
  3. Match "gh" against every non-matching piece from the previous step:
    Code:
     ({ ({ "a", "bc", "d", "e", "f", "gh" }), ({ "no-match", "bc-match", "no-match", "e-match", "no-match", "gh-match"}) })


Once every pattern has been applied to the subject, the function returns.
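
To make the expected behaviour concrete, here is a rough Python sketch of the steps above (not the actual C code). Python's re module stands in for PCRE, and the (text, token, matched) triples are just one possible way to hold the intermediate data. Dropping empty pieces and skipping empty matches are my assumptions, based on the example rather than on the original source.

```python
import re


def reg_assoc(subject, patterns, tokens, no_match=None):
    """Sketch of the reg_assoc behaviour described in the steps above."""
    # Intermediate state: a flat list of (text, token, matched) triples.
    # The 'matched' flag marks pieces that an earlier pattern already claimed,
    # so later patterns only look at the remaining non-matches.
    segments = [(subject, no_match, False)]

    for pattern, token in zip(patterns, tokens):
        regex = re.compile(pattern)
        next_segments = []
        for text, tok, matched in segments:
            if matched:
                # Already claimed by an earlier pattern: keep untouched.
                next_segments.append((text, tok, True))
                continue
            pos = 0
            for m in regex.finditer(text):
                if m.start() == m.end():
                    continue  # assumption: ignore empty matches
                if m.start() > pos:
                    # Unmatched text before this match.
                    next_segments.append((text[pos:m.start()], no_match, False))
                next_segments.append((m.group(0), token, True))
                pos = m.end()
            if pos < len(text):
                # Unmatched tail (or the whole piece if nothing matched).
                next_segments.append((text[pos:], no_match, False))
        segments = next_segments

    pieces = [text for text, _, _ in segments]
    piece_tokens = [tok for _, tok, _ in segments]
    return [pieces, piece_tokens]


result = reg_assoc(
    "abcdefgh",
    ["bc", "e", "gh"],
    ["bc-match", "e-match", "gh-match"],
    "no-match",
)
print(result)
```

With this layout each pattern is a single pass over a flat list, so no recursion is needed; whether that maps well onto the C arrays in the VM is exactly what I'm unsure about.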

So I wonder: how would you solve it? It feels like it could be done with recursion in some way. My biggest problem, though, is finding a good way to store all the intermediate data, and working out which overall structure would suit the function best. Any help would be appreciated.