Thread: filename pattern matching

  1. #1
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057

    Lightbulb filename pattern matching

    Here's a function that matches a filter (like *.c) to a filename (like hippo.c).

    Code:
    int match(char *f, char *s) {
        char *t;
    
        while(1) {
            if(*f == '?' || *s == '?') ;
            else if(*f == '*') {
                t = s;
                while(1) {
                    if(match(f+1, t)) return 1;
                    if(!*t++) break;
                }
    
                return 0;
            }
            else if(*s == '*') {
                t = f;
                while(1) {
                    if(match(s+1, t)) return 1;
                    if(!*t++) break;
                }
    
                return 0;
            }
            else if(*f != *s) return 0;
    
            if(!*s || !*f) break;
            s++, f++;
        }
    
        return *f || *s ? 0 : 1;
    }
    It returns 1 if there's a match, 0 otherwise.

    There's nothing wrong with it (as far as I can see), I was just wondering how it could be improved. It's a little too slow for me.

    Before anyone points it out, the * matching could be removed from one of the arguments.
    Last edited by dwks; 07-12-2005 at 04:47 PM.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  2. #2
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Here's a quick lazy way:
    Code:
    int match( char *f, char *s )
    {
        char *fend, *send;
        fend = f + strlen( f ) - 1;
        send = s + strlen( s ) - 1;
    
        while( *fend == *send && fend > f && send > s )
        {
            fend--;
            send--;
        }
    
        if( *fend == '*' )
            return 1;
        return 0;
    }
    Something like that? Or do you want stuff like this to work too?
    Code:
    int restult = match ( "hip*.c", "hippo.c" ); /* returns 1 */

    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    I want everything to work - like hi??o.exe matches hippo.*
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  4. #4
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    So what are all of your matching rules? You'll need to actually list what you consider a match first. Define the rules for matching and wild cards.


    Quzah.
    Hope is the first step on the road to disappointment.

  5. #5
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    All DOS pattern matching . . . Like ? matches any one character, and * matches zero or more characters.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  6. #6
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Code:
    #include <windows.h>
    #include <shlwapi.h>
    
    int match(char *f, char *s)
    {
        return PathMatchSpec(f, s);
    }

  7. #7
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    That's sort of what I'm trying to duplicate . . . but I don't have <windows.h>.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  8. #8
    Registered User major_small's Avatar
    Join Date
    May 2003
    Posts
    2,787
    you may want to check the boost libraries... I think they have libraries for what's called regular expressions (or REGEX for short)

    note: their site seems to be not working ATM...

    edit: it works, but it's slow as anything... (>2 minute load time)
    Join is in our Unofficial Cprog IRC channel
    Server: irc.phoenixradio.org
    Channel: #Tech


    Team Cprog Folding@Home: Team #43476
    Download it Here
    Detailed Stats Here
    More Detailed Stats
    52 Members so far, are YOU a member?
    Current team score: 1223226 (ranked 374 of 45152)

    The CBoard team is doing better than 99.16% of the other teams
    Top 5 Members: Xterria(518175), pianorain(118517), Bennet(64957), JaWiB(55610), alphaoide(44374)

    Last Updated on: Wed, 30 Aug, 2006 @ 2:30 PM EDT

  9. #9
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Aren't the Boost libraries for C++? At any rate, yours fails a few tests it should pass. At least from what I can gather by your rules. Maybe not? Any way, mine's a bit different. Perhaps not better. Either way, they don't always give the same results, so one of us is wrong. :P
    Code:
    #include<stdio.h>
    #include<string.h>
    
    int match(char *f, char *s) {
        char *t;
    
        while(1) {
            if(*f == '?' || *s == '?') ;
            else if(*f == '*') {
                t = s;
                while(1) {
                    if(match(f+1, t)) return 1;
                    if(!*t++) break;
                }
    
                return 0;
            }
            else if(*s == '*') {
                t = f;
                while(1) {
                    if(match(s+1, t)) return 1;
                    if(!*t++) break;
                }
    
                return 0;
            }
            else if(*f != *s) return 0;
    
            if(!*s || !*f) break;
            s++, f++;
        }
    
        return *f || *s ? 0 : 1;
    }
    
    int qmatch( char *f, char *s )
    {
        while( *f && *s )
        {
            if( *f == '*' )
            {
                while( *f && (*f == '*' || *f == '?') ) f++;
                while( *s && *s != *f )*s++;
            }
            if( !*s || !*f ) break;
            if( *s == '*' )
            {
                while( *s && (*s == '*' || *s == '?') ) s++;
                while( *f && *f != *s )*f++;
            }
            if( !*s || !*f ) break;
            if( *f == *s || *f == '?' || *s == '?' )
            {
                f++;
                s++;
            }
            else
                return 0;
        }
        return (*f || *s) || (*f==*s) ? 1 : 0;
    
    }
    
    int main( void )
    {
        char *s[] =
        {
            "foo.c",
            "?oo.c",
            "*.c",
            "f?*",
            "?*?",
            "*",
            "?.*",
            "foo.*"
        };
    
        char *f[] =
        {
            "foo.c",
            "?oo.c",
            "*.c",
            "?*.c",
            "*?.?",
            "foo.?",
            "foo.*",
            "f?o?c",
            "f*o*"
            "*",
            "?????"
        };
    
        int x,y;
    
        for( y = 0; y < sizeof f / sizeof f[0]; y++ )
        {
            for( x = 0; x < sizeof s / sizeof s[0]; x++ )
            {
                printf("%8s vs %8s = %d, %d\n", f[ y ], s[ x ],
                    match( f[ y ], s[ x ] ),
                    qmatch( f[ y ], s[ x ] )
                );
            }
    
        }
    
        return 0;
    }
    /*
       foo.c vs    foo.c = 1, 1
       foo.c vs    ?oo.c = 1, 1
       foo.c vs      *.c = 1, 1
       foo.c vs      f?* = 1, 1
       foo.c vs      ?*? = 1, 1
       foo.c vs        * = 1, 1
       foo.c vs      ?.* = 0, 0
       foo.c vs    foo.* = 1, 1
       ?oo.c vs    foo.c = 1, 1
       ?oo.c vs    ?oo.c = 1, 1
       ?oo.c vs      *.c = 0, 1
       ?oo.c vs      f?* = 1, 1
       ?oo.c vs      ?*? = 1, 1
       ?oo.c vs        * = 0, 1
       ?oo.c vs      ?.* = 0, 0
       ?oo.c vs    foo.* = 1, 1
         *.c vs    foo.c = 1, 1
         *.c vs    ?oo.c = 0, 1
         *.c vs      *.c = 1, 1
         *.c vs      f?* = 1, 1
         *.c vs      ?*? = 1, 1
         *.c vs        * = 1, 1
         *.c vs      ?.* = 1, 1
         *.c vs    foo.* = 1, 1
        ?*.c vs    foo.c = 1, 1
        ?*.c vs    ?oo.c = 1, 1
        ?*.c vs      *.c = 1, 1
        ?*.c vs      f?* = 1, 1
        ?*.c vs      ?*? = 1, 1
        ?*.c vs        * = 0, 1
        ?*.c vs      ?.* = 1, 1
        ?*.c vs    foo.* = 1, 1
        *?.? vs    foo.c = 1, 1
        *?.? vs    ?oo.c = 0, 1
        *?.? vs      *.c = 1, 1
        *?.? vs      f?* = 1, 1
        *?.? vs      ?*? = 0, 1
        *?.? vs        * = 0, 1
        *?.? vs      ?.* = 1, 1
        *?.? vs    foo.* = 1, 1
       foo.? vs    foo.c = 1, 1
       foo.? vs    ?oo.c = 1, 1
       foo.? vs      *.c = 1, 1
       foo.? vs      f?* = 1, 1
       foo.? vs      ?*? = 1, 1
       foo.? vs        * = 1, 1
       foo.? vs      ?.* = 0, 0
       foo.? vs    foo.* = 1, 1
       foo.* vs    foo.c = 1, 1
       foo.* vs    ?oo.c = 1, 1
       foo.* vs      *.c = 1, 1
       foo.* vs      f?* = 1, 1
       foo.* vs      ?*? = 1, 1
       foo.* vs        * = 1, 1
       foo.* vs      ?.* = 0, 0
       foo.* vs    foo.* = 1, 1
       f?o?c vs    foo.c = 1, 1
       f?o?c vs    ?oo.c = 1, 1
       f?o?c vs      *.c = 1, 1
       f?o?c vs      f?* = 1, 1
       f?o?c vs      ?*? = 0, 1
       f?o?c vs        * = 1, 1
       f?o?c vs      ?.* = 1, 1
       f?o?c vs    foo.* = 1, 1
       f*o** vs    foo.c = 1, 1
       f*o** vs    ?oo.c = 1, 1
       f*o** vs      *.c = 1, 1
       f*o** vs      f?* = 1, 1
       f*o** vs      ?*? = 1, 1
       f*o** vs        * = 1, 1
       f*o** vs      ?.* = 1, 1
       f*o** vs    foo.* = 1, 1
       ????? vs    foo.c = 1, 1
       ????? vs    ?oo.c = 1, 1
       ????? vs      *.c = 0, 1
       ????? vs      f?* = 0, 1
       ????? vs      ?*? = 0, 1
       ????? vs        * = 0, 1
       ????? vs      ?.* = 0, 1
       ????? vs    foo.* = 1, 1
    */
    Mine was an attempt to really cover all the cases I could think up for valid matching. I don't know if they fit all of your rules, but that's my take on it.

    Quzah.
    Hope is the first step on the road to disappointment.

  10. #10
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    Let's stop this madness; only one string should be parsed for wildcards; the filename should not.

  11. #11
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    That wasn't a requirement given. Thus, I matched his code, except mine works. As I recall, you weren't the one setting the rules for this little event.


    Quzah.
    Hope is the first step on the road to disappointment.

  12. #12
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Quzah, you're right: mine was wrong. The if(*s=='?' || *f=='?') was in the wrong place.

    Code:
    int match(char *f, char *s) {
        char *t;
    
        while(1) {
            if(*f == '*') {
                t = s;
                while(1) {
                    if(match(f+1, t)) return 1;
                    if(!*t++) break;
                }
    
                return 0;
            }
            else if(*s == '*') {
                t = f;
                while(1) {
                    if(match(s+1, t)) return 1;
                    if(!*t++) break;
                }
    
                return 0;
            }
            else if(*f == '?' || *s == '?') ;
            else if(*f != *s) return 0;
    
            if(!*s || !*f) break;
            s++, f++;
        }
    
        return *f || *s ? 0 : 1;
    }
    If there was a ?, it ignored the other character - even if it was a *. This should work - run it through your test again.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 5
    Last Post: 06-01-2009, 07:54 PM
  2. Pattern matching
    By GSLR in forum A Brief History of Cprogramming.com
    Replies: 1
    Last Post: 11-06-2003, 07:17 AM
  3. Going out of scope
    By nickname_changed in forum C++ Programming
    Replies: 9
    Last Post: 10-12-2003, 06:27 PM
  4. pattern matching
    By ahahplz in forum C Programming
    Replies: 5
    Last Post: 02-07-2003, 07:15 PM
  5. simulate Grep command in Unix using C
    By laxmi in forum C Programming
    Replies: 6
    Last Post: 05-10-2002, 04:10 PM