Thread: Split string with regular expression

  1. #16
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    A perfectly valid assumption for a function named StringIsUpper
    You are wrong. The function *actually* checks to see if the string is easily incorporated into a rap and emails Matthew Sobol's daemon. (along with other background tasks...) :P
    Fact - Beethoven wrote his first symphony in C

  2. #17
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    Spent some time reworking the code based on some of the suggestions.

    I decided to just split on spaces and check each token to see whether it matches an all-caps word followed by a colon.

    Seems like this works exactly how I need it.

    Code:
    #include <regex.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    enum {
        FALSE,
        TRUE
    };
    
    typedef struct {
        char* name;
        char words[500];
    } Person;
    
    int main(void)
    {
        int start = TRUE;
        int count = 0; /* index of the current speaker, not the total */
    
        const char* delim = " ";
        /* a speaker token is all caps followed by a colon, e.g. "BOB:" */
        const char* pattern = "^[A-Z]+:$";
        char* token;
    
        char words[100][500] = { { 0 } }; /* zeroed so strcat starts from "" */
        char* speakers[100];
        regex_t reg;
    
        char text[1000] = "BOB: Hello there, my friend. ROBERT: How are you, friend? Have you heard about Justin's new game? SPEAKER: I don't know what you're talking about, homie. BOB: Yes, you know what I mean. BOB: I also know what I'm talking about.";
    
        /* compile once, outside the loop; regcomp does not set errno */
        if (regcomp(&reg, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
            fprintf(stderr, "regcomp failed\n");
            exit(EXIT_FAILURE);
        }
    
        for (token = strtok(text, delim); token != NULL; token = strtok(NULL, delim)) {
            if (regexec(&reg, token, 0, NULL, 0) == 0) {
                if (start == FALSE && count == 99)
                    break; /* speakers[] is full */
    
                token[strlen(token) - 1] = '\0'; /* remove colon */
    
                if (start == TRUE)
                    start = FALSE;
                else
                    count++;
                speakers[count] = token;
            } else if (start == FALSE
                && strlen(words[count]) + strlen(token) + 2 <= sizeof words[count]) {
                /* skips text before the first speaker and words that won't fit */
                strcat(words[count], token);
                strcat(words[count], " ");
            }
        }
        regfree(&reg);
    
        /* count is the last index used, so the total is count + 1 */
        int total = (start == TRUE) ? 0 : count + 1;
        Person people[100];
    
        for (int i = 0; i < total; i++) {
            people[i].name = speakers[i];
            strcpy(people[i].words, words[i]);
        }
    
        for (int i = 0; i < total; i++) {
            printf("{%s | %s}\n", people[i].name, people[i].words);
        }
    
        return 0;
    }

  3. #18
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Could be easier if you pass ':' and the sentence punctuation ('.!?') as delimiters to strtok. No need for regex.
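
    A minimal sketch of that suggestion, assuming speaker names are all-uppercase words and the text has no colons other than the ones after speaker names. The `Line` struct and the helpers `skip_spaces`, `is_upper_word` and `split_speakers` are made-up names for illustration. Because strtok discards delimiters, the sentence punctuation is lost, and a one-word uppercase sentence such as "OK." would be mistaken for a speaker.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one speaker name and one sentence fragment. */
typedef struct {
    const char* speaker;
    const char* sentence;
} Line;

/* Return a pointer past any leading spaces. */
char* skip_spaces(char* s)
{
    while (*s == ' ')
        s++;
    return s;
}

/* True if s is non-empty and every character is an uppercase letter. */
int is_upper_word(const char* s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isupper((unsigned char)*s))
            return 0;
    return 1;
}

/*
 * Split text in place on ":.!?" and pair each sentence with the most
 * recent all-uppercase token. Returns the number of lines stored (<= max).
 * The pointers in out[] refer into text, which strtok modifies.
 */
size_t split_speakers(char* text, Line* out, size_t max)
{
    assert(text != NULL && out != NULL);
    const char* current = NULL;
    size_t n = 0;

    for (char* tok = strtok(text, ":.!?"); tok != NULL; tok = strtok(NULL, ":.!?")) {
        tok = skip_spaces(tok);
        if (is_upper_word(tok)) {
            current = tok; /* a new speaker starts here */
        } else if (current != NULL && *tok != '\0' && n < max) {
            out[n].speaker = current;
            out[n].sentence = tok;
            n++;
        }
    }
    return n;
}
```

    Called on a writable copy of the sample text, this yields one entry per sentence rather than one per speaker turn, so consecutive sentences by the same speaker would still need to be joined.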

  4. #19
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    This is just an example, but the real text I'm using contains colons. Here is one such line:

    "BOB: I love the weather, but there is a problem: I hate going outdoors."

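    One way to cope with a line like that without regex is to treat a colon as a separator only when it directly follows a speaker name. The sketch below assumes a rule the thread does not confirm: a speaker tag is a run of uppercase letters that begins at the start of the text or right after a space and is immediately followed by ':'. The function name `find_speaker_tag` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the first speaker tag at or after offset `from` in text.
 * Assumed rule: uppercase letters at a word boundary, then ':'.
 * Returns a pointer to the first letter of the name, or NULL if there is
 * no further tag. On success *name_len is the name length without the colon.
 */
const char* find_speaker_tag(const char* text, size_t from, size_t* name_len)
{
    assert(text != NULL && name_len != NULL);
    size_t len = strlen(text);

    for (size_t i = from; i < len; i++) {
        /* a tag must begin at the start of the text or after a space */
        if (i > 0 && text[i - 1] != ' ')
            continue;

        size_t j = i;
        while (isupper((unsigned char)text[j]))
            j++;

        /* at least one uppercase letter, then the colon */
        if (j > i && text[j] == ':') {
            *name_len = j - i;
            return text + i;
        }
    }
    return NULL;
}
```

    With this rule the colon in "a problem:" is left alone because "problem" is lowercase. A sentence containing something like "one PROBLEM: ..." would still be misread, so the rule is only as good as the assumption behind it.
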
  5. #20
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Well, that's not necessarily a problem. What you haven't done is precisely define the input format. Regex is one way to do that, but of course any reasonable human-readable way to express the grammar could work.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  6. #21
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    Quote Originally Posted by laserlight View Post
    of course any reasonable human-readable way to express the grammar could work.
    Can you please elaborate a little on this?

  7. #22
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    You could use English, as long as you were sufficiently detailed.

    My point is that when you want to parse something, you need to be absolutely clear about its format/grammar. Unless you literally mean that you only have that specific string in mind, saying "I'm trying to split this string" is not good enough. People will make suggestions based on what they think are the salient points of the example, but they could be wrong. So, save them the effort. Don't wait until post #19 to say "This is just an example, but the real text I'm using contains colons." State the format from the outset, so people know that embedded colons are possible. Spaces between words? Multiple spaces that you want to retain? Whitespace other than spaces? Alphanumeric? How do we identify SPEAKER, SPEAKER2? Like, are they really SPEAKER followed by a colon, or by a number and then a colon? Or could it be BOB?
    Last edited by laserlight; 06-20-2019 at 10:23 AM.
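
    To illustrate what writing the format down can look like, here is a sketch in which the grammar sits in a comment and a POSIX regex enforces it. The grammar is an assumption for the sake of the example, not something the original poster confirmed, and `Turn` and `parse_dialogue` are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumed grammar:
 *   dialogue := turn { " " turn }
 *   turn     := NAME ":" speech
 *   NAME     := one or more of A-Z, at the start of the text or after a space
 *   speech   := everything up to the next " NAME:" or the end of the text
 */
typedef struct {
    const char* name;   size_t name_len;   /* not NUL-terminated */
    const char* speech; size_t speech_len; /* not NUL-terminated */
} Turn;

/* Returns the number of turns stored (<= max), or -1 if regcomp fails. */
int parse_dialogue(const char* text, Turn* out, int max)
{
    assert(text != NULL && out != NULL);
    regex_t re;
    regmatch_t m[3];
    const char* p = text;
    int n = 0;

    if (regcomp(&re, "(^| )([A-Z]+):", REG_EXTENDED) != 0)
        return -1;

    /* REG_NOTBOL stops '^' from matching when p is no longer the true start */
    while (n < max && regexec(&re, p, 3, m, p == text ? 0 : REG_NOTBOL) == 0) {
        const char* tag_start = p + m[0].rm_so;
        if (n > 0) /* the previous speech ends where this tag begins */
            out[n - 1].speech_len = (size_t)(tag_start - out[n - 1].speech);

        out[n].name = p + m[2].rm_so;
        out[n].name_len = (size_t)(m[2].rm_eo - m[2].rm_so);
        out[n].speech = p + m[0].rm_eo; /* just past the colon */
        out[n].speech_len = strlen(out[n].speech);
        n++;
        p += m[0].rm_eo;
    }
    regfree(&re);

    /* drop the spaces that follow each colon */
    for (int i = 0; i < n; i++) {
        while (out[i].speech_len > 0 && *out[i].speech == ' ') {
            out[i].speech++;
            out[i].speech_len--;
        }
    }
    return n;
}
```

    Since the fields are pointer and length pairs into the original string, they can be printed with the "%.*s" conversion, passing the length cast to int before the pointer. Each open question in the post (digits in names, other whitespace, retained spacing) maps to a one-line change in the grammar comment and the pattern.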
