Thread: simple string, search in input

  1. #1
    Registered User
    Join Date
    Aug 2005
    Posts
    6

    simple string, search in input

    Hello everybody,

    I have made an activeworlds bot that has to react to chat strings.
    It has to filter out bad words. The activeworlds software prevents doing that in real time, but reacting to bad words after they are said is possible.

    If I use this if statement:
    Code:
    if (aw_string(AW_CHAT_MESSAGE)=="badword")
    The problem is this only works when someone 'only' says "badword" and not when saying "hello badword".

    I got a suggestion from a friend:
    Code:
    char* msg = aw_string (AW_CHAT_MESSAGE);
    if (strcmp ("badword", msg) == 0) {
    But here also the bot only reacts when I say "badword" and not when saying "hello badword".


    I must admit I'm not very good at C++; I'm just beginning. I've worked through some examples and the tutorials and FAQ on this website, but couldn't find an answer for this.
    If anyone knows a place where this is explained, I would love to hear!

  2. #2
    Registered User
    Join Date
    May 2002
    Posts
    66
    strstr() finds a string in a string.
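
    A minimal sketch of the strstr() suggestion. The helper name contains_word is made up for illustration and is not part of the activeworlds SDK; in the bot, msg would be whatever aw_string (AW_CHAT_MESSAGE) returns.

```cpp
#include <cstring>

// Returns true when `word` occurs anywhere inside `msg`.
// contains_word is a hypothetical helper name used only for this sketch.
bool contains_word(const char* msg, const char* word)
{
    // strstr returns a pointer to the first match, or a null pointer when there is none
    return std::strstr(msg, word) != NULL;
}
```

    Unlike strcmp, which only reports whether two whole strings are equal, this also matches "hello badword" and "hibadword".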

  3. #3
    Registered User Tonto's Avatar
    Join Date
    Jun 2005
    Location
    New York
    Posts
    1,465
    Unless we are really bound to char*'s here, I would recommend std::string for C++ programming. If we made this conversion, we could use find to search for badword substrings. But with this kind of thing, if I said...um...like, "bass", it would target that. So what you could do is tokenize the string using the stringstream technique described in our very own FAQ entry on the subject.
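
    The tokenizing idea could look roughly like this. It is only a sketch and not necessarily the code from the FAQ entry; has_token is a hypothetical name.

```cpp
#include <sstream>
#include <string>

// Returns true only when `word` appears as a whole, whitespace-separated token.
// has_token is a hypothetical helper name used only for this sketch.
bool has_token(const std::string& msg, const std::string& word)
{
    std::istringstream tokens(msg);
    std::string token;
    // operator>> skips whitespace and extracts one token at a time
    while (tokens >> token) {
        if (token == word)
            return true;
    }
    return false;
}
```

    With this, has_token("hello badword", "badword") matches but "hibadword" does not. Punctuation stays attached to a token, so "badword!" would need to be stripped first.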

  4. #4
    Registered User
    Join Date
    Mar 2002
    Posts
    1,595
    There are always tradeoffs in doing such a search. Looking for substrings that you want to identify within the original string is the basic principle (achacha gives one example of how this can be done), but what if the "badword" is embedded in another word? Would you want to flag "assiduous" if its first three letters met the screening criteria? Still, parsing strings for substrings is a time-honored and often-used algorithm with many alternative strategies based on individual needs, so continue on your journey; just beware that you may find potholes along the way.
    You're only born perfect.

  5. #5
    Registered User
    Join Date
    Aug 2005
    Posts
    6
    Thanks for your answers!

    I now use this piece of code:
    Code:
       char* msg = aw_string (AW_CHAT_MESSAGE);
       if (!strstr (msg,"badword") == 0) {
    
       sprintf (message, "you shouldn't say that", aw_string (AW_AVATAR_NAME), (WORLD));
    Which works perfectly: it reacts to "badword", "hi badword" and "hibadword". Just what I need.


    But now another question comes up. Some people will report words they normally use that conflict with my "badword" search, so it should also be possible to exclude words. Can this be done by checking, after this check, for words that are OK and then skipping the warning?


    Now that I'm thinking about this... maybe working with lists of words is more efficient, like a badwords (and goodwords) list, just in a .txt file.
    I've looked at the ifstream tutorial on the website; that doesn't seem very difficult, so I could read and write files.

    But how do I check whether the message (said in the chat) contains any word from that file? I know a bit about arrays from PHP; can I put the words from the file in some kind of array and then check the chat message against that array?

  6. #6
    Banned
    Join Date
    Jun 2005
    Posts
    594
    I'd use a string and find, with a vector holding all the possibilities to
    search for, and have the vector filled from a file containing the words
    to be filtered out. But making a file that contains all possible words
    will take more effort than writing this code.
    If you're interested in this but don't have an idea of what the code
    would look like, I can provide you with a sample.

  7. #7
    Cat Lover
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    109
    using c++ strings and find

    Code:
    if(message.find(badword, 0) != string::npos)
    {
        //stuff to do here
    }
    That will find it inside a word too. If you don't want to match inside a word, only when the word is by itself, you could try adding a space to the start and end and searching for that.
    Code:
    badword = " "+badword+" ";
    if(msg.find...
    checking for exceptions can also be done
    Code:
    if(msg.find(badword, 0) != string::npos && msg.find(exception) == string::npos)
    I'll show you an example using a file of bad words later on today, maybe, if I get time and no one else has, but it's easy enough to do.
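
    The space-padding trick could be sketched as follows. One assumption added here: the message is padded as well, because otherwise a bad word at the very start or end of the message has no space next to it and would be missed. has_whole_word is a hypothetical name.

```cpp
#include <string>

// Whole-word check: pad both the message and the word with spaces, then search.
// has_whole_word is a hypothetical helper name used only for this sketch.
bool has_whole_word(const std::string& msg, const std::string& word)
{
    const std::string padded_msg = " " + msg + " ";
    const std::string padded_word = " " + word + " ";
    return padded_msg.find(padded_word) != std::string::npos;
}
```

    This only treats spaces as word boundaries, so "badword," or a tab-separated word would still slip through.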

  8. #8
    Cat Lover
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    109
    Ok, from a file, it's easy enough. You'd just have your file of bad words, in a lovely little list separated by new lines or spaces, then just read them into an array.

    Code:
    ifstream in_file;
    in_file.open("badwords.txt");
    string bad_words[500];
    int ncnt = 0;
    //stop when the array is full or there is nothing left to read
    while(ncnt < 500 && in_file >> bad_words[ncnt])
    {
        ncnt++;
    }
    Simple. Then from there you can just check against them in a loop.

  9. #9
    Registered User
    Join Date
    Aug 2005
    Posts
    6
    Thanks Dweia! This works OK now.
    The only thing is that I don't have much of an idea about how to check this in a loop. The tutorials don't give me the information I'm looking for (or I'm not looking in the right way). Can someone help me further with an example of how to check the chat message against the array of badwords?

  10. #10
    Cat Lover
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    109
    Code:
    //ncnt is the number of words that were read from the file
    for(int i = 0; i < ncnt; i++)
    {
        if(msg.find(bad_words[i], 0) != string::npos)
        {
            //your stuff here
        }
    }
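
    Combining the loop with the earlier exception check might look like this. It is a sketch under assumptions: should_warn and the allowed_words list are hypothetical, and both lists are taken to be already loaded.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true when msg contains a bad word and none of the allowed words.
// should_warn and allowed_words are hypothetical names used only for this sketch.
bool should_warn(const std::string& msg,
                 const std::vector<std::string>& bad_words,
                 const std::vector<std::string>& allowed_words)
{
    // an allowed word anywhere in the message suppresses the warning
    for (std::size_t i = 0; i < allowed_words.size(); ++i) {
        if (msg.find(allowed_words[i]) != std::string::npos)
            return false;
    }
    // otherwise warn on the first bad word that is found
    for (std::size_t i = 0; i < bad_words.size(); ++i) {
        if (msg.find(bad_words[i]) != std::string::npos)
            return true;
    }
    return false;
}
```

    The tradeoff is the same as with the single exception check: a message that contains both an allowed word and a separate bad word is let through.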

  11. #11
    Registered User
    Join Date
    Mar 2002
    Posts
    1,595
    Other examples:

    Goal: search a vector of strings to see if current string is in vector of strings and if so do something, otherwise do nothing.

    Code:
    //USING C++ with STL stuff
    std::vector<std::string> badwords;
    //fill badwords with individual words that will trigger an action here
    std::ifstream fin("myFile.txt");
    std::string temp;
    while(fin >> temp)
    {
        //restart at the beginning of badwords for every word that is read
        std::vector<std::string>::iterator start = badwords.begin();
        std::vector<std::string>::iterator stop = badwords.end();
        for( ; start != stop; ++start)
        {
           if(temp == *start)
              std::cout << "Naughty, naughty." << std::endl;
        }
    }
    Code:
    //USING C++ without STL 
    char badwords[100][20];
    //fill badwords with desired vocabulary here
    ifstream fin("myFile.txt");
    char temp[20];
    int i = 0;
    while(fin >> temp)
    {
       for(i = 0; i < 100; ++i)
      {
         if(strncmp(temp, badwords[i]) == 0)
            cout << "Naughty, naughty" << endl;
      }
    }

  12. #12
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    Quote Originally Posted by lode
    Thanks for your answers!

    I now use this piece of code:
    Code:
       char* msg = aw_string (AW_CHAT_MESSAGE);
       if (!strstr (msg,"badword") == 0) {
    
       sprintf (message, "you shouldn't say that", aw_string (AW_AVATAR_NAME), (WORLD));
    Which works perfectly: it reacts to "badword", "hi badword" and "hibadword". Just what I need.
    Does it filter "Badword"?
    What about "baDword"?
    What if they write "baDw0rd"?
    What about "B A D W O R D"?

    And there are things like fvck and f\/ck. And f\_/ck.

    I recommend at least converting characters to lowercase before comparing them. This should at least stop the idiots who curse because it's thrilling or something, without disrupting the "genuinely profane."
    Last edited by Rashakil Fol; 08-04-2005 at 11:47 AM.
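
    One way to apply the lowercase advice with std::string, as a sketch: lowercase copies of both strings and search in those. The names to_lower_copy and contains_ignore_case are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns a lowercase copy of text; the original string is left untouched.
std::string to_lower_copy(const std::string& text)
{
    std::string lowered = text;
    for (std::size_t i = 0; i < lowered.size(); ++i) {
        // cast to unsigned char first: passing a negative char value to tolower is undefined
        lowered[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(lowered[i])));
    }
    return lowered;
}

// Case-insensitive substring search built on top of to_lower_copy.
bool contains_ignore_case(const std::string& msg, const std::string& word)
{
    return to_lower_copy(msg).find(to_lower_copy(word)) != std::string::npos;
}
```

    This catches "Badword" and "baDword", but not substitutions such as "baDw0rd" or spaced-out letters; those would need extra normalization.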

  13. #13
    Registered User
    Join Date
    Aug 2005
    Posts
    6
    Quote Originally Posted by elad
    Other examples:
    ...
    I tried many different ways of writing this while loop but didn't manage to get it working, except for this last one you provided. I now have this part:

    Code:
    char badwords[20][20];
    ifstream file_bwchk ("badwords.txt");
    char temp[20];
    strcpy (temp, chattext);
    int i = 0;
    cout << temp << endl;
    while(file_bwchk >> temp) {
     for(i = 0; i < 20; ++i) {
       if(strncmp(temp, badwords[i]) == 0) {
         cout << "Naughty, naughty" << endl;
       }
     }
    }
    This one gives me this error:
    /usr/include/string.h:100: error: too few arguments to function `int
    strncmp(const char*, const char*, unsigned int)'
    Just adding an int there won't help... I guess I have to use that last int somewhere, but I can't figure out exactly where. When I don't use that int anywhere other than in the required line, the console prints "Naughty, naughty" all the time, even when saying "test" or something like that.
    Do you know how I have to use that extra int?
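
    On the compiler error: strncmp takes a third argument, the maximum number of characters to compare, while strcmp takes only the two strings and compares them completely. A small sketch of the difference, with hypothetical helper names:

```cpp
#include <cstring>

// strcmp compares two whole C strings and returns 0 when they are equal.
bool same_word(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

// strncmp compares at most `count` characters, so it can report a match on a prefix.
bool same_prefix(const char* a, const char* b, std::size_t count)
{
    return std::strncmp(a, b, count) == 0;
}
```

    One possible explanation for the constant "Naughty, naughty", assuming the extra int was 0: comparing zero characters always reports a match. In the snippet as posted, badwords is also never filled, and temp is overwritten by each word read from the file, so the chat text is not what ends up being compared.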




    Quote Originally Posted by Rashakil Fol
    I recommend at least converting characters to lowercase before comparing them. This should at least stop the idiots who curse because it's thrilling or something, without disrupting the "genuinely profane."
    You're correct... I didn't check that.
    I searched on this site, other C++ sites, Google and more, but the examples I found didn't get me further than turning text like "TEST" into "tEST" or "tEsT"...

    This is where I make the variable:
    Code:
    char chattext[256];
    char chatname[256];
    strcpy (chattext, aw_string (AW_CHAT_MESSAGE));
    strcpy (chatname, aw_string (AW_AVATAR_NAME));
    And with these two methods (the only two I got working without errors):
    "tEST"
    Code:
    if (strcmp (chattext,"TEST") == 0) {
    
    printf ("Before conversion: %s\n", chattext);
    *chattext = tolower(*chattext);
    printf ("After conversion: %s\n", chattext);
    }
    "tEsT"
    Code:
    if (strcmp (chattext,"TEST") == 0) {
    printf ("Before conversion: %s\n", chattext);
    for (char *iter = chattext; *iter != '\0'; ++iter) {
        *iter = tolower(*iter);
        ++iter;
    }
    printf ("After conversion: %s\n", chattext);
    }
    But they don't fully lowercase the whole word... Is there something wrong with the conversion process, or am I using the wrong type of variable?

  14. #14
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    Quote Originally Posted by lode
    Code:
    for (char *iter = chattext; *iter != '\0'; ++iter) {
        *iter = tolower(*iter);
        ++iter;
    }
    You're incrementing iter twice each time through the loop. This should only happen once.

  15. #15
    Registered User
    Join Date
    Aug 2005
    Posts
    6
    Quote Originally Posted by Rashakil Fol
    You're incrementing iter twice each time through the loop. This should only happen once.
    Ah thanks! I was a bit confused by the *iter != '\0'; part (I didn't know what '\0' meant), so I didn't read the rest very well :$

    Last Post: 07-29-2002, 09:37 PM