Thread: Regex match ISBN number

  1. #1
    Registered User
    Join Date
    Mar 2016
    Posts
    203

    Regex match ISBN number

    I want to regex-match an ISBN number of the format 1234-5678-9x, i.e. 4 digits, a hyphen, 4 digits, a hyphen, then 1 digit and 1 alphanumeric character, but the following program rejects valid input. Any help much appreciated (also, if it's not too much trouble, please tell me where and how I'm going wrong):
    Code:
    #include <iostream>
    #include <string>
    #include <regex>
    int main()
    {
        std::regex ISBN ("[0-9][{4}][-][0-9][{4}][-][0-9][::alnum::]");
        std::cout << "Enter ISBN number \n";
        std::string input{};
        getline(std::cin, input);
        if (std::regex_match(input, ISBN))
        {
            std::cout << "Success \n";
        }
        else
        {
            std::cout << "Fail \n";
        }
    }
    Thank you

  2. #2
    Lurking whiteflags
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    There is a [::digit::] class too in POSIX.

    std::regex assumes the ECMAScript grammar by default; to use a POSIX grammar instead, try constructing it like this:

    Code:
    std::regex isbn ("[::digit::][{4}][-][::digit::][{4}][-][::digit::][::alnum::]", std::regex::basic);
    std::basic_regex - cppreference.com

  3. #3
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,666
    You have too many brackets, and your character class names are written incorrectly.
    Code:
        std::regex ISBN ("[0-9]{4}-[0-9]{4}-[[:digit:]][[:alnum:]]");
    Modified ECMAScript regular expression grammar - cppreference.com

  4. #4
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    Also don't forget that the last character may only be an 'x', 'X', or any valid digit.

    Jim

  5. #5
    Lurking whiteflags
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    I feel bad for giving such a broken example. std::regex does support escaped character classes such as \d, though, as the docs in Salem's link explain.

    This does match ISBNs:
    Code:
            string re = "(\\d{4})-(\\d{4})-((\\d)(\\d|X))";
            if (regex_match(tests[i], regex(re)))

  6. #6
    Registered User
    Join Date
    Mar 2016
    Posts
    203
    Many thanks to all for your replies. Working backwards from your answers, I can see I was going wrong by:
    (a) putting square brackets around the quantifiers that apply to the preceding atom ([{4}] instead of {4})
    (b) not wrapping the character class names in a bracket expression ([[:alnum:]] rather than [:alnum:])
    (c) putting an extra ':' before and after :alnum:
    (d) also, the brackets around the hyphen character seem to be optional
    whiteflags: I tweaked your code slightly so the last character can also be a letter, including the 'x' from my original example
    Putting everything together, here are a few that work, and I'm sure there are several others:
    Code:
    // Each of these accepts 1234-5678-9x; use any one of them.
    std::regex ISBN1("[0-9]{4}[-][0-9]{4}[-][0-9][[:alnum:]]");       // adapted from Salem
    std::regex ISBN2("(\\d{4})-(\\d{4})-((\\d)(\\d|[[:alpha:]]))");   // whiteflags, modified
    std::regex ISBN3("(\\d{4})-(\\d{4})-((\\d)(\\d|\\w))");            // whiteflags, modified
    std::regex ISBN4("[[:d:]]{4}[-][[:d:]]{4}[-][[:d:]][[:alnum:]]");
    std::regex ISBN5("[\\d]{4}[-][\\d]{4}[-][\\d][\\w]");

  7. #7
    Registered User MutantJohn
    Join Date
    Feb 2013
    Posts
    2,665
    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
    I couldn't resist :P

  8. #8
    Lurking whiteflags
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote (from post #6):
      (a) putting brackets around the preceding atom match quantifiers
      (b) not putting brackets around class names
      (c) putting an extra : in front and behind :alnum:
      (d) also brackets around the hyphen character seems optional
      whiteflags: I tweaked your code slightly to take into a/c all words including the 'X' from my OP example
    If you put [] around a hyphen, it just turns it into a one-character class that matches the same thing as a bare hyphen.
    \w is really too permissive when you mean X (you could include x, but frankly I would reject that as well).

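    Additionally, the parentheses in the regex are largely for grouping (each pair also creates a numbered capture group). In my last one, groups 1 and 2 are the two four-digit blocks, group 3 is the final two-character block, group 4 is its first digit, and group 5 is just the check digit.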
