Thread: Hungry REGEX to grab onion sites

  1. #1
    Registered User
    Join Date
    Jun 2014
    Posts
    16


    I have a REGEX that is very hungry (greedy). I need to grab onion URLs from inside a page's source. My regex gets the .onion sites with no issue, and it also gets some of the .onion/more_stuff URLs. Unfortunately, it also grabs entire lines like .onion/stuff" </a.... and so on. Here's what I'm working with.

    Code:
    "/https:\/\/[^\/]*\.onion\//"
    I would also like to match both http and https, but [s]* means 0 or more times, right? How can I say 0 or 1 times? Here is how the regex gets set up, in case I am doing something that could be simplified.

    Code:
    string regex_onionurl = "/https:\/\/[^\/]*\.onion\//";
    regex onionSearch(regex_onionurl, regex_constants::icase);
    return regex_search(string_to_search_through ,onionSearch);

  2. #2
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    For 0 or 1 times use ? instead of *, as in: "https?:\/\/[^\/]*\.onion\//".

    As for the regex grabbing more than you want, that depends on how you use regex_search. In the code you provided, you're not looking at which part of the string matched, only at whether any part matched.

    Another thing that might be useful is the lazy (non-greedy) quantifier "*?" instead of "*", which matches as few characters as possible.


Similar Threads

  1. regex in c (posix regex)
    By baxy in forum C Programming
    Replies: 1
    Last Post: 11-16-2012, 01:15 PM
  2. Hungry?
    By stevesmithx in forum A Brief History of Cprogramming.com
    Replies: 9
    Last Post: 12-25-2008, 06:51 AM
  3. onion stlye of coding
    By manav in forum A Brief History of Cprogramming.com
    Replies: 23
    Last Post: 04-18-2008, 10:04 AM
  4. inquiry from a hungry mac os x user
    By terabyter in forum C Programming
    Replies: 3
    Last Post: 06-23-2006, 09:04 AM
  5. <regex.h> regex syntax in C
    By battersausage in forum C Programming
    Replies: 7
    Last Post: 03-24-2004, 01:35 PM
