Hungry REGEX to grab onion sites

**seanrsolutions** · 10-17-2015

I have a REGEX that is very hungry. I need to grab onion URLs from a page's source. My regex gets the bare .onion sites with no issue, and it also gets some of the .onion/more_stuff URLs. Unfortunately, it also grabs entire lines, like .onion/stuff" </a.... and so on. Here's what I'm working with.

Code:

"/https:\/\/[^\/]*\.onion\//"

I would also like to match either http or https, but [s]* means 0 or more times, right? How can I say 0 or 1? Here is how the regex is set up, in case I am doing something that can be simplified.

Code:

string regex_onionurl = "/https:\/\/[^\/]*\.onion\//";
regex onionSearch(regex_onionurl, regex_constants::icase);
return regex_search(string_to_search_through ,onionSearch);

**King Mir** · 10-17-2015

For 0 or 1 times, use ? instead of *, as in: "https?:\/\/[^\/]*\.onion\/".
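A minimal sketch of the ? quantifier with std::regex, in case it helps. The helper name contains_onion_url is made up for illustration. Two assumptions here: the pattern is written without the surrounding /.../ delimiters, because std::regex has no delimiter syntax and would treat those slashes as literal characters, and it uses a raw string literal so the backslash in \. reaches the regex engine unchanged.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` contains an http:// or https:// URL
// whose host part ends in ".onion/".
bool contains_onion_url(const std::string& text) {
    // "s?" matches zero or one 's', so both schemes are accepted.
    // '/' needs no escaping in std::regex; only the '.' is escaped.
    static const std::regex onion(R"(https?://[^/]*\.onion/)",
                                  std::regex_constants::icase);
    return std::regex_search(text, onion);
}
```

With this, contains_onion_url("see http://abc.onion/ here") and contains_onion_url("HTTPS://abc.onion/x") should both be true, while a string such as "ftp://abc.onion/" should not match.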

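As for getting more than the regex, that depends on how you use regex_search. In the code you provided, you're not looking at which part of the string matches, only whether any part matches. To see the matched text, pass a std::smatch as an extra argument to regex_search and read the match from it.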
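To look at what actually matched, one option is to walk the input with std::sregex_iterator and collect each match. This is only a sketch: the function name find_onion_urls is hypothetical, and the character class that stops at whitespace, quotes and angle brackets is my own assumption about what should end a URL inside HTML, not something from the original pattern.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every .onion URL found in `page_source`.
std::vector<std::string> find_onion_urls(const std::string& page_source) {
    // Host: one or more characters that are not '/', whitespace, quotes or
    // angle brackets, ending in ".onion". The optional path stops at the
    // first whitespace, quote or angle bracket (assumed URL terminators).
    static const std::regex onion(
        R"re(https?://[^/\s"'<>]+\.onion(/[^\s"'<>]*)?)re",
        std::regex_constants::icase);

    std::vector<std::string> urls;
    for (std::sregex_iterator it(page_source.begin(), page_source.end(), onion), end;
         it != end; ++it) {
        urls.push_back(it->str());  // str() is the text of the whole match
    }
    return urls;
}
```

For an input like <a href="http://abcdef.onion/stuff">link</a>, this should yield http://abcdef.onion/stuff without the trailing quote and markup.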

Another thing that might be useful is to use "*?" instead of "*" to match as few characters as possible.
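A small sketch of the difference, assuming a pattern that uses .* (which can run past slashes and quotes). The helper first_match is hypothetical. Note that with a negated class like [^/]* the repetition cannot cross a slash anyway, so the lazy form matters most when the repeated part is something broad like a dot.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` matched by
// `pattern`, or an empty string if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern, std::regex_constants::icase);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

On a line holding two links, such as http://a.onion/x" </a> <a href="http://b.onion/y", the greedy pattern https?://.*\.onion/ should run from the first http all the way to the last .onion/, while the lazy pattern https?://.*?\.onion/ should stop at http://a.onion/.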
