Are you human?

**Sebastiani** · 05-24-2010

So I've been working on this captcha algorithm recently, and I have a pretty good feeling about it. Unfortunately, I haven't yet had the time to set up a proper testing environment (e.g., OCR analyzers and such), but my gut feeling is that it should be sufficiently secure against attacks.

Below, I've attached two sample output files (the input file used a black font over a white background). Notice that in the first one (using the default settings) the smaller fonts are fairly hard to read. In that case, the parameters can be tweaked to sharpen the image (the second snapshot). The image can also be blurred, if necessary.

Anyway, what do you guys think? Will it work?

.....

**zacs7** · 05-24-2010

I think CAPTCHAs are flawed by nature anyway, due to people "beating them" by putting the CAPTCHA on another website (e.g. a porn website) so a real human fills it out.

But they do look good, although they're a bit hard to read -- even for my 20 year old eyes. The first is a lot harder to read than the second; it would be good to see how OCR does on the first vs the second. If I didn't know what the words were, I doubt being able to read a few characters would help me. For example "destruction" could easily read "description" IMO.

**nvoigt** · 05-25-2010

Originally Posted by zacs7

I think CAPTCHAs are flawed by nature anyway, due to people "beating them" by putting the CAPTCHA on another website (e.g. a porn website) so a real human fills it out.

How about reversing it? We put porn pictures on the site instead of a captcha, and if the user clicks through in less than ($porn) seconds, it's a bot and gets kicked.

**Mario F.** · 05-24-2010

Can't really read the first image except for the bigger letters. And even then badly. That would be a terrible choice.

The second one... well, I can't read the first two lines. The rest I could read. But it would depend on whether you would use normal English or jumbled words. If the latter, probably not. With that font I would probably confuse the "h" with an "n", etc.

**SlyMaelstrom** · 05-24-2010

I read the second no problem, the first however is pretty tough. Specifically the sixth word on the first line. That word is even kind of difficult to make out in the second one and I might not have gotten it out of context.

**Sebastiani** · 05-25-2010

Yeah, the values of the parameters definitely depend on the font type and size, as well as the foreground/background colors being used. Choosing the best fit is something of a black art, I suppose.

The three images attached below (pasted together into one) were generated with the same settings (0.2, 0.025), and are IMO pretty easy to read (it may help for some to either lean back a bit, or better yet, look at them at a sharp angle). What I'd really like to know is whether they could really resist an advanced OCR engine, or if I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running it through as many OCR engines as possible, for good measure.

...

**System_159** · 05-25-2010

Originally Posted by Sebastiani

Yeah, the values of the parameters definitely depend on the font type and size, as well as the foreground/background colors being used. Choosing the best fit is something of a black art, I suppose.

The three images attached below (pasted together into one) were generated with the same settings (0.2, 0.025), and are IMO pretty easy to read (it may help for some to either lean back a bit, or better yet, look at them at a sharp angle). What I'd really like to know is whether they could really resist an advanced OCR engine, or if I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running it through as many OCR engines as possible, for good measure.

...

You want a good reliable way to beat the machines? Take a simple one word/character string input, print it to the screen using your noise-a-fier twice along with a third word/character string of equal length (all in random order, of course). Tell your user to type out the word that appears twice.

This not only gives your human user two chances to distinguish the word, but also confuses the machine a bit more by adding a layer of logic instead of just edge detection.
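A minimal sketch of how such a challenge could be assembled, assuming a plain word list and leaving the noise rendering out entirely. The names `make_challenge` and `check_answer` are made up for illustration and are not part of Sebastiani's code.

```python
import random

def make_challenge(words, rng=random):
    """Return (three words to display in shuffled order, the word shown twice)."""
    target = rng.choice(words)
    # The decoy must have the same length as the target, as suggested above.
    decoys = [w for w in words if len(w) == len(target) and w != target]
    if not decoys:
        raise ValueError("no decoy of equal length for %r" % target)
    shown = [target, target, rng.choice(decoys)]
    rng.shuffle(shown)
    return shown, target

def check_answer(answer, target):
    """Accept the answer regardless of case and surrounding whitespace."""
    return answer.strip().lower() == target.lower()
```

Each of the three words in `shown` would then be passed through the noise generator separately, so that the two copies of the target do not end up pixel-identical.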

**SlyMaelstrom** · 05-25-2010

Originally Posted by Sebastiani

What I'd really like to know is whether they could really resist an advanced OCR engine, or if I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running it through as many OCR engines as possible, for good measure.

I'm no OCR expert by any means, but to me the easiest way to read this with OCR would be to acknowledge the simple fact that the readable text is black text on a white background. Running this through a filter that replaces all exposed, non-black pixels with white pixels, and recolors all non-black pixels surrounded by black pixels (all adjacent, or perhaps a minimum 7 of 9 adjacent pixels) to black would yield a very readable image that even the most simple OCR algorithms can read.
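Roughly, that filter could look like the sketch below. It assumes the image has already been loaded as a list of rows of grayscale values (0 to 255); the "black" threshold `black_max` and the function name are illustrative guesses, and the default of 7 black neighbours out of the 8 surrounding pixels is just one reading of the rule above.

```python
def clean_black_on_white(img, black_max=40, min_black_neighbours=7):
    """Return a new image containing only 0 (black) and 255 (white).

    img is a list of rows of grayscale values in 0..255.
    """
    h, w = len(img), len(img[0])
    is_black = [[v <= black_max for v in row] for row in img]
    out = [[255] * w for _ in range(h)]  # exposed non-black pixels end up white
    for y in range(h):
        for x in range(w):
            if is_black[y][x]:
                out[y][x] = 0  # black pixels are kept as they are
                continue
            # Count black pixels among the (up to 8) surrounding pixels.
            n = sum(
                is_black[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (j, i) != (y, x)
            )
            if n >= min_black_neighbours:
                out[y][x] = 0  # a hole inside a stroke gets filled in
    return out
```

Pixels on the image border have fewer than 8 neighbours, so with the default setting they can never be filled in; that seems acceptable for text that does not touch the edge.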

**Mario F.** · 05-25-2010

Originally Posted by SlyMaelstrom

I'm no OCR expert by any means, but to me the easiest way to read this with OCR would be to acknowledge the simple fact that the readable text is black text on a white background.

Yeah, I was thinking about this too.

For illustration purposes, I converted the second picture to Lab mode, extracted the lightness channel and applied a levels adjustment to it (reducing darkness and increasing the midtones). The image below was the result. Fairly easy to OCR. Now, my approach may not even be the easiest. But I think that by using Lab mode, I can tackle pretty much any kind of noise in the image. As far as coding is concerned, I suppose this could be fairly easily achieved by someone bent on defeating this captcha.

Attachment 9804
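In code, the two steps might be sketched as follows. This is not the image editor's implementation: `srgb_to_lightness` uses the standard sRGB to CIE L* conversion, and `levels` is a simplified levels adjustment whose black point, white point and gamma are placeholder values, not the settings used for the attachment.

```python
def srgb_to_lightness(r, g, b):
    """CIE L* (0..100) of an 8-bit sRGB pixel."""
    def linear(c):
        # Undo the sRGB transfer curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y, with the white point normalised to 1.
    y = 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b)
    d = 6.0 / 29.0
    f = y ** (1.0 / 3.0) if y > d ** 3 else y / (3 * d * d) + 4.0 / 29.0
    return 116.0 * f - 16.0

def levels(value, black=20.0, white=80.0, gamma=1.5, top=100.0):
    """Clip value to [black, white], rescale to 0..top and apply gamma.

    A gamma above 1 lifts the midtones.
    """
    t = (value - black) / (white - black)
    t = min(1.0, max(0.0, t))
    return top * t ** (1.0 / gamma)
```

Applying `levels(srgb_to_lightness(r, g, b))` to every pixel and thresholding the result would give a black-and-white image to hand to an OCR engine.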

**brewbuck** · 05-25-2010

Originally Posted by Mario F.

Yeah, I was thinking about this too.

For illustration purposes, I converted the second picture to Lab mode, extracted the lightness channel and applied a levels adjustment to it (reducing darkness and increasing the midtones). The image below was the result. Fairly easy to OCR. Now, my approach may not even be the easiest. But I think that by using Lab mode, I can tackle pretty much any kind of noise in the image. As far as coding is concerned, I suppose this could be fairly easily achieved by someone bent on defeating this captcha.

That image can be even further cleaned up by a dilation/erosion:

Attachment 9806
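For anyone unfamiliar with the operation, here is a small sketch of dilation and erosion on a binary image (1 = ink, 0 = background) with a 3x3 square neighbourhood. The structuring element and the order of operations used for the attachment are not stated, so both are assumptions here.

```python
def _morph(img, combine):
    """Apply combine (max or min) over each pixel's 3x3 neighbourhood.

    The neighbourhood is clipped at the image border.
    """
    h, w = len(img), len(img[0])
    return [
        [
            combine(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]

def dilate(img):
    """A pixel becomes ink if any pixel in its neighbourhood is ink."""
    return _morph(img, max)

def erode(img):
    """A pixel stays ink only if its whole neighbourhood is ink."""
    return _morph(img, min)

def opening(img):
    """Erode then dilate: removes specks smaller than the neighbourhood."""
    return dilate(erode(img))

def closing(img):
    """Dilate then erode: fills small holes inside strokes."""
    return erode(dilate(img))
```

Note that a 3x3 opening also wipes out strokes thinner than three pixels, so on small fonts a smaller neighbourhood or a closing alone may be the better choice.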

Random noise can usually be removed. In this case I think it could be almost completely removed by calculating the variance of the chrominance. The noise is randomly distributed in chroma space, the signal is not.
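One possible reading of that idea, as a rough sketch: compute a chroma pair for every pixel and look at how much it varies within a small window. Areas where the variance is high would be treated as noise, while areas with steady chroma are kept as signal. The BT.601 Cb/Cr coefficients, the 3x3 window and the function names are choices made for this example, not something specified above.

```python
def chroma(r, g, b):
    """Cb and Cr components (BT.601 weights, no offset) of an RGB pixel."""
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def chroma_variance_map(img):
    """Per-pixel sum of the Cb and Cr variances over a 3x3 window.

    img is a list of rows of (r, g, b) tuples; the window is clipped
    at the image border.
    """
    h, w = len(img), len(img[0])
    ch = [[chroma(*px) for px in row] for row in img]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                ch[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            n = len(window)
            total = 0.0
            for k in (0, 1):  # 0 = Cb, 1 = Cr
                mean = sum(p[k] for p in window) / n
                total += sum((p[k] - mean) ** 2 for p in window) / n
            out[y][x] = total
    return out
```

Pixels whose value in the map exceeds some threshold could then be replaced by the background colour before the image goes to OCR; where to put that threshold would have to be found by experiment.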

**Sebastiani** · 05-25-2010

Originally Posted by brewbuck

That image can be even further cleaned up by a dilation/erosion:

Attachment 9806

Random noise can usually be removed. In this case I think it could be almost completely removed by calculating the variance of the chrominance. The noise is randomly distributed in chroma space, the signal is not.

Wow, yeah, I'm definitely going to need a more sophisticated algorithm!

Thanks again for all of the input, guys.

**laserlight** · 05-25-2010

Originally Posted by Sebastiani

Are you human?

I have not had the opportunity to pass the test of humanity involving the gom jabbar.

**Sebastiani** · 05-25-2010

Originally Posted by laserlight

I have not had the opportunity to pass the test of humanity involving the gom jabbar.

Hmm, that would've come in handy on my last date (don't ask)...

**hk_mp5kpdw** · 05-25-2010

Originally Posted by laserlight

I have not had the opportunity to pass the test of humanity involving the gom jabbar.

I kept my hand in the box.

**EVOEx** · 05-25-2010

Fairly unreadable, the first one. However, both of them become a fair bit more readable if you open them in GIMP and run the despeckle filter with the following settings:
Adaptive: yes
Recursive: no
Radius: 1
Black level: -1
White level: 103

But of course that would also mean that it becomes a fair share more readable to computers by simply running that filter through it. So, all in all, a computer program would have little difficulty reading it, I believe.
