Thread: Are you human?

  1. #1
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708

    Are you human?

    So I've been working on this captcha algorithm recently, and I have a pretty good feeling about it. Unfortunately, I haven't yet had the time to set up a proper testing environment (e.g. OCR analyzers and such), but my gut feeling is that it should be sufficiently secure against attacks.

    Below, I've attached two sample output files (the input file used a black font over a white background). Notice that in the first one (using the default settings) the smaller fonts are fairly hard to read. In that case, the parameters can be tweaked to sharpen the image (the second snapshot). The image can also be blurred, if necessary.

    Anyway, what do you guys think? Will it work?

    .....
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  2. #2
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    I think CAPTCHAs are flawed by nature anyway, due to people "beating them" by putting the CAPTCHA on another website (e.g. a porn website) so a real human fills it out.

    But they do look good, although they're a bit hard to read, even for my 20-year-old eyes. The first is a lot harder to read than the second; it would be good to see how OCR does on the first vs. the second. If I didn't know what the words were, I doubt being able to read a few characters would help me. For example, "destruction" could easily read "description" IMO.
    Last edited by zacs7; 05-24-2010 at 05:30 PM.

  3. #3
    the hat of redundancy hat nvoigt's Avatar
    Join Date
    Aug 2001
    Location
    Hannover, Germany
    Posts
    3,130
    Quote Originally Posted by zacs7 View Post
    I think CAPTCHAs are flawed by nature anyway, due to people "beating them" by putting the CAPTCHA on another website (e.g. a porn website) so a real human fills it out.
    How about reversing it? We put porn pictures on the site instead of a captcha, and if the user clicks through in less than ($porn) seconds, it's a bot and gets kicked.
    hth
    -nv

    She was so Blonde, she spent 20 minutes looking at the orange juice can because it said "Concentrate."

    When in doubt, read the FAQ.
    Then ask a smart question.

  4. #4
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Can't really read the first image except for the bigger letters. And even then badly. That would be a terrible choice.

    The second one... well, I can't read the first two lines. The rest I could read. But it would depend on whether you would use normal English or jumbled words. If the latter, probably not. With that font I would probably be confused by the "h" looking like an "n", etc.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  5. #5
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    I read the second no problem, the first however is pretty tough. Specifically the sixth word on the first line. That word is even kind of difficult to make out in the second one and I might not have gotten it out of context.
    Sent from my iPad®

  6. #6
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Yeah, the values of the parameters definitely depend on the font type and size, as well as the foreground/background colors being used. Choosing the best fit is something of a black art, I suppose.

    The three images attached below (pasted together into one) were generated with the same settings (0.2, 0.025), and are IMO pretty easy to read (it may help to either lean back a bit or, better yet, look at them at a sharp angle). What I'd really like to know is whether they could really resist an advanced OCR engine, or if I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running it through as many OCR engines as possible, for good measure.

    ...
    Last edited by Sebastiani; 05-25-2010 at 01:30 AM.

  7. #7
    Registered User
    Join Date
    Aug 2006
    Posts
    163
    Quote Originally Posted by Sebastiani View Post
    Yeah, the values of the parameters definitely depend on the font type and size, as well as the foreground/background colors being used. Choosing the best fit is something of a black art, I suppose.

    The three images attached below (pasted together into one) were generated with the same settings (0.2, 0.025), and are IMO pretty easy to read (it may help to either lean back a bit or, better yet, look at them at a sharp angle). What I'd really like to know is whether they could really resist an advanced OCR engine, or if I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running it through as many OCR engines as possible, for good measure.

    ...

    You want a good reliable way to beat the machines? Take a simple one word/character string input, print it to the screen using your noise-a-fier twice along with a third word/character string of equal length (all in random order, of course). Tell your user to type out the word that appears twice.

    This not only gives your human user two chances to distinguish the word, but also confuses the machine a bit more by adding a layer of logic instead of just edge detection.
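
    A minimal sketch of that challenge logic, assuming the rendering through the noise filter happens elsewhere. The names Challenge, make_challenge and check_answer are hypothetical and not taken from any existing library:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <random>
#include <string>

// Hypothetical challenge: three strings to render, one of which appears twice.
struct Challenge {
    std::array<std::string, 3> words;  // display order, already shuffled
    std::string answer;                // the word that appears twice
};

// Build a challenge from a target word and a decoy of equal length.
Challenge make_challenge(const std::string& target,
                         const std::string& decoy,
                         std::mt19937& rng)
{
    // The decoy must look like a plausible candidate: same length, different text.
    assert(target.size() == decoy.size() && target != decoy);
    Challenge c;
    c.words = {{target, target, decoy}};
    c.answer = target;
    std::shuffle(c.words.begin(), c.words.end(), rng);  // random display order
    return c;
}

// The user passes only by typing the duplicated word.
bool check_answer(const Challenge& c, const std::string& typed)
{
    return typed == c.answer;
}
```

    Each entry of words would then be drawn separately with fresh noise, so the two copies of the target do not match pixel for pixel.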

  8. #8
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    Quote Originally Posted by Sebastiani View Post
    What I'd really like to know is whether they could really resist an advanced OCR engine, or if I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running it through as many OCR engines as possible, for good measure.
    I'm no OCR expert by any means, but to me the easiest way to read this with OCR would be to acknowledge the simple fact that the readable text is black text on a white background. Running this through a filter that replaces all exposed, non-black pixels with white pixels, and recolors all non-black pixels surrounded by black pixels (all adjacent, or perhaps a minimum 7 of 9 adjacent pixels) to black would yield a very readable image that even the most simple OCR algorithms can read.
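
    That filter could be sketched roughly as follows. The Rgb and Image types, the darkness cutoff of 40 and the function names are assumptions for illustration, and the "7 of 9" threshold is read here as at least 7 of the 8 surrounding pixels:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

struct Image {
    int width;
    int height;
    std::vector<Rgb> pixels;  // row-major, width * height entries
    const Rgb& at(int x, int y) const
    {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// "Black" means every channel is at or below a cutoff (an assumed tolerance).
bool is_black(const Rgb& p, std::uint8_t cutoff)
{
    return p.r <= cutoff && p.g <= cutoff && p.b <= cutoff;
}

// Black pixels stay black, non-black pixels enclosed by black become black,
// and every other pixel is treated as exposed noise and turned white.
Image clean_black_on_white(const Image& in,
                           std::uint8_t cutoff = 40,
                           int min_black_neighbours = 7)
{
    assert(in.pixels.size() ==
           static_cast<std::size_t>(in.width) * static_cast<std::size_t>(in.height));
    Image out = in;
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            int black_neighbours = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    const bool is_centre = (dx == 0 && dy == 0);
                    const bool outside = nx < 0 || ny < 0 ||
                                         nx >= in.width || ny >= in.height;
                    if (is_centre || outside) continue;
                    if (is_black(in.at(nx, ny), cutoff)) ++black_neighbours;
                }
            }
            const bool ink = is_black(in.at(x, y), cutoff) ||
                             black_neighbours >= min_black_neighbours;
            out.pixels[static_cast<std::size_t>(y) * in.width + x] =
                ink ? Rgb{0, 0, 0} : Rgb{255, 255, 255};
        }
    }
    return out;
}
```

    The neighbour counts are taken from the input image rather than the partly cleaned output, so the result does not depend on scan order.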

  9. #9
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by SlyMaelstrom View Post
    I'm no OCR expert by any means, but to me the easiest way to read this with OCR would be to acknowledge the simple fact that the readable text is black text on a white background.
    Yeah, I was thinking about this too.

    For illustration purposes, I ran the second picture in Lab mode, extracted the lightness channel and applied a levels adjustment to it (reducing darkness and increasing the midtones). The image below was the result. Fairly easy to OCR. Now, my approach may not even be the easiest. But I think that by using Lab mode, I guarantee I can tackle any kind of noise in the image. As far as coding is concerned, I suppose this could be fairly easily achieved by someone bent on defeating this captcha.

    Attachment 9804
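
    In code, the same two steps might look like this. The conversion uses the standard sRGB to CIE L* formulas; the black point, white point and gamma passed to levels() are placeholders, since the values used for the attachment are not stated:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// sRGB channel (0..255) to linear light (0..1), per the sRGB transfer function.
double srgb_to_linear(std::uint8_t channel)
{
    const double v = channel / 255.0;
    return v <= 0.04045 ? v / 12.92 : std::pow((v + 0.055) / 1.055, 2.4);
}

// CIE L* (0 = black, 100 = white) of an sRGB pixel: the lightness channel of Lab.
double lab_lightness(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    // Relative luminance Y for sRGB primaries with a D65 white point.
    const double y = 0.2126 * srgb_to_linear(r) +
                     0.7152 * srgb_to_linear(g) +
                     0.0722 * srgb_to_linear(b);
    const double delta = 6.0 / 29.0;
    const double f = y > delta * delta * delta
                         ? std::cbrt(y)
                         : y / (3.0 * delta * delta) + 4.0 / 29.0;
    return 116.0 * f - 16.0;
}

// Levels adjustment: stretch [black_point, white_point] to [0, 1], then apply
// a midtone gamma. A gamma above 1 brightens the midtones.
double levels(double value, double black_point, double white_point, double gamma)
{
    assert(white_point > black_point && gamma > 0.0);
    const double t = (value - black_point) / (white_point - black_point);
    return std::pow(std::min(1.0, std::max(0.0, t)), 1.0 / gamma);
}
```

    Because L* ignores hue, colored speckle of similar lightness collapses into a narrow band that the levels stretch can push toward white, while dark text stays dark.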

  10. #10
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Mario F. View Post
    Yeah, I was thinking about this too.

    For illustration purposes, I ran the second picture in Lab mode, extracted the lightness channel and applied a levels adjustment to it (reducing darkness and increasing the midtones). The image below was the result. Fairly easy to OCR. Now, my approach may not even be the easiest. But I think that by using Lab mode, I guarantee I can tackle any kind of noise in the image. As far as coding is concerned, I suppose this could be fairly easily achieved by someone bent on defeating this captcha.
    That image can be even further cleaned up by a dilation/erosion:

    Attachment 9806
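
    For reference, a bare-bones 3x3 dilation and erosion on a binary mask could look like the following. It assumes the image has already been thresholded so that 1 marks ink; the order of the two passes used for the attachment is not stated, so both compositions are shown, and all names are made up for this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::vector<std::uint8_t>;  // row-major, 1 = ink, 0 = background

// One 3x3 pass. Dilation marks a pixel if any pixel in its window is ink;
// erosion marks it only if all of them are. Pixels outside the image are ignored.
Mask morph(const Mask& in, int width, int height, bool dilate)
{
    assert(in.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    Mask out(in.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            bool any = false;
            bool all = true;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    if (in[static_cast<std::size_t>(ny) * width + nx]) any = true;
                    else all = false;
                }
            }
            out[static_cast<std::size_t>(y) * width + x] = (dilate ? any : all) ? 1 : 0;
        }
    }
    return out;
}

// Closing (dilate, then erode) fills pinholes inside strokes.
Mask fill_holes(const Mask& in, int width, int height)
{
    return morph(morph(in, width, height, true), width, height, false);
}

// Opening (erode, then dilate) removes isolated specks.
Mask remove_specks(const Mask& in, int width, int height)
{
    return morph(morph(in, width, height, false), width, height, true);
}
```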

    Random noise can usually be removed. In this case I think it could be almost completely removed by calculating the variance of the chrominance. The noise is randomly distributed in chroma space, the signal is not.
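
    One way to read that suggestion, as a sketch: convert each pixel to chroma (the Cb/Cr part of the JFIF YCbCr transform, without the +128 offset) and measure how much the chroma varies inside a small window. Flat regions such as plain ink or plain paper score near zero, while colored speckle scores high. The 3x3 window and the names below are assumptions, and the cutoff above which a pixel counts as noise would need tuning:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };
struct Chroma { double cb, cr; };

// Chroma components of a pixel (JFIF coefficients, centred on zero).
Chroma chroma_of(const Rgb& p)
{
    Chroma c;
    c.cb = -0.168736 * p.r - 0.331264 * p.g + 0.5 * p.b;
    c.cr = 0.5 * p.r - 0.418688 * p.g - 0.081312 * p.b;
    return c;
}

// Variance of chroma in the 3x3 window centred on (x, y), clipped at the borders.
// Computed as E[|c|^2] - |E[c]|^2 over the two chroma components.
double local_chroma_variance(const std::vector<Rgb>& pixels,
                             int width, int height, int x, int y)
{
    assert(pixels.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    double sum_cb = 0.0;
    double sum_cr = 0.0;
    double sum_sq = 0.0;
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int nx = x + dx;
            const int ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            const Chroma c = chroma_of(pixels[static_cast<std::size_t>(ny) * width + nx]);
            sum_cb += c.cb;
            sum_cr += c.cr;
            sum_sq += c.cb * c.cb + c.cr * c.cr;
            ++n;
        }
    }
    const double mean_cb = sum_cb / n;
    const double mean_cr = sum_cr / n;
    return sum_sq / n - (mean_cb * mean_cb + mean_cr * mean_cr);
}
```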
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  11. #11
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by brewbuck View Post
    That image can be even further cleaned up by a dilation/erosion:

    Attachment 9806

    Random noise can usually be removed. In this case I think it could be almost completely removed by calculating the variance of the chrominance. The noise is randomly distributed in chroma space, the signal is not.
    Wow, yeah, I'm definitely going to need a more sophisticated algorithm!

    Thanks again for all of the input, guys.

  12. #12
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Sebastiani
    Are you human?
    I have not had the opportunity to pass the test of humanity involving the gom jabbar.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  13. #13
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by laserlight View Post
    I have not had the opportunity to pass the test of humanity involving the gom jabbar.
    Hmm, that would've come in handy on my last date (don't ask)...

  14. #14
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Quote Originally Posted by laserlight View Post
    I have not had the opportunity to pass the test of humanity involving the gom jabbar.
    I kept my hand in the box.
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  15. #15
    Registered User
    Join Date
    Oct 2008
    Posts
    1,262
    Fairly unreadable, the first one. However, both of them become a fair bit more readable if you open them in GIMP and run the Despeckle filter with the following settings:
    Adaptive: yes
    Recursive: no
    Radius: 1
    Black level: -1
    White level: 103

    But of course that also means they become a fair bit more readable to computers by simply running them through that filter. So, all in all, a computer program would have little difficulty reading them, I believe.
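
    Despeckle is at heart a median filter, so the core of it can be sketched in a few lines. This is a plain fixed-radius median on a grayscale buffer, not GIMP's implementation: it leaves out the adaptive radius and the black/white level settings, and the function name is made up:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Replace every pixel by the median of the pixels within `radius` of it.
// Windows are clipped at the image borders.
std::vector<std::uint8_t> median_filter(const std::vector<std::uint8_t>& gray,
                                        int width, int height, int radius = 1)
{
    assert(radius >= 0);
    assert(gray.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::vector<std::uint8_t> out(gray.size());
    std::vector<std::uint8_t> window;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            window.clear();
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    window.push_back(gray[static_cast<std::size_t>(ny) * width + nx]);
                }
            }
            // nth_element places the median at `mid` without a full sort.
            auto mid = window.begin() + static_cast<std::ptrdiff_t>(window.size() / 2);
            std::nth_element(window.begin(), mid, window.end());
            out[static_cast<std::size_t>(y) * width + x] = *mid;
        }
    }
    return out;
}
```

    A single dark speck on a light background is outvoted by its neighbours and disappears, which is why isolated noise pixels offer so little protection.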

    Last Post: 08-21-2002, 11:41 AM