Thread: Are you human?

  1. #1
    Sebastiani (Guest)
    Join Date: Aug 2001
    Location: Waterloo, Texas
    Posts: 5,708

    Are you human?

    So I've been working on this captcha algorithm recently, and I have a pretty good feeling about it. Unfortunately, I haven't yet had the time to set up a proper testing environment (e.g., OCR analyzers and such), but my gut feeling is that it should be sufficiently secure against attacks.

    Below, I've attached two sample output files (the input file used a black font over a white background). Notice that in the first one (using the default settings) the smaller fonts are fairly hard to read. In that case, the parameters can be tweaked to sharpen the image (the second snapshot). The image can also be blurred, if necessary.

    Anyway, what do you guys think? Will it work?
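Sebastiani doesn't describe his algorithm, so what follows is not his method. It is only a generic C++ sketch of the kind of pixel-noise transform being discussed, assuming a binary bitmap (`true` = black ink) and two hypothetical knobs: the probability of speckling a background pixel black and the probability of punching a hole in the ink. These are not claimed to correspond to the settings mentioned later in the thread. Both must lie in [0, 1].

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical binary bitmap: true = black (ink), false = white (background).
using Bitmap = std::vector<std::vector<bool>>;

// Illustrative noise-ifier (not the original poster's algorithm):
// each background pixel turns black with probability bg_noise, and each
// ink pixel turns white with probability ink_dropout. A fixed seed makes
// the output reproducible.
Bitmap add_noise(const Bitmap& in, double bg_noise, double ink_dropout,
                 unsigned seed)
{
    std::mt19937 rng(seed);
    std::bernoulli_distribution speck(bg_noise);    // background -> black
    std::bernoulli_distribution hole(ink_dropout);  // ink -> white

    Bitmap out = in;
    for (auto& row : out)
        for (std::size_t x = 0; x < row.size(); ++x)
            row[x] = row[x] ? !hole(rng) : speck(rng);
    return out;
}
```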

    .....
    Code:
    #include <cmath>
    #include <complex>
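    // Returns !value via Euler's identity: atan(1) = pi/4, so the exponent
    // is i*pi when value is false (e^(i*pi) = -1, real part < 0 -> true)
    // and 2*i*pi when value is true (e^(2*i*pi) = 1 -> false).
    bool euler_flip(bool value)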
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  2. #2
    zacs7 (Woof, woof!)
    Join Date: Mar 2007
    Location: Australia
    Posts: 3,459
    I think CAPTCHAs are inherently flawed anyway, because people "beat" them by putting the CAPTCHA on another website (e.g., a porn site) so that a real human fills it out.

    But they do look good, although they're a bit hard to read -- even for my 20-year-old eyes. The first is a lot harder to read than the second; it would be good to see how OCR does on the first vs. the second. If I didn't know what the words were, I doubt being able to read a few characters would help me. For example, "destruction" could easily read as "description", IMO.
    Last edited by zacs7; 05-24-2010 at 05:30 PM.

  3. #3
    Mario F. ((?<!re)tired)
    Join Date: May 2006
    Location: Ireland
    Posts: 8,446
    Can't really read the first image except for the bigger letters, and even those only badly. That would be a terrible choice.

    The second one... well, I can't read the first two lines. The rest I could read. But it would depend on whether you use normal English words or jumbled ones. If the latter, probably not. With that font I would probably be confused by the "h" looking like an "n", etc.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  4. #4
    SlyMaelstrom (Devil's Advocate)
    Join Date: May 2004
    Location: Out of scope
    Posts: 4,079
    I read the second with no problem; the first, however, is pretty tough, specifically the sixth word on the first line. That word is kind of difficult to make out even in the second one, and I might not have gotten it without the context.
    Sent from my iPad®

  5. #5
    Sebastiani (Guest)
    Join Date: Aug 2001
    Location: Waterloo, Texas
    Posts: 5,708
    Yeah, the values of the parameters definitely depend on the font type and size, as well as the foreground/background colors being used. Choosing the best fit is something of a black art, I suppose.

    The three images attached below (pasted together into one) were generated with the same settings (0.2, 0.025) and are, IMO, pretty easy to read (it may help some people to lean back a bit or, better yet, to look at them at a sharp angle). What I'd really like to know is whether they could really resist an advanced OCR engine, or whether I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running them through as many OCR engines as possible, for good measure.

    ...
    Last edited by Sebastiani; 05-25-2010 at 01:30 AM.

  6. #6
    laserlight (C++ Witch)
    Join Date: Oct 2003
    Location: Singapore
    Posts: 28,413
    Quote Originally Posted by Sebastiani
    Are you human?
    I have not had the opportunity to pass the test of humanity involving the gom jabbar.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    Sebastiani (Guest)
    Join Date: Aug 2001
    Location: Waterloo, Texas
    Posts: 5,708
    Quote Originally Posted by laserlight
    I have not had the opportunity to pass the test of humanity involving the gom jabbar.
    Hmm, that would've come in handy on my last date (don't ask)...

  8. #8
    hk_mp5kpdw (Registered User)
    Join Date: Jan 2002
    Location: Northern Virginia/Washington DC Metropolitan Area
    Posts: 3,817
    Quote Originally Posted by laserlight
    I have not had the opportunity to pass the test of humanity involving the gom jabbar.
    I kept my hand in the box.
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  9. #9
    EVOEx (Registered User)
    Join Date: Oct 2008
    Posts: 1,262
    Fairly unreadable, the first one. However, both of them become a fair bit more readable if you open them in GIMP and run the despeckle filter with the following settings:
    - Adaptive: yes
    - Recursive: no
    - Radius: 1
    - Black level: -1
    - White level: 103

    But of course, that also means it becomes a fair bit more readable to computers simply by running that filter over it. So, all in all, a computer program would have little difficulty reading it, I believe.
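GIMP's despeckle is an adaptive median filter. The C++ sketch below is a simplified, non-adaptive stand-in (a fixed 3x3 median over a binary image) that shows why this kind of noise is easy to strip: for black/white pixels the median of a window is just a majority vote, so isolated specks and pinholes vanish while thick strokes survive. The `Bitmap` type is an assumption for illustration.

```cpp
#include <cstddef>
#include <vector>

using Bitmap = std::vector<std::vector<bool>>;  // true = black

// Simplified, non-adaptive 3x3 median filter for a binary image.
// For black/white pixels the median of the 9-pixel window is a majority
// vote (at least 5 black -> black). Pixels outside the image count as white.
Bitmap median3x3(const Bitmap& in)
{
    const std::size_t h = in.size();
    const std::size_t w = h ? in[0].size() : 0;
    Bitmap out(h, std::vector<bool>(w, false));
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x) {
            int black = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    long ny = static_cast<long>(y) + dy;
                    long nx = static_cast<long>(x) + dx;
                    if (ny >= 0 && nx >= 0 && ny < static_cast<long>(h) &&
                        nx < static_cast<long>(w) && in[ny][nx])
                        ++black;
                }
            out[y][x] = black >= 5;  // median of 9 binary values
        }
    return out;
}
```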

  10. #10
    Registered User
    Join Date: Aug 2006
    Posts: 163
    Quote Originally Posted by Sebastiani
    Yeah, the values of the parameters definitely depend on the font type and size, as well as the foreground/background colors being used. Choosing the best fit is something of a black art, I suppose.

    The three images attached below (pasted together into one) were generated with the same settings (0.2, 0.025) and are, IMO, pretty easy to read (it may help some people to lean back a bit or, better yet, to look at them at a sharp angle). What I'd really like to know is whether they could really resist an advanced OCR engine, or whether I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running them through as many OCR engines as possible, for good measure.

    ...

    You want a good, reliable way to beat the machines? Take a simple one-word/character-string input and print it to the screen twice using your noise-a-fier, along with a third word/character string of equal length (all in random order, of course). Tell your user to type out the word that appears twice.

    This not only gives your human user two chances to make out the word, but also confuses the machine a bit more by adding a layer of logic on top of plain edge detection.
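A minimal C++ sketch of the challenge logic for this suggestion. The `TwiceChallenge` type and the function names are hypothetical, and rendering each word through the noise filter is left out; the point is only that the target is shown twice and a same-length decoy once, in shuffled order, and only the duplicated word is accepted.

```cpp
#include <algorithm>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical challenge type. Each entry in `shown` would be rendered
// through the noise filter (omitted here); `answer` is what must be typed.
struct TwiceChallenge {
    std::vector<std::string> shown;
    std::string answer;
};

// Target appears twice and the decoy once, in random order. The decoy must
// have the same length as the target (so length gives nothing away) and
// must differ from it.
TwiceChallenge make_challenge(const std::string& target,
                              const std::string& decoy, unsigned seed)
{
    if (decoy.size() != target.size() || decoy == target)
        throw std::invalid_argument("decoy must differ but have the same length");
    TwiceChallenge c{{target, target, decoy}, target};
    std::mt19937 rng(seed);
    std::shuffle(c.shown.begin(), c.shown.end(), rng);
    return c;
}

// Only the word that was shown twice is accepted.
bool check_answer(const TwiceChallenge& c, const std::string& typed)
{
    return typed == c.answer;
}
```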

  11. #11
    SlyMaelstrom (Devil's Advocate)
    Join Date: May 2004
    Location: Out of scope
    Posts: 4,079
    Quote Originally Posted by Sebastiani
    What I'd really like to know is whether they could really resist an advanced OCR engine, or whether I'm, in fact, seriously underestimating the problem. Well, that's the next step, anyway, and I plan on running them through as many OCR engines as possible, for good measure.
    I'm no OCR expert by any means, but to me the easiest way to read this with OCR would be to exploit the simple fact that the readable text is black text on a white background. Running it through a filter that replaces all exposed non-black pixels with white pixels, and recolors to black all non-black pixels surrounded by black pixels (all adjacent ones, or perhaps a minimum of 7 of 9 adjacent pixels), would yield a very readable image that even the simplest OCR algorithms can read.
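One way to read the filter described above, as a C++ sketch: threshold a grayscale image so that anything not near-black becomes white, then fill in white pixels that are mostly surrounded by black. The threshold (`black_level`) and neighbour count are hypothetical knobs; the sketch counts the 8 neighbours of each white pixel and defaults to 7, which is only one interpretation of "7 of 9".

```cpp
#include <cstddef>
#include <vector>

using Gray = std::vector<std::vector<unsigned char>>;  // 0 = black, 255 = white
using Bitmap = std::vector<std::vector<bool>>;          // true = black

// Interpretation of the described attack, not a tested OCR pre-processor.
Bitmap clean_for_ocr(const Gray& in, unsigned char black_level = 64,
                     int min_neighbours = 7)
{
    const std::size_t h = in.size();
    const std::size_t w = h ? in[0].size() : 0;

    // Step 1: anything that isn't (near-)black becomes white.
    Bitmap bw(h, std::vector<bool>(w, false));
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            bw[y][x] = in[y][x] <= black_level;

    // Step 2: a white pixel with at least min_neighbours black neighbours
    // is treated as a hole punched into a glyph and filled back in.
    Bitmap out = bw;
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x) {
            if (bw[y][x]) continue;
            int black = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dy == 0 && dx == 0) continue;
                    long ny = static_cast<long>(y) + dy;
                    long nx = static_cast<long>(x) + dx;
                    if (ny >= 0 && nx >= 0 && ny < static_cast<long>(h) &&
                        nx < static_cast<long>(w) && bw[ny][nx])
                        ++black;
                }
            if (black >= min_neighbours) out[y][x] = true;
        }
    return out;
}
```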

  12. #12
    Salem (and the hat of int overfl)
    Join Date: Aug 2001
    Location: The edge of the known universe
    Posts: 39,660
    Picking the same font face and size makes possible attacks a lot easier. If people find the right noise filter, the result is dead easy for any OCR to deal with.

    Make each letter
    - a different size (within some limit)
    - a different font
    - a different face (bold, italic)
    - a different orientation

    Where letters end up overlapping, experiment with AND/OR/XOR logic for combining pixels.
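The per-letter randomization needs a font renderer, which is out of scope here, but the overlap rule is easy to sketch. This C++ example (hypothetical names, binary `Bitmap` with `true` = black) stamps a glyph onto a canvas and combines overlapping pixels with AND, OR, or XOR. With black ink as `true`, OR is ordinary overprinting, XOR turns overlapping ink white, and AND keeps only ink that canvas and glyph share inside the glyph's box.

```cpp
#include <cstddef>
#include <vector>

using Bitmap = std::vector<std::vector<bool>>;  // true = black

enum class Blend { And, Or, Xor };

// Stamp a glyph bitmap onto a canvas at (top, left), combining each pixel
// with the chosen logical operator. Glyph pixels that fall off the canvas
// are clipped. Producing the glyph (random font, size, face, rotation) is
// assumed to happen elsewhere.
void stamp(Bitmap& canvas, const Bitmap& glyph, std::size_t top,
           std::size_t left, Blend op)
{
    for (std::size_t y = 0; y < glyph.size(); ++y)
        for (std::size_t x = 0; x < glyph[y].size(); ++x) {
            std::size_t cy = top + y, cx = left + x;
            if (cy >= canvas.size() || cx >= canvas[cy].size()) continue;
            bool a = canvas[cy][cx], b = glyph[y][x];
            switch (op) {
                case Blend::And: canvas[cy][cx] = a && b; break;
                case Blend::Or:  canvas[cy][cx] = a || b; break;
                case Blend::Xor: canvas[cy][cx] = a != b; break;
            }
        }
}
```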
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  13. #13
    MK27 (spurious conceit)
    Join Date: Jul 2008
    Location: segmentation fault
    Posts: 8,300
    Quote Originally Posted by EVOEx
    But of course, that also means it becomes a fair bit more readable to computers simply by running that filter over it. So, all in all, a computer program would have little difficulty reading it, I believe.
    Quote Originally Posted by Salem
    Picking the same font face and size makes possible attacks a lot easier.
    Yeah, I thought most of what makes an effective CAPTCHA is not obscuring via filter but warping and overlapping. A lot of them don't obscure at all; they are just very warped.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  14. #14
    Yarin (Unregistered User)
    Join Date: Jul 2007
    Posts: 2,158
    I agree the first one is harder to read; I wouldn't use something that hard as a CAPTCHA.

    I don't think the second one is good, though. Think about it: hunt down all the black(ish) pixels, heed only those, trace the edges, and compare the general shape with a database of the single, simple font that you're using.
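The attack described here can be sketched as nearest-template classification. This C++ example is a simplification: it compares pixels directly instead of tracing edges, and it assumes each glyph has already been isolated and scaled to the template size, which a single fixed font makes much easier. The templates would come from rendering each character in that one font; the names are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

using Bitmap = std::vector<std::vector<bool>>;  // true = black

// Number of pixels where two bitmaps disagree. Assumes equal sizes.
std::size_t mismatch(const Bitmap& a, const Bitmap& b)
{
    std::size_t diff = 0;
    for (std::size_t y = 0; y < a.size(); ++y)
        for (std::size_t x = 0; x < a[y].size(); ++x)
            if (a[y][x] != b[y][x]) ++diff;
    return diff;
}

// Return the character whose template differs from the glyph in the
// fewest pixels ('?' if there are no templates).
char classify(const Bitmap& glyph, const std::map<char, Bitmap>& templates)
{
    char best = '?';
    std::size_t best_diff = std::numeric_limits<std::size_t>::max();
    for (const auto& entry : templates) {
        std::size_t d = mismatch(glyph, entry.second);
        if (d < best_diff) {
            best_diff = d;
            best = entry.first;
        }
    }
    return best;
}
```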

    I would advise using a variety of fonts and colors, doing a lot of warping and resizing, and combining it with a similarly simple background to further hinder tracing methods. Static like that, applied over the whole thing, hurts a human just as much as it hurts a machine, i.e., don't do it.
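Warping can be as simple as a sine-wave displacement. This C++ sketch shifts each column of a binary bitmap vertically by `amplitude * sin(2*pi*x/period + phase)` pixels; the parameter names are illustrative, and real CAPTCHAs typically use smoother two-dimensional distortions. Randomizing the phase and period per challenge would keep the warp from being one fixed, learnable transform.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Bitmap = std::vector<std::vector<bool>>;  // true = black

// Sine-wave warp: column x is shifted down by
// round(amplitude * sin(2*pi*x / period + phase)) pixels (negative = up).
// Pixels shifted in from outside the image are white.
Bitmap wave_warp(const Bitmap& in, double amplitude, double period,
                 double phase)
{
    const double pi = std::acos(-1.0);
    const std::size_t h = in.size();
    const std::size_t w = h ? in[0].size() : 0;
    Bitmap out(h, std::vector<bool>(w, false));
    for (std::size_t x = 0; x < w; ++x) {
        long shift = std::lround(amplitude * std::sin(2 * pi * x / period + phase));
        for (std::size_t y = 0; y < h; ++y) {
            long src = static_cast<long>(y) - shift;  // source row
            if (src >= 0 && src < static_cast<long>(h))
                out[y][x] = in[src][x];
        }
    }
    return out;
}
```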

  15. #15
    SlyMaelstrom (Devil's Advocate)
    Join Date: May 2004
    Location: Out of scope
    Posts: 4,079
    Quote Originally Posted by MK27
    Yeah, I thought most of what makes an effective CAPTCHA is not obscuring via filter but warping and overlapping. A lot of them don't obscure at all; they are just very warped.
    Sometimes too warped. Bs and 8s, Is and 1s, 5s and Ss... they all get very confusing even to the human eye, which I feel is the most innate problem with CAPTCHA technology in the first place. In many cases, they demand multiple tries from the end user, and sites generally offer a way to keep skipping complicated CAPTCHAs until you find one you can fill out. Tools designed to crack CAPTCHAs frequently use the same method.
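One common mitigation for the look-alike problem is simply not to generate those characters. This C++ sketch drops the pairs named in the post (B/8, I/1, 5/S) and, as an extra assumption, O/0, then draws random challenge strings from what remains. The exact exclusion list is a judgment call, and the names are hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Look-alike characters to exclude: B/8, I/1, 5/S from the post, plus O/0
// (an added assumption).
const std::string kConfusable = "B8I15SO0";

// Uppercase letters and digits minus the confusable ones.
std::string safe_alphabet()
{
    const std::string all = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    std::string out;
    for (char c : all)
        if (kConfusable.find(c) == std::string::npos) out += c;
    return out;
}

// Random challenge text drawn only from the safe alphabet.
std::string random_challenge(std::size_t length, unsigned seed)
{
    const std::string alphabet = safe_alphabet();
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
    std::string s;
    for (std::size_t i = 0; i < length; ++i) s += alphabet[pick(rng)];
    return s;
}
```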

    By the way, I'm curious to know if anyone has developed a CAPTCHA-cracking tool that uses sound analysis to take advantage of the blind-accessible CAPTCHA forms. Has anyone seen something like this?
