Thread: Are you human?

  1. #16
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by SlyMaelstrom
    I'm no OCR expert by any means, but to me the easiest way to read this with OCR would be to acknowledge the simple fact that the readable text is black text on a white background.
    Yeah, I was thinking about this too.

    For illustration purposes, I converted the second picture to Lab mode, extracted the lightness channel and applied a levels adjustment to it (reducing darkness and increasing the midtones). The image below was the result. Fairly easy to OCR. Now, my approach may not even be the easiest. But I think that by using Lab mode, I guarantee I can tackle any kind of noise in the image. As far as coding is concerned, I suppose this could be fairly easily achieved by someone bent on defeating this captcha.

    Attachment 9804
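
    A minimal sketch of the preprocessing described above, assuming 8-bit sRGB input pixels. The function names and the black/white/gamma parameters are hypothetical; the post does not state the actual levels settings, which were presumably picked by eye in an image editor.

    ```cpp
    #include <algorithm>
    #include <cassert>
    #include <cmath>
    #include <cstdint>

    // Convert one 8-bit sRGB pixel to CIE L* (lightness, range 0..100).
    double lightness(std::uint8_t r, std::uint8_t g, std::uint8_t b)
    {
        // Undo the sRGB transfer curve to get linear light in 0..1.
        auto linear = [](std::uint8_t c) {
            double v = c / 255.0;
            return v <= 0.04045 ? v / 12.92 : std::pow((v + 0.055) / 1.055, 2.4);
        };
        // Relative luminance Y for sRGB primaries.
        double y = 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
        // CIE L* from Y, with the reference white at Y = 1.
        return y > 216.0 / 24389.0 ? 116.0 * std::cbrt(y) - 16.0
                                   : (24389.0 / 27.0) * y;
    }

    // Levels adjustment on a lightness value: stretch [black, white] to the
    // full range, then apply a midtone gamma (gamma > 1 lifts the midtones).
    // The parameter values are assumptions to be tuned per image.
    double levels(double l, double black, double white, double gamma)
    {
        assert(white > black && gamma > 0.0);
        double t = (l - black) / (white - black);
        t = std::min(1.0, std::max(0.0, t));
        return 100.0 * std::pow(t, 1.0 / gamma);
    }
    ```

    Thresholding the output of levels() (for example, treating anything under 50 as ink) would then give a black-on-white image to hand to the OCR.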
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #17
    Unregistered User Yarin's Avatar
    Join Date
    Jul 2007
    Posts
    2,158
    Quote Originally Posted by SlyMaelstrom
    By the way, I'm curious to know if anyone has developed a CAPTCHA cracking tool that uses sound analysis to take advantage of the blind-accessible CAPTCHA forms. Has anyone seen something like this?
    I've wondered about this too. I haven't tried, but I think you could refine Sphinx to do that pretty well. For the most part CAPTCHAs hold back individuals; I'm sure groups, like professional spammers, aren't hindered.
    Last edited by Yarin; 05-25-2010 at 07:38 PM. Reason: correction

  3. #18
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    I was just thinking about this. Why not use a large set of photographs or drawings of simple objects and ask the user to identify them? I suppose it would be hard to find enough objects that have unique names, but you could couple that with a human language question, such as "Is this a man or a woman?" for one pic, "What color is the bus?" for another, or "How many apples are in the bowl?" for another.

    You could generate large sets of such picture/question combinations, and then automate simple changes in the images to prevent someone from simply building a library of correct answers. Just making such a library work would require a lot more complex work than cracking a captcha, and a sufficiently clever system of this sort might be unbreakable.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #19
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by MK27
    Why not use a large set of photographs or drawings of simple objects and ask the user to identify them?
    It is certainly possible, and such initiatives already exist, e.g., ASIRRA.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #20
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    Quote Originally Posted by MK27
    I was just thinking about this. Why not use a large set of photographs or drawings of simple objects and ask the user to identify them? I suppose it would be hard to find enough objects that have unique names, but you could couple that with a human language question, such as "Is this a man or a woman?" for one pic, "What color is the bus?" for another, or "How many apples are in the bowl?" for another.
    It's been done and I've been on the user end of it. I can't recall exactly where I've seen it, but I have. The problem with this method is that the data sample would be small enough that one could simply brute force it through prior knowledge of one or more of the questions being asked. Even if you were to have 1,000 unique questions and gave users two attempts before blocking the IP, you'd still only need a relatively modest list of anonymous proxies in order to break it while only knowing the solution to a single question. You could probably even apply the birthday paradox to this one as one might expect at least a few solutions to be a low number between 1 and 9.
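
    A back-of-the-envelope sketch of the numbers above. It assumes each attempt draws a question uniformly at random from the pool, independently of earlier attempts, and that the attacker simply gives up on any question it does not know. The function names are hypothetical.

    ```cpp
    #include <cassert>
    #include <cmath>

    // Chance that a single IP gets through before it is blocked, given a pool
    // of `questions`, of which the attacker knows `known`, with `attempts`
    // tries allowed per IP.
    double success_per_ip(int questions, int known, int attempts)
    {
        assert(questions > 0 && known >= 0 && known <= questions && attempts >= 0);
        double miss = 1.0 - static_cast<double>(known) / questions;
        return 1.0 - std::pow(miss, attempts);
    }

    // Mean number of IPs (proxies) burned until the first success
    // (mean of a geometric distribution).
    double expected_ips(int questions, int known, int attempts)
    {
        double p = success_per_ip(questions, known, attempts);
        assert(p > 0.0);
        return 1.0 / p;
    }
    ```

    With the figures from the post, success_per_ip(1000, 1, 2) is about 0.002, so on average roughly 500 proxies are needed per successful pass. That is an average under the stated assumptions, not a guarantee.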
    Sent from my iPad®

  6. #21
    the hat of redundancy hat nvoigt's Avatar
    Join Date
    Aug 2001
    Location
    Hannover, Germany
    Posts
    3,130
    Quote Originally Posted by zacs7
    I think CAPTCHAs are flawed by nature anyway, due to people "beating them" by putting the CAPTCHA on another website (i.e. a porn website) so a real human fills it out.
    How about reversing it? We put porn pictures on the site instead of a captcha, and if the user clicks through in less than ($porn) seconds, it's a bot and gets kicked.
    hth
    -nv

    She was so Blonde, she spent 20 minutes looking at the orange juice can because it said "Concentrate."

    When in doubt, read the FAQ.
    Then ask a smart question.

  7. #22
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by SlyMaelstrom
    The problem with this method is that the data sample would be small enough that one could simply brute force it through prior knowledge of one or more of the questions being asked.
    Yes, this is a challenge. From the ASIRRA website:
    Past projects have used photographs to tell computers and humans apart. Examples include Carnegie Mellon's PIX CAPTCHA, Oli Warner's KittenAuth, and work done by Chew and Tygar. These projects have a common weakness: they use relatively small image databases. There's a fundamental reason for this. It's difficult for a computer to automatically classify pictures with high accuracy — that's why the task is useful as a HIP. An image database small enough to be constructed manually by a researcher is also small enough to be manually reconstructed by an attacker.

    Asirra is different because of our unique partnership with Petfinder.com, the world's largest site devoted to finding homes for homeless pets. They've provided us with over three million images of cats and dogs, manually classified by people at thousands of animal shelters across the United States.

  8. #23
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by SlyMaelstrom
    It's been done and I've been on the user end of it. I can't recall exactly where I've seen it, but I have. The problem with this method is that the data sample would be small enough that one could simply brute force it through prior knowledge of one or more of the questions being asked
    Sure, a naive implementation will have this problem. But suppose you regularly (say once a week) generated a large random list of objects, retrieved images for them with Google Images (to automate it), and let the user have three guesses (or move on to another object). Then "brute force through prior knowledge" (this is what I meant by a library of answers) will be a failing venture.

    Incorporating the question adds some problems because if the element is simple enough to automate changes to (color or number) then it would also be easy to brute force *unless* you are only given one chance. If the question is truly simple ("How many apples in the bowl?") a human being will get it right the first time, but a bot never will. Of course, you really do have to vary the number of apples in the bowl. I think it will be much easier to present images with numbers of things that will be much more difficult to machine analyze than letters in a CAPTCHA, but also very easy to keep varied (3 pieces of fruit, 4 pieces of fruit, 5 pieces of fruit -- a bowl with an apple, an orange, a banana, and a truck in it, hmmm.)

    As for the birthday problem, that is a problem, but I have only spent <5 minutes thinking about it so far. It would not be hard to start collecting the "really dumb == bot" IPs in that case. If a human being cannot tell a piece of fruit from a truck 90% of the time or better, they aren't using the web anyway and hence will not get unfairly blacklisted.
    Last edited by MK27; 05-25-2010 at 01:07 PM.

  9. #24
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    The ASIRRA solution (asking you to correctly identify a set of pictures rather than just one) looks very good because a human will always get 10 out of 10, they have a constant supply of millions of appropriate photographs (which could never be collected for a bot database), and this very much reduces the birthday problem (there are 1024 possibilities just using "dog or cat"). You will have to wait for the year after next for your party.
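
    The 1024 figure is 2^10: ten independent dog-or-cat choices. A small sketch of that arithmetic, assuming every image is labelled independently with the same accuracy (the function name is hypothetical):

    ```cpp
    #include <cassert>
    #include <cmath>

    // Chance of labelling every image in one challenge correctly, assuming each
    // image is labelled independently with the same per-image accuracy.
    double pass_chance(int images, double per_image_accuracy)
    {
        assert(images >= 0 && per_image_accuracy >= 0.0 && per_image_accuracy <= 1.0);
        return std::pow(per_image_accuracy, images);
    }
    ```

    A coin-flipping bot has pass_chance(10, 0.5), i.e. 1 in 1024. The same formula shows why per-image accuracy matters so much: the pass rate falls off quickly as accuracy drops below 1.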

    I'd bet black and white shots of dissimilar objects taken from any and all angles could give you a pretty good database of this sort and be well beyond the potential of most cracking systems -- image analysis on that level is just way too processor intensive. You might as well hire someone at $2/hour somewhere to do it for you at that point.
    Last edited by MK27; 05-25-2010 at 01:19 PM.

  10. #25
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by Mario F.
    Yeah, I was thinking about this too.

    For illustration purposes, I converted the second picture to Lab mode, extracted the lightness channel and applied a levels adjustment to it (reducing darkness and increasing the midtones). The image below was the result. Fairly easy to OCR. Now, my approach may not even be the easiest. But I think that by using Lab mode, I guarantee I can tackle any kind of noise in the image. As far as coding is concerned, I suppose this could be fairly easily achieved by someone bent on defeating this captcha.

    Attachment 9804
    Ouch, yeah, even the crudest path-finding AI could connect the dots there quite easily. Unfortunately, when I posted that I had been programming for several days straight, and by the time I got around to creating the sample images, I hadn't put much time into selecting the settings! Can you confirm similar results with the "ibrkfy" sample I posted?

    Quote Originally Posted by Yarin
    I would advise using a variety of fonts and colors, doing a lot of warping and resizing, and combining it with a similar simple background to further hinder tracing methods. Static like that, applied over the whole thing, hurts a human as much as it hurts a machine, i.e., don't do it.

    Quote Originally Posted by Salem
    Picking the same font face and size makes possible attacks a lot easier. If people find the right noise filter, the result is dead easy for any OCR to deal with.

    Make each letter
    - a different size (within some limit)
    - a different font
    - a different face (bold, italic)
    - a different orientation

    Where letters end up overlapping, experiment with AND/OR/XOR logic for combining pixels.
    Good ideas, thanks!
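
    A rough sketch of the pixel-combining idea from Salem's list, assuming each letter has already been rendered into its own one-bit layer of the same size as the canvas. The Bitmap alias, the Blend enum and combine() are hypothetical names, not part of any captcha library.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    using Bitmap = std::vector<std::vector<bool>>; // true = ink

    enum class Blend { And, Or, Xor };

    // Combine two equally sized one-bit layers pixel by pixel.
    Bitmap combine(const Bitmap& a, const Bitmap& b, Blend mode)
    {
        assert(a.size() == b.size());
        Bitmap out(a.size());
        for (std::size_t r = 0; r < a.size(); ++r) {
            assert(a[r].size() == b[r].size());
            out[r].resize(a[r].size());
            for (std::size_t c = 0; c < a[r].size(); ++c) {
                bool result = false;
                switch (mode) {
                case Blend::And: result = a[r][c] && b[r][c]; break; // shared ink only
                case Blend::Or:  result = a[r][c] || b[r][c]; break; // plain overlay
                case Blend::Xor: result = a[r][c] != b[r][c]; break; // overlap is cut out
                }
                out[r][c] = result;
            }
        }
        return out;
    }
    ```

    Applied over the whole canvas, AND keeps only the ink the two letters share, so in practice it would probably be limited to the rectangle where the letters overlap, with OR used everywhere else.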
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  11. #26
    Registered User lpaulgib's Avatar
    Join Date
    May 2010
    Posts
    65
    Captchas are pretty cool. That's a long sentence to put in though. I like websites that ask you to answer a question to do something, like "What's 1 plus four?" or "How many hands does the average human have?". I wonder if those will gain in popularity in the future.

  12. #27
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by Sebastiani
    Can you confirm similar results with the "ibrkfy" sample I posted?
    I can't. The ibrkfy sample offers a lot more protection against that type of attack, as you can see below. You offer a lot less contrast and you have noise with the same luminance levels as the letters, which doesn't allow me to get rid of it while maintaining legibility for the OCR. I think it can still be read. But you'd need a very good OCR.

    The problem is that the "ibrkfy" sample is also a lot harder for a human to read,
    especially if you apply some of the good advice above, like warping and different fonts between letters.

    Attachment 9805

  13. #28
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by Mario F.
    I can't. The ibrkfy sample offers a lot more protection against that type of attack, as you can see below. You offer a lot less contrast and you have noise with the same luminance levels as the letters, which doesn't allow me to get rid of it while maintaining legibility for the OCR. I think it can still be read. But you'd need a very good OCR.

    The problem is that the "ibrkfy" sample is also a lot harder for a human to read,
    especially if you apply some of the good advice above, like warping and different fonts between letters.

    Attachment 9805
    Hmm, yep, I think you're right. And, of course, OCRs are only getting better, so even if it's sufficient now, it won't be for long...

    Anyway, thanks so much for running that through, Mario. Cheers!

  14. #29
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Mario F.
    Yeah, I was thinking about this too.

    For illustration purposes, I converted the second picture to Lab mode, extracted the lightness channel and applied a levels adjustment to it (reducing darkness and increasing the midtones). The image below was the result. Fairly easy to OCR. Now, my approach may not even be the easiest. But I think that by using Lab mode, I guarantee I can tackle any kind of noise in the image. As far as coding is concerned, I suppose this could be fairly easily achieved by someone bent on defeating this captcha.
    That image can be even further cleaned up by a dilation/erosion:

    Attachment 9806

    Random noise can usually be removed. In this case I think it could be almost completely removed by calculating the variance of the chrominance. The noise is randomly distributed in chroma space, the signal is not.
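
    A minimal sketch of the erosion/dilation cleanup, assuming the image has already been thresholded to a one-bit mask where true means ink. It uses a 3x3 structuring element and runs erosion first (a morphological opening); the post does not say which order or kernel size was used, and the right order depends on whether ink or paper is treated as the foreground. All names here are hypothetical.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    using Mask = std::vector<std::vector<bool>>; // true = ink (dark pixel)

    // One 3x3 morphological pass. Erode: a pixel stays ink only if its whole
    // 3x3 neighbourhood is ink. Dilate: a pixel becomes ink if any neighbour
    // is ink. Pixels outside the image count as background.
    Mask morph3x3(const Mask& in, bool erode)
    {
        const std::size_t h = in.size();
        const std::size_t w = h ? in[0].size() : 0;
        for (const auto& row : in) assert(row.size() == w);
        Mask out(h, std::vector<bool>(w, false));
        for (std::size_t y = 0; y < h; ++y) {
            for (std::size_t x = 0; x < w; ++x) {
                bool all = true, any = false;
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        const long ny = static_cast<long>(y) + dy;
                        const long nx = static_cast<long>(x) + dx;
                        const bool inside = ny >= 0 && nx >= 0 &&
                            ny < static_cast<long>(h) && nx < static_cast<long>(w);
                        const bool ink = inside &&
                            in[static_cast<std::size_t>(ny)][static_cast<std::size_t>(nx)];
                        all = all && ink;
                        any = any || ink;
                    }
                }
                out[y][x] = erode ? all : any;
            }
        }
        return out;
    }

    // Opening = erosion followed by dilation; removes ink specks that cannot
    // contain a full 3x3 block, while larger shapes keep roughly their size.
    Mask open3x3(const Mask& in)
    {
        return morph3x3(morph3x3(in, true), false);
    }
    ```

    Strokes thinner than three pixels would be erased along with the noise, so the kernel size has to be chosen against the stroke width of the font.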
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  15. #30
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by brewbuck
    That image can be even further cleaned up by a dilation/erosion:

    Attachment 9806

    Random noise can usually be removed. In this case I think it could be almost completely removed by calculating the variance of the chrominance. The noise is randomly distributed in chroma space, the signal is not.
    Wow, yeah, I'm definitely going to need a more sophisticated algorithm!

    Thanks again for all of the input, guys.
