Comment Spam Resumes

Have spammers figured out how to pick reCAPTCHA’s lock? All of a sudden I am getting hundreds of spam comments blocked by Akismet. When I added reCAPTCHA, it dropped to a few a month. Now it is 409 in a week.

Guess this is why layers of security are good.

UPDATE: Scanned through for false positives. The first word of many of them was the name of a Xanth character: Bink, Chameleon, Dolph, Iris, Smash, Goldy, Grundy, Cherie, Chester, Roogna, Imbri.

reCAPTCHA and Chrome

Was using this RSVP form with Google Chrome and found the reCAPTCHA kept telling me I had failed the Turing test. After the sixth time, I decided it might be my browser, so I tried it in Firefox, which worked fine.

Curious, I went looking for a possible problem between reCAPTCHA and Chrome. According to a post there, the Transitional XHTML DOCTYPE is the cause. Changing that DOCTYPE to Strict ought to fix the issue. Given the audience, I doubt anyone else is using Chrome to fill it out, so fixing it probably isn’t worth it to them.

Interesting. I’ll have to look into issues with Chrome and the XHTML Transitional DOCTYPE.


Turing Digitalization

Some 60 million CAPTCHAs are solved daily, according to Luis von Ahn (on Wired Science on PBS). His project, reCAPTCHA, will put words that OCR could not read while digitizing books into these challenges, so that people solve those unknown words in a quasi-automated sort of way.

I wonder, though. Even if reCAPTCHA a) becomes the default at major sites like Yahoo or Google and b) is solved 100% right every time, how many books would be completed per day? Certainly no one really comments on this blog, so it’s almost why bother. (hint, hint)
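A back-of-the-envelope way to poke at that question, as a rough sketch only. The 60 million figure is the one quoted above; the number of unreadable words per book and the number of human answers needed per word are pure guesses for illustration, not anything reCAPTCHA has published.

```python
# Back-of-the-envelope estimate. Only CAPTCHAS_PER_DAY comes from the post;
# the other two constants are assumptions made up for illustration.
CAPTCHAS_PER_DAY = 60_000_000     # figure attributed to Luis von Ahn
UNKNOWN_WORDS_PER_BOOK = 10_000   # assumed: e.g. 10% of a 100,000-word book
SOLVES_PER_WORD = 1               # assumed: one correct answer settles a word


def books_per_day(captchas=CAPTCHAS_PER_DAY,
                  unknown_words_per_book=UNKNOWN_WORDS_PER_BOOK,
                  solves_per_word=SOLVES_PER_WORD):
    """Estimate how many books' worth of unknown words get resolved daily."""
    # Each solved CAPTCHA contributes one answer toward one unknown word.
    words_resolved = captchas / solves_per_word
    return words_resolved / unknown_words_per_book


if __name__ == "__main__":
    print(books_per_day())                   # best case: every answer is right
    print(books_per_day(solves_per_word=3))  # three people must agree per word
```

Under those guesses the best case works out to 6,000 books a day, and requiring three matching answers per word cuts it to 2,000. Change the guesses and the answer moves with them.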


UPDATE: Trying to clarify. reCAPTCHA integrates two technologies.

Optical Character Recognition always has questionable results. The worse the quality of the text (age or damage), the less capable the software. It takes a human on average about 10 seconds to recognize and provide the correct spelling of a piece of unknown text.

CAPTCHAs are the little pictures used to verify you are a human and not a spammer at various web sites. The problem is coming up with good digital letters OCR software cannot easily recognize.

Luis’ reCAPTCHA idea is that if OCR software has trouble with a piece of text from these scanned books, then that text would make an excellent candidate for confusing the spammer bots trying to defeat CAPTCHAs. At the same time, humans validate the correct reading of the unknown words where the OCR was confused.
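Roughly how that could look in code. This is a toy sketch, not reCAPTCHA's actual implementation: showing each unknown word next to a control word whose answer is already known, the `votes_needed` threshold, and every name below are assumptions made up for illustration.

```python
from collections import Counter


class UnknownWord:
    """A scanned word the OCR could not read, plus the human guesses so far."""

    def __init__(self, image_id, votes_needed=3):
        self.image_id = image_id          # hypothetical id of the scanned snippet
        self.votes_needed = votes_needed  # assumed agreement threshold
        self.votes = Counter()

    def record(self, guess):
        # Normalize so "Upon " and "upon" count as the same answer.
        self.votes[guess.strip().lower()] += 1

    def resolved(self):
        """Return the agreed reading, or None if there is no consensus yet."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


def check_challenge(control_answer, control_guess, unknown, unknown_guess):
    """Pass the user as human only if the known control word is typed correctly.

    The guess for the unknown word is trusted (and recorded) only in that case.
    """
    if control_guess.strip().lower() != control_answer.strip().lower():
        return False
    unknown.record(unknown_guess)
    return True
```

The control word does the Turing test; the unknown word rides along and collects votes until enough humans agree on a reading.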

Better?