
Jonathan Hoefler (whose surname, fittingly, could include not one but two consecutive ligatures, œ and fl) recently pointed to Luis von Ahn’s reCAPTCHA project, which I think is pretty incredible. A CAPTCHA is one of those graphically distorted bits of text you’re asked to enter before submitting a comment or form on the internet to prove you’re a human being and not some bot or wayward spammer. But reCAPTCHA repurposes that intelligence by not only filtering out bots but also filtering in random words from scanned texts that humans can decipher but computers can’t. Von Ahn explains, “Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one.” 60 million CAPTCHAs per day × 10 seconds per CAPTCHA = about 150,000 hours of labor per day. As von Ahn suggests, “What if we could make positive use of this human effort? […] Currently, we are helping to digitize books from the Internet Archive and old editions of the New York Times.” Clay Shirky is definitely onto something.
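
For the curious, here is a rough Python sketch of the two-word check von Ahn describes, plus the back-of-the-envelope arithmetic. It is only an illustration, not reCAPTCHA's actual code: the function name `check_recaptcha`, the `votes` tally, and the sample words are all made up for the example.

```python
def check_recaptcha(known_answer, typed_known, typed_unknown, votes):
    """Hypothetical check: pass the user only if the control word matches.

    When the control word is right, the guess for the unread word is
    tallied in `votes` (an assumed dict of guess -> count).
    """
    if typed_known.strip().lower() != known_answer.strip().lower():
        return False  # control word wrong, so the other guess is discarded
    guess = typed_unknown.strip().lower()
    votes[guess] = votes.get(guess, 0) + 1
    return True


votes = {}
passed = check_recaptcha("morning", "Morning", "upon", votes)

# The labor estimate: 60 million CAPTCHAs a day at 10 seconds each.
captchas_per_day = 60_000_000
seconds_each = 10
hours_per_day = captchas_per_day * seconds_each / 3600
print(round(hours_per_day))  # 166667
```

Worked out exactly, the figure is closer to 167,000 hours a day, so "about 150,000" is a conservative rounding.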

04 September 2008 — Recommended Readings