On Thursday 09 February 2006 06:57, Bèr Kessels wrote:
> Captcha is bad. Evil. ;) http://drupal.org/node/46666 is a faaaar better approach. (imo). It validated the email first over SMTP on the remote server, and then reports back.

Nice module...thanks for the tip. I will probably install it when I (soon!) upgrade my site to a newer Drupal version.

Unfortunately, it doesn't solve my immediate problem. The specific domain being spoofed (yahoo.com) is one of those that returns a 250 (accepted) for any recipient, whether valid or not, so the module you suggest won't block my DoS attack. That being said, since Email Verify is user-transparent, there's no reason not to use it in addition to other precautions.

I agree with you that Captcha is not an optimal solution. I'm thinking about adapting the algorithm so that, instead of displaying an image and asking people to type the text, the module asks a simple, randomly generated question that any human being could answer.

I could actually draw the questions from content on my site. What I have in mind is to pull text from a random article and then ask a question that is permuted from sentence structure and possibly from some keywords found in the text. The set of possible questions is finite, but with careful algorithm design it could be made very large.

Example questions, assuming the paragraph above is the random text:

  * What is the last word before the first period?
  * What is the first hyphenated phrase in the text?
  * How many times does the word "create" appear?
  * What is 3 times 5? (who says we have to just ask about the text?)
  * Retype the user name you have chosen but put an extra Q on the end.
  * What is the third word of the second sentence?
  * What is the second-to-last word of the text?
  * What word appears right after the first occurrence of "not"?

I've come up with these eight qualitatively different questions without really thinking about it. Now, in the module, there would be intentionally different ways of phrasing each one using quirks of human language, such as "Type the word that appears before the first hyphen" rather than the second question.
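
A rough sketch of how answers to a few of the text-based questions above could be computed. It is written in Python rather than Drupal's PHP, purely to show the logic. The function names and the sample text are hypothetical, and the word and sentence splitting is deliberately naive.

```python
import re

# Hypothetical sample text standing in for a random article excerpt.
SAMPLE = ("I am thinking about a text-based check. It would not use images. "
          "I could create a module and create questions quickly.")

def words(text):
    """Return the words of the text, keeping internal hyphens and apostrophes."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)

def sentences(text):
    """Naive split: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def last_word_before_first_period(text):
    return words(text.split(".", 1)[0])[-1]

def first_hyphenated_phrase(text):
    match = re.search(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+", text)
    return match.group(0) if match else None

def count_word(text, word):
    return sum(1 for w in words(text) if w.lower() == word.lower())

def nth_word_of_sentence(text, sentence_no, word_no):
    """Both positions are 1-based, as a person would count them."""
    return words(sentences(text)[sentence_no - 1])[word_no - 1]

def second_to_last_word(text):
    return words(text)[-2]

def word_after_first(text, target):
    ws = words(text)
    for i, w in enumerate(ws[:-1]):
        if w.lower() == target.lower():
            return ws[i + 1]
    return None

def check_answer(expected, given):
    """Compare answers ignoring case and surrounding whitespace."""
    return str(expected).strip().lower() == str(given).strip().lower()
```

With the sample text, `nth_word_of_sentence(SAMPLE, 2, 3)` and `word_after_first(SAMPLE, "not")` give different answers even though both questions point at the same region of the text, which is the kind of variety the list is after.
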

Add into that the use of random mathematical questions and common knowledge ("What is the name of this planet?", "What ocean is between Africa and South America?", "What continent includes the country of Germany?" and so on: carefully chosen questions that are culturally neutral).

One could also allow the site admin to add a list of Q&A that any person registering at their site would know based on the topic of the site. For Drupal.org, questions might be:

  * "CMS stands for ____ management system:"
  * "What word means to obtain a copy of Drupal from our web site to your computer so you can install it?"
  * "How many eyes does our logo have?"
  * "How many menu tabs are at the top-right corner of the Drupal home page?"

Site admins could add general-knowledge questions that are not culturally neutral, based on the audience for their site. For example, if you are building a site about Canadian politics, it is not unreasonable to expect visitors to be able to name the Prime Minister of Canada, state how many provinces there are, or name the province just west of Manitoba.

The biggest challenge I can see is to make the questions patterned enough to use the t() function with replaceable parameters to allow translations, yet still have enough distinct patterns to make a spammer's job difficult. One approach is to have multiple patterns for each question, e.g.,

  $patterns[0] = array(
    "What word appears %location the first %punc?",
    "The first occurrence of %punc appears just %location what word?",
    "What's the word right %location %punc, the first time %punc occurs?",
    "There is a word immediately %location %punc in this text. What is it?",
    "Look for %punc in the text and type the word %location it:",
  );

  $patterns[3] = array(
    "What is %expression?",
    "Compute the value of %expression:",
    "In mathematics, %expression equals what?",
    "Tell me the numeric value for %expression.",
    "%expression is how many?"
  );

The neat thing about this is that if we have multiple translations of the questions, the spammers have to follow us. Differences in syntax will also cause the replaceable parameters to appear at different positions from one language to the next.

Nine questions permuted five ways each is only 45 pregenerated questions, but this is improved by the fact that some of them (like the math one) can have %expression also written in multiple language-neutral ways:

  3x5, 3 * 5, (2+1)x(2+3)

so that the spammer now also has to have an arithmetic expression evaluator. Make it tougher by adding simple algebra:

  Y * (2+1) = X, and Y is 5. What is X?

or grade-school-level "word problems" from math class:

  Johnny has 12 candies and shares 4 with his sister. How many are left?
  Sally shares 16 coins equally among 8 people. How many does each receive?

There are lots of ways to word these, as any teacher can tell you.

The system is made stronger if the same replaceable parameters are used in the same word positions in different questions that have different answers, e.g., "What word appears [after] the first [.]?" versus "First letter appearing [in] the first [sentence]?" -- two questions that parse similarly but have non-overlapping answer domains.

Would it be possible to build an AI that outsmarted this system? Of course. But it would be nontrivial, and it would have to be Drupal-specific, and if the site admins are diligent in adding their own questions, it would have to be at least partially site-specific.

It's not foolproof, but it would at least make things a little more challenging for the spammers, and it doesn't rely on images.

Comments?

--
-------------------------------------------------------------------------------
Syscrusher (Scott Courtney)          Drupal page:   http://drupal.org/user/9184
syscrusher at 4th dot com            Home page:     http://4th.com/