[development] Captcha module -- possible alternative approach

Syscrusher syscrusher at 4th.com
Thu Feb 9 19:31:51 UTC 2006


On Thursday 09 February 2006 13:08, Arnab Nandi wrote:
> Hi,
> 
> All the 8+ challenges that you mention are things that it's easy to
> write a script for. Plus, since captcha.module is a publicly available
> script, script writers can use it to come up with code that solves the
> challenges. For every set of N problems that you pose to the user, a
> determined enemy will come up with something that solves the N
> problems.

All true. I don't think I'm dealing with a "determined" enemy, though.

But I'll concede your point that what I proposed is scriptable.

[...]
> I'm not fond of typing in strange words from images. It's rather
> demeaning, actually. However, an image captcha is the ONLY
> challenge mechanism that you CAN NOT write a feasible script for. By

Here I disagree. According to my research on the 'net, there are plenty
of existing scripts, and...

> feasible, I mean something that can execute in limited time and memory
> for DDoS attacks. (Image recognition requires CPU and memory). Hence,
> a good challenge test is something that a human requires very little
> effort to do, and a computer requires a lot of CPU and memory.

...Moore's Law will soon change that, if it hasn't already. The image
is small, after all -- not that many pixels.

> 
> So how do we get rid of the stupid image checks? I've been looking
> into trapdoor functions for a while, which are very easy to pose and
> check, but take time to solve. Factorization of prime multiples is a
> possible challenge. For example, if I take 31 and 13, multiply
> them, and tell you "403", how long does it take to divide and tell me
> the solution? A while, if I use large prime numbers instead of 13 and
> 31. But if you do give me the answer, it's easy to check if you're
> right. This is the basis of modern cryptography, btw.

I knew that last bit, but hadn't thought of using JavaScript to let
the *browser* do the solving while the server does the cheap check.
That's a very cool idea!
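To make the "easy to check, slow to solve" asymmetry concrete, here is a minimal Python sketch built on the 31 * 13 = 403 example. It assumes naive trial division as the client's solving method; real factoring algorithms are far faster, so a browser puzzle would have to size its primes with care. `check` and `solve_by_trial_division` are illustrative names, not part of captcha.module.

```python
from typing import Tuple


def check(n: int, p: int, q: int) -> bool:
    """Server side (hypothetical): verifying a factorization is one multiply."""
    return p > 1 and q > 1 and p * q == n


def solve_by_trial_division(n: int) -> Tuple[int, int]:
    """Client side (hypothetical): naive trial division, work grows with sqrt(n)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    raise ValueError(f"{n} has no nontrivial factors (it is prime)")


challenge = 31 * 13  # 403, the number sent to the client
p, q = solve_by_trial_division(challenge)
print(p, q, check(challenge, p, q))  # 13 31 True
```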

> 
> Hence, if we write some code that poses this challenge on the server
> side, and asks a small bit of JavaScript to do this on the client side,
> any DDoS attacker will give up, because his CPU will die. However, if
> you're just writing casual comments on a web page, a little CPU spike
> is negligible. Your computer solves the problem by the time you type
> things, and we validate everything on the server side, and the human
> is not bothered at all.

Especially since in this situation the challenge/response (C/R) mechanism
only happens when the human wants to create an account or post something.
Most page views are read-only, requiring no activation of the trapdoor.

> But there are some problems to this approach, 
> too lengthy to explain here :)

How about this (pseudocode):

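// Make a random 40-character string using A-Z and a-z chars.
$chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
$random_string = '';
for ($i = 0; $i < 40; $i++) {
  $random_string .= $chars[random_int(0, 51)];
}
$n = 4;       // Measure of complexity of challenge; admin-adjustable
$hash = md5($random_string);
// Disclose the first 40-$n chars, the MD5 of the full string, and $n.
$challenge = substr($random_string, 0, 40 - $n) . ':' . $hash . ':' . $n;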
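For anyone who wants to try the generation step outside PHP, here is a rough Python equivalent. It assumes Python's `secrets` module in place of a random-character helper, and it returns the hidden suffix separately, because the server has to keep it (for example in the session) to validate the response later. `make_challenge` is a hypothetical name.

```python
import hashlib
import secrets
import string

# 52-character set: A-Z followed by a-z, as in the pseudocode above.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase
LENGTH = 40


def make_challenge(n: int = 4, alphabet: str = ALPHABET, length: int = LENGTH):
    """Hypothetical helper: return (challenge_text, answer).

    challenge_text is "<first length-n chars>:<MD5 hex of full string>:<n>".
    answer is the hidden last n characters; the server must store it
    so it can validate the client's response later.
    """
    random_string = "".join(secrets.choice(alphabet) for _ in range(length))
    digest = hashlib.md5(random_string.encode("ascii")).hexdigest()
    prefix, answer = random_string[: length - n], random_string[length - n :]
    return f"{prefix}:{digest}:{n}", answer


text, answer = make_challenge(4)
print(text)  # 36 letters, then the MD5 hex digest, then the difficulty
```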

Now, the client has something like this:
           "AxEvBw......rUs:51cae0322f00f123.....93c:4"

It knows the MD5 of the full string, and it knows the first 36 characters
of that string. Now it needs to guess the remaining characters by
brute force (52**$n possible suffixes), appending each candidate to the
known 36-character prefix until it finds the one whose MD5 matches
the MD5 supplied by the server. The correct answer is the last four
characters of $random_string.
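The client's side of the bargain can be sketched the same way. In the proposal this loop would run as browser JavaScript; the Python below is only a stand-in that shows the search, plus a server-side `verify` that assumes the expected answer was stored when the challenge was issued. Both function names are hypothetical.

```python
import hashlib
import itertools
import secrets
import string

# Same 52-character set the server used to build the challenge.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase


def solve(challenge_text: str, alphabet: str = ALPHABET) -> str:
    """Client side (hypothetical): try all len(alphabet)**n suffixes in order."""
    prefix, digest, n = challenge_text.split(":")
    for combo in itertools.product(alphabet, repeat=int(n)):
        guess = "".join(combo)
        if hashlib.md5((prefix + guess).encode("ascii")).hexdigest() == digest:
            return guess
    raise ValueError("no suffix matches the digest; malformed challenge?")


def verify(stored_answer: str, response: str) -> bool:
    """Server side (hypothetical): one comparison, no brute force needed."""
    return secrets.compare_digest(stored_answer.encode("utf-8"),
                                  response.encode("utf-8"))


# Small demo with n = 2 so it runs quickly.
demo_prefix, demo_answer = "AxEvBw", "Qz"
demo_digest = hashlib.md5((demo_prefix + demo_answer).encode("ascii")).hexdigest()
print(solve(f"{demo_prefix}:{demo_digest}:2"))  # Qz
```

The search cost grows by a factor of 52 for each extra hidden character, so testing with a small $n (such as 2) keeps runs short.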

A lookup table in a bot-client won't help, because the challenge is
random each time -- unless the client has enough disk to store
52**40 precomputed MD5 sums. Since 52**4 is about 7.3 million, the
client will have to compute an average of about 3.7 million MD5s
per challenge if $n is 4. If $n is 3 (52**3 is about 140 thousand),
that number drops to an average of about 70 thousand MD5 computations.
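The averages above follow from a short calculation, assuming the bot tries candidate suffixes in a fixed order and the server picks the hidden suffix uniformly at random ($k$ is the alphabet size, here 52):

```latex
N = k^{n}, \qquad
\mathbb{E}[\text{MD5 computations}] = \sum_{i=1}^{N} i \cdot \frac{1}{N} = \frac{N+1}{2} \approx \frac{k^{n}}{2}
\\[4pt]
k = 52,\ n = 4:\quad N = 7\,311\,616,\quad \mathbb{E} \approx 3.66 \times 10^{6}
\\
k = 52,\ n = 3:\quad N = 140\,608,\quad \mathbb{E} \approx 7.03 \times 10^{4}
```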

You can adjust the steepness of the complexity by changing the number
of characters in the allowed set (e.g., using only uppercase would
cause $n=3 to average about 8,800 computations and $n=4 to average
about 228 thousand). This, and the ability to raise $n, allow the
site admin to keep up with advances in computational speed over time.
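A small helper makes it easy to compare settings before choosing an alphabet and $n. It assumes the same fixed-order search as above; `expected_hashes` is a hypothetical name.

```python
def expected_hashes(alphabet_size: int, n: int) -> float:
    """Hypothetical helper: mean MD5 computations for a fixed-order search
    over alphabet_size**n equally likely suffixes, i.e. (k**n + 1) / 2."""
    return (alphabet_size ** n + 1) / 2


for label, k in (("A-Z + a-z", 52), ("A-Z only", 26)):
    for n in (3, 4, 5):
        print(f"{label:<10} n={n}: about {expected_hashes(k, n):,.0f} MD5s")
```

Each step up in $n multiplies the work by the alphabet size, which is why raising $n is the coarse knob and shrinking the alphabet is the finer one.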

Just sending the MD5 of $n random characters won't help, because
it just might be feasible to store 7.3 million pregenerated MD5s
in a bot. :-) That's the purpose of the random prefix added to the
string: it acts as a one-time salt. Even though we disclose it in the
clear, it's virtually unique to this transaction. The PHP session ID
could be used instead, to save the server the work of generating all
those random characters. You would use the PHPSESSID as the openly
disclosed part of $random_string, and the server would generate only
the $n characters to append to PHPSESSID. (Remember that the client can
easily obtain PHPSESSID from the GET variable or cookie, so the concealed
"answer" part of $random_string can't come from PHPSESSID.)
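The PHPSESSID variant changes only what the server generates. In this Python sketch `session_id` stands in for PHPSESSID and `make_session_challenge` is a hypothetical name. One hedged caveat: if several challenges are issued under the same session ID, a bot could build a single table of 52**$n hashes for that prefix and reuse it, so the sketch assumes a fresh session (or some other per-challenge value) for each challenge.

```python
import hashlib
import secrets
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase


def make_session_challenge(session_id: str, n: int = 4, alphabet: str = ALPHABET):
    """Hypothetical helper: the session ID is the public prefix, so the
    server only draws the n hidden characters. Returns (challenge_text, answer)."""
    answer = "".join(secrets.choice(alphabet) for _ in range(n))
    digest = hashlib.md5((session_id + answer).encode("utf-8")).hexdigest()
    # The client already has session_id (cookie or GET variable),
    # so only the digest and the difficulty need to be sent.
    return f"{digest}:{n}", answer
```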

> 
> Another possible, and simple solution is to restrict form submits by
> an IP address to 3 in a second, for example. However, this fails if
> you have a group of people behind a proxy.

Won't help in this case. They are submitting at a slow relative rate,
relying on the nuisance of the extra accounts rather than actually
trying to bring down my server. (I suspect they may be probing for
Drupal sites that allow authenticated users to post without moderation,
so they can post spam. All my sites are fully moderated, so that tactic
fails.)
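For completeness, Arnab's per-IP limit (3 submits per second) could look roughly like the sliding-window sketch below. The class name and in-memory storage are assumptions; as noted above, it would not stop slow submitters, and users behind a shared proxy would share one budget.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SubmitRateLimiter:
    """Hypothetical in-memory limiter: at most `limit` submits per IP
    within any sliding window of `window` seconds."""

    def __init__(self, limit: int = 3, window: float = 1.0):
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget submits that are older than the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject and don't record it
        hits.append(now)
        return True
```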

> 
> CONCLUSION: If you ARE interested in writing up a function like this,
> OR a function amongst the ones you had suggested, no problem! Simply
> pick up the captcha.module in CVS, and start coding! It has an API that
> allows you to write simple _challenge() and _response() parts, and it
> does the rest.
> For example, the default challenge in captcha.module cvs is the math
> problem you had posed (3 times 5).

Been there, done that... My site still runs Drupal 4.4 (and a BETA at that!),
and I actually had to back-port captcha to get it working with that old Drupal.
I was operating in urgent-mode to get the site at least minimally protected
against this 'bot. Captcha as it stands seems to be working in that regard.
I'll look into coding some enhanced C/R capabilities in a newer version. First,
though, I need to get this site updated at least to Drupal 4.6. :-)
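As a rough illustration of the challenge/response split that captcha.module's API asks for, here is the math-problem idea ("3 times 5") written as two plain Python functions. The module's real hooks are PHP with their own names and signatures; `math_challenge` and `math_response` are hypothetical and only show the shape of the pair.

```python
import secrets
from typing import Tuple


def math_challenge() -> Tuple[str, int]:
    """Hypothetical challenge half: pose a small product like '3 times 5'."""
    a = secrets.randbelow(9) + 1  # 1..9
    b = secrets.randbelow(9) + 1
    return f"What is {a} times {b}?", a * b


def math_response(expected: int, submitted: str) -> bool:
    """Hypothetical response half: accept only the exact integer answer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```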

Thanks for the comments!

Scott

-- 
-------------------------------------------------------------------------------
Syscrusher (Scott Courtney)          Drupal page:   http://drupal.org/user/9184
syscrusher at 4th dot com            Home page:     http://4th.com/   

