[drupal-devel] simple and effective comment spam prevention exists and works
after reading this article: http://www.dvorak.org/blog/?p=2904

i'm now using this rule in .htaccess (in rewrite block above the "?q" rewrite rule [which would override this rule]):

# Try to prevent comment spam. Attempts to post comments are 403 if they
# aren't coming from within site. This will prevent clients that don't send
# referrer from posting comments, but I'm not aware of any modern browser that
# does not send a referrer
RewriteCond %{HTTP_REFERER} "!^http://(www.)?slaughters.com/.*$" [NC]
RewriteCond %{THE_REQUEST} "POST /comment/reply/.*"
RewriteRule .* - [F]

i believe the domain name can be replaced with a var to make it generic. i'm just not sure if there are cases where a valid client does not send a referrer.

i switched from MT to Drupal largely because my site had become a spam repository. i know there are lots of modules that try to prevent spam, but i prefer simplicity where possible.

of course, it's probably only a matter of time until spammers spoof headers to circumvent this, but i'm sure it can be tweaked to keep up with the bastards.

anyway, this is the anti-spam approach i'm going to use for now. thought there might be more general interest in it.
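For readers more at home in Python than in mod_rewrite, the logic of the rule above can be sketched roughly as follows. This is only an illustration of what the two conditions test, not a replacement for the .htaccess rule; the function name is_forbidden is made up for this sketch, and the patterns are copied as-is from the rule.

```python
import re

# Same patterns as the two RewriteCond lines; [NC] maps to re.IGNORECASE.
# RewriteCond patterns are unanchored unless they say otherwise, hence search().
SAME_SITE = re.compile(r"^http://(www.)?slaughters.com/.*$", re.IGNORECASE)
COMMENT_POST = re.compile(r"POST /comment/reply/.*")


def is_forbidden(request_line: str, referer: str) -> bool:
    """Return True when both conditions hold, i.e. the rule would answer 403.

    request_line plays the role of %{THE_REQUEST}, referer of %{HTTP_REFERER}
    (an empty string when the client sent no Referer header).
    """
    off_site = SAME_SITE.search(referer) is None
    posting_comment = COMMENT_POST.search(request_line) is not None
    return off_site and posting_comment
```

Note that a missing referer counts as "off site" here, which is exactly the case the thread goes on to argue about.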
i believe the domain name can be replaced with a var to make it generic. i'm just not sure if there are cases where a valid client does not send a referrer.
Plenty. HTTP_REFERER is not something to rely on.
Karoly Negyesi wrote:
i believe the domain name can be replaced with a var to make it generic. i'm just not sure if there are cases where a valid client does not send a referrer.
Plenty. HTTP_REFERER is not something to rely on.
i'd be very curious as to what browser does not send a referer header when posting from a form. the only cases i could imagine where a referer would be missing would be non-browser clients (like scripts that post comments). the referer header has been around since day one.

as far as relying on this header, it depends on what you're relying on it for. since the only clients that would be omitting this field would almost certainly be spammers (or users whose browsers are so obscure i've never heard of them), i consider it reliable enough to use as part of an anti-spam technique.

sure spammers will easily bypass this method as soon as it becomes commonly used, but that is the nature of all anti-spam techniques. all anti-spam tools enter this game of escalation. the fact that a spammer can circumvent or overcome a given anti-spam technique is not a reasonable excuse for not implementing it.

and i certainly wasn't suggesting this go in core as it's not the type of thing all people would want (like those that want to be able to use methods other than a traditional browser to POST content).
Harry,

The scripts for spam bots can be easily modified to include a referer that is just the domain name of the site being attacked. This renders the referer defense completely useless.
On 01 Oct 2005, at 5:58 PM, Khalid B wrote:
The scripts for spam bots can be easily modified to include a referer that is just the domain name of the site being attacked.

Indeed.

Hell, you could just use wget --referer='domain.com' and post that way.

-- 
Adrian Rossouw
Drupal developer and Bryght Guy
http://drupal.org | http://bryght.com
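To make the point concrete: the Referer header is whatever the client chooses to put there. A minimal sketch with Python's standard library, which only builds a request object and sends nothing; the example.com URLs and the form field are placeholders, not anything from this thread.

```python
from urllib.request import Request

# Hypothetical target and form data; the request is built but never sent.
req = Request(
    "http://www.example.com/comment/reply/1",
    data=b"comment=hello",
    headers={"Referer": "http://www.example.com/node/1"},
    method="POST",
)

# The header is plain client-side data, set by whoever builds the request.
referer = req.get_header("Referer")
method = req.get_method()
```

A server that only checks the referer has no way to tell this request apart from one sent by a browser that really was on the node page.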
Adrian Rossouw wrote:
On 01 Oct 2005, at 5:58 PM, Khalid B wrote:
The scripts for spam bots can be easily modified to include a referer that is just the domain name of the site being attacked.
Indeed.
Hell, you could just use wget --referer='domain.com' and post that way.
i know it's easy to get around. maybe that's why nobody uses this and spammers don't bother falsifying referer fields. it's fine by me if nobody else does it, then i don't have to worry about spammers spending the effort to bypass it and i'm done with spam for good :)
RewriteCond %{HTTP_REFERER} "!^http://(www.)?slaughters.com/.*$" [NC]
This should not go in core. Faking a referer is the easiest thing in the world to do, and it'll be a short amount of time before the spammer software is upgraded to support it.

-- 
Morbus Iff ( two! four! six! eight! who do god and jesus hate! )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
This defense may work for a while, but will be very short lived. Spam bots will be upgraded to fake a referer that contains the domain name. The spam arms race continues ...
One method we may want to look into.

When a session is created for a user and they are on a page that allows comments, we come up with a unique hash based on, say, the node ID and session ID. We store this in the user's session. When the user goes to create a comment, we pass this unique hash in a hidden input field, and when they click "post comment" we verify the hidden hash against the one stored in the user's session.

This should prevent most spam comments, IMO.

ted
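The session-hash idea described just above might look something like this as a rough Python sketch. The session is modelled as a plain dict, and the names issue_token, verify_token and comment_tokens are invented for the illustration; nothing here is Drupal code.

```python
import hashlib
import hmac


def issue_token(session: dict, session_id: str, nid: int) -> str:
    """Build a hash from node ID and session ID and remember it in the session."""
    token = hashlib.sha256(f"{nid}:{session_id}".encode()).hexdigest()
    session.setdefault("comment_tokens", {})[nid] = token
    return token  # this value would go into the hidden input field


def verify_token(session: dict, nid: int, submitted: str) -> bool:
    """Check the hidden field against the value stored server-side."""
    expected = session.get("comment_tokens", {}).get(nid)
    return expected is not None and hmac.compare_digest(expected, submitted)
```

The check only passes for a client that actually loaded the comment page in that session first, which is the property the proposal relies on.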
Um, isn't that the idea behind a captcha? We've got that already.

http://drupal.org/project/captcha

-- 
Larry Garfield AIM: LOLG42 larry@garfieldtech.com ICQ: 6817012
"If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it." -- Thomas Jefferson
No, a captcha is very different.

A captcha relies on some output (letters/numbers in a graphic, or an equation to solve) that is human readable/solvable. This is almost impossible for a bot to decipher and hence is a good defense.

A referer is trivial to fake. Just put the domain name in the referer and voila: it is not a defense.

So, it just becomes another weapon in the arms race that becomes useless quickly, giving a false sense of security, wasting effort, and bloating the code.
Um, isn't that the idea behind a captcha? We've got that already.
There are already scripts that defeat CAPTCHA, and CAPTCHA restricts the ability of disabled folks to use your site:

http://www.w3.org/TR/2003/WD-turingtest-20031105/

-- 
Morbus Iff
On Saturday 01 October 2005 10:44 am, Theodore Serbinski wrote:
One method we may want to look into. When a session is created for a user and they are on a page that allows comments, we come up with a unique hash based on say the node ID and session ID. We store this in the user's session. When the user goes to create a comment, we pass this unique hash with a hidden input field and when they click "post comment" we verify this input hidden hash against one stored in the user's session. This should prevent most spam comments, IMO.

Anything you can think of, I can program a bot for. The above isn't a solution.

-- 
Morbus Iff
On Sat, 1 Oct 2005 11:44:51 -0400 Theodore Serbinski <tss24@cornell.edu> wrote:
One method we may want to look into. When a session is created for a user and they are on a page that allows comments, we come up with a unique hash based on say the node ID and session ID. We store this in the user's session. When the user goes to create a comment, we pass this unique hash with a hidden input field and when they click "post comment" we verify this input hidden hash against one stored in the user's session. This should prevent most spam comments, IMO.

The spammer has access to the node ID and the session ID, so they can easily fake the hash you suggest. But if you tie it together with a private key (owned by the website), then you've got something.

Something similar is in core already, and will be in Drupal 4.7. It currently cuts out over 99% of the spam I see on KernelTrap:

http://drupal.org/node/28420 (#20, #21 and #26 in particular)

There are potential issues to be solved, but it's a step in the right direction.

-Jeremy
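As a rough illustration of the "tie it together with a private key" point, here is a minimal Python sketch using an HMAC. It is not the code from the linked issue; SITE_PRIVATE_KEY, make_token and check_token are hypothetical names, and the key value is a placeholder.

```python
import hashlib
import hmac

# Assumption: a long random secret known only to the website.
SITE_PRIVATE_KEY = b"replace-with-a-long-random-secret"


def make_token(session_id: str, nid: int) -> str:
    """Keyed hash over values the client knows, using a key it does not."""
    message = f"{session_id}:{nid}".encode()
    return hmac.new(SITE_PRIVATE_KEY, message, hashlib.sha256).hexdigest()


def check_token(session_id: str, nid: int, submitted: str) -> bool:
    """Recompute the token server-side and compare in constant time."""
    return hmac.compare_digest(make_token(session_id, nid), submitted)
```

Without the key, knowing the node ID and session ID is no longer enough to compute a valid token; the spammer has to fetch the form first and copy the token out of it.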
On 02 Oct 2005, at 6:10 AM, Jeremy Andrews wrote:
Something similar is in core already, and will be in Drupal 4.7. It currently cuts out over 99% of the spam I see on KernelTrap: http://drupal.org/node/28420

This has been integrated into the form api.

To make any form require a token, you set

  $form[token] = $key;

where key is something specific to the form. In the case of comment:

  $form[token] = 'comment' . $edit['nid'] . $edit['pid'];

It's still fairly easy to download the page first and grep out the token to send back though, but it's extra work for the spammer.

-- 
Adrian Rossouw
On Sun, 2 Oct 2005 13:09:05 +0200 Adrian Rossouw <adrian@bryght.com> wrote:
Something similar is in core already, and will be in Drupal 4.7. It currently cuts out over 99% of the spam I see on KernelTrap: http://drupal.org/node/28420
This has been integrated into the form api.
Cool! :)
To make any form require a token, you set $form[token] = $key;
Where key is something specific to the form .. in the case of comment : $form[token] = 'comment' . $edit['nid'] . $edit['pid'];
It's still fairly easy to download the page first and grep out the token to send back though, but it's extra work for the spammer.
Yes. The best solution I have come up with is to track token use, preventing token re-use. I had a nearly working patch a while ago (it tracked the last n-used tokens), but ran out of time. It had some issues telling previews and submits apart, as well as with handling followup edits. When it becomes necessary, I will surely dust it off again. Cheers, -Jeremy
On 02 Oct 2005, at 2:22 PM, Jeremy Andrews wrote:
Yes. The best solution I have come up with is to track token use, preventing token re-use. I had a nearly working patch a while ago (it tracked the last n-used tokens), but ran out of time. It had some issues telling previews and submits apart, as well as with handling followup edits. When it becomes necessary, I will surely dust it off again.

Why not add a count variable to the token generation, and have a db table / variable keeping track of how many times you have used the token (i.e., successful submissions)?

-- 
Adrian Rossouw
On Sun, 2 Oct 2005 14:36:39 +0200 Adrian Rossouw <adrian@bryght.com> wrote:
On 02 Oct 2005, at 2:22 PM, Jeremy Andrews wrote:
Yes. The best solution I have come up with is to track token use, preventing token re-use. I had a nearly working patch a while ago (it tracked the last n-used tokens), but ran out of time. It had some issues telling previews and submits apart, as well as with handling followup edits. When it becomes necessary, I will surely dust it off again.
Why not add a count variable to the token generation, and have a db table / variable keeping track of how many times you have used the token (ie: successful submission).
There is some confusion introduced from the fact that it is perfectly legitimate to "preview" a comment as many times as you like, so the forms logic would have to know the difference. It's also legitimate to "submit" a comment multiple times, as there may be errors that have to be fixed; that's a little more difficult to work around. Finally, it can also be legitimate to "edit" a comment many times. In all of these cases, the same token is used.

A simpler solution I just thought of would require the introduction of a three column table: token, type, id

The token column holds the token.
The type column is a text string holding the content type (e.g., comment, node, etc.).
The id column holds the unique id for that form (e.g., the cid, the nid, etc.).

Insert the token/type/id combo when generating the token, but be sure the token hasn't been used with another id for that data type.

This avoids all the problems described above, and would prevent token re-use by comment spammers. The only problem is that this solution doesn't work for forms that don't have unique id's, such as the contact form. Perhaps that's okay. (You could have an 'id' of 0 in such cases.)

-Jeremy
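One possible reading of the three column table, sketched in Python with an in-memory SQLite database. The table name form_token and the helpers open_store and claim_token are invented for this sketch, and it assumes the id is known at the moment the token is recorded.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create the token/type/id table in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE form_token ("
        "token TEXT, type TEXT, id INTEGER, PRIMARY KEY (token, type))"
    )
    return conn


def claim_token(conn: sqlite3.Connection, token: str, type_: str, id_: int) -> bool:
    """Accept a token unless it was already used with another id of this type."""
    row = conn.execute(
        "SELECT id FROM form_token WHERE token = ? AND type = ?",
        (token, type_),
    ).fetchone()
    if row is None:
        # First use of this token for this type: record the pairing.
        conn.execute(
            "INSERT INTO form_token (token, type, id) VALUES (?, ?, ?)",
            (token, type_, id_),
        )
        return True
    # Previews, resubmits and edits of the same item reuse the same pairing.
    return row[0] == id_
```

Repeated previews, resubmits and edits of the same comment keep passing, while replaying the token to create a second comment is refused.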
On 02 Oct 2005, at 3:16 PM, Jeremy Andrews wrote:
This avoids all the problems described above, and would prevent token re-use by comment spammers. The only problem is that this solution doesn't work for forms that don't have unique id's, such as the contact form. Perhaps that's okay. (You could have an 'id' of 0 in such cases.)

all forms have unique id's in the form api.

-- 
Adrian Rossouw
On Sun, 2 Oct 2005 15:45:36 +0200 Adrian Rossouw <adrian@bryght.com> wrote:
This avoids all the problems described above, and would prevent token re-use by comment spammers. The only problem is that this solution doesn't work for forms that don't have unique id's, such as the contact form. Perhaps that's okay. (You could have an 'id' of 0 in such cases.)
all forms have unique id's in the form api.
In what way? For example, how would it work on this form: http://drupal.org/contact If I load the form twice, does it have a different id each time? How about if two different people load the form? Thanks, -Jeremy
On 02 Oct 2005, at 3:59 PM, Jeremy Andrews wrote:
all forms have unique id's in the form api.
In what way? For example, how would it work on this form: http://drupal.org/contact

form_id for that form is 'contact_mail_page'.

Well, we could make the key $form_id + $session_id + $x + [optional $key], where $x is how many times that specific combination has been used. Also, we have a $form_id_execute process now: if a form validates, it tries to execute, and not before that. We could handle incrementing $x in that process.

If I load the form twice, does it have a different id each time? How about if two different people load the form?

Every time you submit the form, it will be different on subsequent reloads. Every person will have a different token, due to the session id being part of it.

-- 
Adrian Rossouw
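A rough Python sketch of the counter idea above, where the key is form_id + session_id + x + optional key and x is bumped only on a successful execute. Deriving the token with an HMAC and a site secret is an assumption carried over from earlier in the thread, and all names here (SITE_PRIVATE_KEY, current_token, execute_form) are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Assumption: a site-wide secret, as suggested earlier in the thread.
SITE_PRIVATE_KEY = b"replace-with-a-long-random-secret"

# x: how many times each (form_id, session_id, key) combination has executed.
_use_count = defaultdict(int)


def current_token(form_id: str, session_id: str, key: str = "") -> str:
    """Token for the next submission; stable across reloads and previews."""
    x = _use_count[(form_id, session_id, key)]
    message = "|".join([form_id, session_id, str(x), key]).encode()
    return hmac.new(SITE_PRIVATE_KEY, message, hashlib.sha256).hexdigest()


def execute_form(form_id: str, session_id: str, submitted: str, key: str = "") -> bool:
    """Run only for a validated form; a successful run invalidates the token."""
    if not hmac.compare_digest(current_token(form_id, session_id, key), submitted):
        return False
    _use_count[(form_id, session_id, key)] += 1
    return True
```

Previews and failed validations never reach the increment, so they keep working with the same token; a replay of an already executed submission does not.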
On Mon, 3 Oct 2005 16:10:10 +0200 Adrian Rossouw <adrian@bryght.com> wrote:
If I load the form twice, does it have a different id each time? How about if two different people load the form?
Every time you submit the form, it will be different on subsequent reloads.
Every person will have a different token, due to the session id being part of it.
However, as the session id is stored on the client, it can be controlled by the spammer. Thus, a spammer could simply use the same session_id to submit the same form with different data. We have to allow multiple submits from the same session_id to handle previews and submits with errors... -Jeremy
Harry Slaughter wrote: I guess nobody understood the point I was trying to make, which turns out to be good for me. As long as this simple solution is not broadly used, the spammers are unlikely to bother adding the one line workaround for it. So everybody please disregard this suggestion. Captcha, anonymous comment disabling and moderation are really the best solutions.
participants (8)
- Adrian Rossouw
- Harry Slaughter
- Jeremy Andrews
- Karoly Negyesi
- Khalid B
- Larry Garfield
- Morbus Iff
- Theodore Serbinski