My site is under attack (trackbacks, spam and cpu usage).
Hello,

I am curious: is anyone using the trackback.module and allowing incoming trackbacks?

Spammers have a vicious script designed for Drupal that submits spam trackbacks in a loop, every few minutes, 24 hours a day.

Even though not ONE of their trackbacks has EVER been published on the site, once your site is entered into their registry, they'll never bother to take it off. It seems the only human intervention is to ADD new sites to spam in the robot's registry, never to remove any. Even though I disabled the trackback.module weeks ago (!!!), my logs are still flooded with "warning page not found trackback/$nid not found."

In such a situation, I wonder how anyone could be using the trackback.module for any length of time.

My particular concern at this time is server resources. I know there is a spam module that can automatically delete spam trackbacks, but it won't solve the resource problem.

My site hasn't had any new content for a week, so the Drupal cache should be working at its best and the CPU load should be at its lowest. However, the opposite is true.

---------------------------------------------
          | For the week    | For the day     |
          | rank - %        | rank - %        |
---------------------------------------------
cpu       | 86th - 0.216%   | 190th - 0.107%  |
hit       | 504th - 0.032%  | 543rd - 0.030%  |
Bandwidth | 404th - 0.045%  | 370th - 0.030%  |
---------------------------------------------

See the high cpu usage compared to hits and bandwidth. The relatively lower cpu rank for the day is only due to a server upgrade which rendered spamming impossible. Now, I already noted a few weeks ago that the cpu usage of a Drupal site is higher than the cpu usage of other sites. Another Drupal site of mine, which has never used the trackback module (and has therefore never been entered in the spammers' registry), shows the same pattern of higher cpu usage. However, it is not as bad as this site.

For the sake of the other web sites co-hosted on the same server, I'd like to drastically cut down on cpu usage. I'd like to add a directive at the top of .htaccess that immediately ends any request to trackback/$nid (so that Drupal never gets bootstrapped).

Would that work? What would I need to add to .htaccess?

If you have some insights on the wider spam issue, and trackback spam in particular, please do share.

I repeat that the spam.module is not an option: it would increase the cpu usage even further when I want to minimize it.

thanks,

Augustin.

--
http://www.wechange.org/ Because we and the world need to change.
http://www.reuniting.info/ Intimate Relationships, peace and harmony in the couple.
Hi Augustin,

I had a similar problem some time ago, and ended up writing the very simple 'trackback_blackhole' module which solved the resources issues for me. The module is distributed with the v2 spam module, available here:

http://kerneltrap.org/jeremy/drupal/spam/#downloads

You can download the whole tarball, and then just install the trackback_blackhole module without installing the spam module.

I hope that helps.

Cheers,
-Jeremy
@everybody: see the second part of this email about a better long-term solution...

On Monday 18 September 2006 11:40 am, Jeremy Andrews wrote:
I had a similar problem some time ago, and ended up writing the very simple 'trackback_blackhole' module which solved the resources issues for me. The module is distributed with the v2 spam module, available here: http://kerneltrap.org/jeremy/drupal/spam/#downloads
You can download the whole tarball, and then just install the trackback_blackhole module without installing the spam module.
Thanks Jeremy,

I had a look. Actually, I wanted to write a simple module myself to do exactly the same thing. I find your use of hook_init() very clever in this case.

However, I think a .htaccess directive would be more efficient, because it doesn't even need a partial Drupal bootstrap. I've added the following code to my .htaccess at www.wechange.org:

<FilesMatch "^(trackback)">
Order deny,allow
Deny from all
</FilesMatch>

It denies access to any request starting with trackback, like:

trackback/123
trackbacks_are_evil

etc., but allows requests like:

blog/how_to_deal_with_trackback_spam

I'll report back in a week on the difference it makes to cpu usage.

Jeremy: you might want to add this code snippet (or any improved version of it) to your INSTALL or README files.

One day, I will want to reopen comment submissions by anonymous users and then I will need the spam.module, so I appreciate your efforts on your spam.module. Thanks a lot!

@everybody: the .htaccess solution works for my immediate need, but it is a bit selfish because it doesn't help anyone else.

What follows is not specific to trackback spam, but is relevant to any kind of spam being propagated via compromised servers or computers.

<strong class="must-understand" >
The only thing needed for evil to win, is that good people do nothing.
</strong>

At first, all the trackback spam came from the same IP, but then they upgraded their software, so that each spam submission came from a different IP. Certainly, each of those IPs corresponds to a compromised Windows(TM) box, or a compromised web site (using a CMS minus security updates), doesn't it? (Or do I misunderstand the way open relays can be used?)

For now, I have successfully denied trackback spammers access to my site, but they are still free to spam the rest of the world.

What bothered me the most about cpu usage was that it was such a waste: it was not even helping the spammers, who never got a single one of their links published.

Now, if my cpu power can be put to better use, I don't mind the extra resources needed: is there a way to collect the IPs used by spammers, and share them among us and with organizations fighting spam? The aim would be to get wormed or trojaned Windows(TM) boxes (or compromised web sites) to upgrade to a safe version or shut down.

If all Drupal web sites were collaborating on gathering useful data, and passing on this data to relevant organizations, we might collectively achieve something. One spam report against one IP might achieve nothing, but a concerted effort to systematically denounce bad IPs might force people to take positive action.

I really don't know how such a thing could be organized. One has to study first how organizations fighting spam and organizations setting up blacklists operate.

Maybe the developers on this list have better, more concrete ideas...

Blessings,

augustin.

P.S. 50 minutes since the .htaccess update, and since the last log entry about "trackback/123 not found". Yeah! :)
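A side note on the FilesMatch block in the message above: Order and Deny are the Apache 2.2 access-control directives. A minimal sketch of the same deny rule for Apache 2.4 or later, assuming mod_authz_core is loaded (it normally is) and that the host allows AuthConfig overrides in .htaccess, might look like this:

```apache
# Assumption: Apache 2.4+, where Require replaces Order/Deny.
# Refuse any request whose mapped file name starts with "trackback".
<FilesMatch "^trackback">
    Require all denied
</FilesMatch>
```

The pattern is the same as in the original block, so it is just as broad (it would also catch trackbacks_are_evil); only the access directive differs.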
On Mon, Sep 18, 2006 at 03:26:42PM +0800, Augustin (Beginner) wrote:
One day, I will want to reopen comment submissions by anonymous users and then I will need the spam.module, so I appreciate your efforts on your spam.module. Thanks a lot!
I've been using the textimage module and it works fine. Anonymous comments are approved by default. Thanks to the module ;-)

--
GNU/Linux registered user #224950
Proud Egyptian GNU/Linux User Group <www.eglug.org> Member.
Life powered by Debian, Homepage: www.foolab.org
--
Don't send me any attachment in Micro$oft (.DOC, .PPT) format please
Read http://www.gnu.org/philosophy/no-word-attachments.html
Preferable attachments: .PDF, .HTML, .TXT
Thanx for adding this text to Your signature
On Mon, 18 Sep 2006 15:26:42 +0800 "Augustin (Beginner)" <drupal.beginner@wechange.org> wrote: [...snip...]
I had a look. Actually, I wanted to write a simple module myself to do exactly the same thing. I find your use of hook_init() very clever in this case.
However, I think the use of a .htaccess directive would be more efficient, because there is not even the need of a partial Drupal bootstrap. I've added the following code to my .htaccess at www.wechange.org :
Prior to creating my module, I was occasionally getting DDoS'd by trackback and comment spam coming from tens of thousands of IP addresses. It was absurd. I created this module to deal with trackbacks, and form tokens to deal with comment spam, and suddenly my CPU went back to nearly idle as it should be. At that point I lost my inspiration to explore the problem further, as it was no longer a problem. I'm not convinced you need anything more than the trackback_blackhole module to solve the resource issues, though obviously the sooner you can stop the attack the better. Cheers, -Jeremy
On Monday 18 September 2006 11:40 am, Jeremy Andrews wrote:
Hi Augustin,
I had a similar problem some time ago, and ended up writing the very simple 'trackback_blackhole' module which solved the resources issues for me.
I'm curious: how long have you been using the trackback_blackhole? If you disable the module, do you find that they are still trying to submit spam? (My question really is: do they ever give up?)

thanks,

Augustin.
On Mon, 18 Sep 2006 15:43:12 +0800 "Augustin (Beginner)" <drupal.beginner@wechange.org> wrote:
I'm curious: how long have you been using the trackback_blackhole? If you disable the module, do you find out that they are still trying to submit spam? (my question really is: do they ever give up?)
I created the module back in April, and have not tried disabling it. There is a certain amount of idle curiosity if the trackback spammers still target my website, but not enough to disable the module and find out. Out of site, out of mind (pun intended). Cheers, -Jeremy
Spammers have a vicious script designed for Drupal, that submits spam trackbacks in a loop, every few minutes, 24/24h.
Yes, this has happened to me. I redirect their traffic back to them:

RedirectMatch /comment/reply/.*/comment/reply http://127.0.0.1/
RedirectMatch /node/.*/trackback.* http://127.0.0.1/
Redirect /cgi-bin/mt/mt-comments.cgi http://127.0.0.1/
Redirect /trackback http://127.0.0.1/

Alternatively, you could block all the IPs at the firewall (thus dropping the traffic instead of responding to it, as above), but that's a bit more difficult to do unless you own the box.

--
Morbus Iff ( united we're bland )
Technical: http://www.oreillynet.com/pub/au/779
Culture: http://www.disobey.com/ and http://www.gamegrene.com/
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
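If answering with a redirect feels like too much, the same URL patterns can simply be refused. The following is a rough sketch, assuming mod_rewrite is enabled (Drupal's clean URLs already rely on it) and that these lines sit in Drupal's .htaccess above its own rewrite rules. The patterns are taken from the paths mentioned in this thread and may need adjusting for a given site.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine on
    # In .htaccess the pattern is matched without the leading slash.
    # [F] answers 403 Forbidden and stops rewriting, so index.php never runs.
    RewriteRule ^trackback(/|$) - [F]
    RewriteRule ^node/[0-9]+/trackback - [F]
</IfModule>
```

Unlike a plain prefix match on "trackback", the first rule leaves a path such as trackbacks_are_evil alone. Requests that arrive as index.php?q=trackback/123 are not covered, since RewriteRule only looks at the URL path.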
Alternatively, you could block all the IPs at the firewall (thus, dropping the traffic instead of responding to it, above), but that's a bit more difficult to do unless you own the box.
I started blocking the IPs in .htaccess. This worked for a while when I had comment spam attacks. Then they smartened up and started bombarding from various IP addresses, which makes this a cat and mouse game.
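For reference, per-address blocking of the kind described here is usually written with the Apache 2.2 access directives. A small sketch, using made-up addresses from the documentation ranges (192.0.2.0/24 and 198.51.100.0/24) rather than any real spammer:

```apache
# Hypothetical addresses, for illustration only.
Order allow,deny
Allow from all
# With "Order allow,deny", a matching Deny wins over Allow.
Deny from 192.0.2.15
Deny from 198.51.100.0/24
```

Every new source address needs another Deny line, which is why this approach stops scaling once the spam comes from many different IPs.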
I started blocking the IPs in .htaccess. This worked for a while when I had comment spam attacks. Then they smartened up and started bombarding from various IP addresses which makes this a cat and mouse game.
Yep - that's eventually why I started blocking at the URL level too: blocking 2 dozen IPs an hour isn't fun, and I'm not confident enough that blocking entire C classes wouldn't harm valid users.

--
Morbus Iff ( two! four! six! eight! who do god and jesus hate! )
On Mon, 18 Sep 2006, Augustin (Beginner) wrote:
For the sake of the other web sites co-hosted on the same server, I'd like to drastically cut down on cpu usage. I'd like to add a directive at the top of .htaccess that ends straightaway any request to trackback/$nid (so that Drupal never gets bootstrapped).
Would that work? What would I need to add to .htaccess?
We have some .htaccess directives at weblabor.hu to cut down on pointless CPU usage. One is denying requests based on referers (which is trackback related too).

SetEnvIfNoCase Referer ".*(casino).*" BadReferrer
SetEnvIfNoCase Referer ".*(pharmacy).*" BadReferrer
SetEnvIfNoCase Referer ".*(gambling).*" BadReferrer
SetEnvIfNoCase Referer ".*(poker).*" BadReferrer
SetEnvIfNoCase Referer ".*(pills).*" BadReferrer
deny from env=BadReferrer

Also, if you would like to send a proper(!) "Gone" HTTP code to user agents that request your previously available trackback URLs, you can do:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^trackback - [G]

This sends a "Gone" (410) HTTP status to the requester. This is better than an "Access denied" (403) status, since you explicitly state that the resource does not exist anymore, and any reference to it should be removed. The actual difference in meaning is only relevant for well-behaving bots, not the spammers, but it is nice to accurately inform well-behaving bots about the situation.
I repeat that the spam.module is not an option: it would increase even further the cpu usage when I want to minimize it.
Do not even think about loading Drupal modules in these pointless cases. The sooner you catch these requests the better. Gabor
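The five SetEnvIfNoCase lines in the message above can also be folded into one. This is only a sketch of that idea, assuming mod_setenvif and the Apache 2.2 access directives are available. The keyword list is the one from the message, and because the regular expression is unanchored, the leading and trailing .* are not needed.

```apache
# Flag any request whose Referer header contains one of the keywords.
SetEnvIfNoCase Referer "(casino|pharmacy|gambling|poker|pills)" BadReferrer
# Refuse flagged requests (Apache 2.2 syntax, as in the original).
Order allow,deny
Allow from all
Deny from env=BadReferrer
```

A keyword list like this can also block legitimate referrers that happen to contain one of the words, so it is worth keeping short.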
Do spammers really leave referrers?
We are talking about *trackback* spam, where one of the goals of spammers is to poison your page with links based on the referer value.

Gabor

On Mon, 18 Sep 2006, Johan Forngren wrote:
Do spammers really leave referrers?
Erm, excuse me, I mixed the issue with referer spam :) Too many issues to care about at the same time.

Gabor
I see, thanks for the clarification. 2006/9/18, Gabor Hojtsy <gabor@hojtsy.hu>:
We are talking about *trackback* spam, where one of the goals of spammers is to poison your page with links based on the referer value.
Do spammers really leave referrers?
Oh yes! Many blogs actually used to display referrers in their blocks, and that's when that really started getting into vogue. Nowadays, it's not seen so much, but it's more effort to remove a "feature" than to leave it in, of course. Also, many people don't protect their web log analysis directories from search engines and those referers are counted by search engines as valid links.

I'm seeing new comment spam nowadays which is kinda interesting:

* The URL looks legit, like http://www.uiboston.net/
* Visiting it /redirects/ you to a legit site (like boston.com).
* The comment is suitably generic: "you know what I like about your blog? you talk about your interests!"

The assumption I'm making is:

* Spammer redirects to a valid site for the duration of attack.
* After attack stops, spammer removes redirect, and shows spam site.
* All comments that remain, which have been checked by the admin and "seen" as legitimate (due to the redirect to a valid site) now link to the spam site, and an admin probably won't recheck 'em.

--
Morbus Iff ( two! four! six! eight! who do god and jesus hate! )
On Monday 18 September 2006 11:14 am, Augustin (Beginner) wrote:
---------------------------------------------
          | For the week    | For the day     |
          | rank - %        | rank - %        |
---------------------------------------------
cpu       | 86th - 0.216%   | 190th - 0.107%  |
hit       | 504th - 0.032%  | 543rd - 0.030%  |
Bandwidth | 404th - 0.045%  | 370th - 0.030%  |
---------------------------------------------
See the high cpu usage compared to hits and bandwidth. The relatively lower cpu rank for the day is only due to a server upgrade which rendered spamming impossible. Now, I have already noted a few weeks ago that the cpu usage of a Drupal site is higher than the cpu usage of other sites. Another Drupal site I have and which has never used the trackback module (and therefore never been entered in the spammers' registry) is showing the same pattern of a higher cpu usage. However, it is not as bad as this site.
As promised, here is an update on the cpu usage, more than a week after applying the fix (in .htaccess):

---------------------------------------------
          | For the week    | For the day     |
          | rank - %        | rank - %        |
---------------------------------------------
cpu       | 337th - 0.059%  | 697th - 0.021%  |
hit       | 708th - 0.021%  | 857th - 0.016%  |
Bandwidth | 692nd - 0.024%  | 840th - 0.017%  |
---------------------------------------------

So, Drupal is still using much more CPU than other applications hosted at the same place, but the discrepancy between cpu and hit/bandwidth is not as bad as when I was being spammed with trackbacks.

Augustin.
participants (8)
- Augustin (Beginner)
- Gabor Hojtsy
- Jeremy Andrews
- Jeremy Andrews
- Johan Forngren
- Khalid B
- Mohammed Sameer
- Morbus Iff