[development] preventing update_status abuse vs. drupal.org privacy
drupal at dwwright.net
Tue Dec 18 03:26:00 UTC 2007
The update_status module (D5 contrib, D6 core) queries drupal.org for
information about the available releases for a given project.
Included in the request is a hash that uniquely but anonymously
identifies the site, so we can record usage statistics
about d.o projects. This was the topic of a long discussion about
drupal.org privacy, whether update.module in D6 core should be opt-in
or opt-out, etc. One of the open tasks coming out of that
However, once these usage statistics become more visible on d.o,
once you have the ability to sort modules based on usage, etc., the
temptation for people to try to abuse the system will inevitably
grow. :( Of course, we generally try to trust everyone as much as
possible, but Joomla went through a very similar situation of abuse
and had to take a number of steps to try to manage it.
So, there's now a patch (thanks, drewish!) sitting in the project
issue queue that would at least help us notice abuse and begin to be
able to do something about it:
This would change the project_usage module (the thing on d.o itself
that's recording the usage queries and will be making the data
available to the project* UI as soon as enough people who care about
that start helping) to record the IP address of the incoming usage
query, along with the project being requested and the unique key.
This would allow us to notice if a large number of requests for usage
were coming from the same IP using different (bogus) keys. Of
course, that would also happen on a multi-site install, or via shared
hosting, but we can certainly factor that in before declaring abuse.
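
To make the idea concrete, here's a rough sketch (in Python, just for illustration; it is not the patch and not how project_usage is written) of the kind of check recording the IP would allow. The function name, the (ip, project, site_key) tuple layout, and the threshold of 20 are all made up for the example. The threshold is where we'd factor in multi-site installs and shared hosting.

```python
from collections import defaultdict

def suspicious_ips(requests, max_keys=20):
    """Flag (ip, project) pairs that report more distinct site keys than max_keys.

    `requests` is an iterable of (ip, project, site_key) tuples. This layout
    and the default threshold are assumptions for illustration only.
    """
    keys_seen = defaultdict(set)
    for ip, project, site_key in requests:
        keys_seen[(ip, project)].add(site_key)
    # Keep only the pairs whose distinct-key count exceeds the threshold.
    return {
        pair: len(keys)
        for pair, keys in keys_seen.items()
        if len(keys) > max_keys
    }

# Hypothetical data: one IP sending 50 different keys for the same project,
# and one ordinary site checking two projects with a single key.
sample = [("192.0.2.10", "views", f"bogus{i}") for i in range(50)]
sample += [("198.51.100.7", "views", "abc"), ("198.51.100.7", "cck", "abc")]

flagged = suspicious_ips(sample)
```

A legitimate multi-site install would show up the same way if it had more than max_keys sites, so a hit here is a reason to look closer, not proof of abuse.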
I should note that the httpd logs _already_ record the IP addresses
of all incoming requests to *.d.o, so it's not like this is
fundamentally new data we'd be collecting. It's just that the data
would live in the d.o DB itself, not just on the d.o filesystem
(which is harder to get to). However, unless we have this data
directly in the same usage-related DB tables, it'll be _MUCH_ more
difficult and time consuming to try to track down suspicions of abuse
and be able to do anything about it if/when we discover any.
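
As a rough illustration of why having the IP in the same tables matters: once it's a column next to the project and key, the check becomes a single query instead of a log-parsing job. The sketch below uses Python with an in-memory SQLite DB purely so it's runnable; the table name usage_raw, its columns, and the cutoff of 10 are hypothetical and not the actual project_usage schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per incoming usage query.
conn.execute("CREATE TABLE usage_raw (project TEXT, site_key TEXT, ip TEXT)")

rows = [("views", f"bogus{i}", "192.0.2.10") for i in range(30)]
rows.append(("views", "realsite", "198.51.100.7"))
conn.executemany("INSERT INTO usage_raw VALUES (?, ?, ?)", rows)

# Count distinct site keys per (ip, project) and keep the outliers.
query = """
    SELECT ip, project, COUNT(DISTINCT site_key) AS n_keys
    FROM usage_raw
    GROUP BY ip, project
    HAVING COUNT(DISTINCT site_key) > ?
    ORDER BY n_keys DESC
"""
result = conn.execute(query, (10,)).fetchall()
```

Doing the same thing from the httpd logs would mean pulling the key and project out of each request line and correlating them by hand.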
So, what do y'all say? Should I commit this patch to make it easier
for d.o admins to limit/prevent abuse of the update_status usage
tracking system? Is that too much for the privacy-conscious to
handle? If I commit the patch, what additional warnings/indications
should we give to site admins via the UI to know what's happening
when they enable update(_status).module? Are the existing warnings
good enough to let the privacy-conscious know they shouldn't enable it?
Don't reply here; please use the issue: