[development] preventing update_status abuse vs. drupal.org privacy

Derek Wright drupal at dwwright.net
Tue Dec 18 03:26:00 UTC 2007


The update_status module (D5 contrib, D6 core) queries drupal.org for  
information about the available releases for a given project.   
Included in the request is an anonymous hash to uniquely but  
anonymously identify the site, so we can record usage statistics  
about d.o projects.  This was the topic of a long discussion about  
drupal.org privacy, whether update.module in D6 core should be opt-in  
or opt-out, etc.[1]  One of the open tasks coming out of that
discussion is the creation of a drupal.org privacy policy[2].
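How can a key be both unique and anonymous? One common approach is to hash a site-specific secret together with the site's URL. The same site then always produces the same key, while the key itself reveals nothing about which site sent it. The Python below is a hypothetical sketch of that idea only. The function name make_site_key, the choice of inputs, and the use of SHA-256 are assumptions, not a description of what update_status actually sends.

```python
import hashlib
import secrets


def make_site_key(base_url: str, private_key: str) -> str:
    """Hypothetical: derive a stable, opaque per-site identifier.

    The private key never leaves the site, so the server only ever sees
    the digest and cannot recover base_url from it.
    """
    material = (base_url + private_key).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


# Assumption: each site generates its secret once and stores it locally.
private_key = secrets.token_hex(32)

key_first = make_site_key("http://example.com", private_key)
key_second = make_site_key("http://example.com", private_key)
key_other = make_site_key("http://example.com", secrets.token_hex(32))

print(key_first == key_second)  # True: same site, same key every time
print(key_first == key_other)   # a different secret gives a different key
```

Nothing in a scheme like this stops a client from sending arbitrary random strings as keys. That is exactly the abuse scenario this post is concerned with.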

However, once these usage statistics become more visible on d.o[3],
and once you can sort modules by usage, etc., the temptation for
people to try to abuse the system will inevitably grow. :(  Of
course, we generally try to trust everyone as much as possible, but
Joomla went through a very similar situation of abuse[4] and had to
take a number of steps to try to manage it.

So, there's now a patch (thanks, drewish!) sitting in the project
issue queue that would at least help us notice abuse and start doing
something about it:

http://drupal.org/node/168009

This would change the project_usage module (the thing on d.o itself  
that's recording the usage queries and will be making the data  
available to the project* UI as soon as enough people who care about  
that start helping[3]) to record the IP address of the incoming usage  
query, along with the project being requested and the unique key.   
This would allow us to notice if a large number of requests for usage  
were coming from the same IP using different (bogus) keys.  Of  
course, that would also happen on a multi-site install, or via shared  
hosting, but we can certainly factor that in before declaring abuse.
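To make the "same IP, many different keys" check concrete, here is a hypothetical Python sketch of the kind of check an admin could run once the IP is stored next to the project and key. Several details are illustrative assumptions, not part of the patch in the issue: the record layout (UsageQuery), the default threshold of 5 keys per IP, and the allowlist for known multi-site or shared-hosting IPs.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, NamedTuple, Set


class UsageQuery(NamedTuple):
    """One recorded usage request (hypothetical record layout)."""
    ip: str
    project: str
    site_key: str


def suspicious_ips(
    queries: Iterable[UsageQuery],
    project: str,
    max_keys_per_ip: int = 5,
    allowlist: AbstractSet[str] = frozenset(),
) -> Dict[str, int]:
    """Return {ip: distinct_key_count} for IPs that reported `project`
    under more distinct site keys than `max_keys_per_ip`.

    IPs in `allowlist` (e.g. known multi-site or shared hosts) are skipped.
    A flagged IP is a lead for a human to check, not proof of abuse.
    """
    keys_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for q in queries:
        if q.project == project and q.ip not in allowlist:
            keys_by_ip[q.ip].add(q.site_key)
    return {
        ip: len(keys)
        for ip, keys in keys_by_ip.items()
        if len(keys) > max_keys_per_ip
    }


# One IP inflating "views" usage with 20 bogus keys, plus one honest site.
log = [UsageQuery("192.0.2.10", "views", f"bogus{i}") for i in range(20)]
log.append(UsageQuery("198.51.100.7", "views", "realsitekey"))

print(suspicious_ips(log, "views"))  # {'192.0.2.10': 20}
print(suspicious_ips(log, "views", allowlist={"192.0.2.10"}))  # {}
```

The check counts distinct keys per IP rather than raw requests. This separates one busy site checking often from one IP posing as many sites. The allowlist is where the multi-site and shared-hosting cases would be factored in.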

I should note that the httpd logs _already_ record the IP addresses  
of all incoming requests to *.d.o, so it's not like this is  
fundamentally new data we'd be collecting.  It's just that the data  
would live in the d.o DB itself, not just on the d.o filesystem  
(which is harder to get to).  However, unless we have this data  
directly in the same usage-related DB tables, it'll be _MUCH_ more  
difficult and time-consuming to investigate suspected abuse and to do
anything about it if/when we discover any.

So, what do y'all say?  Should I commit this patch to make it easier  
for d.o admins to limit/prevent abuse of the update_status usage  
tracking system?  Is that too much for the privacy-conscious to  
handle?  If I commit the patch, what additional warnings/indications
should we give site admins via the UI so they know what's happening
when they enable update(_status).module?  Are the existing warnings
good enough to let the privacy-conscious know they shouldn't enable it?

Don't reply here, please use the issue:

http://drupal.org/node/168009#comment-form

Thanks,
-Derek (dww)


[1] http://drupal.org/node/178581
[2] http://drupal.org/node/178776
[3] http://drupal.org/node/165380
[4] http://www.joomla.org/component/option,com_jd-wp/Itemid,33/p,344/
