[development] using #downloads for quality of modules

Sat Jan 28 05:48:14 UTC 2006

Hi,

On Wed, 2006-01-25 at 09:29 +0100, Dries Buytaert wrote:
> > > client-side functionality has been implemented, we still need to
> > > implement the backend infrastructure to process the data, and to
> > > integrate it with the project module.  Want to help?  Let me know.
> >
> > Yes I think I could spare some time for something like this.
> 
> Awesome.  In that case, take a look at the database scheme Nedjo has
> implemented and try to work out a way to obtain ratings from it.  We
> also need to figure out a way to make the system tamper-proof.  The
> third and last action item is to show the ratings in the project
> module.  We've been discussing some of this in the project module's
> issue tracker so make sure to tune in.   I'd focus on the first two
> items because these are most pressing; if it turns out we have to make
> changes to the drupal.module, we want to do that before Drupal 4.7.0
> is released.

I have had a look at the drupal.module to see what information is being
passed, and whether it could be tampered with in a way that would alter
the results.

The information that is provided is quite good for creating some kind
of rating system, but I would add one more piece of information. I know
that we have users; I would also have active users. This would most
likely be something like the number of users that have accessed the
site in the last month, or the number of users that have posted
something in the last month. The latter would bring the number down a
lot, so maybe the first would be enough.

What I would like to see is a simple rating, like a number between 0.0
and 10.0 which would be made up of a number of different components
which either increase or decrease the value.

I like calculated rating systems. Ones which require people to vote, or
which just count the number of downloads, are not a good indication.
Downloads are actually a poor indicator for Drupal: I know that I never
download through drupal.org, and instead use CVS to get the version I
need, which isn't counted. Also, voting means that the less sexy modules
will not get voted on at all.

Something that would be a large part of the overall figure would be the
percentage of sites providing a module list that run the module. Other
factors, like maybe the number of open bugs, should detract from the
rating.
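
As a rough sketch of how those two pieces could combine into a 0.0 to
10.0 figure (Python purely for illustration; the function name and the
bug_penalty value are made up, not anything in drupal.module):

```python
def clamp(value, low=0.0, high=10.0):
    """Keep a score inside the 0.0 to 10.0 range."""
    return max(low, min(high, value))


def simple_rating(sites_running, sites_reporting, open_bugs, bug_penalty=0.05):
    """Rate a module from its share of reporting sites, minus open bugs.

    bug_penalty is an assumed number of points lost per open bug.
    """
    if sites_reporting <= 0:
        return 0.0
    share = sites_running / sites_reporting      # 0.0 .. 1.0
    score = 10.0 * share - bug_penalty * open_bugs
    return round(clamp(score), 1)


# 300 of 1000 reporting sites run the module, with 10 open bugs.
print(simple_rating(300, 1000, 10))
```

The penalty would need tuning against real data; the point is only that
one component pushes the figure up and another pulls it down.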

Another factor would be the total number of active users on the sites
which use your module. So if a module is only installed on 2% of sites,
but those are sites with larger user bases and your module reaches 60%
of the active user base, then this is something that should increase
your rating.
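
To make the difference between the two percentages concrete, here is a
small sketch. The data layout (a list of dicts with "modules" and
"active_users") and the site figures are invented for the example; they
are not what the RPC currently sends.

```python
def site_share(sites, module):
    """Fraction of reporting sites that run `module`."""
    if not sites:
        return 0.0
    return sum(1 for s in sites if module in s["modules"]) / len(sites)


def active_user_share(sites, module):
    """Fraction of all active users who are on sites running `module`."""
    total = sum(s["active_users"] for s in sites)
    if total == 0:
        return 0.0
    using = sum(s["active_users"] for s in sites if module in s["modules"])
    return using / total


# One big site runs the module, 49 small sites do not (made-up numbers).
sites = [{"modules": {"example_module"}, "active_users": 1470}]
sites += [{"modules": set(), "active_users": 20} for _ in range(49)]

print(site_share(sites, "example_module"))         # 2% of sites
print(active_user_share(sites, "example_module"))  # about 60% of active users
```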

Building up the rating this way would also hinder tampering, as some of
the figures would be outside people's control. Also, having something
like the number of open issues detract from a rating would be a good
incentive for people to work on issues.

Having other measures like downloads would be OK, but I would weight
these so that they do not have as much effect on the rating as, say,
the number of sites running the module.

Adding active users to the RPC would be a good idea, because I know that
drupal.org has over 40000 (I think) users, but they would not all be
active.

What we could do is create an API so that each individual part of the
rating is a separate call. Then people can think of different methods
of rating modules and just implement the calculation. Calculations
could then be added and removed easily, and each one could be weighted
depending on its importance. So if we had some obscure calculation,
like "modules owned by user 959 get an additional 5 points", it would
be weighted down.
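
A minimal sketch of what such a pluggable scheme might look like. The
class, the component names, the weights and the stats keys are all
hypothetical; each component is assumed to return a value between 0.0
and 10.0, and the rating is their weighted average.

```python
class RatingRegistry:
    """Holds rating components that can be added, removed and weighted."""

    def __init__(self):
        self._components = {}  # name -> (weight, function)

    def register(self, name, weight, func):
        self._components[name] = (weight, func)

    def unregister(self, name):
        self._components.pop(name, None)

    def rate(self, stats):
        """Weighted average of all component scores, clamped to 0.0 .. 10.0."""
        total_weight = sum(w for w, _ in self._components.values())
        if total_weight == 0:
            return 0.0
        score = sum(w * f(stats) for w, f in self._components.values())
        return round(max(0.0, min(10.0, score / total_weight)), 1)


registry = RatingRegistry()
# Heavily weighted: share of reporting sites running the module.
registry.register("site_share", 5, lambda s: 10.0 * s["site_share"])
# Fewer open bugs scores higher; ten or more open bugs scores zero.
registry.register("open_bugs", 2, lambda s: max(0.0, 10.0 - s["open_bugs"]))
# Lightly weighted: downloads are a weaker signal.
registry.register("downloads", 1, lambda s: 10.0 * s["download_share"])

stats = {"site_share": 0.5, "open_bugs": 4, "download_share": 0.2}
print(registry.rate(stats))
```

Dropping a calculation is then a single unregister() call, and an
obscure one can simply be registered with a small weight.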

We could also do something like Google's PageRank, or for Trekkies the
warp factor, in that each value is twice that of its predecessor, so
getting a module to 10 would be just about impossible.
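
One way to read the "each value is twice its predecessor" idea is a
base-2 logarithmic scale over some raw score. This is only a sketch of
that reading; the function name and the `base` parameter (the raw score
that earns a rating of 1) are assumptions, and it is not how PageRank
itself is computed.

```python
import math


def doubling_rating(raw, base=1.0):
    """Map a raw score to 0.0 .. 10.0, where each extra point needs double.

    A raw score of `base` rates 1.0, 2 * base rates 2.0, 4 * base rates
    3.0, and so on, so 10.0 needs 512 times the raw score of 1.0.
    """
    if raw < base:
        return 0.0
    return round(min(10.0, 1.0 + math.log2(raw / base)), 1)


for raw in (1, 2, 4, 512):
    print(raw, doubling_rating(raw))
```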

Also, these calculations could get quite complicated and take a while
to run, so what we could do is update ratings only once a month. A bad
rating would then hang around for a bit.

Basically, the more components there are, the harder it is to tamper
with the rating.

Just a few of my ideas.
Gordon.