[development] Module to Track RSS Subscribers

John Handelaar john at userfrenzy.com
Tue Jan 17 06:20:28 UTC 2006


Greg Knaddison wrote:
> On 1/16/06, Nick Lewis <nick at smartcampaigns.com> wrote:
> 
>>John Handelaar wrote:
>>
>>
>>>IP-based *anything* - Just Say No.  It's *spectacularly* inaccurate
>>>and, frankly, an amateur's mistake.
> 
> <snip>
> 
>>I disagree with the above assumption (though I acknowledge that it is
>>correct -- in *some* ways, and in *some* situations) on the basis of
>>personal experience.
> 
> It does use IP, so it could present problems, but for most situations
> it is a good generalization of what is going on.

Again, not if your tracked users are behind load-balanced proxies,
where one user's requests can arrive from several different IPs.

There are entire countries which fit that description, and
other surprisingly-large places like the UK which are heavily
affected.

So if (for example) you're in the UK, it's just BROKEN for the 30%
of *everybody* who's on AOL, and another 20%-ish on ja.net,
and (let's be generous) no more than one in twenty others.

55% isn't *some*, ffs.  And by NO definition would the remaining
45% count as "most situations".

I'm taking a maximal estimate of ja.net usage there, but those
numbers don't get any prettier if you reduce that estimate to zero.

Honestly, I'm a little surprised one can be *in* the analytics
business and not know this stuff.

> John's solution of using the sessionID confuses me (can you expand a
> bit more) but my understanding of it is that it either presents a
> privacy problem or would be confusing to the user or both.

"Solution" is pushing it.  "Wild suggestion out of left field" is
closer :)

It's certainly not confusing to end users, since it's transparent
to them.  Nor do values derived from session IDs [1] raise any
privacy issue which doesn't already exist: Drupal uses sessions all
over the place, in the way prescribed by the authors of the PHP
language.

It goes like this:

1)  Module alters the link element in the RSS feed on a per-user
     basis.  Links are amended to force clickthroughs (and referred
     links) through that module's handler.  The new link contains
     an ID [1] and the original destination. [2]

2)  When someone clicks on one of those links, the module logs the
     click and "passes through" to the original destination.

3)  If you want to collect IPs as well, you can use the relational
     database we all have access to and group them by SID:

     SELECT DISTINCT remote_ip FROM linktracker WHERE...
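
Steps 1 and 2 above could be sketched roughly like this.  It's Python
rather than an actual Drupal module, purely to show the shape of the
thing.  The handler URL, the "id" and "dest" parameter names and the
"dest" column are made up for illustration; only the linktracker
table, remote_ip and the idea of keying on a SID come from the
description above.

```python
import sqlite3
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical path for the module's click handler.
TRACKER_URL = "https://example.com/linktracker/go"


def rewrite_link(original_url, tracking_id):
    """Step 1: point a feed item's link at the tracking handler.

    The new link carries the per-user ID and the original destination.
    """
    query = urlencode({"id": tracking_id, "dest": original_url})
    return f"{TRACKER_URL}?{query}"


def handle_click(db, tracked_url, remote_ip):
    """Step 2: log the click and return the URL to "pass through" to."""
    params = parse_qs(urlparse(tracked_url).query)
    tracking_id = params["id"][0]
    dest = params["dest"][0]
    db.execute(
        "INSERT INTO linktracker (sid, remote_ip, dest) VALUES (?, ?, ?)",
        (tracking_id, remote_ip, dest),
    )
    # A real handler would issue an HTTP redirect here, and should
    # check that dest is a URL it is willing to redirect to.
    return dest


# In-memory stand-in for the site database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE linktracker (sid TEXT, remote_ip TEXT, dest TEXT)")

link = rewrite_link("https://example.com/node/42?page=2", "a1b2c3")
target = handle_click(db, link, "192.0.2.10")
```

Here the ID is simply passed in; how it gets derived from the session
is a separate question (see [1]).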


I mean, if you're going to log IPs, you need context.  Otherwise
you end up with either i) too many IPs per session and no trail,
or ii) a metric assload of people hiding behind only one IP who
look like one person if you ignore the context of the session.

You avoid this by basing your primary ID for tracking on the
session which generated the feed.  IP info is secondary, and
you may even get the bonus of it being useful sometimes.
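
To make the two failure modes concrete, here is a small sketch with
invented sample rows (Python and SQLite only for convenience; the
sid column name is an assumption on top of the linktracker table
mentioned above).  One reader sits behind balanced proxies, two
others share a single IP.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE linktracker (sid TEXT, remote_ip TEXT)")
db.executemany(
    "INSERT INTO linktracker (sid, remote_ip) VALUES (?, ?)",
    [
        # Case i: one session, three proxy IPs.
        ("s1", "198.51.100.1"),
        ("s1", "198.51.100.2"),
        ("s1", "198.51.100.3"),
        # Case ii: two sessions hiding behind the same IP.
        ("s2", "203.0.113.7"),
        ("s3", "203.0.113.7"),
    ],
)

# IPs seen per session: the session gives the IPs their context.
ips_per_session = dict(db.execute(
    "SELECT sid, COUNT(DISTINCT remote_ip) FROM linktracker GROUP BY sid"
).fetchall())

# Sessions seen per IP: shows who is sharing an address.
sessions_per_ip = dict(db.execute(
    "SELECT remote_ip, COUNT(DISTINCT sid) FROM linktracker GROUP BY remote_ip"
).fetchall())

# Counting readers by session versus counting them by IP.
readers_by_session = db.execute(
    "SELECT COUNT(DISTINCT sid) FROM linktracker"
).fetchone()[0]
readers_by_ip = db.execute(
    "SELECT COUNT(DISTINCT remote_ip) FROM linktracker"
).fetchone()[0]
```

With these rows, counting by IP both splits one reader into three
and merges two readers into one; counting by session does neither.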



jh

[1]  You can't use actual session IDs for security reasons (anyone
      who sees one in a link could hijack the session), but you can
      use something derived from them, like an MD5 hash.
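
      A minimal sketch of such a derived value, in Python for
      illustration.  Mixing in a per-site secret is my own assumption
      here, not something required by the scheme, and SITE_SECRET and
      tracking_id are hypothetical names.

```python
import hashlib

# Hypothetical per-site secret, so the ID can't be recomputed by
# someone who only knows how the scheme works.
SITE_SECRET = "change-me"


def tracking_id(session_id):
    """Return a stable ID derived from the session ID via MD5."""
    data = (SITE_SECRET + session_id).encode("utf-8")
    return hashlib.md5(data).hexdigest()


example_id = tracking_id("0123456789abcdef")
```

      The same session always maps to the same ID, so clicks can be
      grouped, but the ID in the link is not the session ID itself.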

[2]  This has caching implications which would need to be addressed.



