[development] Scalable Internal Activity Stream Module
josh at chapterthree.com
Wed Dec 31 18:42:38 UTC 2008
I'm working on what feels like the hundredth (but is really the 4th or
5th) project that includes some variety of Facebook-like "Activity
Stream" for a Drupal-based community. Having tackled this problem in a
number of different ways in the past couple years -- none of which
I've really ever loved -- I'm tempted to launch a new module project
to solve this thing once and for all.
My primary concerns are modularity -- such that anything can
potentially be an Activity -- and scalability to work with hundreds of
thousands of actions and users.
After some initial review, I found both this existing module:
However, this seems like a different concept since A) it's focused on
external sources and B) it's based on turning each activity into a
node. Basically, it seems better suited for an individual site
aggregating internet-wide activity than a community reporting on itself.
On the upside, this module (and many others, e.g. userpoints) shows a
nice way to handle things modularly. I'm not too worried about that,
really. But I am worried about scaling, and the architecture of
activitystream got me thinking about whether the activity-as-node
architecture is workable. I'd love opinions here.
Nodes are functional. Facebook already lets you comment on every
little thing that goes on. This is fun! It would be good for this
module to do that too. It also makes future integration with
notification/messaging updates possible, as well as every other
wonderful thing nodes can do.
On the other hand, activity-as-node will mean a huge number of nodes,
bloating the node table. It also means more overhead when logging
activity (a full node_save vs a single optimized db_query). I'm also
skeptical that the core node table structure has the right stuff to be
queried with maximum efficiency
(e.g. nothing to group similar queries by unless I make a ton of node
There's also the question of unwanted node functions. We don't really
want anyone to edit activities. We also don't want them to start
showing up in search queries.
I could see a possible solution in maintaining an optimized index
table for queries, as well as nodes for functionality, individual page
views, etc. The bloat problem could conceivably be solved by giving
activity nodes (and index entries) a maximum TTL, à la watchdog and
other big tables.
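
To make the index-table-plus-TTL idea concrete, here is a minimal sketch in Python, with the standard sqlite3 module standing in for the Drupal database layer. The table name, column names, and function names are all hypothetical, invented for illustration; this is not the schema of any existing module. It shows logging an activity with a single INSERT, and a purge step that drops expired index rows and reports which companion nodes would need deleting.

```python
import sqlite3
import time

# Hypothetical schema: every table and column name here is an assumption
# made for illustration, not part of any existing Drupal module.
SCHEMA = """
CREATE TABLE activity_index (
    aid     INTEGER PRIMARY KEY,
    uid     INTEGER NOT NULL,   -- user who performed the action
    type    TEXT    NOT NULL,   -- activity type, e.g. 'comment'
    nid     INTEGER,            -- optional companion node, may be NULL
    created INTEGER NOT NULL    -- Unix timestamp
);
CREATE INDEX activity_user_created ON activity_index (uid, created);
CREATE INDEX activity_type_created ON activity_index (type, created);
CREATE INDEX activity_created ON activity_index (created);
"""


def open_store():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def log_activity(conn, uid, type_, nid=None, now=None):
    """Record one activity with a single INSERT and return its id."""
    created = int(time.time()) if now is None else now
    cur = conn.execute(
        "INSERT INTO activity_index (uid, type, nid, created) "
        "VALUES (?, ?, ?, ?)",
        (uid, type_, nid, created),
    )
    conn.commit()
    return cur.lastrowid


def purge_expired(conn, max_age, now=None):
    """Delete index rows older than max_age seconds.

    Returns the nids of companion nodes attached to the purged rows,
    so the caller can delete those nodes as well.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - max_age
    nids = [
        row[0]
        for row in conn.execute(
            "SELECT nid FROM activity_index "
            "WHERE created < ? AND nid IS NOT NULL ORDER BY aid",
            (cutoff,),
        )
    ]
    conn.execute("DELETE FROM activity_index WHERE created < ?", (cutoff,))
    conn.commit()
    return nids
```

In a real module the purge would presumably run from cron, the way old watchdog entries get trimmed, and the returned nids would be handed to whatever deletes the matching nodes.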
I've already got a table/query design down for indexing that seems to
scale very well to 200k activity entries grouped over 20 types and
5000 users. I suppose the next step is to do some testing around what
the overall effects are of having short-lived nodes, and whether or
not the other edge cases can be solved.
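
For anyone who wants to try a test of this kind themselves, here is a rough sketch of a harness in Python with sqlite3. It is not my actual table or query design: the schema, the function names, and the grouping by (user, type) are assumptions made for illustration. It fills an index table with random rows and runs one grouped query for a set of users.

```python
import random
import sqlite3


def build_sample(n_rows, n_types, n_users, seed=0):
    """Fill a hypothetical index table with random activity rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity_index ("
        " aid INTEGER PRIMARY KEY,"
        " uid INTEGER NOT NULL,"
        " type INTEGER NOT NULL,"
        " created INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX activity_user_created ON activity_index (uid, created)"
    )
    rows = (
        (rng.randrange(n_users), rng.randrange(n_types), rng.randrange(1_000_000))
        for _ in range(n_rows)
    )
    conn.executemany(
        "INSERT INTO activity_index (uid, type, created) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def stream_for(conn, uids, limit=20):
    """Activity by the given users, grouped per (user, type), newest first.

    Each result row is (uid, type, count, latest_timestamp).
    """
    if not uids:
        return []
    marks = ",".join("?" * len(uids))
    sql = (
        "SELECT uid, type, COUNT(*) AS n, MAX(created) AS latest "
        "FROM activity_index "
        f"WHERE uid IN ({marks}) "
        "GROUP BY uid, type "
        "ORDER BY latest DESC "
        "LIMIT ?"
    )
    return conn.execute(sql, (*uids, limit)).fetchall()
```

Calling build_sample(200_000, 20, 5000) approximates the data volume described above, and wrapping stream_for in time.perf_counter() calls gives a crude timing. SQLite in memory will not behave like a production MySQL server, so any numbers from this are only a sanity check.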
I'm wondering if anyone has done any of their own thinking along these
lines and has any comments to add.
Happy New Year!
Josh Koenig, Partner, CTO
AOL IM: chap3josh