Greetings Drupalistas,

I'm working on what feels like the hundredth (but is really the 4th or 5th) project that includes some variety of Facebook-like "Activity Stream" for a Drupal-based community. Having tackled this problem in a number of different ways over the past couple of years -- none of which I've ever really loved -- I'm tempted to launch a new module project to solve this thing once and for all.

My primary concerns are modularity -- such that anything can potentially be an Activity -- and scalability, to work with hundreds of thousands of actions and users.

After some initial review, I found this existing module:

http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/activitystream/

However, this seems like a different concept, since A) it's focused on external sources and B) it's based on turning each activity into a node. Basically, it seems better suited to an individual site aggregating internet-wide activity than to a community reporting on itself.

On the upside, this module (and many others, e.g. userpoints) shows a nice way to handle things modularly, so I'm not too worried about that part. But I am worried about scaling, and the architecture of activitystream got me thinking about whether the activity-as-node architecture is workable. I'd love opinions here.

IN FAVOR: Nodes are functional. Facebook already lets you comment on every little thing that goes on. This is fun! It would be good for this module to do that too. It also makes future integration with notification/messaging updates possible, as well as every other wonderful thing nodes can do.

AGAINST: This will mean huge numbers of nodes, bloating the table. It also means more overhead when logging activity (node_save vs. a single optimized db_query). I'm also skeptical that the core node table structure has the right stuff to be queried with maximum efficiency (e.g. nothing to group similar queries by unless I make a ton of node types, etc.).
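
To make the "single optimized query" side of that trade-off a bit more concrete, here is a rough sketch of a dedicated activity table. It is written in Python against SQLite rather than Drupal's db_query, and the table name, column names (aid, uid, type, created) and helper functions are all made up for illustration; it is not an actual module schema. The point is only that a plain type column gives queries something to group by without inventing a node type per activity, and that logging one activity is one INSERT.

```python
import sqlite3
import time


def create_schema(conn):
    """Create a hypothetical activity table plus an index for type-based queries."""
    conn.execute(
        """
        CREATE TABLE activity (
            aid     INTEGER PRIMARY KEY,  -- activity id (assumed name)
            uid     INTEGER NOT NULL,     -- acting user
            type    TEXT    NOT NULL,     -- e.g. 'comment', 'friend'
            created INTEGER NOT NULL      -- Unix timestamp
        )
        """
    )
    # Composite index so "latest activities of type X" avoids a full scan.
    conn.execute(
        "CREATE INDEX activity_type_created ON activity (type, created)"
    )


def log_activity(conn, uid, activity_type, created=None):
    """Record one activity with a single INSERT."""
    if created is None:
        created = int(time.time())
    conn.execute(
        "INSERT INTO activity (uid, type, created) VALUES (?, ?, ?)",
        (uid, activity_type, created),
    )


def count_by_type(conn, since):
    """Count activities per type at or after the given timestamp."""
    cursor = conn.execute(
        "SELECT type, COUNT(*) FROM activity "
        "WHERE created >= ? GROUP BY type ORDER BY type",
        (since,),
    )
    return cursor.fetchall()


def recent_of_type(conn, activity_type, limit):
    """Return (uid, created) for the newest activities of one type."""
    cursor = conn.execute(
        "SELECT uid, created FROM activity "
        "WHERE type = ? ORDER BY created DESC LIMIT ?",
        (activity_type, limit),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    log_activity(conn, uid=1, activity_type="comment")
    log_activity(conn, uid=2, activity_type="friend")
    print(count_by_type(conn, since=0))
```

Whether an index like this holds up at the volumes I care about would still need real testing on the actual database; the sketch only shows the shape of the idea.
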
There's also the question of unwanted node functions. We don't really want anyone to edit activities. We also don't want them to start showing up in search results.

I could see a possible solution in maintaining an optimized index table for queries, alongside nodes for functionality, individual page views, etc. The bloat problem could conceivably be solved by giving activity nodes (and index entries) a maximum TTL, a la watchdog and other big tables.

I've already got a table/query design down for indexing that seems to scale very well to 200k activity entries grouped over 20 types and 5000 users. I suppose the next step is to do some testing around the overall effects of having short-lived nodes, and whether the other edge cases can be solved.

I'm wondering if anyone has done any of their own thinking along these lines and has any comments to add.

Happy New Year!

-josh

------------------------------------------
Josh Koenig, Partner, CTO
http://www.chapterthree.com
AOL IM: chap3josh
1-888-496-3238