SNA - status report
Hi,
I would like to share with you what I've done in the SoC so far. First, I tested the performance of various data storage methods (SQL, gdb, etc.). The next step was to create a graph from a Drupal-based site's SQL data. The graph is built from the data of the node and comment modules: the scripts count the activity between users (i.e. replies to an article and replies to a comment), then create a directed, weighted graph from this information. For now the graph is just serialized to a file (this might change). Here is the data structure:
[uidA] => [uid1] => [weight]
[uidB] => [uid1] => [weight]
          [uid2] => [weight]
          [uidN] => [weight]
This is an adjacency list: for each user I store the adjacent users together with the edge weights. I also wrote a function that creates Graphviz input from this array. The Graphviz-generated site maps are interesting to me, but I know this is not the main purpose of my project. I had a very big performance problem with Graphviz when I tried to visualize a graph with thousands of edges, but I realized that an image with that many edges is rather useless. So I simply throw out the low-scored edges. I haven't written the function that detects unimportant edges yet; I've just set a limit by hand in the script. I generated the social map of a very popular Hungarian portal (~8,000 users and ~250,000 comments).
As far as I can see, the next step is to implement Dijkstra's algorithm on this data structure and to test how efficient this data structure and storage method are.
All the results above (images, source code and test results) can be viewed at http://sna.drupaler.net
Aron Novak
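A minimal Python sketch of the reply-counting step described in the report, assuming the reply events have already been extracted from the site's node and comment data as (replier uid, replied-to uid) pairs. The function name `build_graph` and the choice to ignore self-replies are illustrative assumptions, not the project's actual scripts; the nested dict simply mirrors the `[uidA] => [uid1] => [weight]` layout.

```python
from collections import defaultdict


def build_graph(replies):
    """Build a directed, weighted adjacency list from reply events.

    `replies` yields (replier_uid, replied_to_uid) pairs, one per reply to an
    article or a comment. The result maps each replier to {target: count},
    like the [uidA] => [uid1] => [weight] structure in the report.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst in replies:
        if src == dst:
            continue  # assumption: replies to one's own posts are not counted
        graph[src][dst] += 1  # each reply adds 1 to the edge weight
    # Convert the nested defaultdicts to plain dicts for printing/serializing.
    return {src: dict(adj) for src, adj in graph.items()}


# Usage with made-up uids: user 2 replied to user 1 twice, and so on.
replies = [(2, 1), (2, 1), (3, 1), (3, 2), (1, 3)]
print(build_graph(replies))  # {2: {1: 2}, 3: {1: 1, 2: 1}, 1: {3: 1}}
```

A plain dict like this can be written to a file with `pickle.dump`, which is roughly the Python counterpart of serializing the array.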
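The hand-set edge limit and the Graphviz export could look something like the sketch below. `prune_edges`, `to_dot`, and the `min_weight` threshold are hypothetical names; the threshold is still chosen by hand, as in the report, not by an automatic detector of unimportant edges. The output is plain DOT text that Graphviz's `dot` command can render, e.g. `dot -Tsvg social.dot -o social.svg`.

```python
def prune_edges(graph, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    pruned = {}
    for src, adj in graph.items():
        kept = {dst: w for dst, w in adj.items() if w >= min_weight}
        if kept:  # omit sources left with no edges (they may still be targets)
            pruned[src] = kept
    return pruned


def to_dot(graph, name="social"):
    """Render a {src: {dst: weight}} adjacency list as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for src, adj in graph.items():
        for dst, w in adj.items():
            # weight guides the dot layout; label prints the count on the edge
            lines.append(f'  "{src}" -> "{dst}" [weight={w}, label="{w}"];')
    lines.append("}")
    return "\n".join(lines)


graph = {1: {2: 5, 3: 1}, 2: {1: 3}, 3: {1: 1}}
print(to_dot(prune_edges(graph, min_weight=2)))
```

With `min_weight=2` the two weight-1 edges are dropped, leaving only `1 -> 2` and `2 -> 1` in the DOT output.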
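For the Dijkstra step, one possible shape over the same nested-dict structure, using Python's `heapq` as the priority queue. Because the weights here count interactions (a higher weight means a closer tie), this sketch assumes an edge cost of `1 / weight`; that conversion is an illustrative choice the report does not specify, and a different cost function would give different distances.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source over {src: {dst: weight}}.

    Assumption: an edge's cost is 1 / weight (weights are reply counts, so
    always >= 1), meaning frequent interaction counts as a short distance.
    Users unreachable from source are absent from the result.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {1: {2: 1, 3: 4}, 3: {2: 4}}
print(dijkstra(graph, 1))  # {1: 0.0, 2: 0.5, 3: 0.25}
```

Here the frequently used route 1 -> 3 -> 2 (cost 0.25 + 0.25) beats the single weight-1 edge 1 -> 2 (cost 1.0).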
Excuse me if we are not on the same page when it comes to your goals. I'm going to be making a few assumptions.
The graphs you generated are pretty interesting. However, aren't we more interested in clustering? Changing to a non-directed graph would significantly decrease your data set without really losing much. Also, your graphs didn't seem to reflect taxonomy. I think it would be very valuable to visualize clustering of user posts around subjects.
Also, check out Dan Kaminsky's network graph experiments. Besides his unique DNS hacks, he has generated some LARGE, and beautiful, network traffic graphs. He also uses Drupal :)
http://www.doxpara.com/
http://www.doxpara.com/?q=node/1133
http://www.opte.org/
~Rob
--
----------------------------------------------------------
It is by Caffeine alone that I set my mind in motion
It is by the beans of Java, that my thoughts acquire speed
The hands acquire shakes; the shakes become a warning
It is by Caffeine alone that I set my mind in motion
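Rob's suggestion of switching to a non-directed graph can be sketched as merging each pair of opposite edges into one. This assumes the undirected weight is the sum of both directions (one reasonable choice among several), and `to_undirected` is a hypothetical helper. Since only reciprocated pairs collapse, the edge count shrinks by at most half; the actual saving depends on how often replies are returned.

```python
def to_undirected(graph):
    """Merge A->B and B->A into one edge keyed by the sorted uid pair.

    Assumption: the undirected weight is the sum of both directed weights.
    """
    edges = {}
    for src, adj in graph.items():
        for dst, w in adj.items():
            key = (min(src, dst), max(src, dst))
            edges[key] = edges.get(key, 0) + w
    return edges


directed = {1: {2: 3}, 2: {1: 2, 3: 1}}
undirected = to_undirected(directed)
print(undirected)  # {(1, 2): 5, (2, 3): 1}
print(sum(len(adj) for adj in directed.values()), "->", len(undirected))  # 3 -> 2
```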
On Saturday, 10 June 2006 at 13:44, Robert Wohleb wrote:
Excuse me if we are not on the same page when it comes to your goals. I'm going to be making a few assumptions.
The graphs you generated are pretty interesting. However, aren't we more interested in clustering? Changing to a non-directed graph would significantly decrease your data set without really losing much. Also, your graphs didn't seem to reflect taxonomy. I think it would be very valuable to visualize clustering of user posts around subjects.
Hi,
There is another Summer of Code project for that: http://2006.planet-soc.com/node/248 - Content Recommendation Engine. I think that engine should cluster nodes around subjects.
I'll do experiments with undirected graphs too. In my humble opinion, real social networks are directed (for example, a famous person has many incoming edges and relatively few outgoing ones).
Also, check out Dan Kaminsky's network graph experiments. Besides his unique DNS hacks, he has generated some LARGE, and beautiful, network traffic graphs. He also uses Drupal :)
http://www.doxpara.com/ http://www.doxpara.com/?q=node/1133 http://www.opte.org/
I'm going to study his approach to visualizing large graphs.
Thanks for the feedback,
Aron Novak
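The "famous person" example can be checked directly on the directed adjacency list by comparing in-degree and out-degree. A minimal sketch using `collections.Counter`; `degrees` is a hypothetical helper, and it counts distinct neighbours rather than summing weights (adding `w` instead of 1 would give the weighted variant).

```python
from collections import Counter


def degrees(graph):
    """Return (in_degree, out_degree) Counters for {src: {dst: weight}}."""
    out_deg = Counter({src: len(adj) for src, adj in graph.items()})
    in_deg = Counter()
    for adj in graph.values():
        for dst in adj:
            in_deg[dst] += 1  # one incoming edge per distinct replier
    return in_deg, out_deg


# Made-up data: user 1 is replied to by three users but replies to only one.
graph = {2: {1: 4}, 3: {1: 2}, 4: {1: 1, 2: 1}, 1: {2: 1}}
in_deg, out_deg = degrees(graph)
print(in_deg[1], out_deg[1])  # 3 1
```

`in_deg.most_common(n)` then lists the most replied-to users, which is one way to spot the asymmetric "famous" nodes.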
Hi,
I agree, true social networks are directed (orkut style). I must not be seeing the reason behind the graphs. I assumed the graphs were for seeing the big picture, where clustering would be evident.
~Rob
On 6/10/06, Robert Wohleb <rob@techsanctuary.com> wrote:
seeing the reason behind the graphs. I assumed the graphs were for seeing the big picture, where clustering would be evident.
One thing is evident: webchick is POPULAR! http://sna.drupaler.net/files/planet-soc.svg
Aron, these graphs are really fun to look at and will also provide great value to many, many projects. Thanks for doing this and for documenting the work in public.
Regards,
Greg
--
Greg Knaddison | Growing Venture Solutions
Denver, CO | http://growingventuresolutions.com
Technology Solutions for Communities, Individuals, and Small Businesses
participants (3)
- Greg Knaddison - GVS
- Novák Áron
- Robert Wohleb