[documentation] [task] Feedback on how to do A/B testing using a multi-site configuration

njivy drupal-docs at drupal.org
Thu Feb 16 00:17:00 UTC 2006


Issue status update for 
http://drupal.org/node/49561
Post a follow up: 
http://drupal.org/project/comments/add/49561

 Project:      Documentation
 Version:      <none>
 Component:    Admin Guide
 Category:     tasks
 Priority:     normal
 Assigned to:  njivy
 Reported by:  njivy
 Updated by:   njivy
 Status:       active

Actually, with the $base_url hack the modifications to .htaccess in the
shadow Drupal installation (configuration B) are not necessary.  So
step 3 is simplified, and this should improve performance.




njivy



Previous comments:
------------------------------------------------------------------------

Wed, 15 Feb 2006 19:05:37 +0000 : njivy

Here is what I want from you:  I want you to review the validity of my
approach to A/B testing and to recommend your alternate approaches for
different operational circumstances.  With your feedback (and after an
unspecified length of time), I am willing to draft documentation based
on our collective wisdom for your review.


I have invested a few hours in understanding Drupal's multi-site
capabilities and how they can be used to perform A/B testing of site
themes.  I looked through the available documentation but found a lack
of helpful information on this topic.  In other words, we have an
opportunity to improve the documentation of multi-site setups and A/B
testing procedures.


Although the topic of multi-site setups could undoubtedly benefit from
independent improvement, it plays a key role in my approach to A/B
testing.  So I am including some multi-site tips here.


But first, let me explain what I mean by A/B testing.  In A/B testing,
we test the performance of A versus B.  For example, we may wish to
test the effect of a font change on ad click-through rates.  So we
set up two site configurations and sample (i.e. measure) visitor
behavior under both.  Then using statistical techniques (e.g.
hypothesis testing or factorial design), we can estimate the true
effect of B versus A.


The operational circumstances dictate which approaches to A/B testing
are feasible.  In my case, I am using a shared LAMP [1] server.  With a
dedicated Apache process, I would have considered the following:



* Apache's RewriteMap [2]
* A front-end proxy like Squid [3]
* Sub-domains.  (I am too cheap to pay my host, who is otherwise
excellent [4].)

But given my limitations, here is how I set up a multi-site Drupal
4.7.0-beta4 installation for A/B testing.



* Create a shadow installation of Drupal for configuration B.

  * Create a directory in Drupal's root directory to hold the shadow
    installation.
      $ cd /path/to/drupal
      $ mkdir b
  * Link everything except index.php in the shadow installation to the
    real installation.
      $ cd b
      $ ln -s ../database database
      $ ln -s ../files files
      $ ln -s ../includes includes
      $ ln -s ../misc misc
      $ ln -s ../modules modules
      $ ln -s ../sites sites
      $ ln -s ../themes themes
      $ ln -s ../xmlrpc.php xmlrpc.php
  * Copy index.php to the shadow installation.  (We'll edit it later.)
      $ cp ../index.php .


* Set up the settings file for configuration B using Drupal's multi-site
  capabilities.

  * Create a directory for configuration B's settings file.  (Substitute
    your domain name, of course.)
      $ cd sites
      $ mkdir example.com.b
  * Copy the default settings file into configuration B's directory.
      $ cp default/settings.php example.com.b
  * Edit configuration B's settings file at your discretion.  For
    example, you could specify an alternate Drupal theme.


* Edit the .htaccess files to ensure web visitors view pages under the
  correct configuration.  There are two basic options for this step.

  * /Option 1:/ Randomly assign configuration B to visitors of a certain
    page.  Below is the extra mod_rewrite [5] code for the .htaccess file
    in the default Drupal installation (configuration A).  Place the
    code immediately after RewriteBase.
      RewriteCond %{REQUEST_METHOD} =GET
      RewriteCond %{REQUEST_URI}    ^/path-of-page-to-test$
      RewriteCond %{TIME_SEC}       >30
      RewriteRule ^(.*)$            /b/$1 [L,QSA]
    Adjust the number 30 to change the proportion of visitors who are
    randomly directed to configuration B.

  * /Option 2:/ Require configuration B for some pages and configuration
    A for others.  Again, the following code goes immediately after
    RewriteBase in the default .htaccess.
      RewriteCond %{REQUEST_METHOD} =GET
      RewriteCond %{REQUEST_URI}    ^/(regular-expression-describing-the-pages-for-B)$ [OR]
      RewriteCond %{REQUEST_URI}    ^/(another-regular-expression)$
      RewriteRule ^.*$              /b/%1 [L]

  In either case, the .htaccess file in the shadow Drupal installation
  (configuration B) needs to match.
      RewriteCond %{REQUEST_METHOD} =GET
      RewriteCond %{REQUEST_URI}    !^/b/(themes/|misc/|files/|index.php).*
      RewriteCond %{REQUEST_URI}    !^/b/regular-expression-describing-the-pages-for-B$
      RewriteCond %{REQUEST_URI}    !^/b/another-regular-expression$
      RewriteCond %{REQUEST_URI}    !^/b/path-of-page-to-test$
      RewriteRule ^(.*)$            http://example.com/$1 [L,R]

* To make sure a visitor (like the Google-bot) does not get stuck in
  configuration B, we redefine $base_url immediately before the page
  is themed.  drupal_get_html_head() relies on this variable to set
  <base href="http://example.com" />, but we cannot redefine it earlier
  because the multi-site code apparently depends on it as well.
  Edit index.php in the shadow Drupal installation.
      default:
        if (!empty($return)) {
          // Point <base href> back at configuration A before theming.
          global $base_url;
          $base_url = 'http://example.com';
          print theme('page', $return);
        }
        break;
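
As a rough check on Option 1 above, the sketch below counts how many of
the 60 possible values of TIME_SEC exceed a given threshold.  It assumes
TIME_SEC is a zero-padded two-digit value (00 to 59), so that
mod_rewrite's string comparison agrees with the numeric comparison used
here.  The variable names are mine and purely illustrative.

```shell
#!/bin/sh
# Illustrative only: "threshold" mirrors the 30 in
#   RewriteCond %{TIME_SEC} >30
# Assumption: TIME_SEC is zero-padded (00..59), so the string comparison
# done by mod_rewrite matches the numeric comparison done here.
threshold=30
count=0
s=0
while [ "$s" -lt 60 ]; do
  # A request arriving in second $s is rewritten to /b/ when s > threshold.
  if [ "$s" -gt "$threshold" ]; then
    count=$((count + 1))
  fi
  s=$((s + 1))
done
echo "$count of 60 seconds route to configuration B"
```

With the threshold at 30, seconds 31 through 59 qualify, so a little
under half of the requests would go to configuration B.  Note that the
split depends on when a request arrives, not on who makes it, so the
same visitor may see either configuration on different requests.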

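The linking in the first step above can also be condensed into a loop.
This is only a sketch: DRUPAL_ROOT is a hypothetical variable, and a
scratch directory with empty stand-in files takes the place of a real
Drupal tree so the script can be tried safely anywhere.

```shell
#!/bin/sh
set -e
# Assumption: a scratch directory stands in for /path/to/drupal.
DRUPAL_ROOT=$(mktemp -d)
cd "$DRUPAL_ROOT"
# Empty stand-ins for the directories and files named in the steps above.
mkdir database files includes misc modules sites themes
touch index.php xmlrpc.php

# Create the shadow installation and link everything except index.php.
mkdir b
cd b
for item in database files includes misc modules sites themes xmlrpc.php; do
  ln -s "../$item" "$item"
done
# index.php is copied, not linked, because it is edited later.
cp ../index.php .
ls -l
```

Against a real installation, DRUPAL_ROOT would point at the Drupal root
and the mkdir/touch lines for the stand-ins would be dropped.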

This approach has some surprising advantages.  First, visitors return
to configuration A by following any relative URL on a configuration-B
page.  Secondly, the redirects are all internal to the web server.  The
address in the browser bar never changes, so there is no opportunity for
broken bookmarks.  And finally, because the browser-visible addresses
are the same for both configurations A and B, context-sensitive ads
should not be disrupted.


After applying this approach, I intend to use web analytics to measure
the difference in performance between configurations A and B.  Then,
with statistical hypothesis testing methods, I intend to estimate the
true difference and determine whether B is indeed better than A.
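
To make the hypothesis-testing step concrete, here is a minimal sketch
of a two-proportion z-test on click-through rates, using awk for the
arithmetic.  The visit and click counts are made-up numbers for
illustration, not measurements, and the pooled z-test is just one of
several tests that could be used.

```shell
#!/bin/sh
# Hypothetical counts (not real data): visits and ad clicks per configuration.
visits_a=1000; clicks_a=50
visits_b=1000; clicks_b=70

z=$(awk -v na="$visits_a" -v ca="$clicks_a" \
        -v nb="$visits_b" -v cb="$clicks_b" 'BEGIN {
  pa = ca / na                      # click-through rate under A
  pb = cb / nb                      # click-through rate under B
  p  = (ca + cb) / (na + nb)        # pooled rate under "no difference"
  se = sqrt(p * (1 - p) * (1 / na + 1 / nb))
  printf "%.3f", (pb - pa) / se     # z statistic for B versus A
}')
echo "z = $z"
```

A value of |z| above roughly 1.96 would suggest a difference at the 5%
level (two-sided).  With these made-up counts, z falls just short of
that.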


But this approach has not been tested.  So, once again, here is what I
want from you:  I want you to review the validity of my approach to A/B
testing and to recommend your improvements and alternatives under
different operational circumstances.  Then we can document our
collective expertise for the benefit of each other and the Drupal
community.


Nic Ivy
[1] http://en.wikipedia.org/wiki/L.A.M.P
[2] http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html#RewriteMap
[3] http://www.squid-cache.org/
[4] http://futurequest.net/
[5] http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html





