[drupal-devel] Request for input on Quiz module data structure [long]

Wed Oct 12 23:35:41 UTC 2005

Hi, folks. Hope that this is the right place for this. Apologies in advance for 
the length, but I wanted to make sure everyone's on the same page before asking 
for feedback.

== BACKGROUND ==
As most know, I worked over the summer on the Quiz module. Thanks to a generous 
injection of funding from Ejovi Nuwere at SecurityLab.net, I've been able to 
both complete most of the work that was not done by the other student assigned 
to this project during SoC, and am also able to now focus on adding additional 
features to the module.

The quiz module was developed around Robert Douglass's specification (which I've 
temporarily mirrored here: http://drupaldev.snarkles.net/node/2). In a nutshell, 
this means rather than creating a simple module that could do multiple choice 
questions, instead creating a quiz *framework* which could potentially support 
almost limitless question types, from traditional multiple choice/fill in the 
blank, to really innovative ideas such as "reading" a question aloud to gage 
reading comprehension and so on.

== HOW IT CURRENTLY WORKS ==
The current schema can be viewed here: 
http://cvs.drupal.org/viewcvs/drupal/contributions/modules/quiz/quiz.mysql?rev=1.10
For the more visually inclined, I've uploaded an E-R Diagram here: 
http://www.webchick.net/soc/quiz.module/quiz-schema-old.png

This is based off the original proposal, where serialized arrays (/me ducks in 
advance) are used to store answer, feedback and user response data.

NOTE: Skip to "THE PROBLEM" if you don't care about how this stuff is stored 
currently :P

A typical result array (stored in quiz_results.results) might look like:

Array
(
    [10] => Array
        (
            [input] => Array
                (
                    [0] => 0
                    [1] => 2
                    [2] => 3
                )

            [feedback] => Array
                (
                    [0] => Correct!
                    [1] => Correct!
                    [2] => Correct!
                )

            [score] => 1
        )

    [6] => Array
        (
            [input] => Array
                (
                    [0] => 2
                )

            [feedback] => Array
                (
                    [0] => Nope, fooled you!
                )

            [score] => 0
        )
   ...
)

The array keys (10 and 6) are question node IDs... in other words, this is how 
the user responded to both question 10 and question 6. Over the course of the 
quiz, this array keeps growing, with one array of results per question.

"input" is an array of the answer(s) that the user selected on that question. 
The first question was a multiple choice question, so the user plugged in three 
answers (answers 1, 3, and 4 -- remember that arrays start at 0 index). The 
second question was only a single-choice question, and the user selected answer 3.

"feedback" is an array of the feedback to the answer(s) the user selected for 
input. As you can see, the user answered correctly on all three answers on the 
first question, but did not answer correctly on the second question.

Finally, "score" is either a 1 (indicating the answer was correct) or a 0 
(indicating the answer was not correct).

== THE PROBLEM ==

The question was raised to me:

"If there's a million questions, and a
million students, and we want to find out for a given
question, how many student got it right how do we do that?"

Right now, the answer is something along the lines of this:

-----------------------------------------------------------------------

function get_question_results($quiz_nid, $question_nid) {
  // Array to store question results
  $question_results = array(
    'right' => 0,
    'wrong' => 0
  );

  // Grab a list of the results array for each quiz attempt
  $quiz_results = array();
  $results = db_query("SELECT r.results FROM {quiz_results} r WHERE r.quiz_nid = 
%d", $quiz_nid);
  while ($result = db_fetch_object($results)) {
    $quiz_results[] = unserialize($result->results);
  }

  // Now, loop through each result and look for the given question
  foreach ($quiz_results as $key => $value) {
    if ($value[$question_nid]) {

      // If question is found check if it was answered correctly or not
      if ($value[$question_nid]['score'] == 1) {
        $question_results['right']++;
      } else {
        $question_results['wrong']++;
      }
    }
  }
  return $question_results;
}

// Call the function and output results
$results = get_question_results(12, 10);
echo $results['right'] ." students answered this question right, " . 
$results['wrong'] ." answered it wrong.";
-----------------------------------------------------------------------

This is quite processor and RAM-intensive, since it has to load ALL of the 
results for each attempt on a given quiz first, then parse through those to find 
individual question attempts, and then finally retrieve the answers from each 
and evaluate whether or not they were correct.

And this same basic logic has to be completed for each potential query someone 
might want to throw at it.. Such as,  "What is the average score of this user 
among all exams?"

In short, it doesn't look like this solution is going to be very scalable in the 
long-term for the kind of ad-hoc querying that educators are likely going to 
really want to be able to do.

== THE SOLUTION? ==
I've created an alternate, normalized data schema and wanted to get your 
feedback on it, if you get a chance.
http://webchick.net/soc/quiz.module/newschema.sql
E-R Diagram: http://webchick.net/soc/quiz.module/quiz-schema-new.sql

Changes:
- quiz/quiz_questions tables stay the same

- multichoice turns into 'quiz_question' containing only a primary key and a 
"properties" field to take the place of "multiple_answers." I envision this 
field as being more flexible than multiple_answers, as it could potentially 
store a plethora of options for very diverse array question types, perhaps in 
the form of key/value pairs or something (multichoice=1;email_results=1; etc.)

- answers are now broken out into a 'quiz_question_answer' table, which links 
back to quiz_question with separate fields for answer, feedback, and points. The 
answer field itself could store a wide variety of options, such as "Answer" for 
a multichoice question, "St. Paul|Saint Paul" for a fill-in-the-blank question, 
and so on. Since answer display is handled in each question type module in the 
_form and _render_question hooks, we can still have the advantage of storing 
data independently from the actual quiz module. Feedback, too, can be stored in 
whatever way makes sense for the answer, and points can indicate whether a 
question is incorrect (0 points) or can also be used for 'weighting' questions 
(answer 1 is 20 points, answer 2 is 50 points, etc.)

- Results are now stored in two different tables: quiz_result for the overall 
properties (start time/end time, etc. -- essentially the same as the current 
table) and quiz_question_results which provides the textual representation of 
what that user put for his/her answer on each question attempt. This could be 
stored in a similar manner to the answers table... for multiple answer 
questions, it could be 23,63,3 to indicate the answer ids selected, for 
fill-in-the-blank it could have 'Minneapolis', etc. Each question type's 
_evaluate_question hook would take the data from the answer field and check it 
against the given question's answers.

- Fields representing times and dates are switched to DATETIME instead of 
INTEGER (for UNIX timestamp) in order to help with date range calculation

And if you made it through that, a hearty congratulations. :P

Any thoughts?