Issue status update for http://drupal.org/node/18817 Project: Drupal Version: cvs Component: base system Category: bug reports Priority: normal Assigned to: Steven Reported by: Steven Updated by: Steven Status: patch For nodes/comments the responsibility lies with the theme engine anyway, not the template. The other cases are not themed as often. Steven Previous comments: ------------------------------------------------------------------------ March 13, 2005 - 03:00 : Steven Attachment: http://drupal.org/files/issues/check.patch (91.83 KB) Back story Bart Jansens brought up some possible XSS issues recently which prompted me to reevaluate the text checking in Drupal. The big issue is "plain text": what a user typically enters in single-line boxes. When this text is put into HTML, special characters like < > and & need to be escaped. When it is put into an HTML attribute, " needs to be escaped too. Note that it is not about filtering (what check_output() does), but about handling plain-text (commonly associated with single-line editboxes). The current 'standard' in Drupal is to do escaping when outputting the titles in HTML, and keep the plain-text in the database. We should keep this as it is in line with "keep exactly what the user typed and process it later". Furthermore it is hard to undo this escaping, which is needed when communicating with other services or when exporting data. This brings up the problem: where should text be escaped? There are in fact many places where this escaping is not done yet, which means there are problems if the user enters e.g. < or & in a title. Usually these bugs are in admin areas, so not critical for security, but they should still be fixed in the spirit of XHTML validation and usability. So this means that before the call to check_plain(), the data is plain-text and no HTML can be used in it. Once the text has been run through check_plain(), it is HTML, and tags can be added. Logically, it is easy: data should be escaped as soon as it becomes HTML, but exactly deciding where data becomes HTML is tricky. Any parameter which is passed as plain-text and only checked later in the code can not contain HTML, so I've had to decide where to add missing escapings based on context and common sense. This issue also affects URLs, but there the XSS issues are more important. Plus all of this should not be confused with urlencode() which is simply used to put arbitrary data into an URL without it disrupting normal URL semantics (directories, anchors, query arguments, etc). Concrete changes drupal_specialchars() and check_form() were almost the same, except that one would escape " while the other wouldn't. This difference is negligable and it does no harm to escape too much. In the interest of cleaning up text handling, I merged these into check_plain(). check_url() had a toss at extra XSS protection in it, but it was in fact broken and wholly incorrect. I replaced it with what was probably the original intention. Many URL handling functions in fact worked with HTML-escaped URLs (using & instead of & for example). I got rid of this, and did the escaping in a more logical place (e.g. drupal_goto no longer has to reconvert the &). Node/comment titles used to be run through strip_tags(), which means you couldn't use HTML but were still forced to escape angle brackets and amps. I changed them to be consistent with other plain-text fields in Drupal. Profile checkbox items were handled inconsistently as well and were fixed. The l() function now escapes the title by default, in the interest of keeping the code cleaner. But the old behaviour is still available (and needed in some places) with an extra parameter. Text handling in Drupal after this patch Any piece of user submitted text should be run through one of the check_ functions before being put into HTML. Use check_plain() for plain-text, check_output() for rich text. Dynamic data in URLs should be urlencode()d, regardless of where that URL ends up. URLs in HTML attributes should be check_url()'d. If you are unsure if you're doing the right thing, the easiest test is to try out putting HTML tags in your module's fields where they are not intended. They should be displayed on-screen and not interpreted. ------------------------------------------------------------------------ March 15, 2005 - 14:10 : Bart Jansens Looks good. I spent some time trying to find places where the check_plain() call was missing and I found only one; the URL of an aggregator feed isn't checked. One thing that isn't entirely clear is who is responsible for calling check_plain() or check_output(). In some modules this is done by a theme function, in others the data passed to the theme functions is already checked. I'm not saying this patch should fix all this (it's already big enough), but just something to keep in mind for the future. Currently some themes might forget a check.