From steven@acko.net Thu Mar 31 09:28:14 2005
From: Steven Wittens
To: development@drupal.org
Subject: [drupal-devel] Plain-text checking / text output in Drupal
Date: Thu, 31 Mar 2005 09:28:18 +0000
Message-ID: <424BC325.9070702@acko.net>

I just committed the large check_plain() patch after a green light from
Dries: http://drupal.org/node/18817

The main idea is to make sure plain text is handled as plain text, so you
can use <> or & in a taxonomy term name or comment subject without messing
up validation or your site. This should in fact already have been okay,
but it seems that, due to general confusion about the right way to do it,
many people didn't pay much attention to it.

After my patch, every single-line field in Drupal should be plain text.
This includes in particular node and comment titles, which used to be
stored as stripped HTML instead (which was really bad for usability; see
the issue for more info).

A few notable changes:

- drupal_specialchars() and check_form() have been merged into
  check_plain().
- Node and comment titles are now plain text and need to be
  check_plain()'ed before output. Note that due to changes in l(), many
  cases will be caught already.
- theme('placeholder', $text) was added for putting dynamic pieces of
  text into a sentence ("are you sure you want to delete %block"). By
  default it outputs '<em>'. check_plain($text) .'</em>' and it should be
  used when appropriate.

I've written up a short text on text output in Drupal after my
patch... once it's cleaned up a bit, it should go into the
documentation somewhere. I'll also add a short blurb to the module
upgrading guide.

Steven

-------------

Text output in Drupal

When handling and outputting text to HTML, you need to be careful that
proper filtering or escaping is done.
Otherwise there might be bugs when users try to use angle brackets or
ampersands, or worse, you could open up XSS exploits.

When handling data, the golden rule is to keep exactly what the user
typed. On the database and input side of things there isn't much to worry
about: text remains text. Note that you should never use a plain
strip_tags() call to clean up user input. This would strip out all tags,
but still force you to use entities for angle brackets or ampersands. You
get all the disadvantages of HTML without any of the benefits.

Conversion is done on output, where the text has to be placed in another
format or context, e.g. HTML.

There are two types of text in Drupal:

1. Plain text
_____________

This is simple text without any markup. What the user entered is
displayed on screen exactly as is, and is not interpreted in any form.
This is almost always the format used for single-line text fields. It is
good to keep this consistency in your own code.

When outputting plain text, you need to pass it through check_plain()
before it can be put inside HTML. This will convert quotes, ampersands and
angle brackets into entities.

Most themable functions and APIs take HTML arguments, but a few of them
already call check_plain() for convenience:

* l(): the link caption should be passed as plain text (unless overridden
  with the $html parameter).
* menus: the menu item titles are plain text.
* theme('placeholder'): the placeholder text is plain text.

Some places require HTML, which might not be obvious:

* page titles set through drupal_set_title(). The page title is displayed
  in the HTML, where it makes sense to use tags like <em> for clarity.
  When the page title is displayed in the HTML <title> tag, however, all
  tags are stripped out.
* block titles passed in through hook_block(). For the same reason as the
  page title, using HTML here is common.
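To make the dos and don'ts of plain-text output concrete, here is a small standalone Python analogue of the rule. It is a sketch, not Drupal code: it assumes check_plain() converts the same five characters as PHP's htmlspecialchars() with ENT_QUOTES (&, <, >, " and '), and the Python functions check_plain() and placeholder() are illustrative stand-ins named after the Drupal functions.

```python
# Standalone Python analogue of the plain-text rules above (not Drupal code).
# Assumption: check_plain() escapes the same five characters as PHP's
# htmlspecialchars($text, ENT_QUOTES).

_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#039;",
})


def check_plain(text: str) -> str:
    """Escape plain text so it is displayed literally inside HTML."""
    # translate() maps each input character exactly once, so the "&" of a
    # newly produced entity is never escaped again within the same call.
    return text.translate(_ESCAPES)


def placeholder(text: str) -> str:
    """Mimic theme('placeholder'): escape the text, then emphasize it."""
    return "<em>" + check_plain(text) + "</em>"


if __name__ == "__main__":
    term = "Fish & <Chips>"

    # Do: escape plain text exactly once when it goes into HTML.
    print("<h2>" + check_plain(term) + "</h2>")

    # Don't: output it raw; "<Chips>" is parsed as a bogus tag and vanishes.
    print("<h2>" + term + "</h2>")

    # Don't: escape twice; visitors would literally see "&amp;" on screen.
    print(check_plain(check_plain(term)))

    # Do: wrap user-submitted text in a message with placeholder().
    print("Are you sure you want to delete " + placeholder("<u>foo</u>") + "?")
```

Escaping twice is the typical mistake when one layer (for example l()) already calls check_plain() and the caller escapes the caption again.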
Note that functions which logically take 'data' rather than 'output' will
almost always take plain text and require no escaping on your side. A good
example is the value passed to form_ functions, e.g. a plain-text field's
contents: what the user entered is exactly what you should pass to
form_textfield(). On the other hand, this does not apply to the form
item's title or description, which are passed as HTML. This is done so
that modules can format the item title as they want.

2. Rich text
____________

This is text which is marked up in some language (HTML, Textile, etc.).
It is stored in the markup-specific format, and converted to HTML by the
various filters that are enabled. This is almost always used for
multi-line text fields.

All you need to do is pass the rich text to check_output() and you'll get
HTML back that is safe for output. You should also let the user choose the
input format with a format widget through filter_form(), and pass the
chosen format along to check_output().

URLs
____

URLs require special handling in two ways:

- Putting dynamic data into URLs. If you wish to put any sort of dynamic
  data into a URL, you need to urlencode() it. If you don't, characters
  like # will disrupt the normal URL semantics. urlencode() will escape
  them with %XX syntax.

- Putting URLs into HTML. URLs are a common attack vector for XSS
  exploits. Though we have an XSS filter at the beginning of the page
  request, it is still smart to be careful. When putting a URL inside an
  HTML attribute (e.g. <a href="...">), you should pass it through
  check_url(). check_url() is similar to check_plain(), but it contains
  some extra XSS protection.

Note that all Drupal functions which return URLs (url(), request_uri(),
etc.) output 'real' URLs which have not been HTML-escaped in any way.
Remember to use check_url() to escape them when outputting HTML (or XML).
Don't use check_url() in situations where a real URL is expected, e.g.
in the HTTP 'Location: ...' header.

In practice
___________

If this all sounds confusing: there are only a limited number of functions
where this matters, and you will quickly get to know them. Usually you
control your module's own output, so the output process is quite
transparent. Every piece of plain text should be converted with
check_plain() exactly once before going into HTML.

When in doubt, you can always put some test text like "<u>foo</u>" in your
text fields and see how it comes out. For plain-text fields, the underline
tag should not be interpreted, but displayed as is.

When displaying a piece of user-submitted text in a message, you should
pass it through theme('placeholder', $text) to make it stand out. It will
also escape the text for you with check_plain().

Note that you cannot pass HTML entities to functions which accept plain
text. If you need to use high Unicode characters in a plain-text string,
input them directly in the code with UTF-8 encoding. It's more compact as
well.


From dries@buytaert.net Thu Mar 31 11:29:48 2005
From: Dries Buytaert <dries@buytaert.net>
To: development@drupal.org
Subject: Re: [drupal-devel] Plain-text checking / text output in Drupal
Date: Thu, 31 Mar 2005 11:29:48 +0000
Message-ID: <84bd7fac2939d7c624d73f1837cd8233@buytaert.net>
In-Reply-To: <424BC325.9070702@acko.net>

On 31 Mar 2005, at 11:30, Steven Wittens wrote:

> I've written up a short text on text output in Drupal after my
> patch... once it's cleaned up a bit, it should go into the
> documentation somewhere. I'll also add a short blurb to the module
> upgrading guide.

Looks good, though I'd consider adding a couple examples (dos and
don'ts).
I guess this kind of documentation belongs in contrib/docs so we can link
the function names -- or does it belong in the handbook? (Can't we move
the entire developer guide to contrib/docs?)

I'd also add a couple checks to the code-style checker to encourage
(correct) use of these functions.

Great job,

--
Dries Buytaert :: http://www.buytaert.net/


From jchaffer@structureinteractive.com Thu Mar 31 14:03:14 2005
From: Jonathan Chaffer <jchaffer@structureinteractive.com>
To: development@drupal.org
Subject: Re: [drupal-devel] Plain-text checking / text output in Drupal
Date: Thu, 31 Mar 2005 14:03:15 +0000
Message-ID: <6841eb80d4da7af5f5c470cfe575d10e@structureinteractive.com>
In-Reply-To: <84bd7fac2939d7c624d73f1837cd8233@buytaert.net>

On Mar 31, 2005, at 6:29 AM, Dries Buytaert wrote:

> On 31 Mar 2005, at 11:30, Steven Wittens wrote:
>> I've written up a short text on text output in Drupal after my
>> patch... once it's cleaned up a bit, it should go into the
>> documentation somewhere. I'll also add a short blurb to the module
>> upgrading guide.
>
> Looks good, though I'd consider adding a couple examples (dos and
> don'ts). I guess this kind of documentation belongs in contrib/docs
> so we can link the function names -- or does it belong in the
> handbook? (Can't we move the entire developer guide to contrib/docs?)

+1 for this move.

I recently added code to api.module to allow it to handle HTML files.
It grabs everything in the <body>, links the function/file names, and
places the output in theme('page'). This update hasn't made it to
drupaldocs yet (I think Kj is still quite backlogged). Once it does,
the HTML files in contrib/docs/developer/topics should start showing
up there.
If anyone with The Power wants to update api.module on the site, be my
guest. :-)

--
Jonathan Chaffer
Applications Developer, structure:interactive
(616) 364-7423
http://www.structureinteractive.com/


From dries@buytaert.net Sun Apr 3 15:26:40 2005
From: Dries Buytaert <dries@buytaert.net>
To: development@drupal.org
Subject: Re: [drupal-devel] Plain-text checking / text output in Drupal
Date: Sun, 03 Apr 2005 15:26:41 +0000
Message-ID: <973ddb60c6c85b627d855bf7715a2b4d@buytaert.net>
In-Reply-To: <6841eb80d4da7af5f5c470cfe575d10e@structureinteractive.com>

> I recently added code to api.module to allow it to handle HTML files.
> It grabs everything in the <body>, links the function/file names, and
> places the output in theme('page'). This update hasn't made it to
> drupaldocs yet (I think Kj is still quite backlogged). Once it does,
> the HTML files in contrib/docs/developer/topics should start showing
> up there.
>
> If anyone with The Power wants to update api.module on the site, be my
> guest. :-)

I just upgraded the api.module on drupaldocs.org to HEAD. Does it show
the expected content? I can't spot the OOP-topic, for example.
--
Dries Buytaert :: http://www.buytaert.net/


From weitzman@tejasa.com Sun Apr 3 18:41:28 2005
From: Moshe Weitzman <weitzman@tejasa.com>
To: development@drupal.org
Subject: [drupal-devel] OOP in Drupal essay - now available
Date: Sun, 03 Apr 2005 18:41:30 +0000
Message-ID: <2644b7e1274378142f76444077640033@tejasa.com>
In-Reply-To: <973ddb60c6c85b627d855bf7715a2b4d@buytaert.net>

I just issued a cron.php request and it now appears at
http://drupaldocs.org/api/head/file/contributions/docs/developer/topics/oop.html.
comments on this doc are welcome.

On Apr 3, 2005, at 11:26 AM, Dries Buytaert wrote:

>> I recently added code to api.module to allow it to handle HTML files.
>> It grabs everything in the <body>, links the function/file names, and
>> places the output in theme('page'). This update hasn't made it to
>> drupaldocs yet (I think Kj is still quite backlogged). Once it does,
>> the HTML files in contrib/docs/developer/topics should start showing
>> up there.
>>
>> If anyone with The Power wants to update api.module on the site, be
>> my guest. :-)
>
> I just upgraded the api.module on drupaldocs.org to HEAD. Does it
> show the expected content? I can't spot the OOP-topic, for example.
>
> --
> Dries Buytaert :: http://www.buytaert.net/
>


From carl_mcdade@yahoo.com Sun Apr 3 19:17:10 2005
From: Carl McDade <carl_mcdade@yahoo.com>
To: development@drupal.org
Subject: Re: [drupal-devel] OOP in Drupal essay - now available
Date: Sun, 03 Apr 2005 19:17:11 +0000
Message-ID: <42504135.5090307@yahoo.com>
In-Reply-To: <2644b7e1274378142f76444077640033@tejasa.com>

Good write-up. Nice and understandable semantics, too.

Here's one of the articles that helped me understand:

http://www.computerworld.com/developmenttopics/development/story/0,10801,85621,00.html

Carl McDade

Moshe Weitzman wrote:
> I just issued a cron.php request and it now appears at
> http://drupaldocs.org/api/head/file/contributions/docs/developer/topics/oop.html.
> comments on this doc are welcome.
>
> On Apr 3, 2005, at 11:26 AM, Dries Buytaert wrote:
>
>>> I recently added code to api.module to allow it to handle HTML files.
>>> It grabs everything in the <body>, links the function/file names, and
>>> places the output in theme('page'). This update hasn't made it to
>>> drupaldocs yet (I think Kj is still quite backlogged). Once it does,
>>> the HTML files in contrib/docs/developer/topics should start showing
>>> up there.
>>>
>>> If anyone with The Power wants to update api.module on the site, be
>>> my guest. :-)
>>
>> I just upgraded the api.module on drupaldocs.org to HEAD. Does it
>> show the expected content? I can't spot the OOP-topic, for example.
>>
>> --
>> Dries Buytaert :: http://www.buytaert.net/
>>
>


From prometheus6@gmail.com Sun Apr 3 21:17:20 2005
From: Earl Dunovant <prometheus6@gmail.com>
To: development@drupal.org
Subject: Re: [drupal-devel] OOP in Drupal essay - now available
Date: Sun, 03 Apr 2005 21:17:21 +0000
Message-ID: <c6df037d050403141722cefdc6@mail.gmail.com>
In-Reply-To: <2644b7e1274378142f76444077640033@tejasa.com>

Very good. And the more people who understand that object-oriented design
is MUCH more important than object-oriented programming, the better.

On Apr 3, 2005 2:41 PM, Moshe Weitzman <weitzman@tejasa.com> wrote:
> I just issued a cron.php request and it now appears at
> http://drupaldocs.org/api/head/file/contributions/docs/developer/topics/oop.html.
> comments on this doc are welcome.
>
> On Apr 3, 2005, at 11:26 AM, Dries Buytaert wrote:
>
> >> I recently added code to api.module to allow it to handle HTML files.
> >> It grabs everything in the <body>, links the function/file names, and
> >> places the output in theme('page'). This update hasn't made it to
> >> drupaldocs yet (I think Kj is still quite backlogged). Once it does,
> >> the HTML files in contrib/docs/developer/topics should start showing
> >> up there.
> >>
> >> If anyone with The Power wants to update api.module on the site, be
> >> my guest. :-)
> >
> > I just upgraded the api.module on drupaldocs.org to HEAD. Does it
> > show the expected content? I can't spot the OOP-topic, for example.
> >
> > --
> > Dries Buytaert :: http://www.buytaert.net/
> >