On 2/25/06, Rob Thorne <rob@torenware.com> wrote:
This is why Arabic is a great script for doing these kinds of tests:
The first problem with Arabic is that it used to have several incompatible encodings, even from the same vendor. This is now mostly a non-issue thanks to UTF-8, though some sites still use windows-1256.
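To make the encoding point concrete, here is a minimal sketch in Python (not something from the Drupal code base) showing that the same Arabic word becomes different bytes in UTF-8 and in windows-1256, which Python calls cp1256. The sample word and the helper name decodes_as_utf8 are just illustrations I picked.

```python
# "salam" written with Unicode escapes: SEEN, LAM, ALEF, MEEM.
text = "\u0633\u0644\u0627\u0645"

# The same four letters, serialized with two different encodings.
utf8_bytes = text.encode("utf-8")      # two bytes per Arabic letter
cp1256_bytes = text.encode("cp1256")   # one byte per Arabic letter

print(len(utf8_bytes), utf8_bytes.hex())
print(len(cp1256_bytes), cp1256_bytes.hex())


def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Legacy windows-1256 bytes are not valid UTF-8, so a site that
# mislabels its encoding ends up with errors or garbage on screen.
print(decodes_as_utf8(cp1256_bytes))
```

Each byte string decodes back to the original text only with the encoding that produced it, which is why a page that declares the wrong charset shows up as junk.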
it's contextual (meaning that letters need to be drawn differently depending upon what glyphs they're near),
This is also a non-issue now. All operating systems handle contextual shaping well. At least Linux with KDE and Microsoft Windows do, as do MS IE and Firefox. Even Konqueror is decent here. I'm not sure about Safari, but I imagine it works too.
bi-directional (byte direction != physical layout), and written right-to-left, so very often, page layout needs to change as well. If you've done Arabic right, you'll do almost anything else right as well.
This is where it gets tricky. The analogy I use is CSS: you write your CSS to standards, it mostly works in Firefox, and life is good. Then you start testing in MS IE and find that your layout is broken, so you have to iterate and fix the bugs one by one. Arabic is the same, and there are many surprises, for example numbers in the middle of an Arabic string, or a line that has both Arabic and English in it. The rule of thumb: whatever you estimate for doing this, multiply it by ten and go by trial and error, at least for the first project or two.
Other issues: Arabic in URLs does not work well. If you want pathauto, for example, to put Arabic in the alias, think twice: MS IE and Firefox treat these differently (MS IE leaves the letters as Arabic, Firefox encodes them to %CE, etc.)
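Two of the surprises above can be poked at with a few lines of Python. This is only a rough sketch under my own assumptions: the sample string and the names bidi_classes, mixed and khah are made up for illustration, and it does not reproduce what either browser actually does. The first part prints the Unicode bidirectional class of each character in a mixed Arabic/number/English string, which is the input the bidi algorithm works from. The second part shows that the percent-encoded form of an Arabic letter depends on the byte encoding used.

```python
import unicodedata
from urllib.parse import quote, unquote

# Hypothetical mixed string: an Arabic word, European digits, a Latin word.
mixed = "\u0633\u0644\u0627\u0645 123 abc"


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


# 'AL' = Arabic letter, 'EN' = European number, 'L' = left-to-right, 'WS' = space.
classes = bidi_classes(mixed)
print(classes)

# Percent-encoding one Arabic letter (KHAH, U+062E) under two encodings.
khah = "\u062e"
as_utf8 = quote(khah, encoding="utf-8")
as_cp1256 = quote(khah, encoding="cp1256")
print(as_utf8, as_cp1256)

# The escapes only decode back correctly with the encoding that produced them.
print(unquote(as_cp1256, encoding="cp1256") == khah)
print(unquote(as_utf8, encoding="utf-8") == khah)
```

The digits get a different class from both the Arabic and the Latin letters, which is part of why numbers inside Arabic text can end up somewhere you did not expect. On the URL side, an alias is only portable if everything in the chain agrees on the encoding behind the % escapes.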
Although you would get this by testing Arabic as well. So we'd solve a lot of problems if we'd just develop Drupal in Arabic and then localize it into English, come to think of it.
The way I see Arabic being used in Drupal (as well as Hebrew, the other Semitic language in use today) is that sites are either monolingual or "indifferent" lingual. By monolingual, I mean that the site is purely Arabic, e.g. http://csc-sy.net/. By indifferent, I mean a site with an English front end whose content can be either Arabic or English, e.g. http://manalaa.net or http://foolab.org. This is true mostly for blogs.