[development] internationalised paths in URI (was subdomain rewriting)

Khalid B kb at 2bits.com
Tue May 23 14:19:41 UTC 2006


The two major browsers handle this differently at the moment.

For example, here is a site that places non-Western characters in
the URL (Arabic in this case).

Firefox percent-encodes the characters, e.g.:
http://www.alquds.co.uk/index.asp?fname=2006\05\05-23\z30.htm&storytitle=ff%DD%E6%D6%ED%20%CA%D3%E6%CF%20%E3%CD%C7%DF%E3%C9%20%D5%CF%C7%E3%20%C7%CB%D1%20%D8%D1%CF%20%C7%E1%E3%CD%C7%E3%ED%C9%20%C7%E1%E1%C8%E4%C7%E4%ED%C9fff

I am not sure whether this fully complies with the RFC in question.
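
For what it is worth, the escapes in the Firefox URL above look like
Windows-1256 bytes rather than UTF-8, whereas RFC 3987 (section 3.1) maps
an IRI to a URI by percent-encoding the UTF-8 bytes. Below is a rough
Python sketch of the difference, using only the first word of the
storytitle parameter. The charset is my assumption, inferred from the byte
values, and the variable names are only for illustration.

```python
from urllib.parse import quote, unquote

# First word of the storytitle parameter, copied from the Firefox URL above.
escaped = "%DD%E6%D6%ED"

# Assumption: the bytes are Windows-1256 (Python codec name "cp1256").
word = unquote(escaped, encoding="cp1256")

# Re-encoding with the same charset reproduces the escapes in the URL.
legacy_form = quote(word, encoding="cp1256")

# RFC 3987 section 3.1 percent-encodes the UTF-8 bytes instead.
utf8_form = quote(word, encoding="utf-8")

print(legacy_form)
print(utf8_form)
```

If that assumption holds, the two forms differ, so a server that expects
UTF-8 in its paths would not decode the legacy form correctly.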

MS IE 6 leaves them as-is in the URL:
http://www.alquds.co.uk/index.asp?fname=2006\05\05-23\z30.htm&storytitle=ffفوضي%20تسود%20محاكمة%20صدام%20اثر%20طرد%20المحامية%20اللبنانيةfff

(All of this will be unintelligible to most people on the list.)

I like standards as much as anyone else, but if they continue to be
ignored by those who own 85% of the market, they are no longer a
practical option for the majority of users.

On 5/23/06, drupal <vlado at dikini.net> wrote:
> Staying on the url() issue, I think we should consider implementing the
> recommendations of:
> http://www.ietf.org/rfc/rfc3987.txt
>
> At least the path side. At the moment non-ascii characters are dropped
> from paths, at least when using path alias.
>
> IMO it should be done. This will help with supporting various
> internationalised websites.
>
> A decent overview can be found in
> http://www.w3.org/International/articles/idn-and-iri/
>
> Cheers,
> Vlado
>
>

