[development] [VERY LONG] localization of currency, float, dates accordingly to language chosen

Ivan Sergio Borgonovo mail at webthatworks.it
Thu Oct 11 00:33:53 UTC 2007


On Wed, 10 Oct 2007 16:29:19 +0200
"Gábor Hojtsy" <gabor at hojtsy.hu> wrote:

> - currencies are easy if the module in question has that in t(),
> like t('@symbol @amount') can be translated to '@symbol @amount' or
> '@amount @symbol', although I am not sure you mean this by currencies

I have to dig deeper into t(). I'll see if I can bend it to my needs, but
the following paragraph doesn't give me much hope.
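
To make the quoted placeholder idea concrete, here is a minimal sketch in
Python (not Drupal's PHP). The t() below is a toy stand-in and the
translation table is made up; neither reflects Drupal's actual
implementation.

```python
# Hypothetical translation table: source template -> per-language template.
TRANSLATIONS = {
    "@symbol @amount": {"it": "@amount @symbol"},
}

def t(template, args, lang="en"):
    """Toy stand-in for t(): look up the template, then fill placeholders."""
    translated = TRANSLATIONS.get(template, {}).get(lang, template)
    # Replace longer placeholder names first so "@amount" cannot clobber
    # a longer name such as "@amount_total".
    for name in sorted(args, key=len, reverse=True):
        translated = translated.replace(name, str(args[name]))
    return translated
```

The translator only reorders the placeholders; the calling code stays the
same for every language.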

> - float numbers are printed with number_format(), or they should be
> at least, which does not support different formatting based on
> languages. Feel free to suggest new functionality to do this, which
> can be tested in real life scenarios in D6 and integrated to D7.
> - date formats are Drupal settings and as such are translatable with
> the settings translations features of i18n and localizer

Uh, you gave me the chance to pontificate... why should I miss it? ;)
Ready for a full Encyclical?

I've used i18n in the past (4.7?) and I had mixed feelings.

I've read about localizer and I like some of its features and some of
the author's ideas, mainly that different language versions of the
same web site should be very lightly coupled.

First I'd say there are 3 kinds of users of a CMS: readers, editors
and programmers.

I'd sum up what I expect:
- currency, dates, floats in the localised format (measures too? maybe
later lbs to kg??? why not?)
- avoiding duplication of code ( t() + the former should be enough )
- building bridges between content in different languages (more on
this later)
- SEO (people won't actually remember URLs (nearly sure) and they will
rarely read them (less sure))

Having a multi-language site means:
- avoiding code duplication [ t() + localised numbers/dates ]
- help people switch language
I don't think editors need (may have) too much help (more on this
later).

Avoiding code duplication helps programmers.

Helping people to switch language is useful only for people who know
more than one language.
I think people switch because they find something in one language and
they would like to know if there is more/different content on the
same topic in another language. The switch may be from mother tongue
to foreign language or the opposite or from any foreign language.

You're Italian, you know French, you search in French on a search
engine (because you know the topic is *very French*), then you see there
is Italian content, etc.
You land on Italian content, then you see there is more material in
English and you'd like to switch...
People look for the same stuff they were looking for... just in another
language... so you have to provide bridges between similar or, when
possible, the same content across languages.
Based on my experience, different versions of a web site may grow at
completely different speeds, have slightly different content,
etc... so the coupling should be very light.

So I took care of programmers and readers. Now a brief parenthesis on
editors.
If editors ignore the fact that their site is multi-language... each
version will go its own way, so it won't actually be a multi-language
site; you just have to provide facilities to programmers, and content
will be so lightly coupled there won't be any automatic way to help
people switch between different languages.
If editors are aware the site is multi-language, they are the
ones who will need the facilities to build bridges between languages.

OK... let's get a bit more technical and provide some details.

First let me court the category I belong to.
Programmers don't provide content... they provide the interface.
The interface has little content to be translated.
Programmers shouldn't be distracted by translations while coding.

I think providing functions that format the usual things according
to the language set is something that can be introduced
smoothly.
People will just have to wrap stuff in those functions. If they
do, they will get localised output; if they don't, they will get
what they had in the past, just as you'd expect for t() vs. plain
strings. Everything, including modules, can be updated gradually.
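
A minimal sketch of such a wrapper for floats, written in Python only for
illustration (Drupal itself is PHP). The names NUMBER_FORMATS and
format_float are hypothetical, and so are the two language entries; the
point is only the opt-in behaviour with a plain fallback.

```python
# Hypothetical per-language separators; in this proposal they would be
# editable from the admin panel.
NUMBER_FORMATS = {
    "en": {"decimal": ".", "thousands": ","},
    "it": {"decimal": ",", "thousands": "."},
}

def format_float(value, lang=None, decimals=2):
    """Format a number for `lang`; unknown languages get plain output."""
    fmt = NUMBER_FORMATS.get(lang)
    if fmt is None:
        # No language (or an unconfigured one): same output as before.
        return f"{value:.{decimals}f}"
    grouped = f"{value:,.{decimals}f}"  # English-style grouping first
    # Swap separators through a placeholder so they don't overwrite
    # each other.
    return (grouped.replace(",", "\0")
                   .replace(".", fmt["decimal"])
                   .replace("\0", fmt["thousands"]))
```

Code that never passes a language keeps its old output, so modules can
adopt the wrapper one call at a time.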

t() is perfect as it is and I'd consider t() just for the interface
and not for the content. ids are too hard to remember.

t('this_is_the_id_of_a_sentence_that_was_shorter_than_the_id')

Maybe one transparent addition to t() could be to change the "base"
language, so that people could input something like t('frase','it'),
so that programmers could concentrate on coding rather than thinking in
something that's not their mother tongue. I understand the extra
cost of adding indexes to the translation table, but some
programmers may appreciate it. I'm not among them.

I need localised formatting soon and I'm willing to write it.
It would look like a no-brainer to me if I knew:
1) where to put it to be sure those functions are available
everywhere
2) which variable should be used to decide which language is set for
the session.
I'd do it kosher. If it is not clear where to put this stuff,
fortunately the code where I need it is well defined and I
can write some terrible hack without too much shame in the module I'm
writing right now for other purposes.

To be clear about the second point: there is content and there is
interface, and people may want to have the interface in their language
and read content in others.
If the reader makes no specific choice, deciding which interface
language to use is a bit tricky, but I think it is just a matter
of "taste".
At first landing you could choose:
1) according to the browser cfg
2) according to the language of the content
but what if the user switches language? Should you change the interface
and the content, or just the content and keep the interface in sync with
the browser cfg? Anyway, you have to provide a clear path for the user
to switch the interface and let them know they can switch content and
interface independently.
I would pair interface (menu etc...) and number, currency...
localisation. Differentiating between localisation of currency,
dates, floats and interface is very flexible but I think it could even
be confusing.
So a function that localises dates, floats etc... should take as an
argument a function that decides which format should be used, according
to browser settings, the user profile setup and a policy defined
through settings in the admin interface.
In the admin panel it should be possible to add languages and add
format strings for dates, floats etc...
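
A rough sketch of that split between "deciding the language" and
"formatting", in Python for illustration only. DATE_FORMATS,
resolve_language and format_date are hypothetical names, the policy tuple
is an assumed way to express the admin setting, and the Accept-Language
parsing is simplified (entries are taken in listed order, q-values are
ignored).

```python
from datetime import date

# Hypothetical admin-defined date format strings, keyed by language code.
DATE_FORMATS = {"en": "%m/%d/%Y", "it": "%d/%m/%Y"}

def resolve_language(user_pref=None, accept_language="", site_default="en",
                     policy=("user", "browser", "site")):
    """Pick the interface language by trying each source in policy order."""
    browser = None
    for part in accept_language.split(","):
        # "it-IT;q=0.9" -> "it"; q-values are ignored in this sketch.
        code = part.split(";")[0].strip().split("-")[0].lower()
        if code in DATE_FORMATS:
            browser = code
            break
    candidates = {"user": user_pref, "browser": browser, "site": site_default}
    for source in policy:
        lang = candidates.get(source)
        if lang in DATE_FORMATS:
            return lang
    return site_default

def format_date(d, resolver):
    """Format `d` for whatever language the resolver callable returns."""
    # Fall back to ISO order if the resolved language has no format string.
    return d.strftime(DATE_FORMATS.get(resolver(), "%Y-%m-%d"))
```

For example, format_date(date(2007, 10, 11), lambda:
resolve_language(accept_language="it-IT,it;q=0.9")) uses the Italian
format string, while changing the policy order changes who wins without
touching the formatting code.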



Content language shouldn't be a "variable" as content should be
lightly coupled. Content language should be similar to the root of a
taxonomy; language facilities should just help people to move between
different taxonomies/languages.
People should be helped to land on the "right" language and then be
helped to stay focused (filter by language) on the content of that
language for as long as they want (it.site.com, en.site.com, site.com,
site.it, browser setting etc... + collapse taxonomies).
There is stuff that actually isn't categorised, so "language"
shouldn't actually be the root of different taxonomies; you may have
important content that you didn't have the chance to translate and
you want to show it to everyone too.
You assign a language or a jolly (wildcard) to every node and a
language or a jolly to a vocabulary, then you build bridges between
taxonomies and nodes. So language should be a property of vocabularies
and nodes and there should be a jolly language.

Jolly language should be a "not filtered language".
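
As a tiny illustration of "not filtered", assuming nodes are plain
records with a language field and "*" is an arbitrary marker chosen here
for the jolly language (Python, hypothetical names):

```python
JOLLY = "*"  # hypothetical marker meaning "not filtered by language"

def visible_nodes(nodes, current_lang):
    """Keep nodes in the current language plus the jolly ones."""
    return [n for n in nodes if n["lang"] in (current_lang, JOLLY)]
```

A reader filtering on Italian would then see Italian nodes and every
jolly node, but no English-only ones.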

Let's see what I'd expect when people switch language.
First there should be a language selector block.
People could surf nodes or taxonomies.
When they switch... they should land on the most similar content in
the different language.

You could have a one to one mapping between nodes or you could slide
to the nearest bridge editors built between taxonomies.

As if you had 2 different trees, you could decide to make a one to
one mapping between leaves and branches or you could link just some
branches of the 2 taxonomies...
The algorithm should go down towards the root till it finds a bridge
and cross it.
So if there is no perfect match the user will be taken to the most
similar content available.
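
The walk towards the root could be sketched like this in Python. The
parent map, the bridge table and the term names are all made-up example
data; how terms and bridges would really be stored is left open.

```python
# Hypothetical data: each term's parent (None at the root) and the
# bridges editors built, keyed by (term, target language).
parent = {
    "it/letteratura/moderna/guerra": "it/letteratura/moderna",
    "it/letteratura/moderna": "it/letteratura",
    "it/letteratura": None,
}
bridges = {("it/letteratura", "en"): "en/literature"}

def find_bridge(term, parent, bridges, target_lang):
    """Walk from `term` towards the root and cross the first bridge found."""
    seen = set()  # guards against accidental cycles in the parent map
    while term is not None and term not in seen:
        seen.add(term)
        target = bridges.get((term, target_lang))
        if target is not None:
            return target
        term = parent.get(term)
    return None  # no bridge anywhere on the path to the root
```

Here a reader on "it/letteratura/moderna/guerra" switching to English
lands on "en/literature", the nearest bridged ancestor; with no bridge at
all the function returns None and the site has to pick a fallback.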


Let's come to the editors...
Editors have 3 tasks:
- writing nodes
- building the taxonomies
- building bridges between taxonomies
Oversimplifying, you may have people responsible for just one or more
tasks, with different access rights. So it is important to decouple
these activities as much as possible, with an eye on performance too.

I think localised URLs have lower priority but they are a strategic
decision and may be a PITA.
I don't think people remember URLs, they may remember
http://www.mysite.com/blog or http://www.mysite.com/gallery
but once you get in things like
http://www.mysite.com/literature/modern/italian/war/biographies/
I doubt people can really remember such things...
But there is a but...
a) search engines do read URLs to rank pages
b) some people do read URLs to rank pages
For high-ranking content, content language and URL language should match.
Automatic URL translation makes no sense, just as automatic taxonomy
translation makes no sense.
The i18n module used a cheap trick to build a bridge between nodes: same
URL with a language prefix. No DB access and a 404 if nothing is there.
You could build a custom 404 to redirect the user to some related
material in the other language... etc...
With URL translation you have to add a DB access.
From the editor's point of view there is no difference... or maybe... a
new system may even be better; maybe the new i18n already implements
something similar.
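
The prefix trick as described above needs nothing but string handling. A
sketch in Python, with a hypothetical set of enabled languages; this is
not the i18n module's code, just the idea of swapping the first path
segment:

```python
LANGS = {"en", "it", "fr"}  # hypothetical set of enabled language codes

def switch_prefix(path, target_lang):
    """Return `path` with its leading language segment replaced or added."""
    segments = [s for s in path.strip("/").split("/") if s]
    if segments and segments[0] in LANGS:
        segments = segments[1:]  # drop the current language prefix
    return "/" + "/".join([target_lang] + segments)
```

Whether the resulting URL exists is only discovered when it is requested,
which is where the 404 (or a custom 404 page) comes in.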

Editors who were aware of working on a multi-language site with the
previous i18n system had to know the URL of the going-to-be-translated
article and put it into the path alias... so there won't be any extra
cost in finding the original article and adding a button and a drop-down
to "add a translation", just to add a link between the 2 nodes.

Engineering the DB structure to provide an efficient way to store
this link shouldn't be terribly hard.

Taxonomy could be filled automatically as the best match reaching the
nearest bridge between the 2 taxonomies.

This system should provide a way to add such a link not just at
creation time. A good interface for linking 2 nodes may be a bit more
tricky.

Similarly there should be a way to link different branches of
taxonomies. Not all branches should be linked... you could build a
one-to-one match or taxonomies may grow semi-independently...

An algorithm should go down the tree till it finds a bridge to cross
and present the content on the other tree.

Editors should edit menus too... but menus aren't taxonomies... they
may contain taxonomies but they are not taxonomies.
Menus may be different in different languages (maybe some services
aren't provided in all the languages...).
Again... I'd have localised menus merged with a jolly menu and
taxonomies.

In a multi-language site tools for mass editing of taxonomies may
come in handy... language versions tend to diverge, then you try to make
them converge once more and you may have to move branches of taxonomy
around... but this is "independent" from "core" localisation
functions.

Localised menus will be "hand written" and have an assigned language.
The jolly menu may use t().
And taxonomy menus will use the same rules as taxonomies.
Menus, with the exception of taxonomy menus, aren't so large that you
can't deal with them manually.

The cherry on top would be to signal whether there are one to one
matches for nodes so you'll know if switching language will take you to
the same content or just to the most similar content.
At the interface level that could be done by highlighting the flags of
the switching block if there is matching content in those languages.
But once you have a function to inform you there is matching content...
everyone could build their interface of choice.
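
Such a function could be as small as the following Python sketch. The
storage is an assumption: translation links are kept as groups of node
ids, and node_lang maps each node to its language; both tables are
made-up example data.

```python
# Hypothetical link table: each set groups node ids that are one to one
# translations of each other.
node_lang = {1: "it", 2: "en", 3: "it"}
translation_sets = [{1, 2}]

def exact_matches(node_id):
    """Return the languages that have a one to one translation of the node."""
    for group in translation_sets:
        if node_id in group:
            return {node_lang[n] for n in group if n != node_id}
    return set()
```

A language switcher block could highlight the flags whose language is in
the returned set and leave the others as "most similar content" links.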

I see media duplication just as a problem of node interface.
If people can choose between uploading to the server and picking from
the server, there shouldn't be any problem of media duplication across
different languages unless they have to be different media.

oh... and someone more knowledgeable should take RTL languages into
account too... I've no idea of the problems involved in that kind of
localisation.

I wrote a few pages in Simplified Chinese in a multi-language website
and I didn't have any problem, so Chinese, at least as an exotic
language for us Europeans, doesn't seem to be an issue.
I didn't test localisation functions for dates and currency. I didn't
have the need for them.
I remember HKLUG uses Drupal too and it uses the i18n module.
They helped me to install a Chinese localised Linux for my wife and they
recommended a great Linux router to me, mmm, more than 6 years ago I think.


-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it



