[translations] Limitations of the source-string-centric approach (was: proper way to translate complicated ...)

Cog Rusty cog.rusty at gmail.com
Thu Jul 19 11:54:53 UTC 2007


On 7/19/07, Gabor Hojtsy <gabor at hojtsy.hu> wrote:
> Frederik 'Freso' S. Olesen wrote:
> > 2007/7/14, Derek Wright <drupal at dwwright.net>:
> >> [...] That's what I was hoping to hear, since I
> >> agree the split approach is better for everyone.
> >
> > Except that for some language or other, it might make sense to switch
> > order of the paragraphs, take a paragraph out, add another paragraph,
> > or in some other way manipulate with the paragraphs to get the proper
> > meaning. Splitting the string also risks confusion during the
> > translation in the interface.
> > I agree that shorter text blurbs are both easier to translate and
> > easier to get people to translate (not as much work), but might also
> > degrade the final outcome.
>
> Yes, it is unfortunately not a win-win situation. It seemed to be a good
> compromise to have one t() per paragraph/list item.
>
> Gabor


It is certainly a compromise. Taking the idea to an extreme, suppose we
broke everything down to single words. Words are shared between many
strings, so the translator would have far fewer strings to deal with,
but a serious translation would obviously be impossible: single words
are translated differently in different contexts, have genders, and
are often part of grammatical structures and idioms.

Several times I have had to abandon a good translation of an English
term in my language and use an awkward one instead, because in English
the term happens to be used in two different contexts while in my
language it is not, and I have to accommodate both cases. Contributed
modules join the party later and exacerbate the problem.

I suspect that at the center of these issues is the gettext system,
which allows an English string to have only one translation. (Perhaps
the lack of a context parameter in t() as well. I am not sure about
that.)

Some time ago, for example, there was an unsolved issue with the month
"May", which in English happens to be the same as its abbreviation, so
in any other language its abbreviation cannot differ from the full name
of the month either. Or, a string as simple as "Active" can have
genders (masculine, feminine, neuter) depending on what it refers to,
but here we have to choose one gender for everything (which wouldn't
impress the occasional intellectual visitor of a site).
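
A rough sketch of the difference, using plain Python dictionaries rather
than gettext's or Drupal's actual data structures. Every name here is
hypothetical, and the Spanish strings are only illustrative. With a
catalog keyed by the source string alone, "May" and "Active" get one
translation each; with a catalog keyed by (context, source string), the
same English string can carry several.

```python
# Hypothetical catalogs; not the real gettext or Drupal storage format.

# Keyed by source string alone: one English string, one translation.
flat_catalog = {
    "May": "mayo",
    "Active": "activo",
}

# Keyed by (context, source string): the same English string may have
# a different translation in each context.
context_catalog = {
    ("month name", "May"): "mayo",
    ("month abbreviation", "May"): "may",
    ("masculine noun", "Active"): "activo",
    ("feminine noun", "Active"): "activa",
}


def translate_flat(source):
    """Look up by source string only; fall back to the source string."""
    return flat_catalog.get(source, source)


def translate_with_context(source, context):
    """Look up by (context, source); fall back to the source string."""
    return context_catalog.get((context, source), source)
```

Here translate_flat("Active") can only ever return one form, while
translate_with_context("Active", "feminine noun") can return another.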

The rest of the time, slightly bigger chunks of text, case sensitivity,
or even occasional differences in the HTML markup of a string allow us
to hack our way to a better translation.

Does it seem a viable idea for the future to automatically add
optional hidden text to the original English strings to identify
different contexts, and then filter it out of the output? At least for
common strings which come from different t() calls?
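
To make the hidden-text idea concrete, here is a minimal sketch in
Python. The "[[ctx:...]]" marker syntax, the catalog, and this t() are
all made up for illustration; this is not Drupal's real t() and not an
existing gettext feature. The marker makes otherwise identical source
strings distinct for the translator, and it is stripped before output
so that an untranslated string never shows it to the visitor.

```python
import re

# Hypothetical marker syntax: a "[[ctx:...]]" prefix on the source string.
_MARKER = re.compile(r"^\[\[ctx:[^\]]*\]\]")

# Hypothetical catalog: the marked strings are distinct keys, so each
# can have its own translation (Spanish used only as an illustration).
catalog = {
    "[[ctx:month name]]May": "mayo",
    "[[ctx:month abbreviation]]May": "may",
}


def strip_marker(text):
    """Remove a leading context marker, if there is one."""
    return _MARKER.sub("", text)


def t(source):
    """Stand-in for a translation call; not Drupal's real t()."""
    translated = catalog.get(source, source)
    # If no translation exists, the marked source string comes back,
    # so the marker must be filtered out before output.
    return strip_marker(translated)
```

One drawback of this approach is that the marker becomes part of the
source string, so changing a marker would orphan the existing
translations of that string.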

I am also curious whether anyone knows of any project anywhere in the
world which one day might enhance or replace the gettext system to
better address the problem of a source string having a single
translation.

