[translations] Limitations of the source-string-centric approach (was: proper way to translate complicated ...)

19 Jul 2007


On 7/19/07, Gabor Hojtsy <gabor@hojtsy.hu> wrote:
...
Frederik 'Freso' S. Olesen wrote:
...
2007/7/14, Derek Wright <drupal@dwwright.net>:
...
[...] That's what I was hoping to hear, since I
agree the split approach is better for everyone.

Except that for some language or other, it might make sense to switch
the order of the paragraphs, take a paragraph out, add another
paragraph, or otherwise manipulate the paragraphs to get the proper
meaning. Splitting the string also risks confusion during translation
in the interface.
I agree that shorter text blurbs are both easier to translate and
easier to get people to translate (not as much work), but they might
also degrade the final outcome.

Yes, it is unfortunately not a win-win situation. It seemed to be a good
compromise to have one t() per paragraph/list item.
Gabor

It is certainly a compromise. Taking the idea to an extreme, suppose we
broke everything down to single words. Words are shared among many
strings, so the translator would have far fewer strings to deal with,
but a serious translation would obviously be impossible: single words
are translated differently in different contexts, have genders, and
are often part of grammatical structures, idioms, etc.

Several times I have had to abandon a good translation of an English
term into my language and use an awkward one instead, because in
English the term happens to be shared by two different contexts while
in my language it is not, and I have to accommodate both cases.
Contributed modules, which join the party later, only exacerbate the
problem.

I suspect that at the center of these issues is the gettext system,
which allows an English string to have only one translation. (Perhaps
the lack of a context parameter in t() as well; I am not sure about
that.)

Some time ago, for example, there was an unsolved issue with the month
"May", which in English happens to be the same as its abbreviation, so
no other language can give it an abbreviation that differs from the
full name of the month. Or, a string as simple as "Active" can have
genders (masculine, feminine, neuter) depending on what it refers to,
but here we have to choose one gender for everything (which wouldn't
impress the occasional intellectual visitor of a site).
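
To make the "May" case concrete, here is a minimal Python sketch (not
Drupal code). The two catalogs, the function names t_plain and
t_context, the context labels, and the Hungarian strings are all
assumptions chosen for illustration. It contrasts a lookup keyed by
the source string alone with one keyed by a (context, source string)
pair.

```python
from typing import Dict, Optional, Tuple

# Hypothetical catalog keyed by the English source string alone:
# one source string can carry only one translation.
BY_SOURCE: Dict[str, str] = {"May": "május"}

# Hypothetical catalog keyed by (context, source string):
# the same English string can carry several translations.
BY_CONTEXT: Dict[Tuple[Optional[str], str], str] = {
    ("long month name", "May"): "május",
    ("short month name", "May"): "máj.",
}


def t_plain(source: str) -> str:
    """Look up by source string only; fall back to the English text."""
    return BY_SOURCE.get(source, source)


def t_context(source: str, context: Optional[str] = None) -> str:
    """Look up by (context, source); fall back to the English text."""
    return BY_CONTEXT.get((context, source), source)
```

With t_plain, the full and the abbreviated month name necessarily share
one translation. With t_context, the caller decides which one it means,
and an unknown context still falls back to the English string.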

The rest of the time, slightly bigger chunks of text, case
sensitivity, or even occasional differences in the HTML markup of a
string allow us to hack our way to a better translation.

Does it seem a viable idea for the future to automatically add
optional hidden text to the original English strings to identify
different contexts, and then filter it out of the output? At least for
common strings that come from different t() calls?
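
The hidden-text idea could look roughly like the following Python
sketch. The "[ctx:...]" marker syntax, the catalog contents, and the
function names are hypothetical, invented here only to show the
mechanism: the marker is part of the lookup key, and it is stripped
whenever the English text has to be shown as-is.

```python
import re
from typing import Dict

# Hypothetical marker syntax: a leading "[ctx:...]" tag on the source string.
_MARKER = re.compile(r"^\[ctx:[^\]]*\]")

# Hypothetical catalog; the marked source string is the lookup key,
# so identical English words with different markers stay distinct.
CATALOG: Dict[str, str] = {
    "[ctx:long month name]May": "május",
    "[ctx:short month name]May": "máj.",
}


def strip_marker(source: str) -> str:
    """Remove the hidden context marker, if any, from a source string."""
    return _MARKER.sub("", source)


def t(source: str) -> str:
    """Translate a (possibly marked) source string.

    If no translation exists, return the English text with the
    marker filtered out so that it never reaches the output.
    """
    translated = CATALOG.get(source)
    if translated is not None:
        return translated
    return strip_marker(source)
```

An untranslated marked string such as "[ctx:status of a user]Active"
comes out as plain "Active", so sites running in English would never
see the marker.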

I am also curious whether anyone knows of any project anywhere in the
world that might one day enhance or replace the gettext system to
better address the problem of a source string having only a single
translation.
