On 7/19/07, Gabor Hojtsy gabor@hojtsy.hu wrote:
Frederik 'Freso' S. Olesen wrote:
2007/7/14, Derek Wright drupal@dwwright.net:
[...] That's what I was hoping to hear, since I agree the split approach is better for everyone.
Except that for some language or other, it might make sense to switch the order of the paragraphs, take a paragraph out, add another paragraph, or in some other way rearrange the paragraphs to get the proper meaning. Splitting the string also risks confusion during translation in the interface. I agree that shorter text blurbs are both easier to translate and easier to get people to translate (not as much work), but splitting might also degrade the final outcome.
Yes, it is unfortunately not a win-win situation. It seemed to be a good compromise to have one t() per paragraph/list item.
Gabor
It is certainly a compromise. Taking the idea to an extreme, if we broke everything down to single words, translators would have far fewer strings to deal with, since many words are shared between strings, but a serious translation would obviously be impossible: single words are translated differently in different contexts, have genders, are often part of grammatical structures and idioms, etc.
Several times I have had to abandon a good translation of an English term into my language and use an awkward one instead, because in English the term happens to be used in two different contexts while in my language it is not, and I have to accommodate both cases. Contributed modules then join the party later and exacerbate the problem.
I suspect that at the center of these issues is the gettext system, which allows an English string to have only one translation. (Perhaps the lack of a context parameter in t() as well. I am not sure about that.)
Some time ago, for example, there was an unsolved issue with the month "May", which in English happens to be the same as its abbreviation, so in any other language its abbreviation cannot differ from the full name of the month either. Or, a string as simple as "Active" can have genders (masculine, feminine, neuter) depending on what it refers to, but here we have to choose one gender for everything (which wouldn't impress the occasional intellectual visitor of a site).
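To illustrate the "May" problem, here is a rough sketch in Python rather than Drupal's PHP. The plain dicts stand in for a translation catalog, and the function names, context labels, and Hungarian sample strings are my own assumptions for illustration, not part of t() or gettext.

```python
# Illustration only: plain dicts standing in for a translation catalog.
# The Hungarian strings are sample data, not taken from any real .po file.

def lookup_by_source(catalog, source):
    """Look up by the English string alone; fall back to the source."""
    return catalog.get(source, source)


def lookup_by_context(catalog, context, source):
    """Look up by (context, English string); fall back to the source."""
    return catalog.get((context, source), source)


# One key per English string: the second assignment replaces the first.
flat = {}
flat["May"] = "május"  # intended for the full month name
flat["May"] = "máj."   # intended for the abbreviation, overwrites the above

# A (context, string) key keeps both translations apart.
contextual = {
    ("month name", "May"): "május",
    ("month abbreviation", "May"): "máj.",
}

print(lookup_by_source(flat, "May"))
print(lookup_by_context(contextual, "month name", "May"))
print(lookup_by_context(contextual, "month abbreviation", "May"))
```

With the flat catalog, whichever translation is stored last wins for every use of "May". With the keyed catalog, each caller has to say which "May" it means.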
The rest of the time, slightly bigger chunks of text, case sensitivity, or even occasional differences in the HTML markup of a string allow us to hack our way to a better translation.
Does it seem like a viable idea for the future to automatically add optional hidden text to the original English strings to identify different contexts, and then filter it out in the output? At least for common strings which come from different t() calls?
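To make the question concrete, here is a minimal sketch of what I have in mind, again in Python for brevity. The "{ctx:...}" marker syntax, the translate() helper, and the sample catalog are all hypothetical; nothing like this exists in t() as far as I know.

```python
import re

# Hypothetical marker syntax: a trailing "{ctx:...}" tag on the English string.
CONTEXT_MARKER = re.compile(r"\s*\{ctx:[^}]*\}\s*$")


def strip_marker(source):
    """Remove the hidden context tag so it never reaches the output."""
    return CONTEXT_MARKER.sub("", source)


def translate(source, catalog):
    """Look up the marked string; fall back to the English text without the tag."""
    translated = catalog.get(source)
    if translated is not None:
        return translated
    return strip_marker(source)


# Sample catalog: the marker makes the two "May" strings distinct keys.
catalog = {
    "May {ctx:month name}": "május",
    "May {ctx:month abbreviation}": "máj.",
}

print(translate("May {ctx:month abbreviation}", catalog))
print(translate("Active {ctx:user status}", catalog))
```

Translators would see the marker and could translate each variant separately, while an untranslated string still displays as plain English because the tag is filtered out on fallback.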
I am also curious whether anyone knows of any project, anywhere in the world, which might one day enhance or replace the gettext system to better address the issue of a source string having only a single translation.