[translations] Limitations of the source-string-centric approach

Gabor Hojtsy gabor at hojtsy.hu
Thu Jul 19 17:33:00 UTC 2007


Konstantin Käfer wrote:
> On 19.07.2007, at 13:54, Cog Rusty wrote:
>> Several times I have needed to abandon a good translation of an
>> English term string to my language and use an awkward one instead,
>> because in English the term happens to be common in two different
>> contexts while in my language it is not and I have to accommodate both
>> cases. And contributed modules join the party later to exacerbate such
>> a problem.
> 
> This is indeed a common problem. For the same reason, we changed the  
> generic "Submit" to "Save" for Drupal 6, but that only solves one  
> small problem.

Well, that improvement was a good change on its own; "Submit" was an 
awkward button label anyway.

>> I suspect that in the center of these issues is the gettext system
>> which allows an English string to have only one translation. (Perhaps
>> the lack of context parameters in t() as well. I am not sure about
>> that).
> 
> Maybe, we could add such context parameters (plain strings), for the  
> case of the abbreviated word "May" the context parameter could be  
> "month abbreviation", for the full month "May" the parameter would be  
> "month full" (or something in that direction). Translators would have  
> to advise programmers to use contexts for strings so that homographs  
> can be separated in a clean fashion.

There are basically two ways to go about the 'track' (as in audio module 
and tracker module) and the May/May problem:

  - add more context information to the t() call, which ends up in
    Gettext files (maybe even in the strings themselves). This could be
    trim(t('~shortmonth May', array('~shortmonth' => ''))),
    but of course this can be automated (strip the ~ markers from strings).

  - use 'constants' or 'symbolic strings' instead of actual strings:
    t(T_MONTHS_MAY), t(T_SYSTEM_HELP_14, array('@url' => url(....)))
    and t(T_HOME) and friends. This would slow down English sites,
    as they would need to look up the 'constants' to strings too.
    (Also we should not store all strings as constants in memory).

Both ways lead to a solution, and both are ugly unfortunately.
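
To make the first option concrete, here is a minimal sketch of the automated variant, written in Python rather than PHP so it can be run standalone. It is not Drupal's API: the t() below, the TRANSLATIONS table and the '~longmonth' marker are assumptions made for illustration (only '~shortmonth' comes from the example above).

```python
import re

# A leading "~marker " prefix carries the context inside the source string.
MARKER = re.compile(r"^~\w+\s+")

# Illustrative data only: translations are keyed by the full marked string,
# so the two meanings of "May" get separate entries.
TRANSLATIONS = {
    "hu": {
        "~shortmonth May": "máj.",
        "~longmonth May": "május",  # hypothetical marker for the full month
    }
}

def t(source, langcode="en"):
    """Translate `source`; if no translation exists, strip the marker."""
    translated = TRANSLATIONS.get(langcode, {}).get(source)
    if translated is not None:
        return translated
    # Untranslated (including English): show the string without its marker.
    return MARKER.sub("", source)

print(t("~shortmonth May"))        # English falls back to the bare string
print(t("~shortmonth May", "hu"))  # abbreviated month
print(t("~longmonth May", "hu"))   # full month
```

The marker is stripped inside t(), so callers no longer need the trim() and empty-replacement trick, but every untranslated string pays for one regular expression match.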

> That's also an issue I have discovered, for example with the string  
> "Your @type has been created." (with @type being the name of a  
> content type).
> 
> I'll explain the problem for people who don't understand the problem:
> Content type names have a specific gender in most languages; let's  
> take "story", which translates to "Artikel" in German. The word  
> "Artikel" is male, thus the sentence should be "Ihr Artikel wurde  
> erstellt." (Ihr = Your). However, if we use page = Seite, we end up  
> with "Ihr Seite wurde erstellt." The problem here is that "Seite" is  
> female, thus requiring "Ihre" in the German language. This results in  
> grammatically incorrect sentences. A wrong gender is not a minor  
> issue for a native speaker, it really disturbs the reading flow and  
> may shed a bad light on the site creator.

The Hungarian team works around this by actually translating it as if 
it were '@type has been created', that is, dropping the "Your".

> A possible solution to that problem could be to:
> a) remove variables that are embedded in a sentence (strings like "Do  
> you really want to delete %title?" are perfectly fine since the %  
> indicates that this is user supplied text dropping out of the regular  
> reading flow)

Unfortunately "Do you really want to delete %title?" does not work well 
either, as in Hungarian we need an article before %title, effectively 
translating it as "Do you really want to delete the %title post?". 
Anyway, how do you expect these strings to be modified? There are 
*lots*, and this type of string construction makes the interface so 
much more friendly, even if the translation is not 100% accurate.

> b) Provide a way to override translations for specific variable  
> contents. The site administrator could for example override "Your  
> @type has been created." for @type = 'Seite' and replacing it with  
> "Ihre @type has been created"

Well, things like articles and genders would need programmatic backends.
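
As a rough idea of what such a backend would have to do, here is a minimal Python sketch for the German example quoted above. Everything in it is hypothetical: the GENDER and VARIANTS tables, the created_message() helper and the fallback to the masculine form are assumptions, not an existing Drupal or Gettext feature.

```python
# Grammatical gender of each translated content type name (illustrative).
GENDER = {"Artikel": "m", "Seite": "f"}

# One German variant of "Your @type has been created." per gender.
VARIANTS = {
    "m": "Ihr @type wurde erstellt.",
    "f": "Ihre @type wurde erstellt.",
}

def created_message(type_name):
    """Pick the variant matching the noun's gender, then substitute it."""
    # Falling back to the masculine form is an arbitrary assumption here.
    gender = GENDER.get(type_name, "m")
    return VARIANTS[gender].replace("@type", type_name)

print(created_message("Artikel"))
print(created_message("Seite"))
```

Note that this needs gender data for every value that can be substituted, which is exactly the part that does not exist for user-defined content types.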

>> I am also curious if anyone knows of any project anywhere in the world
>> which one day might enhance or replace the gettext system to address
>> better the issues of the single translation of a source string.

Mozilla and many others (also in Java) use .properties files, which is 
essentially the constants method explained above. Other PHP CMSes use 
plain PHP constants.
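
For comparison, a minimal Python sketch of the symbolic-key style. The key names, the CATALOGS table and this t() are made up for illustration and do not reflect how any particular project lays out its files.

```python
# Every language, English included, maps symbolic keys to display strings,
# so homographs such as the two "May" strings simply get different keys.
CATALOGS = {
    "en": {"months.may.full": "May", "months.may.short": "May", "home": "Home"},
    "hu": {"months.may.full": "május", "months.may.short": "máj."},
}

def t(key, langcode="en"):
    """Look up `key` in the requested language, then English, then give up."""
    for code in (langcode, "en"):
        text = CATALOGS.get(code, {}).get(key)
        if text is not None:
            return text
    return key  # last resort: show the raw key

print(t("months.may.short", "hu"))
print(t("home", "hu"))  # missing in "hu", falls back to English
```

The English lookup in the loop is the cost mentioned earlier: with symbolic keys even an English-only site has to consult a catalog for every string.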

Gabor

