[development] updating translations: how valuable is user data after all?

Gabor Hojtsy gabor at hojtsy.hu
Tue May 22 09:53:19 UTC 2007


Hi guys,

Now Drupal 6.x-dev includes cool features to import PO files 
automatically at every logical step:

  - you can install Drupal in your foreign language, and have
    PO files for all enabled modules imported along automatically

  - when you add a new language, all translation files for
    enabled modules get imported automatically for that language

  - when you install new modules or enable themes, the translation
    files for these components get imported for all enabled
    languages

This is all great and automated, contrib modules already have their PO 
files at the right place, and we will update the packaging scripts for 
Drupal 6 to package core translations properly.

You might notice a pattern in the above features though: they IMPORT 
stuff into the database. Unfortunately we have no way in Drupal 6 to 
remove translations when you disable a theme or uninstall a module. We 
don't know which strings appeared in *only* that component and nowhere 
else in Drupal, so we cannot tell which ones are safe to remove. For 
that, we would need the extractor script built into Drupal core to look 
through all source files of enabled components and identify the unused 
strings in the database. Fortunately this is doable in contrib, now that 
the extractor has its own Drupal module. (Of course it is doable in 
Drupal core by deleting all strings from the database and reimporting 
files for only the enabled components, but read on about the value of 
user data).
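
To make the "identify the unused strings" idea concrete, here is a rough 
sketch in Python (not Drupal code). It assumes a toy extractor that only 
recognizes plain t('...') calls; the real extractor handles far more 
cases. The names extract_strings and unused_strings and the sample data 
are made up for illustration.

```python
import re

# Hypothetical, simplified pattern: matches only t('...') or t("...") calls
# whose first argument is a plain string literal.
T_CALL = re.compile(r"""\bt\(\s*(['"])((?:\\.|(?!\1).)*)\1""")


def extract_strings(source):
    """Return the set of string literals passed to t() in one source text."""
    return {match.group(2) for match in T_CALL.finditer(source)}


def unused_strings(db_strings, enabled_sources):
    """Return the strings stored in the database that no enabled component uses.

    db_strings: iterable of source strings currently in the database.
    enabled_sources: dict mapping a file name to that file's source text,
    covering enabled components only.
    """
    used = set()
    for source in enabled_sources.values():
        used |= extract_strings(source)
    return set(db_strings) - used


# Made-up sample data: the module that used 'Poll results' is not enabled.
enabled_sources = {
    "node.module": "print t('Submit'); print t('Delete');",
    "garland.theme": 'print t("Home");',
}
db_strings = ["Submit", "Delete", "Home", "Poll results"]

print(sorted(unused_strings(db_strings, enabled_sources)))
```

Only strings that no enabled component uses come out as removable, so a 
string shared between two modules survives when one of them is disabled.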

BTW Drupal 6 core still needs upgrade support for translations, so that 
when you update a module or Drupal itself, new and corrected 
translations get into your database. New translations are easy again: 
they just mean importing new stuff, which we are very good at :) 
Updating translations already in the DB threatens user data though. In 
Drupal 5 and before, we have no information about which translations a 
user modified on the web interface, so we don't know what was imported 
from available PO files and what was user defined. We can reimport stuff 
from the files, but can easily lose/overwrite user defined/updated 
strings.

What can we do to avoid losing user defined strings? We can easily 
introduce a 'modified' bit into the locale translations (target) table, 
just as the menu module had in Drupal 5. That would help us from 
Drupal 6 onward, but it does not keep us from losing user defined 
strings when a Drupal 5 to Drupal 6 upgrade happens. So how cautious 
should we be there?

   1. Do not overwrite any existing translation, risking that we leave
   incorrect translations in the database even though they have since
   been fixed in the PO files.

   2. Do overwrite existing translations on an update, risking that
   we overwrite user modified translations.

Note that an update will not *remove* anything from the DB because we 
don't know what we can remove, as explained above. It can *overwrite* 
stuff though, and the problems are all around these overwrites.
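
Here is a small sketch in Python (not Drupal code) of how a 'modified' 
bit could drive the import, and where the two options above differ. The 
Translation class, the import_po function and the overwrite_unflagged 
switch are hypothetical names, and the translated texts are 
placeholders. Rows coming from Drupal 5 would all be unflagged, which is 
exactly the case the two options disagree on.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    text: str
    modified: bool = False  # hypothetical bit: True once edited on the web UI


def import_po(stored, incoming, overwrite_unflagged):
    """Merge translations from a PO file into `stored`, in place.

    stored: dict mapping a source string to its Translation.
    incoming: dict mapping a source string to the translated text in the file.
    overwrite_unflagged: False behaves like option 1 (keep what is in the
    database), True like option 2 (the file wins for unflagged rows).
    Nothing is ever removed from `stored`.
    """
    for source, text in incoming.items():
        current = stored.get(source)
        if current is None:
            stored[source] = Translation(text)  # new string: always imported
        elif current.modified:
            continue  # known user edit: never overwritten
        elif overwrite_unflagged:
            current.text = text  # unflagged row: updated from the file
    return stored


def sample():
    """Return fresh made-up data so that each run starts from the same state."""
    return {
        "Home": Translation("old-home"),
        "Submit": Translation("user-submit", modified=True),
    }


incoming = {"Home": "new-home", "Submit": "new-submit", "Delete": "new-delete"}

cautious = import_po(sample(), incoming, overwrite_unflagged=False)
eager = import_po(sample(), incoming, overwrite_unflagged=True)

print(cautious["Home"].text, eager["Home"].text, eager["Submit"].text)
```

With the flag in place the eager mode is safe for flagged rows. For data 
upgraded from Drupal 5 no row carries the flag, so the eager mode can 
still overwrite a user's edit and the cautious mode can still keep a 
stale translation.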

So how should the update paths work for Drupal and for modules/themes?

Gabor

