[drupal-devel] [feature] Multiple translations of the same strings

Olen drupal-devel at drupal.org
Mon Apr 4 09:57:52 UTC 2005


Issue status update for http://drupal.org/node/19425

 Project:      Drupal
 Version:      cvs
 Component:    locale.module
 Category:     feature requests
 Priority:     normal
 Assigned to:  Anonymous
 Reported by:  Olen
 Updated by:   Olen
 Status:       patch

Sorry, I've been busy "going live" with a site this past week, so I have
not had time to comment or do any more work on this for some days.
Now things have calmed down a bit, and I am ready to fix the issues
that have come up.
After reading a bit about PO, I realize that today it is not possible
to have multiple translations of the same string *in the same
"domain"*.
As far as I can see, "domain" seems to more or less mean "po file", so
this should be a minor issue (with my patch, you already have the
option to download one .po file per drupal file).
All one would need to do to make it 100% PO-compatible is to add a
"DISTINCT" to the query creating the .po-file to download.
(Please correct me if I'm wrong).
That way you could first export "All", translate every (distinct)
string, import the file, browse around the site to locate the places
where the translations need to change, and afterwards export the .po
files for whichever files (domains) need different translations.
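
A minimal sketch of the "DISTINCT" export idea, in PHP like the patch. It assumes the export rows have already been fetched as arrays with location ("file:line"), source and translation keys (a hypothetical shape, not the patch's objects), treats the file part of the location as the domain, and keeps only the first translation seen for each source string in that domain:

```php
<?php
// Sketch only: mimics "SELECT DISTINCT" per domain on already-fetched rows.
// The row shape (location/source/translation keys) is an assumption.

/**
 * Groups rows by domain (the file part of "file:line") and keeps the
 * first translation seen for each source string within that domain.
 */
function locale_group_distinct(array $rows): array {
  $domains = [];
  foreach ($rows as $row) {
    // "modules/node.module:120" -> "modules/node.module"
    $pos = strrpos($row['location'], ':');
    $domain = $pos === false ? $row['location'] : substr($row['location'], 0, $pos);
    if (!isset($domains[$domain][$row['source']])) {
      $domains[$domain][$row['source']] = $row['translation'];
    }
  }
  return $domains;
}

$rows = [
  ['location' => 'modules/node.module:120', 'source' => 'Submit', 'translation' => 'translation A'],
  ['location' => 'modules/node.module:121', 'source' => 'Submit', 'translation' => 'translation A'],
  ['location' => 'modules/user.module:40', 'source' => 'Submit', 'translation' => 'translation B'],
];
$grouped = locale_group_distinct($rows);
// $grouped has two domains, each with a single "Submit" entry.
```

Each domain (and so each exported .po file) then lists a given msgid once. Which translation wins when two rows in the same domain disagree is an arbitrary choice in this sketch.
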
To really make this work as expected, it would be best to have a script
run when files are added or updated that gets all strings from all
files and adds them to the locales_* tables. That way all strings
would be available to translators the first time around.
Today strings are added to the tables the first time they are "seen",
which makes the tables grow slowly and is frustrating for translators,
as the files need to be downloaded many times to get all the strings.
So my proposal is to do something like:

   * Copy/alter/make a new version of 'extractor.php' to insert strings
     into the database.
   * Run this script (from cron?) to make sure strings from new modules
     are inserted automatically.
   * Maybe do some cleanup of no longer needed strings at the same time
     (export them to an unused.$date.po or something).

     As files change, you may have the same string multiple times in the
     database, e.g. in the locations "example.module:123" and
     "example.module:124", because a line was added or removed somewhere
     above the string in an update of the module.

   * Use "filename" as "domain" and create both .po and .pot files from
     the database.
   * Use "DISTINCT" on the queries to make sure the same string only
     appears once in every .po (and .pot) file.
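
A rough sketch of the scanning half of such a script. It uses PHP's tokenizer (token_get_all()) rather than the real extractor.php, whose code is not shown here, and only picks up t() calls whose first argument is a single string literal, recording each one with a "file:line" location. The database inserts are left out, and locale_scan_source() and locale_skip_whitespace() are hypothetical names:

```php
<?php
// Sketch: collect t('...') string literals with "file:line" locations using
// PHP's tokenizer. Only a single literal first argument is recognized;
// t($var), concatenations and the like are skipped.

function locale_skip_whitespace(array $tokens, int $j): int {
  while (isset($tokens[$j]) && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
    $j++;
  }
  return $j;
}

function locale_scan_source(string $code, string $file): array {
  $tokens = token_get_all($code);
  $found = [];
  foreach ($tokens as $i => $tok) {
    // Look for the bare identifier "t" ...
    if (!is_array($tok) || $tok[0] !== T_STRING || $tok[1] !== 't') {
      continue;
    }
    // ... followed by "(" ...
    $j = locale_skip_whitespace($tokens, $i + 1);
    if (!isset($tokens[$j]) || $tokens[$j] !== '(') {
      continue;
    }
    // ... and a constant string literal.
    $j = locale_skip_whitespace($tokens, $j + 1);
    if (isset($tokens[$j]) && is_array($tokens[$j]) && $tokens[$j][0] === T_CONSTANT_ENCAPSED_STRING) {
      // Drop the quotes; escape sequences are kept verbatim in this sketch.
      $source = substr($tokens[$j][1], 1, -1);
      // Index 2 of a token array is its line number.
      $found[] = ['location' => $file . ':' . $tokens[$j][2], 'source' => $source];
    }
  }
  return $found;
}
```

Because the location includes the line number, rescanning a module after lines have moved produces new "file:line" rows for unchanged strings, which is the duplication described in the list above.
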



Olen



Previous comments:
------------------------------------------------------------------------

March 25, 2005 - 10:20 : Olen

In many languages the same English string can (and should) be translated
differently, depending on context.
Here are two patches, one for locale.inc and one for locale.module, that
allow the same string to be translated more than once.
They store the correct path (filename) and line number of the translated
string as "location" instead of the URL where it was first seen.
They also allow you to download the .po file for only one module at
a time.
These two improvements make it a lot easier to find the correct
context for the translation.
It will use a "best effort" when finding translations, first trying to
match on file:line, then only file and at last, any translated string
with the same 'source'.
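
The lookup order above can be sketched as a standalone function. This is an illustration with an assumed row shape (location and translation keys), not the patch code: it compares the file part of the location exactly instead of matching the base name with eregi(), and initializes its state before the loop:

```php
<?php
// Sketch of the "best effort" lookup: exact file:line match first (100),
// then same file (75), then any translation of the same source (50).
// $candidates are rows for one source string and locale (assumed shape).

function locale_best_match(array $candidates, string $file, int $line): ?array {
  $best = null;
  $bestRate = 0;
  foreach ($candidates as $row) {
    if ($row['translation'] === '') {
      continue; // untranslated rows never win
    }
    $pos = strrpos($row['location'], ':');
    $rowFile = $pos === false ? $row['location'] : substr($row['location'], 0, $pos);
    if ($row['location'] === "$file:$line") {
      $rate = 100;
    }
    elseif ($rowFile === $file) {
      $rate = 75;
    }
    else {
      $rate = 50;
    }
    if ($rate > $bestRate) {
      $best = $row + ['rate' => $rate];
      $bestRate = $rate;
      if ($rate === 100) {
        break; // nothing can beat an exact match
      }
    }
  }
  return $best;
}
```

Comparing the file part exactly avoids two quirks of the patch: eregi() treats the file name as a regular expression, and a substring match lets "node.module" also match a location such as "modules/mynode.module:12". Initializing $best and $bestRate avoids reading $rate and $match before they are set.
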
Be aware that the first few page loads after a new translation is added
are really slow, until the database has been updated with all the new
strings and locations.
Also note that this will lead to several "unused" strings in the
database.  I have an idea about some sort of timestamp to check when a
string was last used, and a cron job that removes old, unused strings,
but I think this could cause more problems than it fixes.
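
The timestamp idea could look roughly like this, assuming a hypothetical last_used field (a Unix timestamp refreshed whenever t() resolves a row); locales_source has no such column in this patch:

```php
<?php
// Sketch of the "last used" cleanup idea. The last_used field (a Unix
// timestamp refreshed whenever t() resolves the row) is hypothetical.

function locale_stale_rows(array $rows, int $now, int $maxAge): array {
  $cutoff = $now - $maxAge;
  // Keep rows not used since the cutoff; array_values() renumbers them.
  return array_values(array_filter($rows, function (array $row) use ($cutoff): bool {
    return $row['last_used'] < $cutoff;
  }));
}
```

A cron job could pass the current time() and a maximum age, then export or delete the rows it gets back.
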
Anyway, here are the patches.  Let me know if you want them as
attachments instead.
locale.module:
--- locale.module.orig  2005-03-23 08:42:29.000000000 +0100
+++ locale.module       2005-03-25 10:10:39.180616160 +0100
@@ -142,29 +142,67 @@
   // We don't have this translation cached, so get it from the DB
   else {
-    $result = db_query("SELECT s.lid, t.translation FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE s.source = '%s' AND t.locale = '%s'", $string, $locale);
+    $caller = debug_backtrace();
+    $docroot = realpath($_SERVER['DOCUMENT_ROOT']);
+    $file = ereg_replace($docroot, '', $caller[1]['file']);
+    $basefile = basename($file);
+    $line = $caller[1]['line'];
+    $origstring = $string;
+    $result = db_query("SELECT s.lid, s.location, t.translation FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE s.source = '%s' AND t.locale = '%s'", $string, $locale);
     // Translation found
-    if ($trans = db_fetch_object($result)) {
+    while ($trans = db_fetch_object($result)) {
       if (!empty($trans->translation)) {
-        $locale_t[$string] = $trans->translation;
-        $string = $trans->translation;
+        if ($trans->location == "$file:$line") {
+          // We have 100% match
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 100;
+          break;
+        }
+        elseif (eregi($basefile, $trans->location) && ($rate < 100)) {
+          // We have a match in the same file, but on a different line
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 75;
+        }
+        elseif ($rate < 50) {
+          // We have a match in another file
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 50;
+        }
+      }
+    }
+    // We have a translation, but not a full file:line match
+    if (($match) && ($rate < 100)) {
+      // Lets update source and target with the correct location
+      db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", "$file:$line", $origstring);
+      if ($locale) {
+          $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $origstring, "$file:$line"));
+          db_query("INSERT INTO {locales_target} (lid, locale, translation) VALUES (%d, '%s', '%s')", $lid->lid, $locale, $string);
       }
     }
     // Either we have no such source string, or no translation
-    else {
-      $result = db_query("SELECT lid, source FROM {locales_source} WHERE source = '%s'", $string);
-      // We have no such translation
+    elseif (!$match) {
+      $result = db_query("SELECT lid, source FROM {locales_source} WHERE source = '%s' AND location = '%s'", $origstring, "$file:$line");
       if ($obj = db_fetch_object($result)) {
         if ($locale) {
-          db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $obj->lid, $locale);
+          $trans = db_fetch_object(db_query("SELECT lid FROM {locales_target} WHERE lid = '%d' AND locale = '%s'"", $obj->lid, $locale));
+          // We have no such translation
+          if (!$trans) {
+            db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $obj->lid, $locale);
+          }
         }
       }
       // We have no such source string
       else {
-        db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", request_uri(), $string);
+        db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", "$file:$line", $string);
         if ($locale) {
-          $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $string));
+          $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $string, "$file:$line"));
           db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $lid->lid, $locale);
         }
       }
@@ -410,7 +448,7 @@
   include_once 'includes/locale.inc';
   switch ($_POST['op']) {
     case t('Export'):
-      _locale_export_po($_POST['edit']['langcode']);
+      _locale_export_po($_POST['edit']['langcode'], $_POST['edit']['filename']);
       break;
   }
   print theme('page', _locale_admin_export_screen());
And for locale.inc
--- locale.inc.orig     2005-03-23 18:03:27.000000000 +0100
+++ locale.inc  2005-03-25 09:58:22.433809358 +0100
@@ -176,11 +176,9 @@
         if ($key == 0) {
           $plid = 0;
         }
-        $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english[$key]));
+        $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english[$key], $comments));
         if ($loc->lid) { // a string exists
           $lid = $loc->lid;
-          // update location field
-          db_query("UPDATE {locales_source} SET location = '%s' WHERE lid = %d", $comments, $lid);
           $trans2 = db_fetch_object(db_query("SELECT lid, translation, plid, plural FROM {locales_target} WHERE lid = %d AND locale = '%s'", $lid, $lang));
           if (!$trans2->lid) { // no translation in current language
             db_query("INSERT INTO {locales_target} (lid, locale, translation, plid, plural) VALUES (%d, '%s', '%s', %d, %d)", $lid, $lang, $trans, $plid, $key);
@@ -198,7 +196,7 @@
         }
         else { // no string
           db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", $comments, $english[$key]);
-          $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english[$key]));
+          $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english[$key], $comments));
           $lid = $loc->lid;
           db_query("INSERT INTO {locales_target} (lid, locale, translation, plid, plural) VALUES (%d, '%s', '%s', %d, %d)", $lid, $lang, $trans, $plid, $key);
           if ($trans != '') {
@@ -213,11 +211,10 @@
     else {
       $english = $value['msgid'];
       $translation = $value['msgstr'];
-      $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english));
+      $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english, $comments));
       if ($loc->lid) { // a string exists
         $lid = $loc->lid;
         // update location field
-        db_query("UPDATE {locales_source} SET location = '%s' WHERE source = '%s'", $comments, $english);
         $trans = db_fetch_object(db_query("SELECT lid, translation FROM {locales_target} WHERE lid = %d AND locale = '%s'", $lid, $lang));
         if (!$trans->lid) { // no translation in current language
           db_query("INSERT INTO {locales_target} (lid, locale, translation) VALUES (%d, '%s', '%s')", $lid, $lang, $translation);
@@ -662,7 +659,7 @@
   while(strlen($comm) < 128 && count($comment)) {
     $comm .= substr(array_shift($comment), 1) .', ';
   }
-  return substr($comm, 0, -2);
+  return trim(substr($comm, 0, -2));
 }
 /**
@@ -689,18 +686,37 @@
 }
 /**
+ * Get a list of all files with at least one translatable string
+ */
+function _locale_active_modules() {
+  $loc = db_query("SELECT location FROM {locales_source}");
+  $filenames[''] = t('All files');
+  while ($locat = db_fetch_object($loc)) {
+    $basename = basename(preg_replace('/:.*/', '', $locat->location));
+    if ($basename) {
+      $filenames[$basename] = $basename;
+    }
+  }
+  ksort($filenames);
+  return $filenames;
+}
+
+/**
  * User interface for the translation export screen
  */
 function _locale_admin_export_screen() {
   $languages = locale_supported_languages(FALSE, TRUE);
   $languages = array_map("t", $languages['name']);
   unset($languages['en']);
+  $filenames = _locale_active_modules();
+
   $output = '';
   // Offer language specific export if any language is set up
   if (count($languages)) {
     $output .= '<h2>'. t('Export translation') .'</h2>';
     $form = form_select(t('Language name'), 'langcode', '', $languages, t('Select the language you would like to export in gettext Portable Object (.po) format.'));
+    $form .= form_select(t('File name'), 'filename', '', $filenames, t('Select the file you would like to export strings from.'));
     $form .= form_submit(t('Export'));
     $output .= form($form);
   }
@@ -719,13 +735,21 @@
  *
  * @param $language Selects a language to generate the output for
  */
-function _locale_export_po($language) {
+function _locale_export_po($language, $filename = NULL) {
   global $user;
+  if ($filename) {
+    $filename = "/%$filename%";
+    $sort = '(substring_index(s.location, ":", -1)+0)';
+  }
+  else {
+    $filename = '/%';
+    $sort = 'substring_index(s.location, ":", 1), (substring_index(s.location, ":", -1)+0)';
+  }
   // Get language specific strings, or all strings
   if ($language) {
     $meta = db_fetch_object(db_query("SELECT * FROM {locales_meta} WHERE locale = '%s'", $language));
-    $result = db_query("SELECT s.lid, s.source, s.location, t.translation, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' ORDER BY t.plid, t.plural", $language);
+    $result = db_query("SELECT s.lid, s.source, s.location, t.translation, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' and s.location like '%s' ORDER BY t.plid, t.plural, $sort, s.source, s.lid", $language, $filename);
   }
   else {
     $result = db_query("SELECT s.lid, s.source, s.location, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid GROUP BY s.lid ORDER BY t.plid, t.plural");
@@ -750,7 +774,14 @@
   // Generating Portable Object file for a language
   if ($language) {
-    $filename = $language .'.po';
+    if ($filename) {
+      $filename = preg_replace('/[^A-z0-9\.\-_]/', '', $filename);
+      if (!$filename) {
+        $filename = 'all';
+      }
+      $filename .= '.';
+    }
+    $filename .= $language .'.po';
     $header .= "# $meta->name translation of ". variable_get('site_name', 'Drupal') ."\n";
     $header .= '# Copyright (c) '. date('Y') .' '. $user->name .' <'. $user->mail .">\n";
     $header .= "#\n";
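
For reference, a PHP sketch of what the "All files" ORDER BY and the new file naming in _locale_export_po() do, assuming a plain list of "file:line" strings; the function names are hypothetical. Lines are compared as integers, which is what the +0 in substring_index(...)+0 achieves in MySQL. The file-name filter here uses A-Za-z, because the patch's A-z range also lets through the ASCII characters [ \ ] ^ and ` that sit between Z and a:

```php
<?php
// Sketch: PHP equivalent of ordering "file:line" locations by file first,
// then by line compared as an integer (so ":9" sorts before ":10").
// This splits at the last colon; substring_index(..., 1) uses the first one,
// which gives the same result for paths without colons.

function locale_split_location(string $location): array {
  $pos = strrpos($location, ':');
  if ($pos === false) {
    return [$location, 0];
  }
  return [substr($location, 0, $pos), (int) substr($location, $pos + 1)];
}

function locale_sort_locations(array $locations): array {
  usort($locations, function (string $a, string $b): int {
    [$fileA, $lineA] = locale_split_location($a);
    [$fileB, $lineB] = locale_split_location($b);
    // strcmp() is byte-wise; MySQL would use the column's collation instead.
    return strcmp($fileA, $fileB) ?: $lineA <=> $lineB;
  });
  return $locations;
}

// Export name in the "<file>.<language>.po" form the patch builds, falling
// back to "all" when no file is chosen.
function locale_export_filename(string $language, ?string $file): string {
  $base = $file === null ? '' : preg_replace('/[^A-Za-z0-9.\-_]/', '', basename($file));
  return ($base === '' ? 'all' : $base) . '.' . $language . '.po';
}
```
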


------------------------------------------------------------------------

March 25, 2005 - 10:59 : stefan nagtegaal

Attachment: http://drupal.org/files/issues/locale-inc_0.patch (5.84 KB)

I remember that Dries and Steven preferred uploaded patch/diff files
instead of just putting the diff into the issue itself.
So, attached you'll find the patch for locale.inc.
This is such a nice feature and should _really_ get into core one day.
Whatever is wrong with this patch, I'll keep updating it until it has
hit the trunk.
I love it!


------------------------------------------------------------------------

March 25, 2005 - 11:00 : stefan nagtegaal

Attachment: http://drupal.org/files/issues/locale-module.patch (4.31 KB)

I remember that Dries and Steven preferred uploaded patch/diff files
instead of just putting the diff into the issue itself.
So, attached you'll find the patch for locale.module.
This is such a nice feature and should _really_ get into core one day.
Whatever is wrong with this patch, I'll keep updating it until it has
hit the trunk.
I love it!
(Set status to patch again.)


------------------------------------------------------------------------

March 25, 2005 - 11:06 : chx

Please consider this for 4.6. The need for this functionality is really
great.


------------------------------------------------------------------------

March 25, 2005 - 13:39 : Olen

Just discovered a small bug.
There is an extra " at the end of this query on line 194 of
locale.module:
$trans = db_fetch_object(db_query("SELECT lid FROM {locales_target} WHERE lid = '%d' AND locale = '%s'"", $obj->lid, $locale));


------------------------------------------------------------------------

March 25, 2005 - 21:03 : Goba

Things to note here:

   * debug_backtrace() could be expensive, and it does not seem to me
     that anyone has benchmarked this change.
   * I expect realpath() to be quite expensive, since it tries to resolve
     all possible symbolic links in the path, so it does quite a few file
     system checks. Note that there are really a lot of t() calls on a page!
   * The locale caching code was not changed as far as I can tell, and only
     the non cached strings will be checked for file name and line number,
     so those that have real problems (short strings) are not affected by
     this patch, as they are precached and loaded and checked without the
     line numbers... Excuse me if I find this funny :)
   * The real big roadblock here is that you need to find a way to
     represent these multiple strings in the PO file... First, it is not
     possible to have different translations for the same string in PO
     files; second, if it were possible, the extractor would need to
     extract every filename:line source string separately
     (i.e. you would have ~20 "Submit" strings to translate even only for
     core, etc.). So you need to provide some solution for representing
     this in the PO files; otherwise this whole idea is pointless.
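
Newer GNU gettext releases (0.15 onward, after this discussion) added a msgctxt keyword for exactly this case: the same msgid may appear several times in one PO file as long as each entry has a different context. A minimal sketch of writing such entries, assuming the file name is used as the context (that choice is an assumption, not something settled in this thread; po_quote() and po_entry() are hypothetical helpers):

```php
<?php
// Sketch: write PO entries where the same msgid may appear several times,
// disambiguated by msgctxt. Using the file name as context is an assumption.

function po_quote(string $s): string {
  // Escape backslash, double quote and newline for a PO string literal.
  return '"' . strtr($s, ['\\' => '\\\\', '"' => '\\"', "\n" => '\\n']) . '"';
}

function po_entry(string $context, string $msgid, string $msgstr): string {
  // msgctxt must come before msgid in a PO entry.
  return 'msgctxt ' . po_quote($context) . "\n"
    . 'msgid ' . po_quote($msgid) . "\n"
    . 'msgstr ' . po_quote($msgstr) . "\n";
}
```

Long strings are written on a single line here; line wrapping and plural forms are left out of the sketch.
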



------------------------------------------------------------------------

March 26, 2005 - 00:49 : Olen

> Things to note here:
>
>    * debug_backtrace() could be expensive, and it does not seem to me
>      that anyone has benchmarked this change.
I have not done a real benchmark, but at least things don't "feel"
slower. This function was much faster than I feared. But other
solutions that give the same info in a less expensive way would be
highly appreciated.
What I had in mind first was to build something from extractor.php
to extract the strings from the files and add them all to the database
at once, instead of waiting for them to be accessed, but debug_backtrace
at least gave the right location without making too many changes to
existing code.
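
If debug_backtrace() stays, its cost can at least be reduced. A sketch assuming PHP 5.4 or later (newer than the code in this thread), where debug_backtrace() accepts DEBUG_BACKTRACE_IGNORE_ARGS and a frame limit; locale_caller_location() and demo_t() are hypothetical names:

```php
<?php
// Sketch: find the "file:line" that called a t()-like wrapper, asking
// debug_backtrace() for only the frames needed and no arguments.

function locale_caller_location(string $root): string {
  // Frame 0: where this helper was called (inside the wrapper).
  // Frame 1: where the wrapper itself was called.
  $frames = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 2);
  $frame = $frames[1] ?? $frames[0];
  $file = $frame['file'] ?? '';
  // Strip the root as a plain prefix rather than as a regular expression.
  if ($root !== '' && strncmp($file, $root, strlen($root)) === 0) {
    $file = substr($file, strlen($root));
  }
  return $file . ':' . ($frame['line'] ?? 0);
}

// Hypothetical wrapper standing in for t().
function demo_t(string $string): string {
  return locale_caller_location(__DIR__);
}
```

Stripping the root with strncmp()/substr() also sidesteps a quirk of the patch, which passes the document root to ereg_replace() and so treats it as a regular expression.
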
>    * I expect realpath() to be quite expensive, since it tries to
>      resolve all possible symbolic links in the path, so it does quite
>      a few file system checks. Note that there are really a lot of t()
>      calls on a page!
The reason I used realpath is just because I use a couple of symlinks
for the base_dir, and did not want them in the location field.  I guess
this is not true for most people, so it could probably be removed.
>    * The locale caching code was not changed as far as I can tell,
>      and only the non cached strings will be checked for file name and
>      line number, so those that have real problems (short strings) are
>      not affected by this patch, as they are precached and loaded and
>      checked without the line numbers... Excuse me if I find this funny :)
If this is true, I totally agree. I was not aware of the precache. I
believed things were only cached on first access (and hence affected by
my patch).
>    * The real big roadblock here is that you need to find a way to
>      represent these multiple strings in the PO file... First, it is not
>      possible to have different translations for the same string in PO
>      files; second, if it were possible, the extractor would need to
>      extract every filename:line source string separately (i.e. you
>      would have ~20 "Submit" strings to translate even only for core,
>      etc.). So you need to provide some solution for representing this
>      in the PO files; otherwise this whole idea is pointless.
I partly agree. For me, "Submit" was one of the reasons for adding
this. That string should be translated to at least three or four
different words or expressions in Norwegian to be correct in all
places.
Another reason I started on this patch was that I wanted to find
out exactly where a string originates when I do translations.
Locations of the form "/?PHPSESSID=foobar" do not make it easy to
figure out what I should translate a string to if it does not have a
clear and unambiguous meaning.
What happens in the patch today is that if someone calls t('Submit')
for the first time in a new location, the translations are searched.
If a translation of the same string is found (either in the same
file or anywhere else), the tables are updated and the new location is
added to the _source table. The translated string is then added to the
_target table as well.
So if you translate 'Submit' once, that translation is used everywhere.
But if you need to change it in one or more places, download the (now
incorrect) .po file for that module (or other file) and change it on
that single line.
(Of course, this could lead to the opposite problem: if you want to
change _all_ translations of "Submit", this would now have to be done
in ~20 places instead of one. But I am also working on an improved
version of the built-in translation tool that will take care of this,
as well as fix a few other issues to make it more useful, even if it
is not meant to compete with specialized applications such as Kbabel or
GnomeTranslator.)
The formal correctness of the PO files was secondary to me when I
started this work, as the important issue was to make the translated
strings be correct in Drupal.
I am sure that the problem of some strings needing different
translations in different parts of an application has been
"discovered" by developers of other applications too, and that
there must be a way to represent that in a PO file.
I'll have to read a bit about i18n and PO to find the best solution to
this.
I think I am trying to solve an important issue, but it should of course
be done the right way.
Thanks for pointing this out.


------------------------------------------------------------------------

March 26, 2005 - 13:27 : Goba

Olen, you really need to investigate the original locale code further.
Since short strings are cached by Drupal, your code will not be called
for the 'Submit' string, and the proper file/line will not be found.
You added the check to the place where only the long strings are
searched for (actually the strings not cached).
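
A simplified model of the problem, assuming a cache that has been pre-filled with short strings and is keyed by the source string alone (the names are hypothetical, not the locale() internals): the location-aware lookup is never reached for a cached string, so a per-location translation cannot take effect:

```php
<?php
// Simplified model, not Drupal's locale(): a cache keyed by the source string
// alone is consulted before the location-aware lookup.

function model_translate(string $string, string $location, array &$cache, array $byLocation): string {
  if (isset($cache[$string])) {
    return $cache[$string]; // cache hit: $location is never looked at
  }
  $cache[$string] = $byLocation[$location][$string] ?? $string;
  return $cache[$string];
}

// Variant keyed by location and string, so each location keeps its own entry.
function model_translate_keyed(string $string, string $location, array &$cache, array $byLocation): string {
  $key = $location . "\0" . $string;
  if (!isset($cache[$key])) {
    $cache[$key] = $byLocation[$location][$string] ?? $string;
  }
  return $cache[$key];
}
```

Keying by location as well keeps the translations apart, at the cost of one cache entry per location instead of one per string.
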
We also *need to have* a completely PO-friendly way of representing
this. This might be of secondary concern to you, but the explosion in
the number of Drupal interface translations came from the fact that it
finally became easy to translate the interface with ready-to-use
desktop tools. No matter how friendly you make the web interface, it is
still tremendously easier to do text editing on the desktop.
Calling realpath() on a constant value in every t() call is quite
pointless, and it should not be done. If it needs to be called at all,
the result should be cached somewhere. Resolving symlinks takes
time.
I agree that this problem is apparent, and it would be ideal to have
some fix, but this patch is not there yet.
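
A sketch of "cache the result" in PHP, memoizing realpath() per path in a static array so the symlink resolution happens once per request rather than on every t() call. cached_realpath() is a hypothetical helper, and it falls back to the unresolved path when realpath() fails:

```php
<?php
// Sketch: memoize realpath() per path for the rest of the request.

function cached_realpath(string $path): string {
  static $cache = [];
  if (!array_key_exists($path, $cache)) {
    $resolved = realpath($path);
    // realpath() returns false for paths that do not exist; keep the input then.
    $cache[$path] = $resolved === false ? $path : $resolved;
  }
  return $cache[$path];
}
```

In the patch this would replace the realpath($_SERVER['DOCUMENT_ROOT']) call with cached_realpath($_SERVER['DOCUMENT_ROOT']).
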


------------------------------------------------------------------------

April 3, 2005 - 20:51 : stefan nagtegaal

Olen, has any more work been done on this issue? Please share your
thoughts and ideas, because I would truly like this to get into core
once Goba and Gerhard also think that "this is a good thing"(tm).




