[drupal-devel] [feature] Multiple translations of the same strings
Goba
drupal-devel at drupal.org
Sat Mar 26 12:27:17 UTC 2005
Issue status update for http://drupal.org/node/19425
Project: Drupal
Version: cvs
Component: locale.module
Category: feature requests
Priority: normal
Assigned to: Anonymous
Reported by: Olen
Updated by: Goba
Status: patch
Olen, you really need to investigate the original locale code further.
Since short strings are cached by Drupal, your code will not be called
for the 'Submit' string, and the proper file/line will not be found.
You added the check to the place where only the long strings are
searched for (actually the strings not cached).
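To illustrate the point, here is a minimal sketch of the lookup order. It is not the actual locale() code: the function name locale_sketch() and the callback are hypothetical. The only behaviour taken from the discussion is that cached strings are returned before the database branch is reached.

```php
<?php
// Hypothetical stand-in for the lookup order in locale(): cache first,
// database (where the patch adds its file:line check) only on a miss.
function locale_sketch(string $string, array $cache, callable $db_lookup): string {
  if (array_key_exists($string, $cache)) {
    // Cached strings such as 'Submit' return here, so the
    // location-aware branch below never sees them.
    return $cache[$string];
  }
  // Only uncached strings reach the patched code path.
  return $db_lookup($string);
}
```

With 'Submit' in the cache, the callback standing in for the patched query is never invoked for it.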
We also *need to have* a completely PO-friendly way of representing
this. This might be of secondary concern to you, but the explosion in
the number of Drupal interface translations came from the fact that it
finally became easy to translate the interface with ready-to-use
desktop tools. No matter how friendly you make the web interface, it is
still tremendously easier to do text editing on the desktop.
Calling realpath() on a constant value in every t() call is quite
pointless, and it should not be done. If it does need to be called,
then the result should be cached somewhere. Resolving symlinks takes
time.
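A minimal sketch of such caching, assuming a hypothetical helper named locale_docroot() (it is not part of the patch or of Drupal) that resolves the document root once per request and reuses the result:

```php
<?php
// Hypothetical helper: resolve DOCUMENT_ROOT at most once per request.
function locale_docroot(): string {
  static $docroot = null;
  if ($docroot === null) {
    $root = $_SERVER['DOCUMENT_ROOT'] ?? '';
    // realpath() returns false for paths that do not exist.
    $resolved = ($root !== '') ? realpath($root) : false;
    // Fall back to the unresolved value if resolution fails.
    $docroot = ($resolved !== false) ? $resolved : $root;
  }
  return $docroot;
}
```

Every later call returns the stored string, so the file system is only consulted the first time.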
I agree that this problem is apparent, and it would be ideal to have
some fix, but this is not there yet.
Goba
Previous comments:
------------------------------------------------------------------------
March 25, 2005 - 10:20 : Olen
In many languages the same English string can (and should) be translated
differently, depending on context.
Here are two patches, one for locale.inc and one for locale.module, that
allow the same string to be translated more than once.
It adds the correct path (filename) and line number of the translated
string as "location" instead of the url where it was first seen.
It will also allow you to download the .po-file for only one module at
a time.
These two improvements make it a lot easier to find the correct
context for the translation.
It will use a "best effort" when finding translations, first trying to
match on file:line, then only on file and finally on any translated
string with the same 'source'.
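The "best effort" order can be sketched in isolation as below. This is an illustration under assumptions, not the patch itself: locale_best_match() and the array layout are hypothetical, and the file comparison uses an exact basename match instead of the eregi() call in the patch.

```php
<?php
// Hypothetical helper: pick the best candidate translation for a call
// made at $file:$line. Each candidate is an array with 'location'
// ("file:line") and 'translation' keys.
function locale_best_match(array $candidates, string $file, int $line): ?array {
  $best = null;
  $best_rate = 0;
  foreach ($candidates as $candidate) {
    if ($candidate['translation'] === '') {
      continue; // untranslated rows cannot match
    }
    // Strip the trailing ":line" part, if any, to get the file name.
    $pos = strrpos($candidate['location'], ':');
    $loc_file = ($pos === false) ? $candidate['location'] : substr($candidate['location'], 0, $pos);
    if ($candidate['location'] === "$file:$line") {
      $rate = 100; // same file, same line
    }
    elseif (basename($loc_file) === basename($file)) {
      $rate = 75;  // same file, different line
    }
    else {
      $rate = 50;  // same source string somewhere else
    }
    if ($rate > $best_rate) {
      $best = $candidate;
      $best_rate = $rate;
    }
    if ($rate === 100) {
      break; // nothing can beat an exact match
    }
  }
  return $best;
}
```

When several candidates share the same rate, the first one seen wins.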
Be aware that the first few page loads after a new translation is added
are really slow, until the database has been updated with all the new
strings and locations.
Also note that this will lead to several "unused" strings in the
database. I have an idea about some sort of timestamp to check when a
string was last used, and a cron job that removes old, unused strings,
but I think this could cause more problems than it fixes.
Anyway, here are the patches. Let me know if you want them as
attachments instead.
locale.module:
--- locale.module.orig 2005-03-23 08:42:29.000000000 +0100
+++ locale.module 2005-03-25 10:10:39.180616160 +0100
@@ -142,29 +142,67 @@
// We don't have this translation cached, so get it from the DB
else {
- $result = db_query("SELECT s.lid, t.translation FROM
{locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE
s.source = '%s' AND t.locale = '%s'", $string, $locale);
+ $caller = debug_backtrace();
+ $docroot = realpath($_SERVER['DOCUMENT_ROOT']);
+ $file = ereg_replace($docroot, '', $caller[1]['file']);
+ $basefile = basename($file);
+ $line = $caller[1]['line'];
+ $origstring = $string;
+ $result = db_query("SELECT s.lid, s.location, t.translation FROM
{locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE
s.source = '%s' AND t.locale = '%s'", $string, $locale);
// Translation found
- if ($trans = db_fetch_object($result)) {
+ while ($trans = db_fetch_object($result)) {
if (!empty($trans->translation)) {
- $locale_t[$string] = $trans->translation;
- $string = $trans->translation;
+ if ($trans->location == "$file:$line") {
+ // We have 100% match
+ $locale_t[$string] = $trans->translation;
+ $string = $trans->translation;
+ $match = $trans->lid;
+ $rate = 100;
+ break;
+ }
+ elseif (eregi($basefile, $trans->location) && ($rate < 100)) {
+ // We have a match in the same file, but on a different line
+ $locale_t[$string] = $trans->translation;
+ $string = $trans->translation;
+ $match = $trans->lid;
+ $rate = 75;
+ }
+ elseif ($rate < 50) {
+ // We have a match in another file
+ $locale_t[$string] = $trans->translation;
+ $string = $trans->translation;
+ $match = $trans->lid;
+ $rate = 50;
+ }
+ }
+ }
+ // We have a translation, but not a full file:line match
+ if (($match) && ($rate < 100)) {
+ // Lets update source and target with the correct location
+ db_query("INSERT INTO {locales_source} (location, source) VALUES
('%s', '%s')", "$file:$line", $origstring);
+ if ($locale) {
+ $lid = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s' AND location = '%s'", $origstring,
"$file:$line"));
+ db_query("INSERT INTO {locales_target} (lid, locale,
translation) VALUES (%d, '%s', '%s')", $lid->lid, $locale, $string);
}
}
// Either we have no such source string, or no translation
- else {
- $result = db_query("SELECT lid, source FROM {locales_source}
WHERE source = '%s'", $string);
- // We have no such translation
+ elseif (!$match) {
+ $result = db_query("SELECT lid, source FROM {locales_source}
WHERE source = '%s' AND location = '%s'", $origstring, "$file:$line");
if ($obj = db_fetch_object($result)) {
if ($locale) {
- db_query("INSERT INTO {locales_target} (lid, locale) VALUES
(%d, '%s')", $obj->lid, $locale);
+ $trans = db_fetch_object(db_query("SELECT lid FROM
{locales_target} WHERE lid = '%d' AND locale = '%s'"", $obj->lid,
$locale));
+ // We have no such translation
+ if (!$trans) {
+ db_query("INSERT INTO {locales_target} (lid, locale)
VALUES (%d, '%s')", $obj->lid, $locale);
+ }
}
}
// We have no such source string
else {
- db_query("INSERT INTO {locales_source} (location, source)
VALUES ('%s', '%s')", request_uri(), $string);
+ db_query("INSERT INTO {locales_source} (location, source)
VALUES ('%s', '%s')", "$file:$line", $string);
if ($locale) {
- $lid = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s'", $string));
+ $lid = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s' AND location = '%s'", $string,
"$file:$line"));
db_query("INSERT INTO {locales_target} (lid, locale) VALUES
(%d, '%s')", $lid->lid, $locale);
}
}
@@ -410,7 +448,7 @@
include_once 'includes/locale.inc';
switch ($_POST['op']) {
case t('Export'):
- _locale_export_po($_POST['edit']['langcode']);
+ _locale_export_po($_POST['edit']['langcode'],
$_POST['edit']['filename']);
break;
}
print theme('page', _locale_admin_export_screen());
And for locale.inc
--- locale.inc.orig 2005-03-23 18:03:27.000000000 +0100
+++ locale.inc 2005-03-25 09:58:22.433809358 +0100
@@ -176,11 +176,9 @@
if ($key == 0) {
$plid = 0;
}
- $loc = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s'", $english[$key]));
+ $loc = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s' AND location = '%s'",
$english[$key], $comments));
if ($loc->lid) { // a string exists
$lid = $loc->lid;
- // update location field
- db_query("UPDATE {locales_source} SET location = '%s' WHERE
lid = %d", $comments, $lid);
$trans2 = db_fetch_object(db_query("SELECT lid, translation,
plid, plural FROM {locales_target} WHERE lid = %d AND locale = '%s'",
$lid, $lang));
if (!$trans2->lid) { // no translation in current language
db_query("INSERT INTO {locales_target} (lid, locale,
translation, plid, plural) VALUES (%d, '%s', '%s', %d, %d)", $lid,
$lang, $trans, $plid, $key);
@@ -198,7 +196,7 @@
}
else { // no string
db_query("INSERT INTO {locales_source} (location, source)
VALUES ('%s', '%s')", $comments, $english[$key]);
- $loc = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s'", $english[$key]));
+ $loc = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s' AND location = '%s'",
$english[$key], $comments));
$lid = $loc->lid;
db_query("INSERT INTO {locales_target} (lid, locale,
translation, plid, plural) VALUES (%d, '%s', '%s', %d, %d)", $lid,
$lang, $trans, $plid, $key); if ($trans != '') {
@@ -213,11 +211,10 @@
else {
$english = $value['msgid'];
$translation = $value['msgstr'];
- $loc = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s'", $english));
+ $loc = db_fetch_object(db_query("SELECT lid FROM
{locales_source} WHERE source = '%s' AND location = '%s'", $english,
$comments));
if ($loc->lid) { // a string exists
$lid = $loc->lid;
// update location field
- db_query("UPDATE {locales_source} SET location = '%s' WHERE
source = '%s'", $comments, $english);
$trans = db_fetch_object(db_query("SELECT lid, translation
FROM {locales_target} WHERE lid = %d AND locale = '%s'", $lid, $lang));
if (!$trans->lid) { // no translation in current language
db_query("INSERT INTO {locales_target} (lid, locale,
translation) VALUES (%d, '%s', '%s')", $lid, $lang, $translation);
@@ -662,7 +659,7 @@
while(strlen($comm) < 128 && count($comment)) {
$comm .= substr(array_shift($comment), 1) .', ';
}
- return substr($comm, 0, -2);
+ return trim(substr($comm, 0, -2));
}
/**
@@ -689,18 +686,37 @@
}
/**
+ * Get a list of all files with at least one translatable string
+ */
+function _locale_active_modules() {
+ $loc = db_query("SELECT location FROM {locales_source}");
+ $filenames[''] = t('All files');
+ while ($locat = db_fetch_object($loc)) {
+ $basename = basename(preg_replace('/:.*/', '', $locat->location));
+ if ($basename) {
+ $filenames[$basename] = $basename;
+ }
+ }
+ ksort($filenames);
+ return $filenames;
+}
+
+/**
* User interface for the translation export screen
*/
function _locale_admin_export_screen() {
$languages = locale_supported_languages(FALSE, TRUE);
$languages = array_map("t", $languages['name']);
unset($languages['en']);
+ $filenames = _locale_active_modules();
+
$output = '';
// Offer language specific export if any language is set up
if (count($languages)) {
$output .= '<h2>'. t('Export translation') .'</h2>';
$form = form_select(t('Language name'), 'langcode', '',
$languages, t('Select the language you would like to export in gettext
Portable Object (.po) format.'));
+ $form .= form_select(t('File name'), 'filename', '', $filenames,
t('Select the file you would like to export strings from.'));
$form .= form_submit(t('Export'));
$output .= form($form);
}
@@ -719,13 +735,21 @@
*
* @param $language Selects a language to generate the output for
*/
-function _locale_export_po($language) {
+function _locale_export_po($language, $filename = NULL) {
global $user;
+ if ($filename) {
+ $filename = "/%$filename%";
+ $sort = '(substring_index(s.location, ":", -1)+0)';
+ }
+ else {
+ $filename = '/%';
+ $sort = 'substring_index(s.location, ":", 1),
(substring_index(s.location, ":", -1)+0)';
+ }
// Get language specific strings, or all strings
if ($language) {
$meta = db_fetch_object(db_query("SELECT * FROM {locales_meta}
WHERE locale = '%s'", $language));
- $result = db_query("SELECT s.lid, s.source, s.location,
t.translation, t.plid, t.plural FROM {locales_source} s INNER JOIN
{locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' ORDER BY
t.plid, t.plural", $language);
+ $result = db_query("SELECT s.lid, s.source, s.location,
t.translation, t.plid, t.plural FROM {locales_source} s INNER JOIN
{locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' and
s.location like '%s' ORDER BY t.plid, t.plural, $sort, s.source,
s.lid", $language, $filename);
}
else {
$result = db_query("SELECT s.lid, s.source, s.location, t.plid,
t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid
= t.lid GROUP BY s.lid ORDER BY t.plid, t.plural");
@@ -750,7 +774,14 @@
// Generating Portable Object file for a language
if ($language) {
- $filename = $language .'.po';
+ if ($filename) {
+ $filename = preg_replace('/[^A-z0-9\.\-_]/', '', $filename);
+ if (!$filename) {
+ $filename = 'all';
+ }
+ $filename .= '.';
+ }
+ $filename .= $language .'.po';
$header .= "# $meta->name translation of ".
variable_get('site_name', 'Drupal') ."\n";
$header .= '# Copyright (c) '. date('Y') .' '. $user->name .' <'.
$user->mail .">\n";
$header .= "#\n";
------------------------------------------------------------------------
March 25, 2005 - 10:59 : stefan nagtegaal
Attachment: http://drupal.org/files/issues/locale-inc_0.patch (5.84 KB)
I can remember that Dries and Steven preferred the use of uploaded
patch/diff files, instead of just putting the diff into an issue
itself.
So, attached you'll find the patch for locale.inc.
This is such a nice feature and should _really_ get into core at some
point. Whatever is wrong with this patch, I'll keep on updating until
it has hit the trunk.
I love it!
------------------------------------------------------------------------
March 25, 2005 - 11:00 : stefan nagtegaal
Attachment: http://drupal.org/files/issues/locale-module.patch (4.31 KB)
I can remember that Dries and Steven preferred the use of uploaded
patch/diff files, instead of just putting the diff into an issue
itself.
So, attached you'll find the patch for locale.module.
This is such a nice feature and should _really_ get into core at some
point. Whatever is wrong with this patch, I'll keep on updating until
it has hit the trunk.
I love it!
(Set status to patch again.)
------------------------------------------------------------------------
March 25, 2005 - 11:06 : chx
Please consider this for 4.6. The need for this functionality is
really great.
------------------------------------------------------------------------
March 25, 2005 - 13:39 : Olen
Just discovered a small bug.
There is an extra " at the end of this query on line 194 of
locale.module:
$trans = db_fetch_object(db_query("SELECT lid FROM {locales_target}
WHERE lid = '%d' AND locale = '%s'"", $obj->lid, $locale));
------------------------------------------------------------------------
March 25, 2005 - 21:03 : Goba
Things to note here:
 * debug_backtrace() could be expensive, it does not seem to me that
   someone benchmarked this change
 * I expect realpath() to be quite expensive, since it tries to resolve
   all possible symbolic links in the path, so it does quite some file
   system checks. Note that there are really a lot of t() calls on a
   page!
 * The locale caching code was not changed as far as I can tell, and
   only the non cached strings will be checked for file name and line
   number, so those that have real problems (short strings) are not
   affected by this patch, as they are precached and loaded and checked
   without the line numbers... Excuse me if I find this funny :)
 * The real big roadblock here is that you need to find a way to
   represent these multiple strings in the po file... First, it is not
   possible to have different translations for the same string in PO
   files. Second, if it were possible, the extractor would need to
   have all the filename:line unique source strings extracted
   separately (ie. you would have ~20 "Submit" strings to translate
   even only for core, etc.). So you need to provide some solution for
   representing this in the PO files, or else this whole idea is
   pointless.
------------------------------------------------------------------------
March 26, 2005 - 00:49 : Olen
> Things to note here:
>
> * debug_backtrace() could be expensive, it does not seem to me
>   that someone benchmarked this change
I have not done a real benchmark, but at least things don't "feel"
slower. This function was much faster than I feared. But other
solutions that give the same info in a less expensive way would be
highly appreciated.
What I did have in mind first was to build something from extractor.php
to extract the strings from the files and add them all to the database
at once, not waiting for them to be accessed, but debug_backtrace at
least gave the right location without making too many changes to
existing code.
> * I expect realpath() to be quite expensive, since it tries to
>   resolve all possible symbolic links in the path, so it does quite
>   some file system checks. Note that there are really a lot of t()
>   calls on a page!
The reason I used realpath is just because I use a couple of symlinks
for the base_dir, and did not want them in the location field. I guess
this is not true for most people, so it could probably be removed.
> * The locale caching code was not changed as far as I can tell,
>   and only the non cached strings will be checked for file name and
>   line number, so those that have real problems (short strings) are
>   not affected by this patch, as they are precached and loaded and
>   checked without the line numbers... Excuse me if I find this funny :)
If this is true, I totally agree. I was not aware of the precache. I
believed things were only cached on first access (and hence affected by
my patch).
> * The real big roadblock here is that you need to find a way to
>   represent these multiple strings in the po file... First, it is not
>   possible to have different translations for the same string in PO
>   files. Second, if it were possible, the extractor would need to
>   have all the filename:line unique source strings extracted
>   separately (ie. you would have ~20 "Submit" strings to translate
>   even only for core, etc.). So you need to provide some solution for
>   representing this in the PO files, or else this whole idea is
>   pointless.
I partly agree. For me, "Submit" was one of the reasons for adding
this. That string should be translated to at least three or four
different words or expressions in Norwegian to be correct in all
places.
Another reason I started on this patch was that I wanted to find
out exactly where the string originates when I do translations.
'locations' of the form "/?PHPSESSID=foobar" do not make it easy to
find out what I should translate some string to if it does not have a
clear and unambiguous meaning.
What happens in the patch today, is that if someone calls t('Submit')
for the first time in a new location, the translations are searched.
And if a translation of the same string is found - either in the same
file or at all - the tables are updated and the new location is added
to the _source table. The translated string is then added to the
_target table as well.
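As a rough sketch of that behaviour, with a nested array standing in for the _source and _target tables (the function name locale_lookup() and the array layout are assumptions for illustration, and the same-file preference is left out for brevity):

```php
<?php
// $store maps source string => [location => translation]; it is a
// hypothetical in-memory stand-in for the two locale tables.
function locale_lookup(array &$store, string $source, string $location): ?string {
  // A translation already registered for this exact location wins.
  if (isset($store[$source][$location])) {
    return $store[$source][$location];
  }
  // Otherwise reuse any existing translation of the same source and
  // register it under the new location, as the patch does on first use.
  if (!empty($store[$source])) {
    $translation = reset($store[$source]);
    $store[$source][$location] = $translation;
    return $translation;
  }
  // No translation of this source exists anywhere yet.
  return null;
}
```

Once the copy exists, changing the entry for one location leaves the translations at all other locations untouched.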
So if you translate 'Submit' once, that translation is used everywhere.
But if you need to change it in one or more places, download the (now
incorrect) .po-file for that module (or other file) and change it on
that single line.
(Of course, this could lead to the opposite problem: if you want to
change _all_ translations of "Submit", this would now have to be done
in ~20 places instead of one. But I am also working on an improved
version of the built-in translation tool that will take care of this,
as well as fix a few other issues to make it more useful, even if it
is not meant to compete with specialized applications such as KBabel or
GnomeTranslator.)
The formal correctness of the PO files was secondary to me when I
started this work, as the important issue was to make the translated
strings be correct in Drupal.
I am sure that other developers of other applications must have run
into the problem that some strings need to be translated differently in
different parts of an application, and that there must be a way to
represent that in a PO file.
I'll have to read a bit about i18n and PO to find the best solution to
this.
I think I am trying to solve an important issue, but it should of course
be done the right way.
Thanks for pointing this out.