Issue status update for http://drupal.org/node/19425

Project: Drupal
Version: cvs
Component: locale.module
Category: feature requests
Priority: normal
Assigned to: Anonymous
Reported by: Olen
Updated by: Goba
Status: patch

Olen, you really need to investigate the original locale code further. Since short strings are cached by Drupal, your code will not be called for the 'Submit' string, and the proper file/line will not be found. You added the check in the place where only the long strings (actually, the strings that are not cached) are searched for.

We also *need to have* a completely PO-friendly way of representing this. This might be of secondary concern to you, but the explosion in the number of Drupal interface translations resulted from the fact that it finally became easy to translate the interface with ready-to-use desktop tools. No matter how friendly you make the web interface, it is still tremendously easier to do text editing on the desktop.

Doing realpath() on a constant value in every t() call is quite pointless, and it should not be done. If it has to be called, then the result should be cached somewhere. Resolving symlinks takes time.

I agree that this problem is apparent, and it would be ideal to have some fix, but this is not there yet.

Goba

Previous comments:
------------------------------------------------------------------------

March 25, 2005 - 10:20 : Olen

In many languages the same English string can (and should) be translated differently, depending on context.

Here are two patches, one for locale.inc and one for locale.module, that allow the same string to be translated more than once. They add the correct path (filename) and line number of the translated string as "location" instead of the URL where it was first seen. They also allow you to download the .po file for only one module at a time. These two improvements make it a lot easier to find the correct context for the translation.
It will use a "best effort" approach when finding translations: first trying to match on file:line, then on the file only, and finally on any translated string with the same 'source'.

Be aware that the first few page loads after a new translation is added are really slow, until the database has been updated with all the new strings and locations. Also note that this will lead to several "unused" strings in the database. I have an idea about some sort of timestamp to check when a string was last used, and a cron job that removes old, unused strings, but I think this could cause more problems than it fixes.

Anyway, here are the patches. Let me know if you want them as attachments instead.

locale.module:

--- locale.module.orig 2005-03-23 08:42:29.000000000 +0100
+++ locale.module 2005-03-25 10:10:39.180616160 +0100
@@ -142,29 +142,67 @@
   // We don't have this translation cached, so get it from the DB
   else {
-    $result = db_query("SELECT s.lid, t.translation FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE s.source = '%s' AND t.locale = '%s'", $string, $locale);
+    $caller = debug_backtrace();
+    $docroot = realpath($_SERVER['DOCUMENT_ROOT']);
+    $file = ereg_replace($docroot, '', $caller[1]['file']);
+    $basefile = basename($file);
+    $line = $caller[1]['line'];
+    $origstring = $string;
+    $result = db_query("SELECT s.lid, s.location, t.translation FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE s.source = '%s' AND t.locale = '%s'", $string, $locale);
     // Translation found
-    if ($trans = db_fetch_object($result)) {
+    while ($trans = db_fetch_object($result)) {
       if (!empty($trans->translation)) {
-        $locale_t[$string] = $trans->translation;
-        $string = $trans->translation;
+        if ($trans->location == "$file:$line") {
+          // We have 100% match
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 100;
+          break;
+        }
+        elseif (eregi($basefile, $trans->location) && ($rate < 100)) {
+          // We have a match in the same file, but on a different line
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 75;
+        }
+        elseif ($rate < 50) {
+          // We have a match in another file
+          $locale_t[$string] = $trans->translation;
+          $string = $trans->translation;
+          $match = $trans->lid;
+          $rate = 50;
+        }
+      }
+    }
+    // We have a translation, but not a full file:line match
+    if (($match) && ($rate < 100)) {
+      // Lets update source and target with the correct location
+      db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", "$file:$line", $origstring);
+      if ($locale) {
+        $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $origstring, "$file:$line"));
+        db_query("INSERT INTO {locales_target} (lid, locale, translation) VALUES (%d, '%s', '%s')", $lid->lid, $locale, $string);
       }
     }
     // Either we have no such source string, or no translation
-    else {
-      $result = db_query("SELECT lid, source FROM {locales_source} WHERE source = '%s'", $string);
-      // We have no such translation
+    elseif (!$match) {
+      $result = db_query("SELECT lid, source FROM {locales_source} WHERE source = '%s' AND location = '%s'", $origstring, "$file:$line");
       if ($obj = db_fetch_object($result)) {
         if ($locale) {
-          db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $obj->lid, $locale);
+          $trans = db_fetch_object(db_query("SELECT lid FROM {locales_target} WHERE lid = '%d' AND locale = '%s'"", $obj->lid, $locale));
+          // We have no such translation
+          if (!$trans) {
+            db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $obj->lid, $locale);
+          }
         }
       }
       // We have no such source string
       else {
-        db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", request_uri(), $string);
+        db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", "$file:$line", $string);
         if ($locale) {
-          $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $string));
+          $lid = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $string, "$file:$line"));
           db_query("INSERT INTO {locales_target} (lid, locale) VALUES (%d, '%s')", $lid->lid, $locale);
         }
       }
@@ -410,7 +448,7 @@
   include_once 'includes/locale.inc';
   switch ($_POST['op']) {
     case t('Export'):
-      _locale_export_po($_POST['edit']['langcode']);
+      _locale_export_po($_POST['edit']['langcode'], $_POST['edit']['filename']);
       break;
   }
   print theme('page', _locale_admin_export_screen());

And for locale.inc:

--- locale.inc.orig 2005-03-23 18:03:27.000000000 +0100
+++ locale.inc 2005-03-25 09:58:22.433809358 +0100
@@ -176,11 +176,9 @@
           if ($key == 0) {
             $plid = 0;
           }
-          $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english[$key]));
+          $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english[$key], $comments));
           if ($loc->lid) { // a string exists
             $lid = $loc->lid;
-            // update location field
-            db_query("UPDATE {locales_source} SET location = '%s' WHERE lid = %d", $comments, $lid);
             $trans2 = db_fetch_object(db_query("SELECT lid, translation, plid, plural FROM {locales_target} WHERE lid = %d AND locale = '%s'", $lid, $lang));
             if (!$trans2->lid) { // no translation in current language
               db_query("INSERT INTO {locales_target} (lid, locale, translation, plid, plural) VALUES (%d, '%s', '%s', %d, %d)", $lid, $lang, $trans, $plid, $key);
@@ -198,7 +196,7 @@
           }
           else { // no string
             db_query("INSERT INTO {locales_source} (location, source) VALUES ('%s', '%s')", $comments, $english[$key]);
-            $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english[$key]));
+            $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english[$key], $comments));
             $lid = $loc->lid;
             db_query("INSERT INTO {locales_target} (lid, locale, translation, plid, plural) VALUES (%d, '%s', '%s', %d, %d)", $lid, $lang, $trans, $plid, $key);
             if ($trans != '') {
@@ -213,11 +211,10 @@
       else {
         $english = $value['msgid'];
         $translation = $value['msgstr'];
-        $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s'", $english));
+        $loc = db_fetch_object(db_query("SELECT lid FROM {locales_source} WHERE source = '%s' AND location = '%s'", $english, $comments));
         if ($loc->lid) { // a string exists
           $lid = $loc->lid;
           // update location field
-          db_query("UPDATE {locales_source} SET location = '%s' WHERE source = '%s'", $comments, $english);
           $trans = db_fetch_object(db_query("SELECT lid, translation FROM {locales_target} WHERE lid = %d AND locale = '%s'", $lid, $lang));
           if (!$trans->lid) { // no translation in current language
             db_query("INSERT INTO {locales_target} (lid, locale, translation) VALUES (%d, '%s', '%s')", $lid, $lang, $translation);
@@ -662,7 +659,7 @@
   while(strlen($comm) < 128 && count($comment)) {
     $comm .= substr(array_shift($comment), 1) .', ';
   }
-  return substr($comm, 0, -2);
+  return trim(substr($comm, 0, -2));
 }
 
 /**
@@ -689,18 +686,37 @@
 }
 
 /**
+ * Get a list of all files with at least one translatable string
+ */
+function _locale_active_modules() {
+  $loc = db_query("SELECT location FROM {locales_source}");
+  $filenames[''] = t('All files');
+  while ($locat = db_fetch_object($loc)) {
+    $basename = basename(preg_replace('/:.*/', '', $locat->location));
+    if ($basename) {
+      $filenames[$basename] = $basename;
+    }
+  }
+  ksort($filenames);
+  return $filenames;
+}
+
+/**
  * User interface for the translation export screen
  */
 function _locale_admin_export_screen() {
   $languages = locale_supported_languages(FALSE, TRUE);
   $languages = array_map("t", $languages['name']);
   unset($languages['en']);
+  $filenames = _locale_active_modules();
+
   $output = '';
   // Offer language specific export if any language is set up
   if (count($languages)) {
     $output .= '<h2>'. t('Export translation') .'</h2>';
     $form = form_select(t('Language name'), 'langcode', '', $languages, t('Select the language you would like to export in gettext Portable Object (.po) format.'));
+    $form .= form_select(t('File name'), 'filename', '', $filenames, t('Select the file you would like to export strings from.'));
     $form .= form_submit(t('Export'));
     $output .= form($form);
   }
@@ -719,13 +735,21 @@
  *
  * @param $language Selects a language to generate the output for
  */
-function _locale_export_po($language) {
+function _locale_export_po($language, $filename = NULL) {
   global $user;
+  if ($filename) {
+    $filename = "/%$filename%";
+    $sort = '(substring_index(s.location, ":", -1)+0)';
+  }
+  else {
+    $filename = '/%';
+    $sort = 'substring_index(s.location, ":", 1), (substring_index(s.location, ":", -1)+0)';
+  }
   // Get language specific strings, or all strings
   if ($language) {
     $meta = db_fetch_object(db_query("SELECT * FROM {locales_meta} WHERE locale = '%s'", $language));
-    $result = db_query("SELECT s.lid, s.source, s.location, t.translation, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' ORDER BY t.plid, t.plural", $language);
+    $result = db_query("SELECT s.lid, s.source, s.location, t.translation, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid WHERE t.locale = '%s' and s.location like '%s' ORDER BY t.plid, t.plural, $sort, s.source, s.lid", $language, $filename);
   }
   else {
     $result = db_query("SELECT s.lid, s.source, s.location, t.plid, t.plural FROM {locales_source} s INNER JOIN {locales_target} t ON s.lid = t.lid GROUP BY s.lid ORDER BY t.plid, t.plural");
@@ -750,7 +774,14 @@
   // Generating Portable Object file for a language
   if ($language) {
-    $filename = $language .'.po';
+    if ($filename) {
+      $filename = preg_replace('/[^A-z0-9\.\-_]/', '', $filename);
+      if (!$filename) {
+        $filename = 'all';
+      }
+      $filename .= '.';
+    }
+    $filename .= $language .'.po';
     $header .= "# $meta->name translation of ". variable_get('site_name', 'Drupal') ."\n";
     $header .= '# Copyright (c) '. date('Y') .' '. $user->name .' <'. $user->mail .">\n";
     $header .= "#\n";

------------------------------------------------------------------------

March 25, 2005 - 10:59 : stefan nagtegaal

Attachment: http://drupal.org/files/issues/locale-inc_0.patch (5.84 KB)

I can remember that Dries and Steven preferred uploaded patch/diff files over putting the diff into the issue itself. So, attached you'll find the patch for locale.inc.

This is such a nice feature, and it should _really_ get into core some day. Whatever is wrong with this patch, I'll keep on updating it until it has hit the trunk. I love it!

------------------------------------------------------------------------

March 25, 2005 - 11:00 : stefan nagtegaal

Attachment: http://drupal.org/files/issues/locale-module.patch (4.31 KB)

I can remember that Dries and Steven preferred uploaded patch/diff files over putting the diff into the issue itself. So, attached you'll find the patch for locale.module.

This is such a nice feature, and it should _really_ get into core some day. Whatever is wrong with this patch, I'll keep on updating it until it has hit the trunk. I love it!

(Set status to patch again.)

------------------------------------------------------------------------

March 25, 2005 - 11:06 : chx

Please consider this for 4.6. The need for this functionality is really great.

------------------------------------------------------------------------

March 25, 2005 - 13:39 : Olen

Just discovered a small bug.
There is an extra " at the end of this query on line 194 of locale.module:

$trans = db_fetch_object(db_query("SELECT lid FROM {locales_target} WHERE lid = '%d' AND locale = '%s'"", $obj->lid, $locale));

------------------------------------------------------------------------

March 25, 2005 - 21:03 : Goba

Things to note here:

* debug_backtrace() could be expensive, it does not seem to me that someone benchmarked this change

* I expect realpath() to be quite expensive, since it tries to resolve all possible symbolic links in the path, so it does quite some file system checks. Note that there are really a lot of t() calls on a page!

* The locale caching code was not changed as far as I can tell, and only the non cached strings will be checked for file name and line number, so those that have real problems (short strings) are not affected by this patch, as they are precached and loaded and checked without the line numbers... Excuse me if I find this funny :)

* The real big roadblock here, is that you need to find a way to represent these multiple strings in the po file... First it is not possible to have different translations for the same string in PO files, second, if it would be possible, the extractor would need to have all the filename:line unique source strings extracted separately (ie. you would have ~20 "Submit" strings to translate even only for core, etc.).

So you need to provide some solution for representing this in the PO files, or else this whole idea is pointless.

------------------------------------------------------------------------

March 26, 2005 - 00:49 : Olen

Things to note here:
* debug_backtrace() could be expensive, it does not seem to me that someone benchmarked this change

I have not done a real benchmark, but at least things don't "feel" slower. This function was much faster than I feared. But other solutions that give the same info in a less expensive way would be highly appreciated. What I had in mind at first was to build something from extractor.php to extract the strings from the files and add them all to the database at once, rather than waiting for them to be accessed, but debug_backtrace() at least gave the right location without making too many changes to existing code.

* I expect realpath() to be quite expensive, since it tries to resolve all possible symbolic links in the path, so it does quite some file system checks. Note that there are really a lot of t() calls on a page!

The reason I used realpath() is just that I use a couple of symlinks for the base_dir, and did not want them in the location field. I guess this is not true for most people, so it could probably be removed.

* The locale caching code was not changed as far as I can tell, and only the non cached strings will be checked for file name and line number, so those that have real problems (short strings) are not affected by this patch, as they are precached and loaded and checked without the line numbers... Excuse me if I find this funny :)

If this is true, I totally agree. I was not aware of the precache. I believed things were only cached on first access (and hence affected by my patch).

* The real big roadblock here, is that you need to find a way to represent these multiple strings in the po file... First it is not possible to have different translations for the same string in PO files, second, if it would be possible, the extractor would need to have all the filename:line unique source strings extracted separately (ie. you would have ~20 "Submit" strings to translate even only for core, etc.).
So you need to provide some solution for representing this in the PO files, or else this whole idea is pointless.

I partly agree. For me, "Submit" was one of the reasons for adding this. That string should be translated to at least three or four different words or expressions in Norwegian to be correct in all places. Another reason I started on this patch was that I wanted to find out exactly where a string originates when I do translations. 'Locations' of the form "/?PHPSESSID=foobar" do not make it easy to find out what I should translate a string to if it does not have a clear and unambiguous meaning.

What happens in the patch today is that if someone calls t('Submit') for the first time in a new location, the translations are searched. If a translation of the same string is found, either in the same file or anywhere at all, the tables are updated: the new location is added to the _source table, and the translated string is added to the _target table as well. So if you translate 'Submit' once, that translation is used everywhere. But if you need to change it in one or more places, download the (now incorrect) .po file for that module (or other file) and change it on that single line.

(Of course, this could lead to the opposite problem: if you want to change _all_ translations of "Submit", this would now have to be done in ~20 places instead of one. But I am also working on an improved version of the built-in translation tool that will take care of this, as well as fix a few other issues to make it more useful, even if it is not meant to compete with specialized applications such as Kbabel or GnomeTranslator.)

The formal correctness of the PO files was secondary to me when I started this work, as the important issue was to make the translated strings correct in Drupal.

I am sure that the need to translate some strings differently in different parts of an application is an issue that developers of other applications must have "discovered" too, and that there must be a way to represent that in a PO file. I'll have to read a bit about i18n and PO to find the best solution to this. I think I am trying to solve an important issue, but it should of course be done the right way. Thanks for pointing this out.
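
For readers following the thread, the "best effort" lookup discussed above (exact file:line first, then the same file, then any location) can be sketched in isolation roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: pick_translation() is a hypothetical helper, the candidate rows are plain arrays instead of database results, the file comparison uses the full path rather than the basename match in the patch, and the sample translations are made up.

```php
<?php
// Hypothetical helper, not part of the patch: picks the best translation
// among candidate rows that all share the same source string.
// Each row is an array with 'location' ("file:line") and 'translation'.
function pick_translation(array $candidates, $file, $line) {
  $best = NULL;
  $rate = 0;
  foreach ($candidates as $row) {
    if ($row['translation'] === '') {
      continue; // Untranslated rows are never used.
    }
    // Strip the trailing ":line" so that only the file part is compared.
    $row_file = preg_replace('/:\d+$/', '', $row['location']);
    if ($row['location'] === "$file:$line") {
      // Exact file:line match, nothing can beat this.
      return array('translation' => $row['translation'], 'rate' => 100);
    }
    elseif ($row_file === $file && $rate < 75) {
      // Same file, different line.
      $best = $row['translation'];
      $rate = 75;
    }
    elseif ($rate < 50) {
      // Same source string somewhere else.
      $best = $row['translation'];
      $rate = 50;
    }
  }
  return array('translation' => $best, 'rate' => $rate);
}

// Made-up sample rows for the source string 'Submit'.
$rows = array(
  array('location' => '/modules/node.module:10', 'translation' => 'Lagre'),
  array('location' => '/modules/user.module:42', 'translation' => 'Send inn'),
);
// Called from user.module line 99: no exact match, so the same-file row wins.
$hit = pick_translation($rows, '/modules/user.module', 99);
```

A rate below 100 is the case where the patch would insert a new source row for the calling location, so that the inherited translation can later be changed for that location only.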