Reverse iconv encoding for sorting of international arrays
Hi, tagadelic has a bug[1]: when it sorts the tags by name, it compares two tags with strnatcasecmp. This is used to sort the array of tags. However, strnatcasecmp is not binary safe (grmmmbll), resulting in weird clouds. One would expect "Drive | every | état | Factory | fear"; instead, état is placed even before the tags starting with 'a'. How is this normally dealt with? I can't believe I am the first one to run into this. Is there a reverse of drupal_convert_to_utf8($data, $encoding), so that the result is 7-bit safe and usable in strnatcasecmp? Or am I looking in completely the wrong direction? [1] http://drupal.org/node/108001 -- Drupal, Ruby on Rails and Joomla! development: webschuur.com | Drupal hosting: www.sympal.nl
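A rough sketch of the "7-bit safe" idea asked about here, written in Python rather than PHP purely for illustration: decompose accented characters, drop the non-ASCII combining marks, then compare case-insensitively in natural order. The names ascii_fold and natural_fold_key are hypothetical. In PHP, iconv('UTF-8', 'ASCII//TRANSLIT', $string) is a commonly used way to get a similar folding, though its output depends on the iconv implementation and the current locale. Note that this approach ignores language-specific ordering rules entirely.

```python
import re
import unicodedata


def ascii_fold(text: str) -> str:
    # NFKD splits "é" into "e" plus a combining acute accent; encoding to
    # ASCII with "ignore" then drops the accent. Characters that have no
    # ASCII decomposition (e.g. "ß") are dropped entirely.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def natural_fold_key(text: str) -> list:
    # Case-insensitive natural-order key: digit runs compare as numbers,
    # everything else as lowercased, ASCII-folded text. re.split with a
    # capturing group alternates text/digits, so each position always holds
    # the same type and two keys can always be compared.
    folded = ascii_fold(text).lower()
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", folded)]


tags = ["fear", "état", "Factory", "every", "Drive"]
print(sorted(tags, key=natural_fold_key))
# ['Drive', 'état', 'every', 'Factory', 'fear']
```

With folding, état sorts as "etat", so it lands among the e-words instead of before the a-words.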
On Thu, 11 Jan 2007, Bèr Kessels wrote:
tagadelic has a bug[1]: when it sorts the tags by name, it compares two tags with strnatcasecmp. This is used to sort the array of tags.
However, strnatcasecmp is not binary safe (grmmmbll), resulting in weird clouds. One would expect "Drive | every | état | Factory | fear"; instead, état is placed even before the tags starting with 'a'.
How is this normally dealt with? I can't believe I am the first one to run into this. Is there a reverse of drupal_convert_to_utf8($data, $encoding), so that the result is 7-bit safe and usable in strnatcasecmp? Or am I looking in completely the wrong direction?
Ber, ordering is different by language. Dutch people might have completely different rules for ordering the same letters than Hungarians do. Using the mbstring functions might help, although those only handle UTF-8, not proper locale-specific ordering. If you would like a function in Drupal that works independently of the UTF-8 function set in use, I don't think one is available yet. Gabor
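To make the point about language-specific ordering concrete: in the Hungarian alphabet, "cs", "sz", "gy" and similar combinations count as separate letters that follow "c", "s", "g" and so on, so "csak" sorts after "cukor", whereas plain code-point order puts it before. The sketch below is a toy tailoring in Python; the MULTI_LETTERS table and hungarian_toy_key are hypothetical names, and real Hungarian collation has more rules (accented vowels, doubled consonants) that this ignores.

```python
# Toy subset of Hungarian multi-letter graphemes. Each one sorts as its own
# letter right after the letter it starts with: c < cs, d < dz < dzs, s < sz.
MULTI_LETTERS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def hungarian_toy_key(word: str) -> list:
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        # "dzs" is listed first so the longest grapheme wins over "dz".
        for graph in MULTI_LETTERS:
            if text.startswith(graph, i):
                # Same base letter, ranked after it by grapheme length.
                key.append((graph[0], len(graph) - 1))
                i += len(graph)
                break
        else:
            key.append((text[i], 0))
            i += 1
    return key


words = ["csak", "cukor", "cica"]
print(sorted(words))                         # ['cica', 'csak', 'cukor']
print(sorted(words, key=hungarian_toy_key))  # ['cica', 'cukor', 'csak']
```

The same three words come out in a different order depending on which language's rules the key encodes, which is why a single byte-wise or accent-folding comparison cannot be right for every site.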
On Thursday 11 January 2007 20:37, Gabor Hojtsy wrote:
Ber, ordering is different by language. Dutch people might have completely different rules for ordering the same letters than Hungarians do.
Yeah, now that you say so, I realize this too. Stupid that I did not think of it before.
Using the mbstring functions might help, although those only handle UTF-8, not proper locale-specific ordering. If you would like a function in Drupal that works independently of the UTF-8 function set in use, I don't think one is available yet.
This should not be too hard, since such a function would be similar to drupal_convert_to_utf8 in the way it checks for installed libraries. However, as you point out, this is actually locale-dependent. This means that if I want to build a proper ordering system for tagadelic (or, in fact, for any ordered list in Drupal), I would have to include its ordering mechanism in locale.module, or at least make it a locale setting. Are there any libraries that do this for PHP already where I can look? Bèr
Bèr Kessels wrote:
Are there any libraries that do this for PHP already where I can look?
PHP has supported sorting by locale since PHP 4.4 and 5; see the SORT_LOCALE_STRING flag at php.net/sort. Cheers, Gerhard
On Thu, 11 Jan 2007, Gerhard Killesreiter wrote:
Are there any libraries that do this for PHP already where I can look?
PHP has supported sorting by locale since PHP 4.4 and 5; see the SORT_LOCALE_STRING flag at php.net/sort.
Indeed, PHP supports locale-aware sorting. Unfortunately, it only gets really nice in PHP 6 (see http://www.php.net/i18n_loc_set_default for a nice example). The question is whether sort() properly sorts UTF-8 strings by the current system locale (by default it sorts by ASCII code). I have not tried this before. Gabor
Gabor Hojtsy wrote:
Indeed, PHP supports locale-aware sorting. Unfortunately, it only gets really nice in PHP 6 (see http://www.php.net/i18n_loc_set_default for a nice example). The question is whether sort() properly sorts UTF-8 strings by the current system locale (by default it sorts by ASCII code). I have not tried this before.
Neither have I, but I know that you can set the locale with setlocale just fine. It is unclear to me how i18n_loc_set_default differs from this. Cheers, Gerhard
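One quick way to try the question above is the C library's locale collation, which setlocale() controls and which, as far as I know, is what PHP's SORT_LOCALE_STRING uses via strcoll(). The sketch below does the equivalent with Python's standard locale module (setlocale and strxfrm are real calls; sort_by_locale is a hypothetical helper name), not with PHP itself.

```python
import locale


def sort_by_locale(words, locale_name):
    # setlocale() changes process-wide state (and is not thread-safe), so
    # remember the current LC_COLLATE value and put it back afterwards.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if locale_name is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm() maps each string to a key whose plain ordering matches
        # strcoll() under the active collation locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


tags = ["fear", "état", "Factory", "every", "Drive"]
# The "C" locale always exists and collates by code point, so "état" ends up
# after every plain-ASCII word.
print(sort_by_locale(tags, "C"))
```

On a server that has a French locale installed (for example "fr_FR.UTF-8", which is an assumption about the system), the same call should place "état" among the e-words. Whether that works depends on the locale data the host provides, and because setlocale() is global to the process, it is exactly the kind of shared state that a library-level locale such as ICU's avoids.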
On Thu, 11 Jan 2007, Gerhard Killesreiter wrote:
Neither have I, but I know that you can set the locale with setlocale just fine. It is unclear to me how i18n_loc_set_default differs from this.
PHP 6 is entirely UTF-8 based, and this is implemented with the IBM ICU library as the backend. That function sets the locale used for ICU operations, not the system locale. Gabor
participants (3): Bèr Kessels, Gabor Hojtsy, Gerhard Killesreiter