[drupal-devel] [feature] New API function: truncate_utf8_conc
Steven
drupal-devel at drupal.org
Sat Jul 30 11:48:53 UTC 2005
Issue status update for
http://drupal.org/node/21576
Post a follow up:
http://drupal.org/project/comments/add/21576
Project: Drupal
Version: cvs
Component: base system
Category: feature requests
Priority: normal
Assigned to: Bèr Kessels
Reported by: Bèr Kessels
Updated by: Steven
Status: patch
Now that we have proper drupal_strlen() and drupal_substr() functions
that count characters, truncate_utf8() is certainly not the place for
this. We should indeed cut off based on character length.
Steven
Previous comments:
------------------------------------------------------------------------
Fri, 29 Apr 2005 10:46:29 +0000 : Bèr Kessels
Attachment: http://drupal.org/files/issues/common_inc_truncate_utf8_conc.diff (1.63 KB)
This function truncates the string and, when requested, appends a
string to the end, for example to concatenate three dots (...) to a
long username.
For example to turn the long username
* johndoe at www.personalweblog.com into johndoe at www.johndo...
It's possible with the current truncate_utf8(), but then you need to do
a string comparison and some more truncation afterwards, which is not
optimal.
And please mark this as wontfix if not appropriate. I do not have time
to maintain this patch, so if it's not up to standards, just close
this issue.
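
To illustrate the behaviour described in this comment, here is a minimal sketch. It is not the code from the attached patch: the function name example_truncate_with_postfix() and its parameters are assumptions made up for this example. It cuts at a byte length, like the strlen()-based approach discussed in this thread, steps back so that no multibyte character is split, and appends the postfix only when something was actually cut off.

```php
<?php
// Hypothetical helper for illustration only; not the function from the patch.
function example_truncate_with_postfix($string, $len, $postfix = '') {
  $len = max(0, (int) $len);
  // Short enough already: return unchanged, without the postfix.
  if (strlen($string) <= $len) {
    return $string;
  }
  // Step back while the byte at the cut position is a UTF-8 continuation
  // byte (0x80-0xBF), so that no multibyte character is split in half.
  while ($len > 0 && (ord($string[$len]) & 0xC0) == 0x80) {
    $len--;
  }
  return substr($string, 0, $len) . $postfix;
}

echo example_truncate_with_postfix('johndoe at www.personalweblog.com', 15, '...'), "\n";
```

Note that $len is still a byte count here, so strings containing multibyte characters come out with fewer visible characters than pure ASCII strings.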
------------------------------------------------------------------------
Fri, 29 Apr 2005 14:36:04 +0000 : Dries
You can update existing code (e.g. like format_name()) to take advantage
of this patch:
$ grep -r "truncate_utf8" * | grep '\.\.\.'
includes/common.inc: $name = truncate_utf8($object->name, 15) .'...';
modules/search.module: return truncate_utf8($text, 256) . ' ...';
Also, $conc is not a very descriptive variable name. Maybe call it
$postfix instead?
------------------------------------------------------------------------
Fri, 29 Apr 2005 14:52:29 +0000 : Bèr Kessels
Attachment: http://drupal.org/files/issues/common_inc_truncate_utf8_conc_0.diff (2.82 KB)
I updated the function to use the better name $postfix. Also, I changed
the two places where these three dots were appended in a custom way.
------------------------------------------------------------------------
Fri, 29 Apr 2005 18:03:46 +0000 : Goba
My first thought was that this should be integrated into truncate_utf8()
and I still think so. Having a separate function seems to be inadequate
to me. Also noticed that truncate_utf8() - and as a result, this
suggested function - is broken in that the passed length is not
properly used. If you have multibyte characters in the string, strlen()
will not return the number of chars but the number of bytes used to
represent them (except if you have mbstring overrides, but that is not
common).
------------------------------------------------------------------------
Fri, 29 Apr 2005 18:38:03 +0000 : Bèr Kessels
Goba,
I do not really understand the second part of your post. Do you mean
that truncate_utf8() is broken, or that truncate_utf8_conc() will not
work? If the latter, could you please provide some help and tell me
what should be changed or where I should look?
------------------------------------------------------------------------
Fri, 29 Apr 2005 19:06:47 +0000 : Dries
Integrating this in the existing truncate_utf8 function sounds like the
better solution.
------------------------------------------------------------------------
Fri, 29 Apr 2005 19:58:17 +0000 : Goba
strlen() will return the number of bytes used to represent the string,
e.g. strlen("Bèr") will return 4, even though it appears to be three
characters. This makes truncating quite arbitrary, i.e. you will get
strings of varying length with the same parameters if they contain
multibyte characters. This looks odd if you use the data in a
table column or something.
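
The effect can be reproduced with a small sketch. The helper example_char_count() is a hypothetical name introduced for this illustration, and it assumes a PCRE build with UTF-8 support (the /u modifier). Two names are cut at the same byte length, yet they end up with different numbers of visible characters.

```php
<?php
// Hypothetical helper for illustration: counts characters by matching one
// UTF-8 character at a time (assumes PCRE was built with UTF-8 support).
function example_char_count($string) {
  return preg_match_all('/./su', $string, $matches);
}

// "Bèr" written with explicit UTF-8 bytes: è is the two bytes C3 A8.
$names = array("B\xC3\xA8r Kessels", "Ber Kessels");
foreach ($names as $name) {
  // Byte-based cut, which is how strlen() and substr() see the string.
  $cut = substr($name, 0, 4);
  echo strlen($cut), ' bytes, ', example_char_count($cut), " characters\n";
}
```

Both cuts are 4 bytes long, but the first holds three characters and the second four.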
------------------------------------------------------------------------
Sat, 30 Apr 2005 09:18:40 +0000 : Dries
Not going to commit this as is. Integrate with the existing function.
This should also fix the strlen() problem.
------------------------------------------------------------------------
Sat, 30 Apr 2005 10:22:20 +0000 : Bèr Kessels
The existing truncate_utf8() uses the same strlen(). And from what I
gathered from Goba's post, that function is broken too.
So, integrating is the best option, but will not solve the strlen()
issue.
------------------------------------------------------------------------
Sat, 30 Apr 2005 11:22:46 +0000 : Goba
Either we need to mention that the $length param is completely advisory
to the truncate_utf8() function, or someone needs to come up with an
intelligent way to do a strlen() on a UTF-8 string. What I would do for
the latter, is to replace all the multibyte sequences with one byte (in
a temporary copy of the string), so we can get a real strlen(). But
still this does not make it possible to crop the string at a given
exact length.
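
The counting idea above can be sketched as follows, together with one possible way to crop at a character count using a byte-level regular expression. Both function names are hypothetical and made up for this example, neither is an existing Drupal function, and both assume the input is valid UTF-8.

```php
<?php
// Hypothetical helpers for illustration; both assume valid UTF-8 input.

// Character count via a temporary copy in which every multibyte sequence
// (a lead byte plus its continuation bytes) is replaced by a single byte.
function example_utf8_strlen($string) {
  return strlen(preg_replace('/[\xC0-\xFF][\x80-\xBF]*/', '?', $string));
}

// Crop to at most $length characters by matching whole characters from the
// start of the string: either one ASCII byte, or one lead byte followed by
// its continuation bytes.
function example_utf8_crop($string, $length) {
  $length = max(0, (int) $length);
  $pattern = '/^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]*){0,' . $length . '}/';
  preg_match($pattern, $string, $match);
  return $match[0];
}

echo example_utf8_strlen("B\xC3\xA8r"), "\n";
echo example_utf8_crop("B\xC3\xA8r Kessels", 3), "\n";
```

This sketch only works for lengths that PCRE accepts as a repeat count (up to 65535), and it makes no attempt to handle malformed byte sequences.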