[drupal-devel] [feature] New API function: truncate_utf8_conc
Issue status update for http://drupal.org/node/21576

 Project:      Drupal
 Version:      cvs
 Component:    base system
 Category:     feature requests
 Priority:     normal
 Assigned to:  Bèr Kessels
 Reported by:  Bèr Kessels
 Updated by:   Bèr Kessels
 Date:         Fri, 29 Apr 2005 10:46:29 +0000
 Status:       patch
 Attachment:   http://drupal.org/files/issues/common_inc_truncate_utf8_conc.diff (1.63 KB)

This function truncates the string and, when requested, appends a string to the end, for example to concatenate three dots (...) to a long username. It would turn the long username johndoe@www.personalweblog.com into johndoe@www.johndo...

It's possible with the current truncate_utf8(), but then you need to do a string comparison and some more truncation afterwards, which is not optimal.

Please mark this as wontfix if it is not appropriate. I do not have time to maintain this patch, so if it's not up to standards, just close this issue.

Bèr Kessels
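
The attached patch is not reproduced in this thread, so the following is only a minimal sketch of the behaviour described above: append the extra string only when something was actually cut off. The function name example_truncate_with_postfix() and its parameters are hypothetical, and the length is counted in bytes, which is an assumption about how the original worked.

```php
<?php
// Hypothetical helper, NOT the code from the attached patch.
// Truncates $string to at most $len bytes and appends $postfix
// only if something was actually cut off.
function example_truncate_with_postfix($string, $len, $postfix = '...') {
  if (strlen($string) <= $len) {
    // Short enough: return unchanged, without the postfix.
    return $string;
  }
  // Back up while the cut would land on a UTF-8 continuation byte
  // (bit pattern 10xxxxxx), so no character is split in half.
  while ($len > 0 && (ord($string[$len]) & 0xC0) === 0x80) {
    $len--;
  }
  return substr($string, 0, $len) . $postfix;
}

echo example_truncate_with_postfix('johndoe@www.personalweblog.com', 15), "\n"; // johndoe@www.per...
echo example_truncate_with_postfix('johndoe', 15), "\n";                        // johndoe
```

With a helper like this, the caller no longer has to compare the result against the input to decide whether the dots are needed.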
Issue status update for http://drupal.org/node/21576

 Updated by:   Dries
 Date:         Fri, 29 Apr 2005 14:36:04 +0000
 Status:       patch

You can update existing code (e.g. format_name()) to take advantage of this patch:

  $ grep -r "truncate_utf8" * | grep '\.\.\.'
  includes/common.inc:    $name = truncate_utf8($object->name, 15) .'...';
  modules/search.module:  return truncate_utf8($text, 256) . ' ...';

Also, $conc is not very descriptive as a variable name. Maybe call it $postfix instead?

Dries
Issue status update for http://drupal.org/node/21576

 Updated by:   Bèr Kessels
 Date:         Fri, 29 Apr 2005 14:52:29 +0000
 Status:       patch
 Attachment:   http://drupal.org/files/issues/common_inc_truncate_utf8_conc_0.diff (2.82 KB)

I updated the function to use the better name $postfix. I also changed the two places where these three dots were appended in a custom way.

Bèr Kessels
Issue status update for http://drupal.org/node/21576

 Updated by:   Goba
 Date:         Fri, 29 Apr 2005 18:03:46 +0000
 Status:       patch

My first thought was that this should be integrated into truncate_utf8(), and I still think so. A separate function seems inadequate to me.

I also noticed that truncate_utf8(), and as a result this suggested function, is broken in that the passed length is not properly used. If you have multibyte characters in the string, strlen() will not return the number of characters but the number of bytes used to represent them (except if you have mbstring overrides, but that is not common).

Goba
Issue status update for http://drupal.org/node/21576

 Updated by:   Bèr Kessels
 Date:         Fri, 29 Apr 2005 18:38:03 +0000
 Status:       patch

Goba, I do not really understand the second part of your post. Do you mean that truncate_utf8() is broken, or that truncate_utf8_conc() will not work? If the latter, could you please provide some help and tell me what should be changed or where I should look?

Bèr Kessels
Issue status update for http://drupal.org/node/21576

 Updated by:   Dries
 Date:         Fri, 29 Apr 2005 19:06:47 +0000
 Status:       patch

Integrating this into the existing truncate_utf8() function sounds like the better solution.

Dries
Issue status update for http://drupal.org/node/21576

 Updated by:   Goba
 Date:         Fri, 29 Apr 2005 19:58:17 +0000
 Status:       patch

strlen() returns the number of bytes used to represent the string, e.g. strlen("Bèr") returns 4, even though it appears to be three characters. This makes truncation rather arbitrary, i.e. you will get strings of varying length with the same parameters if they contain multibyte characters. This looks odd if you use the data in a table column or something.

Goba
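
The byte-versus-character difference can be checked with a few lines of PHP. This is only an illustration: example_char_count() is a hypothetical helper (it counts code points with a PCRE /u match and assumes valid UTF-8 input), and the è is written as explicit bytes so the result does not depend on how the file is saved.

```php
<?php
// "Bèr" with è written out as its two UTF-8 bytes (0xC3 0xA8),
// so the example does not depend on this file's encoding.
$name = "B\xC3\xA8r";

// Hypothetical helper: counts code points by matching with the /u modifier.
function example_char_count($string) {
  return preg_match_all('/./us', $string);
}

echo strlen($name), "\n";             // 4 (bytes)
echo example_char_count($name), "\n"; // 3 (characters)

// The same byte limit yields different visible lengths:
echo example_char_count(substr("B\xC3\xA8r Kessels", 0, 4)), "\n"; // 3 ("Bèr")
echo example_char_count(substr("Ber Kessels", 0, 4)), "\n";        // 4 ("Ber ")
```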
Issue status update for http://drupal.org/node/21576

 Updated by:   Dries
 Date:         Sat, 30 Apr 2005 09:18:40 +0000
 Status:       patch

Not going to commit this as is. Integrate with the existing function. This should also fix the strlen() problem.

Dries
Issue status update for http://drupal.org/node/21576

 Updated by:   Bèr Kessels
 Date:         Sat, 30 Apr 2005 10:22:20 +0000
 Status:       patch

The existing truncate_utf8() uses the same strlen(). And from what I understood of Goba's post, that function is broken too. So integrating is the best option, but it will not solve the strlen() issue.

Bèr Kessels
Issue status update for http://drupal.org/node/21576

 Updated by:   Goba
 Date:         Sat, 30 Apr 2005 11:22:46 +0000
 Status:       patch

Either we need to mention that the $length param is purely advisory for truncate_utf8(), or someone comes up with an intelligent way to do a strlen() on a UTF-8 string. What I would do for the latter is replace all the multibyte sequences with one byte (in a temporary copy of the string), so we can get a real strlen(). But this still does not make it possible to crop the string at a given exact length.

Goba
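
The counting idea above could look roughly like the sketch below. It is not code from the issue: example_utf8_char_count() is a hypothetical name, and the byte-level pattern assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper illustrating the idea; not code from the issue.
// Assumes $string is valid UTF-8.
function example_utf8_char_count($string) {
  // In a temporary copy, collapse each multibyte sequence (a lead byte
  // 0xC0-0xFF followed by its continuation bytes 0x80-0xBF) into a
  // single placeholder byte. ASCII bytes are left as they are.
  $collapsed = preg_replace('/[\xC0-\xFF][\x80-\xBF]*/', 'x', $string);
  // Every character is now exactly one byte, so strlen() counts characters.
  return strlen($collapsed);
}

echo example_utf8_char_count("B\xC3\xA8r"), "\n"; // 3
```

The collapsed copy only gives the count. Its byte offsets no longer correspond to offsets in the original string, which is why this approach alone does not tell you where to cut.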
Issue status update for http://drupal.org/node/21576
Post a follow up: http://drupal.org/project/comments/add/21576

 Updated by:   Steven
 Status:       patch

Now that we have proper drupal_strlen() and drupal_substr() functions which count characters, truncate_utf8() is certainly not the place for this. We should indeed cut off based on character length.

Steven
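
As a rough sketch of what cutting off by character length might look like: drupal_strlen() and drupal_substr() are not reproduced here, so PCRE matching with the /u modifier stands in for them. The function name example_truncate_chars() and its signature are hypothetical, and $len is assumed to be a non-negative integer.

```php
<?php
// Hypothetical helper; PCRE with /u is a stand-in for the
// character-counting functions mentioned above.
function example_truncate_chars($string, $len, $postfix = '...') {
  // Count code points rather than bytes. For invalid UTF-8 the match
  // fails and the string is returned unchanged.
  if (preg_match_all('/./us', $string) <= $len) {
    return $string;
  }
  // Keep the first $len code points and mark the cut with the postfix.
  preg_match('/^.{0,' . (int) $len . '}/us', $string, $match);
  return $match[0] . $postfix;
}

echo example_truncate_chars("B\xC3\xA8r Kessels", 3), "\n"; // Bèr...
echo example_truncate_chars("Ber Kessels", 3), "\n";        // Ber...
```

Both names now come out three characters wide before the dots, regardless of how many bytes each character occupies.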
participants (4)
- Bèr Kessels
- Dries
- Goba
- Steven