Issue status update for http://drupal.org/node/24025

Project:      Drupal
Version:      4.6.0
Component:    base system
Category:     bug reports
Priority:     normal
Assigned to:  Anonymous
Reported by:  danielc
Updated by:   Steven
Status:       patch

This is the fault of the viewcvs code, which assumes ISO-8859-1 encoding; it has nothing to do with using UTF-8. Those mails look fine in my email client (Thunderbird).

"The only difference is that my patch represents each character using four ASCII characters. Now everyone can easily read and edit the values. "

I disagree. If I want to edit the characters now, I simply type them in using whatever input method is appropriate. The literal bytes don't mean anything. If I want to hex-encode a piece of UTF-8 text, I have to view the text I typed as literal bytes somehow (so I need a hex editor?) and enter the values in the code. If I later want to figure out what that text really says, I have to paste the hex values into the hex editor again, save to a text file and open it as UTF-8. This is a waste of time.

This is like saying all code should be hardwrapped at 80 characters because, well, that's what ancient terminals use. Sorry, I don't buy it. My computer has no problems using and displaying Unicode. There are tons of freeware Unicode fonts around and as far as I know, most Unix tools should handle it fine.

As far as non-displayable characters go, there is an excellent fallback font which represents each of them with a small box with the Unicode codepoint in it. This is much more useful than the literal bytes as it actually means something. Any modern OS should support fallback fonts for missing characters.

It's not my problem if you decide to torture yourself with vi and friends.

Steven

Previous comments:
------------------------------------------------------------------------

May 31, 2005 - 22:53 : danielc

Attachment: http://drupal.org/files/issues/node.module.nobinary.diff (864 bytes)

drupal/modules/node.module contains three binary strings.
The use of binary strings causes problems in text editors incapable of handling such strings. For example, I have modified this file for internal use, but when I save the file, the binary strings get converted to question marks. While someone could argue that I should get a better text editor, this issue exists for many users, not just me.

Better yet, the solution is simple... PHP allows representing binary characters via hex encoding. That's what this patch does.

Thanks.

------------------------------------------------------------------------

June 1, 2005 - 00:23 : kbahey

+1 for this. They never show up correctly for me (Linux, vim).

------------------------------------------------------------------------

June 1, 2005 - 01:19 : Steven

All Drupal code is UTF-8 encoded. Locale.inc contains many more such strings and there are some in search.module too.

Keeping them as plain-text is vital to keeping the code readable, using hex escapes reduces editability.

-1 on this.

------------------------------------------------------------------------

June 1, 2005 - 01:25 : Steven

PS: If your editor converts them to question marks, it means it doesn't support UTF-8 properly and only handles your local ANSI codepage. You will run into many more issues.

I use Notepad2 on Windows and it works like a charm.

------------------------------------------------------------------------

June 1, 2005 - 02:49 : danielc

Steven wrote:
Keeping them as plain-text is vital to keeping the code readable, using hex escapes reduces editability.
Right now, when viewing these characters using readers that don't deal with them well, like Mozilla looking at the web interface of the CVS repository (go to http://cvs.drupal.org/viewcvs/drupal/drupal/modules/node.module?annotate=1.4... then look at line 217), I see an a with a tilde, a Euro symbol, and a comma. I hardly consider that "readable."

When viewing the file in vi, the binary data already shows up as its hex representation (for example, line 217 has "\xe3\x80\x82"). So, changing them to an escaped/encoded string makes the string look exactly the same. The only difference is that my patch represents each character using four ASCII characters. Now everyone can easily read and edit the values.
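
To make the line 217 example concrete, here is a rough sketch of what happens to those three bytes under each reading. It is written in Python rather than PHP purely for illustration. The helper names `to_hex_escapes` and `from_hex_escapes` are made up here, and decoding as Windows-1252 is an assumption about how a browser treats a page labelled ISO-8859-1.

```python
# The three bytes quoted from line 217 of node.module.
raw = b"\xe3\x80\x82"

# Read as UTF-8, the bytes form one character: U+3002 IDEOGRAPHIC FULL STOP.
as_utf8 = raw.decode("utf-8")

# Read as Windows-1252 (assumed here as the browser's interpretation of
# "ISO-8859-1"), every byte becomes a separate character: a with tilde,
# Euro sign, and a low quotation mark that looks like a comma.
as_cp1252 = raw.decode("cp1252")


def to_hex_escapes(text):
    """Hypothetical helper: spell out the UTF-8 bytes of text as \\xNN escapes."""
    return "".join("\\x%02x" % byte for byte in text.encode("utf-8"))


def from_hex_escapes(escaped):
    """Hypothetical helper: reverse to_hex_escapes (expects only \\xNN escapes)."""
    return bytes.fromhex(escaped.replace("\\x", "")).decode("utf-8")


escaped = to_hex_escapes(as_utf8)

print(repr(as_utf8), repr(as_cp1252))
print(escaped, len(escaped))
print(from_hex_escapes(escaped) == as_utf8)
```

Each byte takes four ASCII characters in escaped form, so this single character becomes twelve. The escaped string survives any editor, but reading it back as text needs a decoding step like `from_hex_escapes`, which is the trade-off both sides of the thread are arguing about.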