[drupal-devel] [bug] replace binary strings in node.module with hex encodings

danielc drupal-devel at drupal.org
Fri Jun 3 14:41:10 UTC 2005


Issue status update for http://drupal.org/node/24025

 Project:      Drupal
 Version:      4.6.0
 Component:    base system
 Category:     bug reports
 Priority:     normal
 Assigned to:  Anonymous
 Reported by:  danielc
 Updated by:   danielc
 Status:       patch

Steven, I'm confused.  I hope you can set me straight, please.


Here's a URI of a graphic showing what I see on line 194 of
node.module, version 1.485.2.8.
http://www.analysisandsolutions.com/drupal/node.module.png
Is this what you see?  Is that what I'm supposed to be seeing?  Can you
please show me what you see?  What are these characters?  What do they
represent?  In what language?


You mentioned "If I want to edit the characters now, I simply type them
in using whatever input method is appropriate."  How do you type them
in?  I don't see keys with those characters on my keyboard.


Because node.module contains characters outside ISO-8859-1 (ord: 227,
128, 130, 216, 159) (hex: e3, 80, 82, d8, 9f), people editing or
reading this file must have software that can handle these
bytes.  Drupal is an open source project.  Shouldn't the files therein
be easily accessible to everyone?  The characters presently being
used limit participation to those with particular editors.
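
As for what those five bytes represent, here is a minimal sketch in
Python that decodes them as UTF-8.  It assumes the bytes group as
e3 80 82 and d8 9f, which is how UTF-8 lead and continuation bytes
would split them; the helper name is hypothetical.

```python
import unicodedata

# The five byte values listed above, grouped into the two UTF-8 sequences
# they appear to form (the grouping is an assumption based on UTF-8 lead bytes).
sequences = [bytes([227, 128, 130]), bytes([216, 159])]

def describe(seq: bytes) -> str:
    """Decode one UTF-8 sequence and return its code point and Unicode name."""
    char = seq.decode("utf-8")
    return f"{seq.hex(' ')} -> U+{ord(char):04X} {unicodedata.name(char)}"

for seq in sequences:
    print(describe(seq))
```

Under that assumption the first sequence is the ideographic full stop
and the second is the Arabic question mark.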


I came across this issue because I'm editing node.module to do unusual
things for a client.  Then I made a patch file to send him.  Upon
inspecting the patch file, I noticed my editor, EditPlus, had morphed
the characters.




danielc



Previous comments:
------------------------------------------------------------------------

May 31, 2005 - 15:53 : danielc

Attachment: http://drupal.org/files/issues/node.module.nobinary.diff (864 bytes)

drupal/modules/node.module contains three binary strings.  The use of
binary strings causes problems in text editors incapable of handling
such strings.


For example, I have modified this file for internal use, but when I
save the file, the binary strings get converted to question marks. 
While someone could argue that I should get a better text editor, this
issue exists for many users, not just me.  Better yet, the solution is
simple...


PHP allows representing binary characters via hex encoding.  That's
what this patch does.
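
To illustrate the idea (in Python rather than PHP, and not taken from
the patch itself): a byte string written with hex escapes should be
byte-for-byte identical to the same character typed literally and
saved as UTF-8.  The sample character is only an example.

```python
# Literal form: the character typed directly into a UTF-8 source file.
literal = "。".encode("utf-8")

# Escaped form: the same bytes spelled out in plain ASCII.
escaped = b"\xe3\x80\x82"

def same_bytes(a: bytes, b: bytes) -> bool:
    """Return True when both spellings produce identical bytes."""
    return a == b

print(same_bytes(literal, escaped))
```

The program behaves the same either way; the difference is only in
what an editor has to display and preserve.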


Thanks.




------------------------------------------------------------------------

May 31, 2005 - 17:23 : kbahey

+1 for this.


They never show up correctly for me (Linux, vim).




------------------------------------------------------------------------

May 31, 2005 - 18:19 : Steven

All Drupal code is UTF-8 encoded. Locale.inc contains many more such
strings and there's some in search.module too. Keeping them as
plain-text is vital to keeping the code readable, using hex escapes
reduces editability.


-1 on this.




------------------------------------------------------------------------

May 31, 2005 - 18:25 : Steven

PS: If your editor converts them to question marks, it means it doesn't
support UTF-8 properly and only handles your local ANSI codepage. You
will run into many more issues. I use Notepad2 on Windows and it works
like a charm.
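
A rough sketch, in Python, of how the question marks can arise.  It
assumes the editor re-encodes the file to a single-byte codepage
(Windows-1252 is a stand-in here; the thread does not say which one
was in use) and substitutes "?" for anything that codepage cannot
represent.  The function name is hypothetical.

```python
def save_as_codepage(text: str, codepage: str = "cp1252") -> bytes:
    """Encode text to a legacy codepage, replacing unrepresentable characters with '?'."""
    return text.encode(codepage, errors="replace")

sample = "end of sentence\u3002"  # U+3002 has no slot in Windows-1252
print(save_as_codepage(sample))
```

Once the file is saved this way the original bytes are gone, so the
change cannot be undone by reopening the file as UTF-8.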




------------------------------------------------------------------------

May 31, 2005 - 19:49 : danielc

Steven wrote:
> Keeping them as plain-text is vital to keeping the code readable,
> using hex escapes reduces editability.


Right now, these characters are garbled in readers that don't handle
them well.  For example, in Mozilla looking at the web interface of
the CVS repository (go to
http://cvs.drupal.org/viewcvs/drupal/drupal/modules/node.module?annotate=1.493
then look at line 217), I see an a with a tilde, a Euro symbol, and a
comma.  I hardly consider that "readable."
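
That display can be reproduced with a short Python sketch: take the
UTF-8 bytes e3 80 82 and decode them as Windows-1252, which is how
browsers commonly treat pages labelled ISO-8859-1 (an assumption about
this particular setup).  The function name is hypothetical.

```python
def misread(utf8_bytes: bytes, wrong_encoding: str = "cp1252") -> str:
    """Decode UTF-8 bytes with the wrong single-byte encoding."""
    return utf8_bytes.decode(wrong_encoding)

garbled = misread(b"\xe3\x80\x82")
# Show the resulting code points rather than relying on the terminal font.
print([f"U+{ord(c):04X}" for c in garbled])
```

The three code points are a with tilde, the Euro sign, and the single
low-9 quotation mark, which looks like a comma.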


When viewing the file in vi, the binary data already shows up as its
hex representation (for example, line 217 has "\xe3\x80\x82").  So,
changing them to an escaped/encoded string makes the string look
exactly the same.  The only difference is that my patch represents each
character using four ASCII characters.  Now everyone can easily read and
edit the values.




------------------------------------------------------------------------

May 31, 2005 - 20:53 : Steven

This is the fault of the viewcvs code, which assumes ISO-8859-1
encoding; it has nothing to do with using UTF-8. Those mails look fine
in my email client (Thunderbird).


> The only difference is that my patch represents each character using
> four ASCII characters. Now everyone can easily read and edit the
> values.

I disagree. If I want to edit the characters now, I simply type them in
using whatever input method is appropriate. The literal bytes don't mean
anything.


If I want to hexencode a piece of UTF-8 text, I have to view the text I
typed as literal bytes somehow (so I need a hex editor?) and enter the
values in the code. If I later want to figure out what that text really
says, I have to paste the hex values again in the hex editor, save to a
text file and open it as UTF-8. This is a waste of time.
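
For comparison, the round trip described here can also be scripted
rather than done in a hex editor.  A minimal Python sketch, with
hypothetical helper names, that converts text to hex escapes and back:

```python
import re

def to_hex_escapes(text: str) -> str:
    """Spell out the UTF-8 bytes of text as \\xNN escapes."""
    return "".join(f"\\x{b:02x}" for b in text.encode("utf-8"))

def from_hex_escapes(escaped: str) -> str:
    """Turn a string made only of \\xNN escapes back into text."""
    pairs = re.findall(r"\\x([0-9a-fA-F]{2})", escaped)
    return bytes(int(p, 16) for p in pairs).decode("utf-8")

print(to_hex_escapes("\u3002"))
print(from_hex_escapes(r"\xd8\x9f") == "\u061f")
```

This still adds a step compared with typing the character directly,
which is the cost being weighed in this thread.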


This is like saying all code should be hardwrapped at 80 characters,
because well, that's what ancient terminals use. Sorry, I don't buy it.
My computer has no problems using and displaying Unicode. There are tons
of freeware Unicode fonts around and as far as I know, most Unix tools
should handle it fine. As far as non-displayable characters go, there
is an excellent fallback font which represents them with a small box
with the Unicode codepoint in it. This is much more useful than the
literal bytes as it actually means something.


Any modern OS should support fallback fonts for missing characters.
It's not my problem if you decide to torture yourself with vi and
friends.




------------------------------------------------------------------------

June 3, 2005 - 07:02 : Shot

It looks like drupal-devel is not gated both ways, so I’ll add my 2¢
here as well:


> Any modern OS should support fallback fonts for missing characters.
> It's not my problem if you decide to torture yourself with vi and
> friends.


Just to clarify – vim works perfectly in a UTF-8 environment. If one
sets up a proper locale (pl_PL.UTF-8 in my case, en_US.UTF-8 in the
original poster's, perhaps) and properly configures his terminal,
everything simply works and the characters show up without a problem.


If an outsider vote counts, -1 on converting the strings to hex values.


Cheers,
-- Shot (Piotr Szotkowski)






