[support] hex codes in node_import files

Marty Landman mlandman at face2interface.com
Fri Jan 8 23:27:21 UTC 2010


I got to answer my own question. It was pretty simple after all: I 
found all the characters in the field data that mapped to 0x80 
(decimal 128) and up, then converted them to HTML numeric character 
codes by prefixing the decimal value with '&#' and suffixing ';'. So

146 -> &#146;

Seems to have worked well enough for my purpose although I don't know 
for sure that this accounts for all possibilities.
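
A minimal sketch of the conversion described above, written in Python 
purely for illustration (the actual script was PHP and is not shown 
here). The function name escape_high_bytes is hypothetical, and the 
sketch assumes the field data is available as raw single-byte text.

```python
def escape_high_bytes(raw: bytes) -> str:
    """Replace every byte >= 0x80 with '&#<decimal>;' and keep ASCII as-is."""
    return "".join(
        f"&#{b};" if b >= 0x80 else chr(b)  # b is the byte's integer value
        for b in raw
    )


if __name__ == "__main__":
    # 0x92 is decimal 146, the example from the message above.
    print(escape_high_bytes(b"It\x92s here"))  # prints: It&#146;s here
```

Plain ASCII passes through untouched, so the function can be applied 
to every field before the CSV is written.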

+++++++++++++++++++++++++++++++++++++++++++++

I'm porting CCK content from a redesigned D4 site to D6 by extracting 
the data in a PHP script, generating CSV files, then using node_import 
to create the new nodes. It's working out pretty well so far, but I 
hit the following snag on one of the content types. There are hex 
encoded characters in the D4 content, and somewhere in the import 
process the text gets truncated at the first appearance of one of 
these, e.g. 0x99, 0x93, 0x94.

I'm thinking the easiest way to handle it would be to preprocess the 
data in my PHP script, converting it to what it really should be 
anyway, i.e. HTML special characters such as the following:

0x99 -> ™
0x93 -> “
0x94 -> ”

Anyone know of an existing utility to do this for me? Otherwise I can 
code up a conversion method to accomplish it.
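
One possible shape for such a conversion method, sketched in Python 
for illustration only. It assumes the D4 bytes are Windows-1252 
(cp1252), which the message does not state but which matches the 
three mappings listed above. The function name cp1252_to_entities is 
hypothetical. Unlike a raw byte-value reference, this emits the 
Unicode code point of each character.

```python
def cp1252_to_entities(raw: bytes) -> str:
    """Decode Windows-1252 bytes, then write every non-ASCII character
    as a decimal numeric character reference."""
    # Bytes that cp1252 leaves undefined become U+FFFD instead of raising.
    text = raw.decode("cp1252", errors="replace")
    # 'xmlcharrefreplace' turns each non-ASCII character into '&#NNNN;'.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(cp1252_to_entities(b"Acme\x99 \x93quoted\x94"))
    # prints: Acme&#8482; &#8220;quoted&#8221;
```

If the source encoding turns out to be something other than cp1252, 
only the codec name passed to decode() needs to change.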

Marty


