I'm porting CCK content from a redesigned D4 site to D6. I extract the data with a PHP script, generate CSV files, and then use node_import to create the new nodes. It's working out pretty well so far, but I've hit a snag with one of the content types. The D4 content contains hex-encoded characters (e.g. 0x99, 0x93, 0x94), and somewhere in the import process the text gets truncated at the first occurrence of one of them.
I'm thinking the easiest way to handle this would be to preprocess the data in my PHP script, converting these characters to what they really should be anyway, i.e. HTML special characters such as the following:
0x99 -> &trade; (™)
0x93 -> &ldquo; (“)
0x94 -> &rdquo; (”)
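A minimal sketch of such a conversion, written in Python for illustration; the same logic ports directly to the PHP extraction script. It assumes the D4 text is Windows-1252 (cp1252), the encoding in which 0x99, 0x93 and 0x94 are ™, “ and ”. The function name cp1252_to_html_entities is hypothetical. It decodes the raw bytes and replaces every non-ASCII character with a named HTML entity where one exists, falling back to a numeric entity otherwise.

```python
from html.entities import codepoint2name


def cp1252_to_html_entities(raw: bytes) -> str:
    """Decode Windows-1252 bytes and escape every non-ASCII character
    as an HTML entity (named where one exists, numeric otherwise)."""
    # Assumption: the legacy D4 text is cp1252, where 0x99 -> U+2122 (™),
    # 0x93 -> U+201C (“) and 0x94 -> U+201D (”).
    # Bytes undefined in cp1252 (e.g. 0x81) become U+FFFD instead of raising.
    text = raw.decode("cp1252", errors="replace")
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII (including & and <) is passed through unchanged;
            # HTML-escape it separately if the target field needs that.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity, e.g. 8482 -> "trade" -> "&trade;"
            out.append("&%s;" % codepoint2name[cp])
        else:
            # No named entity exists, so use a numeric character reference.
            out.append("&#%d;" % cp)
    return "".join(out)


print(cp1252_to_html_entities(b"Acme\x99 says \x93hi\x94"))
# prints: Acme&trade; says &ldquo;hi&rdquo;
```

If D6 only needs valid UTF-8 rather than entities, decoding as cp1252 and re-encoding as UTF-8 may be enough. In PHP, iconv('Windows-1252', 'UTF-8', $text) performs that byte-level conversion, assuming the iconv extension is available.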
Does anyone know of an existing utility that does this? Otherwise I can write a conversion function myself.
Marty