[support] html2text module/function?
Ivan Sergio Borgonovo
mail at webthatworks.it
Mon Apr 7 17:13:10 UTC 2008
On Mon, 7 Apr 2008 16:45:46 +0200
Ivan Sergio Borgonovo <mail at webthatworks.it> wrote:
> I'm looking for something similar to the
> html2text, w3m -dump , links -dump
> to turn an html input in a well formatted text (using padding etc...
> to render tables).
>
> Is there anything pre-cooked in drupal?
>
> html2text has a nice output but a) I need to fork b) it looks it
> doesn't support utf8.
>
> The overall target is to write HTML email using a template and avoid
> to rewrite text email templates.
Since there doesn't seem to be anything around that really does the
job, I came up with:
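function textify($html) {
  // Pipes for w3m's stdin and stdout; stderr is discarded.
  $descriptorspec = array(
    0 => array("pipe", "r"),
    1 => array("pipe", "w"),
    2 => array("file", "/dev/null", "a")
  );
  $cwd = '/tmp';
  // Run w3m under a UTF-8 locale.
  $env = array('LANG' => 'en_US.UTF-8');
  // Returned as-is if w3m cannot be started.
  $text = '';
  $process = proc_open('w3m -dump -cols 68 -T text/html',
    $descriptorspec, $pipes, $cwd, $env);
  if (is_resource($process)) {
    // Feed the HTML on stdin, then close it so w3m sees EOF.
    fwrite($pipes[0], $html);
    fclose($pipes[0]);
    $text = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    // Close all pipes before proc_close() to avoid a deadlock.
    proc_close($process);
  }
  return $text;
}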
I hate to fork, but writing an HTML parser was not something I
was planning to do overnight.
It seems there is an HTML::[forgetwhat] module in Perl that does a
similar job to w3m -dump. I didn't find anything in PEAR...
If anyone knows of a good library, I'd be glad to kick that
proc_open out of my code.
Now it's time to do some security assessment and see if there is any
need to filter the incoming $html.
links and lynx didn't seem to cope well with UTF-8.
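
On the security assessment: in textify() the command line is a fixed
string and $html only travels through the stdin pipe, so the HTML never
reaches the shell that proc_open starts. That changes as soon as any part
of the command becomes variable. A minimal sketch of one way to keep it
safe, assuming the column width were made configurable; textify_command()
and the 20..200 range are hypothetical and not part of the code above:

```php
<?php
// Hypothetical helper (an assumption, not in the original post): builds
// the w3m command line when the column width is no longer a constant.
function textify_command($cols = 68) {
  // Cast to integer so shell metacharacters cannot survive, then clamp
  // to an arbitrary, assumed range of 20..200 columns.
  $cols = max(20, min(200, (int) $cols));
  return 'w3m -dump -cols ' . $cols . ' -T text/html';
}
```

The result could be passed to proc_open in place of the literal command
string. Any non-numeric argument would instead need escapeshellarg().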
--
Ivan Sergio Borgonovo
http://www.webthatworks.it