[support] html2text module/function?

Ivan Sergio Borgonovo mail at webthatworks.it
Mon Apr 7 17:13:10 UTC 2008


On Mon, 7 Apr 2008 16:45:46 +0200
Ivan Sergio Borgonovo <mail at webthatworks.it> wrote:

> I'm looking for something similar to
> html2text, w3m -dump, links -dump
> to turn HTML input into well-formatted text (using padding etc.
> to render tables).
> 
> Is there anything pre-cooked in Drupal?
> 
> html2text has nice output, but a) I need to fork and b) it looks like
> it doesn't support UTF-8.
> 
> The overall goal is to write HTML email using a template and avoid
> rewriting text email templates.

Since there doesn't seem to be anything around that really does the
job, I came up with:

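function textify($html) {
 // stdin and stdout are pipes; stderr is discarded.
 $descriptorspec = array(
  0 => array("pipe", "r"),
  1 => array("pipe", "w"),
  2 => array("file", "/dev/null", "a")
 );
 $cwd = '/tmp';
 // Run w3m under a UTF-8 locale.
 $env = array('LANG' => 'en_US.UTF-8');
 // Returned unchanged if w3m cannot be started.
 $text = '';
 $process = proc_open('w3m -dump -cols 68 -T text/html',
  $descriptorspec, $pipes, $cwd, $env);
 if (is_resource($process)) {
  // Feed the HTML to w3m and close stdin so it sees EOF.
  fwrite($pipes[0], $html);
  fclose($pipes[0]);

  // Read the rendered plain text.
  $text = stream_get_contents($pipes[1]);
  fclose($pipes[1]);

  // Close the process handle once all pipes are closed.
  proc_close($process);
 }
 return $text;
}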

I hate to fork, but writing an HTML parser was not something I was
planning to do overnight.
It seems there is an HTML::[forgetwhat] in Perl that does a similar
job to w3m -dump. I didn't find anything in PEAR...
If anyone knows of a good library... I'd be glad to kick that
proc_open out of my code.

Now it's time to do some security assessment and see if there is any
need to filter the incoming $html.
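
As a starting point for that assessment: the command line passed to
proc_open is a fixed string and $html only travels over stdin, so
shell quoting is not the concern here. What is left is mostly the
size and encoding of the input. Below is a minimal sketch of a
pre-check, assuming input must be non-empty, valid UTF-8 and below
some size limit. The function name textify_input_ok and the 1 MB
default are hypothetical, not part of the code above, and this is not
a complete security review.

```php
<?php
// Hypothetical pre-check, meant to run before handing $html to w3m.
// The function name and the default limit are illustrative assumptions.
function textify_input_ok($html, $max_bytes = 1048576) {
  // Only strings make sense as input to fwrite().
  if (!is_string($html)) {
    return false;
  }
  // Refuse empty or oversized input so the child process
  // is not handed an unbounded amount of data.
  $len = strlen($html);
  if ($len === 0 || $len > $max_bytes) {
    return false;
  }
  // w3m is run under a UTF-8 locale, so require valid UTF-8.
  // preg_match() with the /u modifier fails on invalid UTF-8.
  return preg_match('//u', $html) === 1;
}
```

It could be used as a guard, e.g.
if (textify_input_ok($html)) { $text = textify($html); }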

links and lynx didn't seem to cope well with UTF-8.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


