Hello
I do not wish to say how long I have been trying to figure this out.
I have a node with a CCK textarea, containing data like this:
123|aabbcc
456|10fe1c
...
In other words, each line maps a three-digit reference to a hex code.
I am attempting to use a computed field to search that node for a reference (the three-digit code) and return only the matching hex value.
I am using a regular expression to do this. I have tried several variants of the regexp, all of which work in both Perl and egrep. However, when I test them with preg_match through drush, the test fails every time.
What might I be doing wrong here? This was supposed to be done hours ago, of course.
drush 2> /dev/null eval '$datanode = node_load(2736); $res = preg_match('\''/^200\|([0-9a-fA-F]+)/'\'', $datanode->field_int_data[0][value], $match); echo $res . ": " . $match[1] . "\n";'
The output is:
0:
However:
drush 2> /dev/null eval '$datanode = node_load(2736); echo $datanode->field_int_data[0][value];' | perl -e 'while (<>) { print $_ if (/^200\|([0-9a-zA-Z]+)/); }'
I get:
200|EEC57C
I gather that the PCRE library has changed some things, but I did not think the handling of subpatterns was one of them, so I am quite perplexed. It's probably something quite simple.
N.B. I am redirecting STDERR to /dev/null because drush prints this warning every time it runs:
PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php5/20060613+lfs/imagick.so' - libWand.so.9: cannot open shared object file: No such file or directory in Unknown on line 0
The problem is reported at http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg741806.html, with no apparent resolution.
Regards,
Luke
I think this is an issue with your escaping but my eyes are not keen enough to see where the error is.
The regular expression itself works fine:

<?php
function get_matches($to_match) {
  preg_match('/^\d{3}\|([0-9a-fA-F]+)/', $to_match, $match);
  return $match[1];
}

echo get_matches('123|aabbcc'); // Prints aabbcc
echo "\n";
echo get_matches('456|10fe1c'); // Prints 10fe1c
echo "\n";
echo get_matches('200|EEC57C'); // Prints EEC57C
echo "\n";
I'd suggest moving this test into a PHP file and simply including it. You'll save yourself a lot of headaches working out whether you've escaped your backslashes enough times ;) (once for the regular expression, again for PHP, and a third time for bash?)
CM Lubinski
(Solved! See below for details.)
On Wed, 3 Feb 2010, CM Lubinski wrote:
> I think this is an issue with your escaping but my eyes are not keen enough to see where the error is.
Moving the code into an include file and running:
drush 2> /dev/null eval 'include("/tmp/drupal.inc");'
still gets me no match.
> The regular expression itself works fine:
> <?php function get_matches($to_match) {
>   preg_match('/^\d{3}\|([0-9a-fA-F]+)/', $to_match, $match);
That assumes the subject is a single line. I hardcoded "200" where you have "\d{3}" because, in the real code, the three-digit code is obtained on the fly from a variable. So the regex has to search a group of lines in that format and, when it reaches the line starting with "200", return the second part.
I tried your version anyway, and it returned the first line's right-hand side, as I would expect: one match.
I've just figured it out. The difference between the greps and PCRE is that grep (like Perl's while (<>) loop) processes the data line by line, so ^ matches the start of each line. By default, PCRE does not treat newlines specially, so ^ and $ anchor to the beginning and end of the entire subject string.
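Here is a minimal sketch of that difference, assuming a hypothetical three-line sample in the format above and the | separator matched literally as \| (the real field holds more lines). Run against the whole string, the pattern fails because ^ only matches at offset 0. Run line by line, the way grep and the Perl loop see the data, it finds the entry.

```php
<?php
// Hypothetical sample data in the "code|hex" format from the thread.
$data = "123|aabbcc\n456|10fe1c\n200|EEC57C";
$pattern = '/^200\|([0-9a-fA-F]+)/';

// Whole string: without the m modifier, ^ anchors only at offset 0,
// where the subject starts with "123", so there is no match.
$whole = preg_match($pattern, $data, $match);
echo "whole string: $whole\n"; // whole string: 0

// Line by line, the way grep or perl -e 'while (<>) {...}' see the data.
$found = null;
foreach (explode("\n", $data) as $line) {
    if (preg_match($pattern, $line, $match)) {
        $found = $match[1];
        break;
    }
}
echo "line by line: $found\n"; // line by line: EEC57C
```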
Rewriting it with:
$res = preg_match('/\b' . $search_for . '\|([0-9a-fA-F]{6})/', $match_against, $match);
gets the job done.
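Luke's \b approach can be wrapped in a small reusable function. This is a sketch: the function name find_hex_by_code and the sample data are hypothetical, and $search_for is assumed to be a three-digit code. preg_quote() is added as a precaution so the variable part is always treated literally.

```php
<?php
// Hypothetical helper wrapping the \b approach from the thread.
function find_hex_by_code($match_against, $search_for)
{
    // preg_quote() escapes regex metacharacters in the variable part
    // (with '/' as the delimiter); the separator is written as \| so
    // it is a literal pipe rather than alternation.
    $pattern = '/\b' . preg_quote($search_for, '/') . '\|([0-9a-fA-F]{6})/';
    if (preg_match($pattern, $match_against, $match)) {
        return $match[1];
    }
    return null;
}

$data = "123|aabbcc\n456|10fe1c\n200|EEC57C";
echo find_hex_by_code($data, '200'), "\n"; // EEC57C
echo var_export(find_hex_by_code($data, '999'), true), "\n"; // NULL
```

The \b works here because the newline before "200" is a non-word character, so a word boundary exists at the start of each line.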
Thanks for considering it.
Luke
You could fiddle about with the options:
m  Multiline mode - ^ and $ match internal lines
s  match as a Single line - . matches \n
cf. perlreref
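PHP's preg functions accept the same m and s modifiers, written after the closing delimiter. Here is a small sketch of what each one changes, using a hypothetical two-line string:

```php
<?php
$two_lines = "first\nsecond";

// ^ without m: only the start of the whole string, so "second" is not found.
$a = preg_match('/^second/', $two_lines);
// ^ with m: also matches right after each newline.
$b = preg_match('/^second/m', $two_lines);

// . without s: does not match "\n", so the pattern cannot span both lines.
$c = preg_match('/first.second/', $two_lines);
// . with s: matches "\n" as well.
$d = preg_match('/first.second/s', $two_lines);

echo "$a $b $c $d\n"; // 0 1 0 1
```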
Dick
Just discovered that this could also be solved with the 'm' flag: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
The 's' flag is also very useful when a . in your expression needs to match across line breaks.
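Here is a sketch of the 'm' modifier fix applied to a lookup like Luke's, keeping his original ^ anchor. The function name lookup_hex_multiline and the sample data are hypothetical. The trailing $ is deliberately left off, so a "\r\n" line ending (possible in textarea input) cannot break the match.

```php
<?php
// Hypothetical lookup using the m (multiline) modifier: ^ then matches at
// the start of every line, not only at the start of the whole string.
function lookup_hex_multiline($data, $code)
{
    $pattern = '/^' . preg_quote($code, '/') . '\|([0-9a-fA-F]+)/m';
    return preg_match($pattern, $data, $match) ? $match[1] : null;
}

$data = "123|aabbcc\n456|10fe1c\n200|EEC57C\n";
echo lookup_hex_multiline($data, '200'), "\n"; // EEC57C
```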
Good to know!
CM Lubinski