preg_match fails where egrep and Perl succeed?
Hello

I do not wish to say how long I have been trying to figure this out.

I have a node with a CCK textarea, containing data like this:

123|aabbcc
456|10fe1c
...

In other words, a three-digit reference to a hex code.

I am attempting to use a computed field to search that node for the reference (the three-digit code) and return only the hex value.

I am using a regular expression to do this. I have tried several variants of the regexp, all of which work whether I use them in Perl or egrep. However, when I use drush to test them with preg_match, the test fails every time.

What might I be doing wrong here? This was supposed to be done hours ago, of course.

drush 2> /dev/null eval '$datanode = node_load(2736); $res = preg_match('\''/^200\|([0-9a-fA-F]+)/'\'', $datanode->field_int_data[0][value], $match); echo $res . ": " . $match[1] . "\n";'

The output is:

0:

However, with:

drush 2> /dev/null eval '$datanode = node_load(2736); echo $datanode->field_int_data[0][value];' | perl -e 'while (<>) { print $_ if (/^200\|([0-9a-zA-Z]+)/); }'

I get:

200|EEC57C

I gather that the PCRE library has changed some things, but I did not think the handling of subpatterns was one of them, so I am quite perplexed. It's probably something quite simple.

N.B. The reason I am redirecting STDERR to /dev/null is that I get this warning when running drush:

PHP Warning: PHP Startup: Unable to load dynamic library '/usr/lib/php5/20060613+lfs/imagick.so' - libWand.so.9: cannot open shared object file: No such file or directory in Unknown on line 0

This is referenced at http://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg741806.html, with no apparent resolution.

Regards,
Luke
I think this is an issue with your escaping, but my eyes are not keen enough to see where the error is. The regular expression itself works fine:

<?php
function get_matches($to_match) {
  preg_match('/^\d{3}\|([0-9a-fA-F]+)/', $to_match, $match);
  return $match[1];
}
echo get_matches('123|aabbcc'); // Prints aabbcc
echo "\n";
echo get_matches('456|10fe1c'); // Prints 10fe1c
echo "\n";
echo get_matches('200|EEC57C'); // Prints EEC57C
echo "\n";

I'd suggest moving this test into a PHP file and simply including it; you'll save yourself a lot of headaches trying to figure out whether you've escaped your slashes enough times ;) (once for the regular expression, another for PHP, a third for bash?)

CM Lubinski
(Solved! See below for details.)

On Wed, 3 Feb 2010, CM Lubinski wrote:

> I think this is an issue with your escaping but my eyes are not keen enough to see where the error is.

Moving the code into an include, and running:

drush 2> /dev/null eval 'include("/tmp/drupal.inc");'

gets me a no-matcher.

> The regular expression itself works fine: <?php function get_matches($to_match) { preg_match('/^\d{3}\|([0-9a-fA-F]+)/', $to_match, $match);

That assumes a single search string. I hardcoded the "200" where you have "\d{3}", because what actually happens is that the three-digit code is obtained on the fly from a variable. So the regex is meant to search a group of lines with that format and, when it gets to the one containing "200" at the start, kick back the second part.

I tried it with your version, though, and it returned the first line's RHS, as I would expect: 1 match.

I've just figured it out. The difference between the greps and PCRE is that in the greps, the data is taken on a line-by-line basis, so ^ matches the start of each line. preg_match is handed the whole field as one string and, by default, does not treat embedded newlines specially, so ^ and $ apply to the beginning and end of the entire subject string.

Rewriting it with:

$res = preg_match('/\b' . $search_for . '\|([0-9a-fA-F]{6})/', $match_against, $match);

gets the job done.

Thanks for considering it.

Luke
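The behaviour described above can be reproduced outside Drupal. The following is a minimal sketch, not the code from the computed field: the sample string reuses the pairs quoted in this thread, joining them with "\n" is an assumption about how the textarea value is stored, and `find_hex_bounded` is a hypothetical helper name. The call to `preg_quote()` is also an addition; it only matters if the code held in the variable could ever contain regex metacharacters.

```php
<?php
// Sample subject: one "code|hex" pair per line. Joining the lines with "\n"
// is an assumption about how the CCK textarea value is stored.
$data = "123|aabbcc\n456|10fe1c\n200|EEC57C\n";

// Without modifiers, ^ anchors only at the start of the whole subject,
// so a pair on the third line is not found.
$anchored = preg_match('/^200\|([0-9a-fA-F]+)/', $data);

// Hypothetical helper wrapping the \b rewrite from the post above.
// preg_quote() is an addition: it escapes any regex metacharacters
// in a code that arrives from a variable.
function find_hex_bounded($search_for, $match_against) {
    $pattern = '/\b' . preg_quote($search_for, '/') . '\|([0-9a-fA-F]{6})/';
    return preg_match($pattern, $match_against, $match) ? $match[1] : null;
}

echo $anchored, "\n";                        // 0
echo find_hex_bounded('200', $data), "\n";   // EEC57C
echo find_hex_bounded('123', $data), "\n";   // aabbcc
```

Note that \b only requires a word boundary, so it would also accept a code that appears mid-line after a space or punctuation; for data that is strictly one pair per line this makes no difference.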
Luke wrote:

> (Solved! See below for details.)

> I've just figured it out. The difference between the greps and PCRE is that in the greps, the data is taken on a line-by-line basis, so ^ matches the start of each line. preg_match is handed the whole field as one string and, by default, does not treat embedded newlines specially, so ^ and $ apply to the beginning and end of the entire subject string.

You could fiddle about with the options:

# m  Multiline mode - ^ and $ match internal lines
# s  match as a Single line - . matches \n

cf. perlreref

Dick
Just discovered that this could also be solved with the 'm' flag:

http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

The 's' flag is also very useful if your expression will be matched across lines. Good to know!

CM Lubinski
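For anyone finding this thread later, here is a hedged sketch of what the 'm' and 's' modifiers change. It is an illustration under assumptions, not the code that went into the computed field: the sample string and the helper name `find_hex_multiline` are made up for the example, and `preg_quote()` is added so a code taken from a variable is matched literally.

```php
<?php
// Same sample data as in the original question, assumed to be "\n"-separated.
$data = "123|aabbcc\n456|10fe1c\n200|EEC57C\n";

// Hypothetical helper: with the m modifier, ^ matches at the start of every
// line of the subject, so the original ^-anchored pattern can be kept.
function find_hex_multiline($code, $data) {
    $pattern = '/^' . preg_quote($code, '/') . '\|([0-9a-fA-F]+)/m';
    return preg_match($pattern, $data, $match) ? $match[1] : null;
}

echo find_hex_multiline('200', $data), "\n";   // EEC57C

// The s modifier is a separate switch: it lets . match a newline, which
// matters only when one match has to span several lines.
echo preg_match('/123.*200/', $data), "\n";    // 0
echo preg_match('/123.*200/s', $data), "\n";   // 1
```

The pattern deliberately has no trailing $: if the textarea value uses "\r\n" line endings, $ in multiline mode would not match in front of the "\r".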
participants (3)
- CM Lubinski
- Dick Middleton
- Luke