preg_match bug or regex help needed?
I'm pretty good with regex, but I'm beating my head against the wall with this one, and wondering if there's a bug in PHP. Any help is much appreciated. (This is for nicedit.)

    <?php
    $text = 'style="font-style: italic; font-weight: bold;"';
    $regex = "(\s*([a-z][a-z0-9\-]*)\s*:\s*([a-z][a-z0-9\-]*)\s*;)+";
    if (preg_match('/'. $regex .'+/i', $text, $matches)) {
      print_r($matches);
    }
    ?>

If the regex is confusing, just think of it simply as ((\w+):(\w+);)+ ... with extra spaces and a definition for words that includes dashes.

The regex vars $1 and $2 seem to only put the last matching value in the matches array.

    Array
    (
        [0] => font-style: italic; font-weight: bold;
        [1] => font-weight: bold;
        [2] => font-weight
        [3] => bold
    )

I was hoping that it would also write font-style: italic into the matches.

Thanks,
Doug Green
Well, the first thing that I spot is that the resulting regex will have two "+" at the end:

    /(\s*([a-z][a-z0-9\-]*)\s*:\s*([a-z][a-z0-9\-]*)\s*;)++/i

Could that be it?
Excuse the html email. It means I can highlight things and hence communicate more clearly.

First, I think you have an extra '+' character, which results in a double plus '++' being passed into preg_match():

    $regex = "(\s*([a-z][a-z0-9\-]*)\s*:\s*([a-z][a-z0-9\-]*)\s*;)+";
    if (preg_match('/'. $regex .'+/i', $text, $matches)) {...

which calls

    preg_match('/(\s*([a-z][a-z0-9\-]*)\s*:\s*([a-z][a-z0-9\-]*)\s*;)++/i', 'style="font-style: italic; font-weight: bold;"', $matches)

Nevertheless, that's not the cause, because removing one or both pluses doesn't solve the problem.

The thing that strikes me as odd with this regex is the wrapping of the outer parentheses in a quantifier. I would instead look for a way of repeating the smaller regex on the string until the string is finished. This code:

    $text = 'style="font-style: italic; font-weight: bold;"';
    $regex = "\s*([a-z][a-z0-9\-]*)\s*:\s*([a-z][a-z0-9\-]*)\s*;";
    if (preg_match_all('/'. $regex .'/i', $text, $matches, PREG_SET_ORDER)) {
      print_r($matches);
    }

returns this:
    Array
    (
        [0] => Array
            (
                [0] => font-style: italic;
                [1] => font-style
                [2] => italic
            )

        [1] => Array
            (
                [0] => font-weight: bold;
                [1] => font-weight
                [2] => bold
            )

    )
Which I think is more similar to what you are seeking?
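The result in the original question is expected PCRE behaviour rather than a PHP bug: when a capturing group sits inside a quantifier, each repetition overwrites what the group captured before, so only the last iteration is left in the matches array. A minimal sketch comparing the two approaches follows; the variable names $decl, $last and $pairs are illustrative only.

```php
<?php
// One declaration: a property name, a colon, a simple value, a semicolon.
$decl = '\s*([a-z][a-z0-9\-]*)\s*:\s*([a-z][a-z0-9\-]*)\s*;';
$text = 'style="font-style: italic; font-weight: bold;"';

// Quantified outer group: the inner groups are overwritten on every
// repetition, so only the last declaration is left after the match.
preg_match('/(' . $decl . ')+/i', $text, $m);
$last = array($m[2], $m[3]);

// preg_match_all(): the pattern is applied repeatedly, giving one
// entry per declaration.
preg_match_all('/' . $decl . '/i', $text, $sets, PREG_SET_ORDER);
$pairs = array();
foreach ($sets as $set) {
    $pairs[] = array($set[1], $set[2]);
}

print_r($last);   // font-weight, bold
print_r($pairs);  // (font-style, italic), (font-weight, bold)
```
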
Also, I expect there is probably a PHP library that does this with simpler function calls than writing your own ajax. Possibly as part of or related to the DOM class and related classes: http://nz.php.net/manual/en/ref.dom.php

Cheers,
Bevan
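As a rough illustration of the DOM suggestion, the sketch below uses DOMDocument to locate elements and read their style attribute, then splits the declarations with explode() instead of a regex. It assumes the DOM extension is available and that the input is an HTML fragment; extract_inline_styles() and the sample markup are hypothetical, not part of nicedit or Drupal.

```php
<?php
// Collect the inline style declarations of every <$tag> element in $html.
// extract_inline_styles() is a hypothetical helper name.
function extract_inline_styles($html, $tag) {
    $doc = new DOMDocument();
    $doc->loadHTML($html);  // tolerant HTML parser; accepts fragments
    $result = array();
    foreach ($doc->getElementsByTagName($tag) as $element) {
        $styles = array();
        foreach (explode(';', $element->getAttribute('style')) as $declaration) {
            // Split on the first colon only, so colons inside a value survive.
            $parts = explode(':', $declaration, 2);
            if (count($parts) === 2 && trim($parts[0]) !== '') {
                $styles[trim($parts[0])] = trim($parts[1]);
            }
        }
        $result[] = $styles;
    }
    return $result;
}

// The DOM extension is optional in some builds, so guard the call.
if (extension_loaded('dom')) {
    $html = '<p><span style="font-style: italic; font-weight: bold">hi</span></p>';
    print_r(extract_inline_styles($html, 'span'));
}
```

Splitting on ';' is still naive: a value that itself contains a semicolon (for example inside a quoted string) would be cut in two.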
And testing with random whitespace;
    $text = ' style=" font-style : italic ; font-weight : bold ; " ';
    $regex = "\s*([a-z][a-z0-9\-]*)\s*:\s*([a-z][a-z0-9\-]*)\s*;";
    if (preg_match_all('/'. $regex .'/i', $text, $matches, PREG_SET_ORDER)) {
      print_r($matches);
    }
    $style = array();
    foreach ($matches as $match) {
      $style[$match[1]] = $match[2];
    }
    print_r($style);

works:

    Array
    (
        [0] => Array
            (
                [0] => font-style : italic ;
                [1] => font-style
                [2] => italic
            )

        [1] => Array
            (
                [0] => font-weight : bold ;
                [1] => font-weight
                [2] => bold
            )

    )
    Array
    (
        [font-style] => italic
        [font-weight] => bold
    )
Testing with commonly practiced and supported syntactical errors and irregularities;
$text = 'style=font-style:italic;font-weight:bold';
only partly works; the last declaration has no trailing semicolon, so it is not matched:

    Array
    (
        [0] => Array
            (
                [0] => font-style:italic;
                [1] => font-style
                [2] => italic
            )

    )
    Array
    (
        [font-style] => italic
    )
It also doesn't work for non-textual properties, and property names starting with '-' exclude the '-';

    $text = 'style=border-width: 0 2em 10px 0; border-left: 1px solid #000; -moz-border-radius: foo;';

outputs

    Array
    (
        [0] => Array
            (
                [0] => moz-border-radius: foo;
                [1] => moz-border-radius
                [2] => foo
            )

    )
    Array
    (
        [moz-border-radius] => foo
    )
This regex deals with those issues (but not the missing trailing semicolon ';'):

    $regex = "\s*([a-z\-][a-z0-9\-]*)\s*:\s*([^;]*)\s*;";

I'm not sure what the best way to deal with that is given the context. Perhaps something like

    $text = 'font-style:italic;font-weight:bold';
    $regex = "\s*([a-z\-][a-z0-9\-]*)\s*:\s*([^;]*)\s*";
    $styles = explode(';', $text);
    $all_matches = array();
    foreach ($styles as $style) {
      if (preg_match('/'. $regex .'/i', $style, $matches)) {
        print_r($matches);
        $all_matches[] = $matches;
      }
    }
    $style = array();
    foreach ($all_matches as $match) {
      $style[$match[1]] = $match[2];
    }
    print_r($style);
Bevan/
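The explode() approach above can also be folded into a single preg_match_all() call by letting each declaration end at either a semicolon or the end of the string. The following is only a sketch under that assumption: parse_style() is a hypothetical helper name, it expects the attribute value alone (no style= prefix or surrounding quotes), and it does not handle values that themselves contain a semicolon.

```php
<?php
// parse_style() is a hypothetical helper, not part of PHP or nicedit.
// $css is the attribute value only, e.g. 'font-style:italic;font-weight:bold'.
function parse_style($css) {
    // name : value, terminated by ';' or by the end of the string.
    // The lazy [^;]*? followed by \s* keeps trailing whitespace out of the value.
    $regex = '/([a-z\-][a-z0-9\-]*)\s*:\s*([^;]*?)\s*(?:;|$)/i';
    $style = array();
    if (preg_match_all($regex, $css, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $style[$match[1]] = $match[2];
        }
    }
    return $style;
}

// Missing trailing semicolon.
print_r(parse_style('font-style:italic;font-weight:bold'));
// Non-textual values and a property name starting with '-'.
print_r(parse_style('border-width: 0 2em 10px 0; border-left: 1px solid #000; -moz-border-radius: foo;'));
```
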
While this response wasn't the exact solution I was looking for, I want to say "Thank You". When I tried to simplify the problem for the devel list, I actually transposed a couple of things wrong, so the double ++ was just a typo, and I also left off the style= part of the regex, which made Bevan's solution possible. But this convinced me to at least put something in code, even if it was a few extra lines of code.

BTW, this is for http://drupal.org/project/nicedit. If anyone else is not quite satisfied with our wysiwyg editor options, please join me in working on this new editor option for Drupal. There are a couple of open problems documented on the project page, but I think that this is getting close to usable.
On 30-Dec-07, at 4:49 PM, Doug Green wrote:
I'm pretty good with regex, but I'm beating my head against the wall with this one, and wondering if there's a bug in php.
Check that your PHP version is up to date. I had some real problems with the preg_replace() function in PHP 5.2.3 and 5.2.4. The new PHP 5.2.5 includes a new version of libpcre (7.3), and that fixed the problem (or at least I hope it did; it hasn't happened since). The preg_*() functions are definitely not infallible in PHP 5 releases before 5.2.5. Upgrading might fix it for you?

D.
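A quick way to see which PHP and PCRE versions are in use, and whether a preg call failed internally rather than simply not matching, might look like the sketch below. It relies only on the built-in PHP_VERSION and PCRE_VERSION constants and preg_last_error(); the test pattern and subject are arbitrary examples.

```php
<?php
// Report the PHP version and the PCRE library version PHP was built with.
echo 'PHP ', PHP_VERSION, ', PCRE ', PCRE_VERSION, "\n";

// preg_match() returns 1 on a match, 0 on no match and false on error;
// preg_last_error() distinguishes an internal failure from "no match".
$result = preg_match('/(\w+):(\w+);/', 'color:red;', $m);
if ($result === false || preg_last_error() !== PREG_NO_ERROR) {
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    echo 'preg_match() returned ', $result, "\n";
}
```
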
Participants (4): Bevan Rudge, Damien Norris, Doug Green, Olivier Jacquet