[drupal-devel] call for arms: The Final URL regexp [tm]
Jim Riggs
drupal-lists at jimandlissa.com
Tue Feb 15 15:27:41 UTC 2005
On 15 Feb, 2005, at 9:13, Bèr Kessels wrote:
> I would really like it if some regexp-guru can give me a hand with
> creating a single regexp that can be used drupalwide.
Below is some code from the _create_re() function in Textile.php in the
Textile module. That code is a PHP port of Brad Choate's Perl Textile
module. Some/all of this may be helpful.
// a URL discovery regex. This is from Mastering Regular Expressions
// from O'Reilly.
// Some modifications by Brad Choate <brad at bradchoate dot com>
$this->urlre = '(?:
    # Must start out right...
    (?=[a-zA-Z0-9./#])
    # Match the leading part (proto://hostname, or just hostname)
    (?:
        # ftp://, http://, https://, telnet://, or nntp:// leading part
        (?:ftp|https?|telnet|nntp)://(?:\w+(?::\w+)?@)?[-\w]+(?:\.\w[-\w]*)+
      |
        (?:mailto:)?[-\+\w]+@[-\w]+(?:\.\w[-\w]*)+
      |
        # or, try to find a hostname with our more specific sub-expression
        (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
        # Now ending .com, etc. For these, require lowercase
        (?-i: com\b
            | edu\b
            | biz\b
            | gov\b
            | in(?:t|fo)\b # .int or .info
            | mil\b
            | net\b
            | org\b
            | museum\b
            | aero\b
            | coop\b
            | name\b
            | pro\b
            | [a-z][a-z]\b # two-letter country codes
        )
    )?
    # Allow an optional port number
    (?: : \d+ )?
    # The rest of the URL is optional, and begins with / . . .
    (?:
        /?
        # The rest are heuristics for what seems to work well
        [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]*
        (?:
            [.!,?;:]+ [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]+ #\'"
        )*
    )?
)';
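
To try the pattern outside the Textile class, something like the following rough sketch may help. It is not code from Textile.php: the function name find_urls(), the "~" delimiter, and the sample text are assumptions made for illustration. Two things are worth knowing first. The string above has no delimiters or modifiers, so the caller has to add them, and the x modifier is needed because the pattern relies on free-spacing whitespace and # comments. Also, every part of the pattern as posted is optional (note the ")?" after the leading part), so used on its own for discovery it will match ordinary words. The sketch therefore keeps only the scheme and e-mail branches and makes the leading part required.

```php
<?php
// Hypothetical helper, not part of Textile.php. It uses a shortened copy of
// the pattern: the bare-hostname/TLD branch is dropped and the leading part
// is required, so that plain words are not reported as URLs.
function find_urls($text) {
    $urlre = '(?:
        # Must start out right...
        (?=[a-zA-Z0-9./#])
        # Leading part, required in this sketch (optional in the original)
        (?:
            (?:ftp|https?|telnet|nntp)://(?:\w+(?::\w+)?@)?[-\w]+(?:\.\w[-\w]*)+
          |
            (?:mailto:)?[-\+\w]+@[-\w]+(?:\.\w[-\w]*)+
        )
        # Optional port number
        (?: : \d+ )?
        # Optional rest of the URL, with the same trailing-punctuation heuristics
        (?:
            /?
            [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]*
            (?:
                [.!,?;:]+ [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]+
            )*
        )?
    )';

    // "~" is chosen as the delimiter because the pattern contains "/" and "#".
    // The x modifier tells PCRE to ignore the layout whitespace and comments.
    preg_match_all('~' . $urlre . '~x', $text, $matches);

    // $matches[0] holds the full text of every match, in order of appearance.
    return $matches[0];
}

print_r(find_urls('See http://drupal.org/node/1, or mail webmaster@example.com.'));
```

With that sample text the helper should return http://drupal.org/node/1 and webmaster@example.com, leaving the trailing comma and full stop out of the matches. That is the job of the punctuation heuristics at the end of the pattern: punctuation is only consumed when more URL characters follow it.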