[drupal-devel] call for arms: The Final URL regexp [tm]

Jim Riggs drupal-lists at jimandlissa.com
Tue Feb 15 15:27:41 UTC 2005


On 15 Feb, 2005, at 9:13, Bèr Kessels wrote:

> I would really like it if some regexp-guru can give me a hand with
> creating a single regexp that can be used drupalwide.



Below is some code from the _create_re() function in Textile.php in the 
Textile module.  That code is a PHP port of Brad Choate's Perl Textile 
module.  Some/all of this may be helpful.


     // a URL discovery regex. This is from Mastering Regex from O'Reilly.
     // Some modifications by Brad Choate <brad at bradchoate dot com>
     $this->urlre = '(?:
     # Must start out right...
     (?=[a-zA-Z0-9./#])
     # Match the leading part (proto://hostname, or just hostname)
     (?:
         # ftp://, http://, https://, telnet://, or nntp:// leading part
         (?:ftp|https?|telnet|nntp)://(?:\w+(?::\w+)?@)?[-\w]+(?:\.\w[-\w]*)+
         |
         # or an e-mail address, optionally prefixed with mailto:
         (?:mailto:)?[-\+\w]+@[-\w]+(?:\.\w[-\w]*)+
         |
         # or, try to find a hostname with our more specific sub-expression
         (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
         # Now ending .com, etc. For these, require lowercase
         (?-i: com\b
             | edu\b
             | biz\b
             | gov\b
             | in(?:t|fo)\b # .int or .info
             | mil\b
             | net\b
             | org\b
             | museum\b
             | aero\b
             | coop\b
             | name\b
             | pro\b
             | [a-z][a-z]\b # two-letter country codes
         )
     )?

     # Allow an optional port number
     (?: : \d+ )?

     # The rest of the URL is optional, and begins with / . . .
     (?:
      /?
      # The rest are heuristics for what seems to work well
      [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]*
      (?:
         [.!,?;:]+  [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]+ #\'"
      )*
     )?
)';
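
For anyone who wants to try this outside of Textile, here is a minimal sketch of one way to run such a pattern. It is not code from Textile.php. The find_urls() helper is made up for illustration, and $urlre here is a shortened stand-in that keeps only the scheme://host branch, the port, and the trailing-path heuristic. Two things matter: the pattern is written with whitespace and # comments, so it needs the x modifier, and the delimiter must be a character that appears nowhere in the pattern or its comments (I assume "~" here, since "/" and "#" are both used).

```php
<?php
// Shortened, hypothetical stand-in for the pattern in the post: only the
// scheme://host branch, the optional port, and the trailing-path heuristic.
$urlre = '(?:
    # scheme://[user[:pass]@]hostname
    (?:ftp|https?|telnet|nntp)://(?:\w+(?::\w+)?@)?[-\w]+(?:\.\w[-\w]*)+
    # optional port number
    (?: : \d+ )?
    # optional path; punctuation only counts when more URL text follows it
    (?:
        /?
        [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]*
        (?: [.!,?;:]+ [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]+ )*
    )?
)';

// Hypothetical helper, not part of Textile.php.
// "~" is the delimiter because the pattern text contains "/" and "#".
// The x modifier makes PCRE skip the whitespace and # comments above.
function find_urls($text, $urlre) {
    preg_match_all('~' . $urlre . '~x', $text, $matches);
    return $matches[0]; // full matches only
}

$text  = 'See http://drupal.org/node/1, or ftp://user:pw@ftp.example.com:21/pub/file.txt.';
$found = find_urls($text, $urlre);
print_r($found);
// Matches:
//   http://drupal.org/node/1
//   ftp://user:pw@ftp.example.com:21/pub/file.txt
```

Note that the trailing comma and the final full stop are left out of the matches, which is what the punctuation heuristic at the end is for. In the full pattern above, everything after the lookahead is optional, so on free text it will also match plain words. It probably wants some surrounding context (a link syntax, for instance) when used for real.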




