[drupal-devel] call for arms: The Final URL regexp [tm]
Hi,

We currently have loads of web-URL regexping scattered all over the place. A quick list:

  check_url($value) (core)
  search ranking (core)
  field_url.inc (flexinode)
  weblink.module (various places)
  url_filter.module (filters)
  probably more, like events.

I would really like it if some regexp guru could give me a hand with creating a single regexp that can be used Drupal-wide. What I want is either one API function that handles everything [1] or two API functions [2]. I am working on the weblink module, but plan to make it much more usable by Drupal and its modules (an API).

[1] all:
  check_url($text = NULL, $full_text = NULL)
    $text: if given, check whether $text is a valid URL; return the valid URL if TRUE, otherwise return FALSE.
    $full_text: if given, check the full text for occurring URLs; return an array with all found URLs:
      $return[X] = array(
        'text'   => $url_text,      // optional text found in HTML <a>text</a>
        'domain' => $base_domain,
        'local'  => $local_flag,    // if $base_domain == $base_url
        'url'    => $validated_url);

[2] two functions:
  check_url($text = NULL, $full_text = NULL)
    as above.
  drupal_url_regexp($type)
    where $type is 'url', 'mailto' or 'html' (other suggestions?); returns a string that can be used as a regular expression in preg_replace to find a plain URL, a mailto: URL or an a href= URL.

Any takers?

Bèr
--
| Bèr Kessels | webschuur.com | website development |
| Turnhoutsebaan 34/3 | 2140 Antwerpen | België |
| IM: ber@jabber.org.uk | MSN: berkessels@gmx.net |
| pers: bler.webschuur.com | prof: www.webschuur.com |
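To make the two-function variant [2] of the proposal above concrete, here is a minimal standalone sketch. The patterns are deliberately simplified placeholders, not the final regexp being asked for. The helper _url_pattern(), the choice of "~" as delimiter, and the rule that $text takes precedence over $full_text are assumptions made for illustration; the proposal does not specify them. It is meant to run outside Drupal, since core already defines a check_url().

```php
<?php
// Hypothetical helper: a simplified, undelimited plain-URL pattern (an assumption).
function _url_pattern() {
  return '(?:https?|ftp)://[-\w]+(?:\.\w[-\w]*)+(?::\d+)?(?:/[^\s<>"\']*)?';
}

// Returns a delimited pattern for 'url', 'mailto' or 'html', or FALSE otherwise.
function drupal_url_regexp($type) {
  switch ($type) {
    case 'url':
      return '~' . _url_pattern() . '~i';
    case 'mailto':
      return '~mailto:[-+\w.]+@[-\w]+(?:\.\w[-\w]*)+~i';
    case 'html':
      // Group 1 is the href value, group 2 the link text.
      return '~<a\s[^>]*href="(' . _url_pattern() . ')"[^>]*>(.*?)</a>~is';
  }
  return FALSE;
}

function check_url($text = NULL, $full_text = NULL) {
  global $base_url;
  if ($text !== NULL) {
    // The whole string must be a URL.
    return preg_match('~^' . _url_pattern() . '\z~i', $text) ? $text : FALSE;
  }
  $found = array();
  if ($full_text !== NULL) {
    // Remember link texts from <a href="...">text</a>, keyed by URL.
    $texts = array();
    if (preg_match_all(drupal_url_regexp('html'), $full_text, $anchors, PREG_SET_ORDER)) {
      foreach ($anchors as $m) {
        $texts[$m[1]] = $m[2];
      }
    }
    preg_match_all(drupal_url_regexp('url'), $full_text, $urls);
    foreach (array_unique($urls[0]) as $url) {
      $domain = parse_url($url, PHP_URL_HOST);
      $found[] = array(
        'text'   => isset($texts[$url]) ? $texts[$url] : NULL,
        'domain' => $domain,
        'local'  => $domain === parse_url($base_url, PHP_URL_HOST),
        'url'    => $url,
      );
    }
  }
  return $found;
}

// Usage, with an assumed site address.
$GLOBALS['base_url'] = 'http://example.com';
$links = check_url(NULL, 'Visit <a href="http://drupal.org/node/1">the node</a> or http://example.com/about now.');
```

With this input, $links should hold two entries: the drupal.org link with its anchor text and 'local' set to FALSE, and the example.com link with no text and 'local' set to TRUE.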
On 15 Feb, 2005, at 9:13, Bèr Kessels wrote:
I would really like it if some regexp-guru can give me a hand with creating a single regexp that can be used drupalwide.
Below is some code from the _create_re() function in Textile.php in the Textile module. That code is a PHP port of Brad Choate's Perl Textile module. Some or all of this may be helpful.

    // a URL discovery regex. This is from Mastering Regex from O'Reilly.
    // Some modifications by Brad Choate <brad at bradchoate dot com>
    $this->urlre = '(?:
      # Must start out right...
      (?=[a-zA-Z0-9./#])
      # Match the leading part (proto://hostname, or just hostname)
      (?:
        # ftp://, http://, or https:// leading part
        (?:ftp|https?|telnet|nntp)://(?:\w+(?::\w+)?@)?[-\w]+(?:\.\w[-\w]*)+
        |
        (?:mailto:)?[-\+\w]+@[-\w]+(?:\.\w[-\w]*)+
        |
        # or, try to find a hostname with our more specific sub-expression
        (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
        # Now ending .com, etc. For these, require lowercase
        (?-i: com\b
            | edu\b
            | biz\b
            | gov\b
            | in(?:t|fo)\b # .int or .info
            | mil\b
            | net\b
            | org\b
            | museum\b
            | aero\b
            | coop\b
            | name\b
            | pro\b
            | [a-z][a-z]\b # two-letter country codes
        )
      )?

      # Allow an optional port number
      (?: : \d+ )?

      # The rest of the URL is optional, and begins with / . . .
      (?:
        /?
        # The rest are heuristics for what seems to work well
        [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]*
        (?:
          [.!,?;:]+ [^.!,?;:"\'<>()\[\]{}\s\x7F-\xFF]+ #\'"
        )*
      )?
    )';
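A pattern written like the Textile one above, with whitespace and # comments inside it, only works when PCRE's extended mode (the "x" modifier) is switched on. The sketch below shows one way to wrap and apply such a pattern. The pattern in it is a hypothetical, much shorter stand-in, not the Textile regexp; the "~" delimiter is chosen because the pattern body itself contains both "/" and "#".

```php
<?php
// Hypothetical stand-in pattern (an assumption for illustration only).
$urlre = '(?:
    (?:ftp|https?)://       # scheme
    [-\w]+(?:\.\w[-\w]*)+   # hostname
    (?: : \d+ )?            # optional port
    (?: / [^\s<>"\']* )?    # optional path, up to whitespace, a quote or an angle bracket
)';

$text = 'See http://drupal.org/node/1 and ftp://example.com:21/pub today.';

// "x" makes PCRE ignore the layout whitespace and the # comments in the pattern.
preg_match_all('~' . $urlre . '~x', $text, $m);

print_r($m[0]);
// Array ( [0] => http://drupal.org/node/1 [1] => ftp://example.com:21/pub )
```

Unlike the Textile pattern, this stand-in has no heuristics for trailing punctuation, so a URL directly followed by a full stop or comma would swallow that character.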
participants (2)
- Bèr Kessels
- Jim Riggs