I am looking to alter the core user module so that it rejects more non-alphanumeric characters than Drupal does by default. As you can see below, I added a line to reject underscores, because we have a mash-up with MediaWiki and underscores cause a major headache there. I did so by disallowing Unicode character U+005F, which is the standard underscore, and I emphasise "standard" here: there are about 3 other underscore characters!
Is there a simple way of disallowing accented characters, which also throw a spanner in the MediaWiki machinery? I suppose I could always use the access rules under admin/access to do this. Maybe that's better than hacking core any further. I had little choice with the underscore, because it is for some reason used as a wildcard character in access rules, so I had to add it here:
/**
 * Verify the syntax of the given name.
 */
function user_validate_name($name) {
  if (!strlen($name)) return t('You must enter a username.');
  if (substr($name, 0, 1) == ' ') return t('The username cannot begin with a space.');
  if (substr($name, -1) == ' ') return t('The username cannot end with a space.');
  if (strpos($name, '  ') !== FALSE) return t('The username cannot contain multiple spaces in a row.');
  if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
  if (preg_match('/[\x{80}-\x{A0}'.     // Non-printable ISO-8859-1 + NBSP
                 '\x{AD}'.              // Soft-hyphen
                 '\x{2000}-\x{200F}'.   // Various space characters
                 '\x{2028}-\x{202F}'.   // Bidirectional text overrides
                 '\x{205F}-\x{206F}'.   // Various text hinting characters
                 '\x{FEFF}'.            // Byte order mark
                 '\x{005F}'.            // Underscore
                 '\x{FF01}-\x{FF60}'.   // Full-width latin
                 '\x{FFF9}-\x{FFFD}'.   // Replacement characters
                 '\x{0}]/u',            // NULL byte
                 $name)) {
    return t('The username contains an illegal character.');
  }
  if (strpos($name, '@') !== FALSE && !eregi('@([0-9a-z](-?[0-9a-z])*.)+[a-z]{2}([zmuvtg]|fo|me)?$', $name)) return t('The username is not a valid authentication ID.');
  if (strlen($name) > USERNAME_MAX_LENGTH) return t('The username %name is too long: it must be %max characters or less.', array('%name' => $name, '%max' => USERNAME_MAX_LENGTH));
}
I mentioned a similar problem on this mailing list a few months ago, and people came back to me with ways in which the above code was not done very well. Of particular interest was this line:
if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
As I say, this is the core user.module, so this is on every Drupal site out there. What does this line do, and is it indeed coded poorly? In an ideal world, I would like to allow ONLY a-z, 0-9 and hyphens and that's it! I may even offer a project on the forum to rewrite this part of the module to allow only those characters. Allowing only a, b and c, so to speak, seems better than disallowing a whole raft of other characters.
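The "allow only a-z, 0-9 and hyphens" idea can be sketched as a single anchored pattern. This is a minimal illustration, not code from Drupal core: the function name `example_name_is_strict` is made up for the example, and it assumes lowercase-only names are really what is wanted.

```php
<?php
/**
 * Hypothetical helper (not part of Drupal core): returns TRUE only when
 * $name consists entirely of a-z, 0-9 and hyphens.
 */
function example_name_is_strict($name) {
  // Anchored allow-list: ^ and \z force the whole string to match, so any
  // other byte (underscore, space, uppercase letter, accented letter) fails.
  // The + also rejects the empty string.
  return preg_match('/^[a-z0-9-]+\z/', $name) === 1;
}

var_dump(example_name_is_strict('the-big-lebowski'));  // bool(true)
var_dump(example_name_is_strict('The_Big_Lebowski'));  // bool(false)
var_dump(example_name_is_strict("caf\xC3\xA9"));       // bool(false), UTF-8 "café"
```

If uppercase letters should also pass, `A-Z` would need to be added to the class. The pattern uses `\z` rather than `$` so that a trailing newline is not silently accepted.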
Mashing up Drupal and MediaWiki causes so many headaches when it comes to allowable usernames, and there's always that one tenth of a percent who absolutely MUST HAVE their username as @@|||||||| - - *_The__<<Big>>__Lebowski_||* - - |||||||||@@
If anyone puts that into MediaWiki, our server will go up in smoke.
Neil
if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
As I say, this is the core user.module, so this is on every Drupal site out there. What does this line do, and is it indeed coded poorly? In an ideal world, I would like to allow ONLY a-z, 0-9 and hyphens and that's it! I may even offer a project on the forum to rewrite this part of the module to allow only those characters. Allowing only a, b and c, so to speak, seems better than disallowing a whole raft of other characters.
Of course, this is a POSIX regular expression match. I think it was originally designed to filter out characters hex 80 to hex F7, as well as @_.- and all numbers? Kinda weird, but I think it's broken. I made a test page with just this code, and no matter what I passed in, I couldn't seem to get it to fire.
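Read literally, the leading ^ inside the brackets negates the set, so the pattern matches any single byte that is NOT one of: a byte from 0x80 to 0xF7, a space, a letter or digit, or one of @ _ . and -. In other words, it is an allow-list written as a negated class, and accented characters get through because their UTF-8 bytes fall inside the 0x80 to 0xF7 range. Below is a small sketch of that reading. It swaps in `preg_match()` for `ereg()` (which was deprecated in PHP 5.3 and removed in PHP 7) on the assumption that bytewise matching behaves the same for these inputs. The function name is hypothetical.

```php
<?php
/**
 * Hypothetical stand-in for the ereg() check, written with preg_match().
 * Returns TRUE when $name contains at least one byte outside the allowed set.
 */
function example_has_illegal_byte($name) {
  // [^...] is a negated class: it matches one byte that is NOT
  //   \x80-\xF7   a high byte (covers multi-byte UTF-8 sequences)
  //   ' '         a space
  //   [:alnum:]   a letter or digit (ASCII only in the default C locale)
  //   @ _ . -     the four listed punctuation characters
  // There is no /u modifier, so the subject is treated as raw bytes.
  return preg_match('/[^\x80-\xF7 [:alnum:]@_.-]/', $name) === 1;
}

var_dump(example_has_illegal_byte('neil'));              // bool(false)
var_dump(example_has_illegal_byte('The_Big-Lebowski'));  // bool(false)
var_dump(example_has_illegal_byte("caf\xC3\xA9"));       // bool(false): accented letters pass
var_dump(example_has_illegal_byte('<<Big>>'));           // bool(true)
var_dump(example_has_illegal_byte('a|b'));               // bool(true)
```

On this reading, the check only fires for names containing characters such as | < > or *. Plain alphanumeric test names would never trigger it.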
I'd recommend taking this to the devel list to find out if it's a bug, or file an issue on drupal.org.
If you find that you want to increase the filtering in user.module, you ought to be able to write a module that uses hook_form_alter to change how username validation gets handled, rather than hacking code. But given the existence of this regex code, I'd want to check with the Drupal developers what this code is intended to do.
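A rough sketch of the hook_form_alter approach follows. It assumes the Drupal 5 form API (where `hook_form_alter()` receives `$form_id` first and `#validate` maps callback names to argument arrays) and the form IDs `user_register` and `user_edit`. The module name `strictnames` and all function names are hypothetical, so check everything against the API documentation for the Drupal version in use.

```php
<?php
// Sketch of strictnames.module (hypothetical module name), assuming the
// Drupal 5 form API signatures.

/**
 * Implementation of hook_form_alter(): attach an extra validator to the
 * registration and account edit forms.
 */
function strictnames_form_alter($form_id, &$form) {
  if ($form_id == 'user_register' || $form_id == 'user_edit') {
    // Drupal 5 stores validators as callback name => argument array.
    $form['#validate']['strictnames_validate_name'] = array();
  }
}

/**
 * Extra validator: reject names with anything other than a-z, 0-9, hyphen.
 * form_set_error() and t() are provided by Drupal at run time.
 */
function strictnames_validate_name($form_id, $form_values) {
  if (isset($form_values['name']) && !strictnames_name_ok($form_values['name'])) {
    form_set_error('name', t('The username may contain only a-z, 0-9 and hyphens.'));
  }
}

/**
 * Pure check, kept separate so it can be tested outside Drupal.
 */
function strictnames_name_ok($name) {
  return preg_match('/^[a-z0-9-]+\z/', $name) === 1;
}
```

The extra validator runs in addition to core's user_validate_name(), so core stays unmodified and the stricter rule lives entirely in the add-on module.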
Dave
________________________________
From: support-bounces@drupal.org [mailto:support-bounces@drupal.org] On Behalf Of Neil: esl-lounge.com
Sent: Wednesday, September 26, 2007 3:26 AM
To: support@drupal.org
Subject: [support] preventing accented characters at registration