[support] preventing accented characters at registration

Neil: esl-lounge.com neil at esl-lounge.com
Wed Sep 26 10:26:22 UTC 2007


I want to modify the core user module so that it rejects more non-alphanumeric characters than Drupal does by default. As you can see below, I added a line to disallow underscores, because we have a mash-up with MediaWiki and underscores cause a major headache. I did so by disallowing Unicode character U+005F, the standard underscore, and I emphasise "standard" here: there are about three other underscore characters!

Is there a simple way of disallowing accented characters, which also throw a spanner in the MediaWiki machinery? I suppose I could always use the admin/access rules to do this, and maybe that's better than hacking core any further. I had little choice with the underscore, because for some reason the underscore is used as a wildcard character in access rules, so I had to add the check here:

/**
 * Verify the syntax of the given name.
 */
function user_validate_name($name) {
  if (!strlen($name)) return t('You must enter a username.');
  if (substr($name, 0, 1) == ' ') return t('The username cannot begin with a space.');
  if (substr($name, -1) == ' ') return t('The username cannot end with a space.');
  if (strpos($name, '  ') !== FALSE) return t('The username cannot contain multiple spaces in a row.');
  if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
  if (preg_match('/[\x{80}-\x{A0}'.          // Non-printable ISO-8859-1 + NBSP
                   '\x{AD}'.                 // Soft-hyphen
                   '\x{2000}-\x{200F}'.      // Various space characters
                   '\x{2028}-\x{202F}'.      // Bidirectional text overrides
                   '\x{205F}-\x{206F}'.      // Various text hinting characters
                   '\x{FEFF}'.               // Byte order mark
                   '\x{005F}'.               // Underscore
                   '\x{FF01}-\x{FF60}'.      // Full-width latin
                   '\x{FFF9}-\x{FFFD}'.      // Replacement characters
                   '\x{0}]/u',               // NULL byte
                   $name)) {
    return t('The username contains an illegal character.');
  }
  if (strpos($name, '@') !== FALSE && !eregi('@([0-9a-z](-?[0-9a-z])*.)+[a-z]{2}([zmuvtg]|fo|me)?$', $name)) return t('The username is not a valid authentication ID.');
  if (strlen($name) > USERNAME_MAX_LENGTH) return t('The username %name is too long: it must be %max characters or less.', array('%name' => $name, '%max' => USERNAME_MAX_LENGTH));
}
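One way to handle accented characters while keeping the blacklist style of the code above is one more range check. This is a minimal sketch, assuming the username is UTF-8; `mymodule_name_has_accents()` is a hypothetical name, and the chosen ranges (Latin-1 Supplement letters through Latin Extended-B, Latin Extended Additional, and the combining diacritical marks) are my assumption about what counts as "accented":

```php
<?php
/**
 * Hypothetical helper (not part of Drupal core): returns TRUE if $name
 * contains an accented Latin letter or a combining diacritical mark.
 * Assumes $name is UTF-8 encoded.
 */
function mymodule_name_has_accents($name) {
  $result = preg_match('/[\x{00C0}-\x{024F}'.   // Latin-1 Supplement letters, Latin Extended-A and -B
                       '\x{0300}-\x{036F}'.     // Combining diacritical marks (e.g. "e" followed by U+0301)
                       '\x{1E00}-\x{1EFF}]/u',  // Latin Extended Additional
                       $name);
  // preg_match() returns FALSE for malformed UTF-8 when /u is used;
  // treat that as a rejection too rather than letting the name through.
  return $result !== 0;
}
```

Inside user_validate_name() it could be called as `if (mymodule_name_has_accents($name)) return t('The username contains an illegal character.');`. The first range also catches × and ÷ (U+00D7 and U+00F7), which are not letters, but rejecting them in a username is arguably no loss.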

I raised a similar problem on this mailing list a few months ago, and people came back to me with several ways in which the above code was poorly written. Of particular interest was this line:

 if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');

As I say, this is the core user.module, so this code is on every Drupal site out there. What does this line do, and is it indeed poorly coded? In an ideal world, I would like to allow ONLY a-z, 0-9 and hyphens, and that's it! I may even offer a project on the forum to rewrite this part of the module so that it allows only those characters. Allowing only a, b and c, so to speak, seems better than disallowing a whole raft of other characters.
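Reading the line piece by piece answers the first question. Below is a sketch of the same test ported to preg_match(). This is my port, not core's code: ereg() was deprecated in PHP 5.3 and removed in PHP 7. The function name `core_name_has_illegal_char()` is made up for illustration:

```php
<?php
/**
 * Hypothetical port of the core check
 *   ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)
 * Returns TRUE if $name contains a character the core line calls illegal.
 */
function core_name_has_illegal_char($name) {
  // The class is negated ([^...]), so any byte NOT listed here is illegal:
  //   \x80-\xF7  covers the bytes of multibyte UTF-8 characters, so ALL
  //              non-ASCII characters pass, accented letters included;
  //              bytes 0xF8-0xFF never occur in UTF-8 and are rejected;
  //   ' '        a plain space;
  //   [:alnum:]  ASCII letters and digits (in the default C locale);
  //   @ _ . -    the only ASCII punctuation allowed.
  // No /u modifier: the pattern works on raw bytes, as ereg() did.
  return preg_match('/[^\x80-\xF7 [:alnum:]@_.-]/', $name) === 1;
}
```

So the line is a blacklist of ASCII punctuation and control characters that lets every non-ASCII character through, which is why core needs the second preg_match() to pick out the troublesome Unicode ranges. That design is the main reason accented letters and look-alike underscores slip past it.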

Mashing up Drupal and MediaWiki causes so many headaches over allowable usernames, and there's always that one tenth of a percent who absolutely MUST HAVE their username as @@|||||||| - - *_The__<<Big>>__Lebowski_||* - - |||||||||@@

If anyone puts that into MediaWiki, our server will go up in smoke.
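The allow-only approach wished for above is much shorter than any blacklist. This is a minimal sketch with the hypothetical name `mymodule_name_is_whitelisted()` that accepts only the characters listed (a-z, 0-9 and hyphen):

```php
<?php
/**
 * Hypothetical allow-list check: accept ONLY lowercase a-z, digits 0-9
 * and hyphens. Returns TRUE when $name is acceptable.
 */
function mymodule_name_is_whitelisted($name) {
  // \A and \z anchor the whole string; unlike $, \z does not accept a
  // trailing newline. The + requires at least one character.
  return preg_match('/\A[a-z0-9-]+\z/', $name) === 1;
}
```

The Lebowski name fails on its very first character, with no need to enumerate every Unicode underscore or accent. Add A-Z (or the /i modifier) if capitals should be allowed. The check could be called from user_validate_name(), or possibly from a module's hook_user() 'validate' operation to avoid hacking core.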

Neil