[support] preventing accented characters at registration

Metzler, David metzlerd at evergreen.edu
Wed Sep 26 16:09:47 UTC 2007

 if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');


As I say, this is the core user.module so this is on every Drupal site
out there. What does this line do and is it indeed coded poorly? In an
ideal world, I would like to allow ONLY a-z, 0-9 and hyphens and that's
it! I may even offer a project on the forum to rewrite this part of the
module to allow only those characters. It seems better than disallowing
a whole raft of other characters: to allow only a, b and c, so to speak.


Of course this is a POSIX regular expression match. I think it was
originally designed to filter out characters hex 80 to hex F7, as well
as @_.- and all numbers? Kinda weird, but I think it's broken. I made a
test page with just this code, and no matter what I passed, I couldn't
seem to get it to fire.
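
For what it's worth, a leading ^ inside a bracket expression negates
it, so one way to read the core line is that it fires on any byte that
is NOT in the listed set (bytes 0x80 to 0xF7, space, ASCII letters and
digits, @, _, ., -). The sketch below only approximates that check: it
swaps ereg() for a byte-wise preg_match() without the /u flag, since
ereg() is gone from current PHP, and the helper name has_illegal_byte()
is made up for this example.

```php
<?php
// Hypothetical helper, not part of user.module: approximates the core
// ereg() check with a byte-wise preg_match() (no /u flag).
function has_illegal_byte($name) {
  // Negated class: matches one byte that is NOT in 0x80-0xF7, a space,
  // an ASCII letter or digit, or one of @ _ . -
  return preg_match('/[^\x80-\xF7 [:alnum:]@_.\-]/', $name) === 1;
}

var_dump(has_illegal_byte('neil'));         // false: plain ASCII name
var_dump(has_illegal_byte('big_lebowski')); // false: underscore is in the set
var_dump(has_illegal_byte('<<Big>>'));      // true: < and > are outside the set
var_dump(has_illegal_byte("Ren\xC3\xA9"));  // false: UTF-8 bytes C3 A9 fall in 0x80-0xF7
```

If that reading is right, accented names pass because their UTF-8 bytes
all land in the 0x80 to 0xF7 range.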


I'd recommend taking this to the devel list to find out if it's a bug,
or filing an issue on drupal.org.


If you find that you want to increase the filtering on user.module, you
ought to be able to write a module that uses hook_form_alter to alter
the way user name validation gets handled, rather than hacking code.
But given the existence of this regex code, I'd want to check with the
Drupal developers about what this code is intended to do.
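
A rough sketch of the hook_form_alter approach follows. It assumes the
Drupal 5-era signatures (hook_form_alter($form_id, &$form), and
#validate as an array keyed by callback name), and that the
registration form ID is 'user_register' with the username in the 'name'
field; check all of these against the API docs for your version. The
module name strictnames and every function in it are hypothetical.

```php
<?php
// Hypothetical module "strictnames": adds a stricter username check
// without editing user.module. Signatures assume the Drupal 5 form API.

// Pure helper: TRUE if the name has no underscore and no non-ASCII byte.
function strictnames_is_acceptable($name) {
  if (strpos($name, '_') !== FALSE) {
    return FALSE;
  }
  // Any byte above 0x7F means a non-ASCII (e.g. accented) character.
  return preg_match('/[^\x00-\x7F]/', $name) !== 1;
}

// Implementation of hook_form_alter() (assumed Drupal 5 signature).
function strictnames_form_alter($form_id, &$form) {
  if ($form_id == 'user_register') {
    // Assumed Drupal 5 convention: callback name => extra arguments.
    $form['#validate']['strictnames_validate'] = array();
  }
}

// Validation callback (assumed Drupal 5 signature).
function strictnames_validate($form_id, $form_values) {
  if (!strictnames_is_acceptable($form_values['name'])) {
    form_set_error('name', t('The username may not contain underscores or accented characters.'));
  }
}
```

Keeping the check in a plain helper means it can be tried on a test
page without loading Drupal at all.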




From: support-bounces at drupal.org [mailto:support-bounces at drupal.org] On
Behalf Of Neil: esl-lounge.com
Sent: Wednesday, September 26, 2007 3:26 AM
To: support at drupal.org
Subject: [support] preventing accented characters at registration


I am looking to alter the main user module to prevent more
non-alphanumeric characters than Drupal does by default. I added, as you
can see, a line to prevent underscores, as we have a mashup with
MediaWiki and underscores cause a major headache. I did so by
disallowing Unicode character 005F, which is a standard underscore, and I
emphasise "standard" here: there are about 3 other underscore


Is there a simple way of disallowing accented characters, which also
throw a spanner in the MediaWiki machinery? I suppose I could always use
the admin/access rules to do this. Maybe it's better than further
hacking core. I had little choice with underscore because underscore is
for some reason used as a wildcard character in access rules, so I had to
add it here:


/**
 * Verify the syntax of the given name.
 */
function user_validate_name($name) {
  if (!strlen($name)) return t('You must enter a username.');
  if (substr($name, 0, 1) == ' ') return t('The username cannot begin with a space.');
  if (substr($name, -1) == ' ') return t('The username cannot end with a space.');
  if (strpos($name, '  ') !== FALSE) return t('The username cannot contain multiple spaces in a row.');
  if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
  if (preg_match('/[\x{80}-\x{A0}'.          // Non-printable ISO-8859-1
                   '\x{AD}'.                 // Soft-hyphen
                   '\x{2000}-\x{200F}'.      // Various space characters
                   '\x{2028}-\x{202F}'.      // Bidirectional text
                   '\x{205F}-\x{206F}'.      // Various text hinting
                   '\x{FEFF}'.               // Byte order mark
                   '\x{005F}'.               // Underscore
                   '\x{FF01}-\x{FF60}'.      // Full-width latin
                   '\x{FFF9}-\x{FFFD}'.      // Replacement characters
                   '\x{0}]/u',               // NULL byte
                   $name)) {
    return t('The username contains an illegal character.');
  }
  if (strpos($name, '@') !== FALSE &&
|fo|me)?$'> , $name)) return t('The username is not a valid
authentication ID.');
  if (strlen($name) > USERNAME_MAX_LENGTH) return t('The username %name is too long: it must be %max characters or less.', array('%name' => $name, '%max' => USERNAME_MAX_LENGTH));
}


I mentioned a similar problem to this on this mailing list a few months
ago and I had people coming back to me with ways in which the above code
was not done very well. Of particular interest was this line:


 if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');


As I say, this is the core user.module so this is on every Drupal site
out there. What does this line do and is it indeed coded poorly? In an
ideal world, I would like to allow ONLY a-z, 0-9 and hyphens and that's
it! I may even offer a project on the forum to rewrite this part of the
module to allow only those characters. It seems better than disallowing
a whole raft of other characters: to allow only a, b and c, so to speak.
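
A minimal sketch of that allow-only approach, assuming lowercase a-z,
digits and hyphens are the only permitted characters. The function name
is made up for illustration, and this is not how user.module does it.

```php
<?php
// Hypothetical allowlist check: the whole name must consist of a-z, 0-9
// and hyphens. The D modifier stops $ from matching before a trailing
// newline.
function username_is_strictly_plain($name) {
  return preg_match('/^[a-z0-9-]+$/D', $name) === 1;
}

var_dump(username_is_strictly_plain('neil-99'));      // true
var_dump(username_is_strictly_plain('Big_Lebowski')); // false: uppercase and underscore
var_dump(username_is_strictly_plain("Ren\xC3\xA9"));  // false: accented character
```

Add the i modifier if uppercase letters should pass as well.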


Mashing up Drupal and MediaWiki causes so many headaches when it comes
to allowable usernames and there's always that one tenth of a percent
who absolutely MUST HAVE their username as @@|||||||| - -
*_The__<<Big>>__Lebowski_||* - - |||||||||@@


If anyone puts that into MediaWiki, our server will go up in smoke.


