I am looking to alter the main user module to reject more non-alphanumeric
characters than Drupal does by default. As you can see below, I added a line to
reject underscores, because we have a mashup with MediaWiki and underscores
cause a major headache. I did so by disallowing Unicode character U+005F, which
is the standard underscore, and I emphasise "standard" here: there are about 3
other underscore-like characters!
Is there a simple way of disallowing accented characters, which also throw a
spanner in the MediaWiki machinery? I suppose I could always use the
admin/access rules to do this. Maybe that's better than further hacking core. I
had little choice with the underscore, because it is for some reason used as a
wildcard character in access rules, so I had to add it here:
/**
 * Verify the syntax of the given name.
 */
function user_validate_name($name) {
  if (!strlen($name)) return t('You must enter a username.');
  if (substr($name, 0, 1) == ' ') return t('The username cannot begin with a space.');
  if (substr($name, -1) == ' ') return t('The username cannot end with a space.');
  if (strpos($name, '  ') !== FALSE) return t('The username cannot contain multiple spaces in a row.');
  if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
  if (preg_match('/[\x{80}-\x{A0}'.          // Non-printable ISO-8859-1 + NBSP
                   '\x{AD}'.                 // Soft-hyphen
                   '\x{2000}-\x{200F}'.      // Various space characters
                   '\x{2028}-\x{202F}'.      // Bidirectional text overrides
                   '\x{205F}-\x{206F}'.      // Various text hinting characters
                   '\x{FEFF}'.               // Byte order mark
                   '\x{005F}'.               // Underscore
                   '\x{FF01}-\x{FF60}'.      // Full-width latin
                   '\x{FFF9}-\x{FFFD}'.      // Replacement characters
                   '\x{0}]/u',               // NULL byte
                   $name)) {
    return t('The username contains an illegal character.');
  }
  if (strpos($name, '@') !== FALSE && !eregi('@([0-9a-z](-?[0-9a-z])*.)+[a-z]{2}([zmuvtg]|fo|me)?$', $name)) return t('The username is not a valid authentication ID.');
  if (strlen($name) > USERNAME_MAX_LENGTH) return t('The username %name is too long: it must be %max characters or less.', array('%name' => $name, '%max' => USERNAME_MAX_LENGTH));
}
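The two character problems raised above (underscore look-alikes and accented
characters) could be tested along the following lines. This is only a sketch:
both function names are hypothetical and not part of Drupal core, and it assumes
usernames arrive as UTF-8. The first check relies on the Unicode "connector
punctuation" category (\p{Pc}), which includes U+005F along with the
underscore-like characters such as the full-width low line. The second is
deliberately blunt: it rejects every non-ASCII byte, so it blocks accented
letters (precomposed or built from combining marks) but also every non-Latin
script.

```php
<?php
// Hypothetical helpers for illustration; not part of Drupal core.

// TRUE if $name contains an underscore or an underscore-like character.
function example_name_has_connector_punctuation($name) {
  // \p{Pc} is the Unicode "connector punctuation" category. The /u flag makes
  // PCRE read $name as UTF-8; invalid UTF-8 makes preg_match() return FALSE,
  // which this function reports as "no match".
  return preg_match('/\p{Pc}/u', $name) === 1;
}

// TRUE if $name contains any character outside plain ASCII.
function example_name_has_non_ascii($name) {
  // Without /u PCRE works on bytes, and every byte of a multi-byte UTF-8
  // sequence is 0x80 or higher.
  return preg_match('/[^\x00-\x7F]/', $name) === 1;
}

var_dump(example_name_has_connector_punctuation('big_lebowski'));            // bool(true), U+005F
var_dump(example_name_has_connector_punctuation("big\xEF\xBC\xBFlebowski")); // bool(true), U+FF3F
var_dump(example_name_has_connector_punctuation('big-lebowski'));            // bool(false)

var_dump(example_name_has_non_ascii("ren\xC3\xA9"));   // bool(true), precomposed e-acute
var_dump(example_name_has_non_ascii("rene\xCC\x81"));  // bool(true), e + combining acute
var_dump(example_name_has_non_ascii('rene'));          // bool(false)
```

Whether checks like these belong in a patched user_validate_name() or in a
separate module is a separate question; the sketch only shows the patterns.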
I mentioned a similar problem on this mailing list a few months ago, and people
came back to me with ways in which the above code is not done very well. Of
particular interest was this line:
  if (ereg("[^\x80-\xF7 [:alnum:]@_.-]", $name)) return t('The username contains an illegal character.');
As I say, this is the core user.module, so this is on every Drupal site out
there. What does this line do, and is it indeed coded poorly? In an ideal world,
I would like to allow ONLY a-z, 0-9 and hyphens, and that's it! I may even offer
a project on the forum to rewrite this part of the module to allow only those
characters. Allowing only a, b and c, so to speak, seems better than disallowing
a whole raft of other characters.
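To make the contrast concrete, here is a rough sketch of both approaches. The
quoted ereg() line works on bytes: it rejects a name if any byte falls outside
0x80-0xF7, space, ASCII letters and digits, "@", "_", "." and "-". Since every
byte of a multi-byte UTF-8 character lies in 0x80-0xF7, any accented or
non-Latin character passes that check. The first function below is an
approximate preg_match() rendering of it (ereg() was deprecated in PHP 5.3 and
removed in PHP 7); the second is the whitelist described above. Both names are
hypothetical, and treating "a-z" as lowercase only is an assumption.

```php
<?php
// Hypothetical helpers for illustration; neither is part of Drupal core.

// Approximate byte-level equivalent of the quoted ereg() check:
// TRUE if $name contains a byte outside the permitted set.
function example_legacy_charset_rejects($name) {
  return preg_match('/[^\x80-\xF7 [:alnum:]@_.-]/', $name) === 1;
}

// Whitelist: TRUE only if $name is one or more of a-z, 0-9 or hyphen.
// The D modifier stops "$" from matching before a trailing newline.
// Assumption: lowercase only; add the i modifier to accept A-Z as well.
function example_strict_name_is_valid($name) {
  return preg_match('/^[a-z0-9-]+$/D', $name) === 1;
}

$lebowski = '*_The__<<Big>>__Lebowski_||*';

var_dump(example_legacy_charset_rejects("ren\xC3\xA9"));   // bool(false): accented name gets through
var_dump(example_legacy_charset_rejects('big_lebowski'));  // bool(false): underscore gets through
var_dump(example_legacy_charset_rejects($lebowski));       // bool(true)

var_dump(example_strict_name_is_valid('big-lebowski'));    // bool(true)
var_dump(example_strict_name_is_valid("ren\xC3\xA9"));     // bool(false)
var_dump(example_strict_name_is_valid('big_lebowski'));    // bool(false)
var_dump(example_strict_name_is_valid($lebowski));         // bool(false)
```

With a whitelist there is no need to enumerate every troublesome Unicode range;
anything not explicitly listed is refused.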
Mashing up Drupal and MediaWiki causes so many headaches when it comes to
allowable usernames, and there's always that one tenth of a percent who
absolutely MUST HAVE their username as @@|||||||| - -
*_The__<<Big>>__Lebowski_||* - - |||||||||@@
If anyone puts that into MediaWiki, our server will go up in smoke.
Neil