OK, so let's assume you're doing all your string-encoding tasks right. You've not got any SQL injections, HTML injections, or places where you're not URL-encoding something you should. So we don't need to worry about characters like "<&%\ being magic in some contexts. And you're using UTF-8 for everything so all of Unicode is in play. What other reasons are there to limit usernames?
To start with, exclude all control characters, for sanity. There is no reason to have characters U+0000 to U+001F or U+007F to U+009F in a username.
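As a rough sketch of that check in Python (the function name `has_control_chars` is made up for illustration): the Unicode general category `Cc` covers exactly the two ranges mentioned above, so testing the category avoids hard-coding them.

```python
import unicodedata

def has_control_chars(name: str) -> bool:
    """Return True if the name contains any C0/C1 control character.

    General category "Cc" is U+0000..U+001F plus U+007F..U+009F.
    """
    return any(unicodedata.category(ch) == "Cc" for ch in name)
```

A signup handler could simply reject any name for which this returns `True`.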
Next, deny or normalise unexpected whitespace. You may want to allow a space in a username, but you almost certainly don't want to allow leading spaces, trailing spaces, or more than one space in a row. They may render the same in HTML, but are probably a user error that will confuse.
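One possible way to do the "normalise" variant in Python, as a sketch only (the names `normalise_spaces` and `has_unexpected_whitespace` are hypothetical): trim both ends and collapse any run of whitespace to a single space. If you would rather deny than silently fix, compare the input with its normalised form.

```python
import re

def normalise_spaces(name: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"\s+", " ", name.strip())

def has_unexpected_whitespace(name: str) -> bool:
    """True if the name would be changed by normalise_spaces()."""
    return name != normalise_spaces(name)
```

Note that `\s` on a `str` pattern also matches Unicode whitespace such as the no-break space, so those get folded to a plain space too.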
If you intend to allow that username to be used to log in through HTTP Basic Authentication, you must disallow the : character, because the Basic Auth scheme encodes a ‘username:password’ pair with no escaping if there's a colon in the username or password. So at least one of the username and password must exclude the colon, and it's better that that's the username, because restricting people's choice of passwords is much worse than restricting usernames.
For Basic Authentication you may also want to disable all non-ASCII characters, as they are handled differently by different browsers. IE encodes them using the system codepage; Firefox encodes them using ISO-8859-1; Opera encodes them using UTF-8. Users should at least be warned before choosing non-ASCII names if HTTP Auth is going to be available, as actually using them will be very unreliable.
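To make both Basic Auth problems concrete, here is a small Python sketch. The helper names are invented for this example, and the parser assumes the usual convention of splitting at the first colon; it is not meant as a complete implementation of the scheme.

```python
import base64
from typing import Tuple

def basic_auth_header(username: str, password: str, encoding: str = "utf-8") -> str:
    """Build an Authorization header value; note there is no escaping of ':'."""
    raw = f"{username}:{password}".encode(encoding)
    return "Basic " + base64.b64encode(raw).decode("ascii")

def parse_basic_auth(header: str, encoding: str = "utf-8") -> Tuple[str, str]:
    """Decode the header and split at the first colon."""
    token = header.split(" ", 1)[1]
    raw = base64.b64decode(token).decode(encoding)
    username, _, password = raw.partition(":")
    return username, password

# A colon in the username is indistinguishable from the separator:
ambiguous = parse_basic_auth(basic_auth_header("a:b", "c"))  # ("a", "b:c")

# The same non-ASCII name produces different headers under different encodings:
utf8_header = basic_auth_header("Zo\u00eb", "x", "utf-8")
latin1_header = basic_auth_header("Zo\u00eb", "x", "iso-8859-1")
```

The server sees only bytes, so unless it knows which encoding the client used, `utf8_header` and `latin1_header` look like two different users.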
Next, consider other Unicode control characters, such as the bidi overrides and the other characters that Unicode lists as unsuitable for use in markup. You are probably going to end up putting usernames in markup, and you don't want someone with an RLO in their name to turn a load of the text in your page backwards.
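A minimal Python sketch of a bidi check follows. The set below covers the directional marks, embeddings, overrides, and isolates that I'm aware of; treat it as a starting point rather than a complete list of characters unsuitable for markup, and note that `BIDI_CONTROLS` and `has_bidi_controls` are names made up here.

```python
# Bidirectional formatting characters (all in general category Cf).
BIDI_CONTROLS = frozenset(
    "\u200e\u200f"                      # LRM, RLM
    "\u061c"                            # ALM
    "\u202a\u202b\u202c\u202d\u202e"    # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"          # LRI, RLI, FSI, PDI
)

def has_bidi_controls(name: str) -> bool:
    """Return True if the name contains any bidi formatting character."""
    return any(ch in BIDI_CONTROLS for ch in name)
```

Rejecting the whole `Cf` category would be simpler, but it would also ban the zero-width joiner and non-joiner, which some scripts legitimately need, so an explicit list seems the safer default.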
Also, if you allow Unicode, do normalisation on the strings you get. Otherwise someone may have a username with a composed o-umlaut character ö, and wonder why they can't log in on a Mac, which by default would use a separate o character followed by a combining umlaut. It's usual to normalise to the composed form NFC on the web. You may also want to do compatibility decompositions by using the form NFKC; this would allow a user Chris to log in from a Japanese keyboard in fullwidth romaji mode typing Ｃｈｒｉｓ. These are general issues it is good to solve for all your webapp's input, but for identifiers like usernames it can be more critical to get right.
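In Python the standard library's `unicodedata.normalize` does this. The wrapper below is only a sketch (`normalise_username` and its `compatibility` flag are hypothetical names); the point is to apply the same form both when the account is created and on every login attempt.

```python
import unicodedata

def normalise_username(name: str, compatibility: bool = False) -> str:
    """Normalise to NFC, or to NFKC when compatibility folding is wanted."""
    form = "NFKC" if compatibility else "NFC"
    return unicodedata.normalize(form, name)

# Decomposed o + combining diaeresis becomes the single composed character.
composed = normalise_username("o\u0308")

# Fullwidth letters only fold to ASCII under the compatibility form.
fullwidth = "\uff23\uff48\uff52\uff49\uff53"  # fullwidth "Chris"
folded = normalise_username(fullwidth, compatibility=True)
```

With plain NFC the fullwidth string is left unchanged, which is why NFKC is the one to pick if you want the two spellings of Chris to be the same account.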
Finally, make sure the length is OK to fit in the database without a silent truncation changing the name, especially if you are storing as UTF-8 bytes which you don't want to get snipped halfway through a byte sequence. Username truncations can also be a security issue in general.
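A sketch of a length check that counts encoded bytes rather than characters, so the name is rejected up front instead of being cut mid-sequence. `MAX_USERNAME_BYTES` is an assumed column size chosen for the example, not a recommendation.

```python
MAX_USERNAME_BYTES = 64  # hypothetical size of the database column, in bytes

def fits_column(name: str, max_bytes: int = MAX_USERNAME_BYTES) -> bool:
    """Return True if the UTF-8 encoding of the name fits in max_bytes."""
    return len(name.encode("utf-8")) <= max_bytes
```

For instance, 33 copies of ö are only 33 characters but 66 bytes in UTF-8, so they would not fit a 64-byte column.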
If you are using usernames as a unique means of identification, you have much more to worry about: the already-mentioned problem of lookalikes such as Сhris (with a Cyrillic Es С). There are too many of these for you to handle reasonably; either restrict to ASCII or have an additional means of identifying users. (Or don't care, like SO doesn't; when I can easily call myself Chris anyway I have no need to call myself С-hris.)
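To see the lookalike problem in code, here is a short Python sketch. It only demonstrates the ASCII-restriction option and a way to inspect what a name really contains; it makes no attempt at general confusable detection, and the function names are invented for the example.

```python
import unicodedata
from typing import List

def is_ascii_username(name: str) -> bool:
    """Return True if every character is in the ASCII range."""
    return name.isascii()

def describe(name: str) -> List[str]:
    """List the Unicode character name of each code point, for debugging."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in name]

latin = "Chris"
lookalike = "\u0421hris"  # first letter is CYRILLIC CAPITAL LETTER ES

same = latin == lookalike            # False, although they render alike
first = describe(lookalike)[0]       # "CYRILLIC CAPITAL LETTER ES"
```

The two strings compare unequal, so a plain uniqueness constraint in the database will happily accept both.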
It depends on many things. For instance, if the users are going to have their own URL, you want to be careful that someone who creates the username "%41llan" doesn't clash with the user called "Allan", and allowing a forward slash may cause problems. Look out for those sorts of constraints.
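A sketch of how that might be checked in Python with `urllib.parse`. The `/users/` path prefix and both function names are assumptions made for the example.

```python
from urllib.parse import quote, unquote

def is_url_ambiguous(name: str) -> bool:
    """True if the name contains a slash or changes when percent-decoded."""
    return "/" in name or unquote(name) != name

def profile_path(name: str) -> str:
    """Build a profile path, percent-encoding everything unsafe (including '/')."""
    return "/users/" + quote(name, safe="")
```

You could either reject ambiguous names at signup or always build links through something like `profile_path`, which encodes the `%` itself so that "%41llan" and "Allan" end up at different URLs.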
I've never seen the point in adding restrictions to usernames. If your code is resistant to SQL injection attacks, then let them put in anything they want.
The only restriction I'd add is a max length, so that it can be stored in a DB table.
Let them use any Unicode character in their username. Adding restrictions on the allowed characters will probably just annoy people using a non-ASCII language.
SQL injection protection is a must, but that should probably be in your code, not in username restrictions. Certain characters should definitely be escaped, like \, %, etc.
It will depend on what kind of site you're running, but I think some obscene word restrictions would make your site look more professional no matter what. If someone sees that people are allowed to go around with "EXPLETIVE" as their username, your site will look childish. It's like allowing teenagers to run rampant in your book store, IMHO. You probably don't need to get much more picky than that, although it's completely up to you.
This is slightly off topic, but as another piece of username advice, a great feature of any website is allowing users to change their username over time. You can just have a number as a primary key, and allowing them to do this can save a lot of whining and people creating new accounts because they wanted to change their username. :D
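Roughly what the numeric-key idea looks like, sketched with Python's built-in `sqlite3` module. The table and column names are made up for the example; other tables reference `users.id`, so a rename touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id), body TEXT)"
)

def create_user(conn: sqlite3.Connection, username: str) -> int:
    """Insert a user and return the numeric id that everything else refers to."""
    cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
    return cur.lastrowid

def rename_user(conn: sqlite3.Connection, user_id: int, new_username: str) -> None:
    """Change the display name; rows in other tables are untouched."""
    conn.execute(
        "UPDATE users SET username = ? WHERE id = ?", (new_username, user_id)
    )
```

The `?` placeholders also keep the username out of the SQL text entirely, whatever characters it contains.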