How to correctly implement Unicode passwords?
Adding support for Unicode passwords is an important feature that developers should not ignore.
Still, supporting Unicode in passwords is tricky, because the same text can be represented by different code point sequences in Unicode, and you don't want that to prevent people from logging in.
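To make the problem concrete, here is a small sketch using Python's standard `unicodedata` module. The variable names are only illustrative. It shows two strings that render identically as "é" but differ at the code point and byte level, so a naive byte comparison (or a hash of the bytes) would treat them as different passwords.

```python
import unicodedata

# The same visible text, typed or produced in two different ways.
composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# The raw strings and their UTF-8 bytes differ.
print(composed == decomposed)        # False
print(composed.encode("utf-8"))      # b'\xc3\xa9'
print(decomposed.encode("utf-8"))    # b'e\xcc\x81'

# After normalizing both to the same form, they compare equal.
same_nfc = (unicodedata.normalize("NFC", composed)
            == unicodedata.normalize("NFC", decomposed))
same_nfd = (unicodedata.normalize("NFD", composed)
            == unicodedata.normalize("NFD", decomposed))
print(same_nfc, same_nfd)            # True True
```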
Assume the passwords are stored as UTF-8. Note that this question is not about Unicode encodings; it is about Unicode normalization.
So the question is: how should you normalize the Unicode data?
You have to be sure that you will be able to compare it, and that the release of the next Unicode standard will not invalidate your password verification.
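One common way to address the "next Unicode version" concern is to refuse passwords that contain code points not yet assigned in the Unicode version you normalize with. The normalization stability guarantees cover assigned characters, while an unassigned code point may later be given a decomposition or combining class that changes the normalized result. The sketch below is an assumption about how such a check could look, not something prescribed in this thread; `has_unassigned` and `check_assigned` are hypothetical helper names.

```python
import unicodedata

def has_unassigned(text: str) -> bool:
    """Return True if any code point has general category Cn
    (unassigned or noncharacter) in this Python's Unicode database."""
    return any(unicodedata.category(ch) == "Cn" for ch in text)

def check_assigned(password: str) -> str:
    """Reject passwords whose normalization could change in a later
    Unicode version (hypothetical policy, adjust to your needs)."""
    if has_unassigned(password):
        raise ValueError("password contains unassigned code points")
    return password

# The Unicode version this check is based on depends on the Python build.
print("Unicode database version:", unicodedata.unidata_version)
```

The trade-off is that a character added in a newer Unicode version cannot be used until the server's Unicode data is updated.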
Note: still there are some places where Unicode passwords will probably never be used, but this question is not about why or when to use Unicode passwords, it is about how to implement them in the proper way.
1st update
Is it possible to implement this without using ICU, for example by using the operating system's own normalization facilities?
Comments (2)
A good start is to read Unicode TR 15: Unicode Normalization Forms. Then you realize that it is a lot of work and prone to strange errors - you probably already know this part since you are asking here. Finally, you download something like ICU and let it do it for you.
IIRC, it is a multistep process. First you decompose the sequence until it cannot be decomposed any further; for example, é (U+00E9) becomes e followed by the combining acute accent U+0301. Then you reorder the combining marks into a well-defined canonical order. Finally, you encode the resulting code point sequence using UTF-8 or something similar. The UTF-8 byte stream can be fed into the cryptographic hash algorithm of your choice and the result kept in a persistent store. When you want to check whether a password matches, perform the same procedure and compare the output of the hash algorithm with what is stored in the database.
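The procedure above can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not a vetted implementation: decomposition plus canonical reordering corresponds to the NFD form, which `unicodedata.normalize` performs in one call. PBKDF2-HMAC-SHA256 with a random salt is my own choice standing in for "the hash algorithm of your choice". The function names and the iteration count are hypothetical.

```python
import hashlib
import hmac
import os
import unicodedata

ITERATIONS = 200_000  # assumed work factor; tune for your hardware

def prepare(password: str, form: str = "NFD") -> bytes:
    """Normalize (decompose + canonical reorder), then encode as UTF-8."""
    return unicodedata.normalize(form, password).encode("utf-8")

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to keep in the persistent store."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", prepare(password), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Repeat the same procedure and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", prepare(password), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

# Registration with precomposed é, login with e + combining accent.
salt, stored = hash_password("caf\u00e9")
print(verify_password("cafe\u0301", salt, stored))  # True
print(verify_password("cafe", salt, stored))        # False
```

Whichever normalization form is chosen, it has to be the same at registration and at every later login, so it is worth recording it alongside the stored hash.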
A question back to you: can you explain why you added "without using ICU"? I see a lot of questions asking for things that ICU does (we* think) pretty well, but "without using ICU". Just curious.
Secondly, you may be interested in StringPrep/NamePrep and not just normalization: StringPrep is a framework for mapping strings so that they can be compared.
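As a rough illustration of what a StringPrep profile does, the sketch below approximates SASLprep (RFC 4013, a StringPrep profile intended for user names and passwords) using the RFC 3454 tables in Python's standard `stringprep` module. It is deliberately simplified: the bidirectional-text check is omitted, and `saslprep_sketch` is a hypothetical name, so treat it as a reading aid rather than a conforming implementation.

```python
import stringprep
from unicodedata import ucd_3_2_0  # StringPrep is defined against Unicode 3.2

# Tables that SASLprep prohibits in the output, plus A.1 (unassigned
# code points), which must not appear in stored strings.
PROHIBITED = (
    stringprep.in_table_a1,
    stringprep.in_table_c12,
    stringprep.in_table_c21_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)

def saslprep_sketch(password: str) -> str:
    # Step 1, map: non-ASCII spaces become U+0020, and characters
    # "commonly mapped to nothing" (table B.1) are dropped.
    mapped = []
    for ch in password:
        if stringprep.in_table_c12(ch):
            mapped.append(" ")
        elif not stringprep.in_table_b1(ch):
            mapped.append(ch)
    # Step 2, normalize: compatibility composition (NFKC).
    normalized = ucd_3_2_0.normalize("NFKC", "".join(mapped))
    # Step 3, prohibit: reject control characters, private use, etc.
    for ch in normalized:
        if any(check(ch) for check in PROHIBITED):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    # Step 4 (bidirectional check) is intentionally left out of this sketch.
    return normalized

print(saslprep_sketch("I\u00adX"))  # soft hyphen removed -> IX
print(saslprep_sketch("\u2168"))    # ROMAN NUMERAL NINE -> IX
```

Because the tables are pinned to Unicode 3.2, this sketch rejects any character assigned later, which includes most emoji.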
Thirdly, you may be interested in UTR#36 and UTR#39 for other Unicode security implications.
*(disclosure: ICU developer :)