Which Unicode characters are dangerous?
Which Unicode characters (more precisely, code points) are dangerous and should be blacklisted so that users cannot enter them?
I know that BIDI override characters and the "zero width space" are prone to causing problems, but which others are there?
Thanks
Characters aren’t dangerous: only inappropriate uses of them are.
You might consider reading things like:
It is impossible to guess what you mean by dangerous.
A golden rule in security is to whitelist instead of blacklist: rather than trying to cover every bad character, it is much better to validate that the user only uses known-good characters.
There are solutions that help you build the large whitelist that international input requires. For example, .NET has UnicodeCategory. The idea is that instead of whitelisting thousands of individual characters, the library assigns them to categories such as alphanumeric characters, punctuation, control characters, and so on.
Tutorial on whitelisting international characters in .NET
Unicode Regex: Categories
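As a rough illustration of category-based whitelisting, here is a minimal sketch in Python rather than .NET, using the standard unicodedata module. The set of allowed categories and the names ALLOWED_CATEGORIES, ALLOWED_EXTRA and is_allowed are assumptions made up for this example, not a recommended policy.

```python
import unicodedata

# Assumed policy, for illustration only: letters, combining marks,
# decimal digits, and a little punctuation. Adjust to your own needs.
ALLOWED_CATEGORIES = {
    "Lu", "Ll", "Lt", "Lm", "Lo",  # letters
    "Mn", "Mc",                    # combining marks
    "Nd",                          # decimal digits
    "Pc", "Pd", "Po",              # connector, dash, other punctuation
}
ALLOWED_EXTRA = {" "}  # plain ASCII space only, not every Zs character


def is_allowed(text: str) -> bool:
    """Return True if every character is in an allowed category or in ALLOWED_EXTRA."""
    return all(
        ch in ALLOWED_EXTRA or unicodedata.category(ch) in ALLOWED_CATEGORIES
        for ch in text
    )


if __name__ == "__main__":
    # Escapes are used so the invisible characters stay visible in the source.
    for sample in ["Zo\u00eb M\u00fcller", "abc\u200bdef", "abc\u202edef", "a\u3164b"]:
        print(ascii(sample), is_allowed(sample))
```

The zero width space (U+200B) and the BIDI override (U+202E) are format characters (category Cf), so this check rejects them. Note that a category check alone is not a complete defence: U+3164 HANGUL FILLER is classified as a letter (Lo) and passes, so specific code points may still need to be excluded on top of the whitelist.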
'HANGUL FILLER' (U+3164)
Since Unicode 1.1 in 1993, there has been a wide, empty character that looks like a space but is not classified as one.
We can't see it, nor can we copy/paste it on its own, because we can't select it!
It needs to be generated, for example with the Unix keyboard shortcut CTRL+SHIFT+u followed by 3164.
It can pretty much mess up anything: variables, function names, URLs, file names, hash strings, database entries, blog posts, and logins. It can also mimic DNS names and make it possible to create fake accounts that look identical to real ones.
DEMO 1: Altering variables
The variable hijacked contains a Hangul Filler character; the console log call references the variable without that character:
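The original demo code is not reproduced here. As a stand-in, the following Python sketch shows the same effect with dictionary keys instead of variable names: two keys that render almost identically are different strings. The names settings and non_ascii_names are hypothetical and exist only for this example.

```python
import unicodedata

HANGUL_FILLER = "\u3164"  # written as an escape so it is visible in the source

settings = {"hijacked": "original value"}
# This key looks like "hijacked" when displayed, but it is a different string.
settings["hijacked" + HANGUL_FILLER] = "attacker value"


def non_ascii_names(text: str) -> list:
    """List the Unicode names of all non-ASCII characters in text."""
    return [unicodedata.name(ch, "UNNAMED") for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    print(len(settings))         # two entries, not one
    print(settings["hijacked"])  # the lookup without the filler still gets the original
    for key in settings:
        print(ascii(key), non_ascii_names(key))
```

Printing keys with ascii() or listing the character names is one simple way to make the hidden code point visible when debugging.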
DEMO 2: Hijacking URLs
These 3 URLs will lead to
xn--stackoverflow-fr16ea.com
:https://stackㅤㅤoverflow.com
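One cheap defensive check, sketched below in Python under the assumption that your application only expects ASCII host names, is to flag any URL whose host contains non-ASCII characters before displaying or following it. The function name has_non_ascii_host is made up for this example, and the check deliberately does not try to decide which internationalized names are legitimate.

```python
from urllib.parse import urlsplit


def has_non_ascii_host(url: str) -> bool:
    """Return True if the host part of url contains any non-ASCII character."""
    host = urlsplit(url).hostname or ""
    return not host.isascii()


if __name__ == "__main__":
    # The Hangul Fillers are written as escapes so they are visible in the source.
    print(has_non_ascii_host("https://stack\u3164\u3164overflow.com"))
    print(has_non_ascii_host("https://stackoverflow.com"))
```

Sites that must accept internationalized domain names would need a more careful policy than this, for example showing the Punycode form to the user.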
See the Unicode Security Considerations report.
It covers various aspects, from spoofing of rendered strings to the dangers of processing UTF encodings in unsafe languages.
U+2800 BRAILLE PATTERN BLANK - a Braille character without any "dots". It looks like a regular "space" but is not classified as one.
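To see why this matters in practice, here is a small Python sketch showing that ordinary whitespace handling does not catch it. The set BLANK_LOOKALIKES and the function find_blank_lookalikes are assumptions for illustration; the set is nowhere near exhaustive.

```python
import unicodedata

# Illustrative, non-exhaustive set of characters that render as blank
# but are not treated as whitespace by str.isspace() or str.strip().
BLANK_LOOKALIKES = {"\u2800", "\u3164", "\u200b"}


def find_blank_lookalikes(text: str) -> list:
    """Return (index, Unicode name) for each blank look-alike found in text."""
    return [
        (i, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if ch in BLANK_LOOKALIKES
    ]


if __name__ == "__main__":
    blank = "\u2800"
    print(blank.isspace())              # not considered whitespace
    print(unicodedata.category(blank))  # classified as a symbol, not a separator
    print(ascii((blank + "name" + blank).strip()))  # strip() leaves it in place
    print(find_blank_lookalikes("a\u2800b"))
```

A user name consisting only of such characters would therefore pass a naive "is it empty after trimming?" check.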