Finding characters in Unicode that look like ASCII characters
Does anyone know an easy way to find characters in Unicode that look similar to ASCII characters? An example is "CYRILLIC SMALL LETTER DZE (ѕ)". I'd like to do a search and replace for such lookalike characters. By similar I mean similar to a human reader: you can't see a difference by looking at it.
Comments (2)
As noted by other commenters, Unicode normalisation ("compatibility characters") isn't going to help you here, as you aren't looking for official equivalences but for similarities in glyphs (letter shapes). (The linked Unicode Technical Report is still worth reading, though, as it is extremely well written.)
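To illustrate the point, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `fold_compat` and the choice of the fullwidth letter as a contrast case are my own, not part of the original answer. NFKC normalisation folds an official compatibility variant into its ASCII counterpart but leaves a mere lookalike alone.

```python
import unicodedata

def fold_compat(text: str) -> str:
    """Apply NFKC normalisation, which folds compatibility characters."""
    return unicodedata.normalize("NFKC", text)

fullwidth_a = "\uFF41"   # FULLWIDTH LATIN SMALL LETTER A, a compatibility variant of "a"
cyrillic_dze = "\u0455"  # CYRILLIC SMALL LETTER DZE, which only looks like "s"

# The official equivalence is folded to ASCII.
print(fold_compat(fullwidth_a) == "a")
# The lookalike is left untouched, so normalisation cannot find it.
print(fold_compat(cyrillic_dze) == cyrillic_dze)
```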
If I were you, to spare yourself the tedious work of assembling a list of characters, I'd search for resources on homograph attacks: a method of maliciously misleading web users by displaying URLs whose domain names have some letters replaced with visually similar ones. Another Unicode Technical Report, on security, contains a section on the problem. There is also, and this may be what you need most, a "confusables" table. Here's another article covering mainly punctuation marks, some of them ASCII, that have visually similar counterparts in the non-ASCII code tables.
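As a rough sketch of how such a table could drive the search and replace, the Python below parses lines in the `source ; target ; type # comment` layout that, as far as I know, the Unicode `confusables.txt` data file uses. The two sample lines are written by hand for illustration rather than copied from the file, and the function names `parse_confusables` and `replace_lookalikes` are hypothetical.

```python
# Hand-written sample in the "source ; target ; type # comment" layout.
# Illustrative only; load the real data file for actual use.
SAMPLE = """\
0455 ; 0073 ; MA # CYRILLIC SMALL LETTER DZE -> LATIN SMALL LETTER S
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
"""

def parse_confusables(text):
    """Build a dict mapping each source string to its target string."""
    table = {}
    for line in text.splitlines():
        # Drop a possible byte-order mark, then the trailing comment.
        line = line.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        # Each field is a space-separated list of hex code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table

def replace_lookalikes(text, table):
    """Replace non-ASCII characters whose listed target is pure ASCII."""
    out = []
    for ch in text:
        target = table.get(ch)
        if not ch.isascii() and target is not None and target.isascii():
            out.append(target)
        else:
            out.append(ch)
    return "".join(out)

table = parse_confusables(SAMPLE)
print(replace_lookalikes("pa\u0455\u0455word", table))
```

ASCII input characters are deliberately left alone, so the replacement only ever moves text toward ASCII and never rewrites characters that were already plain.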
What I do hope is that you aren't asking the question to construct such an attack.
See the Unicode Database: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt.
Each line describes one Unicode character as a list of semicolon-separated fields. If a character has a compatibility mapping to other characters, it appears in the decomposition field of its entry, marked <compat>. For example, the entry for LATIN SMALL LETTER A WITH RIGHT HALF RING lists 0061 (ASCII a) there, so that character is compatible with ASCII a.
As for your character, CYRILLIC SMALL LETTER DZE, the entry leaves that field empty, so it does not specify a compatibility character.
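The same field can be inspected without downloading the file. This is a small sketch using Python's standard `unicodedata` module, which exposes the decomposition field through `unicodedata.decomposition`. The helper name `describe` is my own, and the values printed depend on the Unicode version bundled with your Python.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point, name, and decomposition field of one character."""
    name = unicodedata.name(ch, "<unnamed>")
    # decomposition() returns an empty string when the field is empty.
    decomp = unicodedata.decomposition(ch) or "(none)"
    return f"U+{ord(ch):04X} {name}: decomposition = {decomp}"

print(describe("\u1E9A"))  # LATIN SMALL LETTER A WITH RIGHT HALF RING
print(describe("\u0455"))  # CYRILLIC SMALL LETTER DZE
```

The first character reports a `<compat>` mapping that includes 0061, while the second reports no decomposition at all, which is why this field cannot be used to find it as a lookalike of "s".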