Finding Unicode characters that look like ASCII characters

Posted on 2024-09-12 23:46:59

Does anyone know an easy way to find characters in Unicode that look like ASCII characters? An example is "CYRILLIC SMALL LETTER DZE (ѕ)". I'd like to search for such look-alike characters and replace them. By "similar" I mean similar to a human reader: you can't see a difference by looking at the character.

Comments (2)

孤独难免 2024-09-19 23:46:59

As other commenters have noted, Unicode normalisation ("compatibility characters") isn't going to help you here, because you aren't looking for official equivalences but for similarities in glyphs (letter shapes). (The linked Unicode Technical Report is still worth reading, though, as it is extremely well written.)
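
To illustrate the point about normalisation, here is a minimal sketch using Python's standard unicodedata module. The two sample characters are my own choice for illustration: a fullwidth "A", which has an official compatibility mapping, and the Cyrillic DZE from the question, which has none.

```python
import unicodedata

fullwidth_a = "\uFF21"   # FULLWIDTH LATIN CAPITAL LETTER A
cyrillic_dze = "\u0455"  # CYRILLIC SMALL LETTER DZE

# NFKC applies compatibility mappings, so the fullwidth letter folds to ASCII.
folded_a = unicodedata.normalize("NFKC", fullwidth_a)

# DZE has no decomposition at all, so normalisation leaves it untouched.
folded_dze = unicodedata.normalize("NFKC", cyrillic_dze)

print(folded_a == "A")             # the official equivalence is applied
print(folded_dze == "s")           # the look-alike is not replaced
print(folded_dze == cyrillic_dze)  # it comes back unchanged
```

So normalisation is useful for cleaning up official equivalents, but a look-alike such as ѕ passes straight through it.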

If I were you, to spare myself the tedious work of assembling a list of characters by hand, I'd search for resources on homograph attacks: a method of maliciously misleading web users by displaying URLs whose domain names have some letters replaced with visually similar ones. Another Unicode Technical Report, on security, contains a section on the problem. There is also a "confusables" table, which may be what you need most. Here's another article covering mainly punctuation marks, some of them ASCII, that have visually similar counterparts in the non-ASCII code tables.
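
A search and replace driven by such a table could be sketched as follows. This is only a rough sketch under assumptions: the two sample lines are illustrative entries written in the "source ; target ; type # comment" layout that, as far as I know, the confusables file uses, and parse_confusables and replace_lookalikes are hypothetical helper names, not part of any library. In practice you would feed in the real table instead of SAMPLE.

```python
# Illustrative sample only; load the real confusables table in practice.
SAMPLE = """\
# source ; target ; type # comment
0455 ; 0073 ; MA # CYRILLIC SMALL LETTER DZE -> LATIN SMALL LETTER S
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
"""


def parse_confusables(text):
    """Build a {look-alike: ASCII replacement} dict from table text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        # Each field is a space-separated list of hex code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        if target.isascii():  # keep only mappings that land in ASCII
            table[source] = target
    return table


def replace_lookalikes(text, table):
    """Replace every character found in the table, leave the rest alone."""
    return "".join(table.get(ch, ch) for ch in text)


table = parse_confusables(SAMPLE)
cleaned = replace_lookalikes("pa\u0455\u0455word", table)
print(cleaned == "password")
```

The sketch only handles single-character sources and keeps only ASCII targets, which matches the question's goal of replacing look-alikes with ASCII.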

What I do hope is that you aren't asking the question to construct such an attack.

醉态萌生 2024-09-19 23:46:59

See the Unicode Database: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

Each line describes a Unicode character, for example:

1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING;Ll;0;L;<compat> 0061 02BE;;;;N;;;;;

If a character has a compatibility mapping, it appears in the sixth field of the entry (the decomposition mapping), marked with a tag such as <compat>. In this example, LATIN SMALL LETTER A WITH RIGHT HALF RING decomposes to 0061 (ASCII a) followed by 02BE (MODIFIER LETTER RIGHT HALF RING).

As for your character, the entry is

0455;CYRILLIC SMALL LETTER DZE;Ll;0;L;;;;;N;;;0405;;0405

which, as you can see, does not specify a compatibility character.
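
For reference, pulling that mapping out of an entry can be sketched like this. It assumes the semicolon-separated layout of the two lines quoted above, with the decomposition in the sixth field; decomposition is a hypothetical helper name of my own.

```python
LINES = [
    "1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING;Ll;0;L;<compat> 0061 02BE;;;;N;;;;;",
    "0455;CYRILLIC SMALL LETTER DZE;Ll;0;L;;;;;N;;;0405;;0405",
]


def decomposition(line):
    """Return (code point, name, tag, mapped code points) for one entry."""
    fields = line.split(";")
    code_point, name, mapping = fields[0], fields[1], fields[5]
    parts = mapping.split()
    # An optional tag such as <compat> precedes the hex code points.
    tag = parts.pop(0) if parts and parts[0].startswith("<") else None
    return code_point, name, tag, [int(p, 16) for p in parts]


for line in LINES:
    code_point, name, tag, mapped = decomposition(line)
    print(code_point, name, tag, [f"{cp:04X}" for cp in mapped])
```

For the first entry this yields the <compat> tag with 0061 and 02BE; for CYRILLIC SMALL LETTER DZE the tag is None and the list is empty.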
