How do I combine kanji/hanzi components using Unicode characters?
I'm trying to find a workaround to display old and rare characters in Unicode using character combining. I'm currently converting some dictionaries from EPWING into text, and there are 36 different characters that cannot be reproduced using normal UTF-8. Below is the problematic section of the EPWING gaiji-to-Unicode mappings for one of the dictionaries I'm converting. In places it uses an interesting syntax that is clearly meant to combine characters in different ways. I was hoping someone could identify this syntax and point me to documentation or a tutorial on how to use it.
s/<?w=b02a>/????/g
s/<?w=b04b>/者/g
s/<?w=b064>/<⾱ ????>/g
s/<?w=b077>/<彳<匕\/匕>>/g
s/<?w=b07c>/<山\/⺀>/g
s/<?w=b12e>/????/g
s/<?w=b155>/<\/>/g
s/<?w=b156>/<\/>/g
s/<?w=b157>/<\/\/>/g
s/<?w=b158>/<こ[1]\/と|ヿ>/g
s/<?w=b16f>/<㗢>/g
s/<?w=b170>/<㗥>/g
s/<?w=b171>/ଏ/g
s/<?w=b175>/lb/g
s/<?w=b22a>//g
s/<?w=b234>/ff/g
s/<?w=b25e>/㯌/g
s/<?w=b271>/<扌 晉>/g
s/<?w=b36b>/????/g
s/<?w=b373>/????/g
s/<?w=b42c>/????/g
s/<?w=b434>/<已\/大>/g
s/<?w=b438>/????/g
s/<?w=b43a>/????/g
s/<?w=b43f>/<㇀\/丶>/g
s/<?w=b440>/????/g
s/<?w=b45a>/<?>/g
s/<?w=b45b>/<|>/g
s/<?w=b53d>/<?>/g
s/<?w=b53e>/<?>/g
s/<?w=b540>/<o>/g
s/<?w=b537>/<ト モ>/g
s/<?w=b541>/<一\/????>/g
s/<?w=b544>/<?>/g
s/<?w=b546>/<[r45]卐>/g
s/<?w=b55f>/*/g
I know that this line is supposed to represent 彳 as the left-hand radical, with one 匕 stacked on top of another 匕 forming the right-hand part of the character:
s/<?w=b077>/<彳<匕\/匕>>/g
This one is also fairly obvious; it's a 卐 rotated 45 degrees:
s/<?w=b546>/<[r45]卐>/g
Note: the four-character hexadecimal code after ?w= identifies the EPWING gaiji that the Unicode replacement corresponds to.
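A minimal sketch of how these mappings are applied, assuming the exported dictionary text contains each gaiji as a literal <?w=XXXX> placeholder. The function name apply_gaiji and the sample input are hypothetical; the two mappings are taken from the list above.

```shell
#!/bin/sh
# apply_gaiji (hypothetical): replace gaiji placeholders read from stdin
# with their mapped text. In a POSIX basic regular expression "?" is an
# ordinary character, so "<?w=b04b>" matches the placeholder literally.
apply_gaiji() {
  sed -e 's/<?w=b04b>/者/g' \
      -e 's/<?w=b25e>/㯌/g'
}

# Example: two placeholders embedded in a line of exported text.
printf 'foo<?w=b04b>bar<?w=b25e>\n' | apply_gaiji
# prints: foo者bar㯌
```

Any / inside a replacement has to be written as \/ (or the command written with another delimiter, such as s|...|...|g); otherwise sed treats it as the end of the replacement and rejects the command.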
Thank you for your time.
Please see The Unicode Standard section 12.2, Ideographic Description Characters. It discusses your precise situation.
Unfortunately, you may find that software support for what you are trying to do is practically non-existent.
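To illustrate the answer, here is a hedged sketch of how a few of the bracketed mappings might be rewritten as Ideographic Description Sequences, using U+2FF0 ⿰ (left to right) and U+2FF1 ⿱ (above to below). Each description character is a prefix operator followed by its two components. Only the b077 reading comes from the question. Treating / as vertical stacking and a space as side-by-side placement in the other entries is an assumption, and the function name ids_gaiji is hypothetical. As the answer warns, most software will display these sequences as a row of separate components rather than as one composed glyph.

```shell
#!/bin/sh
# ids_gaiji (hypothetical): map some gaiji placeholders to IDS strings.
# U+2FF0 (⿰) = left to right, U+2FF1 (⿱) = above to below.
# b077: 彳 on the left, 匕 stacked over 匕 on the right (reading from the question).
# b07c, b271, b434: assumed readings of "/" (stacked) and " " (side by side).
ids_gaiji() {
  sed -e 's/<?w=b077>/⿰彳⿱匕匕/g' \
      -e 's/<?w=b07c>/⿱山⺀/g' \
      -e 's/<?w=b271>/⿰扌晉/g' \
      -e 's/<?w=b434>/⿱已大/g'
}

printf '<?w=b077> <?w=b434>\n' | ids_gaiji
# prints: ⿰彳⿱匕匕 ⿱已大
```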