Check which of two variants is Traditional Chinese and which is Simplified Chinese
I'm getting inconsistent results from the Google Maps API:
|Head southwest on 吳江路/吴江路 toward 泰兴路/泰興路
|Head southwest on TRAD/SIMP toward SIMP/TRAD
Currently I match the Chinese words with this regex: ([^\u0000-\u0080]|/)+
Then I explode the matches into pairs such as 吳江路 vs 吴江路 and remove the characters the two variants have in common. Is there a way to tell which of 吳 and 吴 is the traditional character and which is the simplified one?
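For reference, the pairing step described above could look roughly like the following in Python. This is only a sketch: the function name differing_chars is made up for illustration, and the regex is a slightly narrowed variant of the one in the question that captures the two sides of each slash separately.

```python
import re

# Two runs of non-ASCII characters separated by a slash, e.g. 吳江路/吴江路.
VARIANT_PAIR = re.compile(r"([^\u0000-\u0080]+)/([^\u0000-\u0080]+)")

def differing_chars(text):
    """Yield (a, b) for each position where the two variants of a name differ."""
    for match in VARIANT_PAIR.finditer(text):
        left, right = match.group(1), match.group(2)
        if len(left) != len(right):
            continue  # not a character-for-character variant pair
        for a, b in zip(left, right):
            if a != b:  # drop the characters the variants have in common
                yield (a, b)

line = "Head southwest on 吳江路/吴江路 toward 泰兴路/泰興路"
pairs = list(differing_chars(line))
```

Each resulting pair holds one traditional and one simplified character in unknown order, which is exactly the part that still needs a mapping table to resolve.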
You need a traditional->simplified mapping table for Unicode. Google it and you'll find one easily. If you can't find one, then you can make one by downloading a Big5->GB mapping table, then converting both sides to Unicode (via Big5->Unicode and GB->Unicode mapping tables, which are readily available).
If a character appears on the "simplified" side of the table, then it is most likely a simplified character (since some traditional character maps to it).
Note that this is not an exact method: several traditional characters may map to a single simplified character, and that simplified character may itself be identical to a traditional character. In that case, you'll need to decide whether to call it traditional or not.
For example, 後 ("behind", "after") is mapped to 后 in simplified, but 后 is also the traditional character for "queen".
If you are just classifying pairs of characters, you can look the pair up in both directions. You'll find a conversion in at most one direction, and that's your answer.
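A minimal sketch of that two-direction lookup in Python is below. The three-entry table and the name classify_pair are made up for illustration; a real table would be loaded from a published mapping (the Unihan database's kSimplifiedVariant field is one possible source).

```python
# Tiny hand-written sample of a traditional -> simplified table.
# Illustrative only: a real table has thousands of entries.
TRAD_TO_SIMP = {
    "吳": "吴",
    "興": "兴",
    "後": "后",
}

def classify_pair(a, b, table=TRAD_TO_SIMP):
    """Return (traditional, simplified) for a differing pair, or None if unknown."""
    if table.get(a) == b:   # a converts to b, so a is the traditional one
        return (a, b)
    if table.get(b) == a:   # b converts to a, so b is the traditional one
        return (b, a)
    return None             # no conversion in either direction

print(classify_pair("吳", "吴"))
print(classify_pair("兴", "興"))
```

Because the function only ever compares the two characters of a pair against each other, the ambiguity around characters like 后 does not come up here; it only matters when classifying a single character on its own.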