Check which of two variants is Traditional Chinese and which is Simplified Chinese

Posted on 2024-10-24 09:25:28

I'm getting inconsistent results from the Google Maps API: the order of the traditional and simplified variants is not fixed.

|Head southwest on 吳江路/吴江路 toward 泰兴路/泰興路 
|Head southwest on TRAD/SIMP toward SIMP/TRAD

Currently I match the Chinese words with this regex: ([^\u0000-\u0080]|/)+

Then I explode each match into a pair such as 吳江路 vs 吴江路 and remove the characters the two sides have in common. Is there a way to tell which of the remaining characters, 吳 or 吴, is traditional and which is simplified?
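For reference, the matching and "explode" steps described above might look roughly like this in Python. This is only a sketch: the function names are made up for illustration, the group is written as non-capturing so the whole run is returned, and it assumes both variants have the same length and differ only position by position.

```python
import re

# Same pattern as in the question, with a non-capturing group:
# runs of characters outside \u0000-\u0080, plus the "/" separator.
CJK_RUN = re.compile(r"(?:[^\u0000-\u0080]|/)+")

def variant_pairs(text):
    """Return (left, right) tuples for every 'A/B' run found in text."""
    pairs = []
    for match in CJK_RUN.finditer(text):
        parts = match.group(0).split("/")
        # Keep only runs that are exactly two non-empty variants.
        if len(parts) == 2 and all(parts):
            pairs.append((parts[0], parts[1]))
    return pairs

def differing_chars(left, right):
    """Drop characters that are identical at the same position.

    Assumes both variants are the same length (an assumption, not
    something the API guarantees).
    """
    return [(a, b) for a, b in zip(left, right) if a != b]

step = "Head southwest on 吳江路/吴江路 toward 泰兴路/泰興路"
pairs = variant_pairs(step)
diffs = [differing_chars(left, right) for left, right in pairs]
print(pairs)
print(diffs)
```

What remains after this step is a list of single-character pairs, which is the input the traditional/simplified check needs.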

Comments (1)

著墨染雨君画夕 2024-10-31 09:25:28

You need a traditional->simplified mapping table for Unicode. Such tables are easy to find online. If you can't find one, you can build one by downloading a Big5->GB mapping table and then converting both sides to Unicode (via Big5->Unicode and GB->Unicode mapping tables, which are readily available).
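As a rough illustration of the second route, the sketch below turns Big5->GB rows into a Unicode table. It swaps in Python's built-in `big5` and `gb2312` codecs for the separate Big5->Unicode and GB->Unicode tables, and the two rows are a hypothetical stand-in for a real downloaded table (they are produced by encoding known characters only so the example is self-contained).

```python
# Hypothetical Big5 -> GB rows as raw byte pairs. A real table would be
# loaded from a downloaded file; these rows are built by encoding known
# characters purely to keep the example runnable.
BIG5_TO_GB_ROWS = [
    ("吳".encode("big5"), "吴".encode("gb2312")),
    ("興".encode("big5"), "兴".encode("gb2312")),
]

def to_unicode_table(rows):
    """Convert (big5_bytes, gb_bytes) rows into a traditional -> simplified dict."""
    return {b5.decode("big5"): gb.decode("gb2312") for b5, gb in rows}

trad_to_simp = to_unicode_table(BIG5_TO_GB_ROWS)
print(trad_to_simp)
```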

If you find a character on the "simplified" side of the table, then it is most likely a simplified character (since some traditional character maps to it).
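A minimal sketch of that lookup, assuming the table has been loaded into a dict. The three entries here are a hand-picked excerpt for illustration, not a real table.

```python
# Hypothetical excerpt of a traditional -> simplified table.
TRAD_TO_SIMP = {"吳": "吴", "興": "兴", "後": "后"}

# Every character that some traditional character maps to.
SIMPLIFIED_SIDE = set(TRAD_TO_SIMP.values())

def looks_simplified(ch):
    """True if ch appears on the simplified side of the table."""
    return ch in SIMPLIFIED_SIDE

print(looks_simplified("吴"), looks_simplified("吳"))
```

This is a heuristic only: a character on the simplified side may also be a valid traditional character in its own right.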

Note that this is not a scientific method, as multiple traditional characters may map to a single simplified character, and that simplified character may be identical to a traditional character. In this case, you'll need to decide whether you'll call it traditional or not.

For example, the traditional character 後 ("behind", "after") maps to 后 in simplified, but 后 is also a traditional character in its own right, meaning "queen".

If you are just mapping pairs of characters, you can look up the conversion in both directions. At most one direction will yield a match, and that direction tells you which character is traditional and which is simplified.
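The two-direction check could be sketched as follows. The function name and the small table are illustrative assumptions, and the table is treated as a plain one-to-one dict from traditional to simplified.

```python
def classify_pair(a, b, trad_to_simp):
    """Return (traditional, simplified) for a pair of differing characters,
    or None if the table has no conversion in either direction."""
    if trad_to_simp.get(a) == b:
        return (a, b)  # a converts to b, so a is the traditional form
    if trad_to_simp.get(b) == a:
        return (b, a)  # b converts to a, so b is the traditional form
    return None

# Hypothetical excerpt of a traditional -> simplified table.
table = {"吳": "吴", "興": "兴"}

print(classify_pair("吳", "吴", table))
print(classify_pair("兴", "興", table))
```

Because the two characters in a pair are known to differ, a single traditional->simplified table is enough to test both directions; a pair that matches in neither direction is simply not covered by the table.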
