Is this a bug (Windows API)?

Posted on 2024-09-07 13:54:28

I asked a question about string normalization and it was answered, but I still cannot correctly normalize Korean characters that require three keystrokes.
With the input "ㅁㅜㄷ" (from keystrokes "ane"), the result is "무ㄷ" instead of "묻".
With the input "ㅌㅐㅇ" (from keystrokes "xod"), the result is "태ㅇ" instead of "탱".

This is Mr. Dean's answer. It worked on the example I gave at first, but it doesn't work with the ones I cited above.

If you are using .NET, the following will work:

var s = "ㅌㅐㅇ";
s = s.Normalize(NormalizationForm.FormKC);

In native Win32, the corresponding call is NormalizeString:

const wchar_t *input = L"ㅌㅐㅇ";  // a wide string literal needs the L prefix
wchar_t output[100];
NormalizeString(NormalizationKC, input, -1, output, 100);

NormalizeString is only available in Windows Vista+. You need the "Microsoft Internationalized Domain Name (IDN) Mitigation APIs" installed if you want to use it on XP (why it's in the IDN download, I don't understand...)

Note that neither of these methods actually requires use of the IME - they work regardless of whether you've got the Korean IME installed or not.

This is the code I'm using in Delphi (on XP):

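      const NORMALIZATIONKC = 5;  // value of NormalizationKC in the Win32 NORM_FORM enum
      var
        buf: array [0..20] of WideChar;  // Char is AnsiChar before Delphi 2009; the output buffer is UTF-16
        temporary: PWideChar;
      ...
      temporary := 'ㅌㅐㅇ';
      NormalizeString(NORMALIZATIONKC, temporary, -1, buf, 20);
      ShowMessage(buf);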

Is this a bug? Is there something incorrect in my code?
Does the code run correctly on your computer? In what language? Which Windows version are you using?

Comments (1)

木緿 2024-09-14 13:54:28

The jamo you're using (ㅌㅐㅇ) are in the block called Hangul Compatibility Jamo, which exists because of legacy code pages. If you take your target character and decompose it (using NFKD), you get jamo from the Hangul Jamo block (ᄐ ᅢ ᆼ, without the spaces, which are only there to keep the browser from normalizing them), and these can be re-composed just fine.

Unicode 5.2 states:

When Hangul compatibility jamo are transformed with a compatibility normalization form, NFKD or NFKC, the characters are converted to the corresponding conjoining jamo characters.

(...)

Table 12-11 illustrates how two Hangul compatibility jamo can be separated in display, even after transforming them with NFKD or NFKC.

This suggests that NFKC should combine them correctly by treating them as regular Jamo, but Windows doesn't appear to be doing that. However, using NFKD does appear to convert them to the normal Jamo, and you can then run NFKC on it to get the right character.

Since those characters appear to come from an external program (the IME), I would suggest you either do a manual pass to convert those compatibility jamo, or run NFKD first and then NFKC. Alternatively, you may be able to reconfigure the IME to output "normal" jamo instead of compatibility jamo.
