How to extract characters from a Korean string in VBA

Posted 2024-08-12 03:45:02

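I need to extract the initial character from Korean words in MS-Excel and MS-Access. When I use Left("한글",1), it returns the first syllable, 한, but what I need is the initial letter, ㅎ. Is there a function that does this? Or at least an idiom?

If you know how to get the Unicode value from a string, I can work it out from there, but I'm sure I'd be reinventing the wheel. (Yet again.)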

成熟的代价 2024-08-19 03:45:02

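Disclaimer: I know little about Access or VBA, but what you are running into is a general Unicode issue that is not specific to those tools. I have retagged your question to add tags related to this issue.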

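Access is doing the right thing by returning 한, which is indeed the first character of that two-character string. What you want here is the canonical decomposition of this hangul into its constituent jamos, also known as Normalization Form D (NFD), the "D" standing for "decomposed". The NFD form of 한 is ᄒ ᅡ ᆫ, and its first character is the one you want.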
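
To make the NFD step concrete, here is a minimal sketch in Python (the language used elsewhere in this thread, not VBA). It relies on the standard library's `unicodedata.normalize`; the helper name `first_jamo_nfd` is made up for illustration.

```python
import unicodedata


def first_jamo_nfd(text):
    """Return the first code point of `text` after NFD normalization.

    For a string starting with a precomposed Hangul syllable this is the
    conjoining leading consonant (choseong) of that syllable.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed[0]


word = "한글"
# Show every code point of the decomposed string, then just the first one.
print([hex(ord(ch)) for ch in unicodedata.normalize("NFD", word)])
print(hex(ord(first_jamo_nfd(word))))
```

The two syllables decompose into six code points in total, and the first of them is U+1112 (ᄒ).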

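Also note that, judging by your example, you seem to want the function to return the standalone letter ㅎ (U+314E, a Hangul compatibility jamo) rather than the conjoining jamo ᄒ (U+1112) that NFD produces. These really are two different code points, because they represent different semantic units: a letter that stands on its own versus a part of a syllable block. There is no predefined mapping from the conjoining jamo to the standalone letter, but you can write a small function for it, since the number of jamos is limited to a few dozen. The real work is done by the first step, NFD.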
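
The small mapping function mentioned above could look like the sketch below, again in Python rather than VBA. It is only one way to do it: the table `COMPAT_INITIALS` and the function name `initial_letter` are assumptions made for this example, and only the 19 leading consonants are covered.

```python
import unicodedata

# Standalone (compatibility) letters for the 19 leading consonants, listed in
# the same order as the conjoining jamo U+1100..U+1112.
COMPAT_INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
L_BASE = 0x1100  # first conjoining leading consonant


def initial_letter(text):
    """Return the standalone initial letter of the first syllable in `text`.

    Assumes `text` is non-empty. If the first character is not Hangul, the
    first character of the NFD form is returned unchanged.
    """
    first = unicodedata.normalize("NFD", text)[0]
    index = ord(first) - L_BASE
    if 0 <= index < len(COMPAT_INITIALS):
        return COMPAT_INITIALS[index]
    return first


print(initial_letter("한글"))
```

A lookup string keeps the function short; in VBA the same idea would be a 19-character string indexed with Mid().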

流年里的时光 2024-08-19 03:45:02

Adding to Arthur's excellent answer, I want to point out that extracting the jamo from a Hangul syllable follows directly from the Unicode Standard. The solution below isn't specific to Excel or Access (it's a Python module), but it only involves integer arithmetic, so it should translate easily to other languages. The formulas are identical to those on page 109 of the standard. The decomposition is returned as a tuple of one-character strings, which can easily be checked against the Hangul Jamo code chart.

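# Hangul syllable decomposition (Python 3), using the constants from the standard.

SBase = 0xAC00   # first precomposed syllable (가)
LBase = 0x1100   # first leading consonant
VBase = 0x1161   # first vowel
TBase = 0x11A7   # one below the first trailing consonant
SCount = 11172   # number of precomposed syllables
LCount = 19
VCount = 21
TCount = 28
NCount = VCount * TCount


def decompose(syllable):
    """Return the jamo of one precomposed Hangul syllable as a tuple of strings."""
    SIndex = ord(syllable) - SBase
    if not 0 <= SIndex < SCount:
        raise ValueError('not a precomposed Hangul syllable: %r' % syllable)

    # Integer division (//) is required; plain / yields floats in Python 3.
    L = LBase + SIndex // NCount
    V = VBase + (SIndex % NCount) // TCount
    T = TBase + SIndex % TCount

    # T == TBase means the syllable has no trailing consonant.
    if T == TBase:
        result = (L, V)
    else:
        result = (L, V, T)

    return tuple(map(chr, result))


if __name__ == '__main__':
    test_values = '항가있닭넓짧'

    for syllable in test_values:
        print(syllable, ':', *decompose(syllable))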

This is the output in my console:

항 : ᄒ ᅡ ᆼ
가 : ᄀ ᅡ
있 : ᄋ ᅵ ᆻ
닭 : ᄃ ᅡ ᆰ
넓 : ᄂ ᅥ ᆲ
짧 : ᄍ ᅡ ᆲ
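
As a sanity check, the arithmetic can be compared against the NFD form that Python's standard `unicodedata` module produces. The sketch below restates the formulas in compact form so that it runs on its own; the snake_case names are chosen for this example and are not taken from the standard.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT, S_COUNT = 21, 28, 11172
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant


def decompose(syllable):
    """Arithmetic decomposition of one precomposed syllable into a jamo string."""
    s_index = ord(syllable) - S_BASE
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    codes = (lead, vowel) if trail == T_BASE else (lead, vowel, trail)
    return "".join(map(chr, codes))


# Collect every precomposed syllable where the two methods disagree.
mismatches = [
    chr(cp)
    for cp in range(S_BASE, S_BASE + S_COUNT)
    if decompose(chr(cp)) != unicodedata.normalize("NFD", chr(cp))
]
print(len(mismatches))
```

For 한 (U+D55C) the index is 0xD55C - 0xAC00 = 10588, so the leading consonant index is 10588 // 588 = 18, the vowel index is (10588 % 588) // 28 = 0, and the trailing index is 10588 % 28 = 4, which gives U+1112, U+1161 and U+11AB.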

柠檬心 2024-08-19 03:45:02

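I think what you are looking for is a Byte array:
Dim aByte() As Byte
aByte = "한글"
This should give you two bytes for each character in the string: the low byte and then the high byte of its UTF-16 code unit.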
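
To show what that byte array would hold, here is a rough Python equivalent. It assumes the string is stored as UTF-16 little-endian, which is how VBA keeps strings in memory; the helper names `utf16_bytes` and `code_units` are made up for this sketch.

```python
def utf16_bytes(text):
    """Return the UTF-16 little-endian bytes of `text` as a list of ints."""
    return list(text.encode("utf-16-le"))


def code_units(byte_values):
    """Recombine (low byte, high byte) pairs into 16-bit code units."""
    return [lo | (hi << 8) for lo, hi in zip(byte_values[0::2], byte_values[1::2])]


raw = utf16_bytes("한글")
print(raw)
print([hex(unit) for unit in code_units(raw)])
```

Recombining each pair as low + 256 * high gives the code point of the syllable (0xD55C for 한), which can then be fed into the decomposition arithmetic from the other answer.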

清引 2024-08-19 03:45:02

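I think you've already got what you needed, but it looks fairly complicated. I don't know much about this, but I recently did some investigating into handling Unicode and looked at all the string Byte functions, such as LeftB(), RightB(), InputB(), InStrB(), LenB(), AscB(), ChrB() and MidB(). There is also StrConv(), which has a vbUnicode argument. These are all functions I would expect to be usable in any double-byte context, but I don't work in that environment, so I may be missing something very important.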