How to extract characters from a Korean string in VBA

Posted 2024-08-12 03:45:02

I need to extract the initial character from a Korean word in MS Excel and MS Access.
When I use Left("한글",1), it returns the first syllable, i.e. 한; what I need is the initial character, i.e. ㅎ.
Is there a function to do this? Or at least an idiom?

If you know how to get the Unicode value from the String, I'd be able to work it out from there, but I'm sure I'd be reinventing the wheel (yet again).

Comments (4)

成熟的代价 2024-08-19 03:45:02

Disclaimer: I know little about Access or VBA, but what you have is a generic Unicode problem, not one specific to those tools. I retagged your question to add tags related to this issue.

Access is doing the right thing by returning 한: it is indeed the first character of that two-character string. What you want is the canonical decomposition of this Hangul syllable into its constituent jamo, also known as Normalization Form D (NFD, "D" for "decomposed"). The NFD form of 한 is ᄒ ᅡ ᆫ (U+1112, U+1161, U+11AB), and its first character is the one you want.
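To make the decomposition concrete, here is a minimal sketch in Python rather than VBA, chosen only because Python's standard `unicodedata` module exposes NFD directly. The variable names are illustrative, and nothing here is claimed to be available in Access or Excel.

```python
import unicodedata

word = "한글"

# NFD splits each precomposed syllable into its conjoining jamo.
decomposed = unicodedata.normalize("NFD", word)

# Code points of the decomposed string, for inspection.
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]
print(code_points)

# The first character of the NFD form is the leading consonant of 한.
first_jamo = decomposed[0]
print(f"U+{ord(first_jamo):04X}")  # U+1112
```

The two syllables become six code points, three per syllable, and the first one is the conjoining jamo U+1112.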

Note also that, per your example, you seem to want the function to return the compatibility jamo ㅎ (U+314E) rather than the conjoining jamo ᄒ (U+1112). These really are two different code points, because they represent different units: a standalone letter versus a component of a syllable block. Normalization does not map conjoining jamo to compatibility jamo, so you would have to write a small function for that; since there are only a few dozen jamo, a lookup table is enough (the real work is done by the first step, NFD).
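The small mapping function mentioned above could look roughly like this. It is a sketch in Python covering only the 19 leading consonants; the helper name `to_compat_initial` and the table `COMPAT_INITIALS` are made up for illustration, and the table is spelled out as code points so it can be checked against the code charts.

```python
L_BASE = 0x1100  # first conjoining leading consonant

# Compatibility jamo for the 19 leading consonants, listed in the same
# order as the conjoining jamo U+1100..U+1112.
COMPAT_INITIALS = "".join(map(chr, (
    0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141,
    0x3142, 0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149,
    0x314A, 0x314B, 0x314C, 0x314D, 0x314E,
)))


def to_compat_initial(jamo):
    """Map a conjoining leading consonant to its compatibility jamo (hypothetical helper)."""
    index = ord(jamo) - L_BASE
    if not 0 <= index < len(COMPAT_INITIALS):
        raise ValueError(f"not a leading consonant: U+{ord(jamo):04X}")
    return COMPAT_INITIALS[index]


print(to_compat_initial("\u1112"))  # ㅎ
```

A lookup table is used instead of a fixed offset because the compatibility block interleaves consonant clusters (such as U+3133) that cannot start a syllable, so the two ranges do not line up one to one.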

流年里的时光 2024-08-19 03:45:02

Adding to Arthur's excellent answer, I want to point out that extracting jamo from Hangul syllables is straightforward using the arithmetic given in the standard. The solution below isn't specific to Excel or Access (it's a Python module), but it involves only arithmetic expressions, so it should translate easily to other languages. The formulas are identical to those on page 109 of the standard. The decomposition is returned as a tuple of single-character strings, which can easily be checked against the Hangul Jamo code chart.

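# -*- coding: utf-8 -*-
# Hangul syllable decomposition by arithmetic (Python 3).

SBase = 0xAC00   # first precomposed syllable
LBase = 0x1100   # first leading consonant
VBase = 0x1161   # first vowel
TBase = 0x11A7   # one before the first trailing consonant
SCount = 11172   # number of precomposed syllables
LCount = 19
VCount = 21
TCount = 28
NCount = VCount * TCount


def decompose(syllable):
    """Return the conjoining jamo of one precomposed syllable as a tuple of characters."""
    S = ord(syllable)
    SIndex = S - SBase
    if not 0 <= SIndex < SCount:
        return (syllable,)  # not a precomposed Hangul syllable; leave as-is

    # Integer division (//) is required in Python 3; / would produce floats.
    L = LBase + SIndex // NCount
    V = VBase + (SIndex % NCount) // TCount
    T = TBase + SIndex % TCount

    if T == TBase:
        result = (L, V)       # no trailing consonant
    else:
        result = (L, V, T)

    return tuple(map(chr, result))


if __name__ == '__main__':
    test_values = '항가있닭넓짧'

    for syllable in test_values:
        # Print the syllable followed by its jamo, separated by spaces.
        print(syllable, ':', *decompose(syllable))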

This is the output in my console:

항 : ᄒ ᅡ ᆼ
가 : ᄀ ᅡ
있 : ᄋ ᅵ ᆻ
닭 : ᄃ ᅡ ᆰ
넓 : ᄂ ᅥ ᆲ
짧 : ᄍ ᅡ ᆲ
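
Applied to the original question, the same arithmetic can return the initial consonant of every syllable in a word without building the full decomposition. The following is a sketch only: the function name `initials` and the table `INITIALS` are assumptions introduced here, and the table holds compatibility jamo (the ㅎ form the question asks for) indexed by the leading-consonant index.

```python
S_BASE = 0xAC00    # first precomposed syllable
S_COUNT = 11172    # number of precomposed syllables
N_COUNT = 21 * 28  # VCount * TCount

# Compatibility jamo for the 19 leading consonants, in L-index order.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def initials(text):
    """Replace each precomposed Hangul syllable with its initial consonant.

    Characters outside U+AC00..U+D7A3 are passed through unchanged.
    """
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(INITIALS[s_index // N_COUNT])
        else:
            out.append(ch)
    return "".join(out)


print(initials("한글"))        # ㅎㄱ
print(initials("Excel 한글"))  # Excel ㅎㄱ
```

When porting this to VBA, one thing to watch is that AscW returns a signed 16-bit Integer, so every Hangul syllable comes back as a negative number and needs 65536 added before the subtraction.
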
柠檬心 2024-08-19 03:45:02

I think what you are looking for is a Byte array:
Dim aByte() As Byte
aByte = "한글"
This should give you two bytes for each character in the string, which together hold that character's Unicode (UTF-16) value.
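
To show what such a byte array would contain, here is a small Python sketch. It assumes the array mirrors the string's UTF-16 little-endian storage, which is why the low byte of each character comes first; the variable names are illustrative.

```python
word = "한글"

# Assumption: the byte array holds the string as UTF-16, little-endian.
raw = word.encode("utf-16-le")
print(raw.hex(" "))  # 5c d5 00 ae

# Recombine each pair of bytes (low byte first) into one 16-bit code unit.
code_units = [
    int.from_bytes(raw[i:i + 2], "little")
    for i in range(0, len(raw), 2)
]
print([f"U+{u:04X}" for u in code_units])  # ['U+D55C', 'U+AE00']
```

Once the code unit is rebuilt as `low + 256 * high`, it can be fed into the syllable arithmetic like any other code point.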

清引 2024-08-19 03:45:02

I assume you got what you needed, but it seems rather convoluted. I don't know much about this, but I recently did some investigating into Unicode handling and looked at the byte-oriented string functions, such as LeftB(), RightB(), InputB(), InStrB(), LenB(), AscB(), ChrB() and MidB(). There is also StrConv(), which accepts a vbUnicode argument. I would expect all of these to be useful in a double-byte context, but I don't work in that environment, so I might be missing something important.
