
Converting ICU4C bytes to Java chars

Posted on 2024-10-19 05:36:48


I am calling an ICU4C function through JNI that returns a UChar * (i.e. an array of Unicode characters). I convert it to a jbyteArray by assigning each element of the UChar array to a local jbyte[] array, and then return it to Java with env->SetByteArrayRegion(). The byte[] array I get in Java is pretty much all gibberish, weird symbols at best, and I am not sure where the problem is. I am working with Unicode characters, if that matters. How do I convert the byte[] to a char[] in Java properly? Something is not being mapped right. Here is a snippet of the code:

--- JNI code (altered slightly to make it shorter) ---

static jint testFunction(JNIEnv* env, jclass c, jcharArray srcArray, jbyteArray destArray) {

    jchar* src = env->GetCharArrayElements(srcArray, NULL);
    int n = env->GetArrayLength(srcArray);

    UChar *testStr = new UChar[n];
    jbyte destChr[n];

    //calling ICU4C function here    
    icu_function (src, testStr);   //takes source characters and returns UChar*

    for (int i=0; i<n; i++)
        destChr[i] = testStr[i];   //is this correct?

    delete[] testStr;
    env->SetByteArrayRegion(destArray, 0, n, destChr);
    env->ReleaseCharArrayElements(srcArray, src, JNI_ABORT);

    return (n); //anything for now
}

--- Java code ---
String wohoo = "ABCD bal bla bla";
char[] myChars = wohoo.toCharArray();

byte[] myICUBytes = new byte[myChars.length];
int value = MyClass.testFunction (myChars, myICUBytes);

System.out.println(new String(myICUBytes)) ;// produces gibberish & weird symbols

I also tried System.out.println(new String(myICUBytes, Charset.forName("UTF-16"))), and the output is just as garbled.

Note that the ICU function does return the proper Unicode characters in the UChar *. Something between the conversion to jbyteArray and Java is messing it up.

Help!


生死何惧 2024-10-26 05:36:48


destChr[i] = testStr[i];   //is this correct?

This looks like an issue all right.

JNI types:

byte   jbyte    signed 8 bits
char   jchar    unsigned 16 bits

ICU4C types:

Define UChar to be wchar_t if that is
16 bits wide; always assumed to be
unsigned.

If wchar_t is not 16 bits wide, then
define UChar to be uint16_t or
char16_t because GCC >=4.4 can handle
UTF16 string literals. This makes the
definition of UChar platform-dependent
but allows direct string type
compatibility with platforms with
16-bit wchar_t types.

So, aside from anything icu_function might be doing, you are trying to fit a 16-bit value into an 8-bit-wide type.
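To make the narrowing concrete, here is a small Java sketch that mimics the assignment on the Java side. The names NarrowingDemo and narrow are made up for illustration, and the (byte) cast stands in for the implicit UChar-to-jbyte conversion in the C++ loop, on the assumption that the conversion keeps only the low 8 bits, as it does on common platforms.

```java
import java.util.Arrays;

final class NarrowingDemo {
    // Keeps only the low 8 bits of each 16-bit code unit, like destChr[i] = testStr[i].
    static byte[] narrow(char[] units) {
        byte[] out = new byte[units.length];
        for (int i = 0; i < units.length; i++) {
            out[i] = (byte) units[i];
        }
        return out;
    }

    public static void main(String[] args) {
        char[] units = {'A', '\u00E9', '\u4E2D'}; // A, é, 中
        // 0x0041 -> 0x41, 0x00E9 -> 0xE9 (-23 as a signed byte), 0x4E2D -> 0x2D
        System.out.println(Arrays.toString(narrow(units))); // prints [65, -23, 45]
    }
}
```

Code units up to U+00FF keep their value, but anything above that loses its high byte, and no charset passed to new String can bring it back.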

If you must use a Java byte array, I suggest converting to the 8-bit char type by transcoding to a Unicode encoding such as UTF-8.

To paraphrase some C code:

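#include <stdlib.h>
#include <unicode/ucnv.h>

UErrorCode status = U_ZERO_ERROR;
UChar *utf16 = (UChar*) malloc(len16 * sizeof(UChar));
//TODO: fill data
// convert to UTF-8
UConverter *encoding = ucnv_open("UTF-8", &status);
// preflight: a NULL target returns the required length and sets U_BUFFER_OVERFLOW_ERROR
int32_t len8 = ucnv_fromUChars(encoding, NULL, 0, utf16, len16, &status);
status = U_ZERO_ERROR;  // reset, or the next call returns without converting
char *utf8 = (char*) malloc(len8 * sizeof(char));
ucnv_fromUChars(encoding, utf8, len8, utf16, len16, &status);
ucnv_close(encoding);
//TODO: check U_FAILURE(status); char to jbyte; free both buffers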

You can then transcode this to a Java String using new String(myICUBytes, "UTF-8").
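On the Java side, the decoding step might look like the sketch below. The class and method names are hypothetical, and the hard-coded byte array is only a stand-in for whatever the native code writes into myICUBytes. StandardCharsets.UTF_8 is used instead of the "UTF-8" string so that no checked exception has to be handled.

```java
import java.nio.charset.StandardCharsets;

final class Utf8DecodeDemo {
    // Decodes UTF-8 bytes (as the native side would fill them in) into Java chars.
    static char[] toChars(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes returned through JNI: "é中" encoded as UTF-8.
        byte[] myICUBytes = {(byte) 0xC3, (byte) 0xA9, (byte) 0xE4, (byte) 0xB8, (byte) 0xAD};
        char[] chars = toChars(myICUBytes);
        System.out.println(chars.length); // prints 2
    }
}
```

Note that five bytes decode to two chars here: the UTF-8 length generally differs from the number of UTF-16 code units, so the destination byte array has to be sized from the converted length rather than from the source char count.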

I used UTF-8 because it was already in my sample code and you don't have to worry about endianness. Convert my C to C++ as appropriate.
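For comparison, this is roughly what the endianness concern looks like if the raw 16-bit units were copied across as bytes instead. It is only a sketch: Utf16OrderDemo and rawBytes are illustrative names, and the ByteBuffer merely simulates a memory copy of a UChar buffer on a machine with the given byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class Utf16OrderDemo {
    // Lays out 16-bit code units as raw bytes in the given byte order,
    // the way a memory copy of a UChar buffer would on a machine with that order.
    static byte[] rawBytes(String s, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(s.length() * 2).order(order);
        for (int i = 0; i < s.length(); i++) {
            buf.putChar(s.charAt(i));
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] le = rawBytes("AB", ByteOrder.LITTLE_ENDIAN); // 41 00 42 00
        System.out.println(new String(le, StandardCharsets.UTF_16LE)); // prints AB
        // "UTF-16" without a byte-order mark is read as big-endian: U+4100 U+4200
        System.out.println(new String(le, StandardCharsets.UTF_16).equals("AB")); // prints false
    }
}
```

With raw UTF-16 bytes the Java side has to name the byte order explicitly (UTF_16LE or UTF_16BE) unless the data starts with a byte-order mark, which is the bookkeeping UTF-8 avoids.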


遗弃Ｍ 2024-10-26 05:36:48

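Have you considered using ICU4J?

Also, when converting bytes to a string you need to specify the character encoding. I am not familiar with the library in question, so I can't give you further advice, but perhaps that would be "UTF-16" or similar?

Oh, and it is also worth noting that you may simply be seeing a display error, because the terminal you are printing to is not using the correct character set and/or does not have the right glyphs available.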
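One way to rule out a display problem is to print the code points as numbers and, separately, write the text through a stream with an explicit encoding. The sketch below assumes the terminal expects UTF-8; DisplayCheckDemo and hexDump are names invented for this example.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class DisplayCheckDemo {
    // Renders each code point as U+XXXX so the data can be checked without relying on glyphs.
    static String hexDump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "A\u00E9\u4E2D";
        System.out.println(hexDump(text)); // prints U+0041 U+00E9 U+4E2D
        // Assumption: the terminal expects UTF-8; use a different name if yours does not.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(text);
    }
}
```

If the hex dump shows the expected code points but the second line still looks wrong, the data is fine and the terminal or font is the culprit.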


About the author

山川志
