I am calling an ICU4C function through JNI that returns a UChar * (i.e. an array of Unicode characters). I converted it to a jbyteArray by assigning each element of the UChar array to a local jbyte[] array, and then returned it to Java using env->SetByteArrayRegion(). Now I have the byte[] array in Java, but it is pretty much all gibberish, weird symbols at best. I am not sure where the problem might be. I am working with Unicode characters, if that matters. How do I convert the byte[] to a char[] in Java properly? Something is not being mapped right. Here is a snippet of the code:
--- JNI code (altered slightly to make it shorter) ---
    static jint testFunction(JNIEnv* env, jclass c, jcharArray srcArray, jbyteArray destArray) {
        jchar* src = env->GetCharArrayElements(srcArray, NULL);
        int n = env->GetArrayLength(srcArray);
        UChar *testStr = new UChar[n];
        jbyte destChr[n];
        // calling ICU4C function here
        icu_function(src, testStr); // takes source characters and returns UChar*
        for (int i = 0; i < n; i++)
            destChr[i] = testStr[i]; // is this correct?
        delete[] testStr; // allocated with new[], so release with delete[]
        env->SetByteArrayRegion(destArray, 0, n, destChr);
        env->ReleaseCharArrayElements(srcArray, src, JNI_ABORT);
        return (n); // anything for now
    }
--- Java code ---
    String wohoo = "ABCD bal bla bla";
    char[] myChars = wohoo.toCharArray();
    byte[] myICUBytes = new byte[myChars.length];
    int value = MyClass.testFunction(myChars, myICUBytes);
    System.out.println(new String(myICUBytes)); // produces gibberish & weird symbols
I also tried System.out.println(new String(myICUBytes, Charset.forName("UTF-16"))) and the output is just as garbled.
Note that the ICU function does return the proper Unicode characters in the UChar *. Somewhere between the conversion to jbyteArray and Java, something is getting messed up.
Help!
Comments (2)
This looks like an issue all right.
JNI types: jchar is an unsigned 16-bit type and jbyte is a signed 8-bit type.
ICU4C types: UChar is an unsigned 16-bit type that holds one UTF-16 code unit.
So, aside from anything icu_function might be doing, you are trying to fit a 16-bit value into an 8-bit-wide type. If you must use a Java byte array, I suggest converting to the 8-bit char type by transcoding to a Unicode encoding such as UTF-8 on the native side; in C, ICU4C's u_strToUTF8 (declared in unicode/ustring.h) is one function that could do this.
You can then transcode this to a Java String using new String(myICUBytes, "UTF-8"). I used UTF-8 because it was already in my sample code and you don't have to worry about endianness. Convert my C to C++ as appropriate.
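To make the 16-bit versus 8-bit point concrete, here is a minimal C++ sketch. It is not the answerer's code and it does not use ICU or JNI: std::u16string stands in for a UChar buffer, and both function names (narrowToBytes, utf16ToUtf8) are hypothetical. The first function mimics the loop in the question; the second is a hand-rolled UTF-16 to UTF-8 transcoder, shown only to illustrate what a library routine would do for you.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: mimics the question's loop. Each 16-bit code unit is
// narrowed to 8 bits, so only the low byte of every character survives.
std::vector<int8_t> narrowToBytes(const std::u16string& utf16) {
    std::vector<int8_t> out;
    for (char16_t unit : utf16)
        out.push_back(static_cast<int8_t>(unit & 0xFF));
    return out;
}

// Hypothetical helper: transcodes UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (an assumption of this sketch).
std::string utf16ToUtf8(const std::u16string& utf16) {
    std::string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        char32_t cp = utf16[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate if present.
            if (i + 1 < utf16.size() && utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (utf16[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate without a preceding high surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Note that UTF-8 can need up to three bytes per UTF-16 code unit, so a destination byte array sized to the number of characters, as in the question, would be too small for non-ASCII text. The Java side would have to allocate for the actual UTF-8 length, or the native side would have to create and return the array.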
Have you considered using ICU4J?
Also, when converting your bytes to a string, you will need to specify a character encoding. I'm not familiar with the library in question, so I can't advise you further, but perhaps this will be "UTF-16" or similar?
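If "UTF-16" is the encoding chosen on the Java side, the native side has to hand over two bytes per code unit in a known byte order. The following is a rough C++ sketch under that assumption; utf16ToBigEndianBytes is a hypothetical name, and std::u16string again stands in for the UChar buffer. Big-endian is used because Java's "UTF-16" charset decodes as big-endian when the input carries no byte-order mark.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: writes each 16-bit code unit as two bytes, high byte
// first, so the result can be decoded as big-endian UTF-16.
std::vector<uint8_t> utf16ToBigEndianBytes(const std::u16string& utf16) {
    std::vector<uint8_t> out;
    out.reserve(utf16.size() * 2);
    for (char16_t unit : utf16) {
        out.push_back(static_cast<uint8_t>(unit >> 8));   // high byte
        out.push_back(static_cast<uint8_t>(unit & 0xFF)); // low byte
    }
    return out;
}
```

With this layout the Java byte array would need to be twice the character count, which is another reason the single-byte-per-character array in the question cannot decode correctly as UTF-16.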
Oh, and it's also worth noting that you might simply be getting display errors because the terminal you're printing to isn't using the correct character set and/or doesn't have the right glyphs available.