How to handle character encoding with XML, JDom, JNI and C++
I am developing an application that reads an XML document and passes its contents via JNI to a C++ DLL, which validates it.
For this task I am using JDom and JUniversalChardet to parse the XML file in the correct encoding. My C++ code accepts a const char* for the contents of the XML file and needs it in the encoding "ISO-8859-15"; otherwise it throws an exception because of malformed characters.
My first approach was to use the OutputFormatter that ships with JDom and tell it to use Charset.forName("ISO-8859-15") while formatting the JDom document to a String. After that, the header of the XML in this String says:
<?xml version="1.0" encoding="ISO-8859-15"?>
The problem is that it is still stored in a Java String and is therefore UTF-16, if I got that right.
My native method looks something like this:
public native String jniApiCall(String xmlFileContents);
So I pass the String produced by JDom's OutputFormatter into this JNI method. Everything is still UTF-16, right?
In the C++ JNI method I access the xmlFileContents String with
const string xmlDataString = env->GetStringUTFChars(xmlFileContents, NULL);
So is the String now in UTF-16 or in UTF-8? My next question is: how can I change the character encoding of the std::string xmlDataString to ISO-8859-15? Or is my approach not exactly elegant? Is there a way to do the character encoding completely in Java?
Thanks for your help!
Marco
Comments (2)
You can always convert any String to a byte array in the needed character encoding using the byte[] getBytes(Charset charset) method (or even byte[] getBytes(String charsetName)).
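As a minimal sketch of that conversion: the class name Latin9Encoding and the helper toLatin9 are made up for illustration, and the sample text is an assumption. The euro sign is a convenient check, because ISO-8859-15 maps it to the single byte 0xA4.

```java
import java.nio.charset.Charset;

final class Latin9Encoding {
    // ISO-8859-15 is not one of the charsets every JVM must provide;
    // Charset.forName throws UnsupportedCharsetException if it is missing.
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Hypothetical helper: encode a Java String (UTF-16 internally) as ISO-8859-15 bytes.
    static byte[] toLatin9(String text) {
        return text.getBytes(LATIN9);
    }

    public static void main(String[] args) {
        byte[] bytes = toLatin9("5 \u20AC"); // "5 €"
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints: 35 20 A4
    }
}
```

Note that getBytes(Charset) does not throw for characters the target charset cannot represent; it silently substitutes the charset's replacement byte (here '?'), so an XML document containing such characters would be altered rather than rejected.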
In Java you could use myString.getBytes("ISO-8859-15") to get the byte array of the String in the character encoding passed as the parameter (in this case ISO-8859-15). Then use that byte array in C to build the std::string with something like:
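The native snippet from this answer is not reproduced here. What can be sketched is the Java side of the idea, under assumptions: the native method is changed to accept a byte[] instead of a String, and the class name XmlValidatorBridge, the helper encodeForNative, and the sample XML are all hypothetical. The trailing zero byte is likewise an assumption, added so the C++ code could treat the buffer as a const char*.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class XmlValidatorBridge {
    // Hypothetical variant of the question's jniApiCall: it takes the already
    // encoded bytes, so no String conversion happens on the native side.
    // It is only declared here; calling it requires the DLL to be loaded.
    static native String jniApiCall(byte[] xmlFileContents);

    // Encode the XML text as ISO-8859-15 and append a terminating 0 byte.
    static byte[] encodeForNative(String xml) {
        byte[] encoded = xml.getBytes(Charset.forName("ISO-8859-15"));
        // Arrays.copyOf pads the extra element with 0.
        return Arrays.copyOf(encoded, encoded.length + 1);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-15\"?><price>5 \u20AC</price>";
        byte[] payload = encodeForNative(xml);
        System.out.println("bytes to hand to the native method: " + payload.length);
    }
}
```

On the C++ side such a parameter arrives as a jbyteArray. The JNI functions GetArrayLength and GetByteArrayRegion (or GetByteArrayElements with ReleaseByteArrayElements) can copy it into a std::string without any further encoding step.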