Java - Sending UTF-8 strings that may contain illegal characters via a Web Service and XML
I have a Web Service written in Java. I want to send some strings in the form of an XML file, but these strings may contain characters that are illegal in XML. Currently I replace all of them with ?, create the XML and send it over the network (to a Silverlight app). But sometimes all I get is question marks! So I want to somehow encode the strings before sending them and decode them afterwards, so that the client receives the exact strings. The strings are UTF-8 encoded. I'm using something like this to create the XML:
try {
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

    // root element
    Document doc = docBuilder.newDocument();
    Element rootElement = doc.createElement("SearchResults");
    rootElement.setAttribute("count", Integer.toString(total));
    doc.appendChild(rootElement);

    for (int i = 0; i < results.size(); i++) {
        Result res = results.get(i);

        // title
        Element title = doc.createElement("Title");
        title.appendChild(doc.createTextNode(res.title));
        // note: searchRes is not declared anywhere in the snippet as posted
        searchRes.appendChild(title);
        //...
    }

    // serialize the DOM tree into an XML string
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    DOMSource source = new DOMSource(doc);
    StringWriter sw = new StringWriter();
    StreamResult result = new StreamResult(sw);
    transformer.transform(source, result);
    String ret = sw.toString();
    return ret;
} catch (ParserConfigurationException pce) {
    pce.printStackTrace();
} catch (TransformerException tfe) {
    tfe.printStackTrace();
}
return null;
Thank you.
PS:
Some people said they didn't understand my question, so maybe I didn't phrase it well. Let me try to clarify it with an example.
Suppose I have an array of items.
Each item has 3 strings.
These strings are UTF-8 strings (from many languages).
I want to send these strings to the client via a Web Service in Java.
The client part is Silverlight. In the Silverlight app, I get the XML, parse it, use LINQ to extract the data from it, and then use that data in the app.
When I try to escape the characters, the parser in Silverlight throws an exception saying that there is an illegal character in the source string (the XML string). After debugging I found out that there actually IS an illegal character, but I don't know how to produce a guaranteed legal XML string.
Edit:
Thank you all for your support. I REALLY appreciate it.
I solved my problem.
It turns out that somewhere in my code I was producing an illegal character and appending it to my result string.
The question still remains: how can I produce a legal XML file even when I'm feeding it illegal characters? Note that I solved my problem by eliminating the illegal character before producing the XML, so I still wonder what I would do if I actually wanted to send it over. But since my problem is solved and there are plenty of answers here, I guess future readers have a head start in tackling this problem.
I didn't have the time to try them all, but I'm sure they will help.
There are many helpful answers, so it is hard to single one out as the answer.
But I have to choose one of them.
I sincerely thank everyone who responded.
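For future readers, the fix described in the edit above (dropping illegal characters before building the XML) could be sketched roughly as follows. This is not the asker's actual code: the class name `XmlSanitizer` and the method `stripIllegalXmlChars` are hypothetical, and the allowed ranges are taken from the `Char` production of XML 1.0.

```java
public final class XmlSanitizer {
    private XmlSanitizer() {}

    /** True if the code point is permitted by the XML 1.0 Char production. */
    public static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point XML 1.0 forbids; all other text is kept unchanged. */
    public static String stripIllegalXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // codePoints() keeps surrogate pairs together, so characters outside
        // the BMP survive, while unpaired surrogates are filtered out.
        input.codePoints()
             .filter(XmlSanitizer::isLegalXml10)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // The control character U+0001 is illegal in XML 1.0 and gets removed.
        String dirty = "caf\u00e9 \u0001\u4e2d\u6587";
        System.out.println(stripIllegalXmlChars(dirty));
    }
}
```

In the code from the question, the call would presumably wrap the value passed to `doc.createTextNode(...)`. Unlike replacing everything suspicious with `?`, this only touches code points that XML 1.0 cannot represent, so ordinary non-Latin text is left alone.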
Comments (5)
If you're sending non-character data (binary data, for example) in your XML, you could encode it using Base64. But I'm not sure I've understood your question correctly.
Maybe you just forgot to encode your XML in UTF-8, using
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8")
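A minimal sketch of both suggestions together, assuming the strings really must arrive unchanged: each title is Base64-encoded before it goes into the text node, and the serializer is told to write UTF-8 bytes. The class name `Base64TitleDemo`, its methods, and the `encoding="base64"` attribute are made up for illustration; the client would need a matching Base64 decode step.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Base64TitleDemo {

    /** Encodes the string's UTF-8 bytes as Base64 (letters, digits, '+', '/', '='). */
    public static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(): Base64 text back to the original string. */
    public static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    /** Builds a one-title document and serializes it as UTF-8 bytes. */
    public static byte[] buildXml(String title) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("SearchResults");
            doc.appendChild(root);

            Element titleElement = doc.createElement("Title");
            titleElement.setAttribute("encoding", "base64"); // hypothetical marker for the client
            titleElement.appendChild(doc.createTextNode(encode(title)));
            root.appendChild(titleElement);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

            // Writing to a byte stream lets the declared encoding actually be applied.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = buildXml("caf\u00e9 \u0001 \u4e2d\u6587");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```

The trade-off is that the payload is no longer human-readable and grows by about a third, so this is probably only worth it when the illegal characters carry meaning. One caveat: a Java string containing an unpaired surrogate does not survive `getBytes(StandardCharsets.UTF_8)` unchanged.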
Not sure I understand your question, but maybe you should wrap the data in a CDATA section so that it's not parsed by the XML parser. Here is the documentation from MSDN.

Wrap the content with <![CDATA[ and ]]>. More info here: http://www.w3schools.com/xml/xml_cdata.asp
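With the DOM API used in the question, a CDATA section would normally be created through `Document.createCDATASection` rather than by concatenating the delimiters by hand. The sketch below assumes a single `Title` element; `CdataDemo` and `toXml` are illustrative names only.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CdataDemo {

    /** Serializes <Title> with the given text stored in a CDATA section. */
    public static String toXml(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element title = doc.createElement("Title");
            // The DOM node carries the raw text; the serializer adds the CDATA delimiters.
            title.appendChild(doc.createCDATASection(text));
            doc.appendChild(title);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }

    public static void main(String[] args) {
        // Markup characters such as < and & stay unescaped inside the section.
        System.out.println(toXml("a < b & c"));
    }
}
```

Note that CDATA only spares you from escaping markup characters such as `<` and `&`. Characters that XML 1.0 forbids outright (most control characters, for instance) are still illegal inside a CDATA section, so this alone would not fix the parser error described in the question. A literal `]]>` in the text also cannot appear inside a single section.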
From experience, I would recommend escaping/unescaping the XML.
Take a look at StringEscapeUtils from Apache Commons Lang (http://commons.apache.org/lang/).

You should try StringEscapeUtils from Apache.
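One thing worth checking before adding an escaping library to the code in the question: text passed to `doc.createTextNode` is already escaped by the serializer when the DOM is written out, so escaping it beforehand would escape it twice. The small sketch below, using only the JDK and the hypothetical names `TextNodeEscapeDemo` and `toXml`, shows the built-in behaviour.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TextNodeEscapeDemo {

    /** Serializes <Title> with the given text stored in a plain text node. */
    public static String toXml(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element title = doc.createElement("Title");
            title.appendChild(doc.createTextNode(text)); // raw, unescaped text goes in
            doc.appendChild(title);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build XML", e);
        }
    }

    public static void main(String[] args) {
        // The serializer turns < and & into entity references on output.
        System.out.println(toXml("a < b & c"));
    }
}
```

Manual escaping is mainly relevant when the XML is assembled by string concatenation instead of through the DOM. Either way, entity escaping deals with markup characters, not with code points that XML 1.0 disallows entirely.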