Why does DataOutputStream.writeUTF() add 2 extra bytes at the beginning?
When I was trying to parse XML using SAX over sockets, I came across a strange occurrence.
Upon analysing the traffic, I noticed that DataOutputStream adds 2 bytes in front of my data.
Message sent by DataOutputStream:
0020 50 18 00 20 0f df 00 00 00 9d 3c 3f 78 6d 6c 20 P.. .... ..<?xml
0030 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 3f 3e 3c version= "1.0"?><
0040 63 6f 6d 70 61 6e 79 3e 3c 73 74 61 66 66 3e 3c company> <staff><
0050 66 69 72 73 74 6e 61 6d 65 3e 79 6f 6e 67 3c 2f firstnam e>yong</
0060 66 69 72 73 74 6e 61 6d 65 3e 3c 6c 61 73 74 6e firstnam e><lastn
0070 61 6d 65 3e 6d 6f 6f 6b 20 6b 69 6d 3c 2f 6c 61 ame>mook kim</la
0080 73 74 6e 61 6d 65 3e 3c 6e 69 63 6b 6e 61 6d 65 stname>< nickname
0090 3e c2 a7 3c 2f 6e 69 63 6b 6e 61 6d 65 3e 3c 73 >..</nic kname><s
00a0 61 6c 61 72 79 3e 31 30 30 30 30 30 3c 2f 73 61 alary>10 0000</sa
00b0 6c 61 72 79 3e 3c 2f 73 74 61 66 66 3e 3c 2f 63 lary></s taff></c
00c0 6f 6d 70 61 6e 79 3e ompany>
Message sent using Transformer:
0020 50 18 00 20 b6 b1 00 00 3c 3f 78 6d 6c 20 76 65 P.. .... <?xml ve
0030 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f rsion="1 .0" enco
0040 64 69 6e 67 3d 22 75 74 66 2d 38 22 3f 3e 3c 63 ding="ut f-8"?><c
0050 6f 6d 70 61 6e 79 3e 3c 73 74 61 66 66 3e 3c 66 ompany>< staff><f
0060 69 72 73 74 6e 61 6d 65 3e 79 6f 6e 67 3c 2f 66 irstname >yong</f
0070 69 72 73 74 6e 61 6d 65 3e 3c 6c 61 73 74 6e 61 irstname ><lastna
0080 6d 65 3e 6d 6f 6f 6b 20 6b 69 6d 3c 2f 6c 61 73 me>mook kim</las
0090 74 6e 61 6d 65 3e 3c 6e 69 63 6b 6e 61 6d 65 3e tname><n ickname>
00a0 c2 a7 3c 2f 6e 69 63 6b 6e 61 6d 65 3e 3c 73 61 ..</nick name><sa
00b0 6c 61 72 79 3e 31 30 30 30 30 30 3c 2f 73 61 6c lary>100 000</sal
00c0 61 72 79 3e 3c 2f 73 74 61 66 66 3e 3c 2f 63 6f ary></st aff></co
00d0 6d 70 61 6e 79 3e mpany>
As one can see, DataOutputStream adds two bytes (00 9d) in front of the message. As a result, the SAX parser throws the exception "org.xml.sax.SAXParseException: Content is not allowed in prolog.". However, when I skip over these 2 bytes, the SAX parser works just fine.
Additionally, I noticed that DataInputStream is unable to read the Transformer message.
My question is: why does DataOutputStream add these bytes, and why doesn't the Transformer?
For those who are interested in replicating the problem here is some code:
Server using DataOutputStream:
String data = "<?xml version=\"1.0\"?><company><staff><firstname>yong</firstname><lastname>mook kim</lastname><nickname>§</nickname><salary>100000</salary></staff></company>";
ServerSocket server = new ServerSocket(60000);
Socket socket = server.accept();
DataOutputStream os = new DataOutputStream(socket.getOutputStream());
os.writeUTF(data);
os.close();
socket.close();
Server using Transformer:
ServerSocket server = new ServerSocket(60000);
Socket socket = server.accept();
OutputStream os = socket.getOutputStream(); // declaration was missing from the original snippet
Document doc = createDocument();
printXML(doc, os);
os.close();
socket.close();
public synchronized static void printXML(Document document, OutputStream stream) throws TransformerException
{
DOMSource domSource = new DOMSource(document);
StreamResult streamResult = new StreamResult(stream);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer serializer = tf.newTransformer();
serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
serializer.setOutputProperty(OutputKeys.INDENT, "no");
serializer.transform(domSource, streamResult);
}
private static Document createDocument() throws ParserConfigurationException
{
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element company = document.createElement("company");
Element staff = document.createElement("staff");
Element firstname = document.createElement("firstname");
Element lastname = document.createElement("lastname");
Element nickname = document.createElement("nickname");
Element salary = document.createElement("salary");
Text firstnameText = document.createTextNode("yong");
Text lastnameText = document.createTextNode("mook kim");
Text nicknameText = document.createTextNode("§");
Text salaryText = document.createTextNode("100000");
document.appendChild(company);
company.appendChild(staff);
staff.appendChild(firstname);
staff.appendChild(lastname);
staff.appendChild(nickname);
staff.appendChild(salary);
firstname.appendChild(firstnameText);
lastname.appendChild(lastnameText);
nickname.appendChild(nicknameText);
salary.appendChild(salaryText);
return document;
}
Client using SAX Parser:
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new MyHandler();
Socket socket = new Socket("localhost", 60000);
InputSource is = new InputSource(socket.getInputStream()); // pass the byte stream so that setEncoding() actually applies
is.setEncoding("UTF-8");
//socket.getInputStream().skip(2); // skip over the 2 bytes written by DataOutputStream.writeUTF()
saxParser.parse(is, handler);
Client using DataInputStream:
Socket socket = new Socket("localhost", 60000);
DataInputStream in = new DataInputStream(socket.getInputStream());
while(true) {
String data = in.readUTF(); // expects the 2-byte length prefix written by writeUTF()
System.out.println("Data: " + data);
}
Comments (2)
The output of DataOutputStream.writeUTF() is a custom format, intended to be read by DataInputStream.readUTF(). The javadocs of the writeUTF method you are calling say:
Always use the same type of stream when reading and writing data. If you are feeding the stream directly into a SAX parser, then you should not use a DataOutputStream.
Just use
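To make the point above concrete, here is a minimal sketch (not part of the original answers) that reproduces the effect without a socket. writeUTF() first writes a 2-byte big-endian length, then the string in modified UTF-8, so the extra bytes in the capture (00 9d) are simply the byte count of the payload. This is also the likely reason readUTF() cannot read the Transformer output: it would take the leading "<?" (3c 3f) as a length. The class name WriteUtfPrefixDemo and its helper methods are made up for illustration, and the short XML string is an assumption standing in for the document from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names are illustrative only.
public class WriteUtfPrefixDemo {

    // Bytes as produced by writeUTF(): 2-byte length prefix + modified UTF-8 payload.
    static byte[] viaWriteUTF(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.toByteArray();
    }

    // Plain UTF-8 bytes with no framing, which is what an XML parser expects.
    static byte[] rawUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Interprets the first two bytes as an unsigned 16-bit big-endian value.
    static int prefix(byte[] b) {
        return ((b[0] & 0xff) << 8) | (b[1] & 0xff);
    }

    // Returns true if a SAX parser accepts the bytes as a well-formed document.
    static boolean parses(byte[] xml) throws ParserConfigurationException {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<nickname>\u00a7</nickname>"; // stand-in for the question's document
        byte[] framed = viaWriteUTF(xml);
        byte[] raw = rawUtf8(xml);

        System.out.println("raw UTF-8 length:   " + raw.length);
        System.out.println("writeUTF length:    " + framed.length);
        System.out.println("writeUTF prefix:    " + prefix(framed));
        System.out.println("raw bytes parse:    " + parses(raw));
        System.out.println("framed bytes parse: " + parses(framed));
    }
}
```

On the server side this suggests writing the bytes from getBytes(StandardCharsets.UTF_8) straight to socket.getOutputStream() when the reader is a SAX parser, and keeping writeUTF()/readUTF() as a matched pair otherwise. Note that modified UTF-8 also differs from standard UTF-8 for U+0000 and for supplementary characters, and that writeUTF() cannot write more than 65535 encoded bytes.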