Why does DataOutputStream.writeUTF() add 2 extra bytes at the beginning?

Posted on 2024-12-07 10:36:00

When I was trying to parse XML using SAX over sockets, I came across a strange occurrence.
Upon analysing it, I noticed that DataOutputStream adds 2 bytes in front of my data.

Message sent by DataOutputStream:

0020  50 18 00 20 0f df 00 00  00 9d 3c 3f 78 6d 6c 20   P.. .... ..<?xml 
0030  76 65 72 73 69 6f 6e 3d  22 31 2e 30 22 3f 3e 3c   version= "1.0"?><
0040  63 6f 6d 70 61 6e 79 3e  3c 73 74 61 66 66 3e 3c   company> <staff><
0050  66 69 72 73 74 6e 61 6d  65 3e 79 6f 6e 67 3c 2f   firstnam e>yong</
0060  66 69 72 73 74 6e 61 6d  65 3e 3c 6c 61 73 74 6e   firstnam e><lastn
0070  61 6d 65 3e 6d 6f 6f 6b  20 6b 69 6d 3c 2f 6c 61   ame>mook  kim</la
0080  73 74 6e 61 6d 65 3e 3c  6e 69 63 6b 6e 61 6d 65   stname>< nickname
0090  3e c2 a7 3c 2f 6e 69 63  6b 6e 61 6d 65 3e 3c 73   >..</nic kname><s
00a0  61 6c 61 72 79 3e 31 30  30 30 30 30 3c 2f 73 61   alary>10 0000</sa
00b0  6c 61 72 79 3e 3c 2f 73  74 61 66 66 3e 3c 2f 63   lary></s taff></c
00c0  6f 6d 70 61 6e 79 3e                               ompany>
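The two bytes right before <?xml in this dump are 00 9d. Read as a big-endian unsigned 16-bit number, that is 0x009d = 157. The XML that follows, from the 3c at offset 0x2a through the 3e at 0x00c6, is exactly 157 bytes long: 156 characters, with § taking two bytes (c2 a7). The small sketch below reproduces this in memory. The class and method names are made up for illustration, and a ByteArrayOutputStream stands in for the socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfPrefixDemo {
    // The string from the question; U+00A7 is the section sign used as <nickname>.
    static final String DATA = "<?xml version=\"1.0\"?><company><staff><firstname>yong</firstname>"
            + "<lastname>mook kim</lastname><nickname>\u00a7</nickname>"
            + "<salary>100000</salary></staff></company>";

    // Captures exactly what writeUTF() would put on the socket.
    static byte[] writeUtfBytes(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory buffer; writeUTF() declares it anyway.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the first two bytes as a big-endian unsigned 16-bit value,
    // the way DataInputStream.readUnsignedShort() does.
    static int lengthPrefix(byte[] bytes) {
        return ((bytes[0] & 0xff) << 8) | (bytes[1] & 0xff);
    }

    public static void main(String[] args) {
        byte[] wire = writeUtfBytes(DATA);
        System.out.printf("prefix bytes:  %02x %02x%n", wire[0], wire[1]);
        System.out.println("length prefix: " + lengthPrefix(wire));
        System.out.println("UTF-8 payload: " + DATA.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("total written: " + wire.length);
    }
}
```

Running main should print the prefix 00 9d, a length of 157, a UTF-8 payload of 157 bytes, and 159 bytes in total. This is the same 2 + 157 bytes seen in the packet dump.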

Message sent using Transformer:

0020  50 18 00 20 b6 b1 00 00  3c 3f 78 6d 6c 20 76 65   P.. .... <?xml ve
0030  72 73 69 6f 6e 3d 22 31  2e 30 22 20 65 6e 63 6f   rsion="1 .0" enco
0040  64 69 6e 67 3d 22 75 74  66 2d 38 22 3f 3e 3c 63   ding="ut f-8"?><c
0050  6f 6d 70 61 6e 79 3e 3c  73 74 61 66 66 3e 3c 66   ompany>< staff><f
0060  69 72 73 74 6e 61 6d 65  3e 79 6f 6e 67 3c 2f 66   irstname >yong</f
0070  69 72 73 74 6e 61 6d 65  3e 3c 6c 61 73 74 6e 61   irstname ><lastna
0080  6d 65 3e 6d 6f 6f 6b 20  6b 69 6d 3c 2f 6c 61 73   me>mook  kim</las
0090  74 6e 61 6d 65 3e 3c 6e  69 63 6b 6e 61 6d 65 3e   tname><n ickname>
00a0  c2 a7 3c 2f 6e 69 63 6b  6e 61 6d 65 3e 3c 73 61   ..</nick name><sa
00b0  6c 61 72 79 3e 31 30 30  30 30 30 3c 2f 73 61 6c   lary>100 000</sal
00c0  61 72 79 3e 3c 2f 73 74  61 66 66 3e 3c 2f 63 6f   ary></st aff></co
00d0  6d 70 61 6e 79 3e                                  mpany>  

As you can see, DataOutputStream adds two bytes in front of the message. As a result, the SAX parser throws the exception "org.xml.sax.SAXParseException: Content is not allowed in prolog.". However, when I skip over these 2 bytes, the SAX parser works just fine.
Additionally, I noticed that DataInputStream is unable to read the Transformer message.
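The second observation follows from the same format. DataInputStream.readUTF() always treats the first two bytes it sees as a length. The Transformer output starts with <? (3c 3f), so readUTF() expects 0x3c3f = 15423 bytes of payload. That is far more than the server sends. Once the server closes the socket, the read runs out of input, and an EOFException is the likely result.

Below is a minimal in-memory sketch. The class name is hypothetical, and a byte array stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadUtfOnPlainXml {
    // The length readUTF() would assume: the first two bytes as an unsigned short.
    static int assumedLength(byte[] bytes) {
        return ((bytes[0] & 0xff) << 8) | (bytes[1] & 0xff);
    }

    // Feeds prefix-less bytes to readUTF(); returns true if it runs out of input.
    static boolean readUtfHitsEof(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;  // fewer bytes were available than the bogus length
        } catch (IOException e) {
            return false; // some other failure, e.g. malformed modified UTF-8
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><company/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("readUTF() expects " + assumedLength(xml)
                + " bytes, but only " + (xml.length - 2) + " follow");
        System.out.println("EOFException: " + readUtfHitsEof(xml));
    }
}
```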

My question is: why does DataOutputStream add these bytes, and why doesn't the Transformer?




For those who are interested in replicating the problem, here is some code:

Server using DataOutputStream:

String data = "<?xml version=\"1.0\"?><company><staff><firstname>yong</firstname><lastname>mook kim</lastname><nickname>§</nickname><salary>100000</salary></staff></company>";
ServerSocket server = new ServerSocket(60000);
Socket socket = server.accept();
DataOutputStream os = new DataOutputStream(socket.getOutputStream());
os.writeUTF(data);
os.close();
socket.close();

Server using Transformer:

ServerSocket server = new ServerSocket(60000);
Socket socket = server.accept();
OutputStream os = socket.getOutputStream(); // was used below without being declared
Document doc = createDocument();
printXML(doc, os);
os.close();
socket.close();

public synchronized static void printXML(Document document, OutputStream stream) throws TransformerException
{
    DOMSource domSource = new DOMSource(document);
    StreamResult streamResult = new StreamResult(stream);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "no");
    serializer.transform(domSource, streamResult);
}

private static Document createDocument() throws ParserConfigurationException
{
    Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    Element company = document.createElement("company");
    Element staff = document.createElement("staff");
    Element firstname = document.createElement("firstname");
    Element lastname = document.createElement("lastname");
    Element nickname = document.createElement("nickname");
    Element salary = document.createElement("salary");
    Text firstnameText = document.createTextNode("yong");
    Text lastnameText = document.createTextNode("mook kim");
    Text nicknameText = document.createTextNode("§");
    Text salaryText = document.createTextNode("100000");
    document.appendChild(company);
    company.appendChild(staff);
    staff.appendChild(firstname);
    staff.appendChild(lastname);
    staff.appendChild(nickname);
    staff.appendChild(salary);
    firstname.appendChild(firstnameText);
    lastname.appendChild(lastnameText);
    nickname.appendChild(nicknameText);
    salary.appendChild(salaryText);
    return document;
}
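By contrast, the Transformer simply serializes the document to whatever stream it is given. The first bytes on the wire are therefore the XML declaration itself (3c 3f 78 6d 6c, <?xml), as in the second dump.

The sketch below runs the same serializer settings into memory instead of a socket. It assumes a trimmed-down one-element document in place of createDocument(). The exact declaration attributes (for example, whether standalone="no" appears) may vary between JAXP implementations.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TransformerBytesDemo {
    // Same settings as printXML() in the question, but writing into memory
    // so the raw bytes can be inspected.
    static byte[] serialize(Document doc) throws TransformerException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "no");
        serializer.transform(new DOMSource(doc), new StreamResult(buffer));
        return buffer.toByteArray();
    }

    // Hypothetical minimal document: <nickname> with the section sign (U+00A7).
    static Document smallDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element nickname = doc.createElement("nickname");
        nickname.appendChild(doc.createTextNode("\u00a7"));
        doc.appendChild(nickname);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(smallDocument());
        // Expected to start with 3c 3f ("<?"): no length prefix in front.
        System.out.printf("first bytes: %02x %02x%n", bytes[0], bytes[1]);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```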


Client using SAX Parser:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new MyHandler();
Socket socket = new Socket("localhost", 60000);
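// Decode explicitly as UTF-8; InputSource.setEncoding() has no effect once a Reader is supplied
InputSource is = new InputSource(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
//socket.getInputStream().skip(2); // skip the 2-byte length prefix written by DataOutputStream.writeUTF()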
saxParser.parse(is, handler);
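Skipping two bytes, as the commented-out skip(2) line does, works for this one message, but it discards the information those bytes carry. If the server has to keep using writeUTF(), the client can honour the frame instead:

1. Read the prefix with readUnsignedShort().
2. Read exactly that many bytes with readFully().
3. Parse only the payload.

The sketch below follows those steps. The class and method names are hypothetical. It assumes the XML contains no U+0000 and no characters outside the Basic Multilingual Plane, because only then do modified UTF-8 and the standard UTF-8 that the parser expects produce identical bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WriteUtfFrameClient {
    // Reads one frame produced by writeUTF() and parses only its payload.
    // The DataInputStream is deliberately not closed: that would close the socket.
    static void parseFrame(InputStream raw, DefaultHandler handler)
            throws IOException, ParserConfigurationException, SAXException {
        DataInputStream in = new DataInputStream(raw);
        int length = in.readUnsignedShort(); // the two "extra" bytes, e.g. 00 9d -> 157
        byte[] payload = new byte[length];
        in.readFully(payload);               // waits until the whole frame has arrived
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(payload), handler);
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for the socket: one writeUTF() frame.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        new DataOutputStream(wire).writeUTF("<?xml version=\"1.0\"?><company><staff/></company>");
        parseFrame(new ByteArrayInputStream(wire.toByteArray()), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                System.out.println("start element: " + qName);
            }
        });
    }
}
```

Because the payload is length-delimited, the same stream can carry several writeUTF() messages in a row, and each call to parseFrame consumes exactly one of them.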

Client using DataInputStream:

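Socket socket = new Socket("localhost", 60000);
DataInputStream in = new DataInputStream(socket.getInputStream());
// readUTF() first reads the 2-byte length written by writeUTF(), then that many bytes.
// The server sends a single message, so read once; looping would hit EOFException after it closes.
String data = in.readUTF();
System.out.println("Data: " + data);
in.close();
socket.close();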

Comments (2)

画中仙 2024-12-14 10:36:00

The output of DataOutputStream.writeUTF() is a custom format, intended to be read by DataInputStream.readUTF().

The javadocs of the writeUTF method you are calling say:

Writes a string to the underlying output stream using modified UTF-8 encoding in a machine-independent manner.

First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for the character. If no exception is thrown, the counter written is incremented by the total number of bytes written to the output stream. This will be at least two plus the length of str, and at most two plus thrice the length of str.
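The "modified UTF-8" part matters too. According to the DataInput documentation, modified UTF-8 encodes characters as follows:

- U+0001 to U+007F take one byte.
- U+0000 and U+0080 to U+07FF take two bytes.
- Everything else takes three bytes. Supplementary characters are written as two separately encoded surrogates, so they take six bytes instead of the four used by standard UTF-8.

That is where the "at least two plus the length, at most two plus thrice the length" bound comes from. The 2-byte prefix also caps the payload at 65535 bytes.

The small sketch below compares a hand-computed length with what writeUTF() actually writes. The names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Length {
    // Bytes per char in modified UTF-8, following the rules in the DataInput docs.
    static int modifiedUtf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;            // plain ASCII, except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;            // NUL (as c0 80) and U+0080..U+07FF
            } else {
                bytes += 3;            // rest of the BMP, each surrogate separately
            }
        }
        return bytes;
    }

    // Number of bytes writeUTF() emits after its 2-byte length prefix.
    static int writeUtfPayloadLength(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }

    public static void main(String[] args) {
        String[][] samples = {
            {"abc", "abc"},
            {"U+00A7", "\u00a7"},
            {"U+0000", "\u0000"},
            {"U+1F600", "\uD83D\uDE00"},
        };
        for (String[] sample : samples) {
            String s = sample[1];
            System.out.printf("%-8s modified=%d writeUTF=%d standard=%d%n", sample[0],
                    modifiedUtf8Length(s), writeUtfPayloadLength(s),
                    s.getBytes(StandardCharsets.UTF_8).length);
        }
    }
}
```

For "abc" and § the two encodings agree (3 and 2 bytes). For U+0000 the counts are 2 versus 1, and for the emoji they are 6 versus 4. This is one more reason the output of writeUTF() is meant for readUTF() and not for a general-purpose UTF-8 reader such as a SAX parser.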

丢了幸福的猪 2024-12-14 10:36:00

Always use the same type of stream when reading and writing data. If you are feeding the stream directly into a SAX parser, then you should not use a DataOutputStream.

Just use:

BufferedOutputStream bos = new BufferedOutputStream(socket.getOutputStream());
bos.write(data.getBytes(StandardCharsets.UTF_8)); // raw UTF-8 bytes, no length prefix
bos.close(); // flushes the buffer and ends the stream for the parser
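Below is a sketch of this approach over a real loopback connection. The sending side writes plain UTF-8 bytes and closes the stream. In this setup, closing is what lets the parser see the end of the document. The receiving side passes socket.getInputStream() straight to the SAX parser so it can detect the encoding itself. The class name, the ephemeral port, and the handler that collects the <nickname> text are assumptions for illustration, not part of the original code.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RawUtf8OverSocket {
    static final String DATA = "<?xml version=\"1.0\"?><company><staff>"
            + "<nickname>\u00a7</nickname></staff></company>";

    // Sending side: plain UTF-8 bytes, no length prefix.
    static void send(Socket socket, String xml) throws IOException {
        try (OutputStream out = new BufferedOutputStream(socket.getOutputStream())) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        } // close() flushes the buffer and ends the stream
    }

    // Receiving side: the raw InputStream goes straight to SAX.
    static String receiveNickname(Socket socket) throws Exception {
        StringBuilder nickname = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inNickname;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                inNickname = qName.equals("nickname");
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                inNickname = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inNickname) {
                    nickname.append(ch, start, length);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(socket.getInputStream(), handler);
        return nickname.toString();
    }

    // Runs sender and receiver on an ephemeral loopback port.
    static String roundTrip(String xml) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept()) {
                    send(s, xml);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            sender.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                String result = receiveNickname(client);
                sender.join();
                return result;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("nickname: " + roundTrip(DATA));
    }
}
```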