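What character encoding does ObjectOutputStream's writeObject method use?
I have read that Java uses UTF-16 encoding internally. That is, as I understand it, if I have something like String var = "जनमत"; then "जनमत" will be encoded in UTF-16 internally. So, if I dump this variable to a file like this: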
FileOutputStream fileOut = new FileOutputStream("output.xyz");
ObjectOutputStream out = new ObjectOutputStream(fileOut);
out.writeObject(var);
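will the string "जनमत" in the file "output.xyz" be encoded in UTF-16? Also, if I later read the file "output.xyz" through an ObjectInputStream, will I be able to get the UTF-16 representation of the variable?
Thanks.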
Comments (3)
The encoding of your string in the file will be in whatever format the ObjectOutputStream wants to put it in. You should treat it as a black box that can only be read by an ObjectInputStream. (Seriously: even though the format is, IIRC, well documented, if you want to read it with some other tool, you should serialise the object yourself as XML or JSON or whatever.)
If you read the file with an ObjectInputStream, you'll get a copy of the original object back. This will include a java.lang.String, which is just a sequence of characters (not bytes), from which you could get the UTF-16 representation if you wished via the getBytes() method with a UTF-16 charset argument (though I suspect you don't actually need to).
In conclusion, don't worry too much about the internal details of serialization. If you need to know what's going on, create the file yourself; and if you're just curious, trust in the JVM to do the right thing.
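To make the round trip concrete, here is a minimal sketch. It assumes an in-memory buffer instead of the "output.xyz" file, and the class and method names (StringRoundTrip, serialize, deserialize) are made up for illustration. The string is written with unicode escapes only so that the source file's own encoding does not matter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class StringRoundTrip {

    // Serializes a String with writeObject and returns the raw stream bytes.
    public static byte[] serialize(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the String back with readObject; the stream does the decoding.
    public static String deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (String) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String var = "\u091C\u0928\u092E\u0924"; // the question's "जनमत"
        String copy = deserialize(serialize(var));

        // The copy is an ordinary String equal to the original.
        System.out.println(var.equals(copy));

        // If a UTF-16 byte representation is really needed, ask for it
        // explicitly; getBytes() without arguments uses the platform default.
        byte[] utf16 = copy.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(utf16.length); // 4 chars * 2 bytes each
    }
}
```

The point of the sketch is that the on-disk encoding never leaks into your code: you hand in a String and get an equal String back, and you choose a byte encoding only at the moment you call getBytes.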
Close: a Java char is a 16-bit UTF-16 code unit (the original design was UCS-2), so most characters use 2 bytes, and some rarely used code points use a sequence of 2 chars, i.e. 4 bytes.
ObjectOutputStream uses something called modified UTF-8, which is like UTF-8 except that the zero character is written as a 2-byte sequence. That sequence is not legal in standard UTF-8 (because each code point must use its shortest encoding), but it naturally decodes back to the value 0. Supplementary characters are also written as two 3-byte surrogates rather than one 4-byte sequence.
But what you are really asking is "does it work so that I write a String and read a String back", and the answer to that is yes. The JDK does the proper encoding when writing the bytes out, and the decoding when reading them back.
For what it's worth, you are better off using the writeUTF() method for Strings, since I think the resulting output is a bit more compact. writeObject() also works; it just needs a bit more metadata.
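If you want to see the modified UTF-8 bytes for yourself, here is a rough sketch. It assumes the layout of a stream that holds a single short String: a 4-byte stream header, a 1-byte string tag and a 2-byte length, followed by the encoded characters. The class and method names (ModifiedUtf8Demo, serializedPayload, hex) are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class ModifiedUtf8Demo {

    // Assumed prefix for one short String: 4-byte stream header
    // + 1-byte TC_STRING tag + 2-byte length.
    static final int SHORT_STRING_PREFIX = 7;

    // Returns only the encoded characters that writeObject produced.
    public static byte[] serializedPayload(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, SHORT_STRING_PREFIX, all.length);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String zero = "\u0000";
        // Modified UTF-8 writes U+0000 as two bytes...
        System.out.println(hex(serializedPayload(zero)));              // C0 80
        // ...while standard UTF-8 writes it as a single zero byte.
        System.out.println(hex(zero.getBytes(StandardCharsets.UTF_8))); // 00

        // For text without U+0000 or supplementary characters the two
        // encodings agree: each of these Devanagari chars takes 3 bytes.
        String var = "\u091C\u0928\u092E\u0924"; // the question's "जनमत"
        System.out.println(Arrays.equals(
            serializedPayload(var), var.getBytes(StandardCharsets.UTF_8)));
    }
}
```

So the file is neither UTF-16 nor plain UTF-8, which is one more reason to read it back only through an ObjectInputStream.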
Just to add to this, ObjectOutputStream.writeString() will determine the UTF length of a given string and write it in either "standard" UTF or "long" UTF format, where "long" is as stated in the javadoc. I got this from the code...
and in writeObject(Object obj) they do a check
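writeString() is private, so you cannot call it directly, but the choice between the two formats can be observed from outside. The sketch below assumes that the type tag is the first byte after the 4-byte stream header, and that the switch happens once the encoded length no longer fits in the 2-byte length field (more than 65535 bytes). The class and method names (StringTagDemo, tagFor, repeatChar) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class StringTagDemo {

    // Serializes the String and returns the type tag that follows the
    // 4-byte stream header (magic number + version).
    public static byte tagFor(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray()[4];
    }

    // Builds a String of the given length filled with one character.
    public static String repeatChar(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // A short string is tagged as a "standard" UTF string.
        System.out.println(
            tagFor("abc") == ObjectStreamConstants.TC_STRING);

        // 70000 ASCII chars encode to 70000 bytes, which exceeds 65535,
        // so the "long" format tag is used instead.
        System.out.println(
            tagFor(repeatChar('a', 70_000)) == ObjectStreamConstants.TC_LONGSTRING);
    }
}
```

Either way the String comes back intact from readObject(); the tag only tells the reader how wide the length field is.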