What character encoding does ObjectOutputStream's writeObject method use?

Posted 2024-10-07 03:42:27

I read that Java uses UTF-16 encoding internally, i.e. if I have something like String var = "जनमत"; then "जनमत" will be encoded in UTF-16 internally. So, if I dump this variable to some file as below:

FileOutputStream fileOut = new FileOutputStream("output.xyz");
ObjectOutputStream out = new ObjectOutputStream(fileOut);
out.writeObject(var);
out.close();

will the encoding of the string "जनमत" in the file "output.xyz" be in UTF-16? Also, later on if I want to read from the file "output.xyz" via ObjectInputStream, will I be able to get the UTF-16 representation of the variable?

Thanks.

黎夕旧梦 2024-10-14 03:42:27

So, If I dump this variable to some file... will the encoding of the string "जनमत" in the file "output.xyz" be in UTF-16?

The encoding of your string in the file will be in whatever format the ObjectOutputStream wants to put it in. You should treat it as a black box that can only be read by an ObjectInputStream. (Seriously - even though the format is IIRC well-documented, if you want to read it with some other tool, you should serialise the object yourself as XML or JSON or whatever.)
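Purely to illustrate the point (and not as something to rely on), here is a small sketch that serializes the question's string into memory and dumps the raw bytes. It assumes only a standard JDK; the class and method names are made up for the example. On current JDKs it should print AC ED 00 05 74 00 0C E0 A4 9C E0 A4 A8 E0 A4 AE E0 A4 A4: the stream header, the TC_STRING type code (0x74), a 2-byte length of 12, and then 12 bytes of (modified) UTF-8, so the file does not contain UTF-16.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class SerializedStringDump {
    // Serializes one object into an in-memory buffer and returns the raw stream bytes.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex pairs, e.g. "AC ED 00 05".
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The question's Devanagari string, written as Unicode escapes to avoid source-encoding issues.
        String var = "\u091C\u0928\u092E\u0924";
        System.out.println(hex(serialize(var)));
    }
}
```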

Later on if I want to read from the file "output.xyz" via ObjectInputStream, will I be able to get the UTF-16 representation of the variable?

If you read the file with an ObjectInputStream, you'll get a copy of the original object back, i.e. a java.lang.String, which is just a sequence of characters (not bytes). From that you could get the UTF-16 representation if you wished via getBytes(StandardCharsets.UTF_16) (though I suspect you don't actually need to). Note that getBytes() with no argument uses the platform's default charset, not UTF-16.
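A minimal in-memory round trip, assuming nothing beyond the standard java.io and java.nio.charset classes (StringRoundTrip and roundTrip are illustrative names). The string that comes back equals the original; if you really want UTF-16 bytes, you ask for them explicitly with a charset. StandardCharsets.UTF_16 prepends a byte-order mark, while UTF_16BE does not, which is why the two lengths differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StringRoundTrip {
    // Writes obj with ObjectOutputStream, then reads it back with ObjectInputStream.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String var = "\u091C\u0928\u092E\u0924"; // the question's Devanagari string
        String copy = (String) roundTrip(var);
        System.out.println(copy.equals(var));                                // true
        // Explicit UTF-16: 2-byte BOM + 4 chars * 2 bytes, versus no BOM.
        System.out.println(copy.getBytes(StandardCharsets.UTF_16).length);   // 10
        System.out.println(copy.getBytes(StandardCharsets.UTF_16BE).length); // 8
    }
}
```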


In conclusion, don't worry too much about the internal details of serialization. If you need to know what's going on, create the file yourself; and if you're just curious, trust in the JVM to do the right thing.
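If you do need to know the on-disk encoding, one way to "create the file yourself" is to write the text with a charset you choose. A rough sketch, assuming a writable temporary directory; writeAndReadBack is a hypothetical helper, not a JDK API. For the question's string, UTF-8 should take 12 bytes (3 per Devanagari character) and UTF-16BE 8 bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitEncodingFile {
    // Writes text to a temporary file using cs, reports the file size, reads the
    // bytes back with the same charset, and deletes the file afterwards.
    static String writeAndReadBack(String text, Charset cs) {
        try {
            Path file = Files.createTempFile("output", ".txt");
            try {
                Files.write(file, text.getBytes(cs));
                System.out.println(cs.name() + ": " + Files.size(file) + " bytes");
                return new String(Files.readAllBytes(file), cs);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String var = "\u091C\u0928\u092E\u0924"; // the question's Devanagari string
        System.out.println(writeAndReadBack(var, StandardCharsets.UTF_8).equals(var));
        System.out.println(writeAndReadBack(var, StandardCharsets.UTF_16BE).equals(var));
    }
}
```

Because the charset is chosen explicitly, any other tool can read the file as long as it is told the same encoding.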

牵你手 2024-10-14 03:42:27

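Close: strictly speaking, a String is a sequence of UTF-16 code units (chars), and most String methods treat those units individually, much as UCS-2 would. Either way, it uses 2 bytes for most characters, and a sequence of 2 chars (a surrogate pair), i.e. 4 bytes, for some rarely used code points.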
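The difference between chars and code points is visible directly in the String API. This sketch uses U+1F600 (an emoji outside the Basic Multilingual Plane) as an assumed example of a "rarely used" code point; charCount and codePointCount are illustrative helper names.

```java
class SurrogatePairs {
    // Number of UTF-16 code units (Java chars) in s.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in s; a surrogate pair counts once.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String devanagari = "\u091C\u0928\u092E\u0924"; // four BMP characters
        String emoji = "\uD83D\uDE00";                  // U+1F600 as a surrogate pair
        System.out.println(charCount(devanagari) + " " + codePointCount(devanagari)); // 4 4
        System.out.println(charCount(emoji) + " " + codePointCount(emoji));           // 2 1
        System.out.println(Character.isHighSurrogate(emoji.charAt(0)));               // true
        System.out.printf("U+%X%n", emoji.codePointAt(0));                            // U+1F600
    }
}
```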

ObjectOutputStream uses something called modified UTF-8, which is like UTF-8 except that the NUL character (U+0000) is written as the 2-byte sequence 0xC0 0x80. That sequence is not legal in standard UTF-8 (it is an overlong encoding, and UTF-8 requires each character to have a unique encoding), but it naturally decodes back to the value 0.
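To compare modified UTF-8 with standard UTF-8, the sketch below uses DataOutputStream.writeUTF(), which is documented to write a 2-byte length followed by the string in modified UTF-8; treating it as a stand-in for the string payload ObjectOutputStream writes is an assumption of the example, and the class and helper names are illustrative. It should show NUL as C0 80, and U+1F600 as two 3-byte sequences (one per surrogate, 6 bytes in total) where standard UTF-8 uses a single 4-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Encodes s with DataOutputStream.writeUTF (modified UTF-8) and drops the
    // 2-byte length prefix so the payload can be compared with standard UTF-8.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";             // U+0000
        String emoji = "\uD83D\uDE00"; // U+1F600
        System.out.println(hex(modifiedUtf8(nul)));                      // C0 80
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));   // 00
        System.out.println(hex(modifiedUtf8(emoji)));                    // ED A0 BD ED B8 80
        System.out.println(hex(emoji.getBytes(StandardCharsets.UTF_8))); // F0 9F 98 80
    }
}
```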

But what you are really asking is "does it work, so that I write a String and read the same String back?", and the answer to that is yes. The JDK does the proper encoding when writing bytes out, and the decoding when reading them.

For what it's worth, you may be better off using the "writeUTF()" method for Strings, since I think the resulting output is a bit more compact. But "writeObject()" also works; it just needs a bit more metadata.
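Whether writeUTF() actually comes out smaller depends on usage: inside an ObjectOutputStream its output is primitive data, wrapped in block-data records with their own header, while writeObject() adds a TC_STRING byte per string but can replace a repeated String instance with a back-reference. The sketch below simply measures both for a few distinct strings (WriteUtfVsWriteObject and streamSize are illustrative names). On a current JDK I would expect 10 vs 11 bytes for a single "abc" and 19 vs 18 bytes for three 2-character strings, stream header included. Also keep in mind that writeUTF() throws UTFDataFormatException when the encoded string exceeds 65535 bytes, and that the data must be read back with the matching readUTF(), not readObject().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class WriteUtfVsWriteObject {
    // Total number of bytes (including the 4-byte stream header) produced by
    // writing the given strings with either writeUTF() or writeObject().
    static int streamSize(boolean useWriteUtf, String... strings) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (String s : strings) {
                if (useWriteUtf) {
                    out.writeUTF(s);    // 2-byte length + bytes, buffered into block-data records
                } else {
                    out.writeObject(s); // new short String: TC_STRING + 2-byte length + bytes
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(streamSize(false, "abc") + " vs " + streamSize(true, "abc"));
        System.out.println(streamSize(false, "ab", "cd", "ef") + " vs " + streamSize(true, "ab", "cd", "ef"));
    }
}
```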

海拔太高太耀眼 2024-10-14 03:42:27

To add to this: ObjectOutputStream.writeString() (a private method) determines the UTF length of the given string and writes it either in "standard" UTF format or in "long" UTF format, where, as stated in the javadoc:

"Long" UTF format is identical to standard UTF, except that it uses an 8 byte header (instead of the standard 2 bytes) to convey the UTF encoding length.

I got this from the ObjectOutputStream source code:

private void writeString(String str, boolean unshared) throws IOException {
    handles.assign(unshared ? null : str);
    long utflen = bout.getUTFLength(str);
    if (utflen <= 0xFFFF) {
        bout.writeByte(TC_STRING);
        bout.writeUTF(str, utflen);
    } else {
        bout.writeByte(TC_LONGSTRING);
        bout.writeLongUTF(str, utflen);
    }
}

and writeObject(Object obj) (more precisely, the private writeObject0 that it calls) does this check:

if (obj instanceof String) {
    writeString((String) obj, unshared);
}
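Note that the switch is based on the encoded (modified UTF-8) length, not on String.length(). A sketch that checks the type code written right after the 4-byte stream header, using the public ObjectStreamConstants.TC_STRING and TC_LONGSTRING values; typeCode and repeat are illustrative helper names. Since each Devanagari character takes 3 encoded bytes, 30000 of them (90000 bytes) should already need the long form. ObjectInputStream.readObject() accepts both forms, so this is invisible unless you inspect the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.UncheckedIOException;
import java.util.Arrays;

class LongStringTag {
    // Serializes s and returns the type code byte that follows the 4-byte stream header.
    static byte typeCode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray()[4];
    }

    // Builds a string consisting of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // 65535 ASCII chars encode to exactly 0xFFFF bytes: standard form.
        System.out.println(typeCode(repeat('a', 0xFFFF)) == ObjectStreamConstants.TC_STRING);
        // One more byte no longer fits the 2-byte length: long form.
        System.out.println(typeCode(repeat('a', 0x10000)) == ObjectStreamConstants.TC_LONGSTRING);
        // 30000 Devanagari chars encode to 90000 bytes: long form despite length() < 65536.
        System.out.println(typeCode(repeat('\u091C', 30000)) == ObjectStreamConstants.TC_LONGSTRING);
    }
}
```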