BufferedWriter#write(int) Javadoc query
The Javadoc for this says:
Only the lower two bytes of the integer oneChar are written.
What effect, if any, does this have on writing non-utf8 encoded chars which have been cast to an int?
Update:
The code in question receives data from a socket and writes it to a file. (A lot of things happen between receiving and writing, so I can't just use the string I get using BufferedReader#readLine().) I was using Writer#write(char[]), but this meant I had to create a new char array each time. To avoid creating an array every time, I had a single char array filled with -1 (cast to a char).
I then use TextUtils#getChars to fill it, expanding the array if necessary. For writing, I loop through the array, writing each element to the Writer until I reach an element equal to (char) -1.
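The approach described in the update can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: TextUtils#getChars is an Android API, so plain Java's String#getChars stands in for it, and the class SentinelBuffer with its methods is hypothetical. One deliberate difference: the sentinel is written right after each message instead of pre-filling the whole array once, so a shorter message never picks up leftovers from a longer previous one. It also assumes messages never contain U+FFFF itself.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class SentinelBuffer {
    // End-of-data marker: (char) -1 narrows to '\uFFFF', a Unicode noncharacter.
    static final char SENTINEL = (char) -1;

    // Single array reused for every message instead of allocating a new one.
    private char[] buf = new char[16];

    // Copies s into the reused array (growing it if needed) and marks the end.
    void fill(String s) {
        int n = s.length();
        if (buf.length < n + 1) {
            buf = new char[Math.max(buf.length * 2, n + 1)];
        }
        s.getChars(0, n, buf, 0); // plain-Java stand-in for TextUtils#getChars
        buf[n] = SENTINEL;        // terminate right after this message's data
    }

    // Writes one char at a time until the sentinel is reached.
    void writeUntilSentinel(Writer out) throws IOException {
        for (int i = 0; i < buf.length && buf[i] != SENTINEL; i++) {
            out.write(buf[i]); // char widens to int; write(int) keeps all 16 bits
        }
    }

    // Hypothetical helper: pushes several messages through one reused buffer.
    static String writeAll(String... messages) {
        SentinelBuffer sb = new SentinelBuffer();
        StringWriter out = new StringWriter();
        try {
            for (String m : messages) {
                sb.fill(m);
                sb.writeUntilSentinel(out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The shorter second message must not leak leftovers from the first.
        System.out.println(writeAll("hello, world", "hi")); // hello, worldhi
    }
}
```

Since a char passed to write(int) is widened and then narrowed back, nothing is lost in the loop itself. Passing the length explicitly with Writer#write(char[], int, int) would avoid the sentinel check entirely.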
Answer:
Internally, write(int) just casts its parameter to char, so write(i) is equivalent to write((char) i).
Now, in Java, char is internally just an unsigned integer type with the range 0-65535 (i.e. 16 bits). The cast int -> char is a "narrowing primitive conversion" (Java Language Specification, 5.1.3), and int is a signed 32-bit integer, so the cast keeps only the lower 16 bits and discards the rest (a negative value such as -1 becomes 0xFFFF). That's why the Javadoc says that only the lower two bytes are written.
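A small demonstration of the narrowing, as a sketch: the class NarrowingDemo and its helper writeInts are hypothetical, and a StringWriter stands in for the file so the written chars can be inspected directly.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

public class NarrowingDemo {
    // Writes each int through BufferedWriter#write(int) and returns the resulting chars.
    static String writeInts(int... values) {
        StringWriter sink = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(sink)) {
            for (int v : values) {
                out.write(v); // only the lower 16 bits of v are kept
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        // 0x0041 is 'A'; 0x10041 has the same lower 16 bits; -1 is 0xFFFFFFFF.
        String s = writeInts(0x0041, 0x10041, -1);
        System.out.println(s.length());                 // 3
        System.out.println(s.charAt(0) == s.charAt(1)); // true
        System.out.println((int) s.charAt(2));          // 65535
    }
}
```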
Now, what this means in terms of characters depends on how you want to interpret the int values. A char in Java is a UTF-16 code unit: for characters in the Basic Multilingual Plane (BMP), the 16-bit number stored in the char is simply the number of the Unicode code point. So if each of your int values is the number of a code point that fits in 16 bits, you're fine (strictly, this is only true for characters in the BMP; a character from one of the supplementary planes is encoded as two chars, a surrogate pair). If it's anything else (a code point that needs more than 16 bits, a negative number, or something else entirely), you'll get garbage.

There is no such thing as a "non-utf8 char". UTF-8 is an encoding, i.e. one way of representing Unicode code points as bytes, so the question as posed is meaningless. Maybe you could explain what your code does?
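To illustrate both points, here is a rough sketch (the class CodePointDemo and its encode helper are hypothetical; U+00E9 and U+1F600 are just sample code points). The bytes are chosen by the charset of the OutputStreamWriter, not by write(int), and a supplementary code point occupies two chars, so casting it to a single char loses information.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodePointDemo {
    // Encodes one Unicode code point with the given charset and returns the bytes.
    static byte[] encode(int codePoint, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            // toChars yields one char for a BMP code point, two (a surrogate pair) otherwise.
            out.write(Character.toChars(codePoint));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        int eAcute = 0x00E9;    // LATIN SMALL LETTER E WITH ACUTE, in the BMP
        int grinning = 0x1F600; // an emoji from a supplementary plane

        // Same character, different byte sequences: the Writer's charset decides.
        System.out.println(encode(eAcute, StandardCharsets.UTF_8).length);      // 2
        System.out.println(encode(eAcute, StandardCharsets.ISO_8859_1).length); // 1

        // A supplementary code point needs two chars in Java...
        System.out.println(Character.toChars(grinning).length);                 // 2
        System.out.println(encode(grinning, StandardCharsets.UTF_8).length);    // 4

        // ...so passing it straight to write(int) would keep only the lower 16 bits.
        System.out.println(Integer.toHexString((char) grinning));               // f600
    }
}
```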