BufferedWriter#write(int) Javadoc query

Posted 2024-09-18 07:57:13

The Javadoc for this says:

Only the lower two bytes of the integer oneChar are written.

What effect, if any, does this have on writing non-utf8 encoded chars which have been cast to an int?

Update:

The code in question receives data from a socket and writes it to a file. (A lot happens between receiving and writing, so I can't just use the string I get from BufferedReader#readLine().) I was using Writer#write(char[]), but this meant I had to create a new char array each time. To avoid creating an array every time, I keep a single char array which is filled with -1 (cast to a char).

I then use TextUtils#getChars to fill it, expanding the array if necessary. For writing, I loop through the array, writing to the Writer until char[i] == (char) -1 == true.

Comments (1)

凉城已无爱 2024-09-25 07:57:13

Internally, write(int) will just cast its parameter to char, so write(i) is equivalent to write((char)i).
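To see that equivalence in action, here is a minimal sketch. The class and method names (WriteIntDemo, writeInt) are made up for illustration, and a StringWriter stands in for the real file so the result can be inspected.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the asker's code.
class WriteIntDemo {
    // Sends one int through BufferedWriter#write(int) and returns
    // whatever reached the underlying writer.
    static String writeInt(int value) {
        StringWriter sink = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(sink)) {
            out.write(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        int wide = 0x10041; // upper 16 bits are non-zero, low 16 bits are 0x0041 ('A')
        // Both comparisons hold: the written char equals the explicit cast.
        System.out.println(writeInt(wide).equals(String.valueOf((char) wide)));
        System.out.println(writeInt(wide).equals("A"));
    }
}
```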

Now in Java, char is internally just an unsigned integer type with the range 0-65535 (i.e. 16 bits). The cast int -> char is a "narrowing primitive conversion" (Java Language Specification, section 5.1.3), and int is a signed integer, hence:

A narrowing conversion of a signed
integer to an integral type T simply
discards all but the n lowest order
bits, where n is the number of bits
used to represent type T. In addition
to a possible loss of information
about the magnitude of the numeric
value, this may cause the sign of the
resulting value to differ from the
sign of the input value.

That's why the Javadoc says that only the lower two bytes are written.
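A small sketch of what the narrowing does to a few sample values. The class name NarrowingDemo and the helper low16 are hypothetical; the mask is just another way to express "keep the 16 lowest-order bits". Note that the sentinel from the question, (char) -1, comes out as 0xFFFF.

```java
// Hypothetical demo class illustrating the int -> char narrowing conversion.
class NarrowingDemo {
    // Same numeric result as the (char) cast: keep only the 16 lowest-order bits.
    static int low16(int value) {
        return value & 0xFFFF;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0x12345, -1, 65536 + 'x'};
        for (int v : samples) {
            char c = (char) v;
            // The cast and the mask always agree.
            System.out.printf("%d -> 0x%04X (mask gives 0x%04X)%n", v, (int) c, low16(v));
        }
    }
}
```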

Now, what this means in terms of characters depends on how you want to interpret the int values. A char in Java is a UTF-16 code unit. For characters in the Basic Multilingual Plane (BMP), the 16-bit number held by the char is the number of the Unicode code point itself, so if each of your int values is the number of a BMP code point, you're fine. Characters in the supplementary planes are encoded as two chars (a surrogate pair), so a single write(int) cannot carry them. If the value is anything else (a code point that needs more than 16 bits, a negative number, or something else entirely), you'll get garbage.
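If your int values might be full code points rather than 16-bit values, one option is to expand them with Character.toChars before writing. The sketch below contrasts that with a plain write(int); the class and method names are invented for the example, and StringWriter is used only so the output is easy to check.

```java
import java.io.StringWriter;

// Hypothetical demo class comparing a code-point-aware write with write(int).
class CodePointDemo {
    // Writes a whole code point, using two chars (a surrogate pair)
    // when it lies outside the BMP.
    static String writeCodePoint(int codePoint) {
        StringWriter out = new StringWriter();
        char[] units = Character.toChars(codePoint); // one or two chars
        out.write(units, 0, units.length);
        return out.toString();
    }

    // What write(int) does with the same value: only the low 16 bits survive.
    static String writeTruncated(int codePoint) {
        StringWriter out = new StringWriter();
        out.write(codePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        int supplementary = 0x1F600; // a code point outside the BMP
        String full = writeCodePoint(supplementary);
        String cut = writeTruncated(supplementary);
        System.out.printf("toChars: %d chars, code point 0x%X%n", full.length(), full.codePointAt(0));
        System.out.printf("write(int): %d char, value 0x%04X%n", cut.length(), (int) cut.charAt(0));
    }
}
```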

What effect, if any, does this have on
writing non-utf8 chars which have been
cast to an int?

There is no such thing as a "non-utf8 char". UTF-8 is an encoding, that is, a way to represent Unicode code points as bytes, so the question as posed is meaningless. Maybe you could explain what your code does?
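To illustrate the distinction between a char and its encoding, here is a rough sketch. The char handed to write(int) is the same every time; only the charset of the underlying OutputStreamWriter decides which bytes end up in the file. The EncodingDemo class and its encode helper are assumptions made for the example, with a ByteArrayOutputStream in place of a real file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: same char, different bytes per encoding.
class EncodingDemo {
    // Writes one char via write(int) and returns the bytes the encoder produced.
    static byte[] encode(int ch, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(ch);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        int eAcute = 0x00E9; // U+00E9, LATIN SMALL LETTER E WITH ACUTE
        System.out.println("ISO-8859-1 bytes: " + encode(eAcute, StandardCharsets.ISO_8859_1).length);
        System.out.println("UTF-8 bytes:      " + encode(eAcute, StandardCharsets.UTF_8).length);
    }
}
```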
