Why does the US-ASCII encoding accept non-US-ASCII characters?
Consider the following code:
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;

public class ReadingTest {
    public void readAndPrint(String usingEncoding) throws Exception {
        // UTF-8 representation of the 'micro' sign (U+00B5)
        ByteArrayInputStream bais = new ByteArrayInputStream(new byte[]{(byte) 0xC2, (byte) 0xB5});
        InputStreamReader isr = new InputStreamReader(bais, usingEncoding);
        char[] cbuf = new char[2];
        isr.read(cbuf);
        // Print the first decoded char and its numeric code point
        System.out.println(cbuf[0] + " " + (int) cbuf[0]);
    }

    public static void main(String[] argv) throws Exception {
        ReadingTest w = new ReadingTest();
        w.readAndPrint("UTF-8");
        w.readAndPrint("US-ASCII");
    }
}
Observed output:
µ 181
? 65533
Why does the second call of readAndPrint() (the one using US-ASCII) succeed? I would expect it to throw an error, since the input is not a valid character in this encoding. Where in the Java API or the JLS is this behavior mandated?
The default action when un-decodable bytes are found in the input stream is to replace them with the Unicode character U+FFFD REPLACEMENT CHARACTER.
If you want to change that, you can pass the InputStreamReader a CharsetDecoder that has a different CodingErrorAction configured.
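As a minimal sketch of that approach (the class name StrictAsciiReader and the helper readStrict are made up for illustration and are not from the original answer), a US-ASCII decoder set to CodingErrorAction.REPORT can be handed to the InputStreamReader(InputStream, CharsetDecoder) constructor. With that configuration the read is expected to fail with a CharacterCodingException instead of yielding U+FFFD:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class StrictAsciiReader {

    // Decodes the bytes as US-ASCII and reports bad input instead of replacing it.
    static String readStrict(byte[] input) throws IOException {
        CharsetDecoder strict = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), strict)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Plain ASCII bytes decode normally.
        System.out.println(readStrict(new byte[]{'o', 'k'}));
        try {
            // The UTF-8 bytes of the micro sign are not valid US-ASCII.
            readStrict(new byte[]{(byte) 0xC2, (byte) 0xB5});
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

CharacterCodingException is a subclass of IOException, so code that already handles IOException around the reader will see the failure without further changes.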
I'd say this is the same as for the constructor String(byte bytes[], int offset, int length, Charset charset): using a CharsetDecoder, you can specify a different CodingErrorAction.
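To illustrate the comparison with the String constructor, here is a small sketch (the class StringDecodingDemo and its method names are assumptions made for this example). The constructor silently substitutes U+FFFD for each byte it cannot decode, while a CharsetDecoder configured with CodingErrorAction.REPORT throws from decode(ByteBuffer):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class StringDecodingDemo {

    // The String constructor always replaces un-decodable bytes.
    static String lenient(byte[] bytes) {
        return new String(bytes, 0, bytes.length, StandardCharsets.US_ASCII);
    }

    // A decoder set to REPORT throws instead of replacing.
    static String strict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] micro = {(byte) 0xC2, (byte) 0xB5};
        // Print the code point of each char produced by the lenient path.
        for (char c : lenient(micro).toCharArray()) {
            System.out.println((int) c);
        }
        try {
            strict(micro);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

Which variant fits depends on whether damaged input should be tolerated (for example, when displaying text) or treated as an error (for example, when validating data).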