Please help me clarify some Java I/O concepts, and maybe I'll fall in love with it!
I'm trying to familiarize myself with the different types of stream IOs Java has to offer, so I wrote this little piece of code here.
public static void main(String[] args) throws IOException {
    String str = "English is being IOed!\nLine 2 has a number.\n中文字體(Chinese)";
    FileOutputStream fos = new FileOutputStream("ByteIO.txt");
    Scanner fis = new Scanner(new FileInputStream("ByteIO.txt"));
    FileWriter fw = new FileWriter("CharIO.txt");
    Scanner fr = new Scanner(new FileReader("CharIO.txt"));
    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream("BufferedByteIO.txt"));
    Scanner bis = new Scanner(new BufferedInputStream(new FileInputStream("BufferedByteIO.txt")));
    BufferedWriter bw = new BufferedWriter(new FileWriter("BufferedCharIO.txt"));
    Scanner br = new Scanner(new BufferedReader(new FileReader("BufferedCharIO.txt")));
    DataOutputStream dos = new DataOutputStream(new BufferedOutputStream((new FileOutputStream("DataBufferedByteIO.txt"))));
    Scanner dis = new Scanner(new DataInputStream(new BufferedInputStream((new FileInputStream("DataBufferedByteIO.txt")))));
    try {
        System.out.printf("ByteIO:\n");
        fos.write(str.getBytes());
        while (fis.hasNext())
            System.out.print(fis.next());// in the form of a String
        System.out.printf("\nCharIO:\n");
        fw.write(str);
        while (fr.hasNext())
            System.out.print(fr.next());
        System.out.printf("\nBufferedByteIO:\n");
        bos.write(str.getBytes());
        bos.flush();// buffer is not full, so you'll need to flush it
        while (bis.hasNext())
            System.out.print(bis.next());
        System.out.printf("\nBufferedCharIO:\n");
        bw.write(str);
        bw.flush();// buffer is not full, so you'll need to flush it
        while (br.hasNext())
            System.out.print(br.next());
        System.out.printf("\nDataBufferedByteIO:\n");
        dos.write(str.getBytes());
        //dos.flush();// dos doesn't seem to need this...
        while (dis.hasNext())
            System.out.print(dis.next());
    } finally {
        fos.close();
        fis.close();
        fw.close();
        fr.close();
        bos.close();
        br.close();
        dos.close();
        dis.close();
    }
}
All it does is write a pre-defined string into a file and then read it back. The problem arises when I run the code; I get this:
ByteIO:
EnglishisbeingIOed!Line2hasanumber.中文字體(Chinese)
CharIO:
//<--Empty line here
BufferedByteIO:
EnglishisbeingIOed!Line2hasanumber.中文字體(Chinese)
BufferedCharIO:
EnglishisbeingIOed!Line2hasanumber.中文字體(Chinese)
DataBufferedByteIO:
//<--Empty line here
The files are all populated with the correct data, so I suppose something is wrong with the scanner, but I just don't know what went wrong, and I hope somebody can point the mistake out for me.
The files are all populated with the same data. That's weird: according to Java I/O Streams, byte streams can only process single bytes, and only character streams can process Unicode. So shouldn't byte streams spit out gibberish when processing Chinese characters, which are UTF-16 (I think)? What exactly is the difference between a byte stream and a character stream (fos vs fw)?
On a partially unrelated topic, I thought byte streams were used to work with binary data such as music and images. I also thought that the data byte streams spit out should be illegible, but I seem to be wrong, am I? Exactly which I/O stream class(es) should I work with if I'm dealing with binary data?
Comments (2)
An important concept to understand here is that of encoding.
String/char[]/Writer/Reader are used to deal with textual data of any kind. byte[]/OutputStream/InputStream are used to deal with binary data. Also, a file on your disk only ever stores binary data (yes, that's true; it will hopefully become clearer in a minute). Whenever you convert between those two worlds, some kind of encoding is in play. In Java, there are several ways to convert between those worlds without specifying an encoding. In that case, the platform default encoding is used (which one that is depends on your platform and configuration/locale). [*]
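To make the two worlds concrete, here is a minimal sketch of crossing between text and bytes with an explicitly named encoding rather than the platform default. The class and method names (EncodingRoundTrip, encode, decode, decodeAsLatin1) are made up for illustration, and the choice of UTF-8 is an assumption, not something the question's code specifies.

```java
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Text -> bytes: the charset is named explicitly, so the result
    // does not depend on the platform default encoding.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes -> text, using the same charset that produced the bytes.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Bytes -> text with a different charset, to show what a mismatch does.
    public static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The four CJK characters from the question's test string,
        // written as Unicode escapes so the source file stays ASCII.
        String text = "\u4e2d\u6587\u5b57\u9ad4(Chinese)";
        byte[] bytes = encode(text);
        System.out.println(text.length() + " chars became " + bytes.length + " bytes");
        System.out.println("same charset round trip intact: " + decode(bytes).equals(text));
        System.out.println("mismatched charset intact: " + decodeAsLatin1(bytes).equals(text));
    }
}
```

The round trip only preserves the text when both directions use the same encoding. That is also why the question's files look fine: the default encoding was used consistently for writing and reading.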
The task of an encoding is to convert some given binary data (usually from a byte[]/ByteBuffer/InputStream) to textual data (usually into a char[]/CharBuffer/Writer) or the other way around. How exactly this happens depends on the encoding used. Some encodings (such as the ISO-8859-* family) are a simple mapping from byte values to corresponding Unicode code points; others (such as UTF-8) are more complex, and a single Unicode code point can take anywhere from 1 to 4 bytes. There's a quite nice article that gives a basic overview of the whole encoding issue, titled: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
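As a rough illustration of the "1 to 4 bytes" point, the sketch below counts how many bytes a single code point occupies in a few standard charsets. The class name BytesPerCodePoint, the helper encodedLength, and the sample code points are chosen here purely for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesPerCodePoint {
    // Number of bytes one code point occupies when encoded in the given charset.
    public static int encodedLength(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // U+0041 'A', U+00E9 (e with acute), U+4E2D (a CJK character), U+1F600 (an emoji)
        int[] samples = {0x41, 0xE9, 0x4E2D, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X: UTF-8=%d bytes, UTF-16BE=%d bytes%n",
                    cp,
                    encodedLength(cp, StandardCharsets.UTF_8),
                    encodedLength(cp, StandardCharsets.UTF_16BE));
        }
        // ISO-8859-1 maps each of its 256 characters to exactly one byte.
        System.out.println("U+00E9 in ISO-8859-1: "
                + encodedLength(0xE9, StandardCharsets.ISO_8859_1) + " byte");
    }
}
```

In UTF-8 the four samples take 1, 2, 3, and 4 bytes respectively, while UTF-16 uses 2 bytes for the first three and 4 bytes (a surrogate pair) for the last.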
[*] Using the platform default encoding is usually not desirable, because it makes your program non-portable and hard to use, but that's beside the point for this post.
Using BufferedInputStream and DataInputStream does not alter the content of the data.
Byte streams are for reading binary data. They are not suitable here.
Character streams are for reading text. The scanner assumes you are reading newline-terminated lines (which you don't appear to have).
If I run
I get
You can see that the original characters are preserved. The '?' means the character could not be displayed on my terminal or in my character encoding. (I don't know why.)
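The claim that the wrapper streams leave the data untouched can be checked with a small sketch like the one below. This is not the code that was run in the answer above; the class name WrapperRoundTrip, the method writeAndReadBack, and the use of a temporary file are assumptions made for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class WrapperRoundTrip {
    // Writes the bytes through DataOutputStream -> BufferedOutputStream -> FileOutputStream,
    // then reads them back through the matching input wrappers.
    public static byte[] writeAndReadBack(byte[] data) {
        try {
            Path file = Files.createTempFile("wrapped", ".bin");
            try {
                try (DataOutputStream out = new DataOutputStream(
                        new BufferedOutputStream(new FileOutputStream(file.toFile())))) {
                    out.write(data);
                } // closing the outermost stream flushes the buffer to the file
                try (DataInputStream in = new DataInputStream(
                        new BufferedInputStream(new FileInputStream(file.toFile())))) {
                    byte[] result = new byte[data.length];
                    in.readFully(result);
                    if (in.read() != -1) {
                        throw new IllegalStateException("file holds more bytes than were written");
                    }
                    return result;
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        byte[] original = "Line 1\n\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = writeAndReadBack(original);
        System.out.println("bytes unchanged: " + Arrays.equals(original, roundTripped));
    }
}
```

The sketch closes the output stream before reading, so everything held in the buffer has reached the file by the time the input side opens it.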