Why character streams?
I understand that Java character streams wrap byte streams such that the underlying byte stream is interpreted as per the system default or an otherwise specifically defined character set.
My system's default charset is UTF-8.
If I use a FileReader to read in a text file, everything looks normal, as the default charset is used to interpret the bytes from the underlying InputStreamReader. If I explicitly define an InputStreamReader to read the UTF-8 encoded text file in as UTF-16, everything obviously looks strange. Using a byte stream like FileInputStream and redirecting its output to System.out, everything looks fine.
So, my questions are:
Why is it useful to use a character stream?
Why would I use a character stream instead of directly using a byte stream?
When is it useful to define a specific charset?
Comments (3)
Code that deals with strings should only "think" in terms of text. For example, when reading an input source line by line, you don't want to care about the nature of that source.
However, storage is usually byte-oriented, so you need to create a conversion between the byte-oriented view of a source (encapsulated by InputStream) and the character-oriented view of a source (encapsulated by Reader).
So a method which (say) counts the lines of text in an input source should take a Reader parameter. If you want to count the lines of text in two files, one of which is encoded in UTF-8 and one of which is encoded in UTF-16, you'd create an InputStreamReader around a FileInputStream for each file, specifying the appropriate encoding each time.
(Personally I would avoid FileReader completely - the fact that it doesn't let you specify an encoding, at least before Java 11, makes it useless IMO.)

An InputStream reads bytes, while a Reader reads characters. Because of the way bytes map to characters, you need to specify the character set (or encoding) when you create an InputStreamReader, the default being the platform character set.
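As a rough sketch of the two answers above: a line-counting method that only accepts a Reader, with the encoding chosen by whoever wraps the InputStream. The class name LineCounter and both countLines methods are made up for illustration, and in-memory byte arrays stand in for the two files so the example runs on its own; with real files you would wrap a FileInputStream instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of any library.
public class LineCounter {

    // Works purely in terms of text: it neither knows nor cares
    // how the characters were encoded in storage.
    public static int countLines(Reader reader) throws IOException {
        BufferedReader buffered = new BufferedReader(reader);
        int lines = 0;
        while (buffered.readLine() != null) {
            lines++;
        }
        return lines;
    }

    // The caller decides how the bytes are turned into characters.
    public static int countLines(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            return countLines(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // Three lines containing non-ASCII characters (written as Unicode escapes).
        String text = "premi\u00e8re ligne\nzweite Zeile\n\u7b2c\u4e09\u884c\n";

        // Stand-ins for two files holding the same text in different encodings.
        InputStream utf8 = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
        InputStream utf16 = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_16));

        System.out.println(countLines(utf8, StandardCharsets.UTF_8));   // prints 3
        System.out.println(countLines(utf16, StandardCharsets.UTF_16)); // prints 3
    }
}
```

The counting logic is written once against Reader; only the thin wrapper around the byte source changes per encoding.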
When you are reading/writing text which may contain characters with code points above 127 (i.e. anything outside ASCII), use a char stream. When you are reading/writing binary data, use a byte stream.
You can read text as binary if you wish, but unless you make a lot of assumptions it rarely gains you much.
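To make the "> 127" point concrete, here is a small sketch, assuming the text "café" encoded as UTF-8. The class BytesVersusChars and its helper methods are invented for this illustration. It shows that a byte stream sees five units where a Reader sees four characters, and that decoding the same bytes with the wrong charset (UTF-16, as in the question) does not give the original text back.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of any library.
public class BytesVersusChars {

    // Counts how many values a plain byte stream hands back.
    public static int countBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        }
    }

    // Reads the bytes through a Reader using the given charset.
    public static String decode(byte[] data, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "caf\u00e9"; // the last character has code point 233 (> 127)
        byte[] data = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(countBytes(data));                                        // prints 5
        System.out.println(decode(data, StandardCharsets.UTF_8).length());           // prints 4
        System.out.println(decode(data, StandardCharsets.UTF_8).equals(original));   // prints true
        System.out.println(decode(data, StandardCharsets.UTF_16).equals(original));  // prints false
    }
}
```

Copying the raw bytes straight to System.out looks fine in the question only because the bytes pass through unchanged and the terminal then decodes them as UTF-8 itself; as soon as the program needs to count, split, or compare characters, the decoding step has to happen in the code.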