Java stream misunderstandings... some clarification?

Posted on 2024-11-29 05:08:40

I understand that byte streams deal with bytes and character streams deal with characters... if I use a byte stream to read in characters, could this limit the sorts of characters I can read? For instance, bytes are read in as 8-bit bytes and characters are read in as 16-bit characters... does this mean that more characters can be represented using character streams rather than byte streams?

The last thing I'm confused about is how a byte stream writes out to a file for reading. If I was receiving bytes from a network socket, I would wrap them in an InputStreamReader for writing; this way I would get the character transformation logic the character stream provides. If I read from a file using a FileInputStream and write out using a FileOutputStream, why is this file readable when I open it with a text editor? How is the FileOutputStream treating the bytes?

Comments (5)

找个人就嫁了吧 2024-12-06 05:08:40

The key concept here is character encoding: each human-readable character is encoded into one or more bytes. There are plenty of character encodings. The most popular ones are:

  • ASCII: 7-bit; each character occupies one byte, and the remaining bit is unused
  • UTF-8: the most common characters are represented as a single byte, less common ones as two or more

Text in these encodings is readable even when you open the file in a hex editor, at least for the single-byte characters. However, there are character encodings that do not have this property, notably UTF-16 and UTF-32.
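To make the difference concrete, here is a minimal sketch that dumps the bytes of the same two-character string in UTF-8 and in UTF-16. The class name `EncodingSizes` and its helpers are hypothetical, chosen only for illustration; the charsets come from the standard `java.nio.charset.StandardCharsets`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how one string becomes different bytes per encoding.
class EncodingSizes {
    // Formats bytes as space-separated two-digit hex, similar to a hex editor view.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the given charset and returns the hex dump.
    static String encodeToHex(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        String text = "A\u00E9"; // 'A' followed by e-acute (U+00E9)
        System.out.println(encodeToHex(text, StandardCharsets.UTF_8));    // 41 C3 A9
        System.out.println(encodeToHex(text, StandardCharsets.UTF_16BE)); // 00 41 00 E9
    }
}
```

In the UTF-8 dump the ASCII letter is still the single byte 41, which is why such text stays legible in a hex editor. In UTF-16 every character takes at least two bytes, so even plain ASCII letters are interleaved with 00 bytes.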

Now back to your question: an InputStream only gives you a stream of bytes. If your bytes represent characters encoded with ASCII or UTF-8, most of the time you are fine (as long as the text stays within the single-byte range). But if these bytes represent something more sophisticated like UTF-16, you absolutely need a Reader. Of course, the Reader has to know which character encoding the underlying InputStream provides. This is a common beginner mistake: a Reader that is not explicitly initialized with a character encoding will often fall back to the system default.
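A minimal sketch of that point, assuming the bytes arrive as UTF-16 (big-endian): the helper `readAll` and the class `ExplicitCharsetReader` are hypothetical names, and an in-memory `ByteArrayInputStream` stands in for a socket or file stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes a byte stream with an explicitly chosen charset.
class ExplicitCharsetReader {
    // Reads every character from the stream, decoding bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "Gr\u00FC\u00DFe"; // German "Gruesse" with u-umlaut and sharp s
        byte[] utf16 = original.getBytes(StandardCharsets.UTF_16BE);

        String right = readAll(new ByteArrayInputStream(utf16), StandardCharsets.UTF_16BE);
        String wrong = readAll(new ByteArrayInputStream(utf16), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false: decoded with the wrong charset
    }
}
```

The same bytes produce the original text only when the Reader is told the encoding they were written in; passing the charset explicitly avoids depending on whatever the platform default happens to be.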

The other direction (with writers) is similar. If you simply cast your chars to bytes, most of the time you will be fine. But if your characters include less common national letters, your output will be malformed or truncated. So you create a Writer that converts each given character to a series of one or more bytes. Once again, you are obliged to provide the character encoding.
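The truncation described above can be sketched as follows. `CastVersusWriter`, `castChars`, and `encodeWithWriter` are hypothetical names for illustration, and UTF-8 is just one possible choice of output encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares a naive char-to-byte cast with a real Writer.
class CastVersusWriter {
    // Naive approach: keeps only the low 8 bits of each char.
    static byte[] castChars(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = (byte) text.charAt(i);
        }
        return out;
    }

    // Proper approach: let a Writer encode each character with the given charset.
    static byte[] encodeWithWriter(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "z\u0142oty"; // Polish word; U+0142 does not fit in 8 bits
        byte[] cast = castChars(text);
        byte[] encoded = encodeWithWriter(text, StandardCharsets.UTF_8);

        System.out.println(new String(cast, StandardCharsets.UTF_8).equals(text));    // false
        System.out.println(new String(encoded, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

The cast silently turns U+0142 into the byte 0x42 (the letter "B"), while the Writer emits two bytes for it, so only the encoded output can be decoded back to the original text.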

Important rules:

  • always use InputStream when dealing with binary data (multimedia, ZIP and PDF files, etc.)
  • always use Reader when reading text (txt, HTML, XML...)
  • always know and specify the character encoding when reading characters from a byte stream, and always consciously choose the character encoding you use to write the data.

终难愈 2024-12-06 05:08:40

A char is a 16-bit value that represents a Unicode character (strictly speaking, one UTF-16 code unit).

A byte is an 8-bit value that represents a two's-complement number.

The important thing here is that both are just bit strings. Technically speaking, a char is simply 2 bytes, nothing more and nothing less, aside from some minor semantics in how Java treats the two. As far as the computer (or an InputStream/OutputStream) is concerned, the only difference is the number of bits they hold.
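As a small illustration of those widths and of the "minor semantics", the sketch below prints the sizes, shows that a byte is signed, and shows a character that needs two chars. The class `CharAndByteWidths` and its helper methods are hypothetical names.

```java
// Hypothetical demo class: bit widths of byte and char, and their differing semantics.
class CharAndByteWidths {
    // Number of Java chars (UTF-16 code units) used to hold the text.
    static int codeUnits(String text) {
        return text.length();
    }

    // Number of Unicode code points in the text.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        System.out.println(Byte.SIZE);      // 8
        System.out.println(Character.SIZE); // 16

        byte b = (byte) 0xE9;
        System.out.println(b);        // -23: a byte is a signed two's-complement value
        System.out.println(b & 0xFF); // 233: the same bits read as unsigned

        // U+1D11E (musical G clef) lies outside the 16-bit range of a single char.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(codeUnits(clef));  // 2
        System.out.println(codePoints(clef)); // 1
    }
}
```

So even a 16-bit char cannot hold every Unicode character on its own; characters beyond U+FFFF are stored as a pair of chars.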

层林尽染 2024-12-06 05:08:40

I think you need to grasp the relation between a byte and a character in order to get your clarification.

The accepted answer to this question is quite clear IMHO: "Why does a byte in Java I/O can represent a character?"

I'd also check out byte stream and character stream

And if you don't want Joel to catch you and make you peel onions for 6 months in a submarine, just read http://www.joelonsoftware.com/articles/Unicode.html

清引 2024-12-06 05:08:40

All I/O streams in Java are just byte streams underneath. Byte-to-character (and character-to-byte) conversions are done using an encoding. But underneath it all, they are all bytes.

阳光下慵懒的猫 2024-12-06 05:08:40

To answer your questions:

I understand that byte streams deal with bytes and character streams
deal with characters... if I use a byte stream to read in characters,
could this limit me to the sorts of characters I might read?

Characters are not bytes. A character is stored in one or more bytes according to the selected encoding scheme. It is the encoding scheme that sets, or lifts, the limit on the sorts of characters you can read.
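One way to see that the limit comes from the encoding rather than from the stream type is to ask each charset whether it can represent a given character. This is a minimal sketch; `EncodingRepertoire` and `canRepresent` are hypothetical names, while `CharsetEncoder.canEncode` is part of the standard library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: checks which encodings can represent a given text.
class EncodingRepertoire {
    // True if every character of the text can be encoded in the given charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign

        System.out.println(canRepresent(euro, StandardCharsets.US_ASCII));   // false
        System.out.println(canRepresent(euro, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canRepresent(euro, StandardCharsets.UTF_8));      // true

        // In UTF-8 the single character is spread over three bytes.
        System.out.println(euro.getBytes(StandardCharsets.UTF_8).length);    // 3
    }
}
```

All three encodings end up as 8-bit bytes on the stream; only the encoding decides whether the euro sign can be written at all.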

For instance, bytes are read in as 8 bit bytes, characters are read in
as 16 bit characters... does this mean that more characters can be
represented using character streams rather than byte streams?

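In a way, yes: a 16-bit char can hold far more distinct values than an 8-bit byte, although a byte stream can still carry any character once an encoding spreads it over several bytes.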

The last thing I'm confused about is how a byte stream writes out to a
file for reading. If I was receiving bytes from a network socket, I
would wrap them in an InputStreamReader for writing, this way I would
get the character transformation logic the character stream provides.
If I read from a file using a FileInputStream and write out using a
FileOutputStream, why is this file readable when I open it with a text
editor? How is the FileOutputStream treating the bytes?

For bytes/data corresponding to characters, you should use an OutputStreamWriter to write to the file so that it is readable in a text editor. You can specify the encoding at creation, and the stream will perform the encoding of your text data.
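As for why a FileInputStream-to-FileOutputStream copy stays readable: a plausible reading is that the bytes in the source file were already encoded text, and the byte streams pass them through unchanged. The sketch below illustrates this under that assumption; `ByteCopyDemo`, `copy`, and `copyThroughFiles` are hypothetical names, and temporary files stand in for real ones.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical demo class: a raw byte copy leaves encoded text untouched.
class ByteCopyDemo {
    // Copies raw bytes from in to out; no decoding or encoding takes place.
    static void copy(InputStream in, OutputStream out) {
        try {
            byte[] buffer = new byte[8192];
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new UncheckedIOException(e);
        }
    }

    // Writes content to a temp file, copies it with file streams, returns the copy's bytes.
    static byte[] copyThroughFiles(byte[] content) {
        try {
            Path source = Files.createTempFile("source", ".txt");
            Path target = Files.createTempFile("target", ".txt");
            try {
                Files.write(source, content);
                try (InputStream in = new FileInputStream(source.toFile());
                     OutputStream out = new FileOutputStream(target.toFile())) {
                    copy(in, out);
                }
                return Files.readAllBytes(target);
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "na\u00EFve caf\u00E9";
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        byte[] copied = copyThroughFiles(encoded);

        System.out.println(Arrays.equals(encoded, copied));                          // true
        System.out.println(new String(copied, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

A text editor then decodes the copied bytes exactly as it would decode the original file. An OutputStreamWriter is only needed when you start from chars or Strings and have to produce the bytes yourself.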
