What happens when you convert a string to a byte array

Posted on 2024-11-30 09:57:16

I think this is a newbie-type question, but I still haven't quite understood it.

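I can find lots of posts on how to convert a string to a byte array in various languages.

What I don't understand is what is happening on a character-by-character basis. As I understand it, each character displayed on screen is represented by a number, such as its ASCII code. (We can stick with ASCII for now so I can get this conceptually :-))

Does this mean that when I want to represent a character or a string (which is a list of characters), the following happens:

Convert the character to its ASCII value > represent the ASCII value as binary?

I've seen code that creates a byte array by defining it as 1/2 the length of the input string, so surely a byte array would be the same length as the string?

So I'm a bit confused. Basically I'm trying to store a string value in a byte array in ColdFusion, which as far as I can see does not have an explicit string-to-byte-array function.

However, I can get at the underlying Java, but I need to know what is happening at the theoretical level.

Thanks in advance, and please tell me if you think I'm crazy!

Gus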

剩余の解释 2024-12-07 09:57:16

In Java, strings are stored as an array of 16-bit char values. Each Unicode character in the string is stored as one or (rarely) two char values in the array.
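To make the "one or (rarely) two char values" point concrete, here is a small sketch. The class and method names (`CharUnitsDemo`, `charUnits`, `codePoints`) are made up for illustration; the only library calls are `String.length()`, `String.codePointCount` and `Character.toChars`.

```java
class CharUnitsDemo {
    // Number of 16-bit char values used to store the string.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "A";
        // U+1D11E (musical G clef) does not fit in 16 bits,
        // so it is stored as a pair of char values.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(charUnits(ascii) + " char value, " + codePoints(ascii) + " character");
        System.out.println(charUnits(clef) + " char values, " + codePoints(clef) + " character");
    }
}
```

So `length()` counts char values, not characters, which is one reason a byte array's length need not match the string's length.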

If you want to store some string data in a byte array, you will need to be able to convert the string's Unicode characters into a sequence of bytes. This process is called encoding and there are several ways to do it, each with different rules and results. If two pieces of code want to share string data using byte arrays, they need to agree on which encoding is being used.
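As a rough illustration of "different rules and results", the sketch below encodes the same one-character string with three of the charsets every JVM is required to support. The helper names (`EncodingComparison`, `hex`, `encodeHex`) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingComparison {
    // Format bytes as two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode the string with the given charset and show the resulting bytes.
    static String encodeHex(String s, Charset cs) {
        return hex(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // a single character: e with an acute accent
        System.out.println("UTF-8:      " + encodeHex(s, StandardCharsets.UTF_8));      // c3 a9
        System.out.println("ISO-8859-1: " + encodeHex(s, StandardCharsets.ISO_8859_1)); // e9
        System.out.println("UTF-16BE:   " + encodeHex(s, StandardCharsets.UTF_16BE));   // 00 e9
    }
}
```

One character, three different byte sequences, which is why both sides have to agree on the encoding.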

For example, suppose we have a string s that we want to encode using the UTF-8 encoding. UTF-8 has the convenient property that if you use it to encode a string that contains only ASCII characters, every character in the input gets converted to a single byte with that character's ASCII value. We might convert our Java string to a Java byte array as follows:

byte[] bytes = s.getBytes("UTF-8");

The byte array bytes now contains the string data from s, encoded into bytes using the UTF-8 encoding.
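For an ASCII-only string this is exactly the "character > ASCII value > binary" picture from the question. A minimal sketch follows; the names `AsciiBytesDemo`, `toUtf8` and `toBinary` are invented for the example. It uses the `getBytes(Charset)` overload with `StandardCharsets.UTF_8`, which, unlike the `getBytes("UTF-8")` form above, does not throw the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiBytesDemo {
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render one byte as eight binary digits, padded with leading zeros.
    static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xff);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8("Hi");
        System.out.println(Arrays.toString(bytes)); // [72, 105]
        for (byte b : bytes) {
            System.out.println(b + " = " + toBinary(b));
        }
    }
}
```

Here the byte array has the same length as the string, but only because every character is ASCII.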

Now, we store or transmit the bytes somewhere, and the code on the other end wants to decode the bytes back into a Java String. It will do something like the following:

String t = new String(bytes, "UTF-8");

Assuming nothing went wrong, the string t now contains the same string data as the original string s.

Note that both pieces of code had to agree on what encoding was being used. If they disagreed, the resulting string might end up containing garbage, or might even fail to decode at all.
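Both failure modes can be sketched as follows. The names `RoundTripDemo`, `roundTrip` and `isValidUtf8` are hypothetical. One assumption worth stating: `new String(bytes, charset)` does not itself throw on bad input but substitutes replacement characters, so the strict check here goes through a `CharsetDecoder`, which reports malformed input by default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encode with one charset, decode with another.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    // Strict decode: false if the bytes are not well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";

        // Both sides agree: the original string comes back.
        System.out.println(roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(s));

        // Sides disagree: the accented letter turns into two garbage characters.
        String garbled = roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length() + " chars instead of " + s.length());

        // ISO-8859-1 bytes are not valid UTF-8 here, so strict decoding fails.
        System.out.println(isValidUtf8(s.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```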

江南月 2024-12-07 09:57:16

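You are not crazy. The key thing to remember in all matters regarding strings is that, for the computer, characters do not exist; only numbers do. Characters, strings, text and the like are all ultimately implemented by storing numbers (in fact this holds for every data type: booleans are really numbers with a very small range, enums are numbers internally, and so on). That is why it is meaningless to say that a piece of data represents "A" or any other character unless you know which character encoding the surrounding code assumes.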
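A small sketch of that idea, with made-up names (`BytesNeedAnEncoding`, `codeOf`, `decode`): a Java `char` is just a number, and the same two bytes turn into different text depending on the encoding used to read them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesNeedAnEncoding {
    // A char is a 16-bit number; widening it to int shows the value.
    static int codeOf(char c) {
        return c;
    }

    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A')); // 65

        byte[] raw = {0x00, 0x41};
        // Read as UTF-16BE, the two bytes form the single character 'A'.
        System.out.println(decode(raw, StandardCharsets.UTF_16BE).length());   // 1
        // Read as ISO-8859-1, each byte is its own character: NUL, then 'A'.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```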

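Converting a string to a byte array happens exactly at the boundary between the intentional view ("this should be printed as 'A'") and the internal view ("this memory cell contains 65"). So to get correct results, you have to convert between them according to one of several possible character sets, and you have to choose the right one. Note that the JDK offers convenience methods that do not require a character set name and always use a default character set derived from the platform and environment variables; but it is almost always a better idea to know what you are doing and state the character set explicitly, rather than write something that works today and fails mysteriously when you run it on another machine.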
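The difference between the convenience method and the explicit form might look like this. The names `DefaultCharsetDemo`, `withDefault` and `withExplicitUtf8` are hypothetical. As far as I know, JDK 18 and later default to UTF-8 unless overridden, so the surprise mostly bites on older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Convenience form: uses whatever Charset.defaultCharset() is on this machine.
    static byte[] withDefault(String s) {
        return s.getBytes();
    }

    // Explicit form: the same bytes on every machine.
    static byte[] withExplicitUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // e with an acute accent
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + withDefault(s).length);      // depends on the machine
        System.out.println("UTF-8 bytes:     " + withExplicitUtf8(s).length); // 2
    }
}
```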
