What happens when you convert a string to a byte array
I think this is a newbie-type question, but I have not quite understood this.
I can find many posts on how to convert a string to a byte array in various languages.
What I do not understand is what is happening on a character-by-character basis.
I understand that each character displayed on the screen is represented by a number, such as its ASCII code. (Can we stick with ASCII for the moment so I get this conceptually? :-))
Does this mean that when I want to represent a character or a string (which is a list of characters), the following occurs?
Convert character to ASCII value > represent ASCII value as binary?
I have seen code that creates byte arrays by defining the byte array as 1/2 the length of the input string, but surely a byte array would be the same length as the string?
So I am a little confused.
Basically I am trying to store a string value in a byte array in ColdFusion, which as far as I can see does not have an explicit string-to-byte-array function.
However, I can get to the underlying Java, but I need to know what's happening at the theoretical level.
Thanks in advance, and please tell me nicely if you think I am barking mad!
Gus
Comments (3)
In Java, strings are stored as an array of 16-bit char values. Each Unicode character in the string is stored as one or (rarely) two char values in the array.
If you want to store some string data in a byte array, you will need to be able to convert the string's Unicode characters into a sequence of bytes. This process is called encoding, and there are several ways to do it, each with different rules and results. If two pieces of code want to share string data using byte arrays, they need to agree on which encoding is being used.
For example, suppose we have a string s that we want to encode using the UTF-8 encoding. UTF-8 has the convenient property that if you use it to encode a string that contains only ASCII characters, every character in the input gets converted to a single byte with that character's ASCII value. We might convert our Java string to a Java byte array by calling getBytes on s with UTF-8 as the charset. The resulting byte array, bytes, now contains the string data from s, encoded into bytes using the UTF-8 encoding.
Now, we store or transmit the bytes somewhere, and the code on the other end wants to decode the bytes back into a Java String. It will do so by passing the bytes and the same UTF-8 charset to the String constructor. Assuming nothing went wrong, the resulting string t now contains the same string data as the original string s.
Note that both pieces of code had to agree on what encoding was being used. If they disagreed, the resulting string might end up containing garbage, or might even fail to decode at all.
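The encode and decode steps described in this answer can be sketched in Java roughly as follows. This is a minimal illustration, not necessarily the code the answer originally showed: the class name Utf8RoundTrip, the helper names encode and decode, and the sample string are assumptions made for the example. It relies on String.getBytes(Charset) and the String(byte[], Charset) constructor from the standard library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example class; the names are illustrative only.
class Utf8RoundTrip {
    // Encode the string's characters into a byte sequence using UTF-8.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a Java String.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "Gus";                            // ASCII-only sample input
        byte[] bytes = encode(s);                    // one byte per ASCII character
        System.out.println(Arrays.toString(bytes));  // [71, 117, 115]
        String t = decode(bytes);
        System.out.println(t.equals(s));             // true
    }
}
```

For an ASCII-only string, each byte is simply the ASCII value of the corresponding character, which matches the "character > ASCII value > binary" picture in the question.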
You are not barking mad. The key thing to remember in all matters String is that, to the computer, characters do not exist; only numbers exist. There is no such thing as a character, String, text or similar that isn't actually implemented by storing numbers (that goes for all data types: booleans are really numbers with a very small range, enums are internally numbers, etc.). This is why it is meaningless to say that a piece of data represents "A" or any other character unless you know what character encoding the surrounding code assumes.
Converting Strings into byte arrays occurs precisely at this boundary between the intentional perspective ("This should print as 'A'") and the internal perspective ("This memory cell contains a 65"). Therefore, to get the right result, you must convert between them according to one of several possible character sets, and choose the right one. Note that the JDK offers convenience methods that do not require a charset name and always use the default charset deduced from your platform and environment variables; but it is almost always a better idea to know what you're doing and state the charset explicitly, rather than code something that works today and mysteriously fails when you execute it on another machine.
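To make the point about stating the charset explicitly more concrete, here is a small sketch showing the same text turning into different bytes under two charsets, and turning into garbage when decoded with the wrong one. The class name CharsetMismatch, the helper reinterpret, and the sample string are assumptions made for illustration. Note also that on recent JDKs (18 and later) the default charset is UTF-8 unless overridden, so the value printed on the last line depends on the Java version and configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; the names are illustrative only.
class CharsetMismatch {
    // Encode with one charset, then decode the resulting bytes with another.
    static String reinterpret(String s, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        // The same characters become different byte sequences.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);      // 5
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 4

        // Decoding with a charset other than the one used for encoding gives garbage.
        String wrong = reinterpret(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.equals(s)); // false

        // What the no-argument convenience methods would use on this machine.
        System.out.println(Charset.defaultCharset());
    }
}
```

Passing the charset explicitly, as above, keeps the result independent of the machine the code runs on.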
A String is encoded into a byte array according to a Charset.
A charset can encode a char into more or fewer bits, which are then packed into bytes.
For example, if you only have to represent digits (10 different characters), you could use a charset defining 4 bits per character, obtaining a representation with 2 characters per byte.
The charset of the OS is often chosen by default in String-to-byte-array encoders.
To get the string back, you have to decode the byte array with the same charset.
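The 4-bits-per-digit idea can be sketched as below. This is not a standard Java Charset, only a hand-written illustration; the class name DigitPacker and the methods pack and unpack are hypothetical, and the sketch assumes an even number of decimal digits. It may also be one explanation for code that allocates a byte array half as long as the input string, as mentioned in the question, though that is a guess about code not shown here.

```java
import java.util.Arrays;

// Hypothetical example class: packs decimal digits two per byte (4 bits each).
class DigitPacker {
    // Pack a string of decimal digits into bytes, first digit in the high 4 bits.
    static byte[] pack(String digits) {
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of digits");
        }
        byte[] out = new byte[digits.length() / 2]; // half the string length
        for (int i = 0; i < out.length; i++) {
            int hi = toNibble(digits.charAt(2 * i));
            int lo = toNibble(digits.charAt(2 * i + 1));
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Reverse of pack: expand each byte back into two digit characters.
    static String unpack(byte[] packed) {
        StringBuilder sb = new StringBuilder(packed.length * 2);
        for (byte b : packed) {
            sb.append((char) ('0' + ((b >> 4) & 0x0F)));
            sb.append((char) ('0' + (b & 0x0F)));
        }
        return sb.toString();
    }

    // Map '0'..'9' to 0..9, rejecting anything else.
    private static int toNibble(char c) {
        if (c < '0' || c > '9') {
            throw new IllegalArgumentException("not a digit: " + c);
        }
        return c - '0';
    }

    public static void main(String[] args) {
        byte[] packed = pack("1234");
        System.out.println(Arrays.toString(packed)); // [18, 52], i.e. 0x12 and 0x34
        System.out.println(unpack(packed));          // 1234
    }
}
```

As with any encoding, the reader of the bytes has to know that this packing was used; decoding the same bytes as ASCII or UTF-8 would not give the digits back.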