Getting the binary representation of strings, ints, etc.?
Is it possible to get strings, ints, etc. in binary format? What I mean is: assume I have the string
"Hello" and I want to store it in binary format, so assume "Hello" is
11110000110011001111111100000000 in binary (I know it's not; I just typed something quickly).
Can I store the above binary not as a string, but in the actual format with the bits?
In addition to this, is it actually possible to store fewer than 8 bits? What I am getting at is: if the letter A is the most frequent letter in a text, can I use 1 bit to store it for compression instead of building a binary tree?
Comments (6)
What you are looking for is something like Huffman coding, which represents more common values with shorter bit patterns.
How you store the bit codes is still limited to whole bytes: there is no data type smaller than a byte. To store variable-width bit values, you pack them end to end in a byte array. That gives you a stream of bit values, but it also means you can only read the stream from start to end; there is no random access to individual values as there is with the byte values in a byte array.
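To make the "pack them end to end" idea concrete, here is a minimal Python sketch. `BitWriter`, `BitReader`, `decode`, and the code table are hypothetical names chosen for illustration; a real code table would come from an algorithm such as Huffman coding. Bits are packed most significant bit first, which is one common convention rather than the only one. The writer has to remember the total bit count, because the last byte is padded with zero bits.

```python
class BitWriter:
    """Packs variable-width bit codes end to end into a bytearray (MSB first)."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.nbits = 0  # total number of bits written so far

    def write(self, code: str) -> None:
        # `code` is a string of '0'/'1' characters, e.g. "01".
        for ch in code:
            if self.nbits % 8 == 0:
                self.buf.append(0)  # current byte is full: start a new one
            if ch == "1":
                self.buf[-1] |= 0x80 >> (self.nbits % 8)
            self.nbits += 1


class BitReader:
    """Reads bits back one at a time, strictly from start to end."""

    def __init__(self, data: bytes, nbits: int) -> None:
        self.data = data
        self.nbits = nbits
        self.pos = 0

    def read_bit(self) -> int:
        if self.pos >= self.nbits:
            raise EOFError("no more bits")
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit


def decode(reader: BitReader, codes: dict[str, str]) -> str:
    # Works only because the code is prefix-free: no code is a prefix of another.
    lookup = {bits: sym for sym, bits in codes.items()}
    out, current = [], ""
    while reader.pos < reader.nbits:
        current += str(reader.read_bit())
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("stream ended in the middle of a code")
    return "".join(out)


codes = {"A": "1", "B": "01", "C": "001", "D": "000"}  # hypothetical prefix code
writer = BitWriter()
for ch in "AABAC":
    writer.write(codes[ch])
print(writer.nbits, bytes(writer.buf))  # 8 b'\xd9'
reader = BitReader(bytes(writer.buf), writer.nbits)
print(decode(reader, codes))  # AABAC
```

Five characters fit in a single byte here, but to find the fourth character you still have to decode the three before it.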
The algorithm you're describing is known as Huffman coding. To relate it to your example: if 'A' appears frequently in the data, the algorithm might represent 'A' as simply 1. If 'B' also appears frequently (but less frequently than A), the algorithm would typically represent 'B' as 01. The rest of the characters would then be 00xxxxx..., and so on.
In essence, the algorithm performs statistical analysis on the data and generates a prefix code that gives the most compression possible when each symbol is encoded separately with a whole number of bits.
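A minimal Python sketch of building Huffman codes with a min-heap (`heapq`): repeatedly merge the two least frequent groups, prefixing 0 to one side and 1 to the other. The function name `huffman_codes` and the sample text are illustrative. Exactly which bit pattern a symbol gets depends on tie-breaking, so another implementation may swap 0s and 1s while producing the same code lengths.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    freq = Counter(text)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs a 1-bit code.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Heap entries: (frequency, tiebreaker, {symbol: code built so far})
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent group
        f2, _, right = heapq.heappop(heap)  # second least frequent group
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


sample = "AAAAAAABBBCD"
codes = huffman_codes(sample)
for sym in sorted(codes):
    print(sym, codes[sym])
encoded = "".join(codes[ch] for ch in sample)
print(len(encoded), "bits instead of", 8 * len(sample))  # 19 bits instead of 96
```

For this sample, 'A' gets 1, 'B' gets 01, and 'C' and 'D' get 000 and 001, which matches the shape described above.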
You can use things like:
Once you have the bytes, you can do all the bit twiddling you want. You would need an algorithm of some sort before we can give you much more useful information.
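A short Python sketch of getting at the bytes and then twiddling bits, assuming ASCII text and a 4-byte little-endian integer. In C#, `Encoding.ASCII.GetBytes` and `BitConverter.GetBytes` play roughly the same roles as `str.encode` and `int.to_bytes` here (note that `BitConverter` uses the machine's byte order).

```python
text = "Hello"
data = bytearray(text.encode("ascii"))  # one byte per character in ASCII

# Show each byte as 8 binary digits
print(" ".join(format(b, "08b") for b in data))
# 01001000 01100101 01101100 01101100 01101111

n = 1234
n_bytes = n.to_bytes(4, byteorder="little")  # same size as a 32-bit int
print(n_bytes.hex())  # d2040000

# Bit twiddling: in ASCII, bit 5 (0x20) separates upper- and lower-case letters
data[0] |= 0x20              # set bit 5: 'H' (72) becomes 'h' (104)
print(data.decode("ascii"))  # hello
is_set = (data[0] >> 5) & 1  # test bit 5
print(is_set)                # 1
```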
The string is actually stored in binary format, as are all strings.
The difference between a string and another data type is that when your program displays the string, it retrieves the binary and shows the corresponding (ASCII) characters.
If you were to store data in a compressed format, you would still need more than 1 bit for most characters. How else would you identify which character is the most frequent one?
If 1 represents an 'A', what does 0 mean? All the other characters?
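One way to answer "what does 0 mean?" is to make 0 an escape prefix. This Python sketch uses a hypothetical scheme: 'A' is stored as the single bit 1, and every other character as 0 followed by its 8-bit ASCII code. Because no other code starts with 1, the decoder always knows where each character ends. For readability, bits are kept as a string of '0'/'1' characters rather than packed into bytes.

```python
def encode_escape(text: str) -> str:
    parts = []
    for ch in text:
        if ch == "A":
            parts.append("1")  # the most frequent symbol gets a single bit
        else:
            if ord(ch) > 0x7F:
                raise ValueError("this sketch assumes ASCII input")
            # '0' is an escape: the next 8 bits are the ASCII code
            parts.append("0" + format(ord(ch), "08b"))
    return "".join(parts)


def decode_escape(bits: str) -> str:
    out, i = [], 0
    while i < len(bits):
        if bits[i] == "1":
            out.append("A")
            i += 1
        else:
            out.append(chr(int(bits[i + 1:i + 9], 2)))
            i += 9
    return "".join(out)


bits = encode_escape("AAAAB")
print(len(bits), "bits instead of", 8 * 5)  # 13 bits instead of 40
print(decode_escape(bits))                  # AAAAB
```

The catch is that every other character now costs 9 bits, so the scheme only pays off when 'A' really dominates the text.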
Yes. There are several different methods for doing so. One common method is to make a MemoryStream out of an array of bytes, and then make a BinaryWriter on top of that memory stream, and then write ints, bools, chars, strings, whatever, to the BinaryWriter. That will fill the array with the bytes that represent the data you wrote. There are other ways to do this too.
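Python has no BinaryWriter, but `io.BytesIO` together with `struct.pack` gives a rough analogue of writing values into a MemoryStream. This is a sketch: `write_record` and `read_record` are hypothetical helpers, and the string layout (a 4-byte length followed by UTF-8 bytes) is a choice made here that is not meant to match BinaryWriter's output byte for byte.

```python
import io
import struct


def write_record(n: int, flag: bool, text: str) -> bytes:
    buf = io.BytesIO()                     # in-memory byte stream
    buf.write(struct.pack("<i", n))        # 4-byte little-endian signed int
    buf.write(struct.pack("<?", flag))     # 1-byte bool
    encoded = text.encode("utf-8")
    buf.write(struct.pack("<I", len(encoded)))  # 4-byte length prefix (our choice)
    buf.write(encoded)
    return buf.getvalue()


def read_record(data: bytes) -> tuple[int, bool, str]:
    buf = io.BytesIO(data)
    (n,) = struct.unpack("<i", buf.read(4))
    (flag,) = struct.unpack("<?", buf.read(1))
    (length,) = struct.unpack("<I", buf.read(4))
    text = buf.read(length).decode("utf-8")
    return n, flag, text


data = write_record(1234, True, "Hello")
print(len(data), data.hex(" "))  # 14 d2 04 00 00 01 05 00 00 00 48 65 6c 6c 6f
print(read_record(data))         # (1234, True, 'Hello')
```

Reading has to use the same layout, in the same order, as writing; nothing in the bytes themselves says where one value ends and the next begins.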
Sure, you can store an array of bytes.
No. The smallest addressable unit of storage in C# is a byte. However, there are classes that let you treat an array of bytes as an array of bits; you should read about the BitArray class.
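The idea behind a class like BitArray, treating a byte array as an array of bits, can be sketched in Python as follows. `PackedBits` is a hypothetical name, and storing bit i in position i % 8 of byte i // 8 (least significant bit first) is a choice made for this sketch. Unlike the variable-width stream packing described in an earlier answer, every value here has the same width (1 bit), so any bit can be read or written directly by index.

```python
class PackedBits:
    """Fixed-size array of bits stored 8 per byte, with random access by index."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.data = bytearray((length + 7) // 8)  # round up to whole bytes

    def _check(self, i: int) -> None:
        if not 0 <= i < self.length:
            raise IndexError(i)

    def __getitem__(self, i: int) -> bool:
        self._check(i)
        return ((self.data[i // 8] >> (i % 8)) & 1) == 1

    def __setitem__(self, i: int, value: bool) -> None:
        self._check(i)
        mask = 1 << (i % 8)
        if value:
            self.data[i // 8] |= mask          # set the bit
        else:
            self.data[i // 8] &= ~mask & 0xFF  # clear the bit

    def __len__(self) -> int:
        return self.length


bits = PackedBits(10)  # 10 bits fit in 2 bytes
bits[0] = True
bits[9] = True
print(len(bits.data), bits.data.hex())  # 2 0102
print(bits[9], bits[8])                 # True False
```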
What encoding would you be assuming?
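The question matters because the same text has a different binary representation under each encoding. A quick Python check, using only standard codecs:

```python
samples = {enc: "Hello".encode(enc) for enc in ("ascii", "utf-8", "utf-16-le")}
for enc, data in samples.items():
    print(enc, len(data), data.hex(" "))
# ascii 5 48 65 6c 6c 6f
# utf-8 5 48 65 6c 6c 6f
# utf-16-le 10 48 00 65 00 6c 00 6c 00 6f 00

# Outside ASCII the encodings diverge even for a single character
print("é".encode("latin-1").hex(), "é".encode("utf-8").hex())  # e9 c3a9
```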