How do I encode a string of 1s and 0s for transport?
For a genetic algorithm application, I'm using a whole load of binary strings. Most of the time they literally take the form of 01001010110, so that they can be mated, mutated and "crossed-over".
For transport and storage however, this seems wasteful. What's the simplest way to encode this as a shorter string?
I'm guessing this is pretty trivial, but I'm not sure where to start looking.
Update: I actually need to end up with another string, because one of the transport requests will be a GET request.
Comments (5)
The simplest would be to take each digit and treat it as a bit. Each group of 8 bits can be stored in a byte. Then you can send it as a stream of bytes. You will also need to store the length of the original string so that you can distinguish between "0" and "00".
Here is one way you could write the conversion from string to a byte array:
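One possible version of that conversion, sketched here in Python rather than the C# the thread seems to assume. The name `pack_bits` and the choice to pad the last byte with zeros on the right are illustrative assumptions, not part of the original answer.

```python
def pack_bits(bits: str) -> tuple:
    """Pack a string of '0'/'1' characters into bytes, 8 digits per byte.

    Returns (original_length, packed_bytes). The length is kept so that
    "0" and "00" can still be told apart after decoding.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("expected only '0' and '1' characters")
    # Pad on the right with zeros so the length is a multiple of 8.
    padded = bits + "0" * (-len(bits) % 8)
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return len(bits), packed
```

With this sketch, the 11-digit string from the question fits in 2 bytes plus its length.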
Reversing the operation is very similar.
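A sketch of the reverse direction, again in Python and under the same assumptions (a hypothetical `unpack_bits` helper, last byte padded with zeros on the right, original length stored alongside the bytes):

```python
def unpack_bits(length: int, packed: bytes) -> str:
    """Expand packed bytes back into a '0'/'1' string of the given length."""
    # Each byte becomes exactly 8 binary digits, most significant bit first.
    bits = "".join(format(byte, "08b") for byte in packed)
    if length > len(bits):
        raise ValueError("length exceeds the number of packed bits")
    # Drop the zero padding that was added to fill the last byte.
    return bits[:length]
```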
If you need to transmit the data as a string, you can Base64-encode the resulting byte array.
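For a GET request, the URL-safe Base64 alphabet avoids `+` and `/`. The following Python sketch assumes a made-up `length.payload` text layout; the function names and that layout are illustrative only.

```python
import base64


def encode_for_url(bits: str) -> str:
    """Return 'length.payload', where payload is URL-safe Base64 text."""
    padded = bits + "0" * (-len(bits) % 8)
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    # Strip the '=' padding, which would otherwise need escaping in a URL.
    payload = base64.urlsafe_b64encode(packed).decode("ascii").rstrip("=")
    return f"{len(bits)}.{payload}"


def decode_from_url(text: str) -> str:
    """Invert encode_for_url."""
    length_text, payload = text.split(".", 1)
    # Restore the '=' padding that was stripped for the URL.
    packed = base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4))
    bits = "".join(format(byte, "08b") for byte in packed)
    return bits[:int(length_text)]
```

Base64 carries 6 bits per character, so long strings shrink to roughly a sixth of their original character count.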
You may also want to consider keeping it in this form in memory too. This will be much more efficient than storing it as a string where each digit is stored as a 2-byte character: at 16 bits per binary digit, you are using roughly 16 times more memory than you need to for storing your data. The disadvantage is that it is slightly more difficult to use in this form, so if you have enough memory then what you are currently doing might be just fine.
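To give a feel for what working in a packed form looks like, here is a Python sketch that keeps a genome as a plain integer and uses bit masks for crossover and mutation. The helper names are hypothetical, and the genome length would have to be tracked separately.

```python
def crossover(a: int, b: int, point: int) -> int:
    """Single-point crossover: low `point` bits come from b, the rest from a."""
    low_mask = (1 << point) - 1
    return (a & ~low_mask) | (b & low_mask)


def mutate(genome: int, position: int) -> int:
    """Flip the bit at `position` (0 is the least significant bit)."""
    return genome ^ (1 << position)
```

For example, `crossover(0b11111111, 0b00000000, 3)` keeps the top five bits of the first parent and takes the bottom three from the second.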
What about converting it to its base 10 integer equivalent?
Convert.ToInt32() documentation
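The same idea in Python, as a rough sketch with hypothetical helper names. Python integers are not limited to 32 bits, so the size limit of a fixed-width conversion does not show up here, but the loss of leading zeros does.

```python
def to_decimal(bits: str) -> str:
    """Interpret the digits as a base 2 number and return it in base 10."""
    return str(int(bits, 2))


def from_decimal(text: str, length: int) -> str:
    """Convert back to binary, left-padding with zeros to `length` digits."""
    return format(int(text), "b").zfill(length)
```

`to_decimal("0")` and `to_decimal("00")` both give `"0"`, which is why the original length has to travel with the number.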
I would just store them as an array of bytes and use a helper function to translate between the byte array version and the string version.
Or implement run-length encoding (RLE) or Huffman coding. Both are fairly easy to implement. RLE is by far the easier of the two, but will in most cases have a worse compression ratio. If your data typically has many consecutive characters of the same value, it could still provide a substantial improvement.
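A minimal RLE sketch in Python. The text layout used here (first digit, a colon, then comma-separated run lengths) is an assumption chosen for readability, not a standard format.

```python
from itertools import groupby


def rle_encode(bits: str) -> str:
    """Encode as the first digit plus run lengths: '0001101' -> '0:3,2,1,1'."""
    if not bits:
        return ""
    runs = [str(len(list(group))) for _, group in groupby(bits)]
    return bits[0] + ":" + ",".join(runs)


def rle_decode(text: str) -> str:
    """Invert rle_encode by alternating digits for each run."""
    if not text:
        return ""
    digit, run_text = text.split(":", 1)
    out = []
    for run in run_text.split(","):
        out.append(digit * int(run))
        digit = "1" if digit == "0" else "0"  # runs alternate between 0 and 1
    return "".join(out)
```

On the string from the question, `01001010110`, this produces `0:1,1,2,1,1,1,1,2,1`, which is longer than the input. That is the poor-ratio case: RLE only pays off when runs are long.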
Abe Miessler's answer is a good one, but with the caveat mentioned in the comments.
If 64 bits is not enough to represent your string, then consider using a BigInt class http://www.codeproject.com/KB/cs/BigInt.aspx (you would probably want to add to/fromBinary() extension methods to it). Alternatively, represent this as a ... linked list of bytes. Either approach has the problem of discarding any leading zeroes, so you would want to store the original length as well.
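As an illustration of the big-integer route with the length stored alongside, here is a Python sketch. Python's built-in `int` is arbitrary precision, so it stands in for a BigInt class; the `length:hex` layout and the function names are assumptions made for this example.

```python
def encode_big(bits: str) -> str:
    """Return 'length:hex', keeping the length so leading zeros survive."""
    value = int(bits, 2) if bits else 0
    return f"{len(bits)}:{value:x}"


def decode_big(text: str) -> str:
    """Invert encode_big, restoring any leading zeros from the length."""
    length_text, hex_text = text.split(":", 1)
    length = int(length_text)
    if length == 0:
        return ""
    return format(int(hex_text, 16), "b").zfill(length)
```

Hex carries 4 bits per character, so a 75-digit string becomes roughly 19 hex characters plus the length prefix.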