如何在 Python 中解压缩比特流?
我有一个使用运行长度编码(RLE)在 Python 中压缩给定比特流的函数。我现在希望能够解压缩该压缩的比特流。这是我压缩比特流的代码。
def RLEC(inputstream):
count = ""
result = ""
prev_char = ""
for i in range(len(inputstream)):
if inputstream[i] != prev_char:
result = result + str(count) + prev_char
prev_char = inputstream[i]
count = 1
else:
count += 1
else:
result += str(count) + prev_char
return result
如果我压缩比特流,例如 0111111111111111111111111110 将被压缩为 1026110。 我怎样才能解压它来给出我原来的答案?或者至少能够将数字分成几个部分,每个部分告诉我有多少位以及它是什么位?如果这是错误的,我应该使用什么格式来最大限度地提高比特流效率并能够解压缩/拆分为单独的部分?
I have a function that compresses a given bitstream in Python, using Run Length Encoding (RLE). I want to now be able to decompress that compressed bitstream. This is my code to compress the bitstream.
def RLEC(inputstream):
count = ""
result = ""
prev_char = ""
for i in range(len(inputstream)):
if inputstream[i] != prev_char:
result = result + str(count) + prev_char
prev_char = inputstream[i]
count = 1
else:
count += 1
else:
result += str(count) + prev_char
return result
If I compress the bitstream, for example 0111111111111111111111111110 would be compressed as 1026110.
How would I be able to decompress that to give me my original answer? Or at least be able to split the numbers into sections, each telling how many bits I have and what bit it is? If this is wrong, what format should I use to maximise bitstream efficiency and be able to decompress/split into separate sections?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
正如评论中所指出的,您的格式从根本上来说是有缺陷的,并且无法唯一地解压缩。您的示例
1026110
可能是 1 个 0、26 个 1 和 1 个 0,或者 它可能是 1026 个 1 和 1 个 0。或者只是按顺序读取,1 个 0,261 个 1,然后是其他内容。或者一个零然后 2611 个零。由于 1 和 0 必须交替,因此您根本不需要将它们放入输出中。相反,请考虑用句点来终止十进制数字,以使它们明确。那么您将得到
1.26.1.
,分别代表 1 个零、26 个 1 和 1 个零。约定是以零开始,因此如果流以一开始,压缩版本将以零计数开始。例如,0.26.1.
表示零个零、26 个一和一个零。最后,您解压缩的绝不是“比特流”。正如建议的那样,它是一个由十进制数字和终止符组成的 ASCII 字符字符串。这是相当低的空间效率(至少 2.4 倍),这是一个问题,因为目的是压缩。相反,您应该设计一种真正的比特流格式,它将游程长度编码为位而不是十进制数字。
As pointed out in the comments, your format is fundamentally flawed, and cannot be uniquely decompressed. Your example
1026110
could be one zero, 26 ones, and one zero, or it could be 1026 ones and one zero. Or just reading sequentially, one zero, 261 ones, followed by something else. Or one zero and then 2611 zeros.Since ones and zeros must alternate, you don't need to put those in the output at all. Instead consider terminating the decimal numbers with, say, a period to make them unambiguous. Then you would have
1.26.1.
, for one zero, 26 ones, and one zero. The convention would be to start with a zero, so if the stream starts with a one, the compressed version would start with a zero count. E.g.0.26.1.
for zero zeros, 26 ones, and one zero.Lastly, what you are decompressing is in no way a "bitstream". It is a string of ASCII characters as decimal digits and terminators, as suggested. This is rather space inefficient (by a factor of at least 2.4), which is an issue since the intent is compression. You should instead design a true bitstream format, which encodes the run lengths in bits instead of decimal digits.