How do I write a long integer as binary in Python?
In Python, long integers have unlimited precision. I would like to write a 16-byte (128-bit) integer to a file. struct from the standard library supports only up to 8-byte integers, and array has the same limitation. Is there a way to do this without masking and shifting each integer?
Some clarification here: I'm writing to a file that's going to be read by non-Python programs, so pickle is out. All 128 bits are used.
Comments (9)
I think for unsigned integers (and ignoring endianness) something like
might technically satisfy the requirements of having non-Python-specific output, not using an explicit mask, and (I assume) not using any non-standard modules. Not particularly elegant, though.
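The snippet this answer refers to is not preserved here. As a minimal sketch of one approach that fits the description (unsigned only, no explicit mask, standard library only), the integer can be formatted as zero-padded hex and converted to raw bytes with binascii.unhexlify. The function name int_to_bytes_unsigned and the size parameter are assumptions made for illustration, and the output is big-endian.

```python
import binascii


def int_to_bytes_unsigned(value, size=16):
    """Return `value` as exactly `size` big-endian bytes (unsigned only).

    Hypothetical helper: the name and signature are illustrative.
    """
    if value < 0 or value >= 1 << (8 * size):
        raise ValueError("value does not fit in %d unsigned bytes" % size)
    # Two hex digits per byte, left-padded with zeros to the full width.
    hex_text = "{:0{}x}".format(value, 2 * size)
    # Convert the hex text to the raw bytes it represents.
    return binascii.unhexlify(hex_text)
```

The result can be written directly to a file opened in binary mode. Reversing the returned bytes would give little-endian output.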
Two possible solutions:
1. Just pickle your long integer. This will write the integer in a special format which allows it to be read again, if this is all you want.
2. Use the second code snippet in this answer to convert the long int to a big endian string (which can be easily changed to little endian if you prefer), and write this string to your file.
The problem is that the internal representation of bigints does not directly include the binary data you ask for.
The PyPI bitarray module in combination with the builtin bin() function seems like a good combination for a solution that is simple and flexible. The endianness can be controlled with a few more lines of code. You'll have to evaluate the efficiency.
Why not use struct with the unsigned long long type twice?
That's documented here (scroll down to get the table with Q): http://docs.python.org/library/struct.html
This may not avoid the "mask and shift each integer" requirement. I'm not sure what avoiding mask and shift means in the context of Python long values.
The bytes are these:
You can then pack this list of bytes using
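struct.pack( '16B', *bytes )
(The format must be the unsigned '16B', since byte values run from 0 to 255, and the list has to be unpacked into separate arguments.)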
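The listing that produced the bytes is not preserved here. A minimal sketch of the idea, assuming the bytes are peeled off with repeated division by 256 rather than explicit masks and shifts, might look like this. The helper name to_byte_list and the sample value are hypothetical.

```python
import struct


def to_byte_list(value, size=16):
    """Split a non-negative integer into `size` byte values, most significant first.

    Hypothetical helper for illustration.
    """
    if value < 0 or value >= 1 << (8 * size):
        raise ValueError("value does not fit in %d unsigned bytes" % size)
    digits = []
    for _ in range(size):
        # divmod by 256 yields the remaining value and the lowest byte.
        value, low = divmod(value, 256)
        digits.insert(0, low)
    return digits


byte_list = to_byte_list(0x0102030405060708090A0B0C0D0E0F10)
# 'B' is an unsigned byte (0..255); the list is unpacked into 16 arguments.
packed = struct.pack("16B", *byte_list)
```

Here packed holds 16 bytes in big-endian order, ready to be written to a file opened in binary mode.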
With Python 3.2 and later, you can use int.to_bytes and int.from_bytes: https://docs.python.org/3/library/stdtypes.html#int.to_bytes
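As a short sketch of int.to_bytes and int.from_bytes on Python 3.2 or later: the example below writes a 128-bit value as 16 big-endian bytes and reads it back. The io.BytesIO buffer stands in for a real file opened in binary mode, and the sample value and byte order are assumptions chosen for illustration.

```python
import io

# Sample 128-bit value with the top bit set, so all 16 bytes are needed.
value = (1 << 127) | 0xDEADBEEF

# Fixed length of 16 bytes; to_bytes raises OverflowError if the value does not fit.
raw = value.to_bytes(16, byteorder="big", signed=False)

buffer = io.BytesIO()  # stands in for open(path, "wb")
buffer.write(raw)

# Reading back: the byte order must match the one used when writing.
restored = int.from_bytes(buffer.getvalue(), byteorder="big", signed=False)
```

Passing byteorder="little" instead produces the reversed layout, which may be what the non-Python reader expects.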
You could pickle the object to binary, use protocol buffers (I don't know if they allow you to serialize unlimited precision integers though) or BSON if you do not want to write code.
But writing a function that dumps 16 byte integers by shifting it should not be so hard to do if it's not time critical.
This may be a little late, but I don't see why you can't use struct:
The bigint by itself is rejected, but if you mask it with &0xFFFFFFFFFFFFFFFF you can reduce it to an 8-byte int instead of 16. Then the upper part is shifted and masked as well. You may have to play with byte ordering a bit. I used the ! mark to tell it to produce network byte order. Also, the msb and lsb (upper and lower halves) may need to be reversed. I will leave that as an exercise for the user to determine. I would say saving things in network byte order would be safer so you always know what the endianness of your data is.
No, don't ask me if network endian is big or little endian...
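The struct snippet from this answer is not preserved here. A minimal sketch of the technique it describes, assuming an unsigned 128-bit value split into two 64-bit halves, could look like the following. The names MASK64, pack_u128 and unpack_u128 are hypothetical. For the record, the struct documentation defines "!" as network order, which is big-endian, so the high half goes first.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF  # keeps the low 64 bits


def pack_u128(value):
    """Pack an unsigned 128-bit integer as two big-endian 64-bit words."""
    high = (value >> 64) & MASK64  # upper 8 bytes
    low = value & MASK64           # lower 8 bytes
    # '!' selects network (big-endian) order; 'Q' is an unsigned 8-byte integer.
    return struct.pack("!QQ", high, low)


def unpack_u128(data):
    """Inverse of pack_u128: rebuild the integer from 16 bytes."""
    high, low = struct.unpack("!QQ", data)
    return (high << 64) | low
```

For a little-endian file, "<QQ" with the low half first would be the corresponding layout.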
Based on @DSM's answer, and to support negative integers and varying byte sizes, I've created the following improved snippet:
This will properly handle negative integers and let the user set the number of bytes.
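The improved snippet itself is not preserved here. One way such a function might look, assuming negative values are stored in two's complement and the output is big-endian, is sketched below. The name int_to_bytes and its size parameter are hypothetical, and the hex-based conversion is an assumption carried over from the description of the earlier answer.

```python
import binascii


def int_to_bytes(value, size=16):
    """Return `value` as `size` big-endian bytes, two's complement if negative.

    Hypothetical helper for illustration.
    """
    bits = 8 * size
    # Accept the signed range for negatives and the unsigned range otherwise.
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError("value does not fit in %d bytes" % size)
    # On Python ints, & with an all-ones mask yields the two's complement pattern.
    value &= (1 << bits) - 1
    # Zero-padded hex, two digits per byte, converted to raw bytes.
    return binascii.unhexlify("{:0{}x}".format(value, 2 * size))
```

For example, int_to_bytes(-1, 4) yields four 0xFF bytes, the same bit pattern a C program would read as a 32-bit -1.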