Reading a binary file with Python
I find reading binary files with Python particularly difficult. Can you give me a hand?
I need to read this file, which in Fortran 90 is easily read by
int*4 n_particles, n_groups
real*4 group_id(n_particles)
read (*) n_particles, n_groups
read (*) (group_id(j),j=1,n_particles)
In detail, the file format is:
Bytes 1-4 -- The integer 8.
Bytes 5-8 -- The number of particles, N.
Bytes 9-12 -- The number of groups.
Bytes 13-16 -- The integer 8.
Bytes 17-20 -- The integer 4*N.
Next many bytes -- The group ID numbers for all the particles.
Last 4 bytes -- The integer 4*N.
How can I read this with Python? I tried everything but it never worked. Is there any chance I might use an F90 program from Python to read this binary file and then save the data that I need to use?
Read the binary file content into a bytes object (open the file in binary mode, "rb", and call read()), then "unpack" the binary data using struct.unpack.
The start bytes:
struct.unpack("iiiii", fileContent[:20])
The body: ignore the heading bytes and the trailing bytes (20 + 4 = 24 in total); the remaining part forms the body. To find the number of values in the body, do an integer division of its length in bytes by 4, then multiply the string 'i' by that quotient to create the correct format for the unpack method.
The end bytes:
struct.unpack("i", fileContent[-4:])
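Putting the three steps together, a minimal sketch could look like the following. The function name read_group_file and the id_format parameter are made up for illustration. It assumes native byte order and 4-byte record markers, as in the format listed in the question. The question declares group_id as real*4, so if the IDs are really stored as floats, pass "f" instead of "i".

```python
import struct


def read_group_file(path, id_format="i"):
    """Read the two Fortran records described in the question.

    id_format is an assumption: "i" for 32-bit ints, "f" for 32-bit floats.
    """
    with open(path, "rb") as f:  # the "b" is essential: binary mode
        file_content = f.read()

    # Start bytes: marker 8, N, number of groups, marker 8, marker 4*N.
    head, n_particles, n_groups, tail, body_len = struct.unpack(
        "iiiii", file_content[:20]
    )
    if head != 8 or tail != 8 or body_len != 4 * n_particles:
        raise ValueError("unexpected record markers; wrong byte order?")

    # Body: everything except the 20 leading and 4 trailing bytes.
    count = (len(file_content) - 24) // 4
    group_ids = struct.unpack(id_format * count, file_content[20:-4])

    # End bytes: the closing record marker, which should again be 4*N.
    (end_marker,) = struct.unpack("i", file_content[-4:])
    if end_marker != body_len:
        raise ValueError("closing record marker does not match")

    return n_particles, n_groups, group_ids
```

The marker checks are optional, but they tend to catch a byte-order mismatch early, since a byte-swapped 8 no longer reads as 8.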
To read a binary file into a bytes object, use pathlib (Path.read_bytes()).
To create an int from bytes 0-3 of the data, use int.from_bytes().
To unpack multiple ints from the data, use struct.
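As a rough illustration of those three tools, here is a small sketch. The sample file name and its contents are invented so the snippet runs on its own, and native byte order is assumed throughout.

```python
import struct
import sys
import tempfile
from pathlib import Path

# Hypothetical sample file: four native 32-bit ints, similar to the
# first 16 bytes of the file described in the question.
path = Path(tempfile.gettempdir()) / "groups_example.bin"
path.write_bytes(struct.pack("iiii", 8, 5, 2, 8))

# Read the whole file into a bytes object.
data = path.read_bytes()

# Create an int from bytes 0-3; sys.byteorder is this machine's byte order.
first = int.from_bytes(data[:4], byteorder=sys.byteorder, signed=True)

# Unpack several ints from the data in one call.
ints = struct.unpack("iiii", data[:16])
```

int.from_bytes() is handy for a single value, while struct.unpack() scales better once several fields have to be decoded at once.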
In general, I would recommend that you look into using Python's struct module for this. It's standard with Python, and it should be easy to translate your question's specification into a formatting string suitable for struct.unpack().
Do note that if there's "invisible" padding between/around the fields, you will need to figure that out and include it in the unpack() call, or you will read the wrong bits.
Reading the contents of the file in order to have something to unpack is pretty trivial: open it in binary mode and read it into a bytes object.
A format string such as "@II" unpacks the first two fields, assuming they start at the very beginning of the file (no padding or extraneous data), and also assuming native byte order (the @ symbol). The I characters in the formatting string mean "unsigned integer, 32 bits".
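A minimal sketch of that unpacking step follows. The bytes are built in memory here as a stand-in for the first 8 bytes of the real file, and the file name in the comment is hypothetical.

```python
import struct

# Stand-in for the start of the file: record marker 8, then N = 5.
# With a real file this would be something like:
#     with open("from_fortran.bin", "rb") as f:  # hypothetical name
#         data = f.read()
data = struct.pack("@II", 8, 5)

# calcsize reports how many bytes a format consumes, including any
# alignment padding that native mode ("@") may insert.
header_size = struct.calcsize("@II")

# Unpack the first two unsigned 32-bit fields.
marker, n_particles = struct.unpack("@II", data[:header_size])
```

Checking struct.calcsize() against the byte counts in the file specification is a cheap way to notice unexpected padding before any values are read.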
You could use numpy.fromfile, which can read data from both text and binary files. You would first construct a data type, which represents your file format, using numpy.dtype, and then read this type from file using numpy.fromfile.
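One way this might look for the file in the question is sketched below. The field names and the read_groups function are invented for illustration. Because the length of the body depends on N, the sketch reads a fixed-size header first and then the IDs, assuming native-endian 4-byte integers.

```python
import numpy as np

# Structured dtype for the first 20 bytes (field names are hypothetical).
header_dtype = np.dtype([
    ("rec1_start", "i4"),   # the integer 8
    ("n_particles", "i4"),  # N
    ("n_groups", "i4"),
    ("rec1_end", "i4"),     # the integer 8
    ("rec2_start", "i4"),   # the integer 4*N
])


def read_groups(path):
    """Read header and group IDs from the file layout in the question."""
    with open(path, "rb") as f:
        header = np.fromfile(f, dtype=header_dtype, count=1)[0]
        n = int(header["n_particles"])
        # Use dtype="f4" instead if the IDs are stored as real*4.
        group_ids = np.fromfile(f, dtype="i4", count=n)
        trailer = np.fromfile(f, dtype="i4", count=1)
    if trailer.size != 1 or int(trailer[0]) != 4 * n:
        raise ValueError("closing record marker does not match 4*N")
    return header, group_ids
```

For a file written on a machine with the other byte order, the type codes can carry an explicit prefix such as ">i4" or "<i4".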
I too found Python lacking when it comes to reading and writing binary files, so I wrote a small module (for Python 3.6+).
With binaryfile you'd do something like this (I'm guessing, since I don't know Fortran):
Which produces an output like this:
I used skip() to skip the additional data Fortran adds, but you may want to add a utility to handle Fortran records properly instead. If you do, a pull request would be welcome.
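For the record handling mentioned above, here is a rough sketch using only the standard library. It is not part of binaryfile and does not use its API. The function name read_fortran_records is invented, and it assumes each record is wrapped in two equal length markers stored as native 4-byte integers, which matches the layout in the question.

```python
import struct


def read_fortran_records(path, marker_format="i"):
    """Yield the payload bytes of each Fortran sequential record.

    Assumes every record is framed as: length marker, payload, length marker.
    """
    marker_size = struct.calcsize(marker_format)
    with open(path, "rb") as f:
        while True:
            raw = f.read(marker_size)
            if not raw:
                return  # clean end of file
            if len(raw) < marker_size:
                raise ValueError("truncated record marker")
            (length,) = struct.unpack(marker_format, raw)

            payload = f.read(length)
            closing = f.read(marker_size)
            if len(payload) != length or len(closing) != marker_size:
                raise ValueError("truncated record")
            if struct.unpack(marker_format, closing)[0] != length:
                raise ValueError("record markers disagree")
            yield payload
```

With the file from the question, the first record would hold the two counts and the second the group IDs, so something like struct.unpack("ii", records[0]) followed by struct.unpack(f"{n_particles}i", records[1]) should recover the data.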
If the data is array-like, I like to use numpy.memmap to load it.
Here's an example that loads 1000 samples from 64 channels, stored as two-byte integers.
You can then slice the data along either axis:
All the usual formats are available, including C- and Fortran-order, various dtypes and endianness, etc.
Some advantages of this approach:
For non-array data (e.g., compiled code), heterogeneous formats ("10 chars, then 3 ints, then 5 floats, ..."), or similar, one of the other approaches given above probably makes more sense.
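The 64-channel example described above might be sketched as follows. The file name and its contents are invented so the snippet can run on its own, and the data is assumed to be stored sample by sample in C order.

```python
import os
import tempfile

import numpy as np

n_samples, n_channels = 1000, 64

# Hypothetical sample file: 1000 samples x 64 channels of two-byte integers.
path = os.path.join(tempfile.mkdtemp(), "samples.dat")
values = np.arange(n_samples * n_channels).reshape(n_samples, n_channels) % 100
values.astype(np.int16).tofile(path)

# Map the file into memory; nothing is read until the data is accessed.
data = np.memmap(path, dtype=np.int16, mode="r", shape=(n_samples, n_channels))

# Slice along either axis.
first_ten_samples = data[:10, :]  # all channels, first ten samples
channel_3 = data[:, 3]            # every sample of one channel
```

Passing order="F" to numpy.memmap would select Fortran (column-major) order, and a dtype such as ">i2" would select big-endian two-byte integers.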