Reading a binary file with Python

Posted on 2025-02-12 21:16:54

I find reading binary files with Python particularly difficult. Can you give me a hand?
I need to read this file, which in Fortran 90 is easily read by

int*4 n_particles, n_groups
real*4 group_id(n_particles)
read (*) n_particles, n_groups
read (*) (group_id(j),j=1,n_particles)

In detail, the file format is:

Bytes 1-4 -- The integer 8.
Bytes 5-8 -- The number of particles, N.
Bytes 9-12 -- The number of groups.
Bytes 13-16 -- The integer 8.
Bytes 17-20 -- The integer 4*N.
Next many bytes -- The group ID numbers for all the particles.
Last 4 bytes -- The integer 4*N. 

How can I read this with Python? I tried everything but it never worked. Is there any chance I could call an F90 program from Python to read this binary file and then save the data that I need?


Comments (8)

反差帅 2025-02-19 21:16:54

Read the binary file content like this:

with open(fileName, mode='rb') as file: # b is important -> binary
    fileContent = file.read()

then "unpack" binary data using struct.unpack:

The start bytes: struct.unpack("iiiii", fileContent[:20])

The body: skip the 20 header bytes and the 4 trailing bytes (24 bytes in total); what remains is the body. Integer-divide its length in bytes by 4 to get the number of 4-byte values, then repeat the format character 'i' that many times to build the format string for unpack. If the values are stored as real*4, as in the question's Fortran declaration, use 'f' instead of 'i':

struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])

The end bytes: struct.unpack("i", fileContent[-4:])
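
Putting the three steps together, a minimal sketch might look like the following. It is not the answerer's code: the function name parse_group_file and the sample bytes are made up for illustration, little-endian byte order ("<") is an assumption, and the body is decoded with "f" on the assumption that the group IDs are real*4 as in the question's Fortran declaration.

```python
import struct


def parse_group_file(raw, byte_order="<"):
    """Split the bytes of the file into (n_particles, n_groups, group_ids)."""
    # Bytes 1-20: marker 8, N, number of groups, marker 8, marker 4*N.
    rec1_len, n_particles, n_groups, rec1_end, rec2_len = struct.unpack(
        byte_order + "5i", raw[:20]
    )
    if (rec1_len, rec1_end, rec2_len) != (8, 8, 4 * n_particles):
        raise ValueError("unexpected record markers (wrong byte order?)")
    # Body: N four-byte values; "f" assumes real*4 (an assumption, see above).
    group_ids = struct.unpack(f"{byte_order}{n_particles}f", raw[20:-4])
    # The last 4 bytes repeat the marker 4*N.
    (rec2_end,) = struct.unpack(byte_order + "i", raw[-4:])
    if rec2_end != rec2_len:
        raise ValueError("trailing record marker does not match")
    return n_particles, n_groups, list(group_ids)


# Synthetic stand-in for the file contents: 3 particles, 2 groups.
sample = (
    struct.pack("<5i", 8, 3, 2, 8, 12)
    + struct.pack("<3f", 1.0, 0.0, 2.0)
    + struct.pack("<i", 12)
)
n_particles, n_groups, group_ids = parse_group_file(sample)
print(n_particles, n_groups, group_ids)
```

Checking the markers against the expected values 8 and 4*N is a cheap way to notice a wrong byte order early instead of silently reading garbage.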

残龙傲雪 2025-02-19 21:16:54


To read a binary file to a bytes object:

from pathlib import Path
data = Path('/path/to/file').read_bytes()  # Python 3.5+

To create an int from bytes 0-3 of the data:

i = int.from_bytes(data[:4], byteorder='little', signed=False)

To unpack multiple ints from the data:

import struct
ints = struct.unpack('iiii', data[:16])
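
As a small illustration of how the two calls relate, the sketch below builds 16 stand-in bytes in memory (the values 8, 5, 3, 8 are invented) and decodes them both ways. It spells out little-endian order with "<" in the struct format, which is an assumption about the file; a bare 'iiii' uses the machine's native order instead.

```python
import struct

# Stand-in for the first 16 bytes of the file (values are made up).
data = struct.pack("<4i", 8, 5, 3, 8)

# One integer from bytes 0-3, with the byte order named explicitly.
first = int.from_bytes(data[:4], byteorder="little", signed=False)

# Four integers at once; "<" pins little-endian instead of native order.
ints = struct.unpack("<4i", data[:16])

# The same four bytes read with the wrong byte order give a very different value.
as_big = int.from_bytes(data[:4], byteorder="big", signed=False)

print(first, ints, as_big)
```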

梦中楼上月下 2025-02-19 21:16:54


In general, I would recommend that you look into using Python's struct module for this. It's standard with Python, and it should be easy to translate your question's specification into a formatting string suitable for struct.unpack().

Do note that if there's "invisible" padding between/around the fields, you will need to figure that out and include it in the unpack() call, or you will read the wrong bits.

Reading the contents of the file in order to have something to unpack is pretty trivial:

import struct

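with open("from_fortran.bin", "rb") as f:
    data = f.read()

# struct.unpack needs a buffer of exactly the format's size (8 bytes here),
# so pass only the first 8 bytes rather than the whole file.
(eight, N) = struct.unpack("@II", data[:8])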

This unpacks the first two fields, assuming they start at the very beginning of the file (no padding or extraneous data), and also assuming native byte order (the @ symbol). Each I in the format string means "unsigned int", which is 4 bytes on common platforms.
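
If the native byte order cannot be taken for granted, one option is to use the known first field as a probe. The sketch below is an illustration rather than part of the answer: detect_byte_order is a hypothetical helper, and it relies on the question's statement that the first four bytes hold the integer 8.

```python
import struct


def detect_byte_order(raw):
    """Return "<" or ">" depending on which byte order makes the first field equal 8."""
    for prefix in ("<", ">"):
        # unpack_from reads from an offset and tolerates a longer buffer.
        (marker,) = struct.unpack_from(prefix + "I", raw, 0)
        if marker == 8:
            return prefix
    raise ValueError("first 4 bytes are not the integer 8 in either byte order")


# Synthetic headers in both byte orders (N = 5 is made up).
little = struct.pack("<II", 8, 5)
big = struct.pack(">II", 8, 5)

order = detect_byte_order(big)
eight, n = struct.unpack_from(order + "II", big, 0)
print(order, eight, n)
```

With an explicit "<" or ">" prefix the field sizes are fixed (I is always 4 bytes) and no alignment padding is inserted, which makes the format string independent of the machine running the script.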

临走之时 2025-02-19 21:16:54

You could use numpy.fromfile, which can read data from both text and binary files. You would first construct a data type, which represents your file format, using numpy.dtype, and then read this type from file using numpy.fromfile.
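
A rough sketch of that idea for the layout in the question is shown below. The field names, the temporary file, and the sample values are invented for illustration, and it assumes little-endian data with the group IDs stored as 4-byte floats.

```python
import os
import tempfile

import numpy as np

# Structured dtype for the 20 header bytes (field names are hypothetical).
header_dtype = np.dtype([
    ("rec1_len", "<i4"),
    ("n_particles", "<i4"),
    ("n_groups", "<i4"),
    ("rec1_end", "<i4"),
    ("rec2_len", "<i4"),
])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small synthetic file so the example is self-contained.
    path = os.path.join(tmp_dir, "groups.bin")
    with open(path, "wb") as out:
        out.write(np.array([(8, 5, 3, 8, 20)], dtype=header_dtype).tobytes())
        out.write(np.array([1, 0, 2, 0, 1], dtype="<f4").tobytes())
        out.write(np.array([20], dtype="<i4").tobytes())

    # Read it back: header first, then N floats, then the trailing marker.
    with open(path, "rb") as f:
        header = np.fromfile(f, dtype=header_dtype, count=1)[0]
        n_particles = int(header["n_particles"])
        group_ids = np.fromfile(f, dtype="<f4", count=n_particles)
        trailer = int(np.fromfile(f, dtype="<i4", count=1)[0])

print(n_particles, group_ids, trailer)
```

Because the length of the body depends on a value in the header, the read is done in stages instead of with one fixed dtype for the whole file.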

清引 2025-02-19 21:16:54


I too found Python lacking when it comes to reading and writing binary files, so I wrote a small module (for Python 3.6+).

With binaryfile you'd do something like this (I'm guessing, since I don't know Fortran):

import binaryfile

def particle_file(f):
    f.array('group_ids')  # Declare group_ids to be an array (so we can use it in a loop)
    f.skip(4)  # Bytes 1-4
    num_particles = f.count('num_particles', 'group_ids', 4)  # Bytes 5-8
    f.int('num_groups', 4)  # Bytes 9-12
    f.skip(8)  # Bytes 13-20
    for i in range(num_particles):
        f.struct('group_ids', '>f')  # 4 bytes x num_particles
    f.skip(4)

with open('myfile.bin', 'rb') as fh:
    result = binaryfile.read(fh, particle_file)
print(result)

Which produces an output like this:

{
    'group_ids': [(1.0,), (0.0,), (2.0,), (0.0,), (1.0,)],
    '__skipped': [b'\x00\x00\x00\x08', b'\x00\x00\x00\x08\x00\x00\x00\x14', b'\x00\x00\x00\x14'],
    'num_particles': 5,
    'num_groups': 3
}

I used skip() to skip the additional data Fortran adds, but you may want to add a utility to handle Fortran records properly instead. If you do, a pull request would be welcome.
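
For comparison, handling the Fortran records directly could be sketched with the standard library alone, as below. This is not part of binaryfile: iter_fortran_records is a hypothetical helper, and it assumes each record is wrapped in 4-byte length markers (as in the question's layout) and big-endian data (as in the sample output above).

```python
import io
import struct


def iter_fortran_records(stream, byte_order="<"):
    """Yield the payload of each length-prefixed and length-suffixed record."""
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file
        if len(head) < 4:
            raise ValueError("truncated record marker")
        (length,) = struct.unpack(byte_order + "i", head)
        payload = stream.read(length)
        tail = stream.read(4)
        if len(payload) != length or len(tail) != 4:
            raise ValueError("truncated record")
        if struct.unpack(byte_order + "i", tail)[0] != length:
            raise ValueError("leading and trailing record markers differ")
        yield payload


# Synthetic big-endian file: record 1 = (N, n_groups), record 2 = N floats.
sample = struct.pack(">i2ii", 8, 5, 3, 8) + struct.pack(
    ">i5fi", 20, 1.0, 0.0, 2.0, 0.0, 1.0, 20
)
records = list(iter_fortran_records(io.BytesIO(sample), ">"))
n_particles, n_groups = struct.unpack(">2i", records[0])
group_ids = struct.unpack(f">{n_particles}f", records[1])
print(n_particles, n_groups, group_ids)
```

Reading record by record means the marker bytes never have to be counted by hand, and a mismatch between the two markers signals a wrong byte order or a damaged file.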

羁〃客ぐ 2025-02-19 21:16:54


If the data is array-like, I like to use numpy.memmap to load it.

Here's an example that loads 1000 samples from 64 channels, stored as two-byte integers.

import numpy as np
mm = np.memmap(filename, np.int16, 'r', shape=(1000, 64))

You can then slice the data along either axis:

mm[5, :] # sample 5, all channels
mm[:, 5] # all samples, channel 5

All the usual formats are available, including C- and Fortran-order, various dtypes and endianness, etc.

Some advantages of this approach:

  • No data is loaded into memory until you actually use it (that's what a memmap is for).
  • More intuitive syntax (no need to generate a struct.unpack string consisting of 64,000 characters).
  • Data can be given any shape that makes sense for your application.

For non-array data (e.g., compiled code), heterogeneous formats ("10 chars, then 3 ints, then 5 floats, ..."), or similar, one of the other approaches given above probably makes more sense.
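
Applied to the file from the question, the same idea might be sketched as follows: the small header is read with struct and only the array part is memory-mapped, using the offset argument to skip the first 20 bytes. The temporary file and its values are synthetic, and little-endian 4-byte floats are an assumption.

```python
import os
import struct
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small synthetic file so the example is self-contained.
    path = os.path.join(tmp_dir, "groups.bin")
    with open(path, "wb") as out:
        out.write(struct.pack("<5i", 8, 5, 3, 8, 20))           # header
        out.write(struct.pack("<5f", 1.0, 0.0, 2.0, 0.0, 1.0))  # group IDs
        out.write(struct.pack("<i", 20))                        # trailing marker

    # The header is tiny, so read it the ordinary way to learn N.
    with open(path, "rb") as f:
        n_particles = struct.unpack("<5i", f.read(20))[1]

    # Map only the body: skip 20 header bytes, expose N float32 values.
    mm = np.memmap(path, dtype="<f4", mode="r", offset=20, shape=(n_particles,))
    first_two = mm[:2].tolist()
    all_ids = mm.tolist()
    del mm  # release the mapping before the temporary directory is removed

print(n_particles, first_two, all_ids)
```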

深海里的那抹蓝 2025-02-19 21:16:54
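
#!/usr/bin/python

import array

# 'f' means 4-byte floats in the machine's native byte order
data = array.array('f')
with open('c:\\code\\c_code\\no1.dat', 'rb') as f:
    data.fromfile(f, 5)  # read 5 floats from the start of the file
print(data)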

东北女汉子 2025-02-19 21:16:54
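
import pickle

# Note: this only works for files that were written with pickle.dump,
# and you should only unpickle data you trust.
with open("filename.dat", "rb") as f:
    try:
        while True:
            x = pickle.load(f)
            print(x)
    except EOFError:
        pass  # reached the end of the file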