Reading an unformatted Fortran file with Python's scipy.io

Posted on 2025-01-11 02:16:02

At the moment I am working on a Python module that is supposed to read a Fortran file. The first record has a fixed beginning:

recl = ['a20', 'i4', 'a20', 'a6', 'a1', 'i4', 'i4', 'a1', 'i4']

This is the smallest possible size. However, the actual length of recl is unknown. All that is known is that further ('i4', 'i4') blocks may be appended, so the next possible size is

recl = ['a20', 'i4', 'a20', 'a6', 'a1', 'i4', 'i4', 'a1', 'i4', "i4", "i4"]

and so on. My idea is the following:

def read_ff(fname):
    from scipy.io import FortranFile, FortranEOFError, FortranFormattingError

    f = FortranFile(fname, "r")
    recl = ["a8", "i4", "a6", "a1", "i4", "i4", "a1", "i4"] #RECL = integer-value must be present for DIRECT files. 
                                                            #It is the length in bytes of each record in the file. 
    while True :                                            # unknown length of the recl due to varying electron configuration
        try:                                                # it is added up until record can be read
            record = f.read_record(*recl)
            break        
        except ValueError:
            recl += "i4", "i4"
        except FortranEOFError:
            break
        except FortranFormattingError:
            break
    dim = f.read_ints("i4") #nrow, ncol, nzc1  
    matrix = f.read_record("f8").reshape((dim[2],dim[0],dim[1])) #reshape((nzc1,nrows,ncols))

    return record, matrix

However, from the second pass through the loop onward, f.read_record(*recl) no longer seems to read anything: it raises ValueError immediately, the except clause extends recl again, and the loop runs indefinitely.

If you hit the appropriate recl right at the beginning, the file is read. So I think the Fortran code used to create the file is irrelevant, but I can provide it if you like.

I look forward to your comments and advice.

For those of us with an affinity for chemistry: The first record describes the data set with a correction factor for the picture change error and looks like this: 'DFT/PBE ',[6],'cc-pVDZ ','point ','C',[4],[0], 'O',[1],[0],[2].

The entry of the unknown length is the electron configuration with which the calculation was done and is structured like this: "C" : closed shell i,j : i/j electrons in symmetrical / asymmetrical orbital, "O" : open shell n : 0 for no electrons, n >=1 for n * ("i4", "i4") in open shells.

只有影子陪我不离不弃 2025-01-18 02:16:02

The first attempt to read a record advances the file pointer even if the attempt fails. You'll have to reset the file pointer if you are going to try again. Try something like this:

    while True :
        try:
            record = f.read_record(*recl)
            break        
        except ValueError:
            recl += "i4", "i4"
            f._fp.seek(0)  # Reset the file pointer to the beginning of the file.
        except FortranEOFError:
            break
        except FortranFormattingError:
            break

Note that the leading underscore of the attribute _fp indicates that it is intended to be a private attribute. There is no guarantee that this attribute won't change in future versions of SciPy, so use at your own risk.
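Seeking to 0 is only correct because the record being retried is the first one in the file. Below is a minimal sketch of a more general variant, assuming the same private `_fp` attribute behaves like an ordinary binary file object: it remembers the position with `tell()` before the first attempt and returns to it after every failure. The helper name `read_growing_record` and the `max_extra` cap are hypothetical, and the dtypes are spelled with the `S` prefix, which is the byte-string spelling NumPy 2.x accepts.

```python
import numpy as np
from scipy.io import FortranFile


def read_growing_record(f, base, extra=("i4", "i4"), max_extra=64):
    """Retry f.read_record with a growing dtype list, rewinding after each miss.

    f         : an open scipy.io.FortranFile in read mode
    base      : the smallest possible dtype list for the record
    extra     : dtypes appended after every failed attempt
    max_extra : hypothetical safety cap so a malformed file cannot loop forever
    """
    dtypes = list(base)
    # _fp is a private attribute (assumption: it is the underlying file object).
    start = f._fp.tell()
    for _ in range(max_extra + 1):
        try:
            return f.read_record(*dtypes)
        except ValueError:
            # The failed attempt already consumed the record marker,
            # so go back to where this record starts before retrying.
            f._fp.seek(start)
            dtypes.extend(extra)
    raise ValueError("no matching record layout found within max_extra attempts")
```

Capping the number of attempts is a design choice rather than a requirement: without it, a file whose record never matches any layout would reproduce the endless loop from the question.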

Alternatively, you could close and reopen the file:

    while True :                                            # unknown length of the recl due to varying electron configuration
        try:                                                # it is added up until record can be read
            record = f.read_record(*recl)
            break        
        except ValueError:
            recl += "i4", "i4"
            f.close()
            f = FortranFile(fname, "r")
        except FortranEOFError:
            break
        except FortranFormattingError:
            break
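Both variants above still guess the layout by trial and error. Since a sequential unformatted record carries its own length, the number of extra ('i4', 'i4') pairs can also be computed directly. The sketch below uses only the standard library and rests on several assumptions that are not confirmed by the question: little-endian data, 4-byte record markers before and after the payload (the default that SciPy's FortranFile also expects), and a fixed part matching the dtype list in the question's code. The names `FIXED` and `read_header_record` are hypothetical.

```python
import struct

# Assumed fixed part, mirroring ["a8", "i4", "a6", "a1", "i4", "i4", "a1", "i4"].
# "<" means little-endian with no padding between fields.
FIXED = "<8si6s1sii1si"


def read_header_record(fp):
    """Read one record from a binary file object and split off the i4 pairs."""
    (nbytes,) = struct.unpack("<I", fp.read(4))   # leading record marker
    payload = fp.read(nbytes)
    (trailer,) = struct.unpack("<I", fp.read(4))  # trailing record marker
    if len(payload) != nbytes or trailer != nbytes:
        raise ValueError("record markers do not match the payload")

    fixed_size = struct.calcsize(FIXED)
    # Everything after the fixed part must be whole (i4, i4) pairs of 8 bytes.
    n_pairs, remainder = divmod(nbytes - fixed_size, 8)
    if nbytes < fixed_size or remainder:
        raise ValueError("record length does not fit the assumed layout")

    fixed = struct.unpack_from(FIXED, payload, 0)
    pairs = [
        struct.unpack_from("<ii", payload, fixed_size + 8 * k)
        for k in range(n_pairs)
    ]
    return fixed, pairs
```

Opening the file with `open(fname, "rb")` and passing the handle to this function would return the fixed fields as a tuple and the open-shell pairs as a list, with no retries involved.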
