pandas.read_table() misses the first row

Posted on 2025-02-10 02:03:52

I am trying to extract xyz data using pandas.read_table(). The example input is:

    6
 i =     3231, time =      390.951, E =     -2300.2877174514
  O        25.8720962272       23.5487057768       25.9550094332
  H        26.7134918815       24.0996155532       25.8555927088
  H        25.8292636841       23.2306526549       26.9004259942
  O        36.9646620515       28.9501274283       25.1072617903
  H        37.8691423626       29.0116615687       25.5343998500
  H        36.2185832462       28.8942276136       25.7705056388
    6
 i =     3232, time =      391.072, E =     -2300.2946639751
  O        25.8723495329       23.5479993390       25.9555177002
  H        26.7142142389       24.1025173761       25.8565522557
  H        25.8306551175       23.2286729433       26.9006241724
  O        36.9645963995       28.9502158930       25.1069529796
  H        37.8719573287       29.0104227338       25.5340553516
  H        36.2186719315       28.8940372004       25.7719527369
    6
 i =     3233, time =      391.193, E =     -2300.3008277490
  O        25.8725995056       23.5472934823       25.9560188995
  H        26.7150201443       24.1054166977       25.8575713111
  H        25.8320275873       23.2266501926       26.9008810031
  O        36.9645310733       28.9503042170       25.1066361406
  H        37.8748229967       29.0091775522       25.5337615585
  H        36.2187349648       28.8938586630       25.7734429616

and my MWE is:

import numpy as np
import pandas as pd

count_steps=0
n_atoms=6
with open("./test.xyz",'r') as inputfile:
    for data_file in inputfile:
        for i in range(0, n_atoms):
            molecule = pd.read_table(inputfile, comment='#', skiprows=2, nrows=n_atoms, delim_whitespace=True, names=['atom', 'x', 'y', 'z'])
            count_steps += 1
            print(molecule)
inputfile.close()
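One plausible reading of the symptom can be traced in plain Python without pandas. `for data_file in inputfile:` has already consumed the count line (`6`) before `read_table()` is called, so `skiprows=2` then skips the comment line *and* the first atom line, and `nrows=6` runs one line into the next frame. The sketch assumes `read_table()` starts at the handle's current position, which is consistent with the output in the question; `rows_seen_by_read_table` is a hypothetical helper that replays only this line bookkeeping.

```python
import io
from itertools import islice

# First frame of the question's test.xyz, plus the first two lines of the next frame.
SAMPLE = """    6
 i =     3231, time =      390.951, E =     -2300.2877174514
  O        25.8720962272       23.5487057768       25.9550094332
  H        26.7134918815       24.0996155532       25.8555927088
  H        25.8292636841       23.2306526549       26.9004259942
  O        36.9646620515       28.9501274283       25.1072617903
  H        37.8691423626       29.0116615687       25.5343998500
  H        36.2185832462       28.8942276136       25.7705056388
    6
 i =     3232, time =      391.072, E =     -2300.2946639751
"""


def rows_seen_by_read_table(text, n_atoms=6, skiprows=2):
    """Replay the snippet's line bookkeeping (hypothetical helper).

    The outer for-loop consumes one line before its body runs; skiprows and
    nrows are then applied to whatever remains in the stream.
    """
    f = io.StringIO(text)
    for _count_line in f:   # like `for data_file in inputfile:`
        break               # the body starts after exactly one line was read
    remaining = islice(f, skiprows, skiprows + n_atoms)
    # Keep only the first column, i.e. what ends up in the 'atom' column.
    return [line.split()[0] for line in remaining]


print(rows_seen_by_read_table(SAMPLE))
# ['H', 'H', 'O', 'H', 'H', '6']
```

The first column matches the `atom` column of the reported output, including the trailing `6` that comes from the next frame's count line.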

The XYZ format is:

<number of atoms>
comment line
<element> <X> <Y> <Z>
...
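This layout maps directly onto a reader that consumes exactly two header lines and then `<number of atoms>` atom lines per frame. A minimal plain-Python sketch (no pandas), assuming whitespace-separated columns and nothing after the last frame except possibly a blank line; `iter_xyz_frames` is a hypothetical name, not an existing library function.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Atom = Tuple[str, float, float, float]


def iter_xyz_frames(f: TextIO) -> Iterator[Tuple[str, List[Atom]]]:
    """Yield (comment, atoms) for each frame of an XYZ stream (hypothetical helper).

    Each frame is: a count line, a comment line, then `count` atom lines.
    """
    while True:
        count_line = f.readline()
        if not count_line.strip():      # EOF or a trailing blank line
            return
        n_atoms = int(count_line)       # int() ignores surrounding whitespace
        comment = f.readline().rstrip("\n")
        atoms = []
        for _ in range(n_atoms):
            element, x, y, z = f.readline().split()[:4]
            atoms.append((element, float(x), float(y), float(z)))
        yield comment, atoms


# First two frames of the question's test.xyz.
SAMPLE = """    6
 i =     3231, time =      390.951, E =     -2300.2877174514
  O        25.8720962272       23.5487057768       25.9550094332
  H        26.7134918815       24.0996155532       25.8555927088
  H        25.8292636841       23.2306526549       26.9004259942
  O        36.9646620515       28.9501274283       25.1072617903
  H        37.8691423626       29.0116615687       25.5343998500
  H        36.2185832462       28.8942276136       25.7705056388
    6
 i =     3232, time =      391.072, E =     -2300.2946639751
  O        25.8723495329       23.5479993390       25.9555177002
  H        26.7142142389       24.1025173761       25.8565522557
  H        25.8306551175       23.2286729433       26.9006241724
  O        36.9645963995       28.9502158930       25.1069529796
  H        37.8719573287       29.0104227338       25.5340553516
  H        36.2186719315       28.8940372004       25.7719527369
"""

for comment, atoms in iter_xyz_frames(io.StringIO(SAMPLE)):
    print(comment.strip(), len(atoms), atoms[0])
```

Because the reader only ever calls `readline()`, the count line is read deliberately rather than swallowed by an outer loop, and frames with different atom counts are handled as well.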

So I skip the first two lines and read the rest of the data according to the number of atoms. While running the above snippet, I noticed that the first row (i.e., the third line of the input) is missing from the output:

  atom          x          y          z
0    H  26.713492  24.099616  25.855593
1    H  25.829264  23.230653  26.900426
2    O  36.964662  28.950127  25.107262
3    H  37.869142  29.011662  25.534400
4    H  36.218583  28.894228  25.770506
5    6        NaN        NaN        NaN

I cannot figure out why this happens. Is there a read_table() parameter I am missing, or is it some sort of formatting issue?
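A sketch of one way to sidestep the problem, assuming every frame has the same number of atoms: instead of calling `read_table()` repeatedly on an open handle (which the `for` loop has already advanced, and which pandas may read ahead on past `nrows`), read the whole file once and let a callable `skiprows` drop the two header lines of every frame by absolute line number. It uses `read_csv` with `sep=r"\s+"`, which pandas 2.2 recommends in place of the deprecated `delim_whitespace=True`; `read_xyz_frames` and the `frame` column are hypothetical names.

```python
import io

import numpy as np
import pandas as pd


def read_xyz_frames(source, n_atoms):
    """Read a fixed-size XYZ trajectory into one DataFrame (hypothetical helper).

    Assumes each frame spans n_atoms + 2 lines: count line, comment line, atoms.
    """
    frame_len = n_atoms + 2
    df = pd.read_csv(
        source,
        sep=r"\s+",                          # whitespace-separated columns
        header=None,
        names=["atom", "x", "y", "z"],
        # Skip lines 0 and 1 of every frame, counted from the start of the file.
        skiprows=lambda i: i % frame_len < 2,
    )
    # Label each atom row with the index of the frame it came from.
    df["frame"] = np.arange(len(df)) // n_atoms
    return df


# First two frames of the question's test.xyz; a path such as "./test.xyz" works too.
SAMPLE = """    6
 i =     3231, time =      390.951, E =     -2300.2877174514
  O        25.8720962272       23.5487057768       25.9550094332
  H        26.7134918815       24.0996155532       25.8555927088
  H        25.8292636841       23.2306526549       26.9004259942
  O        36.9646620515       28.9501274283       25.1072617903
  H        37.8691423626       29.0116615687       25.5343998500
  H        36.2185832462       28.8942276136       25.7705056388
    6
 i =     3232, time =      391.072, E =     -2300.2946639751
  O        25.8723495329       23.5479993390       25.9555177002
  H        26.7142142389       24.1025173761       25.8565522557
  H        25.8306551175       23.2286729433       26.9006241724
  O        36.9645963995       28.9502158930       25.1069529796
  H        37.8719573287       29.0104227338       25.5340553516
  H        36.2186719315       28.8940372004       25.7719527369
"""

molecules = read_xyz_frames(io.StringIO(SAMPLE), n_atoms=6)
for frame, molecule in molecules.groupby("frame"):
    print(molecule)
```

Grouping by `frame` yields one DataFrame per time step, which is what the original loop was meant to print. If the atom count can vary between frames, this fixed-stride `skiprows` trick no longer applies.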
