pandas.read_table() misses the first row

Posted on 2025-02-10 02:03:52

I am trying to extract xyz data using pandas.read_table(). The example input is:

    6
 i =     3231, time =      390.951, E =     -2300.2877174514
  O        25.8720962272       23.5487057768       25.9550094332
  H        26.7134918815       24.0996155532       25.8555927088
  H        25.8292636841       23.2306526549       26.9004259942
  O        36.9646620515       28.9501274283       25.1072617903
  H        37.8691423626       29.0116615687       25.5343998500
  H        36.2185832462       28.8942276136       25.7705056388
    6
 i =     3232, time =      391.072, E =     -2300.2946639751
  O        25.8723495329       23.5479993390       25.9555177002
  H        26.7142142389       24.1025173761       25.8565522557
  H        25.8306551175       23.2286729433       26.9006241724
  O        36.9645963995       28.9502158930       25.1069529796
  H        37.8719573287       29.0104227338       25.5340553516
  H        36.2186719315       28.8940372004       25.7719527369
    6
 i =     3233, time =      391.193, E =     -2300.3008277490
  O        25.8725995056       23.5472934823       25.9560188995
  H        26.7150201443       24.1054166977       25.8575713111
  H        25.8320275873       23.2266501926       26.9008810031
  O        36.9645310733       28.9503042170       25.1066361406
  H        37.8748229967       29.0091775522       25.5337615585
  H        36.2187349648       28.8938586630       25.7734429616

My MWE is:

import numpy as np
import pandas as pd

count_steps=0
n_atoms=6
with open("./test.xyz",'r') as inputfile:
     for data_file in inputfile:
         for i in range(0, n_atoms):
             molecule = pd.read_table(inputfile, comment='#', skiprows=2, nrows=n_atoms, delim_whitespace=True, names=['atom', 'x', 'y', 'z'])
             count_steps += 1
             print(molecule)
inputfile.close()

The XYZ format is:

<number of atoms>
comment line
<element> <X> <Y> <Z>
...
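
To make the layout above concrete, here is a minimal sketch that walks a sequence of lines frame by frame using only the standard library. The helper name `iter_xyz_frames` is made up for illustration and is not part of pandas or any XYZ package; it assumes each frame starts with an integer atom count followed by exactly one comment line.

```python
from itertools import islice


def iter_xyz_frames(lines):
    """Yield (comment, atoms) for each frame in an iterable of XYZ lines.

    atoms is a list of (element, x, y, z) tuples. Illustrative helper only.
    """
    lines = iter(lines)
    for count_line in lines:
        if not count_line.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(count_line)            # line 1: number of atoms
        comment = next(lines, "").strip()    # line 2: free-form comment
        atoms = []
        for line in islice(lines, n_atoms):  # next n_atoms lines: coordinates
            element, x, y, z = line.split()[:4]
            atoms.append((element, float(x), float(y), float(z)))
        yield comment, atoms
```

Because the function accepts any iterable of lines, an open file object can be passed directly. Each `atoms` list can then be turned into a table with `pd.DataFrame(atoms, columns=['atom', 'x', 'y', 'z'])` if a DataFrame is wanted.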

So I skip the first two lines and read the rest of the data according to the number of atoms. While running the above snippet, I noticed that I am missing the first row (i.e., the third line of the input) in the output:

  atom          x          y          z
0    H  26.713492  24.099616  25.855593
1    H  25.829264  23.230653  26.900426
2    O  36.964662  28.950127  25.107262
3    H  37.869142  29.011662  25.534400
4    H  36.218583  28.894228  25.770506
5    6        NaN        NaN        NaN

I cannot figure out why. Is there a read_table() parameter I am missing, or is it some sort of formatting issue?
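
A likely explanation, consistent with the output shown: the outer `for data_file in inputfile` loop already consumes the first line (the atom count) before `read_table()` receives the handle. `skiprows=2` then discards the comment line and the first atom line, and `nrows=6` runs on into the next frame's count line, which shows up as `6 NaN NaN NaN`. One way around this, sketched below, is to pass the file path rather than a partly consumed handle and to compute `skiprows` for each frame. The helper name `read_frame` is hypothetical, and the sketch assumes every frame has the same number of atoms. It uses `sep=r"\s+"` because `delim_whitespace` is deprecated in recent pandas versions.

```python
import pandas as pd


def read_frame(path, frame_index, n_atoms):
    """Read one frame of a fixed-size XYZ trajectory into a DataFrame.

    Hypothetical helper: assumes every frame has exactly n_atoms atoms, so a
    frame spans n_atoms + 2 lines (count line, comment line, atom lines).
    """
    lines_per_frame = n_atoms + 2
    # Skip all earlier frames plus this frame's count and comment lines.
    first_atom_line = frame_index * lines_per_frame + 2
    return pd.read_table(
        path,
        sep=r"\s+",                     # whitespace-delimited columns
        header=None,                    # the file has no column header row
        names=["atom", "x", "y", "z"],
        skiprows=first_atom_line,
        nrows=n_atoms,
    )
```

For the example input, `read_frame("./test.xyz", 0, 6)` would return the six atoms of the first frame, starting with the `O` row that was missing. The file is reopened on every call, which is simple but slow for long trajectories; for those, reading the lines once and splitting them into frames is probably the better choice.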
