Python DataFrame manipulation: how to quickly extract a set of columns

Posted on 2025-01-26 20:52:21

I need to access and extract information from a DataFrame that is used by other colleagues in my research group.

The DataFrame structure is:

zee.loc[zee['layer']=='EMB2'].loc[zee['roi']==0]

            e           et       eta       phi      deta    dphi     samp    hash     det   layer roi   eventNumber
2249    20.677443   20.675829   0.0125  -1.067651   0.025   0.024544    3   2030015444  2   EMB2    0   2
2250    21.635288   21.633598   0.0125  -1.043107   0.025   0.024544    3   2030015445  2   EMB2    0   2
2251    -29.408310  -29.406013  0.0125  -1.018563   0.025   0.024544    3   2030015446  2   EMB2    0   2
2252    43.127533   43.124165   0.0125  -0.994020   0.025   0.024544    3   2030015447  2   EMB2    0   2
2253    -3.025344   -3.025108   0.0125  -0.969476   0.025   0.024544    3   2030015448  2   EMB2    0   2
... ... ... ... ... ... ... ... ... ... ... ... ...
4968988 -5.825550   -5.309279   0.4375  -0.454058   0.025   0.024544    3   2030019821  2   EMB2    0   3955
4968989 39.750645   36.227871   0.4375  -0.429515   0.025   0.024544    3   2030019822  2   EMB2    0   3955
4968990 80.568573   73.428436   0.4375  -0.404971   0.025   0.024544    3   2030019823  2   EMB2    0   3955
4968991 -28.921751  -26.358652  0.4375  -0.380427   0.025   0.024544    3   2030019824  2   EMB2    0   3955
4968992 55.599472   50.672146   0.4375  -0.355884   0.025   0.024544    3   2030019825  2   EMB2    0   3955

So I only need to work with the layer EMB2 and the columns et, eta and phi. To extract these columns, I'm using the following code:

EtEtaPhi, EventLens  = [], []
events = set(zee.loc[zee['layer']=='EMB2']['eventNumber'].to_numpy())
roi    = set(zee.loc[zee['layer']=='EMB2']['roi'].to_numpy())
for ee in events:
    for rr in roi:
        if len(zee.loc[zee['layer']=='EMB2'].loc[zee['eventNumber']==ee].loc[zee['roi']==rr])==0: break       
        EtEtaPhi.append(zee[['et','eta','phi']].loc[zee['layer']=='EMB2'].loc[zee['eventNumber']==ee].loc[zee['roi']==rr].to_numpy())
        EventLens.append(len(EtEtaPhi[-1]))

But reading 4000 events takes a very long time, almost one second per event. That adds up to nearly an hour just to extract those columns!

Is there a faster, more efficient way to extract columns from a DataFrame?

Comments (2)

逆夏时光 2025-02-02 20:52:21

The code

zee[['et','eta','phi']].loc[zee['layer']=='EMB2']

which already appears inside your loop, should do what you asked for. The rest is not needed.
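If the per-group lengths that the question stores in EventLens are still wanted, they can probably be recovered without a Python loop. The following is only a minimal sketch: the toy frame and its values are invented for illustration (only the column names come from the question), and `emb2` and `event_lens` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for `zee`; the values are made up, only the
# column names are taken from the question.
zee = pd.DataFrame({
    "et":          [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "eta":         [0.1, 0.1, 0.2, 0.2, 0.3, 0.3],
    "phi":         [-1.0, -0.9, -0.8, -0.7, -0.6, -0.5],
    "layer":       ["EMB2", "EMB2", "EMB1", "EMB2", "EMB2", "EMB2"],
    "roi":         [0, 0, 0, 1, 0, 0],
    "eventNumber": [2, 2, 2, 2, 3, 3],
})

# Filter the layer once, instead of once per (event, roi) pair.
emb2 = zee.loc[zee["layer"] == "EMB2"]

# Number of EMB2 rows in each (eventNumber, roi) group,
# sorted by the group keys.
event_lens = emb2.groupby(["eventNumber", "roi"]).size()
```

Here `event_lens` is a Series indexed by (eventNumber, roi), so `event_lens.loc[(2, 0)]` gives the row count for event 2, roi 0.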

永言不败 2025-02-02 20:52:21

Just use .loc:

sample = zee.loc[zee["layer"].eq("EMB2"), ["et","eta","phi"]]
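If one array per (eventNumber, roi) pair is still needed, as in the question's EtEtaPhi list, a groupby over the filtered frame is one possible way to get it in a single pass. This is a sketch under assumptions, not a drop-in replacement: the toy data is invented, `cols`, `et_eta_phi` and `event_lens` are hypothetical names, and the groups come out sorted by key rather than in the set iteration order of the original loop. It also keeps every non-empty pair, whereas the original loop stops at the first empty roi of an event.

```python
import pandas as pd

# Toy stand-in for `zee`; values are made up for illustration.
zee = pd.DataFrame({
    "et":          [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "eta":         [0.1, 0.1, 0.2, 0.2, 0.3, 0.3],
    "phi":         [-1.0, -0.9, -0.8, -0.7, -0.6, -0.5],
    "layer":       ["EMB2", "EMB2", "EMB1", "EMB2", "EMB2", "EMB2"],
    "roi":         [0, 0, 0, 1, 0, 0],
    "eventNumber": [2, 2, 2, 2, 3, 3],
})

cols = ["et", "eta", "phi"]

# Keep the grouping keys alongside the wanted columns.
sample = zee.loc[zee["layer"].eq("EMB2"), cols + ["eventNumber", "roi"]]

# One NumPy array of shape (n_rows, 3) per (eventNumber, roi) group.
et_eta_phi = [
    group[cols].to_numpy()
    for _, group in sample.groupby(["eventNumber", "roi"])
]
event_lens = [len(arr) for arr in et_eta_phi]
```

The layer mask is evaluated once here, while the loop in the question rebuilds several boolean masks over the whole frame for every pair, which is the likely source of the slowdown.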