Python DataFrame manipulation: how to quickly extract a set of columns

Posted on 2025-01-26 20:52:21

I need to access and extract information from a DataFrame that other colleagues in my research group also use.

The DataFrame structure is:

zee.loc[zee['layer']=='EMB2'].loc[zee['roi']==0]

                  e          et     eta        phi   deta      dphi  samp        hash  det  layer  roi  eventNumber
2249      20.677443   20.675829  0.0125  -1.067651  0.025  0.024544     3  2030015444    2   EMB2    0            2
2250      21.635288   21.633598  0.0125  -1.043107  0.025  0.024544     3  2030015445    2   EMB2    0            2
2251     -29.408310  -29.406013  0.0125  -1.018563  0.025  0.024544     3  2030015446    2   EMB2    0            2
2252      43.127533   43.124165  0.0125  -0.994020  0.025  0.024544     3  2030015447    2   EMB2    0            2
2253      -3.025344   -3.025108  0.0125  -0.969476  0.025  0.024544     3  2030015448    2   EMB2    0            2
...             ...         ...     ...        ...    ...       ...   ...         ...  ...    ...  ...          ...
4968988   -5.825550   -5.309279  0.4375  -0.454058  0.025  0.024544     3  2030019821    2   EMB2    0         3955
4968989   39.750645   36.227871  0.4375  -0.429515  0.025  0.024544     3  2030019822    2   EMB2    0         3955
4968990   80.568573   73.428436  0.4375  -0.404971  0.025  0.024544     3  2030019823    2   EMB2    0         3955
4968991  -28.921751  -26.358652  0.4375  -0.380427  0.025  0.024544     3  2030019824    2   EMB2    0         3955
4968992   55.599472   50.672146  0.4375  -0.355884  0.025  0.024544     3  2030019825    2   EMB2    0         3955

So I only need the rows with layer EMB2 and the columns et, eta, and phi. To extract them, I'm using the following code:

EtEtaPhi, EventLens  = [], []
events = set(zee.loc[zee['layer']=='EMB2']['eventNumber'].to_numpy())
roi    = set(zee.loc[zee['layer']=='EMB2']['roi'].to_numpy())
for ee in events:
    for rr in roi:
        if len(zee.loc[zee['layer']=='EMB2'].loc[zee['eventNumber']==ee].loc[zee['roi']==rr])==0: break       
        EtEtaPhi.append(zee[['et','eta','phi']].loc[zee['layer']=='EMB2'].loc[zee['eventNumber']==ee].loc[zee['roi']==rr].to_numpy())
        EventLens.append(len(EtEtaPhi[-1]))

But reading 4000 events takes a very long time, almost one second per event. That is not acceptable: almost an hour just to extract those columns!

Is there some way to extract columns from a DataFrame more efficiently and faster?

逆夏时光 2025-02-02 20:52:21

The code

zee[['et','eta','phi']].loc[zee['layer']=='EMB2']

which you already have somewhere in there should do what you asked for. The rest is not needed.
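
As a rough illustration of that one-line selection, here is a minimal sketch. The miniature `zee` below is a hypothetical stand-in with made-up values, not the real data set. It applies the layer filter once, checks that the chained form from the question and a single `.loc` call select the same rows, and converts the result to a NumPy array.

```python
import pandas as pd

# Hypothetical miniature stand-in for `zee`; all values are made up.
zee = pd.DataFrame({
    "et": [20.7, 21.6, -29.4, 43.1, 5.0],
    "eta": [0.0125, 0.0125, 0.0125, 0.4375, 0.4375],
    "phi": [-1.07, -1.04, -1.02, -0.45, -0.43],
    "layer": ["EMB2", "EMB2", "EMB1", "EMB2", "EMB2"],
    "roi": [0, 0, 0, 0, 1],
    "eventNumber": [2, 2, 2, 3955, 3955],
})

# Build the boolean mask once instead of once per loop iteration.
mask = zee["layer"] == "EMB2"

# Chained form, as it appears inside the question's loop.
emb2_chained = zee[["et", "eta", "phi"]].loc[mask]

# Equivalent single .loc call: row mask and column list together.
emb2_single = zee.loc[mask, ["et", "eta", "phi"]]

# Both forms select the same rows and columns.
same = emb2_chained.equals(emb2_single)

# One (n_rows, 3) array holding every EMB2 cell.
arr = emb2_single.to_numpy()
```

Note that this yields a single array for the whole layer; it does not by itself split the rows per event and ROI as the question's loop does.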

永言不败 2025-02-02 20:52:21

Just use .loc:

sample = zee.loc[zee["layer"].eq("EMB2"), ["et","eta","phi"]]
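
If the per-event, per-ROI arrays from the question (`EtEtaPhi` and `EventLens`) are still needed, one possible approach, which is an assumption here and not something either answer states, is to filter once and let `groupby` split the rows. The miniature `zee` below is again a hypothetical stand-in with made-up values.

```python
import pandas as pd

# Hypothetical miniature stand-in for `zee`; all values are made up.
zee = pd.DataFrame({
    "et": [20.7, 21.6, -29.4, 43.1, 5.0],
    "eta": [0.0125, 0.0125, 0.0125, 0.4375, 0.4375],
    "phi": [-1.07, -1.04, -1.02, -0.45, -0.43],
    "layer": ["EMB2", "EMB2", "EMB1", "EMB2", "EMB2"],
    "roi": [0, 0, 0, 0, 1],
    "eventNumber": [2, 2, 2, 3955, 3955],
})

# Filter the layer a single time, keeping the grouping keys as well.
emb2 = zee.loc[zee["layer"].eq("EMB2"), ["eventNumber", "roi", "et", "eta", "phi"]]

EtEtaPhi, EventLens, keys = [], [], []

# groupby visits each (eventNumber, roi) combination that actually occurs,
# so no per-iteration scan of the full frame is needed.
for key, group in emb2.groupby(["eventNumber", "roi"], sort=True):
    block = group[["et", "eta", "phi"]].to_numpy()
    keys.append(key)
    EtEtaPhi.append(block)
    EventLens.append(len(block))
```

Two differences from the original loop are worth keeping in mind: the groups come out sorted by `(eventNumber, roi)` rather than in `set` iteration order, and every existing combination is returned, whereas the original `break` stops at the first empty ROI for an event.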
