Python pandas performance dataframe processing-efficiency

Pandas: look up values in a DataFrame with an index selector

Posted 2025-01-11 06:12:57

Suppose we have an indexed DataFrame with an arbitrary, but large, number of columns:

from numpy.random import randint
import pandas as pd

df = pd.DataFrame(randint(0,100,size=(10, 4)), columns=list('ABCD'))
print(df)

>    A   B   C   D
> 0  78   1  97  98
> 1  93  58  46  45
> 2  50   1  77  27
> 3  63  87  66  21
> 4  26   1  10  46
> 5  26  60  71  79
> 6  74   4  62  98
> 7  93  22  23  89
> 8  30  31  14  46
> 9  51   4  90  22

and a selector that holds, for each column, the index label of the row we want from that column, like:

selector = pd.DataFrame({ "other_index": randint(len(df.index),size=len(df.columns))}, 
                        index=df.columns)
print(selector)

>    other_index
> A            9
> B            0
> C            3
> D            4

Now I would like to get, for each column, the value in the row named by the selector:

selected = [df[c].loc[selector.loc[c, "other_index"]] for c in df.columns]
print(selected)

> [51, 1, 66, 46]

I'm pretty sure there is a more efficient way to do this in pandas, but I can't find it.
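One vectorized way to replace the per-column loop is to turn the labels into integer positions and read all values with a single NumPy fancy-indexing step. The sketch below hard-codes the DataFrame and selector printed above so the result is reproducible. It assumes df has a unique index and that selector has exactly one entry per column of df; the names wanted_rows and row_pos are illustrative only.

```python
import numpy as np
import pandas as pd

# The DataFrame and selector printed in the question, hard-coded for reproducibility
df = pd.DataFrame(
    {
        "A": [78, 93, 50, 63, 26, 26, 74, 93, 30, 51],
        "B": [1, 58, 1, 87, 1, 60, 4, 22, 31, 4],
        "C": [97, 46, 77, 66, 10, 71, 62, 23, 14, 90],
        "D": [98, 45, 27, 21, 46, 79, 98, 89, 46, 22],
    }
)
selector = pd.DataFrame({"other_index": [9, 0, 3, 4]}, index=list("ABCD"))

# Put the selector in df's column order, so entry k belongs to column k.
wanted_rows = selector["other_index"].reindex(df.columns)

# Translate row labels into row positions; get_indexer returns -1 for a label
# it cannot find (including the NaN that reindex inserts for a missing column).
row_pos = df.index.get_indexer(wanted_rows)
if (row_pos == -1).any():
    raise KeyError("selector is missing a column or refers to an unknown row")

# Pair row position row_pos[k] with column position k in one indexing step.
selected = df.to_numpy()[row_pos, np.arange(df.shape[1])]
print(selected.tolist())  # [51, 1, 66, 46]
```

Because selector was built with index=df.columns, the reindex call is a no-op here; it only matters if the selector's rows come in a different order than the columns.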

Comments (2)

转身以后 2025-01-18 06:12:57

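I would use df.lookup while it is still available. :) (DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so the code below only runs on pandas versions before 2.0.)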

df = pd.DataFrame(randint(0,100,size=(10, 4)), columns=list('ABCD'))
print(df)

>     A   B   C   D
> 0  93  30  17  42
> 1  38  55  10  46
> 2   7  30  86  36
> 3  25  48  25  62
> 4   1  61  50   0
> 5  18  87  98  87
> 6  61  57  80  34
> 7  38  50  32  96
> 8  72  68  75  74
> 9  70  99  77  28

selector = pd.DataFrame({ "other_index": randint(len(df.index),size=len(df.columns))},
                        index=df.columns)
print(selector)

>    other_index
> A            5
> B            7
> C            5
> D            9

print(df.lookup(selector.other_index, selector.index))

> [18 50 98 28]
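Since df.lookup no longer exists in pandas 2.x, one way to keep the same call shape is a small helper built on Index.get_indexer and NumPy indexing. The function name lookup below is hypothetical, not a pandas API. It assumes unique row and column labels; also note that to_numpy() upcasts mixed column dtypes to a common dtype (possibly object). The data is the frame and selector printed in this answer.

```python
import numpy as np
import pandas as pd


def lookup(frame: pd.DataFrame, row_labels, col_labels) -> np.ndarray:
    """Hypothetical stand-in for the removed DataFrame.lookup.

    Returns an array whose k-th element is frame.loc[row_labels[k], col_labels[k]].
    Assumes frame has a unique index and unique column labels.
    """
    rows = frame.index.get_indexer(row_labels)
    cols = frame.columns.get_indexer(col_labels)
    if len(rows) != len(cols):
        raise ValueError("row_labels and col_labels must have the same length")
    # get_indexer reports unknown labels as -1; raise instead of silently
    # reading the last row or column through negative indexing.
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("one or more labels not found")
    return frame.to_numpy()[rows, cols]


# The frame and selector printed in this answer, hard-coded for reproducibility
df = pd.DataFrame(
    {
        "A": [93, 38, 7, 25, 1, 18, 61, 38, 72, 70],
        "B": [30, 55, 30, 48, 61, 87, 57, 50, 68, 99],
        "C": [17, 10, 86, 25, 50, 98, 80, 32, 75, 77],
        "D": [42, 46, 36, 62, 0, 87, 34, 96, 74, 28],
    }
)
selector = pd.DataFrame({"other_index": [5, 7, 5, 9]}, index=list("ABCD"))

print(lookup(df, selector.other_index, selector.index))  # [18 50 98 28]
```

With such a helper, migrating means replacing df.lookup(rows, cols) with lookup(df, rows, cols).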

后知后觉 2025-01-18 06:12:57

IIUC (if I understand correctly), you could stack the frame and select the (row, column) pairs:

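idx = list(zip(selector['other_index'], selector.index))
df.stack().loc[idx].to_list()

output: [51, 31, 46, 46] (this does not match the question's data; with the df and selector printed in the question, the same code returns [51, 1, 66, 46])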
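To check the stack approach against the question's loop, the sketch below hard-codes the question's df and selector and runs both. Keep in mind that stack() builds a Series with one entry for every cell of df before any selection happens, even though only one value per column is needed.

```python
import pandas as pd

# The DataFrame and selector printed in the question, hard-coded for reproducibility
df = pd.DataFrame(
    {
        "A": [78, 93, 50, 63, 26, 26, 74, 93, 30, 51],
        "B": [1, 58, 1, 87, 1, 60, 4, 22, 31, 4],
        "C": [97, 46, 77, 66, 10, 71, 62, 23, 14, 90],
        "D": [98, 45, 27, 21, 46, 79, 98, 89, 46, 22],
    }
)
selector = pd.DataFrame({"other_index": [9, 0, 3, 4]}, index=list("ABCD"))

# Reference result: the per-column loop from the question
looped = [int(df[c].loc[selector.loc[c, "other_index"]]) for c in df.columns]

# stack() turns df into a Series indexed by (row label, column label) pairs,
# so each selector entry becomes one tuple key into that MultiIndex.
pairs = list(zip(selector["other_index"], selector.index))
stacked = df.stack().loc[pairs].to_list()

print(looped)   # [51, 1, 66, 46]
print(stacked)  # [51, 1, 66, 46]
```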
