提取与另一个DF的单个元素相对应的数据框的值

发布于 2025-01-20 08:39:44 字数 3849 浏览 3 评论 0 原文

,如下所示

DF1 Col1 Col2 Col3 Col3 Col4
row1 dog 猫猫 狮子
2 狮子
3 狮子 个熊猫 col4
row col4 col4 col4
2 DFS(DF1&amp ; col5 df2 Ant
df2 col1 col2 col3 col4 col5
row1 3.219843 1.3631996 1.0051135 0.89303696 0.4313375
row2 2.8661892 1.4396228 0.7863044 0.539315 0.48167187
row3 2.5679462 1.3657334 0.9470184 0.79186934 0.48637152
row4 3.631389 0.94815284 0.7561722 0.6743943 0.5441728
row5 2.4727197 1.5941181 1.4069512 1.064051 0.48297918

The string elements of the df1 correspond to the values of df2.对于两个数据框,条件都存在元素(或值)在同一行上不重复的条件。但是可以在不同的行上重复。

例如,Row1 = 3.219843的狗,Row3 = 0.9470184,Row4 = 0.7561722等。

我想将所有唯一元素的值提取到1st df的所有元素中。喜欢:

dog = [3.219843,0.539315,1.3657334]

1.3631996,2.8661892,2.5679462,3.631389,2.4727197]

cat = [

非常感谢!

I have 2 pandas dfs (df1 & df2) as seen here:

df1 col1 col2 col3 col4 col5
row1 Dog Cat Bird Tree Lion
row2 Cat Dragon Bird Dog Tree
row3 Cat Dog Bird Tree Hippo
row4 Cat Tree Bird Ant Fish
row5 Cat Tree Monkey Dragon Ant
df2 col1 col2 col3 col4 col5
row1 3.219843 1.3631996 1.0051135 0.89303696 0.4313375
row2 2.8661892 1.4396228 0.7863044 0.539315 0.48167187
row3 2.5679462 1.3657334 0.9470184 0.79186934 0.48637152
row4 3.631389 0.94815284 0.7561722 0.6743943 0.5441728
row5 2.4727197 1.5941181 1.4069512 1.064051 0.48297918

The string elements of the df1 correspond to the values of df2. For both dataframes the condition exists that an element (or a value) does not repeat on the same row. But can be repeated on different rows.

For example Dog of row1 = 3.219843, bird of row3 = 0.9470184, bird of row4 = 0.7561722 etc.

I would like to extract the values for all unique elements of the 1st df into different arrays. Like:

Dog = [3.219843, 0.539315, 1.3657334]

Cat = [1.3631996, 2.8661892, 2.5679462, 3.631389, 2.4727197]

etc...

Any ideas?

Many thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

南巷近海 2025-01-27 08:39:44

假设您的第一列 df1df2 是它们各自 df 的索引,我们可以提取 中每个唯一动​​物的值df1 通过使用第一个 df 作为掩码从第二个 df 中提取所有想要的值(结果是一个带有 NaN 的新 df code> 在不相关的单元格中,可以将其翻转使用 .stack().values 转换为一维数组)。

构建数据帧

首先,创建一些测试数据。请在以后的帖子中以类似的形式提供。这就是 @mozway 在评论中谈论的内容。 非常感谢

(并不总是有人愿意执行所有必要的复制和粘贴操作来启动和运行数据帧以进行测试。)

import pandas as pd
import numpy as np

index = ['row1', 'row2', 'row3', 'row4', 'row5']

data1 = {'col1': ['Dog', 'Cat', 'Cat', 'Cat', 'Cat'],
         'col2': ['Cat', 'Dragon', 'Dog', 'Tree', 'Tree'],
         'col3': ['Bird', 'Bird', 'Bird', 'Bird', 'Monkey'],
         'col4': ['Tree', 'Dog', 'Tree', 'Ant', 'Dragon'],
         'col5': ['Lion', 'Tree', 'Hippo', 'Fish', 'Ant']}

data2 = {'col1': [3.219843, 2.8661892, 2.5679462, 3.631389, 2.4727197],
         'col2': [1.3631996, 1.4396228, 1.3657334, 0.94815284, 1.5941181],
         'col3': [1.0051135, 0.7863044, 0.9470184, 0.7561722, 1.4069512],
         'col4': [0.89303696, 0.539315, 0.79186934, 0.6743943, 1.064051],
         'col5': [0.4313375, 0.48167187, 0.48637152, 0.5441728, 0.48297918]}

df1 = pd.DataFrame(data1, index=index)
df2 = pd.DataFrame(data2, index=index)

提取数据

由于您没有指定所需的数据结构,因此这就是概述的策略上面的 dict 理解:

{animal: df2[df1.eq(animal)].stack().values for animal in np.unique(df1)}

结果如下所示:

{'Ant': array([0.6743943 , 0.48297918]),
 'Bird': array([1.0051135, 0.7863044, 0.9470184, 0.7561722]),
 'Cat': array([1.3631996, 2.8661892, 2.5679462, 3.631389 , 2.4727197]),
 'Dog': array([3.219843 , 0.539315 , 1.3657334]),
 'Dragon': array([1.4396228, 1.064051 ]),
 'Fish': array([0.5441728]),
 'Hippo': array([0.48637152]),
 'Lion': array([0.4313375]),
 'Monkey': array([1.4069512]),
 'Tree': array([0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181 ])}

Assuming that your first columns df1 and df2 are the index of their respective df, we can extract the values for each unique animal in df1 by using the first df as a mask to extract all wanted values from the second one (the result is a new df with NaN in irrelevant cells, which can be turned into a 1-dimensional array with .stack().values).

Construct the dataframes

First of, create some test data. Please provide it in a form like this in future posts. That's what @mozway was talking about in the comments. It is greatly appreciated.

(It's not always the case that somebody is willing to do all the copy-and-pasting necessary to get dataframes up and running for testing.)

import pandas as pd
import numpy as np

index = ['row1', 'row2', 'row3', 'row4', 'row5']

data1 = {'col1': ['Dog', 'Cat', 'Cat', 'Cat', 'Cat'],
         'col2': ['Cat', 'Dragon', 'Dog', 'Tree', 'Tree'],
         'col3': ['Bird', 'Bird', 'Bird', 'Bird', 'Monkey'],
         'col4': ['Tree', 'Dog', 'Tree', 'Ant', 'Dragon'],
         'col5': ['Lion', 'Tree', 'Hippo', 'Fish', 'Ant']}

data2 = {'col1': [3.219843, 2.8661892, 2.5679462, 3.631389, 2.4727197],
         'col2': [1.3631996, 1.4396228, 1.3657334, 0.94815284, 1.5941181],
         'col3': [1.0051135, 0.7863044, 0.9470184, 0.7561722, 1.4069512],
         'col4': [0.89303696, 0.539315, 0.79186934, 0.6743943, 1.064051],
         'col5': [0.4313375, 0.48167187, 0.48637152, 0.5441728, 0.48297918]}

df1 = pd.DataFrame(data1, index=index)
df2 = pd.DataFrame(data2, index=index)

Extract the data

Since you didn't specify what data structure you need, this is the strategy outlined above in a dict comprehension:

{animal: df2[df1.eq(animal)].stack().values for animal in np.unique(df1)}

The result looks like this:

{'Ant': array([0.6743943 , 0.48297918]),
 'Bird': array([1.0051135, 0.7863044, 0.9470184, 0.7561722]),
 'Cat': array([1.3631996, 2.8661892, 2.5679462, 3.631389 , 2.4727197]),
 'Dog': array([3.219843 , 0.539315 , 1.3657334]),
 'Dragon': array([1.4396228, 1.064051 ]),
 'Fish': array([0.5441728]),
 'Hippo': array([0.48637152]),
 'Lion': array([0.4313375]),
 'Monkey': array([1.4069512]),
 'Tree': array([0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181 ])}
番薯 2025-01-27 08:39:44

假设@fsimonjetz提供的输入,您可以 两个dataframes,然后 groupby.agg 作为列表:

df2.stack().groupby(df1.stack()).agg(list).to_dict()

或,使用中间数据框架:

(pd
 .concat([df1.stack(),df2.stack()], axis=1)
 .groupby(0)[1].agg(list)
 .to_dict()
)

输出:

{'Ant': [0.6743943, 0.48297918],
 'Bird': [1.0051135, 0.7863044, 0.9470184, 0.7561722],
 'Cat': [1.3631996, 2.8661892, 2.5679462, 3.631389, 2.4727197],
 'Dog': [3.219843, 0.539315, 1.3657334],
 'Dragon': [1.4396228, 1.064051],
 'Fish': [0.5441728],
 'Hippo': [0.48637152],
 'Lion': [0.4313375],
 'Monkey': [1.4069512],
 'Tree': [0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181]}

Assuming the input kindly provided by @fsimonjetz, you can stack both dataframes, then GroupBy.agg as list:

df2.stack().groupby(df1.stack()).agg(list).to_dict()

or, using an intermediate DataFrame:

(pd
 .concat([df1.stack(),df2.stack()], axis=1)
 .groupby(0)[1].agg(list)
 .to_dict()
)

output:

{'Ant': [0.6743943, 0.48297918],
 'Bird': [1.0051135, 0.7863044, 0.9470184, 0.7561722],
 'Cat': [1.3631996, 2.8661892, 2.5679462, 3.631389, 2.4727197],
 'Dog': [3.219843, 0.539315, 1.3657334],
 'Dragon': [1.4396228, 1.064051],
 'Fish': [0.5441728],
 'Hippo': [0.48637152],
 'Lion': [0.4313375],
 'Monkey': [1.4069512],
 'Tree': [0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181]}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文