Extract the values of a dataframe that correspond to individual elements of another df

Posted 2025-01-20 08:39:44 · 3849 characters · 3 views · 0 comments

I have 2 pandas dfs (df1 & df2) as seen here:

df1	col1	col2	col3	col4	col5
row1	Dog	Cat	Bird	Tree	Lion
row2	Cat	Dragon	Bird	Dog	Tree
row3	Cat	Dog	Bird	Tree	Hippo
row4	Cat	Tree	Bird	Ant	Fish
row5	Cat	Tree	Monkey	Dragon	Ant

df2	col1	col2	col3	col4	col5
row1	3.219843	1.3631996	1.0051135	0.89303696	0.4313375
row2	2.8661892	1.4396228	0.7863044	0.539315	0.48167187
row3	2.5679462	1.3657334	0.9470184	0.79186934	0.48637152
row4	3.631389	0.94815284	0.7561722	0.6743943	0.5441728
row5	2.4727197	1.5941181	1.4069512	1.064051	0.48297918

The string elements of df1 correspond to the values of df2 at the same position. In both dataframes, an element (or value) never repeats within a row, but it can repeat across different rows.

For example, Dog in row1 = 3.219843, Bird in row3 = 0.9470184, Bird in row4 = 0.7561722, etc.

I would like to extract the values for all unique elements of the 1st df into different arrays. Like:

Dog = [3.219843, 0.539315, 1.3657334]

Cat = [1.3631996, 2.8661892, 2.5679462, 3.631389, 2.4727197]

etc...

Any ideas?

Many thanks!

南巷近海 2025-01-27 08:39:44

Assuming that the first column of df1 and of df2 is each df's index, we can extract the values for each unique animal in df1 by using df1 as a mask on df2: comparing df1 against one animal gives a boolean frame, and indexing df2 with it yields a new df with NaN in the irrelevant cells, which can be turned into a 1-dimensional array with .stack().values.
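To make the masking step concrete, here is a small sketch for a single animal, using only the first two rows and three of the columns from the question. The names `small1`, `small2`, `mask`, `masked` and `dog` are placeholders of mine, and the extra `.dropna()` is my addition so that the result does not depend on whether `.stack()` discards NaN on its own.

```python
import pandas as pd

# Subset of the question's data (illustrative names, not from the original post).
small1 = pd.DataFrame({'col1': ['Dog', 'Cat'],
                       'col2': ['Cat', 'Dragon'],
                       'col4': ['Tree', 'Dog']},
                      index=['row1', 'row2'])
small2 = pd.DataFrame({'col1': [3.219843, 2.8661892],
                       'col2': [1.3631996, 1.4396228],
                       'col4': [0.89303696, 0.539315]},
                      index=['row1', 'row2'])

mask = small1.eq('Dog')   # boolean frame, True where the cell is 'Dog'
masked = small2[mask]     # same shape as small2, NaN wherever mask is False

# Flatten row by row and discard the NaN cells.
dog = masked.stack().dropna().to_numpy()

print(masked)
print(dog)
```

Here `masked` keeps only the two cells where `small1` holds 'Dog', so `dog` ends up with the values 3.219843 and 0.539315.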

Construct the dataframes

First off, create some test data. Please provide it in a form like this in future posts. That's what @mozway was talking about in the comments. It is greatly appreciated.

(It's not always the case that somebody is willing to do all the copy-and-pasting necessary to get dataframes up and running for testing.)

import pandas as pd
import numpy as np

index = ['row1', 'row2', 'row3', 'row4', 'row5']

data1 = {'col1': ['Dog', 'Cat', 'Cat', 'Cat', 'Cat'],
         'col2': ['Cat', 'Dragon', 'Dog', 'Tree', 'Tree'],
         'col3': ['Bird', 'Bird', 'Bird', 'Bird', 'Monkey'],
         'col4': ['Tree', 'Dog', 'Tree', 'Ant', 'Dragon'],
         'col5': ['Lion', 'Tree', 'Hippo', 'Fish', 'Ant']}

data2 = {'col1': [3.219843, 2.8661892, 2.5679462, 3.631389, 2.4727197],
         'col2': [1.3631996, 1.4396228, 1.3657334, 0.94815284, 1.5941181],
         'col3': [1.0051135, 0.7863044, 0.9470184, 0.7561722, 1.4069512],
         'col4': [0.89303696, 0.539315, 0.79186934, 0.6743943, 1.064051],
         'col5': [0.4313375, 0.48167187, 0.48637152, 0.5441728, 0.48297918]}

df1 = pd.DataFrame(data1, index=index)
df2 = pd.DataFrame(data2, index=index)

Extract the data

Since you didn't specify what data structure you need, this is the strategy outlined above in a dict comprehension:

{animal: df2[df1.eq(animal)].stack().values for animal in np.unique(df1)}

The result looks like this:

{'Ant': array([0.6743943 , 0.48297918]),
 'Bird': array([1.0051135, 0.7863044, 0.9470184, 0.7561722]),
 'Cat': array([1.3631996, 2.8661892, 2.5679462, 3.631389 , 2.4727197]),
 'Dog': array([3.219843 , 0.539315 , 1.3657334]),
 'Dragon': array([1.4396228, 1.064051 ]),
 'Fish': array([0.5441728]),
 'Hippo': array([0.48637152]),
 'Lion': array([0.4313375]),
 'Monkey': array([1.4069512]),
 'Tree': array([0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181 ])}
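If the NaN handling of `.stack()` is a concern (as far as I know it differs between pandas versions and when `future_stack=True` is passed), the same lookup can be sketched with plain NumPy boolean indexing. This is an alternative technique, not what the answer above uses. The helper name `values_by_label` is made up for this sketch, and it assumes both frames have the same shape and the same row and column order. The data is a three-row, two-column subset of the question's frames.

```python
import numpy as np
import pandas as pd

def values_by_label(labels: pd.DataFrame, values: pd.DataFrame) -> dict:
    """Map each unique label to the values at the same positions (hypothetical helper)."""
    lab = labels.to_numpy()
    val = values.to_numpy()
    # Boolean indexing on a 2-D array returns the matches in row-major order.
    return {name: val[lab == name] for name in np.unique(lab)}

index = ['row1', 'row2', 'row3']
df1 = pd.DataFrame({'col1': ['Dog', 'Cat', 'Cat'],
                    'col2': ['Cat', 'Dragon', 'Dog']}, index=index)
df2 = pd.DataFrame({'col1': [3.219843, 2.8661892, 2.5679462],
                    'col2': [1.3631996, 1.4396228, 1.3657334]}, index=index)

result = values_by_label(df1, df2)
print(result['Dog'])
```

Because no intermediate frame with NaN is built, there is nothing to drop afterwards; the trade-off is that the index labels are ignored, so the two frames must already be aligned.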

番薯 2025-01-27 08:39:44

Assuming the input kindly provided by @fsimonjetz, you can stack both dataframes, then use GroupBy.agg to collect the values into lists:

df2.stack().groupby(df1.stack()).agg(list).to_dict()

or, using an intermediate DataFrame:

(pd
 .concat([df1.stack(),df2.stack()], axis=1)
 .groupby(0)[1].agg(list)
 .to_dict()
)

output:

{'Ant': [0.6743943, 0.48297918],
 'Bird': [1.0051135, 0.7863044, 0.9470184, 0.7561722],
 'Cat': [1.3631996, 2.8661892, 2.5679462, 3.631389, 2.4727197],
 'Dog': [3.219843, 0.539315, 1.3657334],
 'Dragon': [1.4396228, 1.064051],
 'Fish': [0.5441728],
 'Hippo': [0.48637152],
 'Lion': [0.4313375],
 'Monkey': [1.4069512],
 'Tree': [0.89303696, 0.48167187, 0.79186934, 0.94815284, 1.5941181]}
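To see why grouping one stacked Series by the other works, the following sketch shows the intermediate objects on a two-row, two-column subset of the data. The variable names `keys`, `vals` and `grouped` are mine and only serve the illustration.

```python
import pandas as pd

index = ['row1', 'row2']
df1 = pd.DataFrame({'col1': ['Dog', 'Cat'],
                    'col2': ['Cat', 'Dragon']}, index=index)
df2 = pd.DataFrame({'col1': [3.219843, 2.8661892],
                    'col2': [1.3631996, 1.4396228]}, index=index)

keys = df1.stack()   # Series of labels indexed by (row, column)
vals = df2.stack()   # Series of numbers with the same (row, column) index

# Both indexes match, so each number is grouped under the label from the same cell.
assert keys.index.equals(vals.index)

grouped = vals.groupby(keys).agg(list).to_dict()
print(grouped)
```

Within each list the values keep the row-by-row order of the stacked Series, which is why 'Cat' collects 1.3631996 (row1) before 2.8661892 (row2).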
