Efficient lookup between two pandas DataFrames using vectorization
I have two pandas DataFrames: one holds the main data (df1) and the other is a lookup table (df2).
Main data:
Column1 | ... |
---|---|
[Data1, Data2, Data3, ...] | ... |
[Data11, Data21, Data31, ...] | ... |
Lookup table:
Data | location |
---|---|
Data1 | location1 |
Data2 | location2 |
Data3 | location1 |
Data11 | location1 |
... | ... |
My question is how to use pandas vectorization to add a new column to the main data table in the following format:
Column1 | ... | Count |
---|---|---|
[Data1, Data2, Data3, ...] | ... | {location1: [Data1, Data3], location2: [Data2], ...} |
[Data11, Data21, Data31, ...] | ... | {location1: [Data11], ...} |
I tried .apply(some lambda function, axis=1) as a workaround, but it becomes inefficient when the main table has many entries.
Comments (1)
As I understand it, you want to add a column to the main data, shown below as df1. Each cell of this column should hold a dictionary that groups the entries of that row's list by the location defined for them in df2.
I don't know why you need this structure, and I would look for a better way to reach your final objective, but this is how I would proceed:
Given:
The above creates df1 and df2 as shown below:
df1:
df2:
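As a minimal sketch, frames matching the tables in the question could be built as follows. The column names Column1, Data, and location come from the question; the concrete values, and the fact that Data21 and Data31 have no entry in the lookup table, are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the tables in the question.
# Each cell of Column1 holds a Python list of data labels.
df1 = pd.DataFrame({
    "Column1": [
        ["Data1", "Data2", "Data3"],
        ["Data11", "Data21", "Data31"],
    ]
})

# Lookup table: one row per data label with its location.
# Data21 and Data31 are deliberately missing (assumption).
df2 = pd.DataFrame({
    "Data": ["Data1", "Data2", "Data3", "Data11"],
    "location": ["location1", "location2", "location1", "location1"],
})

print(df1)
print(df2)
```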
Then define the function:
Using the buildDict function you can perform the following:
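One possible shape for such a function is sketched here; it is not necessarily the original definition. The signature of buildDict, the lookup helper dictionary, the choice to skip entries that are missing from df2, and the sample values are all assumptions.

```python
import pandas as pd

# Hypothetical sample data (assumed values, see the question's tables).
df1 = pd.DataFrame({"Column1": [["Data1", "Data2", "Data3"],
                                ["Data11", "Data21", "Data31"]]})
df2 = pd.DataFrame({"Data": ["Data1", "Data2", "Data3", "Data11"],
                    "location": ["location1", "location2",
                                 "location1", "location1"]})

# Build the lookup once as a plain dict: data label -> location.
lookup = dict(zip(df2["Data"], df2["location"]))

def buildDict(items, lookup):
    """Group the entries of one list by location, skipping unknown entries."""
    grouped = {}
    for item in items:
        loc = lookup.get(item)
        if loc is not None:
            grouped.setdefault(loc, []).append(item)
    return grouped

# Series.apply calls buildDict once per row of the list column.
df1["Count"] = df1["Column1"].apply(buildDict, args=(lookup,))
print(df1)
```

Working on the single column with a prebuilt dict avoids a DataFrame-wide apply with axis=1 and a scan of df2 for every entry, but it is still a Python-level loop over rows.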
This modifies df1 as illustrated below:
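The same Count column can also be assembled with less per-row Python work by flattening the lists first. The sketch below swaps in a different technique (Series.explode, DataFrame.merge, and groupby) rather than the function above. It assumes df1 has a unique, unnamed index and uses the same hypothetical sample values; only the final step that folds the groups into dictionaries remains a plain Python loop.

```python
import pandas as pd

# Hypothetical sample data (assumed values, see the question's tables).
df1 = pd.DataFrame({"Column1": [["Data1", "Data2", "Data3"],
                                ["Data11", "Data21", "Data31"]]})
df2 = pd.DataFrame({"Data": ["Data1", "Data2", "Data3", "Data11"],
                    "location": ["location1", "location2",
                                 "location1", "location1"]})

# One row per list entry; the original row label lands in column "index".
exploded = df1["Column1"].explode().rename("Data").reset_index()

# Attach locations; the inner join drops entries that are absent from df2.
exploded = exploded.merge(df2, on="Data", how="inner")

# Collect the entries for each (row, location) pair into a list.
per_loc = exploded.groupby(["index", "location"], sort=False)["Data"].agg(list)

# Fold the (row, location) groups into one dict per original row.
result = {}
for (row, loc), items in per_loc.items():
    result.setdefault(row, {})[loc] = items

# Rows without any match receive an empty dict.
df1["Count"] = [result.get(i, {}) for i in df1.index]
print(df1)
```

Whether this is faster than the apply version depends on the number of rows and the length of the lists, so it is worth timing both on representative data.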