Pandas: even when there are no common values
I have 3 dataframes. I merge df1 and df2 on a common column. However, I need to use df3 to find which values are allowed for the pairs seen in the resulting groupby. I could get this part done too, using a two-column inner merge, but I also need to see the entries that have no common elements. What I have so far is represented by this model problem:
import pandas as pd

# country -> hotels operating there
ch = {'country':['India','India','India','USA','USA','Italy','Italy'],'hotel':['Taj','Oberoi','Hilton','Taj','Hilton','Oberoi','Marriott']}
ch_df = pd.DataFrame.from_dict(ch)
# hotel -> menu items it serves
hm = {'hotel':['Taj','Taj','Taj','Oberoi','Oberoi','Marriott','Marriott','Marriott','Hilton','Hilton'],'menu':['ildi','dosa','soup','soup','ildi','soup','pasta','pizza','pizza','burger']}
hm_df = pd.DataFrame.from_dict(hm)
# country -> menu items allowed there
cm = {'country':['India','India','India','USA','USA','USA','Italy','Italy'],'menu':['ildi','dosa','soup','dosa','burger','pizza','pizza','pasta']}
cm_df = pd.DataFrame.from_dict(cm)
# attach every hotel's menu to each (country, hotel) pair
chm_df = pd.merge(ch_df, hm_df, on='hotel')
# keep only items allowed in the country, then collect them per pair
pd.merge(left=chm_df, right=cm_df, on=['country','menu'], how='inner').groupby(['country','hotel'])['menu'].apply(list).reset_index(name='menu items')
   country  hotel     menu items
0  India    Oberoi    [ildi, soup]
1  India    Taj       [ildi, dosa, soup]
2  Italy    Marriott  [pasta, pizza]
3  USA      Hilton    [pizza, burger]
4  USA      Taj       [dosa]
What I need are entries such as:
5  Italy    Oberoi    []
...
One inefficient way is to add an allowed menu item to each pair in hm_df and remove it after the groupby. But that looks ugly. What is a more elegant method?
Comments (2)
If you need all possible combinations, you can use DataFrame.unstack together with DataFrame.stack; to replace the non-existent values with empty lists, use the fill_value=[] parameter:

For completeness, if only the pairs from chm_df that have no match should be converted to empty lists:

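A minimal sketch of both variants, assuming the ch_df, hm_df and cm_df frames from the question. It uses `reindex` against an explicit set of (country, hotel) pairs instead of the `unstack(fill_value=[])` / `stack` chain named above, so that the two cases differ only in which pairs are requested. The helper `with_empty_lists` and the names `grouped`, `all_pairs` and `observed_pairs` are illustrative and not taken from the original answer.

```python
import pandas as pd

# Model data from the question.
ch_df = pd.DataFrame({
    "country": ["India", "India", "India", "USA", "USA", "Italy", "Italy"],
    "hotel": ["Taj", "Oberoi", "Hilton", "Taj", "Hilton", "Oberoi", "Marriott"],
})
hm_df = pd.DataFrame({
    "hotel": ["Taj", "Taj", "Taj", "Oberoi", "Oberoi", "Marriott",
              "Marriott", "Marriott", "Hilton", "Hilton"],
    "menu": ["ildi", "dosa", "soup", "soup", "ildi", "soup",
             "pasta", "pizza", "pizza", "burger"],
})
cm_df = pd.DataFrame({
    "country": ["India", "India", "India", "USA", "USA", "USA", "Italy", "Italy"],
    "menu": ["ildi", "dosa", "soup", "dosa", "burger", "pizza", "pizza", "pasta"],
})

# Same merges as in the question: after the inner join, only pairs that
# share at least one menu item survive the groupby.
chm_df = ch_df.merge(hm_df, on="hotel")
grouped = (
    chm_df.merge(cm_df, on=["country", "menu"], how="inner")
    .groupby(["country", "hotel"])["menu"]
    .apply(list)
)


def with_empty_lists(series, index):
    """Reindex `series` onto `index`; pairs missing from `series` get []."""
    reindexed = series.reindex(index)
    # Missing pairs come back as NaN; swap each one for a fresh empty list.
    values = [v if isinstance(v, list) else [] for v in reindexed]
    return pd.Series(values, index=index, dtype=object, name=series.name)


# Variant 1: every country x hotel combination seen in the grouped result.
every_combination = pd.MultiIndex.from_product(
    grouped.index.levels, names=grouped.index.names
)
all_pairs = with_empty_lists(grouped, every_combination).reset_index(name="menu items")

# Variant 2: only the (country, hotel) pairs that actually occur in chm_df.
pairs_in_chm = pd.MultiIndex.from_frame(
    chm_df[["country", "hotel"]].drop_duplicates()
)
observed_pairs = with_empty_lists(grouped, pairs_in_chm).reset_index(name="menu items")

print(observed_pairs)
```

With the model data, the second variant keeps the seven pairs from the question, including Italy/Oberoi and India/Hilton with empty lists, while the first also lists combinations such as USA/Marriott that never occur in ch_df. For very large inputs the full cross product in the first variant can get big, so the second is likely the more practical of the two.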
The tables I was working with were too large (~100M rows), so the approach I initially sought does not work. I have now come up with what is, hopefully, a better approach.