Pandas MultiIndex: subtract based on only two matching index levels
Say I have a pandas DataFrame with a three-level MultiIndex:
import pandas as pd
import numpy as np
arrays = [['UK', 'UK', 'US', 'FR'], ['Firm1', 'Firm1', 'Firm2', 'Firm1'], ['Andy', 'Peter', 'Peter', 'Andy']]
idx = pd.MultiIndex.from_arrays(arrays, names = ('Country', 'Firm', 'Responsible'))
df_3idx = pd.DataFrame(np.random.randn(4,3), index = idx)
df_3idx
                                  0         1         2
Country Firm  Responsible
UK      Firm1 Andy         0.237655  2.049636  0.480805
              Peter        1.135344  0.745616 -0.577377
US      Firm2 Peter        0.034786 -0.278936  0.877142
FR      Firm1 Andy         0.048224  1.763329 -1.597279
I also have a second DataFrame whose index consists of the unique combinations of the first two index levels (Country and Firm) from the data above:
arrays = [['UK', 'US', 'FR'], ['Firm1', 'Firm2', 'Firm1']]
idx = pd.MultiIndex.from_arrays(arrays, names = ('Country', 'Firm'))
df_2idx = pd.DataFrame(np.random.randn(3,1), index = idx)
df_2idx
                      0
Country Firm
UK      Firm1 -0.103828
US      Firm2  0.096192
FR      Firm1 -0.686631
I want to subtract from each value in df_3idx the corresponding value in df_2idx. For instance, -0.103828 should be subtracted from every value in the first two rows, because the Country and Firm levels of those rows match the first row of df_2idx (UK, Firm1).
Does anybody know how to do this? I figured I could simply unstack the first DataFrame and then subtract, but I get an error:
df_3idx.unstack('Responsible').sub(df_2idx, axis=0)
ValueError: cannot join with no overlapping index names
In any case, unstacking may not be a good solution, because my data is very large and unstacking could take a long time.
I would appreciate any help. Many thanks in advance!
Comments (1)
There is a related question that is not focused on MultiIndex. However, the answer doesn't really depend on that. The sub method will align on the matching index levels. Use pd.DataFrame.sub with the parameter axis=0.
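The answer's code did not survive on this page, so here is a minimal sketch of what it describes, assuming a reasonably recent pandas version. It rebuilds the question's two frames with a seeded generator (an assumption made for repeatability; the question uses np.random.randn) and passes column 0 of df_2idx as a Series, so that sub only has to align the row index.

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption for repeatability, not part of the question.
rng = np.random.default_rng(0)

# Three-level frame from the question.
arrays_3 = [['UK', 'UK', 'US', 'FR'],
            ['Firm1', 'Firm1', 'Firm2', 'Firm1'],
            ['Andy', 'Peter', 'Peter', 'Andy']]
idx_3 = pd.MultiIndex.from_arrays(arrays_3, names=('Country', 'Firm', 'Responsible'))
df_3idx = pd.DataFrame(rng.standard_normal((4, 3)), index=idx_3)

# Two-level frame holding one value per (Country, Firm) pair.
arrays_2 = [['UK', 'US', 'FR'], ['Firm1', 'Firm2', 'Firm1']]
idx_2 = pd.MultiIndex.from_arrays(arrays_2, names=('Country', 'Firm'))
df_2idx = pd.DataFrame(rng.standard_normal((3, 1)), index=idx_2)

# df_2idx[0] is a Series; with axis=0 it is aligned to the rows of df_3idx
# on the shared Country and Firm levels and subtracted from every column.
result = df_3idx.sub(df_2idx[0], axis=0)
print(result)
```

No unstacking is needed. The rows of the result may come back in a different order than in df_3idx; if that matters, result.reindex(df_3idx.index) restores the original order. Passing the whole df_2idx DataFrame instead of a single column makes pandas align the columns as well, which is probably where the error in the question came from.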