Unwanted mean in pivot_table when values are identical
When there are duplicate values in the index, pivot_table takes their mean (because aggfunc='mean' by default).
For instance:
import pandas as pd

d = pd.DataFrame(data={
    'x_values': [13.4, 13.08, 12.73, 12., 33., 23., 12.],
    'y_values': [1.54, 1.47, 1., 2., 4., 4., 3.],
    'experiment': ['e', 'e', 'e', 'f', 'f', 'f', 'f']})
print(pd.pivot_table(d, index='x_values',
                     columns='experiment', values='y_values', sort=False))
returns:
experiment e f
x_values
13.40 1.54 NaN
13.08 1.47 NaN
12.73 1.00 NaN
12.00 NaN 2.5
33.00 NaN 4.0
23.00 NaN 4.0
As you can see, a new value appears in f (2.5, which is the mean of 2.0 and 3.0).
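The 2.5 comes from how pivot_table works: it groups the rows by every (index, columns) pair and applies aggfunc to each group, so the two rows with x_values == 12.0 in experiment f end up in one cell. A quick check on the question's data (`mean_tab` and `first_tab` are illustrative names) suggests that changing aggfunc only changes which single value survives, not the number of rows:

```python
import pandas as pd

d = pd.DataFrame(data={
    'x_values': [13.4, 13.08, 12.73, 12., 33., 23., 12.],
    'y_values': [1.54, 1.47, 1., 2., 4., 4., 3.],
    'experiment': ['e', 'e', 'e', 'f', 'f', 'f', 'f']})

# Default aggfunc='mean': the two (12.0, 'f') rows are averaged into one cell.
mean_tab = pd.pivot_table(d, index='x_values', columns='experiment',
                          values='y_values')
# aggfunc='first' keeps one of them instead, but still collapses the group.
first_tab = pd.pivot_table(d, index='x_values', columns='experiment',
                           values='y_values', aggfunc='first')

print(mean_tab.loc[12.0, 'f'])   # 2.5
print(first_tab.loc[12.0, 'f'])  # 2.0
print(len(mean_tab))             # 6 rows, although d has 7
```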
But I want to keep the rows as they were in my DataFrame:
experiment e f
x_values
13.40 1.54 NaN
13.08 1.47 NaN
12.73 1.00 NaN
12.00 NaN 2.0
33.00 NaN 4.0
23.00 NaN 4.0
12.00 NaN 3.0
How can I do it?
I have tried to play with aggfunc=list followed by an explode, but in this case the order is lost...
Thanks
Comments (3)
Here's my solution. You don't really want to pivot on x_values alone (because it contains duplicates). So add a new unique column (id_col) and pivot on both x_values and id_col. Then you will have to do some cleanup.
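The answer's code and output did not survive, so here is a minimal sketch of the idea, assuming a plain row counter as `id_col` (`out` is an illustrative name). One deliberate deviation: `id_col` is placed first in the index, so the sorted row index follows the original row order; the helper level is dropped afterwards.

```python
import pandas as pd

d = pd.DataFrame(data={
    'x_values': [13.4, 13.08, 12.73, 12., 33., 23., 12.],
    'y_values': [1.54, 1.47, 1., 2., 4., 4., 3.],
    'experiment': ['e', 'e', 'e', 'f', 'f', 'f', 'f']})

# Assumed helper column: one unique id per row (its position), so no two
# rows share an index key and nothing gets averaged.
d['id_col'] = range(len(d))

# id_col first: sorting by it reproduces the original row order.
out = pd.pivot_table(d, index=['id_col', 'x_values'],
                     columns='experiment', values='y_values')

# Cleanup: drop the helper level, keeping x_values as the index.
out = out.droplevel('id_col')
print(out)
```

With the question's data this should give the seven rows in their original order, including both 12.0 rows (2.0 and 3.0 under f). It does not need `sort=False`, which pivot_table only accepts from pandas 1.3 on.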
A workaround is to select the data for each unique experiment value separately and then concatenate all of these pieces.
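A minimal sketch of this workaround (the original code is missing; `parts` and `out` are illustrative names): each experiment's rows become a one-column frame indexed by x_values, and pd.concat stacks the frames, keeping duplicate index values instead of aggregating them.

```python
import pandas as pd

d = pd.DataFrame(data={
    'x_values': [13.4, 13.08, 12.73, 12., 33., 23., 12.],
    'y_values': [1.54, 1.47, 1., 2., 4., 4., 3.],
    'experiment': ['e', 'e', 'e', 'f', 'f', 'f', 'f']})

parts = []  # illustrative name: one frame per experiment
for name in d['experiment'].unique():
    # Rows of this experiment only, in their original order.
    sub = d.loc[d['experiment'] == name]
    # x_values becomes the index; y_values becomes a column named after the experiment.
    parts.append(sub.set_index('x_values')['y_values'].to_frame(name))

# Stack vertically: every row is kept, missing experiment cells become NaN.
out = pd.concat(parts)
out.columns.name = 'experiment'
print(out)
```

Because the pieces are stacked experiment by experiment, the result matches the original row order only when the rows are already grouped by experiment, as they are in the question.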
You could also just assign new columns and fill them according to boolean masks.
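A sketch of the boolean-mask idea for the question's two experiments (the answer's code is missing; `out`, `mask_e` and `mask_f` are illustrative names): start from NaN columns, fill each one only on the rows its mask selects, then move x_values into the index.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(data={
    'x_values': [13.4, 13.08, 12.73, 12., 33., 23., 12.],
    'y_values': [1.54, 1.47, 1., 2., 4., 4., 3.],
    'experiment': ['e', 'e', 'e', 'f', 'f', 'f', 'f']})

out = d[['x_values']].copy()  # illustrative name for the result
out['e'] = np.nan             # new, initially empty columns
out['f'] = np.nan

mask_e = d['experiment'] == 'e'
mask_f = d['experiment'] == 'f'
# Fill each column only on the rows selected by its mask (aligned on the row index).
out.loc[mask_e, 'e'] = d.loc[mask_e, 'y_values']
out.loc[mask_f, 'f'] = d.loc[mask_f, 'y_values']

out = out.set_index('x_values')
out.columns.name = 'experiment'
print(out)
```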
If the experiment column has more than two distinct values, you can iterate over all unique values, which results in the desired output.
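For any number of experiments the same idea fits in a loop over d['experiment'].unique(). This sketch swaps the explicit .loc assignments for Series.where, which keeps y_values where the mask is True and puts NaN elsewhere; `out` is an illustrative name.

```python
import pandas as pd

d = pd.DataFrame(data={
    'x_values': [13.4, 13.08, 12.73, 12., 33., 23., 12.],
    'y_values': [1.54, 1.47, 1., 2., 4., 4., 3.],
    'experiment': ['e', 'e', 'e', 'f', 'f', 'f', 'f']})

out = d[['x_values']].copy()  # illustrative name for the result
for name in d['experiment'].unique():
    # y_values where the row belongs to this experiment, NaN elsewhere.
    out[name] = d['y_values'].where(d['experiment'] == name)

out = out.set_index('x_values')
out.columns.name = 'experiment'
print(out)
```

Since no grouping happens, every row keeps its original position, which is what the question asks for.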
This approach appears to be more efficient than the one provided by @Stef, though at the cost of more lines of code.