Highlighting outliers in each column with a loop using pandas style
I want to highlight the outlier cells in my data, where the minimum and maximum outlier thresholds are different for each column. This is an image of my data.
num_cols = ['X','Y','FFMC','DMC','DC','ISI','temp','RH','wind','rain','area']
Q1 = dataset[num_cols].quantile(0.25)
Q3 = dataset[num_cols].quantile(0.75)
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
I tried this code, based on this solution:
def highlight_outlier(df_):
    styles_df = pd.DataFrame('background-color: white',
                             index=df_.index,
                             columns=df_.columns)
    for s in num_cols:
        styles_df[s].apply(lambda x: 'background-color: yellow' if x < upper[s] or x < lower[s] else 'background-color: white')
    return styles_df
dataset_sort = dataset.sort_values("outliers")
dataset_sort.style.apply(highlight_outlier,axis=None)
I also tried this code, based on this solution:
def highlight_outlier(x):
    c1 = 'background-color: yellow'
    # empty DataFrame of styles
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    # set new columns by condition
    for col in num_cols:
        df1.loc[(x[col] < upper), col] = c1
        df1.loc[(x[col] > lower), col] = c1
    return df1
dataset_sort = dataset.sort_values("outliers")
dataset_sort.style.apply(highlight_outlier,axis=None)
Both failed. Also, how can I show only 5 rows after styling? Thank you.
Comments (1)
In your calculation, lower and upper are of type pd.Series. Therefore you have to use the loop iterator inside the highlight_outlier() function to index them, which avoids an indexing problem. I used upper[i] below.
Minimal Example
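The following is a sketch of that idea, not the answer's original code. It assumes a small made-up DataFrame that reuses two column names from the question, and it uses `col` as the loop variable where the answer refers to `i`. Each bound is picked per column with `lower[col]` and `upper[col]`, so the comparison is between a column and a scalar. The `styled_head` helper is a hypothetical name, added here to show one way of styling only the first 5 rows.

```python
import pandas as pd

# Hypothetical toy data; the column names are borrowed from the question.
df = pd.DataFrame({
    "temp": [20.0, 21.0, 22.0, 23.0, 24.0, 80.0],
    "wind": [4.0, 4.5, 5.0, 5.5, 6.0, -30.0],
    "month": ["jan", "feb", "mar", "apr", "may", "jun"],
})
num_cols = ["temp", "wind"]

Q1 = df[num_cols].quantile(0.25)
Q3 = df[num_cols].quantile(0.75)
IQR = Q3 - Q1
lower = Q1 - 1.5 * IQR  # pd.Series indexed by column name
upper = Q3 + 1.5 * IQR  # pd.Series indexed by column name


def highlight_outlier(data):
    """Return a DataFrame of CSS strings with the same shape as `data`."""
    styles = pd.DataFrame("", index=data.index, columns=data.columns)
    for col in num_cols:
        # Select the scalar bounds for this column, then build a boolean mask.
        is_outlier = (data[col] < lower[col]) | (data[col] > upper[col])
        styles.loc[is_outlier, col] = "background-color: yellow"
    return styles


def styled_head(data, n=5):
    """Style only the first n rows; the bounds still come from the full data."""
    return data.head(n).style.apply(highlight_outlier, axis=None)
```

In a notebook, `df.style.apply(highlight_outlier, axis=None)` renders the whole table and `styled_head(df)` renders only the first 5 rows. Slicing with `head` before calling `.style` limits what is displayed, while the bounds stay those computed from all rows. Rendering a Styler requires the optional jinja2 dependency.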