Python plotnine (ggplot): add a mean line per color?

Posted on 2025-02-02 12:53:47


Using plotnine in Python, I'd like to add dashed horizontal lines to my plot (a scatterplot, but preferably with an answer that also works for other plot types) representing the mean for every color separately. I'd like to do so without manually computing the mean values myself or adapting other parts of the data (e.g. adding columns for color values, etc.).

Additionally, the original plot is generated via a function (make_plot below) and the mean lines are to be added afterwards, yet need to have the same color as the points from which they are derived.

Consider the following minimal example:

import pandas as pd
from plotnine import *


df = pd.DataFrame({
    'MSE': [0.1, 0.7, 0.5, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 0.4, 0.7, 0.9],
    'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
    'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

def make_plot(df, var_x, var_y, var_fill):
    # Scatter plot of var_y against var_x, with the point fill mapped to var_fill
    plot = ggplot(df, aes(x=var_x, y=var_y, fill=var_fill)) + geom_point()
    return plot

plot = make_plot(df, 'Number', 'MSE', 'Size')

I'd like to add four lines, one for each Size. The same can be done in R using ggplot2, as shown by this question. However, adding geom_line(stat="hline", yintercept="mean", linetype="dashed") to plot results in an error that I am unable to resolve: PlotnineError: "'stat_hline' Not in Registry. Make sure the module in which it is defined has been imported."

Answers that can resolve the aforementioned issue, or propose another working solution entirely, are greatly appreciated.


Comments (1)

私野 2025-02-09 12:53:48


You can do it by first computing the means as a list and then passing that list to your function:

import pandas as pd
from plotnine import *


df = pd.DataFrame({
    'MSE': [0.1, 0.7, 0.5, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 0.4, 0.7, 0.9],
    'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
    'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

# Define your means: one value per Size (groupby sorts the keys: L, M, S, XL)
a = list(df.groupby('Size')['MSE'].mean())

def make_plot(df, var_x, var_y, var_fill, means):
    # Points filled by var_fill, plus one dashed horizontal line per mean
    return (ggplot(df, aes(x=var_x, y=var_y, fill=var_fill))
            + geom_point()
            + geom_hline(yintercept=means, linetype='dashed'))

plot = make_plot(df, 'Number', 'MSE', 'Size', a)

which gives:

(Image: https://i.sstatic.net/7mcgb.png)

Note that two of the lines coincide, because the L and XL means are both 2/3 (groupby sorts the sizes as L, M, S, XL):

a = [0.6666666666666666, 0.5, 0.4666666666666666, 0.6666666666666666]
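The bare list above relies on the order produced by groupby, which sorts the group keys by default. A minimal pandas-only sketch that keeps each mean paired with its Size label, which makes it easier to see which line belongs to which group (the name mean_by_size is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "MSE": [0.1, 0.7, 0.5, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 0.4, 0.7, 0.9],
    "Size": ["S", "M", "L", "XL"] * 3,
    "Number": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

# groupby sorts the keys by default (sort=True), so the order is L, M, S, XL
means = df.groupby("Size")["MSE"].mean()

# Keep each mean next to its label instead of relying on list position
mean_by_size = means.to_dict()
for size, value in mean_by_size.items():
    print(f"{size}: {value:.4f}")
# L: 0.6667
# M: 0.5000
# S: 0.4667
# XL: 0.6667
```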

To give each dashed line a different color, you can do this (note that the colors are random, so they will not match the fill colors of the points):

import pandas as pd
from random import randint
from plotnine import *


df = pd.DataFrame({
    'MSE': [0.1, 0.7, 0.5, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 0.4, 0.7, 0.9],
    'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
    'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

### Generate a list of random colors, one per category (Size)
n = df['Size'].nunique()
color = ['#%06X' % randint(0, 0xFFFFFF) for _ in range(n)]
######################################################

def make_plot(df, var_x, var_y, var_fill):
    # One dashed line per group mean, each drawn in its own (random) color
    means = list(df.groupby(var_fill)[var_y].mean())
    return (ggplot(df, aes(x=var_x, y=var_y, fill=var_fill))
            + geom_point()
            + geom_hline(yintercept=means, linetype='dashed', color=color))

plot = make_plot(df, 'Number', 'MSE', 'Size')

which returns:

(image not available)
