Drawing median and quartile lines on an Altair violin plot

Posted on 2025-01-26 13:19:50

Suppose I had the following plot (taken from the tutorial in the Altair documentation):

import altair as alt
from vega_datasets import data

alt.Chart(data.cars()).transform_density(
    'Miles_per_Gallon',
    as_=['Miles_per_Gallon', 'density'],
    extent=[5, 50],
    groupby=['Origin']
).mark_area(orient='horizontal').encode(
    y='Miles_per_Gallon:Q',
    color='Origin:N',
    x=alt.X(
        'density:Q',
        stack='center',
        impute=None,
        title=None,
        axis=alt.Axis(labels=False, values=[0], grid=False, ticks=True),
    ),
    column=alt.Column(
        'Origin:N',
        header=alt.Header(
            titleOrient='bottom',
            labelOrient='bottom',
            labelPadding=0,
        ),
    )
).properties(
    width=100
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)

[Figure: violin plot]

How would I go about drawing quartile and median lines on each of those violin plots? Would I have to define another plot and layer it on top of the violin plots? It would also be nice if each line were the same width as the violin plot at that location on the distribution.

Comments (2)

梦里兽 2025-02-02 13:19:50

Yes, you would layer them before faceting. It gets a little tricky to work out what needs to be added to the layered chart and what to the faceted chart, but something like this would work:

import altair as alt
from vega_datasets import data

violins = alt.Chart().transform_density(
    'Miles_per_Gallon',
    as_=['Miles_per_Gallon', 'density'],
    extent=[5, 50],
    groupby=['Origin']
).mark_area(orient='horizontal').encode(
    y='Miles_per_Gallon:Q',
    color='Origin:N',
    x=alt.X(
        'density:Q',
        stack='center',
        impute=None,
        title=None,
        axis=alt.Axis(labels=False, values=[0], grid=False, ticks=True),
    ),
)

alt.layer(
    violins,
    alt.Chart().mark_rule().encode(
        y='median(Miles_per_Gallon)',
        x=alt.X(),
        color=alt.value('black')),
).properties(
    width=100
).facet(
    data=data.cars(),
    column=alt.Column(
        'Origin:N',
        header=alt.Header(
            titleOrient='bottom',
            labelOrient='bottom',
            labelPadding=0,
        ),
    )
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)


Then you can do the same for the quartiles. I am not sure how you would limit the lines to the width of the area other than manually putting in the values, and I think that would be a bit tricky as well. I would suggest putting a boxplot inside the violin instead:

alt.layer(
    violins,
    alt.Chart().mark_boxplot(size=5, extent=0, outliers=False).encode(
        y='Miles_per_Gallon',
        x=alt.value(46),
        color=alt.value('black')
    )
).properties(
    width=100
).facet(
    data=data.cars(),
    column=alt.Column(
        'Origin:N',
        header=alt.Header(
            titleOrient='bottom',
            labelOrient='bottom',
            labelPadding=0,
        ),
    )
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)


This is similar to how seaborn handles violinplots by default and it is also how they were described in the original paper by Hintze and Nelson in 1997.


枕头说它不想醒 2025-02-02 13:19:50

A small update in addition to @joelostblom's great answer. It is possible to align the violins and box plots without "magic numbers." The way I did it was two mirrored density plots.

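import altair as alt
from vega_datasets import data as vega_data

# Load the cars dataset used in the question
data = vega_data.cars()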

violin_right = (
    alt.Chart(data, width=100)
    .transform_density(
        "Miles_per_Gallon",
        as_=["Miles_per_Gallon", "density"],
        extent=[5, 50],
        groupby=["Origin"]
    )
    .mark_area(orient="horizontal")
    .encode(
        alt.X("density:Q")
            .impute(None)
            .title(None)
            .axis(labels=False, grid=False, ticks=True),
        alt.Y("Miles_per_Gallon:Q"),
        alt.Color("Origin:N")
    )
)

violin_left = (
    violin_right
    .copy()
    .transform_calculate(density="-datum.density")
)

boxplot = (
    alt.Chart(data, width=100)
    .mark_boxplot(outliers=False, size=5, extent=1.5)
    .encode(y="Miles_per_Gallon:Q", color=alt.value("black"))
)

chart = (
    alt.layer(violin_left, violin_right, boxplot)
    .facet(alt.Column("Origin:N"))
    .configure_view(stroke=None)
)

The created chart looks as follows.

[Figure: violin plot with quartiles]

The violins and the boxes are both centered on the zero point of the horizontal axis.
