Adding text labels to a subset of points in a plotly scatter plot

Posted on 2025-01-21 11:03:20

I have a plotly.express.scatter plot with thousands of points. I'd like to add text labels, but only for outliers (e.g., points far away from a trendline).

How do I do this with plotly?

I'm guessing I need to make a list of points I want labeled and then pass this somehow to plotly (update_layout?). I'm interested in a good way to do this.

Any help appreciated.

Comments (1)

故乡的云 2025-01-28 11:03:20

You have the right idea: you'll want to have the coordinates of your outliers, and use Plotly's text annotations to add text labels to these points. I am not sure how you want to determine outliers, but the following is an example using the tips dataset.

import pandas as pd
from sklearn import linear_model
import plotly.express as px

df = px.data.tips()

## use linear model to determine outliers by residual
X = df["total_bill"].values.reshape(-1, 1)
y = df["tip"].values

regr = linear_model.LinearRegression()
regr.fit(X, y)

df["predicted_tip"] = regr.predict(X)
df["residual"] = df["tip"] - df["predicted_tip"]
residual_mean, residual_std = df["residual"].mean(), df["residual"].std()
## absolute z-score of each residual
df["residual_normalized"] = ((df["residual"] - residual_mean) / residual_std).abs()

## determine outliers using whatever method you like
outliers = df.loc[df["residual_normalized"] > 3.0, ["total_bill","tip"]]

fig = px.scatter(df, x="total_bill", y="tip", trendline="ols", trendline_color_override="red")

## add text to outliers using their (x,y) coordinates:
for x,y in outliers.itertuples(index=False):
    fig.add_annotation(
        x=x, y=y,
        text="outlier",
        showarrow=False,
        yshift=10
    )
fig.show()
