In ggplot2, what do the ends of the boxplot lines represent?
I can't find a description of what the end points of the lines of a boxplot represent.
For example, here are point values above and below where the lines end.
(I realize that the top and bottom of the box are the 25th and 75th percentiles, and the center line is the 50th.) I assume, since there are points above and below the lines, that the line ends do not represent the max/min values.
The "dots" at the end of the boxplot represent outliers. There are a number of different rules for determining if a point is an outlier, but the method that R and ggplot use is the "1.5 rule". If a data point is less than Q_1 – 1.5 * IQR or greater than Q_3 + 1.5 * IQR, then that point is classed as an "outlier". The whiskers are defined as:
upper whisker = min(max(x), Q_3 + 1.5 * IQR)
lower whisker = max(min(x), Q_1 – 1.5 * IQR)
where IQR = Q_3 – Q_1, the box length. So the upper whisker is located at the smaller of the maximum x value and Q_3 + 1.5 IQR,
whereas the lower whisker is located at the larger of the smallest x value and Q_1 – 1.5 IQR.
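A minimal base-R sketch of the 1.5 rule described above. The helper name tukey_fences() and the toy data are hypothetical, and the quartiles come from quantile()'s default (type 7) method. Note that the sketch places each whisker at the most extreme data point that is not an outlier (8 for this data), which is how boxplot.stats() and ggplot2 behave, rather than at the fence value itself (13 here).

```r
# Sketch of the "1.5 rule" (Tukey fences) in base R.
# tukey_fences() is a hypothetical helper, not part of R or ggplot2.
tukey_fences <- function(x, coef = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75)))  # Q1 and Q3 (type-7 quartiles)
  iqr <- q[2] - q[1]                        # box length
  lower <- q[1] - coef * iqr                # lower fence
  upper <- q[2] + coef * iqr                # upper fence
  is_out <- x < lower | x > upper           # points drawn as dots
  list(
    fences   = c(lower, upper),
    outliers = x[is_out],
    # whiskers stop at the most extreme points that are NOT outliers
    whiskers = range(x[!is_out])
  )
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)  # toy data with one large value
res <- tukey_fences(x)
res$fences    # -3 13
res$outliers  # 30
res$whiskers  # 1 8
```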
Additional information
Example
Consider the following example
This gives the following plot:
As we decrease range from 1.7 to 1.5, we reduce the length of the whisker. However, range=0 is a special case: it's equivalent to "range=infinity".
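The effect of range can be checked without drawing anything. boxplot(x, range = r) passes r on to boxplot.stats() as its coef argument, so the whisker ends and outliers can be printed for each value. The toy vector below is made up for illustration.

```r
# Effect of the whisker multiplier on a toy vector (made-up data).
# boxplot(x, range = r) hands r to boxplot.stats() as `coef`.
x <- c(1:8, 16, 30)

for (k in c(1.7, 1.5, 0)) {
  s <- boxplot.stats(x, coef = k)
  cat(sprintf("coef = %-3s upper whisker = %2g  outliers: %s\n",
              k, s$stats[5],
              if (length(s$out) > 0) toString(s$out) else "none"))
}
```

For this data the upper whisker ends at 16, 8 and 30 respectively: at 1.7 only 30 is an outlier, at 1.5 the point 16 is also pushed out, and at 0 no outliers are flagged and the whiskers run to the extremes.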
I think ggplot uses the standard defaults, the same as boxplot: "the whiskers extend to the most extreme data point which is no more than [1.5] times the length of the box away from the box"
See: boxplot.stats
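One hedged caveat: the 1.5 multiplier is the same, but the box edges are not necessarily computed the same way. boxplot.stats() takes them from fivenum() (Tukey's hinges), while ggplot2's stat_boxplot() appears to use quantile() with its default type-7 quartiles; treat the ggplot2 part as an assumption to verify against your installed version. For small samples the two can differ, which shifts the fences slightly:

```r
# Box edges from base R's boxplot.stats() (Tukey's hinges via fivenum())
# versus quantile()'s default type-7 quartiles, which ggplot2 is assumed to use.
x <- 1:10

hinges    <- boxplot.stats(x)$stats[c(2, 4)]     # lower and upper hinge
quartiles <- unname(quantile(x, c(0.25, 0.75)))  # type-7 Q1 and Q3

hinges     # 3 8
quartiles  # 3.25 7.75
```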
P1IMSA Tutorial 8 - Understanding Box and Whisker Plots video offers a visual step-by-step explanation of (Tukey) box and whisker plots.
At 4m 23s I explain the meaning of the whisker ends and their relationship to 1.5*IQR.
Although the chart shown in the video was rendered using D3.js rather than R, its explanations jibe with the R implementations of boxplots mentioned.
As highlighted by @TemplateRex in a comment, ggplot doesn't draw the whiskers at the upper/lower quartile plus/minus 1.5 times the IQR. It actually draws them at max(x[x <= Q3 + 1.5 * IQR]) and min(x[x >= Q1 - 1.5 * IQR]), i.e. at the most extreme data points that are not outliers. For example, here is a plot drawn using geom_boxplot where I've added a dashed line at the value Q1 - 1.5*IQR:
Q1 = 52
Q3 = 65
IQR = Q3 - Q1 = 65 - 52 = 13
Q1 - 1.5 * IQR = 52 - 13 * 1.5 = 32.5 (dashed line)
Lower whisker = min(x[x >= Q1 - 1.5 * IQR]) = 35 (where x is the data used to create the boxplot; the outlier is at x = 27).
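The arithmetic above can be replayed in base R. The data vector is hypothetical: it was chosen so that its type-7 quartiles come out at Q1 = 52 and Q3 = 65 and so that it contains the outlier 27 and the point 35. It is not the data behind the original plot.

```r
# Hypothetical data: quartiles match the answer (Q1 = 52, Q3 = 65) and it
# contains the outlier 27 and the non-outlier 35. Not the original data.
x <- c(27, 35, 52, 55, 58, 60, 65, 68, 70)

q <- unname(quantile(x, c(0.25, 0.75)))  # 52 65
iqr <- q[2] - q[1]                       # 13
lower_fence <- q[1] - 1.5 * iqr          # 32.5 (the dashed line)
upper_fence <- q[2] + 1.5 * iqr          # 84.5

# Whiskers end at the most extreme points inside the fences,
# not at the fences themselves
lower_whisker <- min(x[x >= lower_fence])          # 35
upper_whisker <- max(x[x <= upper_fence])          # 70
outliers <- x[x < lower_fence | x > upper_fence]   # 27

c(lower_fence = lower_fence, lower_whisker = lower_whisker,
  outliers = outliers)
```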
MWE
Note this isn't the exact code I used to produce the image above, but it gets the point across.
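Since the MWE code itself is not reproduced here, the following is only a sketch in the same spirit, not the author's code. It assumes ggplot2 is installed, uses a small hypothetical vector whose quartiles are 52 and 65, draws a dashed line at Q1 - 1.5 * IQR with geom_hline(), and reads the computed whisker end back with layer_data().

```r
# Sketch only (not the original MWE); requires the ggplot2 package.
library(ggplot2)

# Hypothetical data with Q1 = 52, Q3 = 65 and an outlier at 27
df <- data.frame(group = "x",
                 value = c(27, 35, 52, 55, 58, 60, 65, 68, 70))

q <- unname(quantile(df$value, c(0.25, 0.75)))
lower_fence <- q[1] - 1.5 * (q[2] - q[1])  # 32.5

p <- ggplot(df, aes(x = group, y = value)) +
  geom_boxplot() +  # default coef = 1.5
  geom_hline(yintercept = lower_fence, linetype = "dashed")

print(p)

# Statistics ggplot2 computed for the box: the lower whisker (ymin) is the
# lowest non-outlier point, 35, not the dashed fence at 32.5
box <- layer_data(p, 1)
box$ymin
box$outliers[[1]]
```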