How to fit a smooth curve to my data in R?
I'm trying to draw a smooth curve in R. I have the following simple toy data:
> x
[1] 1 2 3 4 5 6 7 8 9 10
> y
[1] 2 4 6 8 7 12 14 16 18 20
When I plot it with a standard command, it looks bumpy and edgy, of course:
> plot(x,y, type='l', lwd=2, col='red')
How can I make the curve smooth so that the 3 edges are rounded using estimated values? I know there are many methods for fitting a smooth curve, but I'm not sure which one is most appropriate for this type of curve or how to write it in R.

I like loess() a lot for smoothing. Venables and Ripley's MASS book has an entire section on smoothing that also covers splines and polynomials, but loess() is just about everybody's favourite.

Maybe smooth.spline is an option. You can set its smoothing parameter, spar (typically between 0 and 1).
You can also use predict on smooth.spline objects. The function comes with base R; see ?smooth.spline for details.
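
The two suggestions above can be sketched on the toy data from the question. This is a minimal illustration rather than either answerer's original code: the `spar = 0.5` value and the colours are assumptions chosen for the example.

```r
# Toy data from the question
x <- 1:10
y <- c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20)

# Local regression; default span and degree
lo <- loess(y ~ x)

# Smoothing spline; spar = 0.5 is an arbitrary illustrative choice
ss <- smooth.spline(x, y, spar = 0.5)

plot(x, y)
lines(x, predict(lo), col = "red", lwd = 2)   # loess fitted values at x
lines(predict(ss), col = "blue", lwd = 2)     # predict() returns a list(x, y)
```

Raising `spar` towards 1 gives a stiffer spline, while lowering it lets the curve follow the points more closely.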

To get it REALLY smooth...
This style interpolates lots of extra points and gives you a very smooth curve. It also appears to be the approach that ggplot takes. If the standard level of smoothness is fine you can just use.
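
One way to read "interpolating lots of extra points" is to fit the smoother once and then predict on a dense grid. The sketch below assumes a loess fit and a grid of 1000 points; both are illustrative choices, not details taken from the answer.

```r
# Toy data from the question
x <- 1:10
y <- c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20)

lo <- loess(y ~ x)

# Dense grid spanning the observed x range (1000 points is an assumption)
xl <- seq(min(x), max(x), length.out = 1000)
yl <- predict(lo, newdata = data.frame(x = xl))

plot(x, y)
lines(xl, yl, col = "red", lwd = 2)
```

For a ready-made default, base R's `scatter.smooth(x, y)` draws the points together with a loess curve. Whether that is the call the last sentence had in mind is an assumption.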

The qplot() function in the ggplot2 package is very simple to use and provides an elegant solution that includes confidence bands.
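
A hedged sketch of a smooth with a confidence band follows. Because qplot() is deprecated in recent ggplot2 releases, this uses the equivalent ggplot() + geom_smooth() form. The loess method and the default span are assumptions for the illustration.

```r
library(ggplot2)

# Toy data from the question
d <- data.frame(x = 1:10,
                y = c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20))

p <- ggplot(d, aes(x, y)) +
  geom_point() +
  # se = TRUE draws the confidence band around the smooth
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE)

print(p)
```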
LOESS is a very good approach, as Dirk said.
Another option is using Bezier splines, which may in some cases work better than LOESS if you don't have many data points.
Here you'll find an example: http://rosettacode.org/wiki/Cubic_bezier_curves#R
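
As a rough base-R illustration of the Bezier idea, the sketch below treats all the data points as control points of a single Bezier curve and evaluates it with Bernstein polynomials. This is an assumption made for the example and differs from the cubic construction on the linked page; `bezier_curve` is a hypothetical helper name.

```r
# Toy data from the question, used here as Bezier control points
x <- 1:10
y <- c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20)

# Hypothetical helper: evaluate a degree-(n) Bezier curve at n_out parameters
bezier_curve <- function(px, py, n_out = 200) {
  n <- length(px) - 1
  t <- seq(0, 1, length.out = n_out)
  # basis[i, j] is the Bernstein weight of control point j at parameter t[i]
  basis <- sapply(0:n, function(i) choose(n, i) * t^i * (1 - t)^(n - i))
  list(x = as.vector(basis %*% px), y = as.vector(basis %*% py))
}

bz <- bezier_curve(x, y)

plot(x, y)
lines(bz$x, bz$y, col = "red", lwd = 2)
```

The curve passes through the first and last points only. The interior points pull the curve towards them without being interpolated, which is what rounds off the edges.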

The other answers are all good approaches. However, there are a few other options in R that haven't been mentioned, including lowess and approx, which may give better fits or faster performance.
The advantages are more easily demonstrated with an alternate dataset.
Here is the data overlaid with the sigmoid curve that generated it:
[Figure "Data": https://i.sstatic.net/N70sI.png]
This sort of data is common when looking at a binary behavior among a population. For example, this might be a plot of whether or not a customer purchased something (a binary 1/0 on the y-axis) versus the amount of time they spent on the site (x-axis).
A large number of points is used to better demonstrate the performance differences between these functions.
smooth, spline, and smooth.spline all produce gibberish on a dataset like this with any set of parameters I have tried, perhaps due to their tendency to map to every point, which does not work for noisy data.
The loess, lowess, and approx functions all produce usable results, although just barely for approx. Each was run with lightly optimized parameters.
In the results, lowess produces a near perfect fit to the original generating curve. loess is close, but experiences a strange deviation at both tails.
Although your dataset will be very different, I have found that other datasets perform similarly, with both loess and lowess capable of producing good results. The differences become more significant when you look at benchmarks: loess is extremely slow, taking 100x as long as approx. lowess produces better results than approx while still running fairly quickly (15x faster than loess). loess also becomes increasingly bogged down as the number of points increases, becoming unusable around 50,000.
EDIT: Additional research shows that loess gives better fits for certain datasets. If you are dealing with a small dataset or performance is not a consideration, try both functions and compare the results.

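
A comparison along these lines can be sketched as follows. Everything specific here is an assumption for illustration: the sigmoid's shape, the sample size, the seed, and the `f`, `span`, and `n` settings. It is not a reconstruction of the answer's code, and timings will vary by machine.

```r
set.seed(42)
n <- 5000

# Hypothetical generating curve for the binary outcome
sigmoid <- function(x) 1 / (1 + exp(-0.15 * (x - 100)))

dat <- data.frame(x = rnorm(n, mean = 100, sd = 30))
dat$y <- rbinom(n, size = 1, prob = sigmoid(dat$x))  # 1 with probability sigmoid(x)
dat <- dat[order(dat$x), ]

lowess_fit <- lowess(dat$x, dat$y, f = 0.6, iter = 1)
loess_fit  <- loess(y ~ x, data = dat, span = 0.6)
approx_fit <- approx(dat$x, dat$y, n = 15)  # linear interpolation at 15 x values

plot(dat$x, dat$y, pch = ".", col = "grey")
curve(sigmoid(x), add = TRUE, lwd = 2)                     # generating curve
lines(lowess_fit, col = "red", lwd = 2)
lines(dat$x, predict(loess_fit), col = "blue", lwd = 2)
lines(approx_fit, col = "darkgreen", lwd = 2)

# Rough timings in seconds
t_lowess <- system.time(lowess(dat$x, dat$y, f = 0.6, iter = 1))["elapsed"]
t_loess  <- system.time(loess(y ~ x, data = dat, span = 0.6))["elapsed"]
t_approx <- system.time(approx(dat$x, dat$y, n = 15))["elapsed"]
print(c(lowess = t_lowess, loess = t_loess, approx = t_approx))
```

For more stable timings, repeat each call many times or use a benchmarking package, since a single `system.time()` run on a dataset this small can be dominated by noise.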
In ggplot2 you can do smooths in a number of ways, for example:
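
A few such variants, sketched on the toy data from the question. The particular methods, the span, the polynomial degree, and the spline degrees of freedom are assumptions picked for illustration.

```r
library(ggplot2)

# Toy data from the question
d <- data.frame(x = 1:10,
                y = c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20))

base <- ggplot(d, aes(x, y)) + geom_point()

# Local regression with a wider span (smoother curve)
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x,
                              span = 0.9, se = FALSE)

# Cubic polynomial fitted by least squares
p_poly <- base + geom_smooth(method = "lm", formula = y ~ poly(x, 3),
                             se = FALSE)

# Natural cubic spline fitted by least squares
p_spline <- base + geom_smooth(method = "lm",
                               formula = y ~ splines::ns(x, df = 4),
                               se = FALSE)

print(p_loess)
print(p_poly)
print(p_spline)
```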

I didn't see this method shown, so in case someone else is looking to do this: I found that the ggplot documentation suggests a technique using the gam method that produces results similar to loess when working with small data sets.
The comparison is first the loess method with the auto formula, and second the gam method with the suggested formula.
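
The comparison might look like the sketch below. It assumes the mtcars data (the answer's own data is not shown), that the "auto" formula corresponds to `y ~ x`, and that the suggested formula is the `y ~ s(x, bs = "cs")` form given in the geom_smooth documentation. The gam method relies on the mgcv package being installed.

```r
library(ggplot2)

# First: loess with the formula ggplot2 picks automatically (written out here)
p_loess <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Second: gam with a shrinkage cubic regression spline (fitted via mgcv)
p_gam <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

print(p_loess)
print(p_gam)
```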

Another option is using the ggscatter function from the ggpubr package. By specifying add="loess", you will get a smoothed line through your data. In the link above you can find more possibilities with this function. The reproducible example uses the mtcars dataset.
Created on 2022-08-28 with reprex v2.0.2