如何在 R 中为我的数据拟合平滑曲线?

发布于 2024-09-13 22:51:36 字数 337 浏览 6 评论 0原文

我正在尝试在 R 中绘制一条平滑的曲线。我有以下简单的玩具数据:

> x
 [1]  1  2  3  4  5  6  7  8  9 10
> y
 [1]  2  4  6  8  7 12 14 16 18 20

现在,当我用标准命令绘制它时,它看起来凹凸不平,当然:

> plot(x,y, type='l', lwd=2, col='red')

如何使曲线平滑,以便使用估计值将 3 个边缘圆化?我知道有很多方法可以拟合平滑曲线,但我不确定哪种方法最适合这种类型的曲线以及如何在 R 中编写它。

I'm trying to draw a smooth curve in R. I have the following simple toy data:

> x
 [1]  1  2  3  4  5  6  7  8  9 10
> y
 [1]  2  4  6  8  7 12 14 16 18 20

Now when I plot it with a standard command it looks bumpy and edgy, of course:

> plot(x,y, type='l', lwd=2, col='red')

How can I make the curve smooth so that the 3 edges are rounded using estimated values? I know there are many methods to fit a smooth curve but I'm not sure which one would be most appropriate for this type of curve and how you would write it in R.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

半衾梦 2024-09-20 22:51:36

我非常喜欢使用 loess() 进行平滑处理:

x <- 1:10
y <- c(2,4,6,8,7,12,14,16,18,20)
lo <- loess(y~x)
plot(x,y)
lines(predict(lo), col='red', lwd=2)

Venables 和 Ripley 的 MASS 书中有一整节关于平滑处理的内容,其中还涵盖了样条线和多项式 - 但 loess() 仅仅只是关于平滑处理每个人的最爱。

I like loess() a lot for smoothing:

x <- 1:10
y <- c(2,4,6,8,7,12,14,16,18,20)
lo <- loess(y~x)
plot(x,y)
lines(predict(lo), col='red', lwd=2)

Venables and Ripley's MASS book has an entire section on smoothing that also covers splines and polynomials -- but loess() is just about everybody's favourite.

童话 2024-09-20 22:51:36

也许 smooth.spline 是一个选项,您可以在此处设置平滑参数(通常在 0 和 1 之间),

smoothingSpline = smooth.spline(x, y, spar=0.35)
plot(x,y)
lines(smoothingSpline)

您也可以在 smooth.spline 对象上使用预测。该函数带有基R,参见
?smooth.spline 了解详细信息。

Maybe smooth.spline is an option, You can set a smoothing parameter (typically between 0 and 1) here

smoothingSpline = smooth.spline(x, y, spar=0.35)
plot(x,y)
lines(smoothingSpline)

you can also use predict on smooth.spline objects. The function comes with base R, see
?smooth.spline for details.

锦上情书 2024-09-20 22:51:36

为了使其真正平滑...

x <- 1:10
y <- c(2,4,6,8,7,8,14,16,18,20)
lo <- loess(y~x)
plot(x,y)
xl <- seq(min(x),max(x), (max(x) - min(x))/1000)
lines(xl, predict(lo,xl), col='red', lwd=2)

这种样式插入了许多额外的点,并为您提供了非常平滑的曲线。这似乎也是 ggplot 所采用的方法。如果标准平滑度很好,则可以直接使用。

scatter.smooth(x, y)

In order to get it REALLY smoooth...

x <- 1:10
y <- c(2,4,6,8,7,8,14,16,18,20)
lo <- loess(y~x)
plot(x,y)
xl <- seq(min(x),max(x), (max(x) - min(x))/1000)
lines(xl, predict(lo,xl), col='red', lwd=2)

This style interpolates lots of extra points and gets you a curve that is very smooth. It also appears to be the the approach that ggplot takes. If the standard level of smoothness is fine you can just use.

scatter.smooth(x, y)
三岁铭 2024-09-20 22:51:36

ggplot2 包中的 qplot() 函数使用起来非常简单,并提供了一个包含置信带的优雅解决方案。例如,

qplot(x,y, geom='smooth', span =0.5)

产生
在此处输入图像描述

the qplot() function in the ggplot2 package is very simple to use and provides an elegant solution that includes confidence bands. For instance,

qplot(x,y, geom='smooth', span =0.5)

produces
enter image description here

好菇凉咱不稀罕他 2024-09-20 22:51:36

正如德克所说,LOESS 是一个非常好的方法。

另一种选择是使用贝塞尔样条线,如果您没有很多数据点,在某些情况下它可能比 LOESS 效果更好。

在这里您可以找到一个示例:http://rosettacode.org/wiki/Cubic_bezier_curves#R

# x, y: the x and y coordinates of the hull points
# n: the number of points in the curve.
bezierCurve <- function(x, y, n=10)
    {
    outx <- NULL
    outy <- NULL

    i <- 1
    for (t in seq(0, 1, length.out=n))
        {
        b <- bez(x, y, t)
        outx[i] <- b$x
        outy[i] <- b$y

        i <- i+1
        }

    return (list(x=outx, y=outy))
    }

bez <- function(x, y, t)
    {
    outx <- 0
    outy <- 0
    n <- length(x)-1
    for (i in 0:n)
        {
        outx <- outx + choose(n, i)*((1-t)^(n-i))*t^i*x[i+1]
        outy <- outy + choose(n, i)*((1-t)^(n-i))*t^i*y[i+1]
        }

    return (list(x=outx, y=outy))
    }

# Example usage
x <- c(4,6,4,5,6,7)
y <- 1:6
plot(x, y, "o", pch=20)
points(bezierCurve(x,y,20), type="l", col="red")

LOESS is a very good approach, as Dirk said.

Another option is using Bezier splines, which may in some cases work better than LOESS if you don't have many data points.

Here you'll find an example: http://rosettacode.org/wiki/Cubic_bezier_curves#R

# x, y: the x and y coordinates of the hull points
# n: the number of points in the curve.
bezierCurve <- function(x, y, n=10)
    {
    outx <- NULL
    outy <- NULL

    i <- 1
    for (t in seq(0, 1, length.out=n))
        {
        b <- bez(x, y, t)
        outx[i] <- b$x
        outy[i] <- b$y

        i <- i+1
        }

    return (list(x=outx, y=outy))
    }

bez <- function(x, y, t)
    {
    outx <- 0
    outy <- 0
    n <- length(x)-1
    for (i in 0:n)
        {
        outx <- outx + choose(n, i)*((1-t)^(n-i))*t^i*x[i+1]
        outy <- outy + choose(n, i)*((1-t)^(n-i))*t^i*y[i+1]
        }

    return (list(x=outx, y=outy))
    }

# Example usage
x <- c(4,6,4,5,6,7)
y <- 1:6
plot(x, y, "o", pch=20)
points(bezierCurve(x,y,20), type="l", col="red")
ζ澈沫 2024-09-20 22:51:36

其他答案都是很好的方法。然而,R 中还有一些其他选项尚未提及,包括 lowessapprox,它们可能会提供更好的拟合或更快的性能。

使用备用数据集可以更轻松地展示其优点:

sigmoid <- function(x)
{
  y<-1/(1+exp(-.15*(x-100)))
  return(y)
}

dat<-data.frame(x=rnorm(5000)*30+100)
dat$y<-as.numeric(as.logical(round(sigmoid(dat$x)+rnorm(5000)*.3,0)))

以下是与生成它的 sigmoid 曲线重叠的数据:

< img src="https://i.sstatic.net/N70sI.png" alt="Data">

在观察群体中的二元行为时,此类数据很常见。例如,这可能是客户是否购买了某些商品(y 轴上的二进制 1/0)与他们在网站上花费的时间(x 轴)的关系图。

使用大量的点来更好地展示这些函数的性能差异。

Smoothsplinesmooth.spline 在使用我尝试过的任何参数集的数据集上都会产生乱码,这可能是由于它们的原因倾向于映射到每个点,这不适用于噪声数据。

loesslowessapprox 函数都产生可用的结果,尽管只是勉强达到 approx。这是每个使用轻度优化参数的代码:

loessFit <- loess(y~x, dat, span = 0.6)
loessFit <- data.frame(x=loessFit$x,y=loessFit$fitted)
loessFit <- loessFit[order(loessFit$x),]

approxFit <- approx(dat,n = 15)

lowessFit <-data.frame(lowess(dat,f = .6,iter=1))

结果:

plot(dat,col='gray')
curve(sigmoid,0,200,add=TRUE,col='blue',)
lines(lowessFit,col='red')
lines(loessFit,col='green')
lines(approxFit,col='purple')
legend(150,.6,
       legend=c("Sigmoid","Loess","Lowess",'Approx'),
       lty=c(1,1),
       lwd=c(2.5,2.5),col=c("blue","green","red","purple"))

Fits

如您所见,lowess 与原始生成曲线产生近乎完美的拟合。 Loess 很接近,但在两个尾部都经历了奇怪的偏差。

尽管您的数据集会有很大不同,但我发现其他数据集的表现类似,loesslowess 都能够产生良好的结果。当您查看基准测试时,差异会变得更加明显:

> microbenchmark::microbenchmark(loess(y~x, dat, span = 0.6),approx(dat,n = 20),lowess(dat,f = .6,iter=1),times=20)
Unit: milliseconds
                           expr        min         lq       mean     median        uq        max neval cld
  loess(y ~ x, dat, span = 0.6) 153.034810 154.450750 156.794257 156.004357 159.23183 163.117746    20   c
            approx(dat, n = 20)   1.297685   1.346773   1.689133   1.441823   1.86018   4.281735    20 a  
 lowess(dat, f = 0.6, iter = 1)   9.637583  10.085613  11.270911  11.350722  12.33046  12.495343    20  b 

Loess 速度极慢,花费的时间是 大约 的 100 倍。 Lowess 产生的结果比 approx 更好,同时运行速度仍然相当快(比 loess 快 15 倍)。

随着点数的增加,Loess 也变得越来越陷入困境,在 50,000 左右变得无法使用。

编辑:其他研究表明,loess 更适合某些数据集。如果您正在处理小型数据集或不考虑性能,请尝试这两个函数并比较结果。

The other answers are all good approaches. However, there are a few other options in R that haven't been mentioned, including lowess and approx, which may give better fits or faster performance.

The advantages are more easily demonstrated with an alternate dataset:

sigmoid <- function(x)
{
  y<-1/(1+exp(-.15*(x-100)))
  return(y)
}

dat<-data.frame(x=rnorm(5000)*30+100)
dat$y<-as.numeric(as.logical(round(sigmoid(dat$x)+rnorm(5000)*.3,0)))

Here is the data overlaid with the sigmoid curve that generated it:

Data

This sort of data is common when looking at a binary behavior among a population. For example, this might be a plot of whether or not a customer purchased something (a binary 1/0 on the y-axis) versus the amount of time they spent on the site (x-axis).

A large number of points are used to better demonstrate the performance differences of these functions.

Smooth, spline, and smooth.spline all produce gibberish on a dataset like this with any set of parameters I have tried, perhaps due to their tendency to map to every point, which does not work for noisy data.

The loess, lowess, and approx functions all produce usable results, although just barely for approx. This is the code for each using lightly optimized parameters:

loessFit <- loess(y~x, dat, span = 0.6)
loessFit <- data.frame(x=loessFit$x,y=loessFit$fitted)
loessFit <- loessFit[order(loessFit$x),]

approxFit <- approx(dat,n = 15)

lowessFit <-data.frame(lowess(dat,f = .6,iter=1))

And the results:

plot(dat,col='gray')
curve(sigmoid,0,200,add=TRUE,col='blue',)
lines(lowessFit,col='red')
lines(loessFit,col='green')
lines(approxFit,col='purple')
legend(150,.6,
       legend=c("Sigmoid","Loess","Lowess",'Approx'),
       lty=c(1,1),
       lwd=c(2.5,2.5),col=c("blue","green","red","purple"))

Fits

As you can see, lowess produces a near perfect fit to the original generating curve. Loess is close, but experiences a strange deviation at both tails.

Although your dataset will be very different, I have found that other datasets perform similarly, with both loess and lowess capable of producing good results. The differences become more significant when you look at benchmarks:

> microbenchmark::microbenchmark(loess(y~x, dat, span = 0.6),approx(dat,n = 20),lowess(dat,f = .6,iter=1),times=20)
Unit: milliseconds
                           expr        min         lq       mean     median        uq        max neval cld
  loess(y ~ x, dat, span = 0.6) 153.034810 154.450750 156.794257 156.004357 159.23183 163.117746    20   c
            approx(dat, n = 20)   1.297685   1.346773   1.689133   1.441823   1.86018   4.281735    20 a  
 lowess(dat, f = 0.6, iter = 1)   9.637583  10.085613  11.270911  11.350722  12.33046  12.495343    20  b 

Loess is extremely slow, taking 100x as long as approx. Lowess produces better results than approx, while still running fairly quickly (15x faster than loess).

Loess also becomes increasingly bogged down as the number of points increases, becoming unusable around 50,000.

EDIT: Additional research shows that loess gives better fits for certain datasets. If you are dealing with a small dataset or performance is not a consideration, try both functions and compare the results.

你的他你的她 2024-09-20 22:51:36

在 ggplot2 中,您可以通过多种方式进行平滑,例如:

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() +
  geom_smooth(method = "gam", formula = y ~ poly(x, 2)) 
ggplot(mtcars, aes(wt, mpg)) + geom_point() +
  geom_smooth(method = "loess", span = 0.3, se = FALSE) 

在此处输入图像描述
输入图片描述这里

In ggplot2 you can do smooths in a number of ways, for example:

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) + geom_point() +
  geom_smooth(method = "gam", formula = y ~ poly(x, 2)) 
ggplot(mtcars, aes(wt, mpg)) + geom_point() +
  geom_smooth(method = "loess", span = 0.3, se = FALSE) 

enter image description here
enter image description here

违心° 2024-09-20 22:51:36

我没有看到这个方法,所以如果其他人想要这样做,我发现 ggplot 文档建议了一种使用 gam 方法的技术,该技术产生与 loess 当处理小数据集时。

library(ggplot2)
x <- 1:10
y <- c(2,4,6,8,7,8,14,16,18,20)

df <- data.frame(x,y)
r <- ggplot(df, aes(x = x, y = y)) + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))+geom_point()
r

首先用loess方法和auto公式
第二个使用建议公式的 gam 方法

I didn't see this method shown, so if someone else is looking to do this I found that ggplot documentation suggested a technique for using the gam method that produced similar results to loess when working with small data sets.

library(ggplot2)
x <- 1:10
y <- c(2,4,6,8,7,8,14,16,18,20)

df <- data.frame(x,y)
r <- ggplot(df, aes(x = x, y = y)) + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))+geom_point()
r

First with the loess method and auto formula
Second with the gam method with suggested formula

不可一世的女人 2024-09-20 22:51:36

另一种选择是使用 ggpubr 中的 ggscatter 函数> 包。通过指定 add="loess",您将获得数据的平滑线。在上面的链接中,您可以找到此功能的更多可能性。以下是使用 mtcars 数据集的可重现示例:

library(ggpubr)
ggscatter(data = mtcars,
          x = "wt",
          y = "mpg",
          add = "loess",
          conf.int = TRUE)
#> `geom_smooth()` using formula 'y ~ x'

创建于 2022 年 08 月 - 28 与 reprex v2.0.2

Another option is using the ggscatter function from the ggpubr package. By specifying add="loess", you will get a smoothed line through your data. In the link above you can find more possibilities with this function. Here is a reproducible example using the mtcars dataset:

library(ggpubr)
ggscatter(data = mtcars,
          x = "wt",
          y = "mpg",
          add = "loess",
          conf.int = TRUE)
#> `geom_smooth()` using formula 'y ~ x'

Created on 2022-08-28 with reprex v2.0.2

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文