Advice on calculating a function to describe the upper bound of data

Posted on 2024-10-10 10:14:55

I have a scatter plot of a dataset and I am interested in calculating the upper bound of the data. I don't know if this is a standard statistical approach so what I was considering doing was splitting the X-axis data into small ranges, calculating the max for these ranges and then trying to identify a function to describe these points. Is there a function already in R to do this?

If it's relevant there are 92611 points.

[Image: scatter plot of the dataset]
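The binning idea described in the question can be sketched in base R as below. This is only an illustration under stated assumptions: the data are simulated (not the dataset from the question), the number of bins (`n_bins`) is arbitrary, and the log-linear form used for the envelope is a guess that would need to be checked against the real scatter plot.

```r
# Hypothetical data: a decaying log-normal cloud, NOT the asker's data
set.seed(42)
n <- 10000
x <- runif(n, 1, 5000)
y <- rlnorm(n, meanlog = -0.9) * exp(-x / 2000)

# Split the X axis into equal-width bins (the bin count is an arbitrary choice)
n_bins <- 50
bins <- cut(x, breaks = n_bins)

# Upper envelope: the maximum of y in each bin, placed at the bin's mean x
env <- data.frame(
  x = as.numeric(tapply(x, bins, mean)),
  y = as.numeric(tapply(y, bins, max))
)

# Describe the envelope points with a simple curve;
# the log-linear form is an assumption made for this sketch
fit <- lm(log(y) ~ x, data = env)

plot(x, y, cex = 0.3)
lines(env$x, exp(fitted(fit)), col = "red", lwd = 2)
```

Per-bin maxima are sensitive to single outliers and to the bin width, so the fitted curve can change noticeably when `n_bins` changes.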

Comments (2)

勿忘初心 2024-10-17 10:14:55

You might like to look into quantile regression, which is available in the quantreg package. Whether this is useful will depend on whether you want the absolute maximum within your "windows" or whether some extreme quantile, say the 95th or 99th percentile, is acceptable. If you are not familiar with quantile regression, consider ordinary linear regression, which fits a model for the expectation (mean) of the response, conditional upon the model covariates. Quantile regression for the middle quantile (0.5) would instead fit a model to the median response, conditional upon the model covariates.

Here is an example using the quantreg package, to show you what I mean. First, generate some dummy data similar to the data you show:

set.seed(1)
N <- 5000
DF <- data.frame(Y = rev(sort(rlnorm(N, -0.9))) + rnorm(N),
                 X = seq_len(N))
plot(Y ~ X, data = DF)

Next, fit the model to the 99th percentile (or the 0.99 quantile):

library(quantreg)  # provides rq()
mod <- rq(Y ~ log(X), data = DF, tau = .99)

To generate the "fitted line", we predict from the model at 100 equally spaced values of X:

pDF <- data.frame(X = seq(1, 5000, length.out = 100))
pDF <- within(pDF, Y <- predict(mod, newdata = pDF))

and add the fitted model to the plot:

lines(Y ~ X, data = pDF, col = "red", lwd = 2)

This should give you this:

[Image: quantile regression output]
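To help decide whether the 95th or the 99th percentile is the more useful "upper bound", one option is to check what share of the points falls on or below each fitted curve. The sketch below regenerates the dummy data from this answer so that it runs on its own; the `taus` vector and the `coverage` name are introduced here purely for illustration.

```r
library(quantreg)

# Same dummy data as in the example above
set.seed(1)
N <- 5000
DF <- data.frame(Y = rev(sort(rlnorm(N, -0.9))) + rnorm(N),
                 X = seq_len(N))

# Fit one model per quantile and record the share of points
# lying on or below the fitted curve
taus <- c(0.95, 0.99)
coverage <- sapply(taus, function(tau) {
  fit <- rq(Y ~ log(X), data = DF, tau = tau)
  mean(DF$Y <= predict(fit))
})
names(coverage) <- taus
print(coverage)
```

Each share should come out close to its `tau`. By construction, roughly a fraction `1 - tau` of the points stays above the line, so neither curve is a strict maximum.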

两仪 2024-10-17 10:14:55

I would second Gavin's nomination for using quantile regression. Your data might be simulated with your X and Y each log-normally distributed. You can see what a plot of the joint distribution of two independent (no imposed correlation, but not necessarily cor(x,y)==0) log-normal variates looks like if you run:

x <- rlnorm(1000, log(300), sdlog=1)
y <- rlnorm(1000, log(7), sdlog=1)
plot(x,y, cex=0.3)

[Image: scatter plot of two independent log-normal variates]

You might consider looking at their individual distributions with qqplot (in the stats package, which ships with base R), remembering that the tails of such distributions can behave in surprising ways. You should be more interested in how well the bulk of the values fit a particular distribution than in the extremes ... unless, of course, your applications are in finance or insurance. We don't want another global financial crisis because of poor modeling assumptions about tail behavior, now do we?

qqplot(x, rlnorm(10000, log(300), sdlog=1) )
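A related check, offered only as a sketch: if a variable is log-normal, its logarithm is normal, so a normal Q-Q plot of `log(x)` with a reference line makes departures in the tails easier to spot. The data here are simulated with the same parameters as above, and the seed is arbitrary.

```r
# Simulated log-normal sample (same parameters as the example above)
set.seed(1)
x <- rlnorm(1000, log(300), sdlog = 1)

# Normal Q-Q plot of log(x); the points should hug the reference line
# if x is close to log-normal
qq <- qqnorm(log(x), main = "Normal Q-Q plot of log(x)")
qqline(log(x), col = "red")
```

Points that bend away from the line at either end indicate tails heavier or lighter than the log-normal assumption implies.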