Suggestions for a function that describes the upper bound of data

Posted on 2024-10-10 10:14:55

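I have a scatter plot of a dataset and I want to estimate the upper bound of the data. I don't know whether there is a standard statistical method for this. My idea is to split the X-axis into small ranges, compute the maximum within each range, and then find a function that describes those maxima. Is there a function in R that does this?

There are 92,611 points, if that matters.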

[Figure: scatter plot of the data]
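The binning idea described in the question can be sketched in base R. This is a minimal illustration, not a standard method. The simulated data, the choice of 50 equal-width bins, and the `Y ~ log(X)` curve fitted through the bin maxima are all assumptions made for the example and would need adapting to real data.

```r
# Hypothetical example data with a decreasing upper envelope
set.seed(1)
N <- 5000
DF <- data.frame(Y = sort(rlnorm(N, -0.9), decreasing = TRUE) + rnorm(N),
                 X = seq_len(N))

# 1. Split X into equal-width bins (50 is an arbitrary choice)
n_bins <- 50
DF$bin <- cut(DF$X, breaks = n_bins)

# 2. In each bin, find the row index holding the largest Y
idx <- tapply(seq_len(nrow(DF)), DF$bin, function(i) i[which.max(DF$Y[i])])
bin_max <- DF[as.integer(idx), c("X", "Y")]

# 3. Fit a curve through the bin maxima (ordinary least squares on log(X))
upper_fit <- lm(Y ~ log(X), data = bin_max)

plot(Y ~ X, data = DF, col = "grey", cex = 0.3)
points(Y ~ X, data = bin_max, col = "blue", pch = 19)
lines(bin_max$X, fitted(upper_fit), col = "blue", lwd = 2)
```

Because every point used in the fit is the maximum of a bin, the curve tends to move upward as the bins get wider (more points per bin), so the result depends on an arbitrary bin width.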

勿忘初心 2024-10-17 10:14:55

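You might like to look into quantile regression, which is available in the quantreg package. Whether this is useful depends on whether you need the absolute maximum within your "windows", or whether some extreme quantile, say the 95th or 99th, is acceptable. If you are not familiar with quantile regression, think of linear regression, which fits a model for the expectation (mean) of the response conditional on the model covariates. Quantile regression for the middle quantile (0.5) fits a model for the median response, conditional on the model covariates.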
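To make the link between quantile regression and quantiles concrete: linear regression minimises squared error, while quantile regression minimises the "check" (pinball) loss rho_tau(u) = u * (tau - I(u < 0)). The base-R sketch below uses simulated normal data, an arbitrary seed, and an intercept-only model, all chosen for illustration. It shows that with no covariates the minimiser of this loss is simply the sample tau-quantile; `rq()` minimises the same loss with covariates in the model.

```r
# Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(42)        # arbitrary seed, for reproducibility only
y <- rnorm(1000)    # simulated data, no covariates
tau <- 0.99

# Total check loss for a constant "fit" q
obj <- function(q) sum(check_loss(y - q, tau))

# The loss is convex in q, so a 1-D search over the data range finds the minimum
q_hat <- optimize(obj, interval = range(y))$minimum

c(check_loss_minimiser = q_hat, sample_quantile = unname(quantile(y, tau)))
```

The two printed values should agree up to the gap between neighbouring order statistics (the loss is flat between them) and the tolerance of `optimize()`.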

Here is an example using the quantreg package to show what I mean. First, generate some dummy data similar to the data you show:

set.seed(1)
N <- 5000
DF <- data.frame(Y = rev(sort(rlnorm(N, -0.9))) + rnorm(N),
                 X = seq_len(N))
plot(Y ~ X, data = DF)

Next, fit the model to the 99th percentile (the 0.99 quantile):

library(quantreg)  # provides rq()
mod <- rq(Y ~ log(X), data = DF, tau = .99)

To generate the "fitted line", we predict from the model at 100 equally spaced values of X:

pDF <- data.frame(X = seq(1, 5000, length.out = 100))
pDF <- within(pDF, Y <- predict(mod, newdata = pDF))

and add the fitted model to the plot:

lines(Y ~ X, data = pDF, col = "red", lwd = 2)

This should give you:

[Figure: quantile regression output]

两仪 2024-10-17 10:14:55

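I would second Gavin's nomination of quantile regression. Your data might be simulated with X and Y each log-normally distributed. You can see what the joint distribution of two independent log-normal variates looks like (no correlation is imposed, though the sample cor(x, y) will not necessarily be exactly 0) if you run: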

x <- rlnorm(1000, log(300), sdlog = 1)
y <- rlnorm(1000, log(7), sdlog = 1)
plot(x, y, cex = 0.3)

[Figure: scatter plot of the simulated x and y]
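Because x and y are generated independently, no correlation is imposed, but the sample correlation of a finite draw is essentially never exactly 0. A quick check, using the same distributions as above plus an arbitrary seed added only for reproducibility, computes the sample correlation on the original scale and on the log scale, where both margins are normal:

```r
set.seed(123)  # arbitrary seed, for reproducibility only
x <- rlnorm(1000, log(300), sdlog = 1)
y <- rlnorm(1000, log(7), sdlog = 1)

r_raw <- cor(x, y)            # Pearson correlation on the original scale
r_log <- cor(log(x), log(y))  # on the log scale both margins are normal
c(raw = r_raw, log = r_log)

# On the log scale the two independent variates form a round normal cloud
plot(log(x), log(y), cex = 0.3)
```

Both values should be small but nonzero; the skewed original scale makes the raw-scale correlation the noisier of the two.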

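You might consider looking at their individual distributions with qqplot (part of base R's stats package), remembering that the tails of such distributions can behave in surprising ways. You should be more interested in how well the bulk of the values fits a particular distribution than in the extremes ... unless, of course, your applications are in finance or insurance. We don't want another global financial crisis because of poor modeling assumptions about tail behavior, now do we?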

qqplot(x, rlnorm(10000, log(300), sdlog = 1))
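A related check, offered as an alternative rather than as the answer's own method: since log(x) is normal whenever x is log-normal, a normal Q-Q plot of log(x) tests the same assumption without the simulation noise of the second sample passed to `qqplot()`. The seed and the correlation summary of the Q-Q points are assumptions added for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
x <- rlnorm(1000, log(300), sdlog = 1)

# Theoretical normal quantiles vs sorted log(x), without drawing yet
qq <- qqnorm(log(x), plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)  # close to 1 when the points lie on a straight line
qq_cor

qqnorm(log(x), cex = 0.3)
qqline(log(x), col = "red")  # reference line through the first and third quartiles
```

As the answer notes, look mainly at how the bulk of the points follows the line; a few stragglers at either end are expected even for genuinely log-normal data.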

About the author: 泛滥成性
