Density value for each return

Posted on 2024-10-08 03:08:16

I have a data frame "foo" that looks like this:

Date       Return
1998-01-01  0.02
1998-01-02  0.04
1998-01-03 -0.02
1998-01-04 -0.01
1998-01-05  0.02
...
1998-02-01  0.1
1998-02-02 -0.2
1998-02-03 -0.1
etc.

I would like to add a new column to this data frame showing the density value of the corresponding return. I tried:

foo$density <- for(i in 1:length(foo$Return)) density(foo$Return, 
from = foo$Return[i], to = foo$Return[i], n = 1)$y

But it didn't work. I have real difficulty applying a "function" to each row. Or perhaps there is another way to do this that doesn't use density()?

Essentially, I want to extract the fitted density values from density() at the returns in foo. If I just do plot(density(foo$Return)), I get the curve, but I would like to have the density values attached to the returns.

@Joris:

foo$density <- density(foo$Return, n=nrow(foo$Return))$y 

calculates something, but seems to return the wrong density values.
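One concrete problem with the line above: nrow() is meant for matrices and data frames and returns NULL for a plain vector such as foo$Return, so it does not pass the number of returns to density(); length() does. A quick check, using a short made-up vector as a stand-in for foo$Return:

```r
# Made-up stand-in for foo$Return
ret <- c(0.02, 0.04, -0.02, -0.01, 0.02)

nrow(ret)    # NULL: a plain vector has no dim attribute, hence no rows
length(ret)  # 5: the number of returns
```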

Thank you for helping me out!
Dani

Comments (4)

余厌 2024-10-15 03:08:16

On second thought, forget about the density function: I suddenly realized what you wanted to do. Most density functions return values on a grid, so they don't give you the evaluation at the exact points. If you want that, you can use, e.g., the sm package:

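library(sm)
foo <- data.frame(Return = rpois(100, 5))
# estimate the density at exactly the observed returns
foo$density <- sm.density(foo$Return, eval.points = foo$Return)$estimate
# the plot: order by Return so lines() draws the curve from left to right
id <- order(foo$Return)
hist(foo$Return, freq = FALSE)
lines(foo$Return[id], foo$density[id], col = "red")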
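If installing sm is not an option, a base-R sketch can evaluate a Gaussian kernel density estimate directly at each return. It assumes the same Gaussian kernel and default bandwidth rule (bw.nrd0()) that density() uses, so its values should sit on density()'s curve, but they will generally differ from sm.density(), which chooses its own bandwidth. The helper name kde_at is hypothetical.

```r
# Hypothetical helper (not part of sm or stats): Gaussian KDE at points 'at'
kde_at <- function(at, x, bw = bw.nrd0(x)) {
  # length(at) x length(x) matrix of scaled differences (at_j - x_i) / bw
  z <- outer(at, x, "-") / bw
  # average the Gaussian kernel over the data, then rescale by 1 / bw
  rowMeans(exp(-0.5 * z^2)) / (sqrt(2 * pi) * bw)
}

set.seed(1)
foo <- data.frame(Return = rpois(100, 5))
foo$density <- kde_at(foo$Return, foo$Return)
```

The outer() call builds the whole matrix at once, which is fine for a few thousand returns; for much longer series, memory grows with the square of the number of observations.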

If the number of distinct values is small, you can use ave() to count how often each value occurs:

foo$counts <- ave(foo$Return, foo$Return, FUN = length)
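For illustration, on a small made-up vector: ave() replaces every element by the number of times its value occurs, and dividing by the number of observations turns those counts into relative frequencies. These are proportions of a discrete distribution, not kernel density values.

```r
ret <- c(0.02, 0.04, -0.02, 0.02, -0.02, 0.02)

# Count of each distinct value, repeated at every position where it occurs
counts <- ave(ret, ret, FUN = length)
counts                 # 3 1 2 3 2 3

# Relative frequency of each return's value
counts / length(ret)
```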

If the purpose is to plot a density function, there's no need to calculate it like you did. Just use

plot(density(foo$Return))

Or, to draw the density over a histogram (note the option freq = FALSE, which puts the histogram on the density scale rather than showing counts):

hist(foo$Return, freq = FALSE)
lines(density(foo$Return), col = "red")
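Why freq = FALSE matters: with it, hist() plots densities, so the bar areas sum to 1, the same total area as under the kernel density curve; with the default counts the curve would look flat against the bars. A quick numerical check, using simulated returns as an assumed stand-in for foo$Return:

```r
set.seed(1)
ret <- rnorm(200, sd = 0.02)   # simulated stand-in for foo$Return

h <- hist(ret, plot = FALSE)
# bar heights times bar widths: total area of the histogram
sum(h$density * diff(h$breaks))   # 1

d <- density(ret)
# Riemann-sum approximation of the area under the density curve (close to 1)
sum(d$y) * diff(d$x[1:2])
```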

残疾 2024-10-15 03:08:16

An alternative to sm.density() is to evaluate the density on a finer grid than the default and use approx() or approxfun() to interpolate the density at the Returns you want. Here is an example with dummy data:

set.seed(1)
foo <- data.frame(Date = seq(as.Date("2010-01-01"), as.Date("2010-12-31"),
                             by = "days"),
                  Returns = rnorm(365))
head(foo)
## compute the density on a fine grid (512*8 points)
dens <- with(foo, density(Returns, n = 512 * 8))

At this point, we could use approx() to interpolate the x and y components of the returned density, but I prefer approxfun(), which does the same thing but returns a function that we can then use to do the interpolation. First, generate the interpolation function:

## x and y are components of dens, see str(dens)
BAR <- with(dens, approxfun(x = x, y = y))
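One detail worth knowing about approxfun(): by default (rule = 1) the function it returns gives NA outside the range of the x values it was built from. Because density() extends its grid beyond the data (by cut = 3 bandwidths on each side by default), every return lies inside dens$x and gets a value. A toy illustration with made-up points:

```r
f <- approxfun(x = c(0, 1), y = c(0, 10))
f(0.5)    # 5: linear interpolation
f(2)      # NA: outside [0, 1] with the default rule = 1

g <- approxfun(x = c(0, 1), y = c(0, 10), rule = 2)
g(2)      # 10: rule = 2 returns the nearest end value instead
```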

Now you can use BAR() to return the interpolated density at any point you wish, e.g. for the first value of Returns:

> with(foo, BAR(Returns[1]))
[1] 0.3268715

To finish the example, add the density for each datum in Returns:

> foo <- within(foo, Density <- BAR(Returns))
> head(foo)
        Date    Returns   Density
1 2010-01-01 -0.6264538 0.3268715
2 2010-01-02  0.1836433 0.3707068
3 2010-01-03 -0.8356286 0.2437966
4 2010-01-04  1.5952808 0.1228251
5 2010-01-05  0.3295078 0.3585224
6 2010-01-06 -0.8204684 0.2490127
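As noted above, approx() produces the same interpolated values in a single call; its xout argument takes the points to evaluate at. A self-contained check on the same dummy data:

```r
set.seed(1)
Returns <- rnorm(365)
dens <- density(Returns, n = 512 * 8)

# approxfun(): build an interpolating function, then call it
BAR <- approxfun(x = dens$x, y = dens$y)
via_fun <- BAR(Returns)

# approx(): interpolate in one call at the points given by xout
via_approx <- approx(x = dens$x, y = dens$y, xout = Returns)$y

all.equal(via_fun, via_approx)   # TRUE
```

approxfun() is the more convenient choice when you will evaluate at new points repeatedly; approx() is enough for a one-off.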

To see how well the interpolation is doing, we can plot the density and the interpolated version and compare them. Note that we have to sort Returns, because lines() needs to see the data in increasing order to draw the curve we want:

plot(dens)
with(foo, lines(sort(Returns), BAR(sort(Returns)), col = "red"))

This gives something like the following plot:
[Figure: density (in black) and interpolated version (in red)]

As long as the density is evaluated on a sufficiently fine set of points (512*8 in the above example), you shouldn't have any problems, and you will be hard pushed to tell the difference between the interpolated version and the real thing. If there are "gaps" in the values of your Returns, you might find that the straight line segments do not follow the black density at the locations of the gaps, because lines() just joins the points you ask it to plot. This is an artefact of the gaps and of how lines() works, not a problem with the interpolation.
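To put a rough number on "sufficiently fine", the sketch below compares the interpolated values with a Gaussian kernel estimate computed exactly at each return, assuming the same kernel and bw.nrd0() bandwidth that density() uses by default. It does this once on the default 512-point grid and once on the 512*8-point grid used above. The exact figures depend on the data and on the R version; the helper name max_err is hypothetical.

```r
set.seed(1)
Returns <- rnorm(365)
h <- bw.nrd0(Returns)   # density()'s default bandwidth

# exact Gaussian KDE at each return: mean of kernel values, scaled by 1/h
z <- outer(Returns, Returns, "-") / h
exact <- rowMeans(exp(-0.5 * z^2)) / (sqrt(2 * pi) * h)

# hypothetical helper: largest absolute interpolation error for an n-point grid
max_err <- function(n) {
  dens <- density(Returns, n = n)
  max(abs(approxfun(dens$x, dens$y)(Returns) - exact))
}

max_err(512)        # default grid
max_err(512 * 8)    # finer grid used above
```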

月野兔 2024-10-15 03:08:16

If we ignore the density issue, which @Joris expertly answers, you don't seem to have grasped how to set up a loop. The value of a for loop is NULL (returned invisibly), and that is what gets assigned to foo$density. This won't work: assigning NULL to a data frame column means the column is empty, i.e. it doesn't exist as far as R is concerned. See ?'for' for further details.

> bar <- for(i in 1:10) {
+     i + 1
+ }
> bar
NULL

> foo <- data.frame(A = 1:10, B = LETTERS[1:10])
> foo$density <- for(i in seq_len(nrow(foo))) {
+     i + 1
+ }
> head(foo) ## No `density`
  A B
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
6 6 F

If you want to keep the value from each iteration of the loop, you must do the assignment inside the loop, which means you should pre-allocate the storage before you enter the loop. For example, to store i + 1 for each i in 1, ..., 10 using the loop above, we could do this:

> bar <- numeric(length = 10)
> for(i in seq_along(bar)) {
+     bar[i] <- i + 1
+ }
> bar
 [1]  2  3  4  5  6  7  8  9 10 11

Of course, you would not do a calculation like this with a loop, because R is vectorized: it works with whole vectors of numbers, rather than requiring you to code each computation element by element as you might in C or other programming languages.

> bar <- 1:10 + 1
> bar
 [1]  2  3  4  5  6  7  8  9 10 11

Notice that R has turned 1 into a vector of 1s of sufficient length to allow the computation to proceed, something known as recycling in R-speak.
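Recycling is not limited to length-one vectors: a shorter vector is repeated to match a longer one, and R warns when the longer length is not a multiple of the shorter. For example:

```r
1:6 + c(10, 20)        # 11 22 13 24 15 26
1:6 + c(10, 20, 30)    # 11 22 33 14 25 36
# 1:5 + c(10, 20) also runs, but warns that the longer object length
# is not a multiple of the shorter object length
```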

Sometimes you might need to iterate over an object with a loop or with one of the apply() family (sapply(), lapply(), tapply(), ...), but most often you will find a function that works on an entire vector of data in one go. This is one of the advantages of R over other programming languages, but it does require you to get your head into vectorized mode.
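Applied to the original question, the per-element and vectorized styles might look like this. The sketch assumes a Gaussian kernel with density()'s default bw.nrd0() bandwidth and uses simulated returns; vapply() calls a function once per return, while the outer() version computes all kernel values in one matrix operation. Both give the same numbers.

```r
set.seed(1)
ret <- rnorm(250, sd = 0.02)   # simulated stand-in for foo$Return
h <- bw.nrd0(ret)

# per-element: one function call per return
dens_apply <- vapply(ret, function(r) mean(dnorm(r, mean = ret, sd = h)),
                     numeric(1))

# vectorized: all pairwise differences at once, averaged row by row
dens_vec <- rowMeans(dnorm(outer(ret, ret, "-"), sd = h))

all.equal(dens_apply, dens_vec)   # TRUE
```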

孤城病女 2024-10-15 03:08:16

Use this to obtain density values.

foo$density <- density(foo$Return, n=length(foo$Return))$y
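Note that the values returned this way are evaluated on density()'s own evenly spaced grid, stored in the x component, which by default runs from min(x) - 3*bw to max(x) + 3*bw. They are not evaluated at the returns themselves, and their order does not follow the rows of foo. A quick check with simulated returns (an assumed stand-in for foo$Return):

```r
set.seed(1)
ret <- rnorm(100, sd = 0.02)   # simulated stand-in for foo$Return

d <- density(ret, n = length(ret))
length(d$y)                     # 100: one value per grid point
all.equal(d$x, sort(ret))       # not TRUE: the grid is not the returns
range(d$x) - range(ret)         # -3 * d$bw and +3 * d$bw
```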