R: how can I populate the rows of a data frame, in which each row represents a day, with a single common value for each year?

Posted 2024-12-28 06:05:59

I have a data frame consisting of a date column, a price column and then various other columns derived from those two columns. One of the columns calculates, for each day in a given year, the percentage change in the price from the beginning of that year (this is related to an earlier question).

I want to add a column that holds, for each day of a given year, the percentage change in the price for the whole of that year. So, if the price rose 10% from the first to the last day of 2009, the column for all the days of 2009 should hold the value 10% (or 0.1). If the price fell by 2% between the first and last days of 2010, the column for each day in 2010 should hold the value -0.02 and so on.
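The calculation described above comes down to one number per year: the price on that year's last row divided by the price on its first row, minus one. Here is a minimal base-R sketch using `tapply()` on a hypothetical four-row data frame. The prices are made up to reproduce the 10% and -2% figures, and the sketch assumes the rows are already in date order within each year.

```r
# Hypothetical toy data: two trading days per year, chosen to give
# +10% in 2009 and -2% in 2010 (the figures used in the question).
toy <- data.frame(
  year  = c(2009, 2009, 2010, 2010),
  price = c(100, 110, 100, 98)
)

# One value per year: last price / first price - 1.
# Assumes rows are sorted by date within each year.
yr_chg <- tapply(toy$price, toy$year, function(p) p[length(p)] / p[1] - 1)
yr_chg
```

The result is named by year ("2009", "2010") and holds roughly 0.1 and -0.02.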

The code I have so far is:

library(plyr)  # provides ddply()
# generate data: 1115 consecutive days with random prices
set.seed(12345)
df <- data.frame(date = seq(as.Date("2009/1/1"), by = "day", length.out = 1115),
                 price = runif(1115, min = 100, max = 200))
# remove weekend days; POSIXlt wday is 0 for Sunday and 6 for Saturday,
# which (unlike the names returned by weekdays()) does not depend on the locale
df <- df[!(as.POSIXlt(df$date)$wday %in% c(0, 6)), ]
# add some columns for later (df$date is already of class Date)
df$year <- as.numeric(format(df$date, "%Y"))
df$month <- as.numeric(format(df$date, "%m"))
df$day <- as.numeric(format(df$date, "%d"))
df$daythisyear <- as.numeric(format(df$date, "%j"))  # day of the year, 1-366
# the same month/day in the year 2000 (a leap year), so days line up across years
df <- transform(df, doy = as.Date(paste(2000, month, day, sep = "/")))
# change in price relative to the first trading day of each year
df <- ddply(df, .(year), transform, pctchg = (price / price[1]) - 1)

I realise that I can get the annual (year-on-year) change by using another data frame, something like this:

df.yr <- ddply(df, .(year), function(x) (x[nrow(x),2]/x[1,2])-1)

...but I can't work out how to add the yearly figures as a column in the existing data frame. This is particularly tricky because, with 4 years of data, that data frame has only 4 rows (one per year), while the daily data frame used to derive them has about 800, so the lengths don't match.
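One way to resolve the mismatch is to join the yearly figures (what `df.yr` holds) back onto the daily rows by `year`, so each year's value is repeated on every row of that year. The sketch below uses base R's `aggregate()` and `merge()` in place of plyr. The data frame `daily` is a small hypothetical stand-in for `df`, and the names `yearly` and `yrchg` are made up for illustration.

```r
# Hypothetical stand-in for the daily data frame (already sorted by date).
daily <- data.frame(
  date  = as.Date(c("2009-01-02", "2009-06-30", "2009-12-31",
                    "2010-01-04", "2010-12-31")),
  price = c(100, 105, 110, 100, 98)
)
daily$year <- as.numeric(format(daily$date, "%Y"))

# One row per year: last price / first price - 1.
yearly <- aggregate(price ~ year, data = daily,
                    FUN = function(p) p[length(p)] / p[1] - 1)
names(yearly)[names(yearly) == "price"] <- "yrchg"

# Join the yearly figure onto every daily row with the same year.
# merge() sorts by the key column, so restore date order afterwards.
daily <- merge(daily, yearly, by = "year")
daily <- daily[order(daily$date), ]
daily
```

The join is what fixes the 4-row versus 800-row mismatch: `merge()` repeats each of the few `yearly` rows for every matching `year` in `daily`.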

It is straightforward to do this with a for loop that starts at the last row of the data frame and moves back up the daythisyear column. If daythisyear on the current row is larger than daythisyear on the row below, the year has changed, so you take the new value from that row for the column being added, and so on. Nevertheless, I feel sure there must be a more idiomatic R approach using an apply function or ddply, which I have so far studiously avoided tackling. So my question is:
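For comparison, the loop described above might look roughly like the following sketch, which uses hypothetical toy data and base R only. It walks from the bottom row upwards and detects a year boundary where `daythisyear` drops from one row to the next. Once it reaches a year's first row, it fills all of that year's rows.

```r
# Hypothetical toy data, sorted by date; daythisyear as from format(date, "%j").
toy <- data.frame(
  daythisyear = c(2, 181, 365, 4, 365),
  price       = c(100, 105, 110, 100, 98)
)

n <- nrow(toy)
toy$yrchg <- NA_real_
end_row <- n  # index of the last row of the year currently being scanned

for (i in n:1) {
  # Row i starts a year if it is the first row, or if daythisyear
  # is larger on the row above it (i.e. the year changed between them).
  if (i == 1 || toy$daythisyear[i - 1] > toy$daythisyear[i]) {
    # Fill rows i..end_row with that year's last/first price change.
    toy$yrchg[i:end_row] <- toy$price[end_row] / toy$price[i] - 1
    end_row <- i - 1
  }
}
toy
```

Comparing a `year` column directly, instead of looking for drops in `daythisyear`, would be a slightly more robust boundary test.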

Q. How do I calculate the annual change in the value of a column and then insert that value, as a new column, into every row for that year?

Comments (1)

似狗非友 2025-01-04 06:05:59

I've not yet converted to being a ddply user, preferring instead to use ave when it is the obvious solution. I suspect that this code would translate across:

 df$pctYrChng <- ave(df$price, df$year, FUN=function(x) tail(x,1)/head(x,1) - 1)
 unique(df$pctYrChng)
#[1] -0.03259032 -0.05781901  0.35932519  0.04246669
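`ave()` fits this problem because it applies `FUN` to each group of `x` and returns a vector as long as `x`, writing each group's result back into that group's positions. When `FUN` returns a single number, that number is therefore repeated on every row of the group. The tiny illustration below uses made-up values. Like the code above, it assumes the rows within each year are in date order, since `tail()` and `head()` pick the last and first rows.

```r
price <- c(100, 105, 110, 100, 98)
year  <- c(2009, 2009, 2009, 2010, 2010)

# FUN returns one number per year; ave() spreads it over that year's rows.
chg <- ave(price, year, FUN = function(x) tail(x, 1) / head(x, 1) - 1)
chg
```

Up to floating-point rounding, this gives 0.1 for the three 2009 entries and -0.02 for the two 2010 entries, with the same length as `price`.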