Lag and lead a variable in panel data by 1 month and 6 business days
I have a large panel data set, and I would like to lag and lead a variable by 1 month and 6 business days.
I know that dplyr, for instance, provides the lag and lead functions. However, I also need to group the data by "Name" in the panel data.
My data look like this:
structure(list(Date = c("01.08.2018", "02.08.2018", "03.08.2018",
"04.08.2018", "05.08.2018", "06.04.2019", "07.04.2019", "08.04.2019",
"01.08.2018", "02.08.2018", "03.08.2018", "04.08.2018", "06.04.2019",
"07.04.2019", "08.04.2019", "01.08.2018", "02.08.2018", "03.08.2018",
"04.08.2018", "05.08.2018", "07.04.2019", "08.04.2019"), Name = c("A",
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"B", "C", "C", "C", "C", "C", "C", "C"), Rating = c(1L, 1L, 1L,
3L, 3L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L,
5L, 5L, 5L), Size = c(1234L, 24123L, 23L, 1L, 23L, 3L, 23L, 4L,
323L, 3424L, 523L, 234L, 35L, 354L, 45L, 23L, 46L, 456L, 546L,
24L, 134L, 1L)), class = "data.frame", row.names = c(NA, -22L
))
This is just a simplified version. My real data span 01.08.2018 to 31.12.2021. How can I lag and lead only the variable called "Rating" by 1 month and 6 business days?
My difficulty is that the offset is 1 month plus 6 business days rather than a fixed number of rows, and that the data frame contains more than one variable. All the other variables should not be adjusted.
So far I have tried this:
Data_2 <- Data %>%
group_by(Name) %>%
lag('Rating')
Data_3 <- Data %>%
group_by(Name) %>%
lead('Rating')
But this is not what I am aiming for.
EDIT:
My output should look like this in the case of lead (I just used the first 5 rows to illustrate):
structure(list(Date = c("10.09.2018", "11.09.2018", "12.09.2018",
"13.09.2018", "14.09.2018"), Name = c("A", "A", "A", "A", "A"
), Rating = c(1L, 1L, 1L, 3L, 3L), Size = c("Size from 10.09.2018 would be here",
"Size from 11.09.2018 would be here", "Size from 12.09.2018 would be here",
"Size from 13.09.2018 would be here", "Size from 14.09.2018 would be here"
)), class = "data.frame", row.names = c(NA, -5L))
So for row 1, I added 1 month and 6 business days, which gives me 10.09.2018, and so on. The "Rating" will then be the one from 01.08.2018, but the "Size" will be the figure that was actually reported on 10.09.2018.
Then, I would like to do the same but go backwards by 1 month and 6 business days.
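To make the offset concrete, here is a minimal base R sketch of the date arithmetic described above. The helper name shift_month_and_bdays and its arguments are made up for illustration, and it assumes that a business day is simply Monday to Friday, with public holidays ignored.

```r
# Hypothetical helper: shift a single Date by one calendar month and then by
# n_bdays business days. direction = 1 moves forward, direction = -1 backward.
# Assumption: business days are Monday to Friday; holidays are not considered.
shift_month_and_bdays <- function(date, n_bdays = 6L, direction = 1L) {
  by <- if (direction > 0) "1 month" else "-1 month"
  # seq() on a Date steps whole calendar months
  shifted <- seq(date, by = by, length.out = 2)[2]
  counted <- 0L
  while (counted < n_bdays) {
    shifted <- shifted + direction
    # "%u" is the ISO weekday number: 1 = Monday, ..., 7 = Sunday
    if (as.integer(format(shifted, "%u")) <= 5L) counted <- counted + 1L
  }
  shifted
}

start <- as.Date("01.08.2018", format = "%d.%m.%Y")
format(shift_month_and_bdays(start), "%d.%m.%Y")
# "10.09.2018"
```

The helper handles one date at a time, so a whole column would need something like lapply(). Dates at the end of a month (for example the 31st) can roll over into the following month when stepped with seq(), so those cases may need separate handling.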
Comments (1)
Here's an approach that works for "x days later." In this case I use 2 days later to demonstrate on your data, but 35 days later might be a good way to get the value 5 weeks later: it falls on the same day of the week, so it should be another business day most of the time.
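A minimal sketch of how an "x days later" shift could be done with a join, assuming dplyr is available. This is an illustration of the idea and not necessarily the exact code meant here; the names days_later, rating_shifted and Rating_shifted are made up, and only the first five rows of group A from the question are used.

```r
library(dplyr)

# First five rows of group A from the question, with Date parsed as a Date
Data <- data.frame(
  Date   = c("01.08.2018", "02.08.2018", "03.08.2018", "04.08.2018", "05.08.2018"),
  Name   = "A",
  Rating = c(1L, 1L, 1L, 3L, 3L),
  Size   = c(1234L, 24123L, 23L, 1L, 23L)
) %>%
  mutate(Date = as.Date(Date, format = "%d.%m.%Y"))

days_later <- 2  # assumed offset for the demo; 35 would give "5 weeks later"

# Move each Rating to the date on which it should appear
rating_shifted <- Data %>%
  transmute(Name, Date = Date + days_later, Rating_shifted = Rating)

# Join back by Name and Date, so Size stays the value reported on each date
result <- Data %>%
  left_join(rating_shifted, by = c("Name", "Date"))
```

Rows whose target date has no matching source date (here the first two) get NA in Rating_shifted, and Size is left untouched. Going backwards would use Date - days_later in the transmute() step.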
Result