面板数据的滞后并引导数据范围的变量在1个月和6个工作日

发布于 2025-01-26 22:14:26 字数 1899 浏览 3 评论 0原文

我有一个大型面板数据集，我想在1个月和6个工作日之前滞后并领导变量。我知道，例如，来自dplyr有lag或lead函数。但是，我还需要根据面板数据中的“名称”对数据进行分组。

我的数据看起来像：

structure(list(Date = c("01.08.2018", "02.08.2018", "03.08.2018", 
"04.08.2018", "05.08.2018", "06.04.2019", "07.04.2019", "08.04.2019", 
"01.08.2018", "02.08.2018", "03.08.2018", "04.08.2018", "06.04.2019", 
"07.04.2019", "08.04.2019", "01.08.2018", "02.08.2018", "03.08.2018", 
"04.08.2018", "05.08.2018", "07.04.2019", "08.04.2019"), Name = c("A", 
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", 
"B", "C", "C", "C", "C", "C", "C", "C"), Rating = c(1L, 1L, 1L, 
3L, 3L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 
5L, 5L, 5L), Size = c(1234L, 24123L, 23L, 1L, 23L, 3L, 23L, 4L, 
323L, 3424L, 523L, 234L, 35L, 354L, 45L, 23L, 46L, 456L, 546L, 
24L, 134L, 1L)), class = "data.frame", row.names = c(NA, -22L
))

这只是一个简化的版本。我的真实数据持续时间为2018年1月8日至31.12.2021。我只能在1个月和6个工作日之前落后和领导称为“评级”的变量？

我的困难是我有1个月零6个工作日，而不仅仅是数据框架中的一个变量。所有其他变量不应调整。

到目前为止，我尝试了这一点：

Data_2  <- Data  %>%
  group_by(Name)  %>%
  lag('Rating')

Data_3  <- Data  %>%
  group_by(Name)  %>%
  lead('Rating')

但这不是我的目标。

编辑：

在铅的情况下，我的输出应该像这样：（我只是使用前5行来说明）

structure(list(Date = c("10.09.2018", "11.09.2018", "12.09.2018", 
"13.09.2018", "14.09.2018"), Name = c("A", "A", "A", "A", "A"
), Rating = c(1L, 1L, 1L, 3L, 3L), Size = c("Size from 10.09.2018 would be here", 
"Size from 11.09.2018 would be here", "Size from 12.09.2018 would be here", 
"Size from 13.09.2018 would be here", "Size from 14.09.2018 would be here"
)), class = "data.frame", row.names = c(NA, -5L))

因此，对于第1行，我增加了1个月和6个工作日，这给了我10.09.2018等。然后，“评分”将是2018年1月8日的“评分”，但“大小”将是实际上也在2018年10月10日报告的数字。然后，我想做同样的事情，但是向后1个月零6个工作日。

原文

I have a large panel data set and I would like to lag and lead a variable by 1 month and 6 business days.
I know, for instance, from dplyr there is the lag or lead function. However, I also need to group by data based on the "Names" in the panel data.

My data look like this:

structure(list(Date = c("01.08.2018", "02.08.2018", "03.08.2018", 
"04.08.2018", "05.08.2018", "06.04.2019", "07.04.2019", "08.04.2019", 
"01.08.2018", "02.08.2018", "03.08.2018", "04.08.2018", "06.04.2019", 
"07.04.2019", "08.04.2019", "01.08.2018", "02.08.2018", "03.08.2018", 
"04.08.2018", "05.08.2018", "07.04.2019", "08.04.2019"), Name = c("A", 
"A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", 
"B", "C", "C", "C", "C", "C", "C", "C"), Rating = c(1L, 1L, 1L, 
3L, 3L, 4L, 4L, 4L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 
5L, 5L, 5L), Size = c(1234L, 24123L, 23L, 1L, 23L, 3L, 23L, 4L, 
323L, 3424L, 523L, 234L, 35L, 354L, 45L, 23L, 46L, 456L, 546L, 
24L, 134L, 1L)), class = "data.frame", row.names = c(NA, -22L
))

It is just a simplified version. My real data lasts from 01.08.2018 to 31.12.2021. How can I only lag and lead the variable called "Rating" by 1 month and 6 business days?

My difficulty is that I have 1 month and 6 business days and not just one variable in the dataframe. All the other variables should not be adjusted.

So far I tried this:

Data_2  <- Data  %>%
  group_by(Name)  %>%
  lag('Rating')

Data_3  <- Data  %>%
  group_by(Name)  %>%
  lead('Rating')

But this is not what I am aiming for.

EDIT:

My output should look like this in the case of lead:
(I just used the first 5 rows to illustrate)

structure(list(Date = c("10.09.2018", "11.09.2018", "12.09.2018", 
"13.09.2018", "14.09.2018"), Name = c("A", "A", "A", "A", "A"
), Rating = c(1L, 1L, 1L, 3L, 3L), Size = c("Size from 10.09.2018 would be here", 
"Size from 11.09.2018 would be here", "Size from 12.09.2018 would be here", 
"Size from 13.09.2018 would be here", "Size from 14.09.2018 would be here"
)), class = "data.frame", row.names = c(NA, -5L))

So for row 1 I added 1 month and 6 business days which gives me 10.09.2018 and so on. The "Rating" will then be the one from 01.08.2018 but the "Size" will be the figure that was actually also reported on 10.09.2018.
Then, I would like to do the same but go backwards 1 month and 6 business days.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

沩ん囻菔务 2025-02-02 22:14:26

这是一种适用于“ X天后”的方法。在这种情况下，我在2天后使用了您的数据来证明您的数据，但是35天后，可以在5周后进行＃，这可能是一周中的同一天，因此大部分时间都应该是另一个“营业日”。

# Convert dates to a date format that can be calculated upon
Data2 <- Data %>% mutate(Date = lubridate::dmy(Date))
         
Data2 %>%
  mutate(Date_future = Date + 2) %>%
  left_join(Data2, by = c("Name", "Date_future" = "Date"), 
            suffix = c("_now", "_future"))
  # pipe into line below to just show selected columns
  # select(Date_future, Name, Rating_now, Size_future)

结果

        Date Name Rating_now Size_now Date_future Rating_future Size_future
1  2018-08-01    A          1     1234  2018-08-03             1          23
2  2018-08-02    A          1    24123  2018-08-04             3           1
3  2018-08-03    A          1       23  2018-08-05             3          23
4  2018-08-04    A          3        1  2018-08-06            NA          NA
5  2018-08-05    A          3       23  2018-08-07            NA          NA
6  2019-04-06    A          4        3  2019-04-08             4           4
7  2019-04-07    A          4       23  2019-04-09            NA          NA
8  2019-04-08    A          4        4  2019-04-10            NA          NA
9  2018-08-01    A          3      323  2018-08-03             1          23
10 2018-08-02    B          3     3424  2018-08-04             2         234
11 2018-08-03    B          2      523  2018-08-05            NA          NA
12 2018-08-04    B          2      234  2018-08-06            NA          NA
13 2019-04-06    B          2       35  2019-04-08             1          45
14 2019-04-07    B          1      354  2019-04-09            NA          NA
15 2019-04-08    B          1       45  2019-04-10            NA          NA
16 2018-08-01    C          1       23  2018-08-03             3         456
17 2018-08-02    C          3       46  2018-08-04             3         546
18 2018-08-03    C          3      456  2018-08-05             5          24
19 2018-08-04    C          3      546  2018-08-06            NA          NA
20 2018-08-05    C          5       24  2018-08-07            NA          NA
21 2019-04-07    C          5      134  2019-04-09            NA          NA
22 2019-04-08    C          5        1  2019-04-10            NA          NA

Here's an approach that would work for "x days later." In this case I use 2 days later to demonstrate on your data, but 35 days later might be good to get the 5 week later #, with same day of week and so should be another "business day" most of the time.

# Convert dates to a date format that can be calculated upon
Data2 <- Data %>% mutate(Date = lubridate::dmy(Date))
         
Data2 %>%
  mutate(Date_future = Date + 2) %>%
  left_join(Data2, by = c("Name", "Date_future" = "Date"), 
            suffix = c("_now", "_future"))
  # pipe into line below to just show selected columns
  # select(Date_future, Name, Rating_now, Size_future)

Result

        Date Name Rating_now Size_now Date_future Rating_future Size_future
1  2018-08-01    A          1     1234  2018-08-03             1          23
2  2018-08-02    A          1    24123  2018-08-04             3           1
3  2018-08-03    A          1       23  2018-08-05             3          23
4  2018-08-04    A          3        1  2018-08-06            NA          NA
5  2018-08-05    A          3       23  2018-08-07            NA          NA
6  2019-04-06    A          4        3  2019-04-08             4           4
7  2019-04-07    A          4       23  2019-04-09            NA          NA
8  2019-04-08    A          4        4  2019-04-10            NA          NA
9  2018-08-01    A          3      323  2018-08-03             1          23
10 2018-08-02    B          3     3424  2018-08-04             2         234
11 2018-08-03    B          2      523  2018-08-05            NA          NA
12 2018-08-04    B          2      234  2018-08-06            NA          NA
13 2019-04-06    B          2       35  2019-04-08             1          45
14 2019-04-07    B          1      354  2019-04-09            NA          NA
15 2019-04-08    B          1       45  2019-04-10            NA          NA
16 2018-08-01    C          1       23  2018-08-03             3         456
17 2018-08-02    C          3       46  2018-08-04             3         546
18 2018-08-03    C          3      456  2018-08-05             5          24
19 2018-08-04    C          3      546  2018-08-06            NA          NA
20 2018-08-05    C          5       24  2018-08-07            NA          NA
21 2019-04-07    C          5      134  2019-04-09            NA          NA
22 2019-04-08    C          5        1  2019-04-10            NA          NA

回复收藏 0 原文

~没有更多了~