在两个特定行之间计数行

发布于 2025-01-26 16:21:12 字数 945 浏览 1 评论 0原文

df <- structure(list(inv = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), ass = c("x", "x", "x", "x", "x"), datetime = c("2010-01-01", 
"2010-01-02", "2010-01-03", "2010-01-08", "2010-01-19"), portfolio = c(10, 
0, 5, 2, 0)), operation = c(10, -10, 5, -3, -2), class = "data.frame", row.names = c(NA, -5L))

因此,我拥有4000 6000不同资产的投资者,对于每个投资者,我都有两个不同的变量的交易运营:运营告诉我他是否正在买卖;投资组合告诉我他在投资组合中有多少。

我想做的是计算职位在投资组合中保持开放的天数,因此我要计算投资组合返回零的一天的差异与投资组合获得的一天的差异(不是那样)可能获得负资产组合)。

因此,在上面的数据集中,我将计算row2 -row1 ==&gt; 2010-01-02-2010-01-01第5行-3 ==&gt; 2010-01-19-2010-01-03等...

我想为所有投资者进行此计算我在数据集中拥有的所有行中都有资产,这些行我发现portfolio&gt; 0

因此,我的数据集将有一个名为disuration的列,在这种情况下,该列是c(0,1,0,5,16)(所以我当然我当然还必须计算RAW1 -RAW1RAW3 -RAW3 -RAW3) 因此,我的问题是每次重新启动计数portfolio返回到零。

df <- structure(list(inv = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), ass = c("x", "x", "x", "x", "x"), datetime = c("2010-01-01", 
"2010-01-02", "2010-01-03", "2010-01-08", "2010-01-19"), portfolio = c(10, 
0, 5, 2, 0)), operation = c(10, -10, 5, -3, -2), class = "data.frame", row.names = c(NA, -5L))

So I have 4000 investors with 6000 different assets, for each investor I have his trading operations in two different variables: operation tells me if he is buying/selling; portfolio tells me how much he has in the portfolio.

What I want to do is computing the number of days a position stays open in the portfolio, so I though about computing the difference between the day in which the portfolio goes back to zero and the day in which the portfolio went positive (it is not possible to get negative portfolio).

so in the dataset above I would count row2 - row1 ==> 2010-01-02 - 2010-01-01
and row 5 - row 3 ==> 2010-01-19 - 2010-01-03 and so on...

I want to do this computation for all the investor & asset I have in my dataset for all the rows in which I find that portfolio > 0.

So my dataset will have a further column called duration which would be equal, in this case to c(0,1,0,5,16) (so of course i also had to compute raw1 - raw1 and raw3 - raw3)
Hence my problem is to restart the count everytime portfolio goes back to zero.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

撩人痒 2025-02-02 16:21:12
library(dplyr)

df %>% 
  mutate(datetime = as.Date(datetime, "%Y-%m-%d")) %>% 
  group_by(investor, asset) %>% 
  arrange(datetime) %>% 
  mutate(grp.pos = cumsum(lag(portfolio, default = 1) == 0)) %>%
  group_by(investor, asset, grp.pos) %>% 
  mutate(`Open (#days)` = datetime - datetime[1])

#> # A tibble: 5 x 6
#> # Groups:   investor, asset, grp.pos [2]
#>   investor asset datetime   portfolio grp.pos `Open (#days)`
#>   <chr>    <chr> <date>         <dbl>   <int> <drtn>        
#> 1 INV_1    x     2010-01-01        10       0  0 days       
#> 2 INV_1    x     2010-01-02         0       0  1 days       
#> 3 INV_1    x     2010-01-03         5       1  0 days       
#> 4 INV_1    x     2010-01-08         2       1  5 days       
#> 5 INV_1    x     2010-01-19         0       1 16 days

数据:

df <- structure(list(investor = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), 
                     asset = c("x", "x", "x", "x", "x"), 
                     datetime = c("2010-01-01", "2010-01-02", "2010-01-03", 
                                  "2010-01-08", "2010-01-19"), 
                     portfolio = c(10, 0, 5, 2, 0)), 
                     operation = c(10, -10, 5, -3, -2), 
                class = "data.frame", row.names = c(NA, -5L))
library(dplyr)

df %>% 
  mutate(datetime = as.Date(datetime, "%Y-%m-%d")) %>% 
  group_by(investor, asset) %>% 
  arrange(datetime) %>% 
  mutate(grp.pos = cumsum(lag(portfolio, default = 1) == 0)) %>%
  group_by(investor, asset, grp.pos) %>% 
  mutate(`Open (#days)` = datetime - datetime[1])

#> # A tibble: 5 x 6
#> # Groups:   investor, asset, grp.pos [2]
#>   investor asset datetime   portfolio grp.pos `Open (#days)`
#>   <chr>    <chr> <date>         <dbl>   <int> <drtn>        
#> 1 INV_1    x     2010-01-01        10       0  0 days       
#> 2 INV_1    x     2010-01-02         0       0  1 days       
#> 3 INV_1    x     2010-01-03         5       1  0 days       
#> 4 INV_1    x     2010-01-08         2       1  5 days       
#> 5 INV_1    x     2010-01-19         0       1 16 days

Data:

df <- structure(list(investor = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), 
                     asset = c("x", "x", "x", "x", "x"), 
                     datetime = c("2010-01-01", "2010-01-02", "2010-01-03", 
                                  "2010-01-08", "2010-01-19"), 
                     portfolio = c(10, 0, 5, 2, 0)), 
                     operation = c(10, -10, 5, -3, -2), 
                class = "data.frame", row.names = c(NA, -5L))
国产ˉ祖宗 2025-02-02 16:21:12

这是我们可以做到的一种方式,如果需要,ass

首先,我们可以通过Inv> Inv将其用于原始数据集进行分组。然后将dateTime转换为日期格式以轻松进行计算(在这里我们使用ymd()函数)。

下一步可以以不同的方式完成:

主要思想是将组的最后一行指示的列portfolio分组。为此,我们在此中安排datetime降级表格可以轻松地使用cumsum == 0应用分组ID。

重新排列DateTime我们可以按预期计算最后一个:

library(dplyr)
library(lubridate)

df %>% 
  group_by(inv) %>% 
  mutate(datetime = ymd(datetime)) %>% 
  arrange(desc(datetime)) %>% 
  group_by(position_Group = cumsum(portfolio==0)) %>% 
  arrange(datetime) %>% 
  mutate(position_open = last(datetime)-first(datetime)) %>% 
  ungroup()
  inv   ass   datetime   portfolio operation id_Group position_open
  <chr> <chr> <date>         <dbl>     <dbl>    <int> <drtn>       
1 INV_1 x     2010-01-01        10        10        2  1 days      
2 INV_1 x     2010-01-02         0       -10        2  1 days      
3 INV_1 x     2010-01-03         5         5        1 16 days      
4 INV_1 x     2010-01-08         2        -3        1 16 days      
5 INV_1 x     2010-01-19         0        -2        1 16 days    

Here is a way how we could do it, that is expandable if necessary for ass

First we group by inv to use for the original dataset. Then transform datetime to date format to do calculations easily (here we use ymd() function).

The next step could be done in different ways:

Main idea is to group the column portfolio indicated by the last row of the group that is 0. For this we arrange datetime in descending form to easily apply the grouping id with cumsum == 0.

After rearranging datetime we can calculate the last from the first as intended:

library(dplyr)
library(lubridate)

df %>% 
  group_by(inv) %>% 
  mutate(datetime = ymd(datetime)) %>% 
  arrange(desc(datetime)) %>% 
  group_by(position_Group = cumsum(portfolio==0)) %>% 
  arrange(datetime) %>% 
  mutate(position_open = last(datetime)-first(datetime)) %>% 
  ungroup()
  inv   ass   datetime   portfolio operation id_Group position_open
  <chr> <chr> <date>         <dbl>     <dbl>    <int> <drtn>       
1 INV_1 x     2010-01-01        10        10        2  1 days      
2 INV_1 x     2010-01-02         0       -10        2  1 days      
3 INV_1 x     2010-01-03         5         5        1 16 days      
4 INV_1 x     2010-01-08         2        -3        1 16 days      
5 INV_1 x     2010-01-19         0        -2        1 16 days    
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文