在两个特定行之间计数行
df <- structure(list(inv = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), ass = c("x", "x", "x", "x", "x"), datetime = c("2010-01-01",
"2010-01-02", "2010-01-03", "2010-01-08", "2010-01-19"), portfolio = c(10,
0, 5, 2, 0)), operation = c(10, -10, 5, -3, -2), class = "data.frame", row.names = c(NA, -5L))
因此,我拥有4000
6000
不同资产的投资者,对于每个投资者,我都有两个不同的变量的交易运营:运营告诉我他是否正在买卖;投资组合告诉我他在投资组合中有多少。
我想做的是计算职位在投资组合中保持开放的天数,因此我要计算投资组合返回零的一天的差异与投资组合获得的一天的差异(不是那样)可能获得负资产组合)。
因此,在上面的数据集中,我将计算row2 -row1 ==&gt; 2010-01-02-2010-01-01
和第5行-3 ==&gt; 2010-01-19-2010-01-03
等...
我想为所有投资者进行此计算我在数据集中拥有的所有行中都有资产,这些行我发现portfolio&gt; 0
。
因此,我的数据集将有一个名为disuration
的列,在这种情况下,该列是c(0,1,0,5,16)
(所以我当然我当然还必须计算RAW1 -RAW1
和RAW3 -RAW3 -RAW3
) 因此,我的问题是每次重新启动计数portfolio
返回到零。
df <- structure(list(inv = c("INV_1", "INV_1", "INV_1", "INV_1", "INV_1"), ass = c("x", "x", "x", "x", "x"), datetime = c("2010-01-01",
"2010-01-02", "2010-01-03", "2010-01-08", "2010-01-19"), portfolio = c(10,
0, 5, 2, 0)), operation = c(10, -10, 5, -3, -2), class = "data.frame", row.names = c(NA, -5L))
So I have 4000
investors with 6000
different assets, for each investor I have his trading operations in two different variables: operation tells me if he is buying/selling; portfolio tells me how much he has in the portfolio.
What I want to do is computing the number of days a position stays open in the portfolio, so I though about computing the difference between the day in which the portfolio goes back to zero and the day in which the portfolio went positive (it is not possible to get negative portfolio).
so in the dataset above I would count row2 - row1 ==> 2010-01-02 - 2010-01-01
and row 5 - row 3 ==> 2010-01-19 - 2010-01-03
and so on...
I want to do this computation for all the investor & asset I have in my dataset for all the rows in which I find that portfolio > 0
.
So my dataset will have a further column called duration
which would be equal, in this case to c(0,1,0,5,16)
(so of course i also had to compute raw1 - raw1
and raw3 - raw3
)
Hence my problem is to restart the count everytime portfolio
goes back to zero.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
数据:
Data:
这是我们可以做到的一种方式,如果需要,
ass
首先,我们可以通过
Inv> Inv
将其用于原始数据集进行分组。然后将dateTime
转换为日期格式以轻松进行计算(在这里我们使用ymd()
函数)。下一步可以以不同的方式完成:
主要思想是将组的最后一行指示的列
portfolio
分组。为此,我们在此中安排datetime
降级表格可以轻松地使用cumsum == 0
应用分组ID。重新排列
DateTime
我们可以按预期计算最后一个:Here is a way how we could do it, that is expandable if necessary for
ass
First we group by
inv
to use for the original dataset. Then transformdatetime
to date format to do calculations easily (here we useymd()
function).The next step could be done in different ways:
Main idea is to group the column
portfolio
indicated by the last row of the group that is 0. For this we arrangedatetime
in descending form to easily apply the grouping id withcumsum == 0
.After rearranging
datetime
we can calculate the last from the first as intended: