Conditional count over time?
I'd like to count the number of changes of a binary factor variable. This variable can change back and forth multiple times for every user id. Now I'd like to count the number of changes to this variable per user id over a given timespan.
The data is sorted by id, year, month, myfactor. I tried this in MySQL but have had no success so far.
Is there an easy way to do it in R? I thought about adding another column to my data.frame and adding up conditions step by step... Maybe some %in% stuff?
Thx in advance for suggestions...
Hmm, of course... here's an example. Sorry for not providing it immediately, my head hurts ;):
  myf Year month userid
1   A 2005     1    260
2   B 2005     2    260
3   B 2005     4    260
4   A 2005     5    260
5   B 2005     6    260
6   B 2005     1    261
If this is my dataset, I want to update the changes column, counting the number of changes of myf per user. Basically I'd like to end up with:
user changes
 260       3
 261       0
and so forth...
HTH
Comments (3)
Another edit:
Given your responses to the other solutions, you could get what you want in one line:
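A minimal sketch of what such a one-liner could look like, assuming a data frame called `Df` (a hypothetical name) with the question's columns `myf` and `userid`, already sorted by user and time:

```r
# Hypothetical data frame mirroring the layout in the question
Df <- data.frame(
  myf    = factor(c("A", "B", "B", "A", "B", "B")),
  Year   = 2005,
  month  = c(1, 2, 4, 5, 6, 1),
  userid = c(260, 260, 260, 260, 260, 261)
)

# Per user: count the rows whose factor code differs from the previous row
changes <- tapply(as.numeric(Df$myf), Df$userid, function(x) sum(diff(x) != 0))
changes
```

The result is a named vector with one entry per user id. A user with a single row gets 0, because `diff()` on one value returns an empty vector.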
No merge needed in this case.
"Over a given timespan" means that you could select a timespan and then apply the function. Joshua's answer is the fastest way around. There's a more general function, rle, that gives you more information on run lengths and values. Be sure to check that one out.
Based on Joshua's answer, this example shows how you can easily work with dates to select a given timespan.
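A sketch of that kind of date-based selection, assuming the question's columns. The data frame name `Df`, the choice of the first day of each month, and the window from February to May 2005 are illustrative assumptions:

```r
# Hypothetical data frame mirroring the layout in the question
Df <- data.frame(
  myf    = factor(c("A", "B", "B", "A", "B", "B")),
  Year   = 2005,
  month  = c(1, 2, 4, 5, 6, 1),
  userid = c(260, 260, 260, 260, 260, 261)
)

# Build a Date from year and month; using day 01 is an arbitrary choice
Df$date <- as.Date(sprintf("%d-%02d-01", as.integer(Df$Year), as.integer(Df$month)))

# Keep only the rows inside the chosen window (assumed bounds)
in_span <- Df$date >= as.Date("2005-02-01") & Df$date <= as.Date("2005-05-31")
span <- Df[in_span, ]

# Count the changes per user within the window only
span_changes <- tapply(as.numeric(span$myf), span$userid, function(x) sum(diff(x) != 0))
span_changes
```

Users with no rows inside the window (261 here) simply do not appear in the result, so merge back against the full user list if you need explicit zeros for them.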
Edit: I updated the answer to show you how to easily convert your year and month columns into a date. You should also use as.numeric when applying the whole thing to a factor like yours.
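The same conversion matters for `rle`, since `rle` refuses a factor directly. A small sketch using the `myf` values of user 260 from the question (the variable names are illustrative):

```r
# myf values of user 260, in time order
myf <- factor(c("A", "B", "B", "A", "B"))

# rle() needs a plain atomic vector, so convert the factor first
runs <- rle(as.numeric(myf))
runs$lengths  # 1 2 1 1
runs$values   # 1 2 1 2  (factor codes: A = 1, B = 2)

# Every new run after the first one is a change
n_changes <- length(runs$lengths) - 1
n_changes
```

Compared with `diff`, this also tells you how long each state lasted, which is useful if you later want the duration of each run and not just the count.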
Here's my guess.
EDIT: Here's an update with your example data.
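A sketch of one way this could look on the example data from the question, using base R's `aggregate` so the result comes back as a data frame with the requested `user` and `changes` columns. The name `dat` is hypothetical, and this is not necessarily the code originally posted:

```r
# Example data from the question
dat <- data.frame(
  myf    = c("A", "B", "B", "A", "B", "B"),
  Year   = 2005,
  month  = c(1, 2, 4, 5, 6, 1),
  userid = c(260, 260, 260, 260, 260, 261),
  stringsAsFactors = TRUE
)

# One row per user; a change is any row whose factor code differs from the previous one
result <- aggregate(
  list(changes = as.numeric(dat$myf)),
  by  = list(user = dat$userid),
  FUN = function(x) sum(diff(x) != 0)
)
result
```

This relies on the rows already being sorted by user and time, as stated in the question.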