Conditional count over time?

Posted on 2024-09-29 16:58:04

I'd like to count the number of changes of a binary factor variable. This variable can change back and forth multiple times for every user id. Now I'd like to count the number of changes to this variable per user id over a given timespan.

The data is sorted by id, year, month, myfactor. I tried this in MySQL but have had no success so far.
Is there an easy way to do it in R? I thought about adding another column to my data.frame and adding up conditions step by step... Maybe some %in% stuff?

Thanks in advance for any suggestions...

Hmm, of course... sorry for not providing an example immediately, my head hurts ;). Here it is:

   myf   Year    month userid   
  1 A    2005       1    260           
  2 B    2005       2    260           
  3 B    2005       4    260           
  4 A    2005       5    260           
  5 B    2005       6    260           
  6 B    2005       1    261

If this is my dataset, I want to fill in a changes column that counts the number of changes of myf per user. Basically I'd like to end up with:

  user  changes
   260     3
   261     0

and so forth...

HTH

亣腦蒛氧 2024-10-06 16:58:04

Another edit:

Given your responses to the other solutions, you can get what you want in one line:

# factor() makes this work whether myf is stored as a character or a factor column
Data$extra <- ave(as.integer(factor(Data$myf)),Data$id,FUN=function(x) sum(diff(x)!=0))

No merge is needed in this case.

"Over a given timespan" means that you can select a timespan and then apply the function. Joshua's answer is the fastest approach. There is also a more general function, rle, that gives you more information on run lengths and values. Be sure to check it out.
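For illustration, here is a minimal sketch of the rle route, assuming the myf values of user 260 from the question. The variable names (x, runs, n_changes) are made up for this example, and the count is simply the number of runs minus one.

```r
# myf values of user 260 from the question, in time order (typed in by hand)
x <- c("A", "B", "B", "A", "B")

# rle() collapses consecutive equal values into runs.
# It needs a plain vector, so convert a factor with as.character() first.
runs <- rle(x)
runs$values    # the value of each run
runs$lengths   # how long each run lasts

# Every boundary between two runs is one change;
# max() guards against an empty input, where there are no runs at all
n_changes <- max(length(runs$lengths) - 1L, 0L)

# Cross-check with the diff() idea on the integer codes
n_changes_diff <- sum(diff(as.integer(factor(x))) != 0)
```

Compared with diff(), rle also tells you how long the variable stayed at each value, which may be useful if you care about durations and not only the number of switches.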

Based on Joshua's answer, this example shows how you can easily work with dates to select a given timespan.

Edit: I updated the answer to show how to easily convert your year and month columns into a date. You should also use as.numeric when applying the whole thing to a factor like yours.

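# Test data: 3 ids with 24 monthly rows each (2005 and 2006)
set.seed(21)
Data <- data.frame(id=rep(letters[1:3],each=24),
                   year=rep(rep(c(2005,2006),each=12),3),
                   month=rep(1:12,6),
                   # explicit factor(): since R 4.0 data.frame() keeps strings as character
                   myf=factor(sample(c("A","B"),24*3,TRUE)))

# Transformation: build a Date from year and month (first day of each month)
Data$dates <- as.Date(paste(Data$year,Data$month,"1",sep="-"))

# Function: number of changes in myf per id for dates strictly between from and to
cond.count <- function(from,to,data){
    x <- data[data$dates>from & data$dates<to,]
    tapply(as.numeric(x$myf),x$id,function(y)sum(diff(y)!=0))
}

# Example (both bounds are excluded, so the row dated 2005-01-01 is dropped)
from <- as.Date("2005-01-01")
to <- as.Date("2006-04-15")

cond.count(from,to,Data)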

我家小可爱 2024-10-06 16:58:04

#Some data
dfr <- data.frame(
   binary_variable = runif(100) < .7,
   id = sample(7, 100, replace = TRUE)
)

#Split by id
split_by_id <- with(dfr, split(binary_variable, id))

#Number of changes
sapply(split_by_id, function(x) sum(diff(x) != 0))
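The same split/sapply idea can be tried on the example data from the question. This is only a sketch: the data frame is typed in by hand, the names ex, by_user and changes are invented for the example, and neighbouring values are compared directly so that it works on a character column without any conversion.

```r
# Example data from the question, typed in by hand
ex <- data.frame(
  myf    = c("A", "B", "B", "A", "B", "B"),
  Year   = 2005,
  month  = c(1, 2, 4, 5, 6, 1),
  userid = c(260, 260, 260, 260, 260, 261)
)

# Rows must be in time order within each user before counting
ex <- ex[order(ex$userid, ex$Year, ex$month), ]

# Split myf by user, then count positions where a value differs from its predecessor
by_user <- split(ex$myf, ex$userid)
changes <- sapply(by_user, function(x) sum(x[-1] != x[-length(x)]))

# One row per user, in the shape the question asks for
data.frame(user = names(changes), changes = unname(changes))
```

A user with a single row, such as 261, gets 0 because there is no pair of neighbouring values to compare.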

若沐 2024-10-06 16:58:04

Here's my guess.

set.seed(21)
Data <- data.frame(id=sample(letters[1:3],20,TRUE),
                   date=sample(1:3,20,TRUE),
                   myfactor=sample(0:1,20,TRUE))
Data <- Data[order(Data$id,Data$date),]

DataCh <- aggregate(Data[,"myfactor",FALSE],
            by=Data[,c("id","date")], function(x) sum(diff(x)!=0))
DataCh <- DataCh[order(DataCh$id,DataCh$date),]

EDIT: Here's an update with your example data.

lines <- "   myf   Year    month userid   
 1 A    2005       1    260           
 2 B    2005       2    260           
 3 B    2005       4    260           
 4 A    2005       5    260           
 5 B    2005       6    260           
 6 B    2005       1    261 "

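# stringsAsFactors=TRUE so that unclass() below yields integer codes (the default is FALSE since R 4.0)
Data <- read.table(con <- textConnection(lines), stringsAsFactors=TRUE); close(con)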

DataCh <- aggregate(Data[,"myf",FALSE],
            by=Data[,"userid",FALSE], function(x) sum(diff(unclass(x))!=0))

merge(Data,DataCh,by="userid",suffixes=c("",".change"))
#   userid myf Year month myf.change
# 1    260   A 2005     1          3
# 2    260   B 2005     2          3
# 3    260   B 2005     4          3
# 4    260   A 2005     5          3
# 5    260   B 2005     6          3
# 6    261   B 2005     1          0
