How can I assign a number of repeats to a data frame based on elements of an identifying vector in R?
I have a dataframe with individuals assigned a text id that concatenates a place-name with a personal id (see data, below). Ultimately, I need to do a transformation of the data set from "long" to "wide" (e.g., using "reshape") so that each individual comprises one row, only. In order to do that, I need to assign a "time" variable that reshape can use to identify time-varying covariates, etc. I have (probably bad) code to do this for individuals that repeat up to two times, but need to be able to identify up to 18 repeated occurrences. The code below works fine if I remove the line preceded by the hash, but only identifies up to two repeats. If I leave that line in (which would seem necessary for individuals repeated more than twice), R chokes, giving the following error (presumably because the first individual is repeated only twice):
Error in if (data$uid[i] == data$uid[i - 2]) { :
argument is of length zero
Can anyone help with this? Thanks in advance!
place <- rep("ny",10)
pid <- c(1,1,2,2,2,3,4,4,5,5)
uid<- paste(place,pid,sep="")
time <- rep(0,10)
data <- cbind(uid,time)
data <- as.data.frame(data)
data$time <- as.numeric(data$time)
#bad code
data$time[1] <- 1 #need to set first so that loop doesn't go to a row that doesn't exist (i.e., row 0)
for (i in 2:NROW(data)) {
  data$time[i] <- 1 #set first occurrence to 1
  if (data$uid[i] == data$uid[i-1]) {data$time[i] <- 2} #set second occurrence to 2, etc.
  #if (data$uid[i] == data$uid[i-2]) {data$time[i] <- 3}
  i <- i+1
}
It's unclear what you are trying to do, but I think you're saying that you need to create a time index for each row by every unique uid. Is that right? If so, give this a whirl:
Will give you something like:
Is this what you have in mind?
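The snippet and its output are not preserved in this answer. As a minimal sketch of what a per-uid time index could look like, assuming base R's ave() (an assumption, not necessarily the call this answer used) and the example data from the question:

```r
# Rebuild the example data from the question.
place <- rep("ny", 10)
pid   <- c(1, 1, 2, 2, 2, 3, 4, 4, 5, 5)
data  <- data.frame(uid = paste(place, pid, sep = ""),
                    stringsAsFactors = FALSE)

# For each uid, number its rows 1, 2, 3, ... in order of appearance.
# ave() applies FUN within each group and puts the results back in the
# original row order, so the rows do not have to be sorted by uid.
data$time <- ave(seq_along(data$uid), data$uid, FUN = seq_along)

print(data)
```

Because the counting happens within each group, this scales to any number of repeats per individual without adding one `if` per possible repeat.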
Using your data frame setup:
You can use:
To get:
NOTE: Your data.frame MUST be sorted by uid first for this to work.
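The code for this answer is not preserved either. One base R approach that has exactly the sorting requirement mentioned in the note is run-length encoding. The sketch below assumes rle() and sequence(); it may or may not be what the answer originally showed:

```r
# Rebuild the example data from the question.
place <- rep("ny", 10)
pid   <- c(1, 1, 2, 2, 2, 3, 4, 4, 5, 5)
data  <- data.frame(uid = paste(place, pid, sep = ""),
                    stringsAsFactors = FALSE)

# Sort by uid first: rle() only detects runs of *consecutive* equal values.
data <- data[order(data$uid), , drop = FALSE]

# rle() returns the length of each run of identical uids; sequence()
# expands each length n into 1:n, which is the occurrence number.
data$time <- sequence(rle(data$uid)$lengths)

print(data)
```

If the rows were not sorted, a uid that appears in two separate places would be treated as two runs and its count would restart at 1, which is why the sort matters.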
After trying the above solutions on large data sets, I decided to write my own loop for this. It was very time-consuming and still required the data to be broken into 50k-element vectors, but it did work in the end:
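The loop itself is not shown here. As a rough sketch only (not the poster's actual code), a single-pass loop over a vector that is assumed to be sorted by uid might look like the following. The variable names are illustrative:

```r
# Example ids from the question; assumed already sorted so repeats are adjacent.
place <- rep("ny", 10)
pid   <- c(1, 1, 2, 2, 2, 3, 4, 4, 5, 5)
uid   <- paste(place, pid, sep = "")

# Preallocate the result rather than growing it inside the loop.
time <- integer(length(uid))
if (length(uid) > 0) time[1] <- 1L

for (i in seq_along(uid)[-1]) {
  # Same uid as the previous row: continue the count; otherwise restart at 1.
  time[i] <- if (uid[i] == uid[i - 1]) time[i - 1] + 1L else 1L
}

print(time)
```

This only ever looks back one row and increments the previous count, so it handles any number of repeats and never indexes row 0. In the question's loop, `data$uid[i - 2]` with `i = 2` is `data$uid[0]`, a zero-length vector, which is what produces "argument is of length zero".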
Thanks to all for your help.