Data frame replacement error inside a loop in R
I'm using R on a dataset containing trips. Each line is a trip (from A to B). On each line, I know the identity of the individual (a number), the purpose of the trip (1,2,3 or 4), the time category (1,2 or 3) and a number identifying the tour in which the trip was done (a tour is a group of trips; all these trips go from A to A).
I would like to create a new column: for the same individual, what was the purpose of the previous trip in the same time category but in a different tour? This variable is called "prevDistanceSameTimeCategoryDifferentTour".
I have this error:
    Error in `$<-.data.frame`(`*tmp*`, "prevDistanceSameTimeCategoryDifferentTour", :
      replacement has 2 rows, data has 1167
Here is my code:
    prevPersonTimeCategory <- array(-999, dim=c(3,3))
    prevPersonTimeCategory[1,1] <- TgData$PersonID[1]
    prevPersonTimeCategory[2,1] <- TgData$PersonID[1]
    prevPersonTimeCategory[3,1] <- TgData$PersonID[1]
    for(i in 2:nrow(TgData)) {
      if (TgData$timeCategory[i] == 1) {
        if (TgData$tour[i] == prevPersonTimeCategory[1,3]) {
          if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
          }
        }
        else {
          if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
            prevPersonTimeCategory[1,3] <- TgData$tour[i]
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
            prevPersonTimeCategory[1,2] <- TgData$purpose[i]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
            prevPersonTimeCategory[1,2] <- -999
          }
        }
      }
      else if (TgData$timeCategory[i] == 2) {
        if (TgData$tour[i] == prevPersonTimeCategory[2,3]) {
          if (prevPersonTimeCategory[2,1] == TgData$PersonID[i]) {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[2,2]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[2,1] <- TgData$PersonID[i]
          }
        }
        else {
          if (prevPersonTimeCategory[2,1] == TgData$PersonID[i]) {
            print(i)
            prevPersonTimeCategory[2,3] <- TgData$tour[i]
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[2,2]
            prevPersonTimeCategory[2,2] <- TgData$purpose[i]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[2,1] <- TgData$PersonID[i]
            prevPersonTimeCategory[2,2] <- -999
          }
        }
      }
      else if (TgData$timeCategory[i] == 3) {
        if (TgData$tour[i] == prevPersonTimeCategory[3,3]) {
          if (prevPersonTimeCategory[3,1] == TgData$PersonID[i]) {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[3,2]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[3,1] <- TgData$PersonID[i]
          }
        }
        else {
          if (prevPersonTimeCategory[3,1] == TgData$PersonID[i]) {
            prevPersonTimeCategory[3,3] <- TgData$tour[i]
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[3,2]
            prevPersonTimeCategory[3,2] <- TgData$purpose[i]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[3,1] <- TgData$PersonID[i]
            prevPersonTimeCategory[3,2] <- -999
          }
        }
      }
      else {
        TgData$prevPurposeSameTimeCategoryDifferentTour[i] = -999
      }
    }
I'm creating an array to store information for each time category. In this array, the first value is the identity of the individual (prevPersonTimeCategory[1,1], prevPersonTimeCategory[2,1], prevPersonTimeCategory[3,1], one for each time category), the second is the purpose (prevPersonTimeCategory[1,2], etc.), and the third is the tour number (prevPersonTimeCategory[1,3], etc.).
Then I'm just reading each line (for) and writing a few conditions (if).
I really don't see where I'm making a mistake.
My dataset contains 36'784 lines, but I'm testing on 1932 lines (-1 line for headers). The data looks like this:
PersonID purpose tour timeCategory
1 1 1 2
1 4 2 3
1 4 2 3
1 4 3 3
1 3 4 3
1 4 5 3
1 4 5 2
1 4 5 3
1 3 5 3
1 4 6 2
1 4 6 2
1 4 6 3
1 3 7 3
1 4 8 3
1 4 9 3
1 4 10 3
1 4 10 3
1 4 11 1
1 4 12 1
1 4 13 1
1 4 14 1
1 4 16 1
1 1 17 2
1 4 18 3
1 4 19 2
1 3 20 3
1 4 20 3
1 4 21 3
1 1 22 2
1 3 22 3
1 3 23 3
1 4 24 3
1 4 25 3
1 4 25 3
1 4 26 3
1 1 27 2
1 3 27 3
1 4 28 3
1 3 28 3
1 4 29 3
1 4 29 3
1 1 30 2
1 4 31 3
1 1 31 2
1 4 32 3
1 3 32 3
1 4 33 3
1 3 34 3
1 4 35 3
1 1 36 2
1 3 36 3
1 4 37 3
1 3 38 3
1 4 39 3
1 3 39 3
1 4 39 3
1 4 40 3
1 4 40 2
1 4 40 3
1 3 41 3
1 4 42 3
1 4 43 3
1 1 44 2
1 3 45 3
1 4 46 3
1 3 47 3
1 3 47 3
1 4 48 2
1 1 49 2
1 4 50 3
1 1 51 2
1 1 51 2
1 2 51 3
1 3 52 3
1 3 53 1
1 4 54 1
1 4 55 1
1 4 55 1
1 4 55 1
1 1 56 3
1 4 57 3
1 4 58 3
1 1 59 2
1 3 59 3
1 4 60 3
1 4 61 3
1 1 62 3
1 3 63 3
1 4 64 3
1 3 65 3
1 4 66 3
1 3 67 3
1 2 68 1
2 3 69 3
2 1 70 3
2 4 71 2
2 1 72 3
2 3 72 3
2 1 72 2
If I run this short version of my code, I have no problems:
    prevPersonTimeCategory <- array(-999, dim=c(3,3))
    prevPersonTimeCategory[1,1] <- TgData$PersonID[1]
    prevPersonTimeCategory[2,1] <- TgData$PersonID[1]
    prevPersonTimeCategory[3,1] <- TgData$PersonID[1]
    for(i in 2:nrow(TgData)) {
      if (TgData$timeCategory[i] == 1) {
        if (TgData$tour[i] == prevPersonTimeCategory[1,3]) {
          if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
          }
        }
      }
    }
But if I add a few more lines, like this:
    prevPersonTimeCategory <- array(-999, dim=c(3,3))
    prevPersonTimeCategory[1,1] <- TgData$PersonID[1]
    prevPersonTimeCategory[2,1] <- TgData$PersonID[1]
    prevPersonTimeCategory[3,1] <- TgData$PersonID[1]
    for(i in 2:nrow(TgData)) {
      if (TgData$timeCategory[i] == 1) {
        if (TgData$tour[i] == prevPersonTimeCategory[1,3]) {
          if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
          }
        }
        else {
          if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
            prevPersonTimeCategory[1,3] <- TgData$tour[i]
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
            prevPersonTimeCategory[1,2] <- TgData$purpose[i]
          }
          else {
            TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
            prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
            prevPersonTimeCategory[1,2] <- -999
          }
        }
      }
    }
The error comes back:
    Error in `$<-.data.frame`(`*tmp*`, "prevPurposeSameTimeCategoryDifferentTour", :
      replacement has 18 rows, data has 1150
Comments (1)
Creating a new empty column first, as joran suggested, works.
Run this before you start the loop:
    TgData$prevPurposeSameTimeCategoryDifferentTour <- NA
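A likely explanation for why this helps: while the column does not exist yet, `TgData$col` is `NULL`, so `TgData$col[i] <- value` first builds a vector of length `i` (padded with `NA`) and then tries to store it as a whole column. That would account for the error reporting 2 rows in the full loop (the first assignment happens at `i = 2`) and 18 rows in the shorter one (row 18 is the first with `timeCategory == 1`), and for the shortest version never failing, since its only branch never assigns anything. Below is a minimal sketch of that behaviour; `df` and `newCol` are hypothetical names on a toy data frame, not the original data.

```r
# Hypothetical toy data frame standing in for TgData (3 rows only).
df <- data.frame(PersonID = c(1, 1, 2), purpose = c(1, 4, 3))

# Column not created yet: df$newCol is NULL, so the assignment below builds
# the length-2 vector c(NA, -999) and tries to use it as a 3-row column.
err <- tryCatch({
  df$newCol[2] <- -999
  NULL
}, error = function(e) conditionMessage(e))
print(err)  # the "replacement has ... rows, data has ..." message

# Column created first: element-wise assignment now works for any i.
df$newCol <- NA
for (i in 2:nrow(df)) {
  df$newCol[i] <- -999
}
print(df)
```

Note that R only raises the error when the data frame's row count is not a multiple of the replacement length; otherwise the short vector is recycled silently, which is another reason to create the column up front.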