Selecting specific rows based on the values in 2 columns in R

Posted on 2024-12-27 17:18:05

I have a large data set of GPS collar locations that have a varying number of locations each day. I want to separate out only the days that have a single location collected and make a new data frame containing all their information.

month    day    easting    northing    time    ID
  6       1     #######    ########    0:00    ##
  6       2     #######    ########    6:00    ##
  6       2     #######    ########    0:00    ##
  6       3     #######    ########    18:00   ##
  6       3     #######    ########    12:00   ##
  6       4     #######    ########    0:00    ##
  6       5     #######    ########    6:00    ##

So far I have cobbled something together, but I can't quite get to the next step.

library(plyr)
dog<-count(data1,vars=c("MONTH","day"))
datasub1<-subset(dog,freq==1)

This gives me a readout that looks like

    MONTH day freq
1       6  29    1
7       7   5    1
8       7   6    1
10      7   8    1
12      7  10    1

I am trying to use the values of the Month and day to pull out the rows that contain them from the main dataset so that I can make a data frame containing only the points with a frequency of 1 but that contains all the associated data. I've got to this point:

sis<-c(datasub1$MONTH)
bro<-c(datasub1$day)
datasub2<-subset(data1,MONTH==sis&day==bro)

... but that doesn't give me anything. As an R beginner, it seemed intuitive to me that this would subset out the rows that contain both the values of bro and sis.

Any help would be greatly appreciated.

活泼老夫 2025-01-03 17:18:05

Revised:

datasub2<-subset(data1, paste(month,day,sep=".") %in% paste(datasub1$month, datasub1$day,sep=".") )

`MONTH == sis` compares the two vectors position by position (recycling the shorter one), so it is not very likely, and quite possibly impossible, that the rows you want happen to line up that way. You are presumably more interested in whether each row's "month.day" combination is in the set of "month.day" combinations in datasub1. Also, if the headers were as you illustrated, you have mixed up the capitalization of the column names that come back from the count() function.

> dog
  month day freq
1     6   1    1
2     6   2    2
3     6   3    2
4     6   4    1
5     6   5    1
> datasub1
  month day freq
1     6   1    1
4     6   4    1
5     6   5    1
> datasub2
  month day easting northing time ID
1     6   1 ####### ######## 0:00 ##
6     6   4 ####### ######## 0:00 ##
7     6   5 ####### ######## 6:00 ##
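
The distinction between position-by-position comparison and set membership can be seen in a small base-R sketch. The toy data below mimics the table in the question with made-up values, and the object names (`single`, `key_all`, `key_single`) are illustrative only, not part of the original answer.

```r
# Toy data shaped like the question's table (values are invented).
data1 <- data.frame(
  month = c(6, 6, 6, 6, 6, 6, 6),
  day   = c(1, 2, 2, 3, 3, 4, 5),
  time  = c("0:00", "6:00", "0:00", "18:00", "12:00", "0:00", "6:00")
)

# Month/day pairs that occur exactly once in data1.
single <- data.frame(month = c(6, 6, 6), day = c(1, 4, 5))

# `==` compares element i with element i, recycling the shorter vector
# (R warns here because 7 is not a multiple of 3).
elementwise <- suppressWarnings(data1$day == single$day)

# `%in%` on a combined key asks "is this row's pair in the set of pairs?".
key_all    <- paste(data1$month, data1$day, sep = ".")
key_single <- paste(single$month, single$day, sep = ".")
member     <- key_all %in% key_single

# Rows of data1 whose month/day pair occurs only once.
data1[member, ]
```

With this toy data, `elementwise` is TRUE only where the recycled values happen to coincide, whereas `member` flags every row whose month/day pair is in the single-location set.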

岁月染过的梦 2025-01-03 17:18:05

After this:

library(plyr)
dog<-count(data1,vars=c("MONTH","day"))

try this:

indx = which(dog$freq==1)
data1[indx,]
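
Indexing data1 this way relies on the positions in the count table matching the positions in data1, which holds only when every MONTH/day group has a single row. A possible alternative, sketched here in base R under the assumption that the goal is one count per row of data1, is to compute the group size with `ave()`. The toy data and the names `n_per_day` and `datasub2` are illustrative.

```r
# Toy data shaped like the question's table (values are invented).
data1 <- data.frame(
  MONTH = c(6, 6, 6, 6, 6, 6, 6),
  day   = c(1, 2, 2, 3, 3, 4, 5),
  time  = c("0:00", "6:00", "0:00", "18:00", "12:00", "0:00", "6:00")
)

# Size of each MONTH/day group, repeated for every row of data1,
# so the result has the same length and order as data1 itself.
n_per_day <- ave(seq_len(nrow(data1)), data1$MONTH, data1$day, FUN = length)

# Keep only rows that are the sole location for their day.
datasub2 <- data1[n_per_day == 1, ]
datasub2
```

Because `n_per_day` is aligned with the rows of data1, the logical index can be applied to data1 directly without mapping back from a separate count table.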

半衬遮猫 2025-01-03 17:18:05

data1[rownames(datasub1), ]

This is an extension of the OP's original thinking, though it may not be what they're after. It is really just what Wesley suggested, carrying the OP's original steps one step further (minus the bro/sis part, which confused me a bit for the same reason DWin gave :)). You're after the row names, not really the values in those columns. You've already got that information: the row names carry it back to the original data set.

n <- 100
data1 <- data.frame(
    Accuracy = round(runif(n, 0, 5), 1),
    MONTH    = sample(1:5, n, replace=TRUE),
    day      = sample(1:28, n, replace=TRUE),
    Easting  = rnorm(n),
    Northing = rnorm(n),
    Etc      = rnorm(n)
)


library(plyr)
dog<-count(data1,vars=c("MONTH","day"))
datasub1<-subset(dog,freq==1)

data1[rownames(datasub1), ]
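
Whether the row names really point back at the right rows of data1 is worth checking on a small example. The sketch below assumes the count table has one row per MONTH/day group, numbered 1..k, as in the `dog` output shown in the first answer. It uses base R's `aggregate()` as a stand-in for `plyr::count()`, and the names `group_rows` and `datasub2` are illustrative.

```r
# Toy data shaped like the question's table (values are invented).
data1 <- data.frame(
  MONTH = c(6, 6, 6, 6, 6, 6, 6),
  day   = c(1, 2, 2, 3, 3, 4, 5),
  time  = c("0:00", "6:00", "0:00", "18:00", "12:00", "0:00", "6:00")
)

# One row per MONTH/day group with its count (stand-in for plyr::count).
dog <- aggregate(freq ~ MONTH + day, data = cbind(data1, freq = 1), FUN = sum)
datasub1 <- subset(dog, freq == 1)

# Row names of datasub1 are positions in dog (groups), not in data1.
group_rows <- as.integer(rownames(datasub1))
data1$day[group_rows]   # days found at those positions in data1

# Joining on the key columns maps the groups back to the original rows.
datasub2 <- merge(data1, datasub1[, c("MONTH", "day")], by = c("MONTH", "day"))
datasub2
```

In this toy case the group positions land on different days in data1, while the `merge()` on MONTH and day returns the single-location rows, so joining on the key columns appears to be the safer way back to the original data.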
