tapply-like problem, but need data frame output - R

Posted 2024-12-27 21:20:08

This is my first post, so hopefully I explain properly what I need to do. I am still quite new to R, and I may have read posts that answer this, but I can't for the life of me understand what they mean. Apologies in advance if this has already been answered.

I have a very large data set of GPS locations from radio collars, with an inconsistent number of locations for each day. I want to go through the data set and select a single data point for each day based on the accuracy level of the GPS signal.

So it essentially looks like this.

Accuracy    Month    Day    Easting    Northing    Etc
   5          6       1     #######    ########     #
   3.2        6       1     #######    ########     #
   3.8        6       1     #######    ########     #
   1.6        6       2     #######    ########     #
   4          6       3     #######    ########     #
   3.2        6       3     #######    ########     #

I want to pull out the most accurate point for each day (the one with the lowest accuracy value) while keeping the rest of the associated data.

Currently I have been using the tapply function:

datasub1<-subset(data,MONTH==6)
tapply(datasub1$accuracy, datasub1$day, min)

Using this method I can successfully retrieve the minimum values, one for each day. However, I cannot carry along the associated coordinates, timing, and all the other important information, and since the data set has nearly 300,000 rows, I really can't do it by hand.

So essentially, I need the same result as the tapply call, but instead of individual values I need the entire row in which each value is found.

Thanks in advance to anyone that could lend a hand. If you need any more information, let me know, I'll try my best to get it to you.

Comments (3)

我不咬妳我踢妳 2025-01-03 21:20:08

You can use ddply from the plyr package: it cuts a data.frame into pieces (here, one per Month/Day combination) and applies a function to each piece.

# Sample data
n <- 100
d <- data.frame(
  Accuracy = round(runif(n, 0, 5), 1),
  Month    = sample(1:2, n, replace=TRUE),
  Day      = sample(1:5, n, replace=TRUE),
  Easting  = rnorm(n),
  Northing = rnorm(n),
  Etc      = rnorm(n)
)

# Extract the row with the minimum Accuracy for each day
# (In case of ties, only the first row is kept)
library(plyr)
ddply( 
  d, 
  c("Month", "Day"), 
  function (u) u[ which.min(u$Accuracy), ] 
)
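
The tie behaviour mentioned in the comment above can be checked on a tiny example. This is only a sketch in base R: the three-row data frame `u` and its values are invented for illustration and stand in for one day's piece as ddply would pass it to the function.

```r
# Toy piece (invented values): two rows are tied at the minimum Accuracy of 3.2
u <- data.frame(
  Accuracy = c(5, 3.2, 3.2),
  Day      = c(1, 1, 1),
  Easting  = c(101, 102, 103)
)

# which.min() returns the index of the first minimum, so only one row survives
first_min <- u[which.min(u$Accuracy), ]

# Comparing against min() keeps every tied row instead
all_min <- u[u$Accuracy == min(u$Accuracy), ]
```

Which of the two is preferable depends on whether exactly one fix per day is required or whether all equally accurate fixes should be kept.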

淡淡绿茶香 2025-01-03 21:20:08

This is one base R solution using the split-apply paradigm, which formed the basis for the plyr functions, at least in the beginning. It returns a list with one piece per Month/Day combination:

lapply(
  split(dat, list(dat$Month, dat$Day)),
  function(d) d[which.min(d$Accuracy), ]
)
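
Since the question asks for data frame output, the list returned by lapply can be stacked back into a single data frame. The following is a minimal sketch of that extra step, assuming the column names from the question's table; the toy `dat` and its Easting values are invented, and `drop = TRUE` is added here so that Month/Day combinations with no rows are skipped.

```r
# Toy data (invented values): three fixes on day 1, one on day 2, two on day 3
dat <- data.frame(
  Accuracy = c(5, 3.2, 3.8, 1.6, 4, 3.2),
  Month    = c(6, 6, 6, 6, 6, 6),
  Day      = c(1, 1, 1, 2, 3, 3),
  Easting  = c(11, 12, 13, 14, 15, 16)
)

# One piece per Month/Day combination that actually occurs in the data
pieces <- split(dat, list(dat$Month, dat$Day), drop = TRUE)

# Most accurate row of each piece (the first one in case of ties)
best_list <- lapply(pieces, function(d) d[which.min(d$Accuracy), ])

# Stack the single-row data frames into one data frame
best <- do.call(rbind, best_list)
rownames(best) <- NULL
```

Here `best` has one full row per day, with all the other columns carried along.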

水水月牙 2025-01-03 21:20:08

So you don't really want to aggregate at all. All you need to do is find the minimum for each day and select the rows that match it.

mins <- ave(datasub1$accuracy, datasub1$day, FUN = min)
datasub1[ datasub1$accuracy == mins, ]

If you need day within month, year, or whatever, just pass those to ave as further grouping arguments. Here's an alternate syntax using with():

mins <- with( datasub1, ave(accuracy, day, month, FUN = min) )
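
Put together, the two-variable version might look like the sketch below. The toy `datasub1` and its values are invented, with lower-case column names chosen to match the code above. Note that selecting with `==` keeps every row tied for the minimum, so the last step, which is an addition and not part of the answer, keeps only the first row per month/day when exactly one row per day is wanted.

```r
# Toy data (invented values): month 6, day 1 has two rows tied at 3.2
datasub1 <- data.frame(
  accuracy = c(5, 3.2, 3.2, 1.6, 4, 3.2, 2.5),
  month    = c(6, 6, 6, 6, 7, 7, 7),
  day      = c(1, 1, 1, 2, 1, 1, 2),
  easting  = c(11, 12, 13, 14, 15, 16, 17)
)

# Per-row copy of the minimum accuracy within each day/month group
mins <- with(datasub1, ave(accuracy, day, month, FUN = min))

# Rows whose accuracy equals their group minimum (tied rows are all kept)
best <- datasub1[datasub1$accuracy == mins, ]

# Optional: keep only the first row for each month/day combination
best_one <- best[!duplicated(best[c("month", "day")]), ]
```

The exact `==` comparison is safe here because `mins` holds copies of values taken from `accuracy` itself.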