How do I select the first row of an R data frame that meets certain criteria?

Posted on 2024-12-15 15:29:40

How do I select the first row of an R data frame that meets certain criteria?

Here is the context:

I have a data frame with five columns:

"pixel", "year", "propvar", "component", "cumsum"

There are 1,225 combinations of pixel and year, because the data was computed from the annual time series of 49 geographic pixels for each of 25 study years. Within each pixel-year, I have computed propvar, the proportion of total variance explained by a given component of the fast Fourier transform for the time series of a given pixel-year. I then computed cumsum, which is the cumulative sum of propvar for each frequency component within a pixel-year. The component column just gives you an index for the Fourier series component (plus 1) from which propvar was calculated.

I want to determine the number of components required to explain greater than 99% of the variance. I figure one way to do this is to find the first row within each pixel-year where cumsum > 0.99, and create a data frame from it with three columns, pixel, year, and numbercomps, where numbercomps is the number of components required within a given pixel-year to explain greater than 99% of the variance. I do not know how to do this in R. Does anyone have a solution?

Comments (2)

残花月 2024-12-22 15:29:40

Sure. Something like this should do the trick:

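# Create a reproducible example.
# Note: cumsum is on a percent scale here (99 = 99%) and the threshold
# used below is >= 99; for the question's data, use cumsum > 0.99 instead.
df <- data.frame(year = c("2001", "2003", "2001", "2003", "2003"),
                 pixel = c("a", "b", "a", "b", "a"),
                 cumsum = c(99, 99, 98, 99, 99),
                 numbercomps = 1:5)
df
#   year pixel cumsum numbercomps
# 1 2001     a     99           1
# 2 2003     b     99           2
# 3 2001     a     98           3
# 4 2003     b     99           4
# 5 2003     a     99           5

# Keep only the rows that meet the threshold.
res <- subset(df, cumsum >= 99)
# Within each year-pixel pair, keep the first remaining row
# (this relies on the rows already being in the desired order).
res <- subset(res,
              subset = !duplicated(res[c("year", "pixel")]),
              select = c("pixel", "year", "numbercomps"))
res
#   pixel year numbercomps
# 1     a 2001           1
# 2     b 2003           2
# 5     a 2003           5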

EDIT: Also, for those interested in data.table, there is this:

library(data.table)
# Key by pixel and year, then take the first qualifying row in each group.
dt <- data.table(df, key = c("pixel", "year"))
dt[cumsum >= 99, .SD[1], by = key(dt)]
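
For the layout described in the question (cumsum as a proportion between 0 and 1, with a component column), the same idea might be sketched in base R as follows. The toy values are made up for illustration, and the sketch assumes that component counts from 1, so that the component value of the first qualifying row equals the number of components needed. If your index is offset differently, adjust numbercomps accordingly.

```r
# Hypothetical toy data: 2 pixels, 1 year, 4 components each.
df <- data.frame(
  pixel     = rep(c("a", "b"), each = 4),
  year      = 2001,
  component = rep(1:4, times = 2),
  propvar   = c(0.90, 0.095, 0.004, 0.001,
                0.60, 0.30,  0.095, 0.005)
)

# Sort so that "first row" means "lowest component" within each pixel-year.
df <- df[order(df$pixel, df$year, df$component), ]

# Cumulative proportion of variance within each pixel-year.
df$cumsum <- ave(df$propvar, df$pixel, df$year, FUN = cumsum)

# Rows above the threshold, then the first such row per pixel-year.
hits <- df[df$cumsum > 0.99, ]
first_hits <- hits[!duplicated(hits[c("pixel", "year")]), ]

# Assumption: component is 1-based, so it doubles as the component count.
result <- data.frame(pixel       = first_hits$pixel,
                     year        = first_hits$year,
                     numbercomps = first_hits$component)
result
```

Sorting before filtering matters here: duplicated() keeps whichever qualifying row comes first, so the rows must already be ordered by component within each pixel-year.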
笑忘罢 2024-12-22 15:29:40

Let's assume df is the data set from which we have to select the first row that meets the criteria.
These two lines of code will give you the required row (<criteria> is a placeholder for a logical condition on df).

row_index <- which(<criteria>, arr.ind = TRUE)[1]

df_required <- df[row_index,]
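
As written, the two lines above return a single row for the whole data frame. To get one row per pixel-year, one option is to apply the same which(...)[1] idea to each group separately. The sketch below is one possible way to do that; the toy data and the helper name first_match are hypothetical. Note that which(...)[1] is NA when no row matches, and df[NA, ] would return a row of NAs, so the helper skips such groups.

```r
# Hypothetical toy data: pixel "b" never exceeds the threshold.
df <- data.frame(
  pixel     = c("a", "a", "a", "b", "b"),
  year      = c(2001, 2001, 2001, 2001, 2001),
  component = c(1, 2, 3, 1, 2),
  cumsum    = c(0.95, 0.995, 1.00, 0.70, 0.98)
)

# Hypothetical helper: first row of d with cumsum > 0.99, or NULL if none.
first_match <- function(d) {
  row_index <- which(d$cumsum > 0.99)[1]  # NA when nothing matches
  if (is.na(row_index)) return(NULL)
  d[row_index, ]
}

# Split into pixel-year groups, apply the helper, drop empty results.
groups <- split(df, list(df$pixel, df$year), drop = TRUE)
hits   <- Filter(Negate(is.null), lapply(groups, first_match))
result <- do.call(rbind, hits)
result
```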