R - Mean of numbers in a sequence within a data frame column

Posted 2025-01-23 07:51:35 · 372 characters · 0 views · 0 comments

I am trying to calculate the mean of the values in one column (y) for each run of consecutive integers (1, 2, 3, etc.) in another column (x). An example data frame is shown below.

> df
   x  y
1  1 15
2  2 20
3  4 16
4  5 12
5  6 17
6  8 14
7  9 13
8 10 19

I would like a vector containing the mean of y for each run of consecutive x values. For the example above, the desired vector would read: 17.5 15 15.33333

I am not sure of the best way to produce this result. I tried to write a for loop that uses diff(df[,1]) to find the breakpoints, but could not get it to work.

Any help anyone could provide would be appreciated. This is a small example dataset, but the goal is to apply it to a large dataset.

Comments (2)

逆流 2025-01-30 07:51:35

Create a grouping vector by applying cumsum to a logical vector built from diff(x): a new group starts wherever consecutive x values do not differ by exactly 1.

with(df, tapply(y, cumsum(c(TRUE, diff(x) != 1)), FUN = mean))

Output:

       1        2        3 
17.50000 15.00000 15.33333 
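
To make the one-liner easier to follow, here is a minimal sketch that builds the grouping vector step by step on the example data. The intermediate names (step, new_run, run_id, run_means) are only illustrative and are not part of the answer above; as.vector() is added here to drop the group labels, since the question asks for a plain vector.

```r
# Example data from the question
df <- data.frame(x = c(1L, 2L, 4L, 5L, 6L, 8L, 9L, 10L),
                 y = c(15L, 20L, 16L, 12L, 17L, 14L, 13L, 19L))

# Gap between each x and the one before it: 1 2 1 1 2 1 1
step <- diff(df$x)

# TRUE marks the first row of each run; the first row always starts a run
new_run <- c(TRUE, step != 1)

# The running count of TRUEs gives one id per run: 1 1 2 2 2 3 3 3
run_id <- cumsum(new_run)

# Mean of y within each run, returned as an unnamed numeric vector
run_means <- as.vector(tapply(df$y, run_id, mean))
run_means
```

Because diff, cumsum and tapply are all vectorised, no explicit for loop over the breakpoints is needed.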

Data:

df <- structure(list(x = c(1L, 2L, 4L, 5L, 6L, 8L, 9L, 10L), y = c(15L, 
20L, 16L, 12L, 17L, 14L, 13L, 19L)), class = "data.frame",
 row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))
故人的歌 2025-01-30 07:51:35

Update: the code can be shortened by two lines:

library(dplyr)

df %>% 
  group_by(id_Group = cumsum(x - lag(x, default = x[1]) >= 2)) %>% 
  summarise(mean = mean(y, na.rm = TRUE)) %>% 
  pull(mean)

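Here is the original dplyr version:

  1. calculate the difference between x and its lagged value
  2. create the groups with cumsum(diff >= 2)
  3. calculate the mean per group and pull it out as a vector.

library(dplyr)

df %>% 
  mutate(diff = x - lag(x, default = x[1])) %>%  # gap to the previous x (0 for the first row)
  group_by(id_Group = cumsum(diff >= 2)) %>%     # a gap of 2 or more starts a new group
  mutate(mean = mean(y, na.rm = TRUE)) %>%       # group mean, repeated on every row
  slice(1) %>%                                   # keep one row per group
  pull(mean)
[1] 17.50000 15.00000 15.33333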
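
One point worth checking before applying either answer to a large dataset: the two answers use slightly different break rules (diff(x) != 1 versus a gap >= 2). They agree on the example data, where x is strictly increasing, but can differ if x contains repeats or decreases. The sketch below uses a made-up vector x to illustrate this in base R; it is an illustration under that assumption, not part of either answer.

```r
# Hypothetical input with a repeated value (2, 2) and a decrease (5, 4)
x <- c(1, 2, 2, 3, 5, 4)
d <- diff(x)                                # 1 0 1 2 -1

# Rule from the tapply answer: any step other than exactly +1 starts a new group
groups_not_one <- cumsum(c(TRUE, d != 1))   # 1 1 2 2 3 4

# Rule from the dplyr answer: only a gap of 2 or more starts a new group
# (the leading FALSE mirrors lag(x, default = x[1]), which gives a gap of 0)
groups_gap_two <- cumsum(c(FALSE, d >= 2))  # 0 0 0 0 1 1

groups_not_one
groups_gap_two
```

If x is known to be sorted with no duplicates, both rules produce the same grouping.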