Summarizing time information into a dynamic matrix

Posted on 2025-01-26 01:07:52

I have a data frame like df:

id <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "E")
year <- c("2005", "2006", "2007", "2008", "2005", "2006", "2007", "2005", "2007", "2006", "2007", "2008")
value <- 1:12
df <- data.frame(id, year, value)

I want to convert df into a matrix id_observed in which each column is the year in which ids are observed for the first time and each row is a year of observation. A cell counts how many of the ids first seen in the column year are observed in (have "survived" to) the row year:

id_observed <- matrix(c(3,2,3,1,0,1,1,0,0,0,0,0,0,0,0,1), nrow = 4, ncol = 4)
# Columns: year in which ids are first observed; rows: year in which they are observed again
colnames(id_observed) <- c("2005", "2006", "2007", "2008")
rownames(id_observed) <- c("2005", "2006", "2007", "2008")
id_observed

The same idea applies to a matrix value_observed built from value. Each column is again the year in which ids are first observed, and each cell holds the summed value, in the row year, of the ids from that column that "survived" to the row year:

value_observed <- matrix(c(14,8,19,4,0,10,11,0,0,0,0,0,0,0,0,12), nrow = 4, ncol = 4)
# Columns: year in which ids are first observed; rows: summed value of those ids in each observation year
colnames(value_observed) <- c("2005", "2006", "2007", "2008")
rownames(value_observed) <- c("2005", "2006", "2007", "2008")
value_observed

Any clue on how to build the matrices id_observed and value_observed automatically?

掀纱窥君容 2025-02-02 01:07:52

You can create a function, get_matrix(), that takes a tidyverse approach: loop over the unique years, build the data for the ids first observed in each year, bind the rows, and then pivot wider:

library(tidyverse)

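get_matrix <- function(df, type=c("value","id")) {
  # resolve the default vector to a single choice, so that if() gets a length-1 condition
  type = match.arg(type)
  res = lapply(unique(df$year), function(y) {
    # keep only the ids first observed in year y, then group by observation year
    d = df %>% group_by(id) %>% filter(min(year)==y) %>% group_by(year)
    if(type == "value") d = summarize(d,n=sum(value))
    else d = summarize(d,n=n())
    d = mutate(d,y=y)
    # if no id is first observed in year y, return a zero row so the column still exists
    if(nrow(d)==0) return(tibble(year=y, n=0, y=y)) else return(d)
  })
  bind_rows(res) %>%
    pivot_wider(id_cols = year,names_from = y,values_from = n,values_fill = 0)
}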

Usage

get_matrix(df, type="value")

Output

  year  `2005` `2006` `2007` `2008`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>
1 2005      14      0      0      0
2 2006       8     10      0      0
3 2007      19     11      0      0
4 2008       4      0      0     12

Usage

get_matrix(df, type="id")

Output

  year  `2005` `2006` `2007` `2008`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>
1 2005       3      0      0      0
2 2006       2      1      0      0
3 2007       3      1      0      0
4 2008       1      0      0      1
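
Since the question asks for plain matrices rather than a tibble, here is a minimal base R sketch of the same idea, assuming the df from the question. The name cohort_matrix() and its helper variables are made up for illustration and are not part of the answer above. It tags each row with the first year in which its id appears and then cross-tabulates with tapply():

```r
# Data from the question
id <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "E")
year <- c("2005", "2006", "2007", "2008", "2005", "2006", "2007", "2005", "2007", "2006", "2007", "2008")
value <- 1:12
df <- data.frame(id, year, value)

# cohort_matrix() is a hypothetical helper name, not from any package
cohort_matrix <- function(df, type = c("id", "value")) {
  type  <- match.arg(type)
  yr    <- as.integer(as.character(df$year))
  first <- ave(yr, df$id, FUN = min)      # first year in which each id is observed
  years <- sort(unique(yr))
  # weight 1 per row counts ids; the value column sums values instead
  w <- if (type == "id") rep(1, nrow(df)) else df$value
  # rows: year of observation, columns: year first observed
  m <- tapply(w, list(factor(yr, levels = years), factor(first, levels = years)), sum)
  m[is.na(m)] <- 0                        # empty cells come back as NA
  m
}

id_observed <- cohort_matrix(df, "id")
value_observed <- cohort_matrix(df, "value")
```

Using factors with all years as levels keeps a column for 2007 even though no id is first observed in that year.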

Update:

data.table option

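library(data.table)

# note: setDT() converts df by reference, and df is overwritten below
setDT(df)[, year:=as.integer(year)]
syears = unique(df$year)
# y = first year in which each id is observed; then count ids per (y, year)
df = df[, y:=min(year), by = id][, .SD[,.N, year], by=y]
# add a zero row for any year in which no id is first observed, then cast to wide format
dcast(
  rbind(df,data.table(y=setdiff(syears, unique(df$y)))[,`:=`(year=y,N=0)]),
  year~y, value.var="N"
)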

Output (empty cells are NA here; passing fill = 0 to dcast() should give zeros instead):

    year  2005  2006  2007  2008
   <int> <num> <num> <num> <num>
1:  2005     3    NA    NA    NA
2:  2006     2     1    NA    NA
3:  2007     3     1     0    NA
4:  2008     1    NA    NA     1
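
The result above still has NA in the empty cells and is a table rather than a matrix. As a rough sketch of the last step, the wide table can be turned into the matrix the question asks for in base R. The data frame wide below is retyped by hand from the printed output so that the snippet runs on its own, and the names wide and m are made up for illustration:

```r
# Hand-copied from the printed data.table output (an assumption for
# illustration; in practice this would be the object returned by dcast())
wide <- data.frame(
  year   = 2005:2008,
  `2005` = c(3, 2, 3, 1),
  `2006` = c(NA, 1, 1, NA),
  `2007` = c(NA, NA, 0, NA),
  `2008` = c(NA, NA, NA, 1),
  check.names = FALSE
)

m <- as.matrix(wide[, -1])               # drop the year column, keep the counts
rownames(m) <- as.character(wide$year)   # observation years become row names
m[is.na(m)] <- 0                         # treat "never observed" as zero
m
```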
