使每年包含 12 个月的大型数据集

发布于 2025-01-12 22:36:01 字数 505 浏览 0 评论 0原文

我有一个关于鲑鱼养殖（2005-2020）的变量的大型数据框。它包含来自数百个不同农场 (org_anonym) 长达 15 年的数据。然而，许多农场缺少某些月份或有重复的月份。我怎样才能这样写，以便每个位置每年都有 12 个月（顺序为 1-12）？

示例：

在此示例中，农场 126 缺少 2005 年的第 12 个月，而 2006 年只有第 11 个月和第 12 个月。有时，同一年有两个连续的行与同一月。

我期望的结果是让所有地点的年份为 2005-2020 年，月份为 1-12，没有重复或缺失月份（填充行中的数据可以为 0 或 NA）。

我没有直观的方法来做到这一点，因为错误是随机的。

请帮忙:)

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

星軌x 2025-01-19 22:36:01

salmon %>% distinct(year) %>%
  mutate(month = list(1:12)) %>% 
  unnest(month) %>% 
  left_join(salmon, by=c("year", "month")

如果您希望 org_acronym 也被保留，只需将上面更改为 distinct(year, acronym) 即可。

如果您想要 2005:2020 以来的所有年份，只需将上面更改为

tibble(year=2005:2020) %>% 
  mutate(month=list(1:12)) %>% 
  unnest(month) %>% 
  left_join(salmon)

输出：

   year month org_acronym
1  2005     1         126
2  2005     2         126
3  2005     3         126
4  2005     4         126
5  2005     5         126
6  2005     6         126
7  2005     7         126
8  2005     8         126
9  2005     9         126
10 2005    10         126
11 2005    11         126
12 2005    12          NA
13 2006     1          NA
14 2006     2          NA
15 2006     3          NA
16 2006     4          NA
17 2006     5          NA
18 2006     6          NA
19 2006     7          NA
20 2006     8          NA
21 2006     9          NA
22 2006    10          NA
23 2006    11         126
24 2006    12         126
25 2007     1         126
26 2007     2          NA
27 2007     3          NA
28 2007     4          NA
29 2007     5          NA
30 2007     6          NA
31 2007     7          NA
32 2007     8          NA
33 2007     9          NA
34 2007    10          NA
35 2007    11          NA
36 2007    12          NA

输入：

salmon  =tibble(
  year = c(rep(2005,11), rep(2006,2),2007),
  month = c(seq(1:11), 11,12,1),
  org_acronym = 126
)

salmon %>% distinct(year) %>%
  mutate(month = list(1:12)) %>% 
  unnest(month) %>% 
  left_join(salmon, by=c("year", "month")

If you want org_acronym to also be carried over, just change above to distinct(year, acronym).

If you want all the years from 2005:2020, just change above to

tibble(year=2005:2020) %>% 
  mutate(month=list(1:12)) %>% 
  unnest(month) %>% 
  left_join(salmon)

Output:

   year month org_acronym
1  2005     1         126
2  2005     2         126
3  2005     3         126
4  2005     4         126
5  2005     5         126
6  2005     6         126
7  2005     7         126
8  2005     8         126
9  2005     9         126
10 2005    10         126
11 2005    11         126
12 2005    12          NA
13 2006     1          NA
14 2006     2          NA
15 2006     3          NA
16 2006     4          NA
17 2006     5          NA
18 2006     6          NA
19 2006     7          NA
20 2006     8          NA
21 2006     9          NA
22 2006    10          NA
23 2006    11         126
24 2006    12         126
25 2007     1         126
26 2007     2          NA
27 2007     3          NA
28 2007     4          NA
29 2007     5          NA
30 2007     6          NA
31 2007     7          NA
32 2007     8          NA
33 2007     9          NA
34 2007    10          NA
35 2007    11          NA
36 2007    12          NA

Input:

salmon  =tibble(
  year = c(rep(2005,11), rep(2006,2),2007),
  month = c(seq(1:11), 11,12,1),
  org_acronym = 126
)

回复收藏 0 原文

甜心 2025-01-19 22:36:01

这是一个类似的示例，其中我只使用“6 个月的年份”，因此更具可读性。通过小例子对内容进行排序会更容易。

library(plyr) # join and arrange function
# Example where i miss last month of 2005 
# and only have 2 first month for 2006
df <- data.frame( 
  year = c(rep(2005, 5), rep(2006, 2)), 
  month = c(1:5, 4, 5 ),
  value = 126)

# I create a data.frame of all coombination of year and month.
# You can replace year here by 2005:2020 and month by 1:12
tmp <- expand.grid(year = unique(df$year), month = 1:6) 

res <- join(tmp, df, type = "left") # join will add NA for missing values
res <- arrange(res, year, month, value) # arrange will order the rows.
res
#    year month value
# 1  2005     1   126
# 2  2005     2   126
# 3  2005     3   126
# 4  2005     4   126
# 5  2005     5   126
# 6  2005     6    NA
# 7  2006     1    NA
# 8  2006     2    NA
# 9  2006     3    NA
# 10 2006     4   126
# 11 2006     5   126
# 12 2006     6    NA

更短的方法是使用函数complete()，但这要求您的数据帧至少每年和每月出现一次。在我的示例中，这并不完全有效，因为我没有任何带有“第六”个月的年份。

library(tidyr)
df %>% complete(year, month)

另外，complete() 只是 expand() 和 join() 的包装，因此您最好了解第一个过程中发生的情况解决方案。

here is a similar example where I only work with a "6 month year" so it's more readible. It's easier to sort stuff with small examples.

library(plyr) # join and arrange function
# Example where i miss last month of 2005 
# and only have 2 first month for 2006
df <- data.frame( 
  year = c(rep(2005, 5), rep(2006, 2)), 
  month = c(1:5, 4, 5 ),
  value = 126)

# I create a data.frame of all coombination of year and month.
# You can replace year here by 2005:2020 and month by 1:12
tmp <- expand.grid(year = unique(df$year), month = 1:6) 

res <- join(tmp, df, type = "left") # join will add NA for missing values
res <- arrange(res, year, month, value) # arrange will order the rows.
res
#    year month value
# 1  2005     1   126
# 2  2005     2   126
# 3  2005     3   126
# 4  2005     4   126
# 5  2005     5   126
# 6  2005     6    NA
# 7  2006     1    NA
# 8  2006     2    NA
# 9  2006     3    NA
# 10 2006     4   126
# 11 2006     5   126
# 12 2006     6    NA

A shorter way is to use the function complete(), but this require that your dataframe has an occurence of at least each year and month. In my example this won't exactly work since I don't have any year with the "sixth" month.

library(tidyr)
df %>% complete(year, month)

Also complete() is only a wrapper around expand() and join(), so it's better for you to understand what happens during the first solution.

回复收藏 0 原文

~没有更多了~

关于作者

药祭#氼

暂无简介

文章

26 人气

关注发私信

友情链接

文江博客

使每年包含 12 个月的大型数据集

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

峩卟喜欢

一向肩并

潜伏

dongyinghao

百变从容

沧笙踏歌

友情链接

使每年包含 12 个月的大型数据集

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

峩卟喜欢

一向肩并

潜伏

dongyinghao

百变从容

沧笙踏歌

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。