使每年包含 12 个月的大型数据集

发布于 2025-01-12 22:36:01 字数 505 浏览 0 评论 0原文

我有一个关于鲑鱼养殖(2005-2020)的变量的大型数据框。它包含来自数百个不同农场 (org_anonym) 长达 15 年的数据。然而,许多农场缺少某些月份或有重复的月份。我怎样才能这样写,以便每个位置每年都有 12 个月(顺序为 1-12)?

示例:

在此处输入图像描述

在此示例中,农场 126 缺少 2005 年的第 12 个月,而 2006 年只有第 11 个月和第 12 个月。有时,同一年有两个连续的行与同一月。

我期望的结果是让所有地点的年份为 2005-2020 年,月份为 1-12,没有重复或缺失月份(填充行中的数据可以为 0 或 NA)。

我没有直观的方法来做到这一点,因为错误是随机的。

请帮忙:)

I have a large data.frame of variables regarding salmon farming (2005-2020). It contains data from hundreds of different farms (org_anonym) for all 15 years. However, many farms are missing some months or have duplicate months. How can I write this so that every year for every location has 12 months in the order 1-12?

Example:

enter image description here

In this example, farm 126 is missing the 12th month of the year for 2005, whereas 2006 has only the 11th and 12th month. Sometimes the same year has two consecutive rows with the same month.

My desired outcome is to have all locations have years 2005-2020 with months 1-12 without duplicates or missing months (the data in the filled rows can be 0 or NA).

I don't have an intuitive way of doing this since the errors are random.

Please help :)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

星軌x 2025-01-19 22:36:01
salmon %>% distinct(year) %>%
  mutate(month = list(1:12)) %>% 
  unnest(month) %>% 
  left_join(salmon, by=c("year", "month")

如果您希望 org_acronym 也被保留,只需将上面更改为 distinct(year, acronym) 即可。

如果您想要 2005:2020 以来的所有年份,只需将上面更改为

tibble(year=2005:2020) %>% 
  mutate(month=list(1:12)) %>% 
  unnest(month) %>% 
  left_join(salmon)

输出:

   year month org_acronym
1  2005     1         126
2  2005     2         126
3  2005     3         126
4  2005     4         126
5  2005     5         126
6  2005     6         126
7  2005     7         126
8  2005     8         126
9  2005     9         126
10 2005    10         126
11 2005    11         126
12 2005    12          NA
13 2006     1          NA
14 2006     2          NA
15 2006     3          NA
16 2006     4          NA
17 2006     5          NA
18 2006     6          NA
19 2006     7          NA
20 2006     8          NA
21 2006     9          NA
22 2006    10          NA
23 2006    11         126
24 2006    12         126
25 2007     1         126
26 2007     2          NA
27 2007     3          NA
28 2007     4          NA
29 2007     5          NA
30 2007     6          NA
31 2007     7          NA
32 2007     8          NA
33 2007     9          NA
34 2007    10          NA
35 2007    11          NA
36 2007    12          NA

输入:

salmon  =tibble(
  year = c(rep(2005,11), rep(2006,2),2007),
  month = c(seq(1:11), 11,12,1),
  org_acronym = 126
)
salmon %>% distinct(year) %>%
  mutate(month = list(1:12)) %>% 
  unnest(month) %>% 
  left_join(salmon, by=c("year", "month")

If you want org_acronym to also be carried over, just change above to distinct(year, acronym).

If you want all the years from 2005:2020, just change above to

tibble(year=2005:2020) %>% 
  mutate(month=list(1:12)) %>% 
  unnest(month) %>% 
  left_join(salmon)

Output:

   year month org_acronym
1  2005     1         126
2  2005     2         126
3  2005     3         126
4  2005     4         126
5  2005     5         126
6  2005     6         126
7  2005     7         126
8  2005     8         126
9  2005     9         126
10 2005    10         126
11 2005    11         126
12 2005    12          NA
13 2006     1          NA
14 2006     2          NA
15 2006     3          NA
16 2006     4          NA
17 2006     5          NA
18 2006     6          NA
19 2006     7          NA
20 2006     8          NA
21 2006     9          NA
22 2006    10          NA
23 2006    11         126
24 2006    12         126
25 2007     1         126
26 2007     2          NA
27 2007     3          NA
28 2007     4          NA
29 2007     5          NA
30 2007     6          NA
31 2007     7          NA
32 2007     8          NA
33 2007     9          NA
34 2007    10          NA
35 2007    11          NA
36 2007    12          NA

Input:

salmon  =tibble(
  year = c(rep(2005,11), rep(2006,2),2007),
  month = c(seq(1:11), 11,12,1),
  org_acronym = 126
)
甜心 2025-01-19 22:36:01

这是一个类似的示例,其中我只使用“6 个月的年份”,因此更具可读性。通过小例子对内容进行排序会更容易。

library(plyr) # join and arrange function
# Example where i miss last month of 2005 
# and only have 2 first month for 2006
df <- data.frame( 
  year = c(rep(2005, 5), rep(2006, 2)), 
  month = c(1:5, 4, 5 ),
  value = 126)

# I create a data.frame of all coombination of year and month.
# You can replace year here by 2005:2020 and month by 1:12
tmp <- expand.grid(year = unique(df$year), month = 1:6) 

res <- join(tmp, df, type = "left") # join will add NA for missing values
res <- arrange(res, year, month, value) # arrange will order the rows.
res
#    year month value
# 1  2005     1   126
# 2  2005     2   126
# 3  2005     3   126
# 4  2005     4   126
# 5  2005     5   126
# 6  2005     6    NA
# 7  2006     1    NA
# 8  2006     2    NA
# 9  2006     3    NA
# 10 2006     4   126
# 11 2006     5   126
# 12 2006     6    NA

更短的方法是使用函数complete(),但这要求您的数据帧至少每年和每月出现一次。在我的示例中,这并不完全有效,因为我没有任何带有“第六”个月的年份。

library(tidyr)
df %>% complete(year, month)

另外,complete() 只是 expand()join() 的包装,因此您最好了解第一个过程中发生的情况解决方案。

here is a similar example where I only work with a "6 month year" so it's more readible. It's easier to sort stuff with small examples.

library(plyr) # join and arrange function
# Example where i miss last month of 2005 
# and only have 2 first month for 2006
df <- data.frame( 
  year = c(rep(2005, 5), rep(2006, 2)), 
  month = c(1:5, 4, 5 ),
  value = 126)

# I create a data.frame of all coombination of year and month.
# You can replace year here by 2005:2020 and month by 1:12
tmp <- expand.grid(year = unique(df$year), month = 1:6) 

res <- join(tmp, df, type = "left") # join will add NA for missing values
res <- arrange(res, year, month, value) # arrange will order the rows.
res
#    year month value
# 1  2005     1   126
# 2  2005     2   126
# 3  2005     3   126
# 4  2005     4   126
# 5  2005     5   126
# 6  2005     6    NA
# 7  2006     1    NA
# 8  2006     2    NA
# 9  2006     3    NA
# 10 2006     4   126
# 11 2006     5   126
# 12 2006     6    NA

A shorter way is to use the function complete(), but this require that your dataframe has an occurence of at least each year and month. In my example this won't exactly work since I don't have any year with the "sixth" month.

library(tidyr)
df %>% complete(year, month)

Also complete() is only a wrapper around expand() and join(), so it's better for you to understand what happens during the first solution.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文