如何根据 R 中 for 循环的结果创建多个数据帧?

发布于 2025-01-09 14:14:18 字数 2143 浏览 1 评论 0原文

我有 11 个数据框,其中包含切萨皮克海草调查的各种观察结果。每个数据帧包含以下变量(包括示例值)。有 11 个数据帧,每个数据帧代表来自单个 SAMPYR 的观察结果。所以:

    > head(density.2007)
       PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS
    1  HI 2  1.0      50   2006   2007   1    6.0
    2  HI 5  0.5     100   2006   2007   1   11.6
    3  HI 7  0.5      50   2006   2007   1    6.0
    4  HI 9  0.5     100   2006   2007   1    9.6
    5 HI 10  1.0     100   2006   2007   1   30.0
    6 HI 23  1.0      50   2006   2007   1   40.4
                                               
 > head(density.2008)
   PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS NOTES id
29 HI 1  1.0     100   2007   2008   1   39.6       29
30 HI 2  1.0      50   2006   2008   2   54.8       30
31 HI 3  0.5     100   2007   2008   1   11.2       31
32 HI 4  1.0     100   2007   2008   1    8.8       32
33 HI 5  0.5     100   2006   2008   2   24.0       33
34 HI 7  0.5      50   2006   2008   2    0.0       34

我想编写一个 for 循环,从 PLOT 列中获取唯一字符的数量,并计算每个字符的频率(这样我就可以过滤,只列出那些出现多次的字符)。

到目前为止我所拥有的是:

density.names <- c("density.2007",
                   "density.2008",
                   "density.2009",
                   "density.2010",
                   "density.2011",
                   "density.2012",
                   "density.2013",
                   "density.2014",
                   "density.2015",
                   "density.2016",
                   "density.2017"
                   )

for(i in 1:length(density.names)) {
  get(density.names[i]) %>%
    count(PLOT) %>%
    print()
}  

此代码输出

+     print()
      PLOT n
1     HI 1 1
2    HI 10 1
3   HI 100 1
4   HI 103 1
5   HI 104 1
6    HI 11 1
7    HI 13 1
8    HI 14 1
9    HI 15 1
10   HI 17 1
11   HI 18 1
12    HI 2 1
13   HI 20 1
14   HI 21 1
15   HI 23 1
16   HI 25 1
17   HI 27 1
18   HI 29 1
19    HI 3 1
20   HI 31 1
21   HI 32 1
22   HI 36 1
23   HI 37 1
24   HI 38 1
25   HI 39 1
26    HI 4 1
27   HI 40 1

但我无法对此做任何进一步的事情。有没有办法让我过滤行,以便只显示 an=2 的行?或者从 for 循环中打印 11 个数据帧,这样我就可以进一步操作它们,但至少我会在全局环境中拥有它们的副本?

谢谢你!如果有帮助的话,我可以提供更多详细信息。

I have 11 dataframes with various observations from seagrass surveys in the Chesapeake. Each dataframe contains the following variables (with example values included). There are 11 dataframes as each one represents observations from a single SAMPYR. So:

    > head(density.2007)
       PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS
    1  HI 2  1.0      50   2006   2007   1    6.0
    2  HI 5  0.5     100   2006   2007   1   11.6
    3  HI 7  0.5      50   2006   2007   1    6.0
    4  HI 9  0.5     100   2006   2007   1    9.6
    5 HI 10  1.0     100   2006   2007   1   30.0
    6 HI 23  1.0      50   2006   2007   1   40.4
                                               
 > head(density.2008)
   PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS NOTES id
29 HI 1  1.0     100   2007   2008   1   39.6       29
30 HI 2  1.0      50   2006   2008   2   54.8       30
31 HI 3  0.5     100   2007   2008   1   11.2       31
32 HI 4  1.0     100   2007   2008   1    8.8       32
33 HI 5  0.5     100   2006   2008   2   24.0       33
34 HI 7  0.5      50   2006   2008   2    0.0       34

I would like to write a for loop that takes the number of unique characters from the PLOT column, and calculates the frequency of each one (so I can then filter so only those that appear more than once are listed).

What I have so far is:

density.names <- c("density.2007",
                   "density.2008",
                   "density.2009",
                   "density.2010",
                   "density.2011",
                   "density.2012",
                   "density.2013",
                   "density.2014",
                   "density.2015",
                   "density.2016",
                   "density.2017"
                   )

for(i in 1:length(density.names)) {
  get(density.names[i]) %>%
    count(PLOT) %>%
    print()
}  

This code outputs

+     print()
      PLOT n
1     HI 1 1
2    HI 10 1
3   HI 100 1
4   HI 103 1
5   HI 104 1
6    HI 11 1
7    HI 13 1
8    HI 14 1
9    HI 15 1
10   HI 17 1
11   HI 18 1
12    HI 2 1
13   HI 20 1
14   HI 21 1
15   HI 23 1
16   HI 25 1
17   HI 27 1
18   HI 29 1
19    HI 3 1
20   HI 31 1
21   HI 32 1
22   HI 36 1
23   HI 37 1
24   HI 38 1
25   HI 39 1
26    HI 4 1
27   HI 40 1

But I can't do anything further with that. Is there a way for me to filter rows so only those with a n=2 show up? Or to print 11 dataframes from the for loop, so I can further manipulate them but at least I'll have a copy of them in the global environment?

Thank you! I can provide additional details if that helps.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

半山落雨半山空 2025-01-16 14:14:18

不要循环这样做!它的做法完全不同。我将逐步向您展示。
我的第一步是准备一个函数来生成与您类似的数据。

library(tidyverse)

dens = function(year, n) tibble(
  PLOT = paste("HI", sample(1:(n/7), n, replace = T)),
  SIZE = runif(n, 0.1, 3), 
  DENSITY = sample(seq(50,200, by=50), n, replace = T),
  SEEDYR = year-1,
  SAMPYR = year,
  AGE = sample(1:5, n, replace = T),
  SHOOTS = runif(n, 0.1, 3)
)

让我们看看它是如何工作的并生成一些示例数据帧

set.seed(123)
density.2007 = dens(2007, 120)
density.2008 = dens(2008, 88)
density.2009 = dens(2009, 135)
density.2010 = dens(2010, 156)

密度.2007 数据帧如下所示

# A tibble: 120 x 7
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>
 1 HI 15 1.67      200   2006   2007     4  1.80 
 2 HI 14 0.270     150   2006   2007     2  2.44 
 3 HI 3  0.856      50   2006   2007     3  0.686
 4 HI 10 1.25      200   2006   2007     5  1.43 
 5 HI 11 0.673      50   2006   2007     5  1.40 
 6 HI 5  2.51      150   2006   2007     3  2.23 
 7 HI 14 0.543     150   2006   2007     2  2.17 
 8 HI 5  2.43      200   2006   2007     5  2.51 
 9 HI 9  1.69      100   2006   2007     4  2.67 
10 HI 3  2.02       50   2006   2007     2  2.86 
# ... with 110 more rows

现在需要将它们组合成一个帧

df = density.2007 %>% 
  bind_rows(density.2008) %>% 
  bind_rows(density.2009) %>% 
  bind_rows(density.2010) 

输出

# A tibble: 499 x 7
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>
 1 HI 15 1.67      200   2006   2007     4  1.80 
 2 HI 14 0.270     150   2006   2007     2  2.44 
 3 HI 3  0.856      50   2006   2007     3  0.686
 4 HI 10 1.25      200   2006   2007     5  1.43 
 5 HI 11 0.673      50   2006   2007     5  1.40 
 6 HI 5  2.51      150   2006   2007     3  2.23 
 7 HI 14 0.543     150   2006   2007     2  2.17 
 8 HI 5  2.43      200   2006   2007     5  2.51 
 9 HI 9  1.69      100   2006   2007     4  2.67 
10 HI 3  2.02       50   2006   2007     2  2.86 
# ... with 489 more rows

在下一步中,计算每个值的出现次数PLOT 变量出现

PLOT.count = df %>% 
  group_by(PLOT) %>% 
  summarise(PLOT.n = n()) %>% 
  arrange(PLOT.n)

ouptut

# A tibble: 22 x 2
   PLOT  PLOT.n
   <chr>  <int>
 1 HI 20      3
 2 HI 22      5
 3 HI 21      7
 4 HI 18     12
 5 HI 2      19
 6 HI 1      20
 7 HI 15     20
 8 HI 17     21
 9 HI 6      22
10 HI 11     23
# ... with 12 more rows

在倒数第二步中,让我们将这些计数器追加到原始数据帧

df = df %>% left_join(PLOT.count, by="PLOT")

输出

# A tibble: 499 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 15 1.67      200   2006   2007     4  1.80      20
 2 HI 14 0.270     150   2006   2007     2  2.44      32
 3 HI 3  0.856      50   2006   2007     3  0.686     27
 4 HI 10 1.25      200   2006   2007     5  1.43      25
 5 HI 11 0.673      50   2006   2007     5  1.40      23
 6 HI 5  2.51      150   2006   2007     3  2.23      38
 7 HI 14 0.543     150   2006   2007     2  2.17      32
 8 HI 5  2.43      200   2006   2007     5  2.51      38
 9 HI 9  1.69      100   2006   2007     4  2.67      26
10 HI 3  2.02       50   2006   2007     2  2.86      27
# ... with 489 more rows

中现在随意过滤它

df %>% filter(PLOT.n > 30)

ouptut

# A tibble: 139 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 14 0.270     150   2006   2007     2  2.44      32
 2 HI 5  2.51      150   2006   2007     3  2.23      38
 3 HI 14 0.543     150   2006   2007     2  2.17      32
 4 HI 5  2.43      200   2006   2007     5  2.51      38
 5 HI 8  0.598      50   2006   2007     1  1.70      34
 6 HI 7  1.94       50   2006   2007     4  1.61      35
 7 HI 14 2.91       50   2006   2007     4  0.215     32
 8 HI 7  0.846     150   2006   2007     4  0.506     35
 9 HI 7  2.38      150   2006   2007     3  1.34      35
10 HI 7  2.62      100   2006   2007     3  0.167     35
# ... with 129 more rows

或者这样

df %>% filter(PLOT.n == min(PLOT.n))
df %>% filter(PLOT.n == median(PLOT.n))
df %>% filter(PLOT.n == max(PLOT.n))

输出

# A tibble: 3 x 8
  PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
  <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
1 HI 20 0.392     200   2009   2010     1  0.512      3
2 HI 20 0.859     150   2009   2010     5  2.62       3
3 HI 20 0.882     200   2009   2010     5  1.06       3
> df %>% filter(PLOT.n == median(PLOT.n))
# A tibble: 26 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 9  1.69      100   2006   2007     4  2.67      26
 2 HI 9  2.20       50   2006   2007     4  1.49      26
 3 HI 9  0.587     200   2006   2007     3  1.13      26
 4 HI 9  1.27       50   2006   2007     1  2.55      26
 5 HI 9  1.56      150   2006   2007     3  2.01      26
 6 HI 9  0.198     100   2006   2007     3  2.08      26
 7 HI 9  2.72      150   2007   2008     3  0.421     26
 8 HI 9  0.251     200   2007   2008     2  0.328     26
 9 HI 9  1.83       50   2007   2008     1  0.192     26
10 HI 9  1.97      100   2007   2008     1  0.900     26
# ... with 16 more rows
> df %>% filter(PLOT.n == max(PLOT.n))
# A tibble: 38 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 5  2.51      150   2006   2007     3   2.23     38
 2 HI 5  2.43      200   2006   2007     5   2.51     38
 3 HI 5  2.06      100   2006   2007     5   1.93     38
 4 HI 5  1.25      150   2006   2007     4   2.29     38
 5 HI 5  2.29      200   2006   2007     1   2.97     38
 6 HI 5  0.789     150   2006   2007     2   1.59     38
 7 HI 5  1.11      100   2007   2008     4   2.61     38
 8 HI 5  2.38      150   2007   2008     4   2.95     38
 9 HI 5  2.67      200   2007   2008     3   1.77     38
10 HI 5  2.63      100   2007   2008     1   1.90     38
# ... with 28 more rows

Don't do it in a loop !! It is done completely different. I'll show you step by step.
My first step is to prepare a function that will generate data similar to yours.

library(tidyverse)

dens = function(year, n) tibble(
  PLOT = paste("HI", sample(1:(n/7), n, replace = T)),
  SIZE = runif(n, 0.1, 3), 
  DENSITY = sample(seq(50,200, by=50), n, replace = T),
  SEEDYR = year-1,
  SAMPYR = year,
  AGE = sample(1:5, n, replace = T),
  SHOOTS = runif(n, 0.1, 3)
)

Let's see how it works and generate some sample data frames

set.seed(123)
density.2007 = dens(2007, 120)
density.2008 = dens(2008, 88)
density.2009 = dens(2009, 135)
density.2010 = dens(2010, 156)

The density.2007 data frame looks like this

# A tibble: 120 x 7
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>
 1 HI 15 1.67      200   2006   2007     4  1.80 
 2 HI 14 0.270     150   2006   2007     2  2.44 
 3 HI 3  0.856      50   2006   2007     3  0.686
 4 HI 10 1.25      200   2006   2007     5  1.43 
 5 HI 11 0.673      50   2006   2007     5  1.40 
 6 HI 5  2.51      150   2006   2007     3  2.23 
 7 HI 14 0.543     150   2006   2007     2  2.17 
 8 HI 5  2.43      200   2006   2007     5  2.51 
 9 HI 9  1.69      100   2006   2007     4  2.67 
10 HI 3  2.02       50   2006   2007     2  2.86 
# ... with 110 more rows

Now they need to be combined into one frame

df = density.2007 %>% 
  bind_rows(density.2008) %>% 
  bind_rows(density.2009) %>% 
  bind_rows(density.2010) 

output

# A tibble: 499 x 7
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>
 1 HI 15 1.67      200   2006   2007     4  1.80 
 2 HI 14 0.270     150   2006   2007     2  2.44 
 3 HI 3  0.856      50   2006   2007     3  0.686
 4 HI 10 1.25      200   2006   2007     5  1.43 
 5 HI 11 0.673      50   2006   2007     5  1.40 
 6 HI 5  2.51      150   2006   2007     3  2.23 
 7 HI 14 0.543     150   2006   2007     2  2.17 
 8 HI 5  2.43      200   2006   2007     5  2.51 
 9 HI 9  1.69      100   2006   2007     4  2.67 
10 HI 3  2.02       50   2006   2007     2  2.86 
# ... with 489 more rows

In the next step, count how many times each value of the PLOT variable occurs

PLOT.count = df %>% 
  group_by(PLOT) %>% 
  summarise(PLOT.n = n()) %>% 
  arrange(PLOT.n)

ouptut

# A tibble: 22 x 2
   PLOT  PLOT.n
   <chr>  <int>
 1 HI 20      3
 2 HI 22      5
 3 HI 21      7
 4 HI 18     12
 5 HI 2      19
 6 HI 1      20
 7 HI 15     20
 8 HI 17     21
 9 HI 6      22
10 HI 11     23
# ... with 12 more rows

In the penultimate step, let's append these counters to the original data frame

df = df %>% left_join(PLOT.count, by="PLOT")

output

# A tibble: 499 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 15 1.67      200   2006   2007     4  1.80      20
 2 HI 14 0.270     150   2006   2007     2  2.44      32
 3 HI 3  0.856      50   2006   2007     3  0.686     27
 4 HI 10 1.25      200   2006   2007     5  1.43      25
 5 HI 11 0.673      50   2006   2007     5  1.40      23
 6 HI 5  2.51      150   2006   2007     3  2.23      38
 7 HI 14 0.543     150   2006   2007     2  2.17      32
 8 HI 5  2.43      200   2006   2007     5  2.51      38
 9 HI 9  1.69      100   2006   2007     4  2.67      26
10 HI 3  2.02       50   2006   2007     2  2.86      27
# ... with 489 more rows

Now filter it at will

df %>% filter(PLOT.n > 30)

ouptut

# A tibble: 139 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 14 0.270     150   2006   2007     2  2.44      32
 2 HI 5  2.51      150   2006   2007     3  2.23      38
 3 HI 14 0.543     150   2006   2007     2  2.17      32
 4 HI 5  2.43      200   2006   2007     5  2.51      38
 5 HI 8  0.598      50   2006   2007     1  1.70      34
 6 HI 7  1.94       50   2006   2007     4  1.61      35
 7 HI 14 2.91       50   2006   2007     4  0.215     32
 8 HI 7  0.846     150   2006   2007     4  0.506     35
 9 HI 7  2.38      150   2006   2007     3  1.34      35
10 HI 7  2.62      100   2006   2007     3  0.167     35
# ... with 129 more rows

Or this way

df %>% filter(PLOT.n == min(PLOT.n))
df %>% filter(PLOT.n == median(PLOT.n))
df %>% filter(PLOT.n == max(PLOT.n))

output

# A tibble: 3 x 8
  PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
  <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
1 HI 20 0.392     200   2009   2010     1  0.512      3
2 HI 20 0.859     150   2009   2010     5  2.62       3
3 HI 20 0.882     200   2009   2010     5  1.06       3
> df %>% filter(PLOT.n == median(PLOT.n))
# A tibble: 26 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 9  1.69      100   2006   2007     4  2.67      26
 2 HI 9  2.20       50   2006   2007     4  1.49      26
 3 HI 9  0.587     200   2006   2007     3  1.13      26
 4 HI 9  1.27       50   2006   2007     1  2.55      26
 5 HI 9  1.56      150   2006   2007     3  2.01      26
 6 HI 9  0.198     100   2006   2007     3  2.08      26
 7 HI 9  2.72      150   2007   2008     3  0.421     26
 8 HI 9  0.251     200   2007   2008     2  0.328     26
 9 HI 9  1.83       50   2007   2008     1  0.192     26
10 HI 9  1.97      100   2007   2008     1  0.900     26
# ... with 16 more rows
> df %>% filter(PLOT.n == max(PLOT.n))
# A tibble: 38 x 8
   PLOT   SIZE DENSITY SEEDYR SAMPYR   AGE SHOOTS PLOT.n
   <chr> <dbl>   <dbl>  <dbl>  <dbl> <int>  <dbl>  <int>
 1 HI 5  2.51      150   2006   2007     3   2.23     38
 2 HI 5  2.43      200   2006   2007     5   2.51     38
 3 HI 5  2.06      100   2006   2007     5   1.93     38
 4 HI 5  1.25      150   2006   2007     4   2.29     38
 5 HI 5  2.29      200   2006   2007     1   2.97     38
 6 HI 5  0.789     150   2006   2007     2   1.59     38
 7 HI 5  1.11      100   2007   2008     4   2.61     38
 8 HI 5  2.38      150   2007   2008     4   2.95     38
 9 HI 5  2.67      200   2007   2008     3   1.77     38
10 HI 5  2.63      100   2007   2008     1   1.90     38
# ... with 28 more rows

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文