Plotting the mean and standard deviation of each numeric column of a data frame in R

Posted on 2025-02-13 05:14:31

I want to plot every numeric column with the mean as a bar and the standard deviation as a line through the bar. How can I do this for the iris dataset?

I'm trying to transform my dataset to make it easy to plot in ggplot2.

What I've tried

iris %>%
  dplyr::select_if(is.numeric) %>%
  dplyr::summarise(avg_sepal_length = mean(Sepal.Length),
                  avg_sepal_width = mean(Sepal.Width),
                  avg_petal_length = mean(Petal.Length),
                  avg_petal_width = mean(Petal.Width),
                  sd_sepal_length = sd(Sepal.Length),
                  sd_sepal_width = sd(Sepal.Width),
                  sd_petal_length = sd(Petal.Length),
                  sd_petal_width = sd(Petal.Width))

I want to pivot the summary into two value columns so that the data frame looks like this:

stat            mean            sd
sepal_length    5.843333        0.8280661
sepal_width     3.057333        0.4358663
petal_length    3.758           1.765298
petal_width     1.199333        0.7622377

And then plot the upper and lower bounds of the sd as a line and the mean as a bar in ggplot.

海拔太高太耀眼 2025-02-20 05:14:32

To achieve your desired result, you could first simplify your code using dplyr::across. Afterwards, you could convert to long format via pivot_longer, where the special ".value" entry in names_to puts the means and the sds in their own columns. Finally, you could make your plot as a combination of, e.g., geom_col and geom_pointrange:

library(dplyr)
library(tidyr)
library(ggplot2)

iris_sum <- iris %>%
  summarise(across(where(is.numeric), .fns = list(avg = mean, sd = sd), .names = "{.fn}_{.col}")) |> 
  pivot_longer(everything(), names_to = c(".value", "name"), names_sep = "_") |> 
  mutate(name = gsub("\\.", '_', tolower(name)))

iris_sum
#> # A tibble: 4 × 3
#>   name           avg    sd
#>   <chr>        <dbl> <dbl>
#> 1 sepal_length  5.84 0.828
#> 2 sepal_width   3.06 0.436
#> 3 petal_length  3.76 1.77 
#> 4 petal_width   1.20 0.762

ggplot(iris_sum, aes(name, avg)) +
  geom_col() +
  geom_pointrange(aes(ymin = avg - sd, ymax = avg + sd))

感情废物 2025-02-20 05:14:32

The output format you describe is not the best one for ggplot2, which generally prefers data in an even longer form:

library(tidyr); library(dplyr)

iris %>%
  summarise(
        across(
            where(is.double), 
            list(mean = mean, sd = sd)
        )
    )  |>
    pivot_longer(
        everything(), 
        names_sep = "_", 
        names_to = c("feature", "stat")
    )  


# A tibble: 8 x 3
#   feature      stat  value
#   <chr>        <chr> <dbl>
# 1 Sepal.Length mean  5.84
# 2 Sepal.Length sd    0.828
# 3 Sepal.Width  mean  3.06
# 4 Sepal.Width  sd    0.436
# 5 Petal.Length mean  3.76
# 6 Petal.Length sd    1.77
# 7 Petal.Width  mean  1.20
# 8 Petal.Width  sd    0.762

As you are familiar with the iris dataset, it is worth checking out the docs for across, which make heavy use of it.

To get to your format you can add the following to the pipe:

|>
    pivot_wider(names_from = "stat")

# A tibble: 4 x 3
#   feature       mean    sd
#   <chr>        <dbl> <dbl>
# 1 Sepal.Length  5.84 0.828
# 2 Sepal.Width   3.06 0.436
# 3 Petal.Length  3.76 1.77 
# 4 Petal.Width   1.20 0.762
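
For completeness, here is a minimal sketch of how the wide table above could be fed into ggplot2 to get the bars with a line through them. The object name iris_stats and the choice of geom_errorbar with width = 0.2 are my own assumptions for illustration, not part of the answer above; geom_pointrange or geom_linerange would work just as well.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Summarise, go long, then widen again so mean and sd sit in their own columns.
# "iris_stats" is a hypothetical name chosen for this sketch.
iris_stats <- iris |>
  summarise(across(where(is.double), list(mean = mean, sd = sd))) |>
  pivot_longer(everything(), names_sep = "_", names_to = c("feature", "stat")) |>
  pivot_wider(names_from = "stat")

# Bars for the means, error bars spanning mean - sd to mean + sd.
p <- ggplot(iris_stats, aes(feature, mean)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2)

p
```

Note that the native pipe |> needs R 4.1 or later; %>% can be substituted throughout.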

全部不再 2025-02-20 05:14:32

Note that you don't actually need to pre-process the data frame to calculate the summary values; you can use ggplot2's stat_summary directly:

library(ggplot2)

ggplot(stack(iris), aes(x = ind, y = values)) + 
  stat_summary(geom = "bar", fun = mean) + 
  stat_summary(
    fun = mean, 
    fun.min = function(x) mean(x) - sd(x), 
    fun.max = function(x) mean(x) + sd(x))

Here I've used base R's simple stack function to make a long version of the iris dataset; you can use whatever libraries you prefer (especially if you want to include other manipulations).
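
To make the reshaping step visible, here is a small sketch of what stack produces. Selecting the numeric columns explicitly and the name long_iris are my own additions for clarity; the plot code above passes iris to stack directly.

```r
# Keep only the numeric columns, then stack them into a long data frame.
# "long_iris" is a hypothetical name used for this sketch.
long_iris <- stack(iris[sapply(iris, is.numeric)])

# Two columns: "values" (the measurements) and "ind" (a factor naming the source column).
str(long_iris)
```

The values and ind columns are what aes(x = ind, y = values) refers to in the plot above.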

凯凯我们等你回来 2025-02-20 05:14:31

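You can simply try:

library(dplyr)
library(tidyr)
library(ggplot2)

# mean_sdl needs the Hmisc package installed; mult = 1 gives mean +/- 1 sd
iris %>%
  dplyr::select_if(is.numeric) %>% 
  pivot_longer(everything()) %>% 
  ggplot(aes(name, value)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1))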
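
If installing Hmisc is not an option, a hand-written summary function can stand in for mean_sdl. The sketch below is one possible variant, not the answer's code: the helper mean_sd, the name long_iris, and the added bar layer with an error bar of width 0.2 are all assumptions made for illustration.

```r
library(tidyr)
library(ggplot2)

# Hypothetical helper: returns the y / ymin / ymax columns that stat_summary expects
# from a fun.data function, using mean +/- one standard deviation.
mean_sd <- function(x) {
  data.frame(y = mean(x), ymin = mean(x) - sd(x), ymax = mean(x) + sd(x))
}

# Long version of the numeric iris columns (default column names: name, value).
long_iris <- pivot_longer(iris[sapply(iris, is.numeric)], everything())

ggplot(long_iris, aes(name, value)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2)
```

The first stat_summary layer draws the mean as a bar, which the question asks for; the second draws the sd range as a line through it.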