Create a frequency table and plot a histogram using dplyr/ggplot

Posted on 2025-02-13 00:53:30

I am new to piping and dplyr in R and need some help. Please note: I already have a solution to this question that uses the cut function:

Plotting Categorical Values Histogram in R

I want to solve the problem using dplyr: create a frequency table (without storing it) and plot the data using ggplot.

Problem: I have data from 2 sensors: reference data and sensor data (the sensor I am evaluating). The sensor data is categorical (1, 2, or 3). I am trying to plot a histogram of the sensor status for different bins of the reference value. For example, when the reference value is 1-5, I want to see the frequency distribution of the sensor 1 status (1, 2, or 3). Similarly, I want frequency distributions of the sensor status for reference values of 6-10, and so on up to 95-100. Please see the sample data below. I appreciate the help.

dput(sample1)
structure(list(test_data = c(1.2, 0.2, 0.6, 1.6, 1, 1, 0.4, 
0.4, 0.8, 0.8, 0.4, 0.2, 15.8, 59.2, 55.4, 54.8, 54.6, 54.2, 
49, 53, 47.2, 44, 40.2, 39, 34.2, 35.8, 33.4, 30.6, 29.4, 29.2, 
27.6, 24.8, 24, 22, 21.2, 20.6, 18.6, 18, 17, 17.2, 14.8, 15.2, 
13.2, 13.4, 12, 11.8, 11, 10.8, 10, 9.2, 8.8, 8.8, 8.4, 7.8, 
7.6, 6.6, 6.4, 6.2, 6, 5.8, 5.4, 5, 4.8, 4.4, 4.2, 4, 3.8, 3.6, 
3.6, 3.6, 3, 2.8, 3, 2.8, 2.6, 2.4, 2.4, 2.2, 2, 2.2, 2.2, 1.8, 
1.8, 1.6, 1.8, 1.8, 2.2, 71.2, 75.8, 74.6, 74.6, 74.2, 67.2, 
66.2, 60.6, 60.6, 54.8, 53.6, 48.4, 45.2), status = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), row.names = 113:212, class = "data.frame")

苄①跕圉湢 2025-02-20 00:53:30

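library(dplyr)
library(ggplot2)
# `dat` is assumed to be the question's data frame, with the reference
# readings in dusttrak_conc and the sensor state in sensor1_status
# (presumably test_data and status in the sample data above).
dat %>%
  # bin the reference values into (0,10], (10,20], ..., (90,100]
  mutate(bin = cut(dusttrak_conc, breaks = seq(0, 100, by = 10))) %>%
  # frequency table: one row per bin/status pair, counts in column n
  count(bin, sensor1_status) %>%
  # heat map of the counts; the table is piped on, never stored
  ggplot(aes(bin, sensor1_status)) +
  geom_tile(aes(fill = n))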
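
To make the intermediate step visible, here is a minimal sketch of the same cut() + count() idea on a handful of values taken from the question's sample, using the test_data and status column names from the dput() output. The name sample_small and the choice of geom_col() with facet_wrap() are assumptions for illustration, not part of the answer above; freq is stored here only so the table can be printed.

```r
library(dplyr)
library(ggplot2)

# Small illustrative subset of the question's sample (hypothetical name)
sample_small <- data.frame(
  test_data = c(1.2, 0.2, 15.8, 59.2, 55.4, 54.8, 12, 11.8),
  status    = c(1,   1,   1,    1,    3,    3,    2,  2)
)

# Frequency table: one row per (bin, status) pair that occurs, count in n
freq <- sample_small %>%
  mutate(bin = cut(test_data, breaks = seq(0, 100, by = 10))) %>%
  count(bin, status)
print(freq)

# Bar chart of the counts, one panel per reference bin;
# call print(p) to draw it
p <- freq %>%
  ggplot(aes(factor(status), n)) +
  geom_col() +
  facet_wrap(vars(bin)) +
  labs(x = "sensor status", y = "count")
```

In a real pipeline the freq assignment can be dropped and the count() result piped straight into ggplot(), which matches the "do not store the table" requirement.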

浅听莫相离 2025-02-20 00:53:30

It sounds like you want to see the frequency of each status level (1 to 3) as a bar plot for each range of reference values. One option is to use faceting to split out a separate panel for each range of interest. To generate the splits, an alternative to base::cut() is plyr::round_any(). In my example I split into bins of 15 to keep the graphic simple, but you can adjust this to suit.

Note: it is often undesirable to load the {plyr} library because of conflicts with {dplyr} functions that share a name. You may therefore want to call just this one function explicitly, or define it manually in your script as defined here.

library(tidyverse)

d <- structure(list(reference = c(1.2, 0.2, 0.6, 1.6, 1, 1, 0.4, 0.4, 0.8, 0.8, 0.4, 0.2, 15.8, 59.2, 55.4, 54.8, 54.6, 54.2, 49, 53, 47.2, 44, 40.2, 39, 34.2, 35.8, 33.4, 30.6, 29.4, 29.2, 27.6, 24.8, 24, 22, 21.2, 20.6, 18.6, 18, 17, 17.2, 14.8, 15.2, 13.2, 13.4, 12, 11.8, 11, 10.8, 10, 9.2, 8.8, 8.8, 8.4, 7.8, 7.6, 6.6, 6.4, 6.2, 6, 5.8, 5.4, 5, 4.8, 4.4, 4.2, 4, 3.8, 3.6, 3.6, 3.6, 3, 2.8, 3, 2.8, 2.6, 2.4, 2.4, 2.2, 2, 2.2, 2.2, 1.8, 1.8, 1.6, 1.8, 1.8, 2.2, 71.2, 75.8, 74.6, 74.6, 74.2, 67.2, 66.2, 60.6, 60.6, 54.8, 53.6, 48.4, 45.2), 
                    sensor1_status = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), row.names = 113:212, class = "data.frame")


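d %>% 
  # round each reference value up to the next multiple of 15 (15, 30, ...)
  mutate(ref_range = plyr::round_any(reference, accuracy = 15, f = ceiling)) %>%
  ggplot(aes(sensor1_status)) +
  # geom_bar() counts the rows at each status value
  geom_bar() +
  # one panel (row) per reference range, each with its own y scale
  facet_grid(rows = vars(ref_range), scales = "free_y")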

Created on 2022-07-05 by the reprex package (v2.0.1)
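
As a rough sketch of the "define it manually" option mentioned in the note: for numeric input, rounding to an arbitrary accuracy amounts to scaling, rounding, and scaling back. The function below is an illustrative stand-in written for this purpose (the name round_any_local is hypothetical), not a copy of the linked definition.

```r
# Hypothetical local stand-in for plyr::round_any() on numeric vectors:
# divide by the bin width, apply the rounding function, multiply back.
round_any_local <- function(x, accuracy, f = round) {
  f(x / accuracy) * accuracy
}

# A few reference values from the question's sample data
reference <- c(1.2, 15.8, 59.2, 75.8)

# With f = ceiling, each value maps to the upper edge of its 15-wide bin
ref_range <- round_any_local(reference, accuracy = 15, f = ceiling)
print(ref_range)  # 15 30 60 90
```

Swapping plyr::round_any() for such a helper in the mutate() call would avoid loading or referencing {plyr} at all.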

满地尘埃落定 2025-02-20 00:53:30

dplyr isn't necessary here:

library('ggplot2')

# create ggplot; specify data frame and x-axis variable
ggplot(sample1, aes(x = sensor1_status)) +
  
  # geom_bar() counts the number of cases at each x position
  geom_bar(stat = "count") +
  
  # facet_wrap() creates a square-ish grid of multiple panels
  # - facets defines the "grouping" per panel; cut creates the bins
  # - scales chooses to keep all x/y-axes the same or not
  # - drop chooses if empty groups should be dropped
  facet_wrap(facets = vars(reference_bin = cut(dusttrak_conc, seq(0, 100, 5), right = FALSE, include.lowest = TRUE)),
             scales = "free_y",
             drop = FALSE) +
  
  # format y axis: aim for about 4 ticks; round() + unique() to prevent fractions on the ticks
  # I used (the new base R) piping because you were interested in it :)
  # note: the \(x) lambda and the |> pipe need R 4.1 or later
  scale_y_continuous(breaks = \(x) pretty(x, n = 4) |> round() |> unique())
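
The two less obvious pieces of the code above, the interval arguments to cut() and the whole-number break function, can be checked in isolation. This is a base R sketch; the input vectors and the name integer_breaks are made up for illustration, and the lambda syntax assumes R 4.1 or later.

```r
# Bin edges every 5 units, as in the answer
edges <- seq(0, 100, 5)

# Default cut(): intervals are (a, b], so a value of exactly 0 falls outside
default_bins <- cut(c(0, 5, 100), edges)
print(as.character(default_bins))   # NA "(0,5]" "(95,100]"

# right = FALSE gives [a, b); include.lowest = TRUE then closes the last
# interval so that a value of exactly 100 is kept
left_closed <- cut(c(0, 5, 100), edges, right = FALSE, include.lowest = TRUE)
print(as.character(left_closed))    # "[0,5)" "[5,10)" "[95,100]"

# Break function from the answer: roughly 4 pretty ticks, rounded to whole
# numbers, with duplicates created by the rounding removed
integer_breaks <- \(x) pretty(x, n = 4) |> round() |> unique()
print(integer_breaks(c(0, 2)))
```

With free y scales, panels that hold only one or two observations are where fractional ticks would otherwise appear, which is why the rounding step matters for count data.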