Create a frequency table and plot a histogram using dplyr/ggplot

Posted on 2025-02-13 00:53:30

I am new to piping and dplyr in R and need some help. Please note: I already have a solution to this question that uses the cut function.

Plotting Categorical Values Histogram in R

I want to solve the problem using dplyr. I want to use dplyr to create a frequency table (without storing it) and plot the data using ggplot.

Problem: I have data from two sensors: reference data and sensor data (the sensor I am evaluating). The sensor data is categorical (1, 2, or 3). I am trying to plot a histogram of the sensor status for different bins of the reference value. For example, when the reference value is 1-5, I want to see the frequency distribution of the sensor 1 status (1, 2, or 3). Similarly, I want the frequency distribution of the sensor status for reference values of 6-10, and so on up to 95-100. Please see the sample data below. I appreciate the help.

dput(sample1)
structure(list(test_data = c(1.2, 0.2, 0.6, 1.6, 1, 1, 0.4, 
0.4, 0.8, 0.8, 0.4, 0.2, 15.8, 59.2, 55.4, 54.8, 54.6, 54.2, 
49, 53, 47.2, 44, 40.2, 39, 34.2, 35.8, 33.4, 30.6, 29.4, 29.2, 
27.6, 24.8, 24, 22, 21.2, 20.6, 18.6, 18, 17, 17.2, 14.8, 15.2, 
13.2, 13.4, 12, 11.8, 11, 10.8, 10, 9.2, 8.8, 8.8, 8.4, 7.8, 
7.6, 6.6, 6.4, 6.2, 6, 5.8, 5.4, 5, 4.8, 4.4, 4.2, 4, 3.8, 3.6, 
3.6, 3.6, 3, 2.8, 3, 2.8, 2.6, 2.4, 2.4, 2.2, 2, 2.2, 2.2, 1.8, 
1.8, 1.6, 1.8, 1.8, 2.2, 71.2, 75.8, 74.6, 74.6, 74.2, 67.2, 
66.2, 60.6, 60.6, 54.8, 53.6, 48.4, 45.2), status = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), row.names = 113:212, class = "data.frame")


Comments (3)

苄①跕圉湢 2025-02-20 00:53:30
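library(dplyr)
library(ggplot2)

# `dat` is the asker's data frame; in the posted sample1 the columns
# dusttrak_conc and sensor1_status are named test_data and status
dat %>%
  # bin the reference values into (0,10], (10,20], ..., (90,100]
  mutate(bin = cut(dusttrak_conc, breaks = seq(0, 100, by = 10))) %>%
  # frequency table: one row per bin/status pair, with the count in column n
  count(bin, sensor1_status) %>%
  # heat map: the fill colour encodes the count
  ggplot(aes(bin, sensor1_status)) +
  geom_tile(aes(fill = n))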
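
To make the intermediate frequency table concrete, here is a minimal sketch on a small made-up data frame. The data frame `toy` is hypothetical and only borrows the column names `test_data` and `status` from the posted `sample1`. The sketch uses 5-wide bins as in the question and draws bars per bin instead of tiles, which is an assumption about the desired look rather than part of the answer above. The table is stored once only so it can be inspected; the second pipeline feeds it straight into ggplot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; only the column names match the posted sample1
toy <- data.frame(
  test_data = c(0.4, 1.2, 3.6, 4.8, 6.2, 7.6, 8.8, 9.2, 12.0, 13.4),
  status    = c(1, 1, 1, 2, 2, 1, 3, 3, 2, 3)
)

# Frequency table, kept in a variable here only so it can be inspected
freq <- toy %>%
  mutate(bin = cut(test_data, breaks = seq(0, 100, by = 5))) %>%
  count(bin, status)
print(freq)

# Same pipeline continued into ggplot, so no intermediate table is stored
p <- toy %>%
  mutate(bin = cut(test_data, breaks = seq(0, 100, by = 5))) %>%
  count(bin, status) %>%
  ggplot(aes(x = factor(status), y = n)) +
  geom_col() +                # bar height is the precomputed count n
  facet_wrap(vars(bin)) +     # one panel per reference bin
  labs(x = "status", y = "count")
# print(p) draws the figure
```

On this toy input `freq` has one row per observed bin/status pair; combinations that never occur are dropped by `count()` by default.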


浅听莫相离 2025-02-20 00:53:30


It sounds like you want to see the frequency of each status level (1 to 3) as a bar plot for each range of reference values. One option is to use faceting to split out a separate panel for each range of interest. To generate the splits, an alternative to base::cut() is plyr::round_any(). In my example I split into bins of width 15 to keep the graphic simple, but you can adjust to suit.

Note: loading the {plyr} library is often undesirable because some of its functions share names with {dplyr} functions and conflict with them. You may therefore want to call just this one function explicitly with plyr::, or define it manually in your script.
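
If you prefer to define the helper yourself, a minimal base R sketch could look like the following. The name `round_to` is made up for this example, and the body is an assumption about what is needed here (scale, round with `f`, scale back), not a copy of the {plyr} source.

```r
# Hypothetical stand-in for plyr::round_any(): round x to a multiple of
# `accuracy`, using the rounding function f (round, floor, or ceiling)
round_to <- function(x, accuracy, f = round) {
  f(x / accuracy) * accuracy
}

# With f = ceiling each value is labelled by the upper edge of its bin
round_to(c(1.2, 15.8, 59.2, 30), accuracy = 15, f = ceiling)
# 15 30 60 30
```

In a pipeline it would take the place of the plyr call, for example `mutate(ref_range = round_to(reference, 15, ceiling))`.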

library(tidyverse)

d <- structure(list(reference = c(1.2, 0.2, 0.6, 1.6, 1, 1, 0.4, 0.4, 0.8, 0.8, 0.4, 0.2, 15.8, 59.2, 55.4, 54.8, 54.6, 54.2, 49, 53, 47.2, 44, 40.2, 39, 34.2, 35.8, 33.4, 30.6, 29.4, 29.2, 27.6, 24.8, 24, 22, 21.2, 20.6, 18.6, 18, 17, 17.2, 14.8, 15.2, 13.2, 13.4, 12, 11.8, 11, 10.8, 10, 9.2, 8.8, 8.8, 8.4, 7.8, 7.6, 6.6, 6.4, 6.2, 6, 5.8, 5.4, 5, 4.8, 4.4, 4.2, 4, 3.8, 3.6, 3.6, 3.6, 3, 2.8, 3, 2.8, 2.6, 2.4, 2.4, 2.2, 2, 2.2, 2.2, 1.8, 1.8, 1.6, 1.8, 1.8, 2.2, 71.2, 75.8, 74.6, 74.6, 74.2, 67.2, 66.2, 60.6, 60.6, 54.8, 53.6, 48.4, 45.2), 
                    sensor1_status = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), row.names = 113:212, class = "data.frame")


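d %>% 
  # label each row with the upper edge of its 15-wide reference bin (15, 30, ...)
  mutate(ref_range = plyr::round_any(reference, accuracy = 15, f = ceiling)) %>%
  ggplot(aes(sensor1_status)) +
  # geom_bar() counts the rows per status, so no separate frequency table is needed
  geom_bar() +
  # one panel per reference bin, each with its own y scale
  facet_grid(rows = vars(ref_range), scales = "free_y")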

Created on 2022-07-05 by the reprex package (v2.0.1)

满地尘埃落定 2025-02-20 00:53:30


dplyr isn't necessary here:

library('ggplot2')

# NOTE: column names follow the answer; in the posted sample1 they are
# test_data (reference) and status (sensor status)

# create ggplot; specify data frame and x-axis variable
ggplot(sample1, aes(x = sensor1_status)) +

  # geom_bar() counts the number of cases at each x position
  geom_bar(stat = "count") +

  # facet_wrap() creates a square-ish grid of multiple panels
  # - facets defines the "grouping" per panel; cut() creates the bins
  # - scales chooses whether all panels share the same x/y axes
  # - drop chooses whether empty groups are dropped
  facet_wrap(facets = vars(reference_bin = cut(dusttrak_conc, seq(0, 100, 5), right = FALSE, include.lowest = TRUE)),
             scales = "free_y",
             drop = FALSE) +

  # format y axis: aim for about 4 ticks; round() + unique() prevent fractional ticks
  # I used the new base R pipe (|>) because you were interested in piping :)
  scale_y_continuous(breaks = \(x) pretty(x, n = 4) |> round() |> unique())
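
The `right` and `include.lowest` arguments decide which bin a value on a boundary falls into. A small base R sketch with three made-up values shows the difference between the default right-closed bins and the left-closed bins used in the code above:

```r
breaks <- seq(0, 100, 5)
x <- c(0, 5, 100)  # made-up boundary values

# Default: right-closed bins (a, b]; 0 lies outside (0, 5] and becomes NA
right_closed <- cut(x, breaks)
as.character(right_closed)
# NA "(0,5]" "(95,100]"

# Left-closed bins [a, b); include.lowest = TRUE also closes the last bin
left_closed <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
as.character(left_closed)
# "[0,5)" "[5,10)" "[95,100]"
```

With left-closed bins a reference value of exactly 0 still lands in a panel instead of being dropped as NA.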
