Outline scatterplot/barplot with line and categorical data and groups in ggplot
I have a dataset with ~150 countries, a grouping variable, and a value (0-6) for each country and group. I am trying to show that countries with a higher GDP get higher values in one group than in the other. I made a scatterplot showing the values for each country by group (the countries are sorted by GDP). I want to draw a line around the points, so that it becomes more apparent which group has higher values in which range of GDP.
However, I am at a loss.
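```r
library(tidyverse)  # provides %>%, fct_inorder(), fct_rev() and ggplot()
data %>%
  ggplot(aes(x = fct_rev(fct_inorder(country)),
             y = count, color = factor(group))) +  # factor() gives discrete group colors
  geom_point()
```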
Obviously I still need to clean it up, but does anyone know how to do this? A density plot does not work because of the factor on the x axis, and a line does not work either, because it passes through every single point.
Simulating continuous data on the basis of my current dataset would work as well (I just can't figure out how to do that).
I just want to highlight the range of values by group and GDP.
Maybe a different type of plot would help, but which one? All comments appreciated!
My data is below; it is already sorted by GDP:
structure(list(count = c(1, 0, 1, 3, 0, 1, 0, 1, 0, 1, 0, 2,
2, 0, 1, 0, 0, 2, 3, 0, 0, 1, 0, 2, 2, 0, 1, 0, 1, 0, 0, 1, 1,
0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 2, 0,
0, 1, 2, 0, 0, 6, 1, 1, 1, 0, 2, 1, 1, 0, 4, 2, 1, 0, 2, 4, 0,
1, 1, 0, 2, 1, 2, 1, 0, 2, 0, 1, 2, 1, 0, 2, 3, 2, 1, 0, 1, 1,
2, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 4, 0, 2, 0,
2, 0, 2, 0, 1, 1, 1, 1, 1, 5, 0, 0, 3, 3, 0, 0, 2, 1, 2, 0, 2,
1, 0, 0, 2, 1, 0, 1, 2, 3, 0, 0, 1, 4, 0, 0, 3, 0, 1, 0, 2, 4,
0, 1, 0, 0, 2, 0, 1, 0, 2, 0, 2, 0, 2, 0, 2, 0, 1, 0, 2, 0, 1,
0, 1, 1, 2, 0, 1, 1, 1, 0, 2, 0, 2, 0, 3, 1, 0, 4, 0, 2, 0, 0,
1, 1, 0, 0, 1, 0, 2, 0, 2, 0, 1, 0, 1, 0, 1, 0, 2, 2, 0), country = c("CHE",
"CHE", "NOR", "NOR", "IRL", "IRL", "SGP", "SGP", "USA", "USA",
"AUS", "AUS", "DNK", "DNK", "SWE", "SWE", "NLD", "NLD", "GBR",
"GBR", "SCT", "SCT", "FIN", "FIN", "CAN", "CAN", "DEU", "DEU",
"BEL", "BEL", "ARE", "ARE", "JPN", "JPN", "ITA", "ITA", "KOR",
"KOR", "BRN", "BRN", "SVN", "SVN", "BHR", "BHR", "PRT", "PRT",
"SAU", "SAU", "EST", "EST", "GRC", "GRC", "LTU", "LTU", "SYC",
"SYC", "LVA", "LVA", "CHL", "CHL", "HRV", "HRV", "CRI", "CRI",
"TUR", "TUR", "MYS", "MYS", "KAZ", "KAZ", "LCA", "LCA", "ROU",
"ROU", "MUS", "MUS", "GRD", "GRD", "MEX", "MEX", "RUS", "RUS",
"CHN", "CHN", "SUR", "SUR", "BRA", "BRA", "DOM", "DOM", "BGR",
"BGR", "MNE", "MNE", "THA", "THA", "COL", "COL", "SRB", "SRB",
"ZAF", "ZAF", "GUY", "GUY", "FJI", "FJI", "LBY", "LBY", "BIH",
"BIH", "AZE", "AZE", "MKD", "MKD", "JAM", "JAM", "IRQ", "IRQ",
"NAM", "NAM", "GEO", "GEO", "ALB", "ALB", "XKX", "XKX", "WSM",
"WSM", "LKA", "LKA", "JOR", "JOR", "ARM", "ARM", "EGY", "EGY",
"SWZ", "SWZ", "IDN", "IDN", "PSE", "PSE", "CPV", "CPV", "MDA",
"MDA", "MAR", "MAR", "VUT", "VUT", "PNG", "PNG", "NGA", "NGA",
"LAO", "LAO", "VNM", "VNM", "SLB", "SLB", "GHA", "GHA", "MRT",
"MRT", "KEN", "KEN", "PAK", "PAK", "BGD", "BGD", "HTI", "HTI",
"ZMB", "ZMB", "SEN", "SEN", "YEM", "YEM", "SDN", "SDN", "TLS",
"TLS", "KGZ", "KGZ", "BEN", "BEN", "TJK", "TJK", "TZA", "TZA",
"EAZ", "EAZ", "NPL", "NPL", "GIN", "GIN", "UGA", "UGA", "MLI",
"MLI", "ETH", "ETH", "BFA", "BFA", "GMB", "GMB", "LBR", "LBR",
"GNB", "GNB", "SLE", "SLE", "MOZ", "MOZ", "AFG", "AFG", "COD",
"COD", "MWI", "MWI"), group = c(1, 2, 2, 1, 2, 1, 1, 2, 1, 2,
2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1,
2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1,
1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1,
2, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2,
1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1,
2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2,
1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1,
2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2,
1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1
)), row.names = c(33L, 34L, 151L, 152L, 89L, 90L, 175L, 176L,
205L, 206L, 9L, 10L, 49L, 50L, 187L, 188L, 149L, 150L, 65L, 66L,
169L, 170L, 61L, 62L, 31L, 32L, 47L, 48L, 13L, 14L, 5L, 6L, 99L,
100L, 93L, 94L, 107L, 108L, 29L, 30L, 185L, 186L, 23L, 24L, 159L,
160L, 167L, 168L, 57L, 58L, 77L, 78L, 119L, 120L, 191L, 192L,
121L, 122L, 35L, 36L, 83L, 84L, 45L, 46L, 199L, 200L, 143L, 144L,
101L, 102L, 115L, 116L, 163L, 164L, 139L, 140L, 79L, 80L, 127L,
128L, 165L, 166L, 37L, 38L, 183L, 184L, 27L, 28L, 51L, 52L, 21L,
22L, 133L, 134L, 193L, 194L, 41L, 42L, 181L, 182L, 217L, 218L,
81L, 82L, 63L, 64L, 113L, 114L, 25L, 26L, 11L, 12L, 129L, 130L,
95L, 96L, 91L, 92L, 145L, 146L, 67L, 68L, 3L, 4L, 213L, 214L,
211L, 212L, 117L, 118L, 97L, 98L, 7L, 8L, 55L, 56L, 189L, 190L,
87L, 88L, 161L, 162L, 43L, 44L, 125L, 126L, 123L, 124L, 209L,
210L, 157L, 158L, 147L, 148L, 109L, 110L, 207L, 208L, 177L, 178L,
69L, 70L, 137L, 138L, 103L, 104L, 155L, 156L, 19L, 20L, 85L,
86L, 219L, 220L, 173L, 174L, 215L, 216L, 171L, 172L, 197L, 198L,
105L, 106L, 15L, 16L, 195L, 196L, 201L, 202L, 53L, 54L, 153L,
154L, 71L, 72L, 203L, 204L, 131L, 132L, 59L, 60L, 17L, 18L, 73L,
74L, 111L, 112L, 75L, 76L, 179L, 180L, 135L, 136L, 1L, 2L, 39L,
40L, 141L, 142L), class = "data.frame")
Here's one idea to help visualize the difference you are trying to show. First, with around 150 countries, the names on the x axis are likely to remain illegible however you try to label them. It might therefore be better to put each country's GDP rank on the x axis.
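A minimal sketch of the ranking step, assuming dplyr and ggplot2. To stay self-contained it uses only the first 20 rows of the question's data (the ten highest-GDP countries); swap in the full data frame from the dput() above. Because the rows are already sorted by GDP, `match(country, unique(country))` gives each country its position in that order, so rank 1 is the highest-GDP country. The names `ranked` and `p` are illustrative, not from the original answer.

```r
library(dplyr)
library(ggplot2)

# Excerpt of the question's data: the first 20 rows (the ten highest-GDP
# countries). Replace this with the full data frame from the dput() above.
data <- data.frame(
  count   = c(1, 0, 1, 3, 0, 1, 0, 1, 0, 1, 0, 2, 2, 0, 1, 0, 0, 2, 3, 0),
  country = rep(c("CHE", "NOR", "IRL", "SGP", "USA",
                  "AUS", "DNK", "SWE", "NLD", "GBR"), each = 2),
  group   = c(1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2)
)

# Rows are already sorted by GDP, so the order in which each country first
# appears is its GDP rank (1 = highest GDP). factor() gives discrete colors.
ranked <- data %>%
  mutate(rank = match(country, unique(country)),
         group = factor(group))

p <- ggplot(ranked, aes(x = rank, y = count, color = group)) +
  geom_point() +
  labs(x = "GDP rank (1 = highest)", y = "Count", color = "Group")
p  # draw the plot
```

With a numeric x axis, smoothers and polygons become possible, which was not the case with the country factor.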
Drawing a polygon around the points might make the point visually, but doesn't make much sense in statistical terms. What might be better here is to plot a regression with a separate line for each group. Since we are dealing with count data, we can use Poisson regression, and since we now have a numeric rank on the x axis, each group's fitted line can run continuously across the whole plot.
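One way to draw a Poisson regression line per group, sketched with the same assumptions (dplyr, ggplot2, the 20-row excerpt) and not necessarily the code the answer originally used: `geom_smooth()` with `method = "glm"` and `family = poisson` passed through `method.args`. The separate `glm()` call with a `rank * group` interaction fits the same two curves as one model, so you can check whether the slopes differ between the groups.

```r
library(dplyr)
library(ggplot2)

# Excerpt of the question's data (first 20 rows); use the full data instead.
data <- data.frame(
  count   = c(1, 0, 1, 3, 0, 1, 0, 1, 0, 1, 0, 2, 2, 0, 1, 0, 0, 2, 3, 0),
  country = rep(c("CHE", "NOR", "IRL", "SGP", "USA",
                  "AUS", "DNK", "SWE", "NLD", "GBR"), each = 2),
  group   = c(1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2)
)
ranked <- data %>%
  mutate(rank = match(country, unique(country)),  # 1 = highest GDP
         group = factor(group))

# One Poisson GLM per color group; the log link keeps fitted counts positive.
p <- ggplot(ranked, aes(x = rank, y = count, color = group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = poisson)) +
  labs(x = "GDP rank (1 = highest)", y = "Count", color = "Group")

# The same two curves as a single model: the rank:group2 coefficient
# estimates how much group 2's slope differs from group 1's.
fit <- glm(count ~ rank * group, family = poisson, data = ranked)
summary(fit)
p  # draw the plot
```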
I think this plot makes the point quite well. For clarity, I might add labels for a couple of countries to help orient your audience to the scale.
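A sketch of adding a few reference labels, again assuming dplyr, ggplot2 and the 20-row excerpt. The choice of CHE, USA and GBR is purely hypothetical; pick whichever countries your audience will recognize. The labels are placed just below zero so they do not cover the points.

```r
library(dplyr)
library(ggplot2)

# Excerpt of the question's data (first 20 rows); use the full data instead.
data <- data.frame(
  count   = c(1, 0, 1, 3, 0, 1, 0, 1, 0, 1, 0, 2, 2, 0, 1, 0, 0, 2, 3, 0),
  country = rep(c("CHE", "NOR", "IRL", "SGP", "USA",
                  "AUS", "DNK", "SWE", "NLD", "GBR"), each = 2),
  group   = c(1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2)
)
ranked <- data %>%
  mutate(rank = match(country, unique(country)),  # 1 = highest GDP
         group = factor(group))

# Hypothetical choice of reference countries to label.
to_label <- c("CHE", "USA", "GBR")
label_df <- ranked %>%
  distinct(country, rank) %>%   # one row per country
  filter(country %in% to_label)

p <- ggplot(ranked, aes(x = rank, y = count)) +
  geom_point(aes(color = group)) +
  # Labels sit at y = -0.4, just under the lowest possible count.
  geom_text(data = label_df, aes(x = rank, y = -0.4, label = country),
            size = 3) +
  labs(x = "GDP rank (1 = highest)", y = "Count", color = "Group")
p  # draw the plot
```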
If you really want polygons drawn around the points, then it would probably be best to draw the convex hull around each set, which you could do like this:
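A minimal sketch of the convex-hull version, assuming dplyr, ggplot2 and the 20-row excerpt, and not necessarily the answer's original code. `chull()` (from grDevices, attached by default) returns the row indices of the hull vertices in order, so calling it inside `slice()` after `group_by(group)` keeps just the outline of each group, which `geom_polygon()` then fills.

```r
library(dplyr)
library(ggplot2)

# Excerpt of the question's data (first 20 rows); use the full data instead.
data <- data.frame(
  count   = c(1, 0, 1, 3, 0, 1, 0, 1, 0, 1, 0, 2, 2, 0, 1, 0, 0, 2, 3, 0),
  country = rep(c("CHE", "NOR", "IRL", "SGP", "USA",
                  "AUS", "DNK", "SWE", "NLD", "GBR"), each = 2),
  group   = c(1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2)
)
ranked <- data %>%
  mutate(rank = match(country, unique(country)),  # 1 = highest GDP
         group = factor(group))

# Keep only the hull vertices of each group, in drawing order.
hulls <- ranked %>%
  group_by(group) %>%
  slice(chull(rank, count)) %>%
  ungroup()

p <- ggplot(ranked, aes(x = rank, y = count, color = group)) +
  geom_polygon(data = hulls, aes(fill = group), alpha = 0.2) +
  geom_point() +
  labs(x = "GDP rank (1 = highest)", y = "Count",
       color = "Group", fill = "Group")
p  # draw the plot
```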
As I said, though, this doesn't have much statistical meaning.
One other thought is that you can pivot the data frame to one row per country, compute the difference in counts between group 1 and group 2, and then show the trend line of that difference over the ranked countries:
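A sketch of the pivot-and-difference approach, assuming dplyr, tidyr, ggplot2 and the 20-row excerpt. `pivot_wider()` spreads the two groups into columns `group1` and `group2` (via `names_prefix`), and the difference is plotted against rank. The loess trend line is an assumption on my part; the answer does not say which smoother it used.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Excerpt of the question's data (first 20 rows); use the full data instead.
data <- data.frame(
  count   = c(1, 0, 1, 3, 0, 1, 0, 1, 0, 1, 0, 2, 2, 0, 1, 0, 0, 2, 3, 0),
  country = rep(c("CHE", "NOR", "IRL", "SGP", "USA",
                  "AUS", "DNK", "SWE", "NLD", "GBR"), each = 2),
  group   = c(1, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, 1, 2)
)
ranked <- data %>%
  mutate(rank = match(country, unique(country)),  # 1 = highest GDP
         group = factor(group))

# One row per country with columns group1 and group2, then their difference.
diffs <- ranked %>%
  pivot_wider(names_from = group, values_from = count,
              names_prefix = "group") %>%
  mutate(difference = group1 - group2)

p <- ggplot(diffs, aes(x = rank, y = difference)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point() +
  geom_text(aes(label = country), vjust = -0.7, size = 2.5) +
  geom_smooth(method = "loess", formula = y ~ x) +
  labs(x = "GDP rank (1 = highest)",
       y = "Group 1 count minus group 2 count")
p  # draw the plot
```

Points above the dashed line are countries where group 1 has the higher value; the smoother shows how that gap changes with GDP rank.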
This has the benefit that individual countries are easier to label, and it shows the relationship you want to demonstrate very clearly.
Created on 2022-03-05 by the reprex package (v2.0.1)