Change the thickness of a line graph based on count

Posted 2025-01-10 20:26:17

The data frame "id" includes the columns year, id, and matr (matriline), where each row is one incident. I wanted to count the number of incidents per matriline per year, so I did:

library(dplyr)

events.bymatr <- id %>%
  group_by(year, matr, .drop = FALSE) %>%   # .drop = FALSE keeps matrilines with zero incidents
  summarise(n = n()) %>%
  ungroup()
events.bymatr
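The group_by() plus summarise(n = n()) pipeline above can also be written with dplyr::count(), which accepts the same .drop argument. A minimal sketch on a small hypothetical data frame (toy, with made-up years, IDs, and matrilines, including an unused level J7) that checks the two forms give the same table:

```r
library(dplyr)

# Hypothetical toy data: one row per individual per incident
toy <- tibble(
  year = c(1992L, 1992L, 1992L, 1994L),
  id   = factor(c("J11", "J16", "J17", "J11")),
  matr = factor(c("J2", "J2", "J4", "J2"), levels = c("J2", "J4", "J7"))
)

# Long form, as in the question
long_form <- toy %>%
  group_by(year, matr, .drop = FALSE) %>%
  summarise(n = n(), .groups = "drop")

# Shorthand: count() groups, counts rows, and keeps empty factor levels when .drop = FALSE
short_form <- toy %>%
  count(year, matr, .drop = FALSE)

print(short_form)
```

Both keep a zero row for every matriline level within each observed year, which is why the question's events.bymatr contains rows with n = 0.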

I plotted a line graph of the number of incidents over time, by matriline.

library(ggplot2)

ggplot(events.bymatr, aes(x = year, y = n, group = matr)) +
  geom_line(aes(color = matr))

My question is twofold:

1. Is there a way to recreate this line graph so that the thickness of each line is determined by how many IDs there are per matriline? I imagine this would involve reshaping the data above, but when I tried group_by(year, matr, id, .drop = FALSE) my data came out all wonky.
2. I want to change the color palette so that each color is very distinct. How do I attach a new color palette? I tried using a c25 palette with this code, but it makes all my lines disappear:
   ggplot(events.bymatr, aes(x = year, y = n, group = matr)) + geom_line(aes(color = c25))
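For the line-thickness question, one option is to compute the number of distinct IDs per matriline in a separate summary with n_distinct() and join it onto the yearly counts, instead of adding id to group_by() (which splits every matriline into one group per individual). A sketch under these assumptions: the data frame toy and the column n_ids are hypothetical, the ID count is taken over the whole study period, and ggplot2 is version 3.4.0 or later, where line thickness is the linewidth aesthetic (older versions use size):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data in the same shape as `id`: one row per individual per incident
toy <- tibble(
  year = c(1992L, 1992L, 1992L, 1993L, 1994L, 1994L),
  id   = factor(c("J11", "J16", "J17", "J11", "K3", "J16")),
  matr = factor(c("J2", "J2", "J4", "J2", "K11", "J2"))
)

# Incidents per matriline per year, keeping zero rows for empty combinations
counts <- toy %>%
  count(year, matr, .drop = FALSE)

# Distinct individuals per matriline over all years (one row per matriline)
ids_per_matr <- toy %>%
  group_by(matr, .drop = FALSE) %>%
  summarise(n_ids = n_distinct(id), .groups = "drop")

# Attach each matriline's ID count to every year of that matriline
plot_data <- left_join(counts, ids_per_matr, by = "matr")

# Thickness follows n_ids; linewidth requires ggplot2 >= 3.4.0 (use size in older versions)
p <- ggplot(plot_data, aes(x = year, y = n, group = matr, color = matr)) +
  geom_line(aes(linewidth = n_ids)) +
  scale_linewidth(range = c(0.5, 3))

print(p)
```

Grouping by both year and matr before n_distinct() would give a per-year ID count instead.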

Thanks so much in advance!

Output of "id" (shortened to the first five values of each column, so the factor codes and row.names still refer to the full 134-row data):

> dput(id)
    structure(list(date = structure(c(8243, 8243, 8243, 8248, 8947),
    class = "Date"), year = c(1992L, 1992L, 1992L, 1992L, 1994L), 
    event.id = c(8L, 8L, 8L, 10L, 11L), id = structure(c(51L, 55L, 59L, 
    46L, 51L), .Label = c("J11", "J16", "J17", "J2", "J22"),
    class = "factor"), sex = structure(c(1L, 2L, 2L, 1L, 1L),
    .Label = c("0", "1"), class = "factor"), age = c(28L, 12L, 6L, 42L, 
    30L), matr = structure(c(20L, 20L, 20L, 11L, 20L), .Label = c("J2", 
    "J4", "J7", "J9", "K11"), class = "factor"),
    matralive = structure(c(2L, 2L, 2L, 2L, 2L),
    .Label = c("0", "1"), class = "factor"), pod = structure(c(3L, 3L, 
    3L, 3L, 3L), .Label = c("J", "K", "L"), class = "factor")),
    row.names = c(NA, -134L), class = c("tbl_df", "tbl", "data.frame"))

Output of events.bymatr:

> dput(events.bymatr)
      structure(list(year = c(1992L, 1992L, 1992L, 1992L, 1992L),
      matr = structure(c(1L, 2L, 3L, 4L, 5L), .Label = c("J2", "J4", 
      "J7", "J9", "K11"), class = "factor"), n = c(0L, 0L, 0L, 0L, 0L)), 
      row.names = c(NA, -380L), class = c("tbl_df", "tbl", 
      "data.frame"))

迷离° 2025-01-17 20:26:17

As @r2evans noted, it is surprisingly hard to distinguish clearly among more than a handful of colors. I used an example 20-color scale here that does a pretty good job, but even so a few can be tricky to distinguish. Here's an attempt using the storms dataset included with dplyr.

library(dplyr)
library(ggplot2)

storms %>%
  group_by(name, year) %>%
  summarize(n = n(), .groups = "drop") %>%   # observations per storm name per year
  tidyr::complete(name, year = 1975:2015, fill = list(n = 0)) %>% 
  group_by(name) %>%
  mutate(total = sum(n)) %>%                 # total observations per storm name
  ungroup() %>% 
  filter(total %% 12 == 0) %>%               # Arbitrary, to reduce scope of data for example
  # linewidth needs ggplot2 >= 3.4.0; older versions use size instead
  ggplot(aes(year, n, color = name, linewidth = total, group = name)) +
  geom_line() +
  guides(color = guide_legend(override.aes = list(linewidth = 3))) +
  ggthemes::scale_color_tableau(palette = "Tableau 20")
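On the palette question: a vector of colors such as c25 is not a column of the data, so putting it inside aes() asks ggplot2 to treat it as data rather than use it as a palette. Colors you choose yourself go to scale_color_manual(values = ...). A minimal sketch with a hypothetical stand-in for events.bymatr (events_demo) and a hypothetical five-color vector (my_palette); whatever palette you use needs at least as many colors as there are matrilines:

```r
library(ggplot2)

# Hypothetical stand-in for events.bymatr: yearly counts for three matrilines
events_demo <- data.frame(
  year = rep(1992:1994, times = 3),
  matr = factor(rep(c("J2", "J4", "K11"), each = 3)),
  n    = c(2L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L)
)

# Hypothetical palette of clearly different colors (at least one per matriline)
my_palette <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")

# Map color to the grouping variable in aes(); supply the actual colors through the scale
p <- ggplot(events_demo, aes(x = year, y = n, group = matr, color = matr)) +
  geom_line() +
  scale_color_manual(values = my_palette)

print(p)
```

Unnamed values are assigned to the factor levels in order, and any extra colors are left unused.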