Stacking overlapping points for clinical cases on top of a violin plot

Posted on 2025-01-11 04:06:56


First time posting here. Apologies if I have left out something needed to solve my problem.

I have a matched case-control design where three 'younger' clinical cases have been age-matched to a 'younger' control group, and three 'older' cases have been matched to an 'older' control group. I am attempting to plot the control group distribution as a violin plot and overlay the corresponding matched cases as data points (my PhD supervisors recommended that each case has a unique shape and colour for its data point, to make it easier to follow the cases across the series of violin plots).

My novice solution so far has been to create a data frame for each control group and then individual data frames for the cases. I then create the plots and add formatting details, e.g., the shape and colour of the data points.

My code to set up the data frames, followed by an example plot:

#remove the cases and put into a separate data frame
case_1.1 <- FTD_data[1:1, ]
case_1.2 <- FTD_data[2:2, ]
case_1.3 <- FTD_data[3:3, ]
case_2.1 <- FTD_data[13:13, ]
case_2.2 <- FTD_data[14:14, ]
case_2.3 <- FTD_data[15:15, ]

#remove control groups and put into own group
young_controls <- FTD_data [4:12, ]
old_controls <- FTD_data [16:23, ]

#example plot
ggplot(data = young_controls,
       aes(x = strange_stories_ToM_mean, y = analysis_group, fill = analysis_group)) +
  geom_point(data = case_1.1,
             aes(x = strange_stories_ToM_mean, y = analysis_group, colour = "Case 1.1"),
             fill = "deeppink1", col = "deeppink1", pch = 21, size = 5) +
  labs(color = "Young cases") +
  geom_point(data = case_1.2,
             aes(x = strange_stories_ToM_mean, y = analysis_group, colour = "Case 1.2"),
             fill = "indianred3", col = "indianred3", pch = 24, size = 4) +
  geom_point(data = case_1.3,
             aes(x = strange_stories_ToM_mean, y = analysis_group, colour = "Case 1.3"),
             fill = "blueviolet", col = "blueviolet", pch = 22, size = 5,
             position = position_jitter(h = 0.09, w = 0.0)) +
  geom_violin(trim = FALSE,
              alpha = 0.2,
              draw_quantiles = c(0.25, 0.5, 0.75)) +
  theme_classic() +
  scale_fill_manual(values = c("gray90")) +
  guides(fill = "none")

One annoying issue I am having, though, is that data points from different cases overlap (as in the plot below). I have tried `position = position_jitter(h = 0.09, w = 0.0)`, but this moves the data point around each time the plot is drawn, because jitter introduces random noise. I need something consistent and reproducible for positioning the overlapping points, as I will be lining up several plots in a paper. The points need to be vertically stacked.

Example of plot with overlap issue

I have also tried:

`position_jitter(width = NULL, height = NULL, seed = NA)`

but then receive the following error:

Error in `check_subclass()`:
! `stat` must be either a string or a Stat object, not an S3 object with class PositionJitter/Position/ggproto/gg

Any ideas on the overlap issue? Also, I would welcome any feedback on how I have set up the data frames, and on whether I have gone about it the right way or a cumbersome way. It was the approach I found easiest for manipulating each data point separately.


月牙弯弯 2025-01-18 04:06:56


Short answer: try position_dodge().

Longer answer:

Yes, making separate dataframes for each observation and manually setting aesthetics for each is a bit cumbersome! You generally want to keep values in the same dataframe, then just tell ggplot what dimensions are important and what aesthetics to map these to. In cases where individual observations are important, you can map an aesthetic to a unique subject id.

That said, it can be helpful to use separate dataframes when you want completely different geoms for different subsets -- such as violins for controls and points for cases -- so you were on the right track there.

library(ggplot2)
set.seed(22)

# fake data
cases <- data.frame(
  id = factor(1:6),
  strange_stories_ToM_mean = sample(6:8, 6, replace = TRUE),
  age = factor(c(rep("young", 3), rep("old", 3)))
)
controls <- data.frame(
  id = 7:23,
  strange_stories_ToM_mean = sample(c(6,6,7,7,7,7,7,7,7,8,8,8,9,9,9,9,9), 17),
  age = c(rep("young", 9), rep("old", 8))
)

ggplot(data = controls, aes(strange_stories_ToM_mean, age)) + 
  geom_violin(
    trim = FALSE, 
    alpha = 0.2, 
    draw_quantiles = c(0.25, 0.5, 0.75), 
    fill = "gray90"
  ) +
  geom_point(
    data = cases, 
    aes(colour = id, shape = id),  # map color/shape to individual cases
    position = position_dodge(width = .2),    # spread cases apart to avoid overplotting
    size = 5,
    show.legend = FALSE
  ) +
  theme_classic()
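
If jitter is still preferred over dodging, a fixed `seed` should make it reproducible. The sketch below is only an illustration: the three-row `cases` data frame and the seed value 42 are made up. It also passes the position object by name. The error quoted in the question suggests the position object ended up in the `stat` argument, which is what would happen if it were supplied unnamed after `data` and `mapping`.

```r
library(ggplot2)

# Hypothetical cases: all three share the same score, so they overlap exactly
cases <- data.frame(
  id = factor(1:3),
  strange_stories_ToM_mean = c(7, 7, 7),
  age = "young"
)

p <- ggplot(cases, aes(strange_stories_ToM_mean, age)) +
  geom_point(
    aes(colour = id, shape = id),
    # Pass the position by name; an unnamed position object would be
    # matched to geom_point()'s third argument, `stat`
    position = position_jitter(width = 0, height = 0.09, seed = 42),  # 42 is arbitrary
    size = 5
  ) +
  theme_classic()

# Build the layer twice and compare the jittered y coordinates
y1 <- layer_data(p)$y
y2 <- layer_data(p)$y
identical(y1, y2)
```

A seeded jitter gives the same offsets on every render, but the offsets are still random rather than evenly spaced, so `position_dodge()` remains the better fit when the points need to look neatly stacked.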

PS - if you still want to specify particular colors or shapes for each case, you can use scale_color_manual() and scale_shape_manual().
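
As a rough sketch of that suggestion, the example below reuses the colours and point shapes from the question's code. The `cases` data frame and the case labels used as `id` values are assumptions made for illustration. Because shapes 21, 22 and 24 are fillable, `fill` is mapped alongside `colour` so the points come out solid, as in the question.

```r
library(ggplot2)

# Hypothetical cases; ids follow the labels used in the question
cases <- data.frame(
  id = factor(c("Case 1.1", "Case 1.2", "Case 1.3")),
  strange_stories_ToM_mean = c(7, 7, 8),
  age = "young"
)

# Named vectors tie each case to a fixed colour and shape
case_colours <- c("Case 1.1" = "deeppink1",
                  "Case 1.2" = "indianred3",
                  "Case 1.3" = "blueviolet")
case_shapes <- c("Case 1.1" = 21, "Case 1.2" = 24, "Case 1.3" = 22)

p <- ggplot(cases, aes(strange_stories_ToM_mean, age)) +
  geom_point(
    aes(colour = id, fill = id, shape = id),
    position = position_dodge(width = 0.2),  # separate cases with equal scores
    size = 5
  ) +
  # Giving all three scales the same name lets ggplot2 merge them into one legend
  scale_colour_manual(values = case_colours, name = "Young cases") +
  scale_fill_manual(values = case_colours, name = "Young cases") +
  scale_shape_manual(values = case_shapes, name = "Young cases") +
  theme_classic()

p
```

Keeping the colour and shape assignments in named vectors means the same vectors can be passed to every plot in the series, so each case should keep its appearance from one panel to the next.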
