R: plotting a grouped boxplot with a specific value highlighted for each category

Posted on 2025-01-20 05:57:28

I have the following code that works fine on my end:

# Load the packages used below
  library(tibble)  # tibble()
  library(plotly)  # plot_ly(), layout(), and the %>% pipe
  # Seed the pseudo-random number generator for reproducible results
  set.seed(1234)
  # Create three variables
  income <- round(rnorm(500,  # 500 random data point values
                        mean = 10000,  # mean of 10000
                        sd = 1000),  # standard deviation of 1000
                  digits = 2)  # round the random values to two decimal places
  stage <- sample(c("Early",  
                    "Mid",
                    "Late"),  # sample space of the stage variable
                  500,  # 500 random data point values
                  replace = TRUE)  # replace values for reselection
  country <- sample(c("USA",
                      "Canada"),  # sample space of the country variable
                    500,  # 500 random data point values
                    replace = TRUE)  # replace values for reselection
  # Create tibble
  df1 <- tibble(Income = income,  # create an Income variable for the income data point values
               Stage = stage,  # create a Stage variable for the stage data point values
               Country = country)  # create a Country variable for the country data point values
  
  df1 <- as.data.frame(df1)
  df1$HIGHLIGHT <- 'NO'
  df1$TMP = paste0(df1$Country,"_",df1$Stage)
  idx <- duplicated(df1$TMP)
  df1$HIGHLIGHT[!idx] = 'YES'
  
  
  plot_ly(df1,
          x = ~Country,
          y = ~Income,
          color = ~Stage,
          type = "box") %>% 
    layout(boxmode = "group",
           title = "Income by career stage",
           xaxis = list(title = "Country",
                        zeroline = FALSE),
           yaxis = list(title = "Income",
                        zeroline = FALSE))

However, I would like to add a red dot over each boxplot marking the most recent value, that is, the row where the column "HIGHLIGHT" is "YES".
This helps users see not only the distribution for each boxplot but also where the most recent value sits.
I can't find a way to add those red dots. Any suggestions?
Thank you

云之铃。 2025-01-27 05:57:28

I couldn't find what I would call an easy or intuitive way of doing this, but I did find a way that works.

I used the domain to align the points on the x-axis and the income on the y-axis. Because annotations in plotly require text, I used an asterisk. I started with a period, but the points appeared off because a period sits at the bottom of the text 'space'.

Let me know if this is what you were looking for.

# first find the values needed 
df1 %>% filter(HIGHLIGHT == "YES") %>% 
  group_by(Country, Stage) %>% 
  summarise(Income = Income)
# # A tibble: 6 × 3
# # Groups:   Country [2]
#   Country Stage Income
#   <chr>   <chr>  <dbl>
# 1 Canada  Early  7654.
# 2 Canada  Late   9002.
# 3 Canada  Mid    8793.
# 4 USA     Early 11084.
# 5 USA     Late   9110.
# 6 USA     Mid   10277.
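
The same lookup can be sketched in base R. This is only an illustration on made-up data: the `toy` data frame and its values are hypothetical stand-ins for `df1`, and the flagging step mirrors the `duplicated()` logic from the question.

```r
# Toy data standing in for df1 (values are made up for illustration)
toy <- data.frame(
  Country = c("USA", "Canada", "USA", "Canada", "USA", "Canada"),
  Stage   = c("Mid", "Early", "Mid", "Late", "Early", "Early"),
  Income  = c(10277, 7654, 9800, 9002, 11084, 8100),
  stringsAsFactors = FALSE
)

# Flag the first row seen for each Country/Stage combination
toy$HIGHLIGHT <- ifelse(duplicated(toy[c("Country", "Stage")]), "NO", "YES")

# Keep the flagged rows, then sort by Country and Stage so the row order
# is alphabetical within each country, like the summarised tibble above
highlight <- toy[toy$HIGHLIGHT == "YES", ]
highlight <- highlight[order(highlight$Country, highlight$Stage), ]
highlight
```

Sorting explicitly makes the row order independent of how the rows happen to be arranged in the data.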

Then extract the values needed for the plot. Note the order of the rows as well: it is the same order in which the boxes currently appear in the plot.

Using trial and error, knowing that Canada is centered about x = 0 in the domain and the US is centered at x = 1 in the domain, I tried a few values until I found ones that work.

The boxplot centers in the domain on x are -.235, 0, .235, .765, 1, and 1.235.

Next, I created the x and y for the annotation.

newY = df1 %>% filter(HIGHLIGHT == "YES") %>% 
  group_by(Country, Stage) %>% 
  summarise(Income = Income) %>% 
  ungroup() %>% 
  select(Income) %>% as.data.frame() %>% 
  unlist()

x = c(-.235, 0, .235, .765, 1, 1.235)
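
As a rough sketch, the hand-typed `x` vector can also be generated from the category centers and the per-box offsets. The offset of 0.235 is the empirical value found by trial and error above, not something plotly reports, and the names `centers` and `offsets` are made up for this example.

```r
centers <- c(0, 1)              # Canada, USA on the categorical x-axis
offsets <- c(-0.235, 0, 0.235)  # Early, Late, Mid within each country (empirical)

# Repeat each center once per box, then add the matching offset
x <- rep(centers, each = length(offsets)) + rep(offsets, times = length(centers))
x
```

If the number of countries or stages changes, the offsets would most likely need to be re-tuned by eye, since the box width depends on how many groups share each category.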

Then I put it all together. In your code for the plot, most of the variables are capitalized, but they aren't in the data. I just changed them in the data.

(plt = plot_ly(df1,
               x = ~Country,
               y = ~Income,
               color = ~Stage,
               type = "box") %>% 
    layout(boxmode = "group",
           title = "Income by career stage",
           xaxis = list(title = "Country",
                        zeroline = FALSE),
           yaxis = list(title = "Income",
                        zeroline = FALSE),
           annotations = list(x = x,
                              y = newY,
                              text = "*",
                              hovertext = newY,
                              font = list(size = 20,
                                          color = "red"),
                              showarrow = FALSE,
                              valign = "middle",
                              xanchor = "center",  # valid values: "auto", "left", "center", "right"
                              yanchor = "middle" ) 
    ) # end layout
) # end print
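
An alternative way to supply the annotations is one list per annotation, which is the shape plotly.js itself uses for `layout.annotations`. The sketch below only builds that list, so it runs without plotly. The `newY` values are rounded from the printed tibble rather than the exact incomes, and `annotation_list` is a name made up for this example.

```r
x    <- c(-0.235, 0, 0.235, 0.765, 1, 1.235)
newY <- c(7654, 9002, 8793, 11084, 9110, 10277)  # rounded from the printed tibble

# Build one annotation (a plain list) per highlighted point
annotation_list <- unname(Map(function(xi, yi) {
  list(x = xi,
       y = yi,
       text = "*",
       hovertext = as.character(yi),
       font = list(size = 20, color = "red"),
       showarrow = FALSE,
       xanchor = "center",
       yanchor = "middle")
}, x, newY))

length(annotation_list)
```

The result could then be passed as `layout(annotations = annotation_list)` in place of the single list with vector-valued `x` and `y`. This form makes it easier to give individual points different text or colors.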