R plotly grouped boxplot highlighting a specific value for each category
I have the following code that works fine on my end:
# Load packages: plotly for the chart (it also re-exports %>%), tibble for tibble()
library(plotly)
library(tibble)

# Seed the pseudo-random number generator for reproducible results
set.seed(1234)
# Create three variables
income <- round(rnorm(500,           # 500 random data point values
                      mean = 10000,  # mean of 10000
                      sd = 1000),    # standard deviation of 1000
                digits = 2)          # round the random values to two decimal places
stage <- sample(c("Early", "Mid", "Late"),  # sample space of the stage variable
                500,                        # 500 random data point values
                replace = TRUE)             # sample with replacement
country <- sample(c("USA", "Canada"),       # sample space of the country variable
                  500,                      # 500 random data point values
                  replace = TRUE)           # sample with replacement
# Create tibble
df1 <- tibble(Income = income,    # Income column from the income values
              Stage = stage,      # Stage column from the stage values
              Country = country)  # Country column from the country values
df1 <- as.data.frame(df1)
# Flag the first row of each Country/Stage combination as the value to highlight
df1$HIGHLIGHT <- 'NO'
df1$TMP = paste0(df1$Country, "_", df1$Stage)
idx <- duplicated(df1$TMP)
df1$HIGHLIGHT[!idx] = 'YES'
plot_ly(df1,
        x = ~Country,
        y = ~Income,
        color = ~Stage,
        type = "box") %>%
  layout(boxmode = "group",
         title = "Income by career stage",
         xaxis = list(title = "Country",
                      zeroline = FALSE),
         yaxis = list(title = "Income",
                      zeroline = FALSE))
However, I would like to add a red dot on each boxplot showing the most recent value, i.e. the row whose "HIGHLIGHT" column is "YES".
This helps users see not only the distribution of each boxplot but also where the most recent value lies within it.
I can't find a way to add those red dots. Any suggestions?
Thank you
I couldn't find what I would call an easy or intuitive way of doing this, but I did find a way that works.
I used the domain to align the points on the x-axis and the income on the y-axis. Because annotations in plotly require text, I used an asterisk. I did start with a period, but the points appeared off, because a period sits at the bottom of the text 'space.' Let me know if this is what you were looking for.
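The answerer's original code for this step is not preserved here, so what follows is only a sketch of a single text annotation used as a marker. It relies on the documented plotly behaviour that, on a categorical axis, a numeric annotation `x` is read as a category index counted from zero. The toy data, the font size, and pinning the category order with `categoryorder`/`categoryarray` (so that Canada is index 0 and USA is index 1) are all assumptions made for illustration.

```r
library(plotly)

# Hypothetical toy data: two countries, three stages
set.seed(1)
toy <- data.frame(
  Country = rep(c("Canada", "USA"), each = 30),
  Stage   = rep(c("Early", "Late", "Mid"), times = 20),
  Income  = round(rnorm(60, mean = 10000, sd = 1000), 2),
  stringsAsFactors = FALSE
)

# One red asterisk in data coordinates. On a categorical x-axis a numeric x
# is a category index, so x = 0 sits over the middle of the Canada group.
star <- list(
  x = 0, y = 10000,
  xref = "x", yref = "y",
  text = "*", showarrow = FALSE,
  font = list(color = "red", size = 20)
)

p <- plot_ly(toy, x = ~Country, y = ~Income, color = ~Stage, type = "box") %>%
  layout(boxmode = "group",
         xaxis = list(categoryorder = "array",
                      categoryarray = c("Canada", "USA")),
         annotations = list(star))
# print(p) displays the chart
```

Because `showarrow = FALSE`, the annotation is drawn as plain text centered on (x, y). That is why the vertical placement of the glyph inside its text box matters, as noted above for the period.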
Next, extract the values needed for the plot. Note the order here as well: it is the same order as the boxes in the plot right now.
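Here is one possible way to do the extraction, written in base R; it is not the answerer's own code. It assumes the boxes are drawn with Canada before USA and, within each country, in alphabetical Stage order (Early, Late, Mid), so sorting by Country and then Stage should line the values up with the boxes. The data is regenerated in the question's shape so that the snippet runs on its own.

```r
# Rebuild data shaped like the question's (base data.frame, no tibble needed)
set.seed(1234)
df1 <- data.frame(
  Income  = round(rnorm(500, mean = 10000, sd = 1000), 2),
  Stage   = sample(c("Early", "Mid", "Late"), 500, replace = TRUE),
  Country = sample(c("USA", "Canada"), 500, replace = TRUE),
  stringsAsFactors = FALSE
)
# Same flag as the question: first row of each Country/Stage pair is "YES"
df1$HIGHLIGHT <- ifelse(duplicated(paste0(df1$Country, "_", df1$Stage)), "NO", "YES")

# Keep the highlighted rows and sort them Country first, then Stage.
# Assumption: this matches the left-to-right order of the grouped boxes.
hl <- df1[df1$HIGHLIGHT == "YES", c("Country", "Stage", "Income")]
hl <- hl[order(hl$Country, hl$Stage), ]
```

This should give six rows, one per box, in the order Canada/Early, Canada/Late, Canada/Mid, USA/Early, USA/Late, USA/Mid.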
Knowing that Canada is centered at about x = 0 in the domain and the US at x = 1, I used trial and error on a few values until I found ones that worked.
The boxplot centers on the x domain are -.235, 0, .235, .765, 1, and 1.235.
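The six centers listed above decompose into a category index (Canada = 0, USA = 1) plus a per-stage offset of -0.235, 0, or +0.235. The sketch below rebuilds the list from those two pieces. The offset is the empirical value from the trial-and-error step, not a derived constant, and it may need re-tuning if layout settings such as `boxgap` are changed.

```r
# Category index on the categorical x-axis, and the empirical per-stage offset
country_idx  <- c(Canada = 0, USA = 1)
stage_offset <- c(Early = -0.235, Late = 0, Mid = 0.235)

# outer() gives a stages x countries matrix; as.vector() reads it column by
# column, i.e. all Canada boxes first, then all USA boxes.
box_centers <- as.vector(outer(stage_offset, country_idx, `+`))
```

Building the centers this way keeps the order tied to the same Country-then-Stage sort used for the highlighted values.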
Next, I created the x and y for the annotation.
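One way to write this step is to build one annotation list per box. In the sketch below, `make_star` is a hypothetical helper, and the incomes are placeholder numbers standing in for the extracted highlighted values (they are not results from the question's data). `Map` pairs each x with its y; because the first vector is unnamed, the result is an unnamed list, which plotly serializes as an array of annotations.

```r
# Placeholder inputs: six box centers and six highlighted incomes, in plot order
box_centers <- c(-0.235, 0, 0.235, 0.765, 1, 1.235)
hl_income   <- c(9500, 10200, 11000, 9800, 10100, 10400)  # illustrative only

# Hypothetical helper: a red "*" text annotation at (x, y) in data coordinates
make_star <- function(x, y) {
  list(x = x, y = y, xref = "x", yref = "y",
       text = "*", showarrow = FALSE,
       font = list(color = "red", size = 20))
}

# One annotation per box
stars <- Map(make_star, box_centers, hl_income)
```

The resulting `stars` list can be passed directly as `layout(annotations = stars)`.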
Then I put it all together. In your plot code most of the variable names are capitalized, but they weren't in the data, so I just changed them in the data.
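The combined sketch below chains the steps end to end. It is a reconstruction under stated assumptions and not the answerer's exact code. It uses the empirical ±0.235 offsets and assumes alphabetical Stage order within each country. It also pins the country order with `categoryorder`/`categoryarray` (an addition not in the question) so that Canada is index 0 and USA is index 1.

```r
library(plotly)

# Data in the question's shape (base data.frame instead of tibble)
set.seed(1234)
df1 <- data.frame(
  Income  = round(rnorm(500, mean = 10000, sd = 1000), 2),
  Stage   = sample(c("Early", "Mid", "Late"), 500, replace = TRUE),
  Country = sample(c("USA", "Canada"), 500, replace = TRUE),
  stringsAsFactors = FALSE
)
# First row of each Country/Stage combination is the highlighted value
df1$HIGHLIGHT <- ifelse(duplicated(paste0(df1$Country, "_", df1$Stage)), "NO", "YES")

# Highlighted rows in plotting order: Country, then Stage alphabetically
hl <- df1[df1$HIGHLIGHT == "YES", ]
hl <- hl[order(hl$Country, hl$Stage), ]

# Empirical box centers: category index plus per-stage offset
country_idx  <- c(Canada = 0, USA = 1)
stage_offset <- c(Early = -0.235, Late = 0, Mid = 0.235)
hl$x <- unname(country_idx[hl$Country] + stage_offset[hl$Stage])

# One red asterisk annotation per highlighted value
stars <- Map(function(x, y) {
  list(x = x, y = y, xref = "x", yref = "y", text = "*",
       showarrow = FALSE, font = list(color = "red", size = 20))
}, hl$x, hl$Income)

p <- plot_ly(df1, x = ~Country, y = ~Income, color = ~Stage, type = "box") %>%
  layout(boxmode = "group",
         title = "Income by career stage",
         xaxis = list(title = "Country", zeroline = FALSE,
                      categoryorder = "array",
                      categoryarray = c("Canada", "USA")),
         yaxis = list(title = "Income", zeroline = FALSE),
         annotations = stars)
# print(p) displays the chart
```

If the markers drift vertically, the annotation `yshift` attribute (in pixels) is one knob to try instead of changing the glyph.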