R: How to customize a Sankey chart in ggplotly?

Posted on 2025-02-13 02:24:01


I have sales data by year and model, which I visualize with a Sankey bump chart. I am struggling with two issues:

First, I need model B to always sit at the bottom of the chart, regardless of its values over the years.
Second, when I convert the ggplot with ggplotly, the hover text does not show sales or years.

Code:

df <- data.frame (model  = c("A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J"),
 Year = c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020),
                  sales = c(450,678,456,344,984,456,234,244,655,789,234,567,234,567,232,900,1005,1900,450,345,567,235,456,345,144,333,555,777,111,444,222,223,445,776,331,788,980,1003,456,434,345,2222,3456,456,678,8911,4560,4567,4566,5555,6666,7777,8888,1233,1255,5677,3411,2344,6122,4533))

# install.packages("remotes")
# remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)
library(tidyverse)
library(plotly) # provides ggplotly()

plot <- ggplot(df, aes(x = Year,
               node = model,
               fill = model,
               value = sales)) +
  geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
  scale_fill_viridis_d(option = "A", alpha = .8) +
  theme_sankey_bump(base_size = 16) 

  
  
ggplotly(plot)


才能让你更想念 2025-02-20 02:24:02


I'm absolutely certain that there is a better way, but it took me a while to get it working. I think this is what you were looking for.

I started with the ggplot and ggplotly objects that you have here. The primary purpose of this initial plot is to capture the colors. (I could have captured them a few different ways, but this was already done for me in your plot.)

Update: I've modified the two elements you requested.

library(ggsankey)
library(tidyverse)
library(plotly)

# df from the question is unchanged

# visualize the original
(plot <- ggplot(df, 
              aes(Year, node = model, fill = model, value = sales)) + 
    geom_sankey_bump(space = 0, type = "alluvial", 
                     color = "transparent", smooth = 15) +
    scale_fill_viridis_d(option = "A", alpha = .8) +
    theme_sankey_bump(base_size = 16)) 

ggplotly(plot) -> plp 
plp

#-------- colors --------
# collect the 10 colors
cols <- map_dfr(1:10, function(k){
  nm <- plp$x$data[[k]]$name
  filler <- plp$x$data[[k]]$fillcolor
  c(nm = nm, filler = filler)
})
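
One of those other ways, as a sketch under assumptions: because the plot uses scale_fill_viridis_d(option = "A", alpha = .8), the same palette can probably be generated directly with viridisLite::viridis(). The helper to_rgba and the table cols_alt are hypothetical names, and the strings may not match ggplotly's fillcolor values character for character, so treat this as a cross-check rather than a replacement for the capture above.

```r
library(viridisLite)

# hypothetical helper: convert "#RRGGBBAA" hex colors to "rgba(r,g,b,a)" strings
to_rgba <- function(hex) {
  m <- grDevices::col2rgb(hex, alpha = TRUE)   # 4 x n matrix: red, green, blue, alpha
  sprintf("rgba(%d,%d,%d,%.2f)", m[1, ], m[2, ], m[3, ], m[4, ] / 255)
}

models <- LETTERS[1:10]                        # model names from the question
hex <- viridis(length(models), alpha = 0.8, option = "A")

# same shape as cols: one row per model, with its fill color
cols_alt <- data.frame(nm = models, filler = to_rgba(hex))
cols_alt
```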

Then I divided the contents of the B model into 10 groups to ensure it was always the smallest bump. This allowed me to collect the stacked values for all of the other models, which is needed to push B to the bottom.

#-------------- splitting B -------------
df1 <- df %>% filter(model != "B") %>% 
  arrange(Year, sales)

df2 <- df %>% filter(model == "B") %>% # this gets used further down
  arrange(Year)

# split B into 10 groups - keep on the bottom, then join the groups
# make the group labels: "B", "BB", ..., "BBBBBBBBBB"
ng <- strrep("B", 1:10)
# add values for these groups by year
df4 <- data.frame(Year = rep(unique(df$Year), each = 10),
                  model = rep(ng, length(unique(df$Year))),
                  sales = rep(df2$sales/10, each = 10))
df5 <- rbind(df1, df4)
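
A quick sanity check, as a sketch: splitting B into ten equal slices should leave B's yearly totals unchanged. The toy data frame below is made up for illustration (it is not the question's df), and slices and check are hypothetical names. Only base R is used.

```r
# toy data, not the question's df: two years, three models
toy <- data.frame(
  model = rep(c("A", "B", "C"), times = 2),
  Year  = rep(c(2015, 2016), each = 3),
  sales = c(450, 678, 456, 234, 567, 234)
)

b <- toy[toy$model == "B", ]
b <- b[order(b$Year), ]

# ten slice labels: "B", "BB", ..., "BBBBBBBBBB"
ng <- strrep("B", 1:10)

# each year's B sales divided evenly across the ten labels
slices <- data.frame(
  Year  = rep(b$Year, each = 10),
  model = rep(ng, times = nrow(b)),
  sales = rep(b$sales / 10, each = 10)
)

# summing the slices per year should give back B's original sales
check <- aggregate(sales ~ Year, data = slices, FUN = sum)
all.equal(check$sales, b$sales)
```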

Recreate the Sankey bump with 10 subsections of model B. Everything that follows works with this plot.

#-------------- plotly after dividing B -------------
(nplt <- ggplot(df5, aes(x = Year, node = model, fill = model, value = sales)) +
   geom_sankey_bump(space = 0, type = "alluvial", 
                    color = "transparent", smooth = 15) +
   scale_fill_viridis_d(option = "A", alpha = .8) +
   theme_sankey_bump(base_size = 16))

ggplotly(nplt) -> plt
plt

Create a Sankey bump with JUST B, to capture data that represents model B at the bottom. Use this data to substitute all of the traces that represent B in the object plt. The colors get fixed here, as well. (The original 10 colors from the first plot.) Lastly, the hoverinfo gets removed. That will get fixed next.

#-------------- get values for B at the bottom -------------
df %>% filter(model == "B") %>% 
  ggplot(aes(x = Year,
             node = model,
             fill = model,
             value = sales)) +
  geom_sankey_bump(space = 0, type = "alluvial", color = "transparent", smooth = 15) +
  scale_fill_viridis_d(option = "A", alpha = .8) +
  theme_sankey_bump(base_size = 16) -> bplt
ggplotly(bplt) -> bplotly
bplotly

#------- take divided B and remove all but one trace for B --------
# xx <- plt$x$data
# plt$x$data <- xx[c(1:2, 12:length(xx))] # keep only one B trace

#---------------- adjustments to plt's build --------------------
# change out data for the B trace, add the right colors
wh <- vector(length = 0)
invisible(
  map(1:length(plt$x$data),
      function(j) {
        nm <- plt$x$data[[j]]$name
        plt$x$data[[j]]$hoverinfo <<- "none"
        plt$x$data[[j]]$fillcolor <<- unlist(cols[cols$nm == nm, "filler"], 
                                             use.names = F)
        if(str_detect(nm, "^B$")){
          plt$x$data[[j]]$x <<- bplotly$x$data[[1]]$x
          plt$x$data[[j]]$y <<- bplotly$x$data[[1]]$y
        }
        if(str_detect(nm, "BB")) {
          wh[length(wh) + 1] <<- j # list of unnecessary traces (extra B groups)
        }
      })
)

#----- take divided B and remove all but one trace for B ------
plt$x$data <- plt$x$data[-c(wh)]         # <------ this line was missing from the previous update

# visualize Sankey bump with B at the bottom
plt
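
One caveat with the -c(wh) indexing, shown on a toy list: if wh ever ends up empty (no trace name contains "BB"), negative indexing with an empty vector drops every element instead of none. A small guard avoids that. The list traces and the helper drop_at are hypothetical stand-ins for plt$x$data and are not part of the answer's code.

```r
# hypothetical stand-in for plt$x$data
traces <- list(list(name = "A"), list(name = "B"), list(name = "BB"))

# hypothetical helper: drop elements by position, leaving x unchanged when idx is empty
drop_at <- function(x, idx) {
  if (length(idx) == 0) x else x[-idx]
}

# positions of the extra B slices
wh <- which(vapply(traces, function(t) grepl("BB", t$name), logical(1)))
kept <- drop_at(traces, wh)
vapply(kept, function(t) t$name, character(1))

# the unguarded form returns an empty list when there is nothing to drop
length(traces[-integer(0)])
```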

The Plotly object is basically 10 globs of color; there is no separation between years in the background. So if you add a tooltip to it as it is, you get only one tooltip per glob.

To get the tooltips you're looking for, I created another trace (well, 10, actually: 1 for each model). In order to get the right positions (the stacked y values in the plot run up to the 50K range, which the raw sales data does not), I used the data in plt to create a new data frame.

#--------------- collect values for hovertext positions ----------
x <- plt$x$data[[1]]$x
inds <- which(x %in% 2015:2020, arr.ind = T)
yrs <- x[inds]

tellMe <- invisible(
  map(1:length(plt$x$data),
      function(m) {
        y <- plt$x$data[[m]]$y
        y[inds]
      }) %>% setNames(sort(unique(df$model))) %>% # changed from LETTERS[1:10] 
    as.data.frame() %>% 
    mutate(yr = yrs %>% as.integer()) %>% 
    pivot_longer(names_to = "model", values_to = "sales", 
                 cols = sort(unique(df$model))) %>% 
    distinct() %>% 
    group_by(yr, model) %>% 
    summarise(val = mean(sales)) %>% 
    left_join(df, by = c("yr" = "Year", "model" = "model")) %>% 
    as.data.frame() # drop groups
)

#-------------- create data trace for hovertext --------------
plot_ly(tellMe, x = ~yr, y = ~val, split = ~model, 
        customdata = ~sales, text = ~model,
        line = list(width = .01, shape = "spline", smoothing = 1.3),
        hovertemplate = "Year: %{x}<br>Model: %{text}<br>Sales: %{customdata}<extra></extra>",
        type = "scatter", mode = "lines", showlegend = F) -> pp2
pp2

If you look at the plot here, it looks blank. That's because of how small the lines are. This is intentional. You don't want lines on your graph.
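
The same idea on toy data, as a minimal sketch: near-invisible line traces, one per model, carry the hover text while another layer does the drawing. The data frame toy and its values are invented for illustration; val plays the role of the stacked y position and sales the raw value shown in the tooltip.

```r
library(plotly)

# invented data: the y position (val) differs from the value to display (sales)
toy <- data.frame(
  yr    = rep(2015:2017, times = 2),
  model = rep(c("A", "B"), each = 3),
  val   = c(10, 20, 30, 40, 50, 60),
  sales = c(1, 2, 3, 4, 5, 6)
)

p <- plot_ly(toy, x = ~yr, y = ~val, split = ~model,
             customdata = ~sales, text = ~model,
             line = list(width = 0.01),
             hovertemplate = "Year: %{x}<br>Model: %{text}<br>Sales: %{customdata}<extra></extra>",
             type = "scatter", mode = "lines", showlegend = FALSE)

# plotly_build() evaluates the formulas; split yields one trace per model
b <- plotly_build(p)
length(b$x$data)
```

Hovering near a year on either line should then report the raw sales value rather than the plotted y position.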

Fix the colors, so that the hoverlabel background colors match the legend colors.

# change colors to match sankey
pp2 <- plotly_build(pp2)
invisible(
  map(1:10,
      function(z) {
        nm <- pp2$x$data[[z]]$name
        # collect and assign the color
        cr <- unlist(cols[cols$nm == nm, "filler"], use.names = F)
        pp2$x$data[[z]]$line$color <<- cr
      })
)

Using subplot here didn't work. Plotly gave me an error when I tried adding a trace, whether all at once or even one for each model. So I forced the traces together.

#-------------- consolidate the traces (subplot won't work) -----------
# collect data one more time!
dx <- plt$x$data
yx <- pp2$x$data
yx <- append(yx, dx) # put plt on top

# replace data 
plt$x$data <- yx

# lines are small, increase the distance searched for matches
plt %>% layout(hoverdistance = 40)

The final product: