Isolating a "branch" in a Sankey diagram made with networkD3

Posted on 2025-01-13 23:57:27


I am using sankeyNetwork() from the networkD3 package to visualize some data. I was wondering if there's a way to "isolate" a branch from start to finish, ignoring the irrelevant links.

Example: I've got this: SankeyGot

And I want to extract this: SankeyWant

Reproducible example:

library(dplyr)      # tibble() and %>%
library(networkD3)  # sankeyNetwork()

set.seed(9)

# 10 random links between words taken from stringr::words
df <- tibble(
  source = sample(stringr::words, 5) %>% rep(2),
  target = c(sample(stringr::words, 7), source[1:3]),
  values = rnorm(10, 10, 7) %>% round(0) %>% abs())

nodes <- data.frame(names = unique(c(df$source, df$target)))

# sankeyNetwork() expects zero-based node indices
links <- tibble(
  source = match(
    df$source, nodes$names) -1,
  target = match(
    df$target, nodes$names) -1,
  value = df$values
  )

sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "names",
              iterations = 64, sinksRight = FALSE, fontSize = 14)

I'd like to be able to pick a node, "name" for example, and keep only what connects to it on all levels upstream and downstream. How would I go about doing this?


ι不睡觉的鱼゛ 2025-01-20 23:57:27

Calculating the paths from a node in a graph is non-trivial, but the igraph package can help with its all_simple_paths() function. However, heed this warning in its help file:

Note that potentially there are exponentially many paths between two
vertices of a graph, and you may run out of memory when using this
function, if your graph is lattice-like.
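
To get a feel for that warning, here is a small sketch that is not part of the original answer. It counts the simple paths between opposite corners of small square lattices built with igraph's make_lattice(). The helper name count_corner_paths is hypothetical. The count grows very quickly with the side length, so it is probably wise to keep the grids small.

```r
library(igraph)

# count the simple paths between opposite corners of an n x n lattice
# (count_corner_paths is an illustrative helper, not an igraph function)
count_corner_paths <- function(n) {
  g <- make_lattice(c(n, n))  # undirected n x n grid, vertices numbered 1..n*n
  length(all_simple_paths(g, from = 1, to = n * n))
}

# side lengths 2, 3 and 4 only; larger grids get slow quickly
path_counts <- sapply(2:4, count_corner_paths)
print(path_counts)
```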

(I don't know what your words vector is, so I recreated the links data.frame manually)

library(dplyr)
library(networkD3)

set.seed(9)

# stringsAsFactors = FALSE keeps the node names as character on R < 4.0
df <- read.csv(header = TRUE, stringsAsFactors = FALSE, text = "
source,target
summer,obvious
summer,structure
however,either
however,match
obvious,about
obvious,non
either,contract
either,produce
contract,paint
contract,name
")
df$values <- rnorm(10, 10, 7) %>% round(0) %>% abs()


# use graph to calculate the paths from a node
library(igraph)

graph <- graph_from_data_frame(df)

start_node <- "name"

# get nodes along a uni-directional path going IN to the start_node
connected_nodes_in <- 
  all_simple_paths(graph, from = start_node, mode = "in") %>% 
  unlist() %>% 
  names() %>% 
  unique()

# get nodes along a uni-directional path going OUT of the start_node
connected_nodes_out <- 
  all_simple_paths(graph, from = start_node, mode = "out") %>% 
  unlist() %>% 
  names() %>% 
  unique()

# combine them
connected_nodes <- unique(c(connected_nodes_in, connected_nodes_out))

# filter your data frame so it only includes links/edges that start and
# end at connected nodes
df <- df %>% filter(source %in% connected_nodes & target %in% connected_nodes)



nodes <- data.frame(names = unique(c(df$source, df$target)))

links <- tibble(
  source = match(
    df$source, nodes$names) -1,
  target = match(
    df$target, nodes$names) -1,
  value = df$values
)

sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "names",
              iterations = 64, sinksRight = FALSE, fontSize = 14)
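
If only the set of connected nodes is needed, rather than the individual paths, a reachability search may sidestep the path explosion mentioned in the warning. The sketch below is not part of the original answer. It swaps all_simple_paths() for igraph's subcomponent(), which returns the vertices reachable from a node in a given direction, and reuses the same hand-built edge list. The names upstream, downstream and branch are made up for illustration.

```r
library(igraph)

# same hand-built edge list as in the answer above
df <- read.csv(header = TRUE, stringsAsFactors = FALSE, text = "
source,target
summer,obvious
summer,structure
however,either
however,match
obvious,about
obvious,non
either,contract
either,produce
contract,paint
contract,name
")

graph <- graph_from_data_frame(df)
start_node <- "name"

# vertices from which start_node can be reached (includes start_node itself)
upstream <- as_ids(subcomponent(graph, start_node, mode = "in"))

# vertices that can be reached from start_node (includes start_node itself)
downstream <- as_ids(subcomponent(graph, start_node, mode = "out"))

connected_nodes <- union(upstream, downstream)

# keep only the edges whose two endpoints both belong to the branch
branch <- df[df$source %in% connected_nodes & df$target %in% connected_nodes, ]
print(branch)
```

The resulting branch data frame can then go through the same nodes/links construction and sankeyNetwork() call as above, once a values column has been added.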
