图算法 - 寻求提高可扩展性
我编写了一个算法来计算和存储 DAG 的所有路径,并且它在小图上运行得很好 - 但现在我正在寻求提高它在较大图上运行的效率。该算法的核心逻辑在下面的 createSF() 和 makePathList() 中,其他方法是帮助器 - 我可以看到追加是一个瓶颈。然而,我想最大的帮助是设计一个可以在字典中存储路径的数据结构,因为许多路径是由其他路径组成的,这是我问题的关键。
private Multiset<String> paths = new Multiset<String>();
public Multiset<String> createSF(DAGNode n) {
for (DAGNode succ : n.getSuccessors())
createSF(succ);
if (!n.isVisited())
for (String s : makePathList(n))
paths.put(s);
n.setVisited(true);
return paths;
}
private List<String> makePathList(DAGNode n) {
List<String> list = new ArrayList<String>();
list.add(n.getLabel());
for (DAGNode node : n.getSuccessors())
list.addAll(append(n.getLabel(), makePathList(node)));
return list;
}
private List<String> append(String s, List<String> src) {
List<String> ls = new ArrayList<String>();
for (String str : src)
ls.add(s + "/" + str);
return ls;
}
编辑:
我现在使用路径对象来表示路径,并确定了这两种方法的瓶颈:
public List<Path> createPathList(Tree n) {
List<Path> list = new ArrayList<Path>();
list.add(new Path(n.getNodeName()));
for (Tree node : n.getSuccessors()) {
list.addAll(append(n.getNodeName(), createPathList(node)));
}
return list;
}
public List<Path> append(String s, List<Path> src) {
List<Path> ls = new ArrayList<Path>();
for (Path path : src) {
ls.add(new Path(path, s));
}
return ls;
}
问题是当图形大小为 M 时,这些方法将被调用 M 次,这意味着有很多列表在这里创建...是否有更有效的方法来构建 createPathList() 的返回?
I've written an algorithm which calculates and stores all paths of a DAG, and it works nicely on small graphs - but now i'm looking to improve it's efficiency to run over larger graphs. The core logic of the algorithm is in createSF() and makePathList() below, the other methods are helpers - I can see that append is a bottleneck. However, I guess the biggest help would be to devise a data structure that can store paths in a dictionary, since many of the paths are composed of other paths, this is the crux of my question.
private Multiset<String> paths = new Multiset<String>();
public Multiset<String> createSF(DAGNode n) {
for (DAGNode succ : n.getSuccessors())
createSF(succ);
if (!n.isVisited())
for (String s : makePathList(n))
paths.put(s);
n.setVisited(true);
return paths;
}
private List<String> makePathList(DAGNode n) {
List<String> list = new ArrayList<String>();
list.add(n.getLabel());
for (DAGNode node : n.getSuccessors())
list.addAll(append(n.getLabel(), makePathList(node)));
return list;
}
private List<String> append(String s, List<String> src) {
List<String> ls = new ArrayList<String>();
for (String str : src)
ls.add(s + "/" + str);
return ls;
}
EDIT:
I'm now using a path object to represent paths and have pin-pointed the bottle neck to these two methods:
public List<Path> createPathList(Tree n) {
List<Path> list = new ArrayList<Path>();
list.add(new Path(n.getNodeName()));
for (Tree node : n.getSuccessors()) {
list.addAll(append(n.getNodeName(), createPathList(node)));
}
return list;
}
public List<Path> append(String s, List<Path> src) {
List<Path> ls = new ArrayList<Path>();
for (Path path : src) {
ls.add(new Path(path, s));
}
return ls;
}
Trouble is when a graph is size M these methods will be called M times, this means there is a lot of lists being created here... is there a more efficient way to build up the return for createPathList()?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
为了回答这个问题,有必要了解为什么需要路径列表。路径列表不会为您提供有关 DAG 表示形式的任何附加信息。
如果您想分别计算每个路径的内容,或者计算所有路径的总和/最小值/最大值之类的内容,也可以使用 DAG 本身来完成。
如果您确实坚持保存单独的路径,一种选择是将 DAG 转换为 Trie< 的变体/a>.另一种选择可能是使用 Lempel- 的某些变体Ziv 表示。这取决于您的 DAG 类型以及您希望如何处理路径信息。
In order to answer this question, it is necessary to understand why do you need the list of paths. The list of paths does not give you any additional information over what you have in the DAG representation.
If you want to calculate things for each path separately, or calculate something like sum/min/max over all paths, it too could be done using the DAG itself.
If you do insist on saving separate paths, one option would be to convert your DAG into a variant of a Trie. Another option could be to use some variant of the Lempel-Ziv representation. It depends on your DAG types, and what you expect to do with the paths information.
看一下 Graphviz 的 DOT 源代码,它可能会给你一些想法。
Take a look at the DOT source code from Graphviz, it might give you some ideas.
请允许我首先提出两条(希望有帮助的)评论:
我在理解您的代码时遇到了一些困难,因为某些方法名称误导了我。从名字上看,我期待的是别的东西。我可以建议一些重构吗:(
删除了在 Robert 编辑后变得过时的部分答案)
正如 Margus 所说,用 StringBuilder 附加链替换字符串连接并不会提高性能。无论如何,编译器可能会优化 String 串联到 StringBuilder 附加(我见过这样的字节代码)。
您可以尝试将 DAG 转换为树结构。引入一个不可见的根,将所有节点作为直接子节点。现在,为每个节点添加它的后继节点(子节点和/或同级节点)。现在叶子的数量应该等于路径的数量,并且从根到任何叶子的每个图都是 DAG 中的一条路径。
编辑
一个小改进 - 它是微优化,但至少会留下更少的垃圾:
Please allow me to put two (hopefully helpful) comments first:
I had some difficulties understanding your code, because some of the method names misled me. From looking at the names, I expected something else. May I suggest some refactoring:
(removed part of the answer that became obsolete after Robert's edit)
As Margus said, replacing the String concatenation with a StringBuilder append chain doesn't increase your performance. Compilers may optimize String concatenations to StringBuilder appends anyway (I've seen such byte code).
You could try to convert the DAG into a tree structure. Introduce an invisible root with all nodes as direct children. Now for each node add it's successors (child and/or sibling). The number of leaves now should be equal to the number of path and every graph from the root to any leaf is one path in the DAG.
Edit
A small improvement - it's micro-optimization but at least it will leave less garbage:
一个简单的修改(取决于您如何使用数据)可能是延迟加载路径,这样如果您倾向于只使用几个路径,您甚至永远不会生成一些路径。
当然,这完全取决于预期用途
A simple modification (depending on how you use the data) might be to load the paths lazily, that way if you tend to only use a few paths you'll never even generate some paths.
Of course, this depends entirely on expected use