Slow execution and running out of heap space (even with the VM args set to 2g)
I'm writing a function that generates all paths in a tree as XPath statements and stores them in a bag (multiset). Below is a naive version (sorry, it's long), and below that is my attempt to optimize it:
/**
 * Create the structural fingerprint of a tree. Defined as the multiset of
 * all paths and their multiplicities.
 */
protected Multiset<String> createSF(AbstractTree<String> t,
        List<AbstractTree<String>> allSiblings) {
    /*
     * The difference between unordered and ordered trees is that the
     * next-sibling axis must also be used.
     *
     * This means that each node's children are liable to be generated more
     * than once, and so are memoised and reused.
     */
    Multiset<String> res = new Multiset<String>();
    // So, we return a multiset containing:
    // 1. the node name itself, prefixed by the root symbol
    res.add("/" + t.getNodeName());
    List<AbstractTree<String>> children = t.getChildren();
    // all of the children's sets, prefixed by this one
    if (children != null) {
        for (AbstractTree<String> child : children) {
            Multiset<String> sub = createSF(child, children);
            for (String nextOne : sub) {
                if (nextOne.startsWith("//")) {
                    res.add(nextOne);
                } else {
                    res.add("/" + nextOne);
                    res.add("/" + t.getNodeName() + nextOne);
                }
            }
        }
    }
    // 2. all of the following siblings' sets, prefixed by this one
    if (allSiblings != null) {
        // node is not the original root
        // first, find the current node's position (by identity)
        int currentNodePos = 0;
        int ptrPos = 0;
        for (AbstractTree<String> node : allSiblings) {
            if (node == t) {
                currentNodePos = ptrPos;
            }
            ptrPos++;
        }
        // 3. then add all paths deriving from (all) following siblings
        for (int i = currentNodePos + 1; i < allSiblings.size(); i++) {
            AbstractTree<String> sibling = allSiblings.get(i);
            Multiset<String> sub = createSF(sibling, allSiblings);
            for (String nextOne : sub) {
                if (nextOne.startsWith("//")) {
                    res.add(nextOne);
                } else {
                    res.add("/" + nextOne);
                    res.add("/" + t.getNodeName() + nextOne);
                }
            }
        }
    }
    return res;
}
And here is the optimization, which (currently) lives in a subclass:
private Map<AbstractTree<String>, Multiset<String>> lookupTable =
        new HashMap<AbstractTree<String>, Multiset<String>>();

@Override
public Multiset<String> createSF(AbstractTree<String> t,
        List<AbstractTree<String>> allSiblings) {
    Multiset<String> lookup = lookupTable.get(t);
    if (lookup != null) {
        return lookup;
    } else {
        Multiset<String> res = super.createSF(t, allSiblings);
        lookupTable.put(t, res);
        return res;
    }
}
My problem is that the optimized version runs out of heap space (the VM args are set to -Xms2g -Xmx2g) and is very slow on moderately large input. Can anyone see a way to improve on this?
Comments (3)
Run the code through a profiler. That's the only way to get real facts about the code. Everything else is just guesswork.
"generates all paths in a tree as xpath statements"
How many paths are you creating? This can be non-trivial. For a reasonably balanced tree the number of paths should be O(n log n), but the algorithm could be much worse depending on how the children of each node are represented.
You should profile the plain enumeration of the paths first, without worrying about storing them in the bag.
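One way to answer "how many paths are you creating?" without storing anything is to mirror the recursion of createSF but only count the strings it would add. The sketch below is an illustration under several assumptions: Node is a hypothetical stand-in for AbstractTree<String>, the root is called with allSiblings == null, node names are non-empty, and the asker's Multiset keeps repeated strings (so the result is the total multiplicity). Like the subclass, it memoises per node, but keyed by identity via IdentityHashMap.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PathCount {
    // Hypothetical stand-in for AbstractTree<String>: a name plus a child list.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Memo keyed by node identity; a node's sibling list is fixed by its parent.
    private final Map<Node, long[]> memo = new IdentityHashMap<>();

    // Mirrors createSF(t, allSiblings) but only counts the strings it would add.
    // Returns {single, dbl}: entries starting with "/name" vs. entries starting with "//".
    long[] count(Node t, List<Node> allSiblings) {
        long[] cached = memo.get(t);
        if (cached != null) return cached;
        long single = 1; // res.add("/" + t.getNodeName())
        long dbl = 0;
        List<long[]> subs = new ArrayList<>();
        for (Node child : t.children) subs.add(count(child, t.children));
        if (allSiblings != null) {
            int pos = allSiblings.indexOf(t); // Node has no equals(), so this is identity
            for (int i = pos + 1; i < allSiblings.size(); i++) {
                subs.add(count(allSiblings.get(i), allSiblings));
            }
        }
        for (long[] sub : subs) {
            dbl += sub[1];    // "//..." entries are copied unchanged
            dbl += sub[0];    // "/" + nextOne starts with "//"
            single += sub[0]; // "/" + name + nextOne starts with "/name"
        }
        long[] result = {single, dbl};
        memo.put(t, result);
        return result;
    }

    // Total multiplicity of createSF(root, null) for a root with k leaf children.
    static long totalForFlatTree(int k) {
        Node root = new Node("r");
        for (int i = 0; i < k; i++) root.children.add(new Node("c" + i));
        long[] c = new PathCount().count(root, null);
        return c[0] + c[1];
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 30; k += 5) {
            System.out.println(k + " leaf children -> " + totalForFlatTree(k) + " strings");
        }
    }
}
```

Tracing createSF by hand for a root r with two leaf children a and b gives eight strings (/r, //a, /r/a, //b, //a/b, /r/a/b, //b, /r/b), which matches totalForFlatTree(2). Under the assumptions above, the total for k leaf children works out to (k + 2) * 2^(k-1), so it grows exponentially with the number of siblings: each sibling's result already contains the results of every sibling after it, and step 3 adds those again.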
Your code eats RAM exponentially: each additional layer means children.size() times more RAM. Try using a generator instead of materializing the results: implement a Multiset that does not calculate the results beforehand but walks the tree structure as you call next() on the set's iterator.
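The generator idea can be sketched as an Iterable<String> whose iterator keeps only a stack of pending nodes and builds each path string when next() is called, so nothing is materialized up front. This is a simplified sketch, not a drop-in replacement: it assumes a hypothetical Node class in place of AbstractTree<String> and only yields root-to-node paths along the child axis. The following-sibling and "//" variants that createSF emits would need extra frame types.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazyPaths implements Iterable<String> {
    // Hypothetical stand-in for AbstractTree<String>.
    static final class Node {
        final String name;
        final List<Node> children;
        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // A pending node together with the path of its parent.
    private static final class Frame {
        final Node node;
        final String parentPath;
        Frame(Node node, String parentPath) {
            this.node = node;
            this.parentPath = parentPath;
        }
    }

    private final Node root;

    LazyPaths(Node root) {
        this.root = root;
    }

    @Override
    public Iterator<String> iterator() {
        // Only pending frames are held; paths are built on demand in next().
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root, ""));
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return !stack.isEmpty();
            }

            @Override
            public String next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Frame f = stack.pop();
                String path = f.parentPath + "/" + f.node.name;
                // Push children in reverse so they come out in document order.
                for (int i = f.node.children.size() - 1; i >= 0; i--) {
                    stack.push(new Frame(f.node.children.get(i), path));
                }
                return path;
            }
        };
    }

    public static void main(String[] args) {
        Node tree = new Node("a", new Node("b", new Node("d")), new Node("c"));
        for (String path : new LazyPaths(tree)) {
            System.out.println(path);
        }
    }
}
```

For the tree a(b(d), c), main prints /a, /a/b, /a/b/d and /a/c, one per line. Each path is produced only when the consumer asks for it, so memory is bounded by the pending frames rather than by the total number of paths.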