Slow execution and running out of heap space (even with VM args set to 2g)

Posted 2024-07-30 11:51:22

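I'm writing a function that generates all paths in a tree as XPath statements and stores them in a bag (multiset). Below is a naive version (sorry, it's long), followed by my attempt to optimize it: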

/**
 * Create the structural fingerprint of a tree. Defined as the multiset of
 * all paths and their multiplicities
 */
protected Multiset<String> createSF(AbstractTree<String> t,
        List<AbstractTree<String>> allSiblings) {
    /*
     * difference between unordered and ordered trees is that the
     * next-sibling axis must also be used
     * 
     * this means that each node's children are liable to be generated more
     * than once and so are memo-ised and reused
     */

    Multiset<String> res = new Multiset<String>();

    // so, we return a set containing:
    // 1. the node name itself, prepended by the root symbol

    res.add("/" + t.getNodeName());
    List<AbstractTree<String>> children = t.getChildren();

    // all of the children's sets, prepended by this one

    if (children != null) {

        for (AbstractTree<String> child : children) {

            Multiset<String> sub = createSF(child, children);

            for (String nextOne : sub) {
                if (nextOne.startsWith("//")) {
                    res.add(nextOne);
                } else {
                    res.add("/" + nextOne);
                    res.add("/" + t.getNodeName() + nextOne);
                }
            }
        }
    }

    // 2. all of the following siblings' sets, prepended by this one

    if (allSiblings != null) {

        // node is not the original root
        // first, find the current node's position among its siblings

        int currentNodePos = 0;
        int ptrPos = 0;

        for (AbstractTree<String> node : allSiblings) {
            if (node == t) {
                currentNodePos = ptrPos;
            }
            ptrPos++;
        }

        // 3. then add all paths deriving from (all) following siblings

        for (int i = currentNodePos + 1; i < allSiblings.size(); i++) {
            AbstractTree<String> sibling = allSiblings.get(i);

            Multiset<String> sub = createSF(sibling, allSiblings);

            for (String nextOne : sub) {
                if (nextOne.startsWith("//")) {
                    res.add(nextOne);
                } else {
                    res.add("/" + nextOne);
                    res.add("/" + t.getNodeName() + nextOne);
                }
            }
        }
    }
    return res;
}

And now the optimization, which is (currently) in a subclass:

private Map<AbstractTree<String>, Multiset<String>> lookupTable = new HashMap<AbstractTree<String>, Multiset<String>>();

public Multiset<String> createSF(AbstractTree<String> t,
        List<AbstractTree<String>> allSiblings) {

    Multiset<String> lookup = lookupTable.get(t);
    if (lookup != null) {
        return lookup;
    } else {

        Multiset<String> res = super.createSF(t, allSiblings);

        lookupTable.put(t, res);
        return res;
    }
}
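
To make the caching pattern above easier to follow in isolation, here is a minimal sketch of the same idea on a toy recursion. `PathCounter`, `MemoPathCounter` and `MemoDemo` are hypothetical names used only for illustration, not part of the real code. The point it shows is that the recursive calls made inside `super` dispatch back to the overriding method, so the lookup table is consulted at every level of the recursion.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy recursion: counts the leaves of a complete binary tree of the given depth. */
class PathCounter {
    int calls = 0; // number of times the real computation ran

    long count(int depth) {
        calls++;
        if (depth == 0) {
            return 1;
        }
        // Both calls are virtual, so a subclass override intercepts them.
        return count(depth - 1) + count(depth - 1);
    }
}

/** Same shape as the subclass in the post: check the table, else delegate to super and store. */
class MemoPathCounter extends PathCounter {
    private final Map<Integer, Long> lookupTable = new HashMap<>();

    @Override
    long count(int depth) {
        Long cached = lookupTable.get(depth);
        if (cached != null) {
            return cached;
        }
        long res = super.count(depth);
        lookupTable.put(depth, res); // every result stays reachable for the table's lifetime
        return res;
    }
}

class MemoDemo {
    public static void main(String[] args) {
        PathCounter naive = new PathCounter();
        MemoPathCounter memo = new MemoPathCounter();
        System.out.println(naive.count(10) + " computed in " + naive.calls + " calls");
        System.out.println(memo.count(10) + " computed in " + memo.calls + " calls");
    }
}
```

The cached version does far fewer computations, but it also keeps every intermediate result alive, which matters when each result is a large multiset rather than a single number.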

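My problem is that the optimized version runs out of heap space (the VM args are set to -Xms2g -Xmx2g) and is very slow on moderately large input. Can anyone see a way to improve this?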
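
One way to see where the memory might be going is to measure how large the fingerprint gets on a tiny input. The sketch below repeats the recursion from the naive version on a root with a given number of leaf children. `Node` and `FingerprintGrowth` are hypothetical stand-ins for `AbstractTree<String>` and the enclosing class, and a plain `List<String>` stands in for the bag, on the assumption that `Multiset` stores and iterates every occurrence. None of these stand-ins come from the real code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for AbstractTree<String>: a named node with ordered children. */
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

class FingerprintGrowth {

    /** Same recursion as createSF in the post; a List stands in for the bag (assumption). */
    static List<String> createSF(Node t, List<Node> allSiblings) {
        List<String> res = new ArrayList<>();
        res.add("/" + t.name);

        // paths through each child
        for (Node child : t.children) {
            addPrefixed(res, t, createSF(child, t.children));
        }

        // paths through each following sibling
        if (allSiblings != null) {
            int currentNodePos = 0;
            for (int i = 0; i < allSiblings.size(); i++) {
                if (allSiblings.get(i) == t) {
                    currentNodePos = i;
                }
            }
            for (int i = currentNodePos + 1; i < allSiblings.size(); i++) {
                addPrefixed(res, t, createSF(allSiblings.get(i), allSiblings));
            }
        }
        return res;
    }

    /** Copies sub into res using the same prefixing rule as the post. */
    static void addPrefixed(List<String> res, Node t, List<String> sub) {
        for (String nextOne : sub) {
            if (nextOne.startsWith("//")) {
                res.add(nextOne);
            } else {
                res.add("/" + nextOne);
                res.add("/" + t.name + nextOne);
            }
        }
    }

    /** Builds a root with leafCount leaf children and returns the fingerprint size. */
    static int fingerprintSize(int leafCount) {
        Node root = new Node("r");
        for (int i = 0; i < leafCount; i++) {
            root.children.add(new Node("c" + i));
        }
        return createSF(root, null).size();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 12; n++) {
            System.out.println(n + " leaf children: " + fingerprintSize(n) + " entries");
        }
    }
}
```

Under those assumptions, 1 to 4 leaf children give 3, 8, 20 and 48 entries, so the result more than doubles with each extra sibling. Caching every node's result keeps all of those strings reachable at once, which may be what fills the heap on wider trees.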

