在 Java(1.5 或更高版本)中,从 Set 中获取(任何)元素的最佳执行方法是什么?

发布于 2024-10-06 12:36:56 字数 1517 浏览 2 评论 0原文

在下面的代码中,我需要从 toSearch 获取一个元素,任何元素。我无法在 Set 接口定义上找到有用的方法来仅返回集合的单个(随机,但不要求是随机)成员。因此,我使用了 toArray()[0] 技术(如下面的代码所示)。

private Set<Coordinate> floodFill(Value value, Coordinate coordinateStart)
{
    Set<Coordinate> result = new LinkedHashSet<Coordinate>();

    Set<Coordinate> toSearch = new LinkedHashSet<Coordinate>();
    toSearch.add(coordinateStart);
    while (toSearch.size() > 0)
    {
        Coordinate coordinate = (Coordinate)toSearch.toArray()[0];
        result.add(coordinate);
        toSearch.remove(coordinate);
        for (Coordinate coordinateAdjacent: getAdjacentCoordinates(coordinate))
        {
            if (this.query.getCoordinateValue(coordinateAdjacent) == value)
            {
                if (!result.contains(coordinateAdjacent))
                {
                    toSearch.add(coordinateAdjacent);
                }
            }
        }
    }

    return result;
}

我见过讨论的另一种技术是用“toSearch.iterator().next()”替换“(Cooperative)toSearch.toArray()[0]”。 toArray() 或 iterator() 哪种技术执行速度最快且 GC(垃圾收集)影响最小?

我的直觉(在提出这个问题之后)是使用迭代器的第二种技术执行速度更快,并且 GC 的开销更低。鉴于我不知道所传递的 Set 的实现(假设最有可能是 HashSet 或 LinkedHashSet),每个 toArray() 或 iterator() 方法会产生多少开销?对此的任何见解将不胜感激。

问题(从上面重复):

  1. toArray() 或 iterator() 哪种技术最有可能执行得最快且对 GC(垃圾收集)影响最小?
  2. 鉴于我不知道所传递的 Set 的实现(假设最有可能是 HashSet 或 LinkedHashSet),每个 toArray() 和 iterator() 方法会产生多少开销?

In the code below, I needed to fetch an element, any element, from toSearch. I was unable to find a useful method on the Set interface definition to return just a single (random, but not required to be random) member of the set. So, I used the toArray()[0] technique (present in the code below).

private Set<Coordinate> floodFill(Value value, Coordinate coordinateStart)
{
    Set<Coordinate> result = new LinkedHashSet<Coordinate>();

    Set<Coordinate> toSearch = new LinkedHashSet<Coordinate>();
    toSearch.add(coordinateStart);
    while (toSearch.size() > 0)
    {
        Coordinate coordinate = (Coordinate)toSearch.toArray()[0];
        result.add(coordinate);
        toSearch.remove(coordinate);
        for (Coordinate coordinateAdjacent: getAdjacentCoordinates(coordinate))
        {
            if (this.query.getCoordinateValue(coordinateAdjacent) == value)
            {
                if (!result.contains(coordinateAdjacent))
                {
                    toSearch.add(coordinateAdjacent);
                }
            }
        }
    }

    return result;
}

The other technique I have seen discussed is to replace "(Coordinate)toSearch.toArray()[0]" with "toSearch.iterator().next()". Which technique, toArray() or iterator(), is the most likely to execute the most quickly with the least GC (Garbage Collection) impact?

My intuition (after composing this question) is that the second technique using the Iterator will be both faster in execution and lower overhead for the GC. Given I don't know the implementation of the Set being passed (assuming HashSet or LinkedHashSet as most likely), how much overhead is incurred in each of the toArray() or iterator() methods? Any insights on this would be greatly appreciated.

Questions (repeated from above):

  1. Which technique, toArray() or iterator(), is the most likely to execute the most quickly with the least GC (Garbage Collection) impact?
  2. Given I don't know the implementation of the Set being passed (assuming HashSet or LinkedHashSet as most likely), how much overhead is incurred in each of the toArray() and iterator() methods?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

未央 2024-10-13 12:36:56

toSearch.iterator().next() 速度更快,内存占用更少,因为它不需要复制任何数据,而 toArray 将分配并复制集合到数组中。这与实际实现无关:toArray总是必须复制数据。

toSearch.iterator().next() will be faster and less memory-intensive because it does not need to copy any data, whereas toArray will allocate and copy the contents of the set into the array. This is irrespective of the actual implementation: toArray will always have to copy data.

从﹋此江山别 2024-10-13 12:36:56

据我所知,您正在做 广度优先搜索

下面是示例不使用 toArray 实现:

    private Set<Coordinate> floodFill(Value value, Coordinate coordinateStart) {
    final Set<Coordinate> visitedCoordinates = new LinkedHashSet<Coordinate>();
    final Deque<Coordinate> deque = new ArrayDeque<Coordinate>();

    deque.push(coordinateStart);

    while (!deque.isEmpty()) {
        final Coordinate currentVertex = deque.poll();
        visitedCoordinates.add(currentVertex);
        for (Coordinate coordinateAdjacent : getAdjacentCoordinates(currentVertex)) {
            if (this.query.getCoordinateValue(coordinateAdjacent) == value) {
                if (!visitedCoordinates.contains(coordinateAdjacent)) {
                    deque.add(coordinateAdjacent);
                }
            }
        }
    }

    return visitedCoordinates;
}

实现说明:

现在我担心 LinkedList 上的 contains() 方法的实现可能会在返回答案之前对内容进行完整扫描。

您对全扫描(又名线性搜索)的看法是正确的。尽管如此,在您的情况下,可以有额外的设置来跟踪已经访问过的顶点(顺便说一句,实际上这是您的结果!),这将在 O(1) 时间内解决 contains 方法的问题。

干杯

From what I can see you are doing Breadth First Search

Below is the example how it could be implemented without using toArray:

    private Set<Coordinate> floodFill(Value value, Coordinate coordinateStart) {
    final Set<Coordinate> visitedCoordinates = new LinkedHashSet<Coordinate>();
    final Deque<Coordinate> deque = new ArrayDeque<Coordinate>();

    deque.push(coordinateStart);

    while (!deque.isEmpty()) {
        final Coordinate currentVertex = deque.poll();
        visitedCoordinates.add(currentVertex);
        for (Coordinate coordinateAdjacent : getAdjacentCoordinates(currentVertex)) {
            if (this.query.getCoordinateValue(coordinateAdjacent) == value) {
                if (!visitedCoordinates.contains(coordinateAdjacent)) {
                    deque.add(coordinateAdjacent);
                }
            }
        }
    }

    return visitedCoordinates;
}

Implementation notes:

And now I am concerned that the contains() method's implementation on LinkedList could be doing up to a a full scan of the contents before returning the answer.

You are right about full scan (aka linear search). Nevertheless, In your case it's possible to have additional set for tracking already visited vertexes(btw, actually it's your result!), that would solve issue with contains method in O(1) time.

Cheers

辞取 2024-10-13 12:36:56

以下是我的实现方法:

private Set<Coordinate> floodFill(Value value, Coordinate start) {
    Set<Coordinate> result = new LinkedHashSet<Coordinate>();
    LinkedList<Coordinate> toSearch = new LinkedList<Coordinate>();
    toSearch.add(start);
    do {
        Coordinate coordinate = toSearch.removeFirst();
        if (result.add(coordinate)) {
            for (Coordinate ajacent: getAdjacentCoordinates(coordinate)) {
                if (this.query.getCoordinateValue(adjacent) == value) {
                    toSearch.add(adjacent);
                }
            }
        }
    } while (!toSearch.isEmpty());
    return result;
}

注意:

  1. 如果您考虑一下,toSearch 数据结构不需要包含唯一元素。
  2. 使用 LinkedList 进行 toSearch 意味着有一种简单的方法可以一次性获取元素并删除它。
  3. 我们可以利用 Set.add(...) 返回一个 boolean 来获取 result 集合中的查找次数。与使用 Set.contains() 相比。
  4. 最好使用 HashSet 而不是 LinkedHashSet 来获取结果......除非您需要知道填充添加坐标的顺序。
  5. 使用 == 来比较 Value 实例可能有点狡猾。

Here's how I'd implement this:

private Set<Coordinate> floodFill(Value value, Coordinate start) {
    Set<Coordinate> result = new LinkedHashSet<Coordinate>();
    LinkedList<Coordinate> toSearch = new LinkedList<Coordinate>();
    toSearch.add(start);
    do {
        Coordinate coordinate = toSearch.removeFirst();
        if (result.add(coordinate)) {
            for (Coordinate ajacent: getAdjacentCoordinates(coordinate)) {
                if (this.query.getCoordinateValue(adjacent) == value) {
                    toSearch.add(adjacent);
                }
            }
        }
    } while (!toSearch.isEmpty());
    return result;
}

Notes:

  1. If you think about it, the toSearch data structure doesn't need to contain unique elements.
  2. Using a LinkedList for toSearch means that there is a simple method to get an element and remove it in one go.
  3. We can use the fact that Set.add(...) returns a boolean to have the number of lookups in the result set ... compared with using Set.contains().
  4. It would be better to use HashSet rather than LinkedHashSet for the results ... unless you need to know the order in which coordinates were added by the fill.
  5. Using == to compare Value instances is potentially a bit dodgy.
紅太極 2024-10-13 12:36:56

听完Petro的回复,我复制了这个方法,按照他的建议重新实现了。它看起来像这样:

private Set<Coordinate> floodFind2(Value value, Coordinate coordinateStart)
{
    Set<Coordinate> result = new LinkedHashSet<Coordinate>();

    Queue<Coordinate> toSearch = new LinkedList<Coordinate>();
    toSearch.add(coordinateStart);
    while (!toSearch.isEmpty())
    {
        Coordinate coordinate = toSearch.remove();
        result.add(coordinate);
        for (Coordinate coordinateAdjacent: getAdjacentCoordinates(coordinate))
        {
            if (getCoordinateValue(coordinateAdjacent).equals(value))
            {
                if (!result.contains(coordinateAdjacent))
                {
                    if (!toSearch.contains(coordinateAdjacent))
                    {
                        toSearch.add(coordinateAdjacent);
                    }
                }
            }
        }
    }

    return result;
}

通过从“集合”转移到“队列”,我的效率问题转移到了我必须添加的新条件检查“if (!toSearch.contains(coordinateAdjacent))”。使用 Set 接口,它默默地阻止我添加重复项。使用队列接口,我必须检查以确保没有添加重复项。

现在我担心 LinkedList 上的 contains() 方法的实现可能会在返回答案之前对内容进行完整扫描。那么,将此方法与我最初发布的方法进行比较,哪种方法可能更有效(在我花大量时间进行实证测试之前)?

After Petro's response, I copied the method and reimplemented it according to his suggestions. It looks like this:

private Set<Coordinate> floodFind2(Value value, Coordinate coordinateStart)
{
    Set<Coordinate> result = new LinkedHashSet<Coordinate>();

    Queue<Coordinate> toSearch = new LinkedList<Coordinate>();
    toSearch.add(coordinateStart);
    while (!toSearch.isEmpty())
    {
        Coordinate coordinate = toSearch.remove();
        result.add(coordinate);
        for (Coordinate coordinateAdjacent: getAdjacentCoordinates(coordinate))
        {
            if (getCoordinateValue(coordinateAdjacent).equals(value))
            {
                if (!result.contains(coordinateAdjacent))
                {
                    if (!toSearch.contains(coordinateAdjacent))
                    {
                        toSearch.add(coordinateAdjacent);
                    }
                }
            }
        }
    }

    return result;
}

By moving from Set to Queue, my efficiency questions shifted to the new conditional check I had to add, "if (!toSearch.contains(coordinateAdjacent))". Using the Set interface, it silently stopped me from adding duplicates. Using the Queue interface, I have to check to ensure I'm not adding a duplicate.

And now I am concerned that the contains() method's implementation on LinkedList could be doing up to a a full scan of the contents before returning the answer. So, comparing this method to the one I originally posted, which is likely to be more efficient (before I go spend a good quantity of time doing the empirical testing)?

幸福丶如此 2024-10-13 12:36:56

好的,下面是我最新的实现,其中包含反馈(主要来自 Stephen、Cameron 和 Petro),其中包括完全消除 toArray()[]-vs-interator().next() 冲突。我还添加了一些评论,以便更准确地区分正在发生的事情及其原因。为了更好地阐明为什么我具体实施了 Petro 最初的“使用跟踪集”建议(Cameron 附议)。在代码片段之后,我将其与其他建议的解决方案进行对比。

private Set<Coordinate> floodFind3(Coordinate coordinate)
{
    Set<Coordinate> area = new LinkedHashSet<Coordinate>(); //includes only area of value (which is the same as at coordinate)

    area.add(coordinate);
    Value value = getCoordinateValue(coordinate); //value upon which to expand area
    Set<Coordinate> checked = new LinkedHashSet<Coordinate>(); //every coordinate evaluated regardless of value
    checked.add(coordinate);
    Queue<Coordinate> candidates = new LinkedList<Coordinate>(); //coordinates evaluated, were of value, and are queued to iterate through their adjacents
    candidates.add(nordinate);
    while (!candidates.isEmpty())
    {
        for (Nordinate coordinateAdjacent: this.query.getNordinates().getAdjacent(candidates.remove()).getOrthogonal())
        {
            if (checked.add(coordinateAdjacent)) //only expands containing value and !value
            {
                if (getCoordinateValue(coordinateAdjacent) == value)
                {
                    area.add(coordinateAdjacent); //only expands containing value
                    candidates.add(coordinateAdjacent); //expands and contracts containing value
                }
            }
        }
    }

    return area;
}

我已经在几个重要方面更新了该方法:

  1. 少一个方法参数:我​​删除了一个参数,因为它可以从搜索中导出,并消除了一个可能的逻辑问题,即起始坐标指向包含 !value 的位置。
  2. 三个集合跟踪搜索;区域(Set)、已检查(Set)和候选项(Queue)。代码注释阐明了每个的具体用途。使用 LinkedHashSet 实现可靠的可重复性,同时解决错误和性能问题 (http://stackoverflow.com/questions/2704597/iteration-order-of-hashset)。一旦稳定,我可能会恢复到更快的 HashSet 实现。
  3. 在“is value”测试之前重新排序“检查是否已评估”测试,以便仅访问每个坐标一次。这可以避免多次重新访问 !value 相邻坐标。还结合了 Stephen 对 Set add() 方法的巧妙双重使用。当洪水区域变得更加像迷宫(蛇形/蜘蛛形)时,这一点变得非常重要。
  4. 保留“==”来检查强制参考比较的值。 Value 被定义为 Java 1.5 Enum,我不想依赖 HotSpot 来内联 .equals() 方法调用并将其减少为引用比较。如果 Value 不再是 Enum,这个选择可能会反过来影响我。 Tyvm 感谢 Stephen 指出了这一点。

Petro 和 Stephan 的解决方案仅访问包含 value 的坐标一次,但需要多次重新访问包含 !value 的坐标,这可能会导致对由长迷宫般的隧道组成的区域进行相当多的重复获取/值检查。虽然“长迷宫般的隧道”可能被认为是一种病态情况,但它更典型地代表了我需要这种方法的特定领域。我的“第二个”尝试解决方案(LinkedList contains() 调用的性能很差)作为真正的答案是值得怀疑的(对此斯蒂芬表示同意)。

感谢您的所有反馈。

接下来,对数亿次调用中的单一变化/更改进行大量实证测试。我将在本周末的某个时候用详细信息更新此答案。

Okay, below is my latest implementation incorporating feedback (mainly from Stephen, Cameron and Petro) which includes completely eliminating the toArray()[]-vs-interator().next() conflict. And I have sprinkled in comments to more accurately distinguish what's occurring and why. And to better clarify why I concretely implemented Petro's original "use a tracking Set" advice (seconded by Cameron). And just after the code snippet, I will contrast it with the other proposed solutions.

private Set<Coordinate> floodFind3(Coordinate coordinate)
{
    Set<Coordinate> area = new LinkedHashSet<Coordinate>(); //includes only area of value (which is the same as at coordinate)

    area.add(coordinate);
    Value value = getCoordinateValue(coordinate); //value upon which to expand area
    Set<Coordinate> checked = new LinkedHashSet<Coordinate>(); //every coordinate evaluated regardless of value
    checked.add(coordinate);
    Queue<Coordinate> candidates = new LinkedList<Coordinate>(); //coordinates evaluated, were of value, and are queued to iterate through their adjacents
    candidates.add(nordinate);
    while (!candidates.isEmpty())
    {
        for (Nordinate coordinateAdjacent: this.query.getNordinates().getAdjacent(candidates.remove()).getOrthogonal())
        {
            if (checked.add(coordinateAdjacent)) //only expands containing value and !value
            {
                if (getCoordinateValue(coordinateAdjacent) == value)
                {
                    area.add(coordinateAdjacent); //only expands containing value
                    candidates.add(coordinateAdjacent); //expands and contracts containing value
                }
            }
        }
    }

    return area;
}

I have updated the method several significant ways:

  1. One less method parameter: I removed a parameter as it was derivable from the search and eliminated a possible logical issue where the starting coordinate is pointing at a location containing !value.
  2. Three collections track the search; area (Set), checked (Set) and candidates (Queue). The code comments clarify the specific use of each. Used LinkedHashSet for reliable reproducability while chasing bugs and performance issues (http://stackoverflow.com/questions/2704597/iteration-order-of-hashset). Once stable, I will likely revert to faster HashSet implementation.
  3. Reordered the "check if already evaluated" test prior to the "is value" test to only visit every coordinate exactly once. This avoids revisiting !value adjacent coordinates more than once. Also incorporated Stephen's clever double use of the Set add() method. This becomes very important as the area to flood becomes more maze-like (snakely/spidery).
  4. Kept the "==" for checking value forcing a reference comparison. Value is defined to be a Java 1.5 Enum and I didn't want to depend upon HotSpot to both inline the .equals() method call and reduce it to a reference comparison. If Value were ever to change from being an Enum, this choice could come back to bite me. Tyvm to Stephen for pointing this out.

Petro's and Stephan's solutions visit the coordinates containing value just once, but require revisiting the coordinates containing !value more than once, which can result in quite a few duplicate fetches/value checks for areas consisting of long maze-like tunnels. While "long maze-like tunnels" may be considered a pathological case, it is more typical of the particular domain for which I need this method. And my "2nd" attempted solution (which had the poor performance LinkedList contains() call) was questionable as a real answer ({nod} to Stephen on that one).

Thank you for all your feedback.

Next up, lots of empirical testing with single variations/changes over hundreds of millions of invocations. I'll update this answer with details sometime this weekend.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文