在 Java(1.5 或更高版本)中,从 Set 中获取(任何)元素的最佳执行方法是什么?
在下面的代码中,我需要从 toSearch 获取一个元素,任何元素。我无法在 Set 接口定义上找到有用的方法来仅返回集合的单个(随机,但不要求是随机)成员。因此,我使用了 toArray()[0] 技术(如下面的代码所示)。
private Set<Coordinate> floodFill(Value value, Coordinate coordinateStart)
{
Set<Coordinate> result = new LinkedHashSet<Coordinate>();
Set<Coordinate> toSearch = new LinkedHashSet<Coordinate>();
toSearch.add(coordinateStart);
while (toSearch.size() > 0)
{
Coordinate coordinate = (Coordinate)toSearch.toArray()[0];
result.add(coordinate);
toSearch.remove(coordinate);
for (Coordinate coordinateAdjacent: getAdjacentCoordinates(coordinate))
{
if (this.query.getCoordinateValue(coordinateAdjacent) == value)
{
if (!result.contains(coordinateAdjacent))
{
toSearch.add(coordinateAdjacent);
}
}
}
}
return result;
}
我见过讨论的另一种技术是用“toSearch.iterator().next()”替换“(Cooperative)toSearch.toArray()[0]”。 toArray() 或 iterator() 哪种技术执行速度最快且 GC(垃圾收集)影响最小?
我的直觉(在提出这个问题之后)是使用迭代器的第二种技术执行速度更快,并且 GC 的开销更低。鉴于我不知道所传递的 Set 的实现(假设最有可能是 HashSet 或 LinkedHashSet),每个 toArray() 或 iterator() 方法会产生多少开销?对此的任何见解将不胜感激。
问题(从上面重复):
- toArray() 或 iterator() 哪种技术最有可能执行得最快且对 GC(垃圾收集)影响最小?
- 鉴于我不知道所传递的 Set 的实现(假设最有可能是 HashSet 或 LinkedHashSet),每个 toArray() 和 iterator() 方法会产生多少开销?
In the code below, I needed to fetch an element, any element, from toSearch. I was unable to find a useful method on the Set interface definition to return just a single (random, but not required to be random) member of the set. So, I used the toArray()[0] technique (present in the code below).
private Set<Coordinate> floodFill(Value value, Coordinate coordinateStart)
{
Set<Coordinate> result = new LinkedHashSet<Coordinate>();
Set<Coordinate> toSearch = new LinkedHashSet<Coordinate>();
toSearch.add(coordinateStart);
while (toSearch.size() > 0)
{
Coordinate coordinate = (Coordinate)toSearch.toArray()[0];
result.add(coordinate);
toSearch.remove(coordinate);
for (Coordinate coordinateAdjacent: getAdjacentCoordinates(coordinate))
{
if (this.query.getCoordinateValue(coordinateAdjacent) == value)
{
if (!result.contains(coordinateAdjacent))
{
toSearch.add(coordinateAdjacent);
}
}
}
}
return result;
}
The other technique I have seen discussed is to replace "(Coordinate)toSearch.toArray()[0]" with "toSearch.iterator().next()". Which technique, toArray() or iterator(), is the most likely to execute the most quickly with the least GC (Garbage Collection) impact?
My intuition (after composing this question) is that the second technique using the Iterator will be both faster in execution and lower overhead for the GC. Given I don't know the implementation of the Set being passed (assuming HashSet or LinkedHashSet as most likely), how much overhead is incurred in each of the toArray() or iterator() methods? Any insights on this would be greatly appreciated.
Questions (repeated from above):
- Which technique, toArray() or iterator(), is the most likely to execute the most quickly with the least GC (Garbage Collection) impact?
- Given I don't know the implementation of the Set being passed (assuming HashSet or LinkedHashSet as most likely), how much overhead is incurred in each of the toArray() and iterator() methods?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
toSearch.iterator().next()
速度更快,内存占用更少,因为它不需要复制任何数据,而toArray
将分配并复制集合到数组中。这与实际实现无关:toArray
总是必须复制数据。toSearch.iterator().next()
will be faster and less memory-intensive because it does not need to copy any data, whereastoArray
will allocate and copy the contents of the set into the array. This is irrespective of the actual implementation:toArray
will always have to copy data.据我所知,您正在做 广度优先搜索
下面是示例不使用 toArray 实现:
实现说明:
您对全扫描(又名线性搜索)的看法是正确的。尽管如此,在您的情况下,可以有额外的设置来跟踪已经访问过的顶点(顺便说一句,实际上这是您的结果!),这将在 O(1) 时间内解决 contains 方法的问题。
干杯
From what I can see you are doing Breadth First Search
Below is the example how it could be implemented without using toArray:
Implementation notes:
You are right about full scan (aka linear search). Nevertheless, In your case it's possible to have additional set for tracking already visited vertexes(btw, actually it's your result!), that would solve issue with contains method in O(1) time.
Cheers
以下是我的实现方法:
注意:
toSearch
数据结构不需要包含唯一元素。LinkedList
进行toSearch
意味着有一种简单的方法可以一次性获取元素并删除它。Set.add(...)
返回一个boolean
来获取result
集合中的查找次数。与使用Set.contains()
相比。HashSet
而不是LinkedHashSet
来获取结果......除非您需要知道填充添加坐标的顺序。==
来比较Value
实例可能有点狡猾。Here's how I'd implement this:
Notes:
toSearch
data structure doesn't need to contain unique elements.LinkedList
fortoSearch
means that there is a simple method to get an element and remove it in one go.Set.add(...)
returns aboolean
to have the number of lookups in theresult
set ... compared with usingSet.contains()
.HashSet
rather thanLinkedHashSet
for the results ... unless you need to know the order in which coordinates were added by the fill.==
to compareValue
instances is potentially a bit dodgy.听完Petro的回复,我复制了这个方法,按照他的建议重新实现了。它看起来像这样:
通过从“集合”转移到“队列”,我的效率问题转移到了我必须添加的新条件检查“if (!toSearch.contains(coordinateAdjacent))”。使用 Set 接口,它默默地阻止我添加重复项。使用队列接口,我必须检查以确保没有添加重复项。
现在我担心 LinkedList 上的 contains() 方法的实现可能会在返回答案之前对内容进行完整扫描。那么,将此方法与我最初发布的方法进行比较,哪种方法可能更有效(在我花大量时间进行实证测试之前)?
After Petro's response, I copied the method and reimplemented it according to his suggestions. It looks like this:
By moving from Set to Queue, my efficiency questions shifted to the new conditional check I had to add, "if (!toSearch.contains(coordinateAdjacent))". Using the Set interface, it silently stopped me from adding duplicates. Using the Queue interface, I have to check to ensure I'm not adding a duplicate.
And now I am concerned that the contains() method's implementation on LinkedList could be doing up to a a full scan of the contents before returning the answer. So, comparing this method to the one I originally posted, which is likely to be more efficient (before I go spend a good quantity of time doing the empirical testing)?
好的,下面是我最新的实现,其中包含反馈(主要来自 Stephen、Cameron 和 Petro),其中包括完全消除 toArray()[]-vs-interator().next() 冲突。我还添加了一些评论,以便更准确地区分正在发生的事情及其原因。为了更好地阐明为什么我具体实施了 Petro 最初的“使用跟踪集”建议(Cameron 附议)。在代码片段之后,我将其与其他建议的解决方案进行对比。
我已经在几个重要方面更新了该方法:
Petro 和 Stephan 的解决方案仅访问包含 value 的坐标一次,但需要多次重新访问包含 !value 的坐标,这可能会导致对由长迷宫般的隧道组成的区域进行相当多的重复获取/值检查。虽然“长迷宫般的隧道”可能被认为是一种病态情况,但它更典型地代表了我需要这种方法的特定领域。我的“第二个”尝试解决方案(LinkedList contains() 调用的性能很差)作为真正的答案是值得怀疑的(对此斯蒂芬表示同意)。
感谢您的所有反馈。
接下来,对数亿次调用中的单一变化/更改进行大量实证测试。我将在本周末的某个时候用详细信息更新此答案。
Okay, below is my latest implementation incorporating feedback (mainly from Stephen, Cameron and Petro) which includes completely eliminating the toArray()[]-vs-interator().next() conflict. And I have sprinkled in comments to more accurately distinguish what's occurring and why. And to better clarify why I concretely implemented Petro's original "use a tracking Set" advice (seconded by Cameron). And just after the code snippet, I will contrast it with the other proposed solutions.
I have updated the method several significant ways:
Petro's and Stephan's solutions visit the coordinates containing value just once, but require revisiting the coordinates containing !value more than once, which can result in quite a few duplicate fetches/value checks for areas consisting of long maze-like tunnels. While "long maze-like tunnels" may be considered a pathological case, it is more typical of the particular domain for which I need this method. And my "2nd" attempted solution (which had the poor performance LinkedList contains() call) was questionable as a real answer ({nod} to Stephen on that one).
Thank you for all your feedback.
Next up, lots of empirical testing with single variations/changes over hundreds of millions of invocations. I'll update this answer with details sometime this weekend.