Hash set implementation or technique for efficient shallow cloning?
A critical part of my algorithm involves cloning `Set`s of objects. I have attempted several approaches to optimizing this operation, including:
- Reusing the elements, so that I only need shallow copying - big performance gain.
- Relying on the `clone` method of the sets, rather than manually inserting the objects - minor performance gain (see below for probable reason).
- Making the classes containing the set maintain a copy-on-write flag, so that cloning those objects is O(1) and the sets are only cloned upon insertion/removal - some performance gain (a rough sketch of this wrapper follows the list).
- Finally, I've also considered whether I can drop this approach altogether, but that's a topic for another question :)
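For reference, here is a simplified sketch of the copy-on-write idea from the third bullet (not my actual class; names like `CowSet` are made up for illustration):

```java
import java.util.HashSet;
import java.util.Set;

// Copy-on-write wrapper: "cloning" is O(1), and the backing HashSet is only
// copied the first time a shared instance is mutated.
final class CowSet<E> {
    private Set<E> backing;
    private boolean shared;   // true while another CowSet may still see 'backing'

    CowSet() {
        this.backing = new HashSet<>();
        this.shared = false;
    }

    private CowSet(Set<E> backing) {
        this.backing = backing;
        this.shared = true;
    }

    // O(1) "clone": both instances share the backing set until one of them writes.
    CowSet<E> copy() {
        this.shared = true;
        return new CowSet<>(backing);
    }

    private void ensureUnshared() {
        if (shared) {
            backing = new HashSet<>(backing);   // the actual shallow copy, paid lazily
            shared = false;
        }
    }

    boolean add(E e)      { ensureUnshared(); return backing.add(e); }
    boolean remove(E e)   { ensureUnshared(); return backing.remove(e); }
    boolean contains(E e) { return backing.contains(e); }
    int size()            { return backing.size(); }
}
```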
However, profiling reveals the cloning operation still takes a significant amount of time. I can live with it, but I've been wondering if there are any tricks that will allow me to further optimize this operation - for instance, I've heard about a trick for cloning via serialization, though I'm not sure that will help.
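For clarity, this is the serialization trick I was referring to (just a sketch; `SerializationCloner` is a made-up name). Note that it produces a deep copy, which would actually defeat my element reuse, and it requires everything to be `Serializable`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Set;

final class SerializationCloner {
    // "Clone via serialization": round-trip the set through a byte array.
    // The elements themselves are serialized too, so this is a deep copy.
    @SuppressWarnings("unchecked")
    static <E> Set<E> cloneBySerialization(Set<E> original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Set<E>) in.readObject();
        }
    }
}
```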
I'm also open to switching to a different implementation of Java's `Set` interface, as long as it relies on `hashCode` and `equals`. My sets are typically not very large, up to dozens of items, so I'm also willing to sacrifice some efficiency in the traditional set operations (`add`, `contains`) if it helps. I've even considered moving to a custom implementation backed by a `LinkedList` or some skip list, but before that I wanted to ask here whether anyone has experience or insight regarding this.
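To make that last idea concrete, here is roughly the kind of custom implementation I had in mind (a sketch only, with made-up names; membership uses a linear scan via `equals`, and copying is just an array copy):

```java
import java.util.Arrays;

// Small array-backed set: for a few dozen items, copying is a single array copy
// instead of rehashing every element into a fresh HashSet.
final class SmallArraySet<E> {
    private Object[] items;
    private int size;

    SmallArraySet() {
        this.items = new Object[8];
        this.size = 0;
    }

    private SmallArraySet(Object[] items, int size) {
        this.items = items;
        this.size = size;
    }

    // Shallow copy: one array copy, no hashing, no per-element insertion.
    SmallArraySet<E> copy() {
        return new SmallArraySet<>(Arrays.copyOf(items, size), size);
    }

    boolean contains(Object o) {
        for (int i = 0; i < size; i++) {
            if (items[i].equals(o)) return true;
        }
        return false;
    }

    boolean add(E e) {
        if (contains(e)) return false;
        if (size == items.length) items = Arrays.copyOf(items, Math.max(8, size * 2));
        items[size++] = e;
        return true;
    }

    boolean remove(Object o) {
        for (int i = 0; i < size; i++) {
            if (items[i].equals(o)) {
                items[i] = items[--size];   // swap-remove; iteration order is not preserved
                items[size] = null;
                return true;
            }
        }
        return false;
    }

    int size() { return size; }
}
```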
EDIT: for those asking why I need to clone these sets so many times - it's a good question! Right now the algorithm relies on initializing one group of sets to be identical to another group of sets, after which both groups may be updated independently, which is why cloning seemed like the obvious solution to me - but a better solution may well exist. As I wrote in the 4th bullet above, I might indeed choose a different solution, but meanwhile I'd like to focus this question on optimizing the existing one.
1 Answer
Serialization is much slower than deep cloning. I doubt it will help.
The most performant solution is to avoid needing to clone in the first place; then it takes no time at all. Perhaps with more details about why you are cloning, we can suggest an alternative.
One way to speed up the cloning is to have a Set which refers to an immutable set and remembers the differences in, say, a `Map<E, Boolean>`, where the value is true or false depending on whether the element has been added or removed. This works best if the number of changes is relatively low.
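A rough, untested sketch of what I mean (names are made up):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// The base set is never modified after construction, so it can be shared freely
// between clones; each DiffSet records only its own additions and removals.
final class DiffSet<E> {
    private final Set<E> base;              // shared, treated as immutable
    private final Map<E, Boolean> diff;     // true = added, false = removed

    DiffSet(Set<E> base) {
        this(base, new HashMap<>());
    }

    private DiffSet(Set<E> base, Map<E, Boolean> diff) {
        this.base = base;
        this.diff = diff;
    }

    // Cloning copies only the (hopefully small) diff map, never the base set.
    DiffSet<E> copy() {
        return new DiffSet<>(base, new HashMap<>(diff));
    }

    boolean contains(E e) {
        Boolean d = diff.get(e);
        if (d != null) return d;            // overridden by an add or a remove
        return base.contains(e);
    }

    void add(E e) {
        if (base.contains(e)) diff.remove(e);     // cancel a previous removal
        else diff.put(e, Boolean.TRUE);
    }

    void remove(E e) {
        if (base.contains(e)) diff.put(e, Boolean.FALSE);
        else diff.remove(e);                      // cancel a previous addition
    }

    int size() {
        int n = base.size();
        for (Boolean added : diff.values()) n += added ? 1 : -1;
        return n;
    }
}
```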