Hash set implementation or technique for efficient shallow cloning?

Posted 2024-12-24 18:40:49

A critical part of my algorithm involves cloning Sets of objects. I have attempted several approaches to optimizing this operation, including:

  • Reusing the elements, so that I only need shallow copying - big performance gain.
  • Relying on the clone method of the sets, rather than manually inserting the objects - minor performance gain (see below for probable reason).
  • Making the classes containing the set maintain a copy-on-write flag, so that cloning those objects is O(1) and only upon insertion / removal are the sets cloned - some performance gain.
  • Finally I've also thought whether I can drop this approach altogether, but that's a topic for another question :)
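The third bullet's copy-on-write idea can be sketched roughly as follows (a minimal illustration; the class and method names here are my own, not from the original code):

```java
import java.util.HashSet;
import java.util.Set;

// Copy-on-write wrapper: copy() is O(1); the backing set is
// duplicated lazily, only before the first mutation after a copy.
final class CowSet<E> {
    private Set<E> backing;
    private boolean shared; // true while the backing set may be shared

    CowSet() {
        this.backing = new HashSet<>();
        this.shared = false;
    }

    private CowSet(Set<E> backing) {
        this.backing = backing;
        this.shared = true;
    }

    // O(1) clone: both copies share the backing set until one mutates.
    CowSet<E> copy() {
        this.shared = true;
        return new CowSet<>(backing);
    }

    private void ensureOwned() {
        if (shared) { // pay the copy cost lazily, on first mutation
            backing = new HashSet<>(backing);
            shared = false;
        }
    }

    boolean add(E e)      { ensureOwned(); return backing.add(e); }
    boolean remove(E e)   { ensureOwned(); return backing.remove(e); }
    boolean contains(E e) { return backing.contains(e); }
    int size()            { return backing.size(); }
}
```

Note that this shifts the cost rather than eliminating it: each copy that is later mutated still pays for one full set copy.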

However, profiling reveals the cloning operation still takes a significant amount of time. I can live with it, but I've been wondering if there are any tricks that will allow me to further optimize this operation - for instance, I've heard about a trick for cloning via serialization, though I'm not sure that will help.

I'm also open to switching to a different implementation of Java's Set interface, as long as it relies on hashCode and equals. My sets are typically not very large, up to dozens of items, so I'm also willing to sacrifice some efficiency in the traditional set operations (add, contains) if it helps. I've even considered moving to a custom implementation backed by a LinkedList or some skip list, but before that I wanted to ask here whether anyone has experience or insight regarding this.

EDIT: for those asking why I need to clone these sets so many times - it's a good question! Right now the algorithm relies on initializing one group of sets to be identical to another group of sets, and later both groups may be independently updated, which is why cloning seemed like the obvious solution to me - but a better solution might well exist. As I wrote in the 4th bullet above, I might indeed choose a different solution, but meanwhile I'd like to focus this question on optimizing the existing one.

Comments (1)

隱形的亼 2024-12-31 18:40:49

Serialization is much slower than deep cloning. I doubt it will help.

The most performant solution is to avoid needing to clone in the first place, then it won't take any time at all. Perhaps with more details as to why you are using cloning we can suggest an alternative.

One way to speed up the cloning is to have a Set which refers to an immutable set and remembers the differences in, say, a Map&lt;E, Boolean&gt; where the value is true or false depending on whether the element has been added or removed. This works best if the number of changes is relatively low.
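A sketch of that diff-based structure, under the stated assumption that mutations are few (DiffSet and its method names are illustrative, not an existing API):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A set view over a shared immutable base set plus a per-copy diff map:
// value true = element added relative to the base, false = removed.
final class DiffSet<E> {
    private final Set<E> base;                         // shared, never mutated
    private final Map<E, Boolean> diff = new HashMap<>();

    DiffSet(Set<E> base) { this.base = base; }

    // Cheap clone: share the base, copy only the (small) diff map.
    DiffSet<E> copy() {
        DiffSet<E> c = new DiffSet<>(base);
        c.diff.putAll(diff);
        return c;
    }

    void add(E e) {
        if (base.contains(e)) diff.remove(e);          // undo a prior removal
        else diff.put(e, Boolean.TRUE);
    }

    void remove(E e) {
        if (base.contains(e)) diff.put(e, Boolean.FALSE);
        else diff.remove(e);                           // undo a prior addition
    }

    boolean contains(E e) {
        Boolean d = diff.get(e);
        if (d != null) return d;                       // diff overrides the base
        return base.contains(e);
    }
}
```

The trade-off: contains now does up to two lookups instead of one, and iteration must merge the base with the diff, but copying drops from O(n) to O(number of changes).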
