What is the most efficient way to access particular elements in a SortedSet?
I want to use a collection that is sorted, but one in which I can access elements by index, i.e. I want something that has characteristics of both a Set and a List. java.util.TreeSet comes really close to what I need, but it doesn't permit access via an index.
I can think of several options:
- I could iterate through a TreeSet every time I needed a particular element.
- I could maintain a TreeSet and generate a List from it when I needed to access a particular element.
- Same as above, only cache the List until the Set changes.
- I could have a List and sort it myself whenever I needed to add an element.
- etc.
There are various trade-offs between the various options. I'm hoping somebody can give me some good advice. To answer the potential questions as to "why would you ever want to do that?", please read about the Apriori algorithm.
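To make the third option concrete, here is a minimal sketch of a TreeSet wrapper that caches a list snapshot and discards it whenever the set changes. The class name `CachedIndexSet` and its methods are hypothetical, invented for illustration; nothing like it exists in the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper (not a JDK class): a TreeSet plus a cached, indexable snapshot.
class CachedIndexSet<E extends Comparable<E>> {
    private final TreeSet<E> set = new TreeSet<>();
    private List<E> snapshot; // null means the cache is stale

    public boolean add(E e) {
        boolean changed = set.add(e);
        if (changed) snapshot = null; // invalidate only on a real change
        return changed;
    }

    public boolean remove(E e) {
        boolean changed = set.remove(e);
        if (changed) snapshot = null;
        return changed;
    }

    // Element at the given position in sorted order.
    public E get(int index) {
        if (snapshot == null) {
            snapshot = new ArrayList<>(set); // O(n) rebuild, in the set's sorted order
        }
        return snapshot.get(index);
    }

    public int size() {
        return set.size();
    }
}
```

Reads between modifications cost O(1), but every read that follows a modification pays for an O(n) rebuild, so this only helps when lookups come in bursts between changes.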
https://github.com/geniot/indexed-tree-map
I had the same problem, so I took the source code of java.util.TreeMap and wrote IndexedTreeMap. It implements my own interface, IndexedNavigableMap.
The implementation is based on updating node weights in the red-black tree whenever the tree changes. A node's weight is the number of nodes beneath it plus one for the node itself. The weights are adjusted, for example, when the tree is rotated to the left.
updateWeight simply updates the weights from a node up to the root.
When we need to find an element by index, the lookup uses these weights to decide which subtree to descend into.
The weights also come in very handy for finding the index of a key.
You can find the result of this work at https://github.com/geniot/indexed-tree-map
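As an illustration of the weight idea described above, here is a minimal sketch. It is not the IndexedTreeMap code: for brevity it uses a plain, unbalanced binary search tree instead of a red-black tree, and the class and method names (`SizedTree`, `get`, `indexOf`) are assumptions made up for this example.

```java
// Sketch only: an unbalanced BST where each node stores the size of its subtree.
class SizedTree<K extends Comparable<K>> {
    private final class Node {
        final K key;
        Node left, right;
        int weight = 1; // nodes in this subtree, including this node

        Node(K key) { this.key = key; }
    }

    private Node root;

    private int weight(Node n) { return n == null ? 0 : n.weight; }

    public void add(K key) { root = add(root, key); }

    private Node add(Node n, K key) {
        if (n == null) return new Node(key);
        int c = key.compareTo(n.key);
        if (c < 0) n.left = add(n.left, key);
        else if (c > 0) n.right = add(n.right, key);
        // equal keys are ignored (set semantics)
        n.weight = 1 + weight(n.left) + weight(n.right); // refresh on the way back up
        return n;
    }

    public int size() { return weight(root); }

    // Element at the given position in sorted order.
    public K get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Node n = root;
        while (true) {
            int leftWeight = weight(n.left);
            if (index < leftWeight) {
                n = n.left;
            } else if (index > leftWeight) {
                index -= leftWeight + 1; // skip the left subtree and this node
                n = n.right;
            } else {
                return n.key;
            }
        }
    }

    // Sorted position of key, or -1 if it is absent.
    public int indexOf(K key) {
        int index = 0;
        Node n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c < 0) {
                n = n.left;
            } else if (c > 0) {
                index += weight(n.left) + 1;
                n = n.right;
            } else {
                return index + weight(n.left);
            }
        }
        return -1;
    }
}
```

Both lookups take time proportional to the height of the tree. In a balanced tree such as a red-black tree that is O(log n), provided the rotations also keep the weights up to date, which is the part this sketch leaves out.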
A couple of points:
- Sort of a non-answer, but when I last needed to re-implement a frequent itemset mining algorithm, I went with FP-growth, which performs on par with (or better than) Apriori and, in my opinion, is easier to implement. The technique was developed by Jiawei Han and others, and it has what is basically a dedicated chapter in Data Mining: Concepts and Techniques.
- There are several open-source tools that take a pretty standardized input (one list of integers per line; integers represent items, lines represent itemsets). Some of them give you a choice of algorithms. Many of them are available here with permissive licenses: http://fimi.ua.ac.be/src/
- Keep in mind that using just any List implementation doesn't get you O(1) element access unless you specifically use an array-backed one (an array/vector, e.g. ArrayList). More likely, you'll get better mileage out of keeping a mostly or fully sorted array, with binary search for finding elements over a specific limit and the usual indexing for random access.
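The sorted-array suggestion can be sketched as follows, assuming an ArrayList kept in order and `Collections.binarySearch` for lookups. The class `SortedArraySet` and its method names are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: a duplicate-free list that is kept sorted at all times.
class SortedArraySet<E extends Comparable<? super E>> {
    private final List<E> items = new ArrayList<>();

    // Inserts e at its sorted position; returns false if it is already present.
    public boolean add(E e) {
        int pos = Collections.binarySearch(items, e);
        if (pos >= 0) return false;  // already present
        items.add(-pos - 1, e);      // binarySearch returns -(insertionPoint) - 1
        return true;
    }

    // O(1) random access, because ArrayList is array-backed.
    public E get(int index) {
        return items.get(index);
    }

    // Sorted position of e, or -1 if it is absent.
    public int indexOf(E e) {
        int pos = Collections.binarySearch(items, e);
        return pos >= 0 ? pos : -1;
    }

    // Index of the first element that is >= limit (equals size() if there is none).
    public int firstIndexAtLeast(E limit) {
        int pos = Collections.binarySearch(items, limit);
        return pos >= 0 ? pos : -pos - 1;
    }

    public int size() {
        return items.size();
    }
}
```

Lookups are O(log n) and indexed reads are O(1), but each insertion shifts the elements after it, so adds are O(n). That trade-off suits workloads that read far more often than they write.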
Perhaps a combination of TreeSet and CollectionUtils.get() from the Apache Commons Collections API would solve your problem.
I would look into LinkedHashSet. It is a HashSet that maintains insertion order.