选择 Java Collection 实现的经验法则?

发布于 2024-07-04 02:01:59 字数 143 浏览 7 评论 0 原文

在 Java Collection 接口(如 List、Map 或 Set)的不同实现之间进行选择时,有人有一个好的经验法则吗?

例如,通常为什么或在什么情况下我更喜欢使用 Vector 或 ArrayList、Hashtable 或 HashMap?

Anyone have a good rule of thumb for choosing between different implementations of Java Collection interfaces like List, Map, or Set?

For example, generally why or in what cases would I prefer to use a Vector or an ArrayList, a Hashtable or a HashMap?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(11

泪冰清 2024-07-11 02:01:59

正如其他答案中所建议的,根据用例,有不同的场景可以使用正确的集合。 我列出了几点,

ArrayList:

  • 大多数情况下,您只需要存储或迭代“一堆东西”,然后再迭代它们。 由于基于索引,迭代速度更快。
  • 每当创建 ArrayList 时,都会为其分配固定数量的内存,一旦超过,它将复制整个数组

LinkedList:

  • 它使用双向链表,因此插入和删除操作会很快,因为它只会添加或删除节点。
  • 检索速度很慢,因为它必须遍历节点。

HashSet:

  • 对某个项目做出其他是/否决策,例如“该项目是英语单词吗”、“该项目是否在数据库中?” ,“该商品属于该类别吗?” 等等

  • 记住“您已经处理过哪些项目”,例如在进行网络抓取时;

HashMap:

  • 用于需要说“对于给定的 X,Y 是什么”的情况? 它对于实现内存缓存或索引(即键值对)通常很有用,例如:
    对于给定的用户 ID,其缓存的名称/用户对象是什么?
  • 始终使用 HashMap 来执行查找。

Vector 和 Hashtable 是同步的,因此速度较慢,如果需要同步,请使用 Collections.synchronizedCollection()。
检查以获取已排序的集合。
希望这有帮助。

As suggested in other answers, there are different scenarios to use correct collection depending on use case. I am listing few points,

ArrayList:

  • Most cases where you just need to store or iterate through a "bunch of things" and later iterate through them. Iterating is faster as its index based.
  • Whenever you create an ArrayList, a fixed amount of memory is allocated to it and once exceeded, it copies the whole array

LinkedList:

  • It uses doubly linked list so insertion and deletion operation will be fast as it will only add or remove a node.
  • Retrieving is slow as it will have to iterate through the nodes.

HashSet:

  • Making other yes-no decisions about an item, e.g. "is the item a word of English", "is the item in the database?" , "is the item in this category?" etc.

  • Remembering "which items you've already processed", e.g. when doing a web crawl;

HashMap:

  • Used in cases where you need to say "for a given X, what is the Y"? It is often useful for implementing in-memory caches or indexes i.e key value pairs For example:
    For a given user ID, what is their cached name/User object?.
  • Always go with HashMap to perform a lookup.

Vector and Hashtable are synchronized and therefore bit slower and If synchronization is needed, use Collections.synchronizedCollection().
Check This for sorted collections.
Hope this hepled.

拍不死你 2024-07-11 02:01:59

列表允许重复的项目,而集合仅允许一个实例。

每当我需要执行查找时,我都会使用 Map。

对于具体的实现,映射和集合存在保序变体,但很大程度上取决于速度。 我倾向于将 ArrayList 用于相当小的列表,将 HashSet 用于相当小的集合,但是有很多实现(包括您自己编写的任何实现)。 HashMap 对于 Map 来说非常常见。 任何超过“相当小的”的东西,你都必须开始担心内存,这样在算法上就会更加具体。

此页面大量动画图像以及示例代码如果您对硬数字感兴趣,请测试 LinkedList 与 ArrayList。

编辑:我希望以下链接演示了这些东西实际上只是工具箱中的项目,您只需考虑您的需求是什么:请参阅 地图, 列表设置

Lists allow duplicate items, while Sets allow only one instance.

I'll use a Map whenever I'll need to perform a lookup.

For the specific implementations, there are order-preserving variations of Maps and Sets but largely it comes down to speed. I'll tend to use ArrayList for reasonably small Lists and HashSet for reasonably small sets, but there are many implementations (including any that you write yourself). HashMap is pretty common for Maps. Anything more than 'reasonably small' and you have to start worrying about memory so that'll be way more specific algorithmically.

This page has lots of animated images along with sample code testing LinkedList vs. ArrayList if you're interested in hard numbers.

EDIT: I hope the following links demonstrate how these things are really just items in a toolbox, you just have to think about what your needs are: See Commons-Collections versions of Map, List and Set.

以可爱出名 2024-07-11 02:01:59

我假设您从上面的答案中知道列表、集合和映射之间的区别。 为什么要在它们的实现类之间进行选择是另一回事。 例如:

List

  1. ArrayList检索速度快,但插入速度慢。 这对于读取大量数据但不插入/删除大量数据的实现很有用。 它将数据保存在一个连续的内存块中,因此每次需要扩展时,它都会复制整个数组。
  2. LinkedList 检索速度慢,但插入速度快。 这对于插入/删除很多但读取不多的实现很有用。 它不会将整个数组保存在一个连续的内存块中。

Set:

  1. HashSet 不保证迭代的顺序,因此是最快的集合。 它的开销很高,而且比 ArrayList 慢,所以除非数据量很大,当它的哈希速度成为一个因素时,你不应该使用它。
  2. TreeSet 保持数据有序,因此比 HashSet 慢。

Map: HashMap 和 TreeMap 的性能和行为与 Set 实现是并行的。

不应使用向量和哈希表。 它们是同步实现,在新的 Collection 层次结构发布之前,因此速度很慢。 如果需要同步,请使用 Collections.synchronizedCollection()。

I'll assume you know the difference between a List, Set and Map from the above answers. Why you would choose between their implementing classes is another thing. For example:

List:

  1. ArrayList is quick on retrieving, but slow on inserting. It's good for an implementation that reads a lot but doesn't insert/remove a lot. It keeps its data in one continuous block of memory, so every time it needs to expand, it copies the whole array.
  2. LinkedList is slow on retrieving, but quick on inserting. It's good for an implementation that inserts/removes a lot but doesn't read a lot. It doesn't keep the entire array in one continuous block of memory.

Set:

  1. HashSet doesn't guarantee the order of iteration, and therefore is fastest of the sets. It has high overhead and is slower than ArrayList, so you shouldn't use it except for a large amount of data when its hashing speed becomes a factor.
  2. TreeSet keeps the data ordered, therefore is slower than HashSet.

Map: The performance and behavior of HashMap and TreeMap are parallel to the Set implementations.

Vector and Hashtable should not be used. They are synchronized implementations, before the release of the new Collection hierarchy, thus slow. If synchronization is needed, use Collections.synchronizedCollection().

污味仙女 2024-07-11 02:01:59

我真的很喜欢 Sergiy Kovalchuk 博客文章中的备忘单,但不幸的是它已离线。 然而,Wayback Machine 有一个 历史副本:

Java Map/Collection Cheat Sheet

更详细的是 Alexander Zagniotov 的流程图,也是离线的,因此也是历史的 博客副本

Alexander Zaniotov 选择 Collection 实现的流程图

摘自博客中关于评论中提出的问题的内容:
“这份备忘单不包括很少使用的类,如 WeakHashMap、LinkedList 等,因为它们是为非常具体或奇异的任务而设计的,99% 的情况下都不应该选择它们。”

I really like this cheat sheet from Sergiy Kovalchuk's blog entry, but unfortunately it is offline. However, the Wayback Machine has a historical copy:

Java Map/Collection Cheat Sheet

More detailed was Alexander Zagniotov's flowchart, also offline therefor also a historical copy of the blog:

Alexander Zaniotov's flowchart for choosing Collection implementations

Excerpt from the blog on concerns raised in comments:
"This cheat sheet doesn't include rarely used classes like WeakHashMap, LinkedList, etc. because they are designed for very specific or exotic tasks and shouldn't be chosen in 99% cases."

小矜持 2024-07-11 02:01:59

嗯,这取决于你需要什么。 一般准则是:

列表是一个集合,其中数据按插入顺序保存,每个元素都有索引。

Set是一袋不重复的元素(如果重新插入相同的元素,则不会添加它)。 数据没有顺序的概念。

地图 您可以通过数据元素的键来访问和写入数据元素,该键可以是任何可能的对象。

输入图片此处描述
出处:https://stackoverflow.com/a/21974362/2811258

有关 Java 集合的更多信息,查看这篇文章

Well, it depends on what you need. The general guidelines are:

List is a collection where data is kept in order of insertion and each element got index.

Set is a bag of elements without duplication (if you reinsert the same element, it won't be added). Data doesn't have the notion of order.

Map You access and write your data elements by their key, which could be any possible object.

enter image description here
Attribution: https://stackoverflow.com/a/21974362/2811258

For more information about Java Collections, check out this article.

疏忽 2024-07-11 02:01:59

对于非排序的最佳选择,十有八九是:ArrayList、HashMap、HashSet。

Vector 和 Hashtable 是同步的,因此可能会慢一些。 您很少需要同步实现,并且当您这样做时,它们的接口不够丰富,无法让同步发挥作用。 对于 Map,ConcurrentMap 添加了额外的操作以使接口变得有用。 ConcurrentHashMap是ConcurrentMap的一个很好的实现。

LinkedList 几乎从来都不是一个好主意。 即使您正在进行大量插入和删除操作,如果您使用索引来指示位置,则需要迭代列表以找到正确的节点。 ArrayList 几乎总是更快。

对于 Map 和 Set,散列变体将比树/排序更快。 哈希算法往往具有 O(1) 性能,而树则为 O(log n)。

For non-sorted the best choice, more than nine times out of ten, will be: ArrayList, HashMap, HashSet.

Vector and Hashtable are synchronised and therefore might be a bit slower. It's rare that you would want synchronised implementations, and when you do their interfaces are not sufficiently rich for thier synchronisation to be useful. In the case of Map, ConcurrentMap adds extra operations to make the interface useful. ConcurrentHashMap is a good implementation of ConcurrentMap.

LinkedList is almost never a good idea. Even if you are doing a lot of insertions and removal, if you are using an index to indicate position then that requires iterating through the list to find the correct node. ArrayList is almost always faster.

For Map and Set, the hash variants will be faster than tree/sorted. Hash algortihms tend to have O(1) performance, whereas trees will be O(log n).

娇妻 2024-07-11 02:01:59

关于你的第一个问题...

列表、地图和集合有不同的用途。 我建议阅读有关 Java 集合框架的信息,网址为 http://java.lang. sun.com/docs/books/tutorial/collections/interfaces/index.html

更具体一点:

  • 如果您需要类似数组的数据结构并且需要迭代元素,请使用 List
  • 如果您需要类似字典的数据,请使用 Map
  • 如果您只需要确定某些内容是否属于集合,请使用 Set或不。

关于你的第二个问题...

Vector和ArrayList之间的主要区别是前者是同步的,后者不是同步的。 您可以在Java 并发实践中了解有关同步的更多信息。

Hashtable(注意T不是大写字母)和HashMap的区别类似,前者是同步的,后者不是同步的。

我想说,没有优先选择一种实现或另一种实现的经验法则,这实际上取决于您的需求。

About your first question...

List, Map and Set serve different purposes. I suggest reading about the Java Collections Framework at http://java.sun.com/docs/books/tutorial/collections/interfaces/index.html.

To be a bit more concrete:

  • use List if you need an array-like data structure and you need to iterate over the elements
  • use Map if you need something like a dictionary
  • use a Set if you only need to decide if something belongs to the set or not.

About your second question...

The main difference between Vector and ArrayList is that the former is synchronized, the latter is not synchronized. You can read more about synchronization in Java Concurrency in Practice.

The difference between Hashtable (note that the T is not a capital letter) and HashMap is similiar, the former is synchronized, the latter is not synchronized.

I would say that there are no rule of thumb for preferring one implementation or another, it really depends on your needs.

懵少女 2024-07-11 02:01:59

理论上,有一些有用的权衡,但实际上这些几乎无关紧要。

在现实世界的基准测试中,即使使用大列表和“在前面进行大量插入”等操作,ArrayList 的性能也优于 LinkedList。 学者们忽略了这样一个事实:真正的算法具有可以压倒渐近曲线的常数因素。 例如,链表需要为每个节点进行额外的对象分配,这意味着创建节点的速度较慢,并且内存访问特性也较差。

我的规则是:

  1. 始终从 ArrayList、HashSet 和 HashMap 开始(即不是 LinkedList 或 TreeMap)。
  2. 类型声明应该始终是一个接口(即List、Set、Map),因此如果分析器或代码审查证明不是这样,您可以在不破坏任何内容的情况下更改实现。

Theoretically there are useful Big-Oh tradeoffs, but in practice these almost never matter.

In real-world benchmarks, ArrayList out-performs LinkedList even with big lists and with operations like "lots of insertions near the front." Academics ignore the fact that real algorithms have constant factors that can overwhelm the asymptotic curve. For example, linked-lists require an additional object allocation for every node, meaning slower to create a node and vastly worse memory-access characteristics.

My rule is:

  1. Always start with ArrayList and HashSet and HashMap (i.e. not LinkedList or TreeMap).
  2. Type declarations should always be an interface (i.e. List, Set, Map) so if a profiler or code review proves otherwise you can change the implementation without breaking anything.
云雾 2024-07-11 02:01:59

我总是根据具体情况做出这些决定,具体取决于用例,例如:

  • 我是否需要保留订单?
  • 我会有空键/值吗? 重复?
  • 它会被多个线程访问吗?
  • 我需要键/值对吗?
  • 我需要随机访问吗?

然后我拿出我方便的第 5 版 Java in a Nutshell 并比较了大约 20 个选项。 第五章中有一些漂亮的小表格,可以帮助人们找出合适的内容。

好吧,也许如果我即兴知道一个简单的 ArrayList 或 HashSet 就可以解决问题,我就不会全部查找了。 ;)但如果我的预期用途有任何复杂的地方,你敢打赌我在书中。 顺便说一句,我认为 Vector 应该是“老帽子”——我已经很多年没有使用过了。

I've always made those decisions on a case by case basis, depending on the use case, such as:

  • Do I need the ordering to remain?
  • Will I have null key/values? Dups?
  • Will it be accessed by multiple threads
  • Do I need a key/value pair
  • Will I need random access?

And then I break out my handy 5th edition Java in a Nutshell and compare the ~20 or so options. It has nice little tables in Chapter five to help one figure out what is appropriate.

Ok, maybe if I know off the cuff that a simple ArrayList or HashSet will do the trick I won't look it all up. ;) but if there is anything remotely complex about my indended use, you bet I'm in the book. BTW, I though Vector is supposed to be 'old hat'--I've not used on in years.

哆啦不做梦 2024-07-11 02:01:59

使用 Map 进行键值配对

对于 键值 跟踪,使用 Map 实现。

例如,跟踪哪个人在周末的哪一天报道。 所以我们想要映射一个 DayOfWeek 对象到 Employee 对象。

Map < DayOfWeek , Employee > weekendWorker = 
    Map.of( 
        DayOfWeek.SATURDAY , alice ,
        DayOfWeek.SUNDAY , bob
    )
;

选择 Map的实现,有几个方面需要考虑。 其中包括:并发性、键和/或值中 NULL 值的容忍度、迭代键时的顺序、通过引用与内容进行跟踪以及文字语法的便利性。

这是我制作的图表,显示了十个 Map 与 Java 11 捆绑在一起的实现。

Java 11 中的地图实现表,比较其功能

Use Map for key-value pairing

For key-value tracking, use Map implementation.

For example, tracking which person is covering which day of the weekend. So we want to map a DayOfWeek object to an Employee object.

Map < DayOfWeek , Employee > weekendWorker = 
    Map.of( 
        DayOfWeek.SATURDAY , alice ,
        DayOfWeek.SUNDAY , bob
    )
;

When choosing one of the Map implementations, there are several aspects to consider. These include: concurrency, tolerance for NULL values in key and/or value, order when iterating keys, tracking by reference versus content, and convenience of literals syntax.

Here is a chart I made showing the various aspects of each of the ten Map implementations bundled with Java 11.

Table of map implementations in Java 11, comparing their features

囚我心虐我身 2024-07-11 02:01:59

我发现 Bruce Eckel 的《Java 思维》非常有帮助。 他很好地比较了不同的收藏。 我曾经在我的立方体墙上保留了他发布的显示继承层次结构的图表,作为快速参考。 我建议您做的一件事是牢记线程安全。 性能通常意味着线程不安全。

I found Bruce Eckel's Thinking in Java to be very helpful. He compares the different collections very well. I used to keep a diagram he published showing the inheritance heirachy on my cube wall as a quick reference. One thing I suggest you do is keep in mind thread safety. Performance usually means not thread safe.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文