在Python中搜索列表的最快方法

发布于 2024-11-07 05:49:54 字数 262 浏览 0 评论 0原文

当您在 a 中执行诸如 "test" in a 之类的操作(其中 a 是一个列表)时,Python 是否会对列表进行顺序搜索,或者是否创建哈希表表示形式来优化查找?在应用程序中,我需要这个,因为我将在列表中进行大量查找,因此最好执行类似 b = set(a) 的操作,然后在中执行 "test" b?另请注意,我将拥有的值列表不会有重复的数据,而且我实际上并不关心它的顺序;我只需要能够检查一个值是否存在。

When you do something like "test" in a where a is a list does python do a sequential search on the list or does it create a hash table representation to optimize the lookup? In the application I need this for I'll be doing a lot of lookups on the list so would it be best to do something like b = set(a) and then "test" in b? Also note that the list of values I'll have won't have duplicate data and I don't actually care about the order it's in; I just need to be able to check for the existence of a value.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

旧故 2024-11-14 05:49:54

另请注意,我将拥有的值列表不会有重复的数据,而且我实际上并不关心它的顺序;我只需要能够检查值是否存在。

不要使用列表,使用 set() 相反。它完全具有您想要的属性,包括速度极快的 in 测试。

我发现在更改一组列表的地方(主要是繁重的数字运算),速度提高了 20 倍甚至更高。

Also note that the list of values I'll have won't have duplicate data and I don't actually care about the order it's in; I just need to be able to check for the existence of a value.

Don't use a list, use a set() instead. It has exactly the properties you want, including a blazing fast in test.

I've seen speedups of 20x and higher in places (mostly heavy number crunching) where one list was changed for a set.

淤浪 2024-11-14 05:49:54

"test" in a 与列表 a 将进行线性搜索。动态设置哈希表比线性搜索要昂贵得多。另一方面,b 中的“test”将执行 O(1) 哈希查找。

在您描述的情况下,似乎没有理由使用列表而不是集合。

"test" in a with a list a will do a linear search. Setting up a hash table on the fly would be much more expensive than a linear search. "test" in b on the other hand will do an amoirtised O(1) hash look-up.

In the case you describe, there doesn't seem to be a reason to use a list over a set.

浮云落日 2024-11-14 05:49:54

我认为按照既定的实施方式会更好。我知道集合的查找时间为 O(1)。我认为列表需要 O(n) 查找时间。但即使列表也是 O(1) 查找,切换到集合也不会造成任何损失。

此外,集合不允许重复值。这也会使你的程序的内存效率稍微提高一些

I think it would be better to go with the set implementation. I know for a fact that sets have O(1) lookup time. I think lists take O(n) lookup time. But even if lists are also O(1) lookup, you lose nothing with switching to sets.

Further, sets don't allow duplicate values. This will make your program slightly more memory efficient as well

一场信仰旅途 2024-11-14 05:49:54

列表和元组似乎有相同的时间,并且使用“in”对于大数据来说很慢:

>>> t = list(range(0, 1000000))
>>> a=time.time();x = [b in t for b in range(100234,101234)];print(time.time()-a)
1.66235494614
>>> t = tuple(range(0, 1000000))
>>> a=time.time();x = [b in t for b in range(100234,101234)];print(time.time()-a)
1.6594209671

这是更好的解决方案:在巨大列表中查找/搜索的最有效方法(python)

它非常快:

>>> from bisect import bisect_left
>>> t = list(range(0, 1000000))
>>> a=time.time();x = [t[bisect_left(t,b)]==b for b in range(100234,101234)];print(time.time()-a)
0.0054759979248

List and tuples seems to have the same time, and using "in" is slow for large data:

>>> t = list(range(0, 1000000))
>>> a=time.time();x = [b in t for b in range(100234,101234)];print(time.time()-a)
1.66235494614
>>> t = tuple(range(0, 1000000))
>>> a=time.time();x = [b in t for b in range(100234,101234)];print(time.time()-a)
1.6594209671

Here is much better solution: Most efficient way for a lookup/search in a huge list (python)

It's super fast:

>>> from bisect import bisect_left
>>> t = list(range(0, 1000000))
>>> a=time.time();x = [t[bisect_left(t,b)]==b for b in range(100234,101234)];print(time.time()-a)
0.0054759979248
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文