Why is creating a set from a filter much faster than creating a list or a tuple?
I'm running filter on an iterable and want to store the result in a sequence (I need a sequence so that I can use random.choice on it). I noticed that creating a set from a filter object is a lot faster than creating a list or a tuple. Why is that? I first thought that the filter type is a subtype of set, which would explain this, but the filter function is actually equivalent to a generator expression, so it cannot really be a set internally.
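As a side note, the random.choice requirement can be met by materializing the filter first; a minimal sketch (variable names are illustrative):

```python
import random

# random.choice needs a sequence that supports indexing, so the
# one-shot filter iterator must first be turned into e.g. a list.
multiples_of_three = list(filter(lambda x: x % 3 == 0, range(1000)))
picked = random.choice(multiples_of_three)
print(picked % 3)  # always 0
```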
I ran the following test to check the speed:
import time

def test(n, seq):
    for method in (set, list, tuple):
        t = time.time()
        for i in range(n):
            method(seq)
        print(method.__name__, time.time() - t)

someFilter = filter(lambda x: x % 3 == 0, range(1000))
test(10000000, someFilter)
And the results spoke clearly in favor of using a set:
set 1.9240000247955322
list 8.82200002670288
tuple 7.031999826431274
So why is creating a set from a filter so much faster? Shouldn’t it usually take as long as creating a set from a sequence, where every element has to be hashed? Or is it somehow getting a boost from the internal filter representation?
For comparison, when running the test on a range expression, set takes about twice as long as list and tuple (which are both nearly identical in speed).
Edit:
Sven's answer is completely right, but for completeness, here is an updated test that runs on an actual filter each iteration:
import time

def testFilter(n, test, rangeSize):
    for method in (set, list, tuple):
        t = time.time()
        for i in range(n):
            method(filter(test, range(rangeSize)))
        print(method.__name__, time.time() - t)

testFilter(100000, lambda x: x % 3 == 0, 1000)
The result shows what makes more sense, with list and tuple both being the fastest, although set isn't really slower, so it makes no difference which one you use:
set 27.868000030517578
list 27.131999969482422
tuple 27.138000011444092
Answers (2)
filter() returns an iterator in Python 3, and this iterator is consumed on the first run of the inner for-loop. After that, you are only measuring the speed of the constructor -- that's why you have to repeat the call so often to make it take at least a bit of time. So it seems that the constructor of set() is the fastest one at dealing with an empty iterator.
When timings suggest results that are illogical, often it is the timing suite itself that is at fault ;-)
Try using the timeit module which can help steer you clear of common timing mistakes. In particular, you want to run a fresh setup for each test and only time the loop body rather than the body plus the test code.
In this case, it would at least have made the timings comparable (all of them would have used a fresh iterator as returned by Python 3's filter), and it would have shown unbelievably fast timings (because only the method(iterator) code would have been timed in a loop). FWIW, PyPy will be even harder to time because overly simple loops get optimized away completely.
[Answer to the question as edited] Your new timings are comparable (a good improvement), but the results still show the combination of the setup time and the loop time, making it difficult to see differences that might be significant. Your expectation should be that lists and tuples beat sets, because sets have to do more work (hashing each of the inputs rather than simply storing them).
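A hedged sketch of how such a measurement might look with timeit, so that each timed statement builds a fresh iterator (the predicate and sizes are just the ones from the question):

```python
import timeit

# Each execution of the timed statement constructs a new filter object,
# so no run ever sees an exhausted iterator; the setup runs separately
# and is not included in the reported time.
setup = "pred = lambda x: x % 3 == 0"
for ctor in ("set", "list", "tuple"):
    stmt = f"{ctor}(filter(pred, range(1000)))"
    t = timeit.timeit(stmt, setup=setup, number=10000)
    print(ctor, t)
```

Absolute numbers will vary by machine; what matters is that all three constructors are now fed identical, freshly created iterators.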