在 Smalltalk 中使用其他集合选择对象的最快方法
给定两个集合:
srcCollection := #('Lorem' 'ipsum' 'dolor' 'sit' 'amet,' 'consectetur' 'adipisicing' 'elit,' 'sed' 'do' 'eiusmod' 'tempor' 'incididunt' 'ut' 'labore' 'et' 'dolore' 'magna' 'aliqua.' 'Ut' 'enim' 'ad' 'minim' 'veniam,' 'quis' 'nostrud' 'exercitation' 'ullamco' 'laboris' 'nisi' 'ut' 'aliquip' 'ex' 'ea' 'commodo' 'consequat.' 'Duis' 'aute' 'irure' 'dolor' 'in' 'reprehenderit' 'in' 'voluptate' 'velit' 'esse' 'cillum' 'dolore' 'eu' 'fugiat' 'nulla' 'pariatur.' 'Excepteur' 'sint' 'occaecat' 'cupidatat' 'non' 'proident,' 'sunt' 'in' 'culpa' 'qui' 'officia' 'deserunt' 'mollit' 'anim' 'id' 'est' 'laborum').
objCollection := #('Lorem' 'numquam' 'eius' 'modi' 'tempora' 'incidunt' 'ut' 'labore' 'et' 'dolore' 'magnam' 'aliquam' 'ipsum' 'dolor' 'ex' 'ea' 'commodo' 'consequat.' 'Duis' 'aute' 'irure' 'dolor' 'in' 'reprehenderit' 'in' 'voluptate' 'velit' 'esse' 'cillum' 'dolore' 'eu' 'fugiat' 'nulla' 'pariatur.' 'Excepteur' 'sint' 'occaecat' 'cupidatat' 'non' 'proident,' 'sunt' 'in' 'culpa' 'qui' 'officia' 'deserunt' 'mollit' 'anim' 'id' 'est' 'laborum' 'Sed' 'ut' 'perspiciatis' 'unde' 'omnis' 'iste' 'natus' 'error' 'sit' 'voluptatem' 'accusantium' 'doloremque' 'laudantium,' 'totam' 'rem' 'aperiam,' 'eaque' 'ipsa' 'quae' 'ab' 'illo' 'inventore' 'veritatis' 'et' 'quasi' 'architecto' 'sit' 'amet,' 'consectetur' 'adipisicing' 'elit,' 'sed' 'do' 'eiusmod' 'tempor' 'incididunt' 'ut' 'labore' 'et' 'dolore' 'magna' 'aliqua.' 'Ut' 'enim' 'ad' 'minim' 'veniam,' 'quis' 'nostrud' 'exercitation' 'ullamco' 'laboris' 'nisi' 'ut' 'aliquip' 'beatae' 'vitae' 'dicta' 'sunt' 'explicabo.' 'Nemo' 'enim' 'ipsam' 'voluptatem' 'quia' 'voluptas' 'sit' 'aspernatur' 'aut' 'odit' 'aut' 'fugit,' 'sed' 'quia' 'consequuntur' 'magni' 'dolores' 'eos' 'qui' 'ratione' 'voluptatem' 'sequi' 'nesciunt.' 'Neque' 'porro' 'quisquam' 'est,' 'qui' 'dolorem' 'ipsum' 'quia' 'dolor' 'sit' 'amet,' 'consectetur,' 'adipisci' 'velit,' 'sed' 'quia' 'non' 'quaerat' 'voluptatem.' 'Ut' 'enim' 'ad' 'minima' 'veniam,' 'quis' 'nostrum' 'exercitationem' 'ullam' 'corporis' 'suscipit' 'laboriosam,' 'nisi' 'ut' 'aliquid' 'ex' 'ea' 'commodi' 'consequatur?' 'Quis' 'autem' 'vel' 'eum' 'iure' 'reprehenderit' 'qui' 'in' 'ea' 'voluptate' 'velit' 'esse' 'quam' 'nihil' 'molestiae' 'consequatur,' 'vel' 'illum' 'qui' 'dolorem' 'eum' 'fugiat' 'quo' 'voluptas' 'nulla' 'pariatur?').
其中 objCollection 保证包含 srcCollection 中的所有元素。注意:在我的应用程序中,objCollection 实际上是复杂的对象,包含这些字符串作为标识符,没有重复项。
我一直在测量并尝试优化选择 objCollection 中也在 srcCollection 中的所有对象。以下时间以毫秒为单位,在具有 Stack VM 的 Pharo 1.2 和具有 2Gb 内存的 Windows XP 中使用 [ 1000 timesRepeat: [ ... ] ] timeToRun
。这些是我的尝试:
objCollection intersection: srcCollection
7537
7507
objCollection select: [: str | srcCollection includes: str ]
7471
7507
srcCollection collect: [: str | objCollection detect: [: obj | obj = str ] ]
4227
4323
有没有更快的方法?
Given two collections:
srcCollection := #('Lorem' 'ipsum' 'dolor' 'sit' 'amet,' 'consectetur' 'adipisicing' 'elit,' 'sed' 'do' 'eiusmod' 'tempor' 'incididunt' 'ut' 'labore' 'et' 'dolore' 'magna' 'aliqua.' 'Ut' 'enim' 'ad' 'minim' 'veniam,' 'quis' 'nostrud' 'exercitation' 'ullamco' 'laboris' 'nisi' 'ut' 'aliquip' 'ex' 'ea' 'commodo' 'consequat.' 'Duis' 'aute' 'irure' 'dolor' 'in' 'reprehenderit' 'in' 'voluptate' 'velit' 'esse' 'cillum' 'dolore' 'eu' 'fugiat' 'nulla' 'pariatur.' 'Excepteur' 'sint' 'occaecat' 'cupidatat' 'non' 'proident,' 'sunt' 'in' 'culpa' 'qui' 'officia' 'deserunt' 'mollit' 'anim' 'id' 'est' 'laborum').
objCollection := #('Lorem' 'numquam' 'eius' 'modi' 'tempora' 'incidunt' 'ut' 'labore' 'et' 'dolore' 'magnam' 'aliquam' 'ipsum' 'dolor' 'ex' 'ea' 'commodo' 'consequat.' 'Duis' 'aute' 'irure' 'dolor' 'in' 'reprehenderit' 'in' 'voluptate' 'velit' 'esse' 'cillum' 'dolore' 'eu' 'fugiat' 'nulla' 'pariatur.' 'Excepteur' 'sint' 'occaecat' 'cupidatat' 'non' 'proident,' 'sunt' 'in' 'culpa' 'qui' 'officia' 'deserunt' 'mollit' 'anim' 'id' 'est' 'laborum' 'Sed' 'ut' 'perspiciatis' 'unde' 'omnis' 'iste' 'natus' 'error' 'sit' 'voluptatem' 'accusantium' 'doloremque' 'laudantium,' 'totam' 'rem' 'aperiam,' 'eaque' 'ipsa' 'quae' 'ab' 'illo' 'inventore' 'veritatis' 'et' 'quasi' 'architecto' 'sit' 'amet,' 'consectetur' 'adipisicing' 'elit,' 'sed' 'do' 'eiusmod' 'tempor' 'incididunt' 'ut' 'labore' 'et' 'dolore' 'magna' 'aliqua.' 'Ut' 'enim' 'ad' 'minim' 'veniam,' 'quis' 'nostrud' 'exercitation' 'ullamco' 'laboris' 'nisi' 'ut' 'aliquip' 'beatae' 'vitae' 'dicta' 'sunt' 'explicabo.' 'Nemo' 'enim' 'ipsam' 'voluptatem' 'quia' 'voluptas' 'sit' 'aspernatur' 'aut' 'odit' 'aut' 'fugit,' 'sed' 'quia' 'consequuntur' 'magni' 'dolores' 'eos' 'qui' 'ratione' 'voluptatem' 'sequi' 'nesciunt.' 'Neque' 'porro' 'quisquam' 'est,' 'qui' 'dolorem' 'ipsum' 'quia' 'dolor' 'sit' 'amet,' 'consectetur,' 'adipisci' 'velit,' 'sed' 'quia' 'non' 'quaerat' 'voluptatem.' 'Ut' 'enim' 'ad' 'minima' 'veniam,' 'quis' 'nostrum' 'exercitationem' 'ullam' 'corporis' 'suscipit' 'laboriosam,' 'nisi' 'ut' 'aliquid' 'ex' 'ea' 'commodi' 'consequatur?' 'Quis' 'autem' 'vel' 'eum' 'iure' 'reprehenderit' 'qui' 'in' 'ea' 'voluptate' 'velit' 'esse' 'quam' 'nihil' 'molestiae' 'consequatur,' 'vel' 'illum' 'qui' 'dolorem' 'eum' 'fugiat' 'quo' 'voluptas' 'nulla' 'pariatur?').
where objCollection is guaranteed to contain all elements in srcCollection. Note: In my application objCollection are actually complex objects containing these strings as identifiers with no duplicates.
I've been measuring and trying to optimize selecting all the objects in objCollection that are also in srcCollection. The times below are in milliseconds using [ 1000 timesRepeat: [ ... ] ] timeToRun
in Pharo 1.2 with Stack VM and Windows XP with 2Gb of memory. These are my attempts:
objCollection intersection: srcCollection
7537
7507
objCollection select: [: str | srcCollection includes: str ]
7471
7507
srcCollection collect: [: str | objCollection detect: [: obj | obj = str ] ]
4227
4323
Is there a faster way?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
前两个做同样的事情:
Collection>>> #intersection:
的实现是self select: [:each | aCollection 包括:每个]
。收藏>> #intersection:
最终使用self anySatisfy: [:x | x = mySearchObj]
来完成其工作,即使用#do:
迭代集合。#detect:
最终会做同样的事情。我怀疑您所看到的差异并不是因为三者中的任何一个比另一个更有效,而是垃圾收集等事物的产物。
鉴于此,我会选择
#intersection:
因为它的语义清晰。它说的是你想要什么,而不是其他两个,你只能看到如何获得你想要的东西,并且必须推断/推断出意图。The first two do the same thing:
Collection >> #intersection:
's implementation isself select: [:each | aCollection includes: each]
.Collection >> #intersection:
ultimately usesself anySatisfy: [:x | x = mySearchObj ]
to do its job, which iterates through the collection using#do:
.#detect:
ultimately does the same thing.I suspect the difference you're seeing's not because any of the three is more efficient than the other, but rather a product of things like garbage collection.
Given that, I'd choose
#intersection:
because of its semantic clarity. It says what you want, rather than the other two, where you only see how you get what you want and have to infer/deduce the intent.如果您有能力将
asSet
添加到这两个集合中,您可能会做得更快。我从 1480 毫秒变成了 197 毫秒。If you can afford to add
asSet
to the two collections, you might do faster. I went from 1480 to 197 ms.两个集合都有重复项。
如果您想处理重复项,假设您有一个可用的总订单 <<另一种可能性是使用这种方法(成本 n1*log(n1) + n2*log(n2) + n1 + n2 而不是 n1*n2 进行朴素交集)
这也可以使用堆,并删除第一个元素,但速度很慢。
最后,如果您没有可用的总订单,您可以简单地使用 Bag,但仍会考虑重复项
Both collections have duplicates.
If you want to handle duplicates, assuming you have a total order available via < another possibility is to use this method (cost n1*log(n1) + n2*log(n2) + n1 + n2 instead of n1*n2 for naive intersection)
That would also work using a Heap, and removing first element, but slowly.
Finally, if you haven't a total order available, you can simply use a Bag, still accounting for duplicates