令A
和B
为两个集合。我正在寻找真正快速或优雅的方法来计算集合差异(A - B
或 A \B
,具体取决于您的偏好)他们之间。正如标题所示,这两个集合作为 Javascript 数组进行存储和操作。
注意:
- Gecko 特定的技巧还可以,
- 我更喜欢坚持使用本机函数(但如果速度更快,我愿意使用轻量级库)
- 我已经看到过,但没有测试过 JS.Set (参见上一点)
编辑: 我注意到有关包含重复元素的集合的评论。当我说“集合”时,我指的是数学定义,这意味着(除其他外)它们不包含重复元素。
Let A
and B
be two sets. I'm looking for really fast or elegant ways to compute the set difference (A - B
or A \B
, depending on your preference) between them. The two sets are stored and manipulated as Javascript arrays, as the title says.
Notes:
- Gecko-specific tricks are okay
- I'd prefer sticking to native functions (but I am open to a lightweight library if it's way faster)
- I've seen, but not tested, JS.Set (see previous point)
Edit: I noticed a comment about sets containing duplicate elements. When I say "set" I'm referring to the mathematical definition, which means (among other things) that they do not contain duplicate elements.
发布评论
评论(15)
我不知道这是否最有效,但也许是最短的:
更新到ES6:
I don't know if this is most effective, but perhaps the shortest:
Updated to ES6:
好吧,7 年后,有了 ES6 的 Set 对象相当简单(但仍然不如 python
A - B),据报道对于大型数组比
indexOf
更快:Well, 7 years later, with ES6's Set object it's quite easy (but still not as compact as python's
A - B
), and reportedly faster thanindexOf
for large arrays:看看这些解决方案中的许多,它们对于小情况来说效果很好。但是,当你将它们增加到一百万个项目时,时间复杂度开始变得愚蠢。
这看起来像是一个 O(N^2) 的解决方案。既然有一个 O(N) 解决方案,让我们使用它,如果您的 JS 运行时不是最新的,您可以轻松修改为不成为生成器。
虽然这比许多其他解决方案要复杂一些,但当您有大型列表时,这会快得多。
让我们快速看一下性能差异,在 0...10,000 之间的一组 1,000,000 个随机整数上运行,我们会看到以下性能结果。
Looking at a lof of these solutions, they do fine for small cases. But, when you blow them up to a million items, the time complexity starts getting silly.
That starts looking like an O(N^2) solution. Since there is an O(N) solution, let's use it, you can easily modify to not be a generator if you're not up to date on your JS runtime.
While this is a bit more complex than many of the other solutions, when you have large lists this will be far faster.
Let's take a quick look at the performance difference, running it on a set of 1,000,000 random integers between 0...10,000 we see the following performance results.
如果您使用
Set
,它可能非常简单且高性能:由于
Set
在底层使用哈希函数*,因此has
函数比indexOf
快得多(如果您有超过 100 个项目,这一点很重要)。If you're using
Set
s, it can be quite simple and performant:Since
Set
s use Hash functions* under the hood, thehas
function is much faster thanindexOf
(this matters if you have, say, more than 100 items).您可以使用对象作为映射,以避免线性扫描
B
中的每个元素A
,如 user187291 的答案:非标准
toSource()
方法用于获取唯一的属性名称;如果所有元素已经具有唯一的字符串表示形式(与数字的情况一样),您可以通过删除toSource()
调用来加快代码速度。You can use an object as a map to avoid linearly scanning
B
for each element ofA
as in user187291's answer:The non-standard
toSource()
method is used to get unique property names; if all elements already have unique string representations (as is the case with numbers), you can speed up the code by dropping thetoSource()
invocations.使用 jQuery 最短的是:
The shortest, using jQuery, is:
一些简单的功能,借用@milan的答案:
用法:
Some simple functions, borrowing from @milan's answer:
Usage:
我会对数组 B 进行哈希处理,然后保留数组 A 中不存在于 B 中的值:
I would hash the array B, then keep values from the array A not present in B:
使用 Underscore.js (函数式 JS 库)
Using Underscore.js (Library for functional JS)
更新响应
截至 2024 年 6 月,ECMAScript TC39 set 方法提案 处于第 3 阶段(候选人)。
更新:截至 2024 年 7 月 6 日,该提案处于第 4 阶段(草案)。
Set.prototype.intersection(other )
Set.prototype.difference(other)
Set.prototype.symmetryDifference(其他)
Set.prototype.isSubsetOf(other)
Set.prototype.isSupersetOf(other)
Set.prototype .isDisjointFrom(其他)
截至 2024 年 6 月 11 日,所有主要浏览器均支持以下方法。
原始响应
下面的函数是找到的方法的端口在Python的
set()
类中遵循 TC39 Set 方法提案。Updated response
As of June 2024, the ECMAScript TC39 proposal for set methods is in Stage 3 (Candidate).
Update: as of July 6th, 2024, the proposal is in Stage 4 (Draft).
Set.prototype.intersection(other)
Set.prototype.union(other)
Set.prototype.difference(other)
Set.prototype.symmetricDifference(other)
Set.prototype.isSubsetOf(other)
Set.prototype.isSupersetOf(other)
Set.prototype.isDisjointFrom(other)
All major browsers now support the following methods as of June 11, 2024.
Original response
The function below are ports of the methods found in Python's
set()
class and follows the TC39 Set methods proposal.结合 Christoph 的想法,并假设对数组和对象/散列(
each
和朋友)使用一些非标准迭代方法,我们可以在大约 20 行的线性时间内获得集合差、并集和交集总计:假设为数组定义了
each
和filter
,并且我们有两个实用方法:myUtils.keys(hash)
:返回一个带有哈希键的数组
myUtils.select(hash, fnSelector,
:返回一个数组fnEvaluator)
调用
fnEvaluator
的结果在键/值对上
fnSelector
返回 true。select()
受到 Common Lisp 的松散启发,只是将filter()
和map()
合二为一。 (最好在Object.prototype
上定义它们,但这样做会对 jQuery 造成严重破坏,所以我选择了静态实用方法。)性能:测试
给出了两个包含 50,000 和 66,666 个元素的集合。对于这些值,AB 大约需要 75 毫秒,而并集和交集各大约需要 150 毫秒。 (Mac Safari 4.0,使用 Javascript Date 进行计时。)
我认为这对于 20 行代码来说是不错的回报。
Incorporating the idea from Christoph and assuming a couple of non-standard iteration methods on arrays and objects/hashes (
each
and friends), we can get set difference, union and intersection in linear time in about 20 lines total:This assumes that
each
andfilter
are defined for arrays, and that we have two utility methods:myUtils.keys(hash)
: returns anarray with the keys of the hash
myUtils.select(hash, fnSelector,
: returns an array withfnEvaluator)
the results of calling
fnEvaluator
on the key/value pairs for which
fnSelector
returns true.The
select()
is loosely inspired by Common Lisp, and is merelyfilter()
andmap()
rolled into one. (It would be better to have them defined onObject.prototype
, but doing so wrecks havoc with jQuery, so I settled for static utility methods.)Performance: Testing with
gives two sets with 50,000 and 66,666 elements. With these values A-B takes about 75ms, while union and intersection are about 150ms each. (Mac Safari 4.0, using Javascript Date for timing.)
I think that's decent payoff for 20 lines of code.
至于禁食方式,这不是那么优雅,但我已经进行了一些测试来确定。将一个数组作为对象加载可以更快地进行大量处理:
结果:
但是,这仅适用于字符串。如果您打算比较编号集,您将需要使用 parseFloat 映射结果。
As for the fasted way, this isn't so elegant but I've run some tests to be sure. Loading one array as an object is far faster to process in large quantities:
Results:
However, this works with strings only. If you plan to compare numbered sets you'll want to map results with parseFloat.
这个可行,但我认为另一个更短,也更优雅
This works, but I think another one is much more shorter, and elegant too
使用
Set.prototype.difference( )
:理论上时间复杂度应该是
θ(n)
,其中n
是B<中的元素数量/代码>。
您还可以使用 [
core-js
] 将其填充到旧环境中:Use
Set.prototype.difference()
:In theory, the time complexity should be
Θ(n)
, wheren
is the number of elements inB
.You can also polyfill this into older environments with [
core-js
]:@koblas 提供的答案很好,但也返回两个数组中的项目。对我想要获得差异的用例进行了轻微修改(在 ES6 中)(目的是检索 array_j 中的新项目以及 array_i 中的项目) /code> 不在
array j
中作为单独的输出数组,以下是提供的 3 种主要方法:答案应该是数组 j 中的新项目,如
['b' , 'e', 'i']
以及数组 i 中不在数组 j 中的项['k', 'l', 'n']
所有三个方法返回:
在随机数组上计时,包括包含 5000 个项目的 [0,10000] 范围内的数字
返回:
所以是的,只需使用:
Answer provided by @koblas is good but returns items that are in both arrays aswell. With a slight modification (in ES6) for my use case where I want to get the difference, (with the intention of retrieving new items in
array_j
, as well as the the items inarray_i
that are not inarray j
as separate output arrays, these are the 3 main ways provided to do this:The answers should be the new items in array j as
['b', 'e', 'i']
as well as the items in array i that are not in array j as['k', 'l', 'n']
All three methods return:
Timing these on random array including numbers from range [0,10000] containing 5000 items
Returns:
So yeah, just use: