Scala:删除对象列表中的重复项
我有一个对象列表 List[Object]
,它们都是从同一个类实例化的。该类有一个必须是唯一的Object.property
字段。迭代对象列表并删除具有相同属性的所有对象(但第一个)的最干净的方法是什么?
I've got a list of objects List[Object]
which are all instantiated from the same class. This class has a field which must be unique Object.property
. What is the cleanest way to iterate the list of objects and remove all objects(but the first) with the same property?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(10)
说明: groupBy 方法接受将元素转换为用于分组的键的函数。
_.property
只是elem: Object => 的简写elem.property
(编译器生成一个唯一的名称,例如x$1
)。现在我们有了一个地图Map[Property, List[Object]]
。Map[K,V]
扩展了Traversable[(K,V)]
。所以它可以像列表一样被遍历,但是元素是一个元组。这与 Java 的Map#entrySet()
类似。 map 方法通过迭代每个元素并向其应用函数来创建新集合。在这种情况下,函数是_._2.head
,它是elem 的简写: (Property, List[Object]) => elem._2.head
。_2
只是 Tuple 的一个返回第二个元素的方法。第二个元素是 List[Object],head
返回第一个元素要使结果成为您想要的类型:
简单地解释一下,
map
实际上需要两个参数,一个函数和用于构造结果的对象。在第一个代码片段中,您看不到第二个值,因为它被标记为隐式,因此由编译器从范围内的预定义值列表中提供。结果通常是从映射的容器中获取的。这通常是一件好事。 List 上的映射将返回 List,Array 上的映射将返回 Array 等。但是在这种情况下,我们希望将我们想要的容器表示为结果。这就是使用breakOut方法的地方。它仅通过查看所需的结果类型来构造构建器(构建结果的东西)。它是一个泛型方法,编译器会推断出它的泛型类型,因为我们显式地将 l2 键入为List[Object]
,或者为了保留顺序(假设Object#property
的类型为Property
):foldRight
是一个接受初始结果的方法和一个接受元素并返回更新结果的函数。该方法迭代每个元素,根据将函数应用于每个元素来更新结果并返回最终结果。我们从右到左(而不是使用foldLeft
从左到右),因为我们要在objects
前面添加 - 这是 O(1),但追加是 O(N) 。还要观察这里的良好样式,我们使用模式匹配来提取元素。在这种情况下,初始结果是空列表和集合的一对(元组)。该列表是我们感兴趣的结果,该集合用于跟踪我们已经遇到的属性。在每次迭代中,我们检查集合
props
是否已经包含该属性(在 Scala 中,obj(x)
被转换为obj.apply(x)
在Set
中,方法apply
是def apply(a: A): Boolean
,即接受一个元素并返回true /。如果存在或不存在则为 false)。如果该属性存在(已经遇到),则按原样返回结果。否则,结果将更新为包含对象 (o ::objects
) 并记录属性 (props + o.property
)更新:@andreypopp 想要一个通用方法:
使用:
还要注意,由于我们使用构建器,这非常有效。如果您有非常大的列表,您可能需要使用可变的 HashSet 而不是常规集合并对性能进行基准测试。
Explanation: The groupBy method accepts a function that converts an element to a key for grouping.
_.property
is just shorthand forelem: Object => elem.property
(the compiler generates a unique name, something likex$1
). So now we have a mapMap[Property, List[Object]]
. AMap[K,V]
extendsTraversable[(K,V)]
. So it can be traversed like a list, but elements are a tuple. This is similar to Java'sMap#entrySet()
. The map method creates a new collection by iterating each element and applying a function to it. In this case the function is_._2.head
which is shorthand forelem: (Property, List[Object]) => elem._2.head
._2
is just a method of Tuple that returns the second element. The second element is List[Object] andhead
returns the first elementTo get the result to be a type you want:
To explain briefly,
map
actually expects two arguments, a function and an object that is used to construct the result. In the first code snippet you don't see the second value because it is marked as implicit and so provided by the compiler from a list of predefined values in scope. The result is usually obtained from the mapped container. This is usually a good thing. map on List will return List, map on Array will return Array etc. In this case however, we want to express the container we want as result. This is where the breakOut method is used. It constructs a builder (the thing that builds results) by only looking at the desired result type. It is a generic method and the compiler infers its generic types because we explicitly typed l2 to beList[Object]
or, to preserve order (assumingObject#property
is of typeProperty
):foldRight
is a method that accepts an initial result and a function that accepts an element and returns an updated result. The method iterates each element, updating the result according to applying the function to each element and returning the final result. We go from right to left (rather than left to right withfoldLeft
) because we are prepending toobjects
- this is O(1), but appending is O(N). Also observe the good styling here, we are using a pattern match to extract the elements.In this case, the initial result is a pair (tuple) of an empty list and a set. The list is the result we're interested in and the set is used to keep track of what properties we already encountered. In each iteration we check if the set
props
already contains the property (in Scala,obj(x)
is translated toobj.apply(x)
. InSet
, the methodapply
isdef apply(a: A): Boolean
. That is, accepts an element and returns true / false if it exists or not). If the property exists (already encountered), the result is returned as-is. Otherwise the result is updated to contain the object (o :: objects
) and the property is recorded (props + o.property
)Update: @andreypopp wanted a generic method:
to use:
Also note that this is pretty efficient as we are using a builder. If you have really large lists, you may want to use a mutable HashSet instead of a regular set and benchmark the performance.
从 Scala 2.13 开始,大多数集合现在都提供了
distinctBy
方法,该方法返回序列的所有元素,在应用给定转换后忽略重复项功能:例如:
Starting
Scala 2.13
, most collections are now provided with adistinctBy
method which returns all elements of the sequence ignoring the duplicates after applying a given transforming function:For instance:
这是一个有点偷偷摸摸但快速的保留顺序的解决方案:
虽然它在内部使用 var,但我认为它比 FoldLeft 解决方案更容易理解和阅读。
Here is a little bit sneaky but fast solution that preserves order:
Although it uses internally a var, I think it is easier to understand and to read than the foldLeft-solution.
上面有很多很好的答案。然而,
distinctBy
已经在 Scala 中,但在一个不太明显的地方。也许你可以像这样使用它A lot of good answers above. However,
distinctBy
is already in Scala, but in a not-so-obvious place. Perhaps you can use it like保留订单:
With preserve order:
另一种解决方案
One more solution
我找到了一种让它与 groupBy 一起使用的方法,只需一个中间步骤:
像这样使用它:
与 IttayD 的第一个解决方案类似,但它根据唯一值集过滤原始集合。如果我的期望是正确的,这将执行三次遍历:一次用于
groupBy
,一次用于map
,一次用于filter
。它保持原始集合的顺序,但不一定采用每个属性的第一个值。例如,它可能会返回List(bluePrius, redLeon)
。当然,IttayD 的解决方案仍然更快,因为它只进行一次遍历。
我的解决方案还有一个缺点,如果集合中有实际上相同的
Car
,则两者都将出现在输出列表中。可以通过删除filter
并直接返回uniqueValues
(类型为From[T]
)来解决此问题。但是,似乎CanBuildFrom[Map[P, From[T]], T, From[T]]
不存在...欢迎建议!I found a way to make it work with groupBy, with one intermediary step:
Use it like this:
Similar to IttayD's first solution, but it filters the original collection based on the set of unique values. If my expectations are correct, this does three traversals: one for
groupBy
, one formap
and one forfilter
. It maintains the ordering of the original collection, but does not necessarily take the first value for each property. For example, it could have returnedList(bluePrius, redLeon)
instead.Of course, IttayD's solution is still faster since it does only one traversal.
My solution also has the disadvantage that, if the collection has
Car
s that are actually the same, both will be in the output list. This could be fixed by removingfilter
and returninguniqueValues
directly, with typeFrom[T]
. However, it seems likeCanBuildFrom[Map[P, From[T]], T, From[T]]
does not exist... suggestions are welcome!通过集合和从记录到键的函数,可以生成按键区分的记录列表。目前尚不清楚 groupBy 是否会保留原始集合中的顺序。它甚至可能取决于集合的类型。我猜测
head
或last
将始终产生最早的元素。Scala 什么时候会获得
nubBy
?它已经在 Haskell 中存在了几十年了。With a collection and a function from a record to a key this yields a list of records distinct by key. It's not clear whether groupBy will preserve the order in the original collection. It may even depend on the type of collection. I'm guessing either
head
orlast
will consistently yield the earliest element.When will Scala get a
nubBy
? It's been in Haskell for decades.如果您想删除重复项并保留列表的顺序,您可以尝试以下两个行:
If you want to remove duplicates and preserve the order of the list you can try this two liner:
这完全是 @IttayD 答案的翻版,但不幸的是我没有足够的声誉来发表评论。
您可以简单地创建一个隐式类,而不是创建一个隐式函数来转换迭代器:
this is entirely a rip of @IttayD 's answer, but unfortunately I don't have enough reputation to comment.
Rather than creating an implicit function to convert your iteratble, you can simply create an implicit class: