当前位置：文江博客话题详情

Scala：删除对象列表中的重复项

发布于 2024-09-27 05:49:53 字数 135 浏览 5 评论 0原文

我有一个对象列表 List[Object]，它们都是从同一个类实例化的。该类有一个必须是唯一的Object.property字段。迭代对象列表并删除具有相同属性的所有对象（但第一个）的最干净的方法是什么？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

往事风中埋 2024-10-04 05:49:53

list.groupBy(_.property).map(_._2.head)

说明： groupBy 方法接受将元素转换为用于分组的键的函数。 _.property 只是 elem: Object => 的简写elem.property （编译器生成一个唯一的名称，例如 x$1）。现在我们有了一个地图 Map[Property, List[Object]]。 Map[K,V] 扩展了 Traversable[(K,V)]。所以它可以像列表一样被遍历，但是元素是一个元组。这与 Java 的 Map#entrySet() 类似。 map 方法通过迭代每个元素并向其应用函数来创建新集合。在这种情况下，函数是 _._2.head ，它是 elem 的简写： (Property, List[Object]) => elem._2.head。 _2 只是 Tuple 的一个返回第二个元素的方法。第二个元素是 List[Object]，head 返回第一个元素

要使结果成为您想要的类型：

import collection.breakOut
val l2: List[Object] = list.groupBy(_.property).map(_._2.head)(breakOut)

简单地解释一下，map 实际上需要两个参数，一个函数和用于构造结果的对象。在第一个代码片段中，您看不到第二个值，因为它被标记为隐式，因此由编译器从范围内的预定义值列表中提供。结果通常是从映射的容器中获取的。这通常是一件好事。 List 上的映射将返回 List，Array 上的映射将返回 Array 等。但是在这种情况下，我们希望将我们想要的容器表示为结果。这就是使用breakOut方法的地方。它仅通过查看所需的结果类型来构造构建器（构建结果的东西）。它是一个泛型方法，编译器会推断出它的泛型类型，因为我们显式地将 l2 键入为 List[Object]，或者为了保留顺序（假设 Object#property 的类型为Property)：

list.foldRight((List[Object](), Set[Property]())) {
  case (o, cum@(objects, props)) => 
    if (props(o.property)) cum else (o :: objects, props + o.property))
}._1

foldRight 是一个接受初始结果的方法和一个接受元素并返回更新结果的函数。该方法迭代每个元素，根据将函数应用于每个元素来更新结果并返回最终结果。我们从右到左（而不是使用 foldLeft 从左到右），因为我们要在 objects 前面添加 - 这是 O(1)，但追加是 O(N) 。还要观察这里的良好样式，我们使用模式匹配来提取元素。

在这种情况下，初始结果是空列表和集合的一对（元组）。该列表是我们感兴趣的结果，该集合用于跟踪我们已经遇到的属性。在每次迭代中，我们检查集合 props 是否已经包含该属性（在 Scala 中，obj(x) 被转换为 obj.apply(x) 在Set中，方法apply是def apply(a: A): Boolean，即接受一个元素并返回true /。如果存在或不存在则为 false）。如果该属性存在（已经遇到），则按原样返回结果。否则，结果将更新为包含对象 (o ::objects) 并记录属性 (props + o.property)

更新：@andreypopp 想要一个通用方法：

import scala.collection.IterableLike
import scala.collection.generic.CanBuildFrom

class RichCollection[A, Repr](xs: IterableLike[A, Repr]){
  def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]) = {
    val builder = cbf(xs.repr)
    val i = xs.iterator
    var set = Set[B]()
    while (i.hasNext) {
      val o = i.next
      val b = f(o)
      if (!set(b)) {
        set += b
        builder += o
      }
    }
    builder.result
  }
}

implicit def toRich[A, Repr](xs: IterableLike[A, Repr]) = new RichCollection(xs)

使用：

scala> list.distinctBy(_.property)
res7: List[Obj] = List(Obj(1), Obj(2), Obj(3))

还要注意，由于我们使用构建器，这非常有效。如果您有非常大的列表，您可能需要使用可变的 HashSet 而不是常规集合并对性能进行基准测试。

list.groupBy(_.property).map(_._2.head)

Explanation: The groupBy method accepts a function that converts an element to a key for grouping. _.property is just shorthand for elem: Object => elem.property (the compiler generates a unique name, something like x$1). So now we have a map Map[Property, List[Object]]. A Map[K,V] extends Traversable[(K,V)]. So it can be traversed like a list, but elements are a tuple. This is similar to Java's Map#entrySet(). The map method creates a new collection by iterating each element and applying a function to it. In this case the function is _._2.head which is shorthand for elem: (Property, List[Object]) => elem._2.head. _2 is just a method of Tuple that returns the second element. The second element is List[Object] and head returns the first element

To get the result to be a type you want:

import collection.breakOut
val l2: List[Object] = list.groupBy(_.property).map(_._2.head)(breakOut)

To explain briefly, map actually expects two arguments, a function and an object that is used to construct the result. In the first code snippet you don't see the second value because it is marked as implicit and so provided by the compiler from a list of predefined values in scope. The result is usually obtained from the mapped container. This is usually a good thing. map on List will return List, map on Array will return Array etc. In this case however, we want to express the container we want as result. This is where the breakOut method is used. It constructs a builder (the thing that builds results) by only looking at the desired result type. It is a generic method and the compiler infers its generic types because we explicitly typed l2 to be List[Object] or, to preserve order (assuming Object#property is of type Property):

list.foldRight((List[Object](), Set[Property]())) {
  case (o, cum@(objects, props)) => 
    if (props(o.property)) cum else (o :: objects, props + o.property))
}._1

foldRight is a method that accepts an initial result and a function that accepts an element and returns an updated result. The method iterates each element, updating the result according to applying the function to each element and returning the final result. We go from right to left (rather than left to right with foldLeft) because we are prepending to objects - this is O(1), but appending is O(N). Also observe the good styling here, we are using a pattern match to extract the elements.

In this case, the initial result is a pair (tuple) of an empty list and a set. The list is the result we're interested in and the set is used to keep track of what properties we already encountered. In each iteration we check if the set props already contains the property (in Scala, obj(x) is translated to obj.apply(x). In Set, the method apply is def apply(a: A): Boolean. That is, accepts an element and returns true / false if it exists or not). If the property exists (already encountered), the result is returned as-is. Otherwise the result is updated to contain the object (o :: objects) and the property is recorded (props + o.property)

Update: @andreypopp wanted a generic method:

import scala.collection.IterableLike
import scala.collection.generic.CanBuildFrom

class RichCollection[A, Repr](xs: IterableLike[A, Repr]){
  def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]) = {
    val builder = cbf(xs.repr)
    val i = xs.iterator
    var set = Set[B]()
    while (i.hasNext) {
      val o = i.next
      val b = f(o)
      if (!set(b)) {
        set += b
        builder += o
      }
    }
    builder.result
  }
}

implicit def toRich[A, Repr](xs: IterableLike[A, Repr]) = new RichCollection(xs)

to use:

scala> list.distinctBy(_.property)
res7: List[Obj] = List(Obj(1), Obj(2), Obj(3))

Also note that this is pretty efficient as we are using a builder. If you have really large lists, you may want to use a mutable HashSet instead of a regular set and benchmark the performance.

回复收藏 0 原文

轻许诺言 2024-10-04 05:49:53

从 Scala 2.13 开始，大多数集合现在都提供了 distinctBy 方法，该方法返回序列的所有元素，在应用给定转换后忽略重复项功能：

list.distinctBy(_.property)

例如：

List(("a", 2), ("b", 2), ("a", 5)).distinctBy(_._1) // List((a,2), (b,2))
List(("a", 2.7), ("b", 2.1), ("a", 5.4)).distinctBy(_._2.floor) // List((a,2.7), (a,5.4))

Starting Scala 2.13, most collections are now provided with a distinctBy method which returns all elements of the sequence ignoring the duplicates after applying a given transforming function:

list.distinctBy(_.property)

For instance:

List(("a", 2), ("b", 2), ("a", 5)).distinctBy(_._1) // List((a,2), (b,2))
List(("a", 2.7), ("b", 2.1), ("a", 5.4)).distinctBy(_._2.floor) // List((a,2.7), (a,5.4))

回复收藏 0 原文

过期情话 2024-10-04 05:49:53

这是一个有点偷偷摸摸但快速的保留顺序的解决方案：

list.filterNot{ var set = Set[Property]()
    obj => val b = set(obj.property); set += obj.property; b}

虽然它在内部使用 var，但我认为它比 FoldLeft 解决方案更容易理解和阅读。

Here is a little bit sneaky but fast solution that preserves order:

list.filterNot{ var set = Set[Property]()
    obj => val b = set(obj.property); set += obj.property; b}

Although it uses internally a var, I think it is easier to understand and to read than the foldLeft-solution.

回复收藏 0 原文

听不够的曲调 2024-10-04 05:49:53

上面有很多很好的答案。然而，distinctBy 已经在 Scala 中，但在一个不太明显的地方。也许你可以像这样使用它

def distinctBy[A, B](xs: List[A])(f: A => B): List[A] =
  scala.reflect.internal.util.Collections.distinctBy(xs)(f)

A lot of good answers above. However, distinctBy is already in Scala, but in a not-so-obvious place. Perhaps you can use it like

def distinctBy[A, B](xs: List[A])(f: A => B): List[A] =
  scala.reflect.internal.util.Collections.distinctBy(xs)(f)

回复收藏 0 原文

二货你真萌 2024-10-04 05:49:53

保留订单：

def distinctBy[L, E](list: List[L])(f: L => E): List[L] =
  list.foldLeft((Vector.empty[L], Set.empty[E])) {
    case ((acc, set), item) =>
      val key = f(item)
      if (set.contains(key)) (acc, set)
      else (acc :+ item, set + key)
  }._1.toList

distinctBy(list)(_.property)

With preserve order:

def distinctBy[L, E](list: List[L])(f: L => E): List[L] =
  list.foldLeft((Vector.empty[L], Set.empty[E])) {
    case ((acc, set), item) =>
      val key = f(item)
      if (set.contains(key)) (acc, set)
      else (acc :+ item, set + key)
  }._1.toList

distinctBy(list)(_.property)

回复收藏 0 原文

少女的英雄梦 2024-10-04 05:49:53

另一种解决方案

@tailrec
def collectUnique(l: List[Object], s: Set[Property], u: List[Object]): List[Object] = l match {
  case Nil => u.reverse
  case (h :: t) => 
    if (s(h.property)) collectUnique(t, s, u) else collectUnique(t, s + h.prop, h :: u)
}

One more solution

@tailrec
def collectUnique(l: List[Object], s: Set[Property], u: List[Object]): List[Object] = l match {
  case Nil => u.reverse
  case (h :: t) => 
    if (s(h.property)) collectUnique(t, s, u) else collectUnique(t, s + h.prop, h :: u)
}

回复收藏 0 原文

穿越时光隧道 2024-10-04 05:49:53

我找到了一种让它与 groupBy 一起使用的方法，只需一个中间步骤：

def distinctBy[T, P, From[X] <: TraversableLike[X, From[X]]](collection: From[T])(property: T => P): From[T] = {
  val uniqueValues: Set[T] = collection.groupBy(property).map(_._2.head)(breakOut)
  collection.filter(uniqueValues)
}

像这样使用它：

scala> distinctBy(List(redVolvo, bluePrius, redLeon))(_.color)
res0: List[Car] = List(redVolvo, bluePrius)

与 IttayD 的第一个解决方案类似，但它根据唯一值集过滤原始集合。如果我的期望是正确的，这将执行三次遍历：一次用于 groupBy，一次用于 map，一次用于 filter。它保持原始集合的顺序，但不一定采用每个属性的第一个值。例如，它可能会返回 List(bluePrius, redLeon)。

当然，IttayD 的解决方案仍然更快，因为它只进行一次遍历。

我的解决方案还有一个缺点，如果集合中有实际上相同的 Car，则两者都将出现在输出列表中。可以通过删除 filter 并直接返回 uniqueValues（类型为 From[T]）来解决此问题。但是，似乎 CanBuildFrom[Map[P, From[T]], T, From[T]] 不存在...欢迎建议！

I found a way to make it work with groupBy, with one intermediary step:

def distinctBy[T, P, From[X] <: TraversableLike[X, From[X]]](collection: From[T])(property: T => P): From[T] = {
  val uniqueValues: Set[T] = collection.groupBy(property).map(_._2.head)(breakOut)
  collection.filter(uniqueValues)
}

Use it like this:

scala> distinctBy(List(redVolvo, bluePrius, redLeon))(_.color)
res0: List[Car] = List(redVolvo, bluePrius)

Similar to IttayD's first solution, but it filters the original collection based on the set of unique values. If my expectations are correct, this does three traversals: one for groupBy, one for map and one for filter. It maintains the ordering of the original collection, but does not necessarily take the first value for each property. For example, it could have returned List(bluePrius, redLeon) instead.

Of course, IttayD's solution is still faster since it does only one traversal.

My solution also has the disadvantage that, if the collection has Cars that are actually the same, both will be in the output list. This could be fixed by removing filter and returning uniqueValues directly, with type From[T]. However, it seems like CanBuildFrom[Map[P, From[T]], T, From[T]] does not exist... suggestions are welcome!

回复收藏 0 原文

沒落の蓅哖 2024-10-04 05:49:53

通过集合和从记录到键的函数，可以生成按键区分的记录列表。目前尚不清楚 groupBy 是否会保留原始集合中的顺序。它甚至可能取决于集合的类型。我猜测 head 或 last 将始终产生最早的元素。

collection.groupBy(keyFunction).values.map(_.head)

Scala 什么时候会获得 nubBy ？它已经在 Haskell 中存在了几十年了。

With a collection and a function from a record to a key this yields a list of records distinct by key. It's not clear whether groupBy will preserve the order in the original collection. It may even depend on the type of collection. I'm guessing either head or last will consistently yield the earliest element.

collection.groupBy(keyFunction).values.map(_.head)

When will Scala get a nubBy? It's been in Haskell for decades.

回复收藏 0 原文

一杆小烟枪 2024-10-04 05:49:53

如果您想删除重复项并保留列表的顺序，您可以尝试以下两个行：

val tmpUniqueList = scala.collection.mutable.Set[String]()
val myUniqueObjects = for(o <- myObjects if tmpUniqueList.add(o.property)) yield o

If you want to remove duplicates and preserve the order of the list you can try this two liner:

val tmpUniqueList = scala.collection.mutable.Set[String]()
val myUniqueObjects = for(o <- myObjects if tmpUniqueList.add(o.property)) yield o

回复收藏 0 原文

蔚蓝源自深海 2024-10-04 05:49:53

这完全是 @IttayD 答案的翻版，但不幸的是我没有足够的声誉来发表评论。
您可以简单地创建一个隐式类，而不是创建一个隐式函数来转换迭代器：

import scala.collection.IterableLike
import scala.collection.generic.CanBuildFrom

implicit class RichCollection[A, Repr](xs: IterableLike[A, Repr]){
  def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]) = {
    val builder = cbf(xs.repr)
    val i = xs.iterator
    var set = Set[B]()
    while (i.hasNext) {
      val o = i.next
      val b = f(o)
      if (!set(b)) {
        set += b
        builder += o
      }
    }
    builder.result
  }
}

this is entirely a rip of @IttayD 's answer, but unfortunately I don't have enough reputation to comment.
Rather than creating an implicit function to convert your iteratble, you can simply create an implicit class:

import scala.collection.IterableLike
import scala.collection.generic.CanBuildFrom

implicit class RichCollection[A, Repr](xs: IterableLike[A, Repr]){
  def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]) = {
    val builder = cbf(xs.repr)
    val i = xs.iterator
    var set = Set[B]()
    while (i.hasNext) {
      val o = i.next
      val b = f(o)
      if (!set(b)) {
        set += b
        builder += o
      }
    }
    builder.result
  }
}

回复收藏 0 原文

~没有更多了~