subsetOf versus forall contains

Posted on 2024-10-26 11:51:55

Suppose I have:

case class X(...)
val xs: Seq[X] = ... // some method result
val ys: Seq[X] = ... // some other method result

While the following holds:

xs.distinct.sameElements(xs) // true
ys.distinct.sameElements(ys) // true

I am facing:

xs forall(ys contains _)    // true
xs.toSet subsetOf ys.toSet  // false

Why? I mean, it's clear that making a Set out of a Seq keeps an arbitrary element when there are duplicates, but there are no duplicates, as the distinct.sameElements checks above show.

I certainly need a deeper understanding of the kind of equality check...

EDIT:

After a long search, I found the problem and condensed it to the following:

My elements are not the same; I still have to take a closer look at why distinct.sameElements isn't complaining. Meanwhile, a new question arose:

Consider this:

val rnd = scala.util.Random
def int2Label(i: Int) = "[%4s]".format(Seq.fill(rnd.nextInt(4))(i).mkString)
val s = Seq(1,2,3,5) // keys chosen to match the printed output below

// as expected :
val m1: Map[Int,String] = s.map(i => (i,int2Label(i))).toMap
println(m1) // Map(5 -> [ 555], 1 -> [    ], 2 -> [  22], 3 -> [    ])
println(m1) // Map(5 -> [ 555], 1 -> [    ], 2 -> [  22], 3 -> [    ])

// but accessing m2 several times yields different results. Why?
val m2: Map[Int,String] = s.map(i => (i,i)).toMap.mapValues { int2Label(_) }
println(m2) // Map(5 -> [   5], 1 -> [  11], 2 -> [  22], 3 -> [ 333])
println(m2) // Map(5 -> [  55], 1 -> [  11], 2 -> [    ], 3 -> [    ])

So the elements of my first two sequences aren't the same, because they depend on an m2-like construct and are therefore different each time I access them.

My new question is: why does m2 behave like a function, in contrast to m1, although both are immutable maps? That isn't intuitive to me.

看春风乍起 2024-11-02 11:51:55

The most common reasons for problems in this area--testing set equality and the like--are:

1. hashCode does not agree with equals
2. Your values are not stable (so previous hashCode does not agree with current equals)

The reason this matters is that distinct and toSet use hash codes to build sets, whereas contains simply runs over the collection with an exists:

xs forall(ys contains _) == xs forall (x => ys exists (y => x==y) )

This is made more complicated by the fact that many sets don't start using hash codes until they're larger than some minimal size (usually 4), so you don't always notice this with testing. But let's prove it to ourselves:

class Liar(val s: String) {   // val, so that equals can read the other instance's s
  override def equals(o: Any) = o match {
    case l: Liar => s == l.s
    case _ => false
  }
  // No hashCode override!
}
val strings = List("Many","song","lyrics","go","na","na","na","na")
val lies = strings.map(s => new Liar(s))
val truly_distinct = lies.take(5)
lies.length          // 8
lies.distinct.length // 8!
lies.toSet.size      // 8!
lies forall( truly_distinct contains _ )   // True, because it's true
lies.toSet subsetOf truly_distinct.toSet   // False--not even the same size!

Okay, so now we know that for most of these operations, matching up hashCode and equals is a Good Thing.
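The size threshold mentioned earlier can be seen with the same kind of class. This is a minimal sketch, not part of the original answer: SmallSetDemo and distinctCount are illustrative names, and the behavior shown assumes the standard immutable Set implementation, which compares elements with == up to four entries and switches to a hash-based set beyond that.

```scala
object SmallSetDemo {
  // Same idea as Liar: equals is overridden, hashCode is not.
  class Liar(val s: String) {
    override def equals(o: Any): Boolean = o match {
      case l: Liar => s == l.s
      case _       => false
    }
  }

  // Wraps each word in a Liar and counts how many elements the resulting Set keeps.
  def distinctCount(words: String*): Int = words.map(w => new Liar(w)).toSet.size

  // Stays below the threshold: duplicates are detected through equals.
  val small = distinctCount("na", "na", "na")                     // 1

  // Grows past four elements: the later duplicates are looked up by
  // (identity) hash code and are kept, barring a freak hash collision.
  val large = distinctCount("a", "b", "c", "d", "na", "na", "na") // 7
}
```

So a test with three or four elements can pass even though the class is broken for larger collections.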

Warning: with Java's boxed primitives, mismatches happen frequently:

new java.lang.Float(1.0) == new java.lang.Integer(1)                       // True
(new java.lang.Float(1.0)).hashCode == (new java.lang.Integer(1)).hashCode // False: uh-oh

but Scala now at least catches that (hopefully every time):

(new java.lang.Float(1.0)).## == (new java.lang.Integer(1)).##   // Whew
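The three comparisons can be checked side by side. This is a small sketch with illustrative names (BoxedHashDemo and its fields are not from the original answer); it uses valueOf instead of the boxed-type constructors, which newer JDKs mark as deprecated, and types both values as Any so that Scala's own == and ## are the ones being applied.

```scala
object BoxedHashDemo {
  val i: Any = java.lang.Integer.valueOf(1)
  val f: Any = java.lang.Float.valueOf(1.0f)

  // Scala's == compares boxed numbers by numeric value.
  val equalByScala = i == f                   // true

  // Java's hashCode knows nothing about that: Integer and Float hash differently.
  val sameHashCode = i.hashCode == f.hashCode // false

  // ## is Scala's hash that is kept consistent with ==.
  val sameHashHash = i.## == f.##             // true
}
```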

Case classes also do this properly, so we're left with three possibilities:

1. You overrode equals but not hashCode to match
2. Your values are not stable
3. There is a bug and Java wrapped primitive hashCode mismatch is coming back to bite you

The first one is easy enough.
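As a sketch of what that fix could look like, here is the earlier experiment repeated with a class that overrides both methods. Honest and HonestDemo are illustrative names, not from the question; deriving hashCode from the same field that equals compares is one straightforward way to keep the two in agreement.

```scala
object HonestDemo {
  class Honest(val s: String) {
    override def equals(o: Any): Boolean = o match {
      case h: Honest => s == h.s
      case _         => false
    }
    // Equal objects now produce equal hash codes.
    override def hashCode: Int = s.hashCode
  }

  val strings = List("Many", "song", "lyrics", "go", "na", "na", "na", "na")
  val honest = strings.map(s => new Honest(s))
  val trulyDistinct = honest.take(5)

  val distinctLength = honest.distinct.length                          // 5
  val setSize        = honest.toSet.size                               // 5
  val viaContains    = honest.forall(h => trulyDistinct.contains(h))   // true
  val viaSubsetOf    = honest.toSet.subsetOf(trulyDistinct.toSet)      // true
}
```

With the two methods in agreement, the linear contains check and the hash-based subsetOf check give the same answer. Declaring the type as a case class would generate both methods automatically.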

The second one seems to be your problem, and it arises from the fact that mapValues actually creates a view of the original collection, not a new collection: the function you pass is applied again on every access instead of once at construction. (filterKeys does this also.) Personally, I think this is a questionable design choice, since normally when you have a view and you want to make a single concrete instance of it, you call .force on it. But default maps don't have a .force because they don't realize that they might be views. So you have to resort to things like

myMap.map{ case (k,v) => (k, /* something that produces a new v */) }
myMap.mapValues(v => /* something that produces a new v */).view.force
Map() ++ myMap.mapValues(v => /* something that produces a new v */)

This is really important if you're doing things like file IO to map your values (e.g. if your values are filenames and you're mapping to their contents) and you don't want to read the file over and over again.

But your case--where you're assigning random values--is another where it is important to pick a single copy, not recreate the values over and over.
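The difference can be made visible by counting how often the mapping function runs. The sketch below is an illustration under stated assumptions, not the asker's code: MapValuesDemo, countCalls and the v * 10 function are made up, and it assumes Scala 2.13 or later, where the lazy behavior is spelled .view.mapValues(f) and .toMap materializes it once. In the older versions this question is about, plain mapValues behaved like the lazy variant.

```scala
object MapValuesDemo {
  private val base = Map(1 -> 1, 2 -> 2, 3 -> 3)

  // Builds a lookup with `build`, then reads key 1 twice.
  // Returns (calls to f right after construction, calls to f after the two reads).
  private def countCalls(build: (Int => Int) => (Int => Int)): (Int, Int) = {
    var calls = 0
    val f: Int => Int = v => { calls += 1; v * 10 }
    val lookup = build(f)
    val afterBuild = calls
    lookup(1)
    lookup(1)
    (afterBuild, calls)
  }

  // Strict: f runs once per entry while the map is built, never again.
  val strictMap  = countCalls(f => base.map { case (k, v) => (k, f(v)) }) // (3, 3)

  // Lazy view: nothing runs up front, f runs again on every read.
  val lazyView   = countCalls(f => base.view.mapValues(f))                // (0, 2)

  // View materialized once: behaves like the strict map afterwards.
  val forcedView = countCalls(f => base.view.mapValues(f).toMap)          // (3, 3)
}
```

If f reads a file or draws a random number, the middle variant repeats that work, and possibly returns a different value, on each access.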
