subsetOf versus forall contains
Suppose I have:
case class X(...)
val xs: Seq[X] = ... // some method result
val ys: Seq[X] = ... // some other method result
While the following holds:
xs.distinct.sameElements(xs) // true
ys.distinct.sameElements(ys) // true
I am facing:
xs forall(ys contains _) // true
xs.toSet subsetOf ys.toSet // false
Why? I mean, it's clear that making a Set out of a Seq keeps an arbitrary element in case of duplicates, but there are no duplicates, as "(...).distinct.sameElements(...)" confirms.
I certainly need a deeper understanding of the kind of equality check involved...
EDIT:
After a long search, I found the problem and condensed it to the following:
My elements are not the same; however, I still need to take a closer look at why distinct.sameElements isn't complaining. Meanwhile, a new question arose.
Consider this:
val rnd = scala.util.Random
def int2Label(i: Int) = "[%4s]".format(Seq.fill(rnd.nextInt(4))(i).mkString)
val s = Seq(1,2,3,4)
// as expected :
val m1: Map[Int,String] = s.map(i => (i,int2Label(i))).toMap
println(m1) // Map(5 -> [ 555], 1 -> [ ], 2 -> [ 22], 3 -> [ ])
println(m1) // Map(5 -> [ 555], 1 -> [ ], 2 -> [ 22], 3 -> [ ])
// but accessing m2 several times yields different results. Why?
val m2: Map[Int,String] = s.map(i => (i,i)).toMap.mapValues { int2Label(_) }
println(m2) // Map(5 -> [ 5], 1 -> [ 11], 2 -> [ 22], 3 -> [ 333])
println(m2) // Map(5 -> [ 55], 1 -> [ 11], 2 -> [ ], 3 -> [ ])
So the elements in my first two sequences aren't the same, because they depend on an m2-like construct, and so they are different each time I access them.
My new question is: why does m2 behave like a function, in contrast to m1, although both are immutable maps? That isn't intuitive to me.
The most common reasons for problems in this area (testing set equality and the like) are that hashCode does not agree with equals, or that an earlier hashCode does not agree with the current equals.
The reason this matters is that distinct and toSet use hash codes to build sets, whereas contains simply runs over the collection with an exists.
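To make the contrast concrete, here is a minimal sketch of what the forall/contains check amounts to on a Seq. The object name and the sample values are made up for illustration; the point is only that the check is a nested linear scan using ==, so hashCode never enters the picture.

```scala
object ContainsViaExists {
  val xs = Seq(1, 2, 3)
  val ys = Seq(3, 2, 1, 0)

  // Seq.contains is a linear scan that compares elements with ==.
  val viaContains: Boolean = xs forall (ys contains _)

  // The same check written out explicitly; no hash codes are involved.
  val viaExists: Boolean = xs forall (x => ys exists (y => x == y))

  def main(args: Array[String]): Unit =
    println(viaContains == viaExists)
}
```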
This is made more complicated by the fact that many sets don't start using hash codes until they're larger than some minimal size (usually 4), so you don't always notice this in testing. But let's prove it to ourselves.
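One way to see this is with a class that overrides equals but deliberately leaves hashCode alone. The sketch below is an illustration under assumptions: the class name Liar and the word list are hypothetical, and the hash-based results depend on identity hash codes, which differ between instances in practice but are not guaranteed to.

```scala
object LiarDemo {
  // equals compares the wrapped string, but hashCode is NOT overridden,
  // so equal instances keep their (practically always different) identity hashes.
  class Liar(val s: String) {
    override def equals(o: Any): Boolean = o match {
      case x: Liar => s == x.s
      case _       => false
    }
  }

  val strings = List("Many", "song", "lyrics", "go", "na", "na", "na", "na")
  val lies = strings.map(s => new Liar(s))
  val trulyDistinct = lies.take(5)

  // Linear scan with ==: every element of lies equals one of the first five.
  val allContained: Boolean = lies forall (trulyDistinct contains _)

  def main(args: Array[String]): Unit = {
    println(lies.length)                              // 8
    println(allContained)                             // true
    // The next three are hash-based, so the duplicate "na" entries
    // are not recognised as equal unless their identity hashes collide.
    println(lies.distinct.length)
    println(lies.toSet.size)
    println(lies.toSet subsetOf trulyDistinct.toSet)
  }
}
```

The first two results follow from equals alone. The last three go through hash codes, which is where the behaviour from the question shows up.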
Okay, so now we know that for most of these operations, matching up hashCode and equals is a Good Thing.
Warning: in Java, mismatches happen frequently even with primitives, but Scala now at least catches that (hopefully every time).
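As a rough illustration of the boxed-primitive case, the following sketch compares an Int and a Float that hold the same numeric value. The object and value names are invented for the example. It contrasts Java's equals and hashCode on the boxed values with Scala's == and ## on Any.

```scala
object BoxedNumbers {
  val boxedInt: Any   = 1
  val boxedFloat: Any = 1.0f

  // Java semantics: java.lang.Integer and java.lang.Float never equal each other,
  // and their hashCode values for 1 and 1.0f differ.
  val javaEqual: Boolean       = boxedInt.equals(boxedFloat)
  val javaHashesAgree: Boolean = boxedInt.hashCode == boxedFloat.hashCode

  // Scala semantics: == on Any compares numeric values, and ## is the
  // hash function that is kept consistent with ==.
  val scalaEqual: Boolean       = boxedInt == boxedFloat
  val scalaHashesAgree: Boolean = boxedInt.## == boxedFloat.##

  // A Scala set therefore treats the two values as one element.
  val setSize: Int = Set[Any](1, 1.0f).size

  def main(args: Array[String]): Unit =
    println((javaEqual, javaHashesAgree, scalaEqual, scalaHashesAgree, setSize))
}
```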
Case classes also do this properly, so we're left with three possibilities, the first being that you overrode equals but not hashCode to match.
The first one is easy enough.
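For that first possibility, the usual fix is to derive hashCode from the same fields that equals compares. A minimal sketch, assuming a hypothetical wrapper class Honest around a single String:

```scala
object ConsistentEquality {
  // equals and hashCode are both derived from the same field.
  class Honest(val s: String) {
    override def equals(o: Any): Boolean = o match {
      case x: Honest => s == x.s
      case _         => false
    }
    override def hashCode: Int = s.hashCode
  }

  val words = List("Many", "song", "lyrics", "go", "na", "na", "na", "na").map(new Honest(_))
  val firstFive = words.take(5)

  val distinctCount: Int = words.distinct.length               // 5
  val setSize: Int       = words.toSet.size                    // 5
  val isSubset: Boolean  = words.toSet subsetOf firstFive.toSet // true

  def main(args: Array[String]): Unit =
    println((distinctCount, setSize, isSubset))
}
```

With the two methods in agreement, the hash-based operations and the linear contains check give the same answer.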
The second one seems to be your problem, and it arises from the fact that mapValues actually creates a view of the original collection, not a new collection. (filterKeys does this also.) Personally, I think this is a questionable design choice, since normally when you have a view and want to make a single concrete instance of it, you .force it. But default maps don't have a .force because they don't realize that they might be views. So you have to resort to workarounds that copy the result into a new map.
This is really important if you're doing things like file IO to map your values (e.g. if your values are filenames and you're mapping to their contents) and you don't want to read the file over and over again.
But your case--where you're assigning random values--is another where it is important to pick a single copy, not recreate the values over and over.
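The difference between a strict map and a mapValues view can be made visible by counting how often the mapping function runs. This is a sketch under assumptions: the names label, calls, and the sample map are hypothetical, and a counter replaces the random labels so the result is deterministic. The behaviour described is that of Scala 2.12 and earlier, where mapValues returns a Map backed by a view. On Scala 2.13 and later the same call is deprecated and returns a MapView, but it re-evaluates the function in the same way.

```scala
object MapValuesViewDemo {
  var calls = 0

  // Hypothetical labelling function that records every invocation.
  def label(i: Int): String = { calls += 1; s"[$i]" }

  val base: Map[Int, Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)

  // Strict: map builds a new Map, so label runs once per entry, up front.
  val strict: Map[Int, String] = base.map { case (k, v) => (k, label(v)) }
  val strictRead1 = strict(1)
  val strictRead2 = strict(1)
  val callsAfterStrict = calls     // 3: the two reads cost nothing

  // Lazy: mapValues only wraps base; label runs again on every lookup.
  val lazyView = base.mapValues(label)
  val callsAfterViewBuilt = calls  // still 3
  val viewRead1 = lazyView(1)
  val viewRead2 = lazyView(1)
  val callsAfterViewReads = calls  // 5: one call per read

  // Forcing: copying the view into a fresh Map evaluates each entry once.
  val forced: Map[Int, String] = Map.empty[Int, String] ++ lazyView
  val forcedRead1 = forced(1)
  val forcedRead2 = forced(1)
  val callsAfterForced = calls     // 8: three more for the copy, none for the reads

  def main(args: Array[String]): Unit =
    println((callsAfterStrict, callsAfterViewBuilt, callsAfterViewReads, callsAfterForced))
}
```

With a random function in place of label, each extra call would produce a different value, which is why m2 in the question appears to change between reads while m1 does not.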