两张地图之间的差异
我需要非常有效地比较 Clojure/Java 中的两个映射,并返回由 Java 的 .equals(..) 确定的差异,其中 nil/null 相当于“不存在”。
即我正在寻找最有效的方法来编写这样的函数:
(map-difference
{:a 1, :b nil, :c 2, :d 3}
{:a 1, :b "Hidden", :c 3, :e 5})
=> {:b nil, :c 2, :d 3, :e nil}
我更喜欢使用不可变的 Clojure 映射作为输出,但如果性能改进显着,Java 映射也可以。
就其价值而言,我的基本测试用例/行为期望是,对于任何两个映射 a 和 b,以下内容将相等(直到 null =“不存在”的等价性):
a
(merge b (difference a b))
实现此功能的最佳方法是什么?
I need to very efficiently compare two maps in Clojure/Java, and return the difference as determined by Java's .equals(..), with nil/null equivalent to "not present".
i.e. I am looking for the most efficient way to a write a function like:
(map-difference
{:a 1, :b nil, :c 2, :d 3}
{:a 1, :b "Hidden", :c 3, :e 5})
=> {:b nil, :c 2, :d 3, :e nil}
I'd prefer an immutable Clojure map as output, but a Java map would also be fine if the performance improvement would be significant.
For what it's worth, my basic test case / expectation of behaviour is that the following will be equal (up to the equivalence of null = "Not present") for any two maps a and b:
a
(merge b (difference a b))
What would be the best way to implement this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
在 Java 中,Google Commons Collections 提供了相当高性能解决方案。
In Java, Google Commons Collections offer a quite performant solution.
Clojure 映射会很好,因为读取 clojure 映射非常快。
我无法回答你,但我可以给你一些东西让你看看。 Brenton Ashworth 制作了一个测试工具,他通过地图比较解决了问题。也许你可以看看他的代码来获得实现的提示。 http://formpluslogic.blogspot.com/ 2010/07/better-clojure-test-results-with-deview.html
和
http://github.com/brentonashworth/deview
我认为没有更好的实现可以比较键并查找是否不同。
Clojure maps will be fine because reading clojure maps is very fast.
I can't answer you but I can give you something to look at. Brenton Ashworth made a testtool where he solved the problem with map compares. Maybe you can look at his code to get hint for implementation. http://formpluslogic.blogspot.com/2010/07/better-clojure-test-results-with-deview.html
and
http://github.com/brentonashworth/deview
I don't think there is a better implementation that compare the keys and look up if the are different.
使用内置集合 API:
如果需要将其转换回地图,则必须进行迭代。在这种情况下,我建议:
Use the built-in collections API:
If you need to convert that back into a map, you must iterate. In that case, I suggest:
我不确定它的性能,
我使用
:missing
键来避免原始映射中的nil
值和丢失键之间的歧义 - 您当然可以更改它到nil
。I am not sure about its performance
I used
:missing
key to avoid ambiguity between anil
value in the original map, and a missing key -- you can of course change it tonil
.您也可以使用 Maps.difference(..) 来自 Google 的 Guava 库的方法
You could also just use Maps.difference(..) method from Google's Guava libraries
怎么样...
What about...
我不确定绝对最有效的方法是什么,但这里有一些可能有用的事情:
问题文本的基本行为期望是不可能的:如果
a
和b
是映射,使得b
包含至少一个a
中不存在的键,(merge b)
不能等于a
。如果您最终选择了互操作解决方案,但随后需要在某个时候返回到
PersistentHashMap
,那么总会有如果您需要传递以下键集: Clojure 映射到 Java 方法,您可以使用
如果保证涉及的所有键都
Comparable
,则可以是用于在具有多个键的地图上有效计算差异
(排序和合并扫描)。对于不受约束的键,这当然是不行的,对于小地图,它实际上可能会损害性能。如果只是为了设定基准性能预期,最好有一个用 Clojure 编写的版本。这是一个:(已更新)
我认为,在大多数情况下,仅执行
(concat (keys m1) (keys m2)) 并可能重复一些工作可能比检查给定键是否在“另一个映射”中更有效“每一步也是如此。
为了总结答案,这里有一个非常简单的基于集合的版本,它有一个很好的属性,它说明了它的作用——如果我误解了规范,在这里应该很明显。 :-)
I'm not sure what the absolutely most efficient way to do this is, but here's a couple of things which may be useful:
The basic expectation of behaviour from the question text is impossible: if
a
andb
are maps such thatb
contains at least one key not present ina
,(merge b <sth>)
cannot be equal toa
.If you end up going with an interop solution but then need to go back to a
PersistentHashMap
at some point, there's alwaysIf you need to pass the keyset of a Clojure map to a Java method, you can use
If all keys involved are guaranteed to be
Comparable
, this could be exploited for efficient computation ofdifference
on maps with many keys (sort & merge scan). For unconstrained keys this is of course a no-go and for small maps it could actually hurt performance.It's good to have a version written in Clojure, if only to set a baseline performance expectation. Here is one: (updated)
I think that just doing
(concat (keys m1) (keys m2))
and possibly duplicating some work is likely more efficient most of the time than checking a given key is in "the other map" too at every step.To wrap up the answer, here's a very simple-minded set-based version with the nice property that it says what it does -- if I misunderstood the spec, it should be readily apparent here. :-)