两张地图之间的差异

发布于 2024-09-12 14:51:51 字数 470 浏览 3 评论 0原文

我需要非常有效地比较 Clojure/Java 中的两个映射，并返回由 Java 的 .equals(..) 确定的差异，其中 nil/null 相当于“不存在”。

即我正在寻找最有效的方法来编写这样的函数：

(map-difference
  {:a 1, :b nil, :c 2, :d 3}
  {:a 1, :b "Hidden", :c 3, :e 5})

=> {:b nil, :c 2, :d 3, :e nil}

我更喜欢使用不可变的 Clojure 映射作为输出，但如果性能改进显着，Java 映射也可以。

就其价值而言，我的基本测试用例/行为期望是，对于任何两个映射 a 和 b，以下内容将相等（直到 null =“不存在”的等价性）：

a 
(merge b (difference a b))

实现此功能的最佳方法是什么？

原文

I need to very efficiently compare two maps in Clojure/Java, and return the difference as determined by Java's .equals(..), with nil/null equivalent to "not present".

i.e. I am looking for the most efficient way to a write a function like:

(map-difference
  {:a 1, :b nil, :c 2, :d 3}
  {:a 1, :b "Hidden", :c 3, :e 5})

=> {:b nil, :c 2, :d 3, :e nil}

I'd prefer an immutable Clojure map as output, but a Java map would also be fine if the performance improvement would be significant.

For what it's worth, my basic test case / expectation of behaviour is that the following will be equal (up to the equivalence of null = "Not present") for any two maps a and b:

a 
(merge b (difference a b))

What would be the best way to implement this?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

逆流 2024-09-19 14:51:52

在 Java 中，Google Commons Collections 提供了相当高性能解决方案。

回复收藏 0 原文

眼前雾蒙蒙 2024-09-19 14:51:52

Clojure 映射会很好，因为读取 clojure 映射非常快。
我无法回答你，但我可以给你一些东西让你看看。 Brenton Ashworth 制作了一个测试工具，他通过地图比较解决了问题。也许你可以看看他的代码来获得实现的提示。 http://formpluslogic.blogspot.com/ 2010/07/better-clojure-test-results-with-deview.html
和
http://github.com/brentonashworth/deview
我认为没有更好的实现可以比较键并查找是否不同。

回复收藏 0 原文

神经大条 2024-09-19 14:51:52

使用内置集合 API：

Set<Map.Entry<K,V>> difference = a.entrySet().removeAll(b.entrySet());

如果需要将其转换回地图，则必须进行迭代。在这种情况下，我建议：

Map<K,V> result = new HashMap<K,V>(Math.max(a.size()), b.size()));
Set<Map.Entry<K,V>> filter = b.entrySet();
for( Map.Entry<K,V> entry : a.entrySet ) {
    if( !filter.contains( entry ) {
        result.put(entry.getKey(), entry.getValue());
    }
}

Use the built-in collections API:

Set<Map.Entry<K,V>> difference = a.entrySet().removeAll(b.entrySet());

If you need to convert that back into a map, you must iterate. In that case, I suggest:

Map<K,V> result = new HashMap<K,V>(Math.max(a.size()), b.size()));
Set<Map.Entry<K,V>> filter = b.entrySet();
for( Map.Entry<K,V> entry : a.entrySet ) {
    if( !filter.contains( entry ) {
        result.put(entry.getKey(), entry.getValue());
    }
}

回复收藏 0 原文

月亮坠入山谷 2024-09-19 14:51:52

我不确定它的性能，

(defn map-difference
  [orig other]
  (let [changed (set/difference (set orig) (set other))
        added (set/difference (set (keys other)) (set (keys orig)))]
    (reduce (fn [acc key]
              (assoc acc key :missing))
      (into {} changed)
      added)))

我使用 :missing 键来避免原始映射中的 nil 值和丢失键之间的歧义 - 您当然可以更改它到nil。

I am not sure about its performance

(defn map-difference
  [orig other]
  (let [changed (set/difference (set orig) (set other))
        added (set/difference (set (keys other)) (set (keys orig)))]
    (reduce (fn [acc key]
              (assoc acc key :missing))
      (into {} changed)
      added)))

I used :missing key to avoid ambiguity between a nil value in the original map, and a missing key -- you can of course change it to nil.

回复收藏 0 原文

夏夜暖风 2024-09-19 14:51:52

您也可以使用 Maps.difference(..) 来自 Google 的 Guava 库的方法

回复收藏 0 原文

蓝礼 2024-09-19 14:51:52

怎么样...

(defn map-diff [m1 m2]
  ;; m1: hashmap
  ;; m2: hashmap
  ;; => the difference between them
  (reduce merge
          (map #(hash-map % (- (or (% m1) 0) (or (% m2) 0)))
               (keys (merge m1 m2)))))

What about...

(defn map-diff [m1 m2]
  ;; m1: hashmap
  ;; m2: hashmap
  ;; => the difference between them
  (reduce merge
          (map #(hash-map % (- (or (% m1) 0) (or (% m2) 0)))
               (keys (merge m1 m2)))))

回复收藏 0 原文

流星番茄 2024-09-19 14:51:51

我不确定绝对最有效的方法是什么，但这里有一些可能有用的事情：

问题文本的基本行为期望是不可能的：如果 a 和 b 是映射，使得 b 包含至少一个 a 中不存在的键，(merge b) 不能等于 a。

如果您最终选择了互操作解决方案，但随后需要在某个时候返回到 PersistentHashMap，那么总会有

(clojure.lang.PersistentHashMap/create
 （doto（java.util.HashMap。）
   (.put:foo 1)
   (.put :栏 2)))
; => {:foo 1 :bar 2}

如果您需要传递以下键集： Clojure 映射到 Java 方法，您可以使用
```
(.keySet {:foo 1 :bar 2})
; => #< [:foo, :bar]>
```
如果保证涉及的所有键都Comparable，则可以是用于在具有多个键的地图上有效计算差异（排序和合并扫描）。对于不受约束的键，这当然是不行的，对于小地图，它实际上可能会损害性能。

如果只是为了设定基准性能预期，最好有一个用 Clojure 编写的版本。这是一个：（已更新）

(defn 地图差异 [m1 m2]
        (循环[m(瞬态{})
               ks（concat（键m1）（键m2））]
          (if-let [k (第一个 ks)]
            (让 [e1 (找到 m1 k)
                  e2（求 m2 k）]
              (cond (and e1 e2 (not= (e1 1) (e2 1))) (recur (assoc!mk (e1 1)) (next ks))
                    (不是 e1) (recur (assoc!mk (e2 1)) (下一个 ks))
                    (不是 e2) (recur (assoc!mk (e1 1)) (下一个 ks))
                    :else (重复 m (下一个 ks))))
            （坚持！m））））

我认为，在大多数情况下，仅执行 (concat (keys m1) (keys m2)) 并可能重复一些工作可能比检查给定键是否在“另一个映射”中更有效“每一步也是如此。

为了总结答案，这里有一个非常简单的基于集合的版本，它有一个很好的属性，它说明了它的作用——如果我误解了规范，在这里应该很明显。 :-)

(defn map-difference [m1 m2]
  (let [ks1 (set (keys m1))
        ks2 (set (keys m2))
        ks1-ks2 (set/difference ks1 ks2)
        ks2-ks1 (set/difference ks2 ks1)
        ks1*ks2 (set/intersection ks1 ks2)]
    (merge (select-keys m1 ks1-ks2)
           (select-keys m2 ks2-ks1)
           (select-keys m1
                        (remove (fn [k] (= (m1 k) (m2 k)))
                                ks1*ks2)))))

I'm not sure what the absolutely most efficient way to do this is, but here's a couple of things which may be useful:

The basic expectation of behaviour from the question text is impossible: if a and b are maps such that b contains at least one key not present in a, (merge b <sth>) cannot be equal to a.

If you end up going with an interop solution but then need to go back to a PersistentHashMap at some point, there's always

(clojure.lang.PersistentHashMap/create
 (doto (java.util.HashMap.)
   (.put :foo 1)
   (.put :bar 2)))
; => {:foo 1 :bar 2}

If you need to pass the keyset of a Clojure map to a Java method, you can use
```
(.keySet {:foo 1 :bar 2})
; => #< [:foo, :bar]>
```
If all keys involved are guaranteed to be Comparable, this could be exploited for efficient computation of difference on maps with many keys (sort & merge scan). For unconstrained keys this is of course a no-go and for small maps it could actually hurt performance.

It's good to have a version written in Clojure, if only to set a baseline performance expectation. Here is one: (updated)

(defn map-difference [m1 m2]
        (loop [m (transient {})
               ks (concat (keys m1) (keys m2))]
          (if-let [k (first ks)]
            (let [e1 (find m1 k)
                  e2 (find m2 k)]
              (cond (and e1 e2 (not= (e1 1) (e2 1))) (recur (assoc! m k (e1 1)) (next ks))
                    (not e1) (recur (assoc! m k (e2 1)) (next ks))
                    (not e2) (recur (assoc! m k (e1 1)) (next ks))
                    :else    (recur m (next ks))))
            (persistent! m))))

I think that just doing (concat (keys m1) (keys m2)) and possibly duplicating some work is likely more efficient most of the time than checking a given key is in "the other map" too at every step.

To wrap up the answer, here's a very simple-minded set-based version with the nice property that it says what it does -- if I misunderstood the spec, it should be readily apparent here. :-)

(defn map-difference [m1 m2]
  (let [ks1 (set (keys m1))
        ks2 (set (keys m2))
        ks1-ks2 (set/difference ks1 ks2)
        ks2-ks1 (set/difference ks2 ks1)
        ks1*ks2 (set/intersection ks1 ks2)]
    (merge (select-keys m1 ks1-ks2)
           (select-keys m2 ks2-ks1)
           (select-keys m1
                        (remove (fn [k] (= (m1 k) (m2 k)))
                                ks1*ks2)))))

回复收藏 0 原文

~没有更多了~