无空“映射”：回调解决方案是否比 tryGet() 慢？

发布于 2024-08-30 01:22:53 字数 5454 浏览 2 评论 0原文

在对“如何在 null free 设计中实现 List、Set 和 Map？”，Steven Sudit 我进入了关于使用回调（带有“找到”和“未找到”情况的处理程序）与使用 tryGet() 方法（采用 out 参数并返回布尔值）的讨论指示 out 参数是否已填充。史蒂文坚持认为回调方法更复杂，而且几乎肯定会更慢；我认为复杂性并没有增加，而且性能在最坏的情况下也相同。

但代码胜于雄辩，所以我想我应该实现这两者并看看我得到了什么。最初的问题是关于语言的相当理论性的问题（“为了论证，让我们假设这种语言甚至没有 null”）——我在这里使用了 Java，因为这就是我所使用的。得心应手了。 Java 没有 out 参数，但它也没有一流的函数，因此就风格而言，这两种方法应该同样糟糕。

（题外话：就复杂性而言：我喜欢回调设计，因为它本质上迫使 API 用户处理这两种情况，而 tryGet() 设计要求调用者执行自己的样板条件检查，他们可能会忘记或出错。但是现在实现了这两个，我可以明白为什么 tryGet() 设计看起来更简单，至少在短期内是这样。

），回调示例：

class CallbackMap<K, V> {
    private final Map<K, V> backingMap;

    public CallbackMap(Map<K, V> backingMap) {
        this.backingMap = backingMap;
    }

    void lookup(K key, Callback<K, V> handler) {
        V val = backingMap.get(key);
        if (val == null) {
            handler.handleMissing(key);
        } else {
            handler.handleFound(key, val);
        }
    }
}

interface Callback<K, V> {
    void handleFound(K key, V value);
    void handleMissing(K key);
}

class CallbackExample {
    private final Map<String, String> map;
    private final List<String> found;
    private final List<String> missing;
    private Callback<String, String> handler;

    public CallbackExample(Map<String, String> map) {
        this.map = map;
        found = new ArrayList<String>(map.size());
        missing = new ArrayList<String>(map.size());
        handler = new Callback<String, String>() {
            public void handleFound(String key, String value) {
                found.add(key + ": " + value);
            }

            public void handleMissing(String key) {
                missing.add(key);
            }
        };
    }

    void test() {
        CallbackMap<String, String> cbMap = new CallbackMap<String, String>(map);
        for (int i = 0, count = map.size(); i < count; i++) {
            String key = "key" + i;
            cbMap.lookup(key, handler);
        }
        System.out.println(found.size() + " found");
        System.out.println(missing.size() + " missing");
    }
}

现在，tryGet() 示例 - 尽我所能理解该模式（我很可能是错误的）：

class TryGetMap<K, V> {
    private final Map<K, V> backingMap;

    public TryGetMap(Map<K, V> backingMap) {
        this.backingMap = backingMap;
    }

    boolean tryGet(K key, OutParameter<V> valueParam) {
        V val = backingMap.get(key);
        if (val == null) {
            return false;
        }
        valueParam.value = val;
        return true;
    }
}

class OutParameter<V> {
    V value;
}

class TryGetExample {
    private final Map<String, String> map;
    private final List<String> found;
    private final List<String> missing;

    private final OutParameter<String> out = new OutParameter<String>();

    public TryGetExample(Map<String, String> map) {
        this.map = map;
        found = new ArrayList<String>(map.size());
        missing = new ArrayList<String>(map.size());
    }

    void test() {
        TryGetMap<String, String> tgMap = new TryGetMap<String, String>(map);

        for (int i = 0, count = map.size(); i < count; i++) {
            String key = "key" + i;
            if (tgMap.tryGet(key, out)) {
                found.add(key + ": " + out.value);
            } else {
                missing.add(key);
            }
        }
        System.out.println(found.size() + " found");
        System.out.println(missing.size() + " missing");
    }
}

最后，性能测试代码：

public static void main(String[] args) {
    int size = 200000;
    Map<String, String> map = new HashMap<String, String>();
    for (int i = 0; i < size; i++) {
        String val = (i % 5 == 0) ? null : "value" + i;
        map.put("key" + i, val);
    }

    long totalCallback = 0;
    long totalTryGet = 0;

    int iterations = 20;
    for (int i = 0; i < iterations; i++) {
        {
            TryGetExample tryGet = new TryGetExample(map);
            long tryGetStart = System.currentTimeMillis();
            tryGet.test();
            totalTryGet += (System.currentTimeMillis() - tryGetStart);
        }
        System.gc();

        {
            CallbackExample callback = new CallbackExample(map);
            long callbackStart = System.currentTimeMillis();
            callback.test();
            totalCallback += (System.currentTimeMillis() - callbackStart);
        }
        System.gc();
    }

    System.out.println("Avg. callback: " + (totalCallback / iterations));
    System.out.println("Avg. tryGet(): " + (totalTryGet / iterations));
}

在我的第一次尝试中，我回调的性能比 tryGet() 差 50%，这确实让我感到惊讶。但是，凭直觉，我添加了一些垃圾收集，性能损失就消失了。

这符合我的直觉，即我们基本上讨论的是采用相同数量的方法调用、条件检查等并重新排列它们。但后来，我编写了代码，所以我很可能编写了一个次优或潜意识惩罚的 tryGet() 实现。想法？

更新：根据 Michael Aaron Safyan 的评论，已修复 TryGetExample< /code> 重用 OutParameter。

原文

In comments to "How to implement List, Set, and Map in null free design?", Steven Sudit and I got into a discussion about using a callback, with handlers for "found" and "not found" situations, vs. a tryGet() method, taking an out parameter and returning a boolean indicating whether the out parameter had been populated. Steven maintained that the callback approach was more complex and almost certain to be slower; I maintained that the complexity was no greater and the performance at worst the same.

But code speaks louder than words, so I thought I'd implement both and see what I got. The original question was fairly theoretical with regard to language ("And for argument sake, let's say this language don't even have null") -- I've used Java here because that's what I've got handy. Java doesn't have out parameters, but it doesn't have first-class functions either, so style-wise, it should suck equally for both approaches.

(Digression: As far as complexity goes: I like the callback design because it inherently forces the user of the API to handle both cases, whereas the tryGet() design requires callers to perform their own boilerplate conditional check, which they could forget or get wrong. But having now implemented both, I can see why the tryGet() design looks simpler, at least in the short term.)

First, the callback example:

class CallbackMap<K, V> {
    private final Map<K, V> backingMap;

    public CallbackMap(Map<K, V> backingMap) {
        this.backingMap = backingMap;
    }

    void lookup(K key, Callback<K, V> handler) {
        V val = backingMap.get(key);
        if (val == null) {
            handler.handleMissing(key);
        } else {
            handler.handleFound(key, val);
        }
    }
}

interface Callback<K, V> {
    void handleFound(K key, V value);
    void handleMissing(K key);
}

class CallbackExample {
    private final Map<String, String> map;
    private final List<String> found;
    private final List<String> missing;
    private Callback<String, String> handler;

    public CallbackExample(Map<String, String> map) {
        this.map = map;
        found = new ArrayList<String>(map.size());
        missing = new ArrayList<String>(map.size());
        handler = new Callback<String, String>() {
            public void handleFound(String key, String value) {
                found.add(key + ": " + value);
            }

            public void handleMissing(String key) {
                missing.add(key);
            }
        };
    }

    void test() {
        CallbackMap<String, String> cbMap = new CallbackMap<String, String>(map);
        for (int i = 0, count = map.size(); i < count; i++) {
            String key = "key" + i;
            cbMap.lookup(key, handler);
        }
        System.out.println(found.size() + " found");
        System.out.println(missing.size() + " missing");
    }
}

Now, the tryGet() example -- as best I understand the pattern (and I might well be wrong):

class TryGetMap<K, V> {
    private final Map<K, V> backingMap;

    public TryGetMap(Map<K, V> backingMap) {
        this.backingMap = backingMap;
    }

    boolean tryGet(K key, OutParameter<V> valueParam) {
        V val = backingMap.get(key);
        if (val == null) {
            return false;
        }
        valueParam.value = val;
        return true;
    }
}

class OutParameter<V> {
    V value;
}

class TryGetExample {
    private final Map<String, String> map;
    private final List<String> found;
    private final List<String> missing;

    private final OutParameter<String> out = new OutParameter<String>();

    public TryGetExample(Map<String, String> map) {
        this.map = map;
        found = new ArrayList<String>(map.size());
        missing = new ArrayList<String>(map.size());
    }

    void test() {
        TryGetMap<String, String> tgMap = new TryGetMap<String, String>(map);

        for (int i = 0, count = map.size(); i < count; i++) {
            String key = "key" + i;
            if (tgMap.tryGet(key, out)) {
                found.add(key + ": " + out.value);
            } else {
                missing.add(key);
            }
        }
        System.out.println(found.size() + " found");
        System.out.println(missing.size() + " missing");
    }
}

And finally, the performance test code:

public static void main(String[] args) {
    int size = 200000;
    Map<String, String> map = new HashMap<String, String>();
    for (int i = 0; i < size; i++) {
        String val = (i % 5 == 0) ? null : "value" + i;
        map.put("key" + i, val);
    }

    long totalCallback = 0;
    long totalTryGet = 0;

    int iterations = 20;
    for (int i = 0; i < iterations; i++) {
        {
            TryGetExample tryGet = new TryGetExample(map);
            long tryGetStart = System.currentTimeMillis();
            tryGet.test();
            totalTryGet += (System.currentTimeMillis() - tryGetStart);
        }
        System.gc();

        {
            CallbackExample callback = new CallbackExample(map);
            long callbackStart = System.currentTimeMillis();
            callback.test();
            totalCallback += (System.currentTimeMillis() - callbackStart);
        }
        System.gc();
    }

    System.out.println("Avg. callback: " + (totalCallback / iterations));
    System.out.println("Avg. tryGet(): " + (totalTryGet / iterations));
}

On my first attempt, I got 50% worse performance for callback than for tryGet(), which really surprised me. But, on a hunch, I added some garbage collection, and the performance penalty vanished.

This fits with my instinct, which is that we're basically talking about taking the same number of method calls, conditional checks, etc. and rearranging them. But then, I wrote the code, so I might well have written a suboptimal or subconsicously penalized tryGet() implementation. Thoughts?

Updated: Per comment from Michael Aaron Safyan, fixed TryGetExample to reuse OutParameter.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

贱人配狗天长地久 2024-09-06 01:22:53

我想说，无论性能如何，这两种设计在实践中都没有意义。我认为这两种机制都过于复杂，更重要的是，没有考虑实际使用情况。

实际使用情况
如果用户在映射中查找某个值但该值不存在，则用户很可能需要执行以下操作之一：

将带有该键的某个值插入到映射中
获取某个默认值
被告知该值不存在因此

，我认为更好的、无 null 的 API 是：

has(key)，它指示密钥是否存在（如果只想检查密钥是否存在）。
get(key) 如果键存在则报告值；否则，抛出 NoSuchElementException。
get(key,defaultval) 报告键的值，如果键不存在则报告 defaultval。
setdefault(key,defaultval) 如果 key 不存在，则插入 (key,defaultval)，并返回与 key 关联的值（如果没有先前的映射，则为 defaultval，否则为先前的映射）。

返回 null 的唯一方法是像 get(key,null) 中那样明确请求它。这个 API 非常简单，但能够处理最常见的地图相关任务（在我遇到的大多数用例中）。

我还应该补充一点，在 Java 中，has() 将被称为 containsKey()，而 setdefault() 将被称为 putIfAbsent()。因为 get() 通过 NoSuchElementException 发出对象不存在的信号，所以可以将键与 null 关联并将其视为合法关联......如果 get() 返回 null，则意味着键已与值关联null，并不是说键不存在（尽管您可以将 API 定义为不允许使用 null 值（如果您愿意的话），在这种情况下，如果给定值为 null，您将从用于添加关联的函数中抛出 IllegalArgumentException）。该 API 的另一个优点是 setdefault() 只需执行一次查找过程，而不是两次，如果您使用 if( ! dict.has(key) ){ dict.set(key,val) 就会出现这种情况; }。另一个优点是，编写类似 dict.get(key).doSomething() 之类的开发人员不会感到惊讶，他们假设 get() 将始终返回非空对象（因为他们从未将空值插入到字典中） ... 相反，如果该键没有值，它们会得到 NoSuchElementException，这与 Java 中的其余错误检查更加一致，并且比 NullPointerException 更容易理解和调试。

回答问题
要回答原来的问题，是的，您不公平地惩罚了 tryGet 版本......在基于回调的机制中，您仅构造回调对象一次并在所有后续调用中使用它；而在您的 tryGet 示例中，您在每次迭代中构造输出参数对象。尝试使用该行：

OutParameter out = new OutParameter();

将上面的行从 for 循环中取出，看看是否可以提高 tryGet 示例的性能。换句话说，将该行放在 for 循环上方，并在每次迭代中重复使用 out 参数。

I would say that neither design makes sense in practice, regardless of the performance. I would argue that both mechanisms are overly complicated and, more importantly, don't take into account actual usage.

Actual Usage
If a user looks up a value in a map and it isn't there, most likely the user wants one of the following:

To insert some value with that key into the map
To get back some default value
To be informed that the value isn't there

Thus I would argue that a better, null-free API would be:

has(key) which indicates if the key is present (if one only wishes to check for the key's existence).
get(key) which reports the value if the key is present; otherwise, throws NoSuchElementException.
get(key,defaultval) which reports the value for the key, or defaultval if the key isn't present.
setdefault(key,defaultval) which inserts (key,defaultval) if key isn't present, and returns the value associated with key (which is defaultval if there is no previous mapping, otherwise prev mapping).

The only way to get back null is if you explicity ask for it as in get(key,null). This API is incredibly simple, and yet is able to handle the most common map-related tasks (in most use cases that I have encountered).

I should also add that in Java, has() would be called containsKey() while setdefault() would be called putIfAbsent(). Because get() signals an object's absence via a NoSuchElementException, it is then possible to associate a key with null and treat it as a legitimate association.... if get() returns null, it means the key has been associated with the value null, not that the key is absent (although you can define your API to disallow a value of null if you so choose, in which case you would throw an IllegalArgumentException from the functions that are used to add associations if the value given is null). Another advantage to this API, is that setdefault() only needs to perform the lookup procedure once instead of twice, which would be the case if you used if( ! dict.has(key) ){ dict.set(key,val); }. Another advantage is that you do not surprise developers who write something like dict.get(key).doSomething() who assume that get() will always return a non-null object (because they have never inserted a null value into the dictionary)... instead, they get a NoSuchElementException if there is no value for that key, which is more consistent with the rest of the error checking in Java and which is also a much easier to understand and debug than NullPointerException.

Answer To Question
To answer original question, yes, you are unfairly penalizing the tryGet version.... in your callback based mechanism you construct the callback object only once and use it in all subsequent calls; whereas in your tryGet example, you construct your out parameter object in every single iteration. Try taking the line:

OutParameter out = new OutParameter();

Take the line above out of the for-loop and see if that improves the performance of the tryGet example. In other words, place the line above the for-loop, and re-use the out parameter in each iteration.

回复收藏 0 原文

纸伞微斜 2024-09-06 01:22:53

大卫，感谢您花时间写这篇文章。我是一名 C# 程序员，所以现在我的 Java 技能有点模糊。因此，我决定移植您的代码并亲自测试。我发现了一些有趣的差异和相似之处，就我而言，这些都非常值得入场费。主要区别包括：

我不必实现 TryGet，因为它内置于 Dictionary 中。
为了使用本机 TryGet，我没有插入空值来模拟未命中，而是省略了这些值。这仍然意味着 v = map[k] 会将 v 设置为 null，所以我认为这是一个正确的移植。事后看来，我可以插入空值并将 (_map.TryGetValue(key, out value)) 更改为 (_map.TryGetValue(key, out value) && value != null))，但我很高兴我没有。
我想要非常公平。因此，为了使代码尽可能紧凑和可维护，我使用了 lambda 演算表示法，这让我可以轻松地定义回调。这隐藏了设置匿名委托的大部分复杂性，并允许我无缝地使用闭包。讽刺的是，Lookup 的实现在内部使用了 TryGet。
我没有声明新类型的 Dictionary，而是使用扩展方法将 Lookup 移植到标准字典上，大大简化了代码。

对于代码质量不够专业的情况，我深表歉意，如下：

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication1
{
    static class CallbackDictionary
    {
        public static void Lookup<K, V>(this Dictionary<K, V> map, K key, Action<K, V> found, Action<K> missed)
        {
            V v;
            if (map.TryGetValue(key, out v))
                found(key, v);
            else
                missed(key);
        }
    }

    class TryGetExample
    {
        private Dictionary<string, string> _map;
        private List<string> _found;
        private List<string> _missing;

        public TryGetExample(Dictionary<string, string> map)
        {
            _map = map;
            _found = new List<string>(_map.Count);
            _missing = new List<string>(_map.Count);
        }

        public void TestTryGet()
        {
            for (int i = 0; i < _map.Count; i++)
            {
                string key = "key" + i;
                string value;
                if (_map.TryGetValue(key, out value))
                    _found.Add(key + ": " + value);
                else
                    _missing.Add(key);
            }

            Console.WriteLine(_found.Count() + " found");
            Console.WriteLine(_missing.Count() + " missing");
        }

        public void TestCallback()
        {
            for (int i = 0; i < _map.Count; i++)
                _map.Lookup("key" + i, (k, v) => _found.Add(k + ": " + v), k => _missing.Add(k));

            Console.WriteLine(_found.Count() + " found");
            Console.WriteLine(_missing.Count() + " missing");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            int size = 2000000;
            var map = new Dictionary<string, string>(size);
            for (int i = 0; i < size; i++)
                if (i % 5 != 0)
                    map.Add("key" + i, "value" + i);

            long totalCallback = 0;
            long totalTryGet = 0;

            int iterations = 20;
            TryGetExample tryGet;
            for (int i = 0; i < iterations; i++)
            {
                tryGet = new TryGetExample(map);
                long tryGetStart = DateTime.UtcNow.Ticks;
                tryGet.TestTryGet();
                totalTryGet += (DateTime.UtcNow.Ticks - tryGetStart);
                GC.Collect();

                tryGet = new TryGetExample(map);
                long callbackStart = DateTime.UtcNow.Ticks;
                tryGet.TestCallback();
                totalCallback += (DateTime.UtcNow.Ticks - callbackStart);
                GC.Collect();
            }

            Console.WriteLine("Avg. callback: " + (totalCallback / iterations));
            Console.WriteLine("Avg. tryGet(): " + (totalTryGet / iterations));
        }
    }
}

正如我在启发这篇文章的文章中所说，我的性能期望是两者都不比另一个快或慢很多。毕竟，大部分工作都是在搜索和添加中，而不是在构建它的简单逻辑中。事实上，它在不同的运行中略有不同，但我无法发现任何一致的优势。

部分问题是我使用了低精度计时器并且测试很短，因此我将计数增加了 10 倍，达到 2000000，这有帮助。现在回调速度大约慢了 3%，我认为这并不重要。在我相当慢的机器上，回调花费了 17773437，而 tryget 花费了 17234375。

现在，至于代码复杂性，这有点不公平，因为 TryGet 是原生的，所以让我们忽略我必须添加回调接口的事实。在调用点，lambda 表示法很好地隐藏了复杂性。如果有的话，它实际上比 TryGet 版本中使用的 if/then/else 短，尽管我想我可以使用三元运算符来使其同样紧凑。

总的来说，我发现 C# 更优雅，其中部分原因是我作为 C# 程序员的偏见。主要是，我不必定义和实现接口，这减少了管道开销。我还使用了相当标准的 .NET 约定，这似乎比 Java 中喜欢的那种风格更加精简。

David, thanks for taking the time to write this up. I'm a C# programmer, so my Java skills are a bit vague these days. Because of this, I decided to port your code over and test it myself. I found some interesting differences and similarities, which are pretty much worth the price of admission as far as I'm concerned. Among the major differences are:

I didn't have to implement TryGet because it's built into Dictionary.
In order to use the native TryGet, instead of inserting nulls to simulate misses, I simply omitted those values. This still means that v = map[k] would have set v to null, so I think it's a proper porting. In hindsight, I could have inserted the nulls and changed (_map.TryGetValue(key, out value)) to (_map.TryGetValue(key, out value) && value != null)), but I'm glad I didn't.
I want to be exceedingly fair. So, to keep the code as compact and maintainable as possible, I used lambda calculus notation, which let me define the callbacks painlessly. This hides much of the complexity of setting up anonymous delegates, and allows me to use closures seamlessly. Ironically, the implementation of Lookup uses TryGet internally.
Instead of declaring a new type of Dictionary, I used an extension method to graft Lookup onto the standard dictionary, much simplifying the code.

With apologies for the less-than-professional quality of the code, here it is:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication1
{
    static class CallbackDictionary
    {
        public static void Lookup<K, V>(this Dictionary<K, V> map, K key, Action<K, V> found, Action<K> missed)
        {
            V v;
            if (map.TryGetValue(key, out v))
                found(key, v);
            else
                missed(key);
        }
    }

    class TryGetExample
    {
        private Dictionary<string, string> _map;
        private List<string> _found;
        private List<string> _missing;

        public TryGetExample(Dictionary<string, string> map)
        {
            _map = map;
            _found = new List<string>(_map.Count);
            _missing = new List<string>(_map.Count);
        }

        public void TestTryGet()
        {
            for (int i = 0; i < _map.Count; i++)
            {
                string key = "key" + i;
                string value;
                if (_map.TryGetValue(key, out value))
                    _found.Add(key + ": " + value);
                else
                    _missing.Add(key);
            }

            Console.WriteLine(_found.Count() + " found");
            Console.WriteLine(_missing.Count() + " missing");
        }

        public void TestCallback()
        {
            for (int i = 0; i < _map.Count; i++)
                _map.Lookup("key" + i, (k, v) => _found.Add(k + ": " + v), k => _missing.Add(k));

            Console.WriteLine(_found.Count() + " found");
            Console.WriteLine(_missing.Count() + " missing");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            int size = 2000000;
            var map = new Dictionary<string, string>(size);
            for (int i = 0; i < size; i++)
                if (i % 5 != 0)
                    map.Add("key" + i, "value" + i);

            long totalCallback = 0;
            long totalTryGet = 0;

            int iterations = 20;
            TryGetExample tryGet;
            for (int i = 0; i < iterations; i++)
            {
                tryGet = new TryGetExample(map);
                long tryGetStart = DateTime.UtcNow.Ticks;
                tryGet.TestTryGet();
                totalTryGet += (DateTime.UtcNow.Ticks - tryGetStart);
                GC.Collect();

                tryGet = new TryGetExample(map);
                long callbackStart = DateTime.UtcNow.Ticks;
                tryGet.TestCallback();
                totalCallback += (DateTime.UtcNow.Ticks - callbackStart);
                GC.Collect();
            }

            Console.WriteLine("Avg. callback: " + (totalCallback / iterations));
            Console.WriteLine("Avg. tryGet(): " + (totalTryGet / iterations));
        }
    }
}

My performance expectations, as I said in the article that inspired this one, would be that neither one is much faster or slower than the other. After all, most of the work is in the searching and adding, not in the simple logic that structures it. In fact, it varied a bit among runs, but I was unable to detect any consistent advantage.

Part of the problem is that I used a low-precision timer and the test was short, so I increased the count by 10x to 2000000 and that helped. Now callbacks are about 3% slower, which I do not consider significant. On my fairly slow machine, callbacks took 17773437 while tryget took 17234375.

Now, as for code complexity, it's a bit unfair because TryGet is native, so let's just ignore the fact that I had to add a callback interface. At the calling spot, lambda notation did a great job of hiding the complexity. If anything, it's actually shorter than the if/then/else used in the TryGet version, although I suppose I could have used a ternary operator to make it equally compact.

On the whole, I found the C# to be more elegant, and only some of that is due to my bias as a C# programmer. Mainly, I didn't have to define and implement interfaces, which cut down on the plumbing overhead. I also used pretty standard .NET conventions, which seem to be a bit more streamlined than the sort of style favored in Java.

回复收藏 0 原文

~没有更多了~