Java小数据集的数据查找方法？

发布于 2024-08-06 09:12:20 字数 776 浏览 5 评论 0原文

我们必须根据三个输入数据字段查找一些数据。查找必须很快。仅有大约 20 种可能的查找组合。我们使用静态 HashMap 实例实现了这一点，在该实例中，我们通过连接三个数据字段来创建一个键。有没有更好的方法来做到这一点，或者这是要走的路？代码如下。

更新：我并不是说这段代码很慢。只是好奇是否有更好的方法来做到这一点。我认为可能有一个更优雅的解决方案，但如果没有令人信服的替代方案，我很乐意保留它！

创建类级静态 HashMap 实例：

private static HashMap map = new HashMap();

我们如何将数据加载到内存中：

private void load(Iterator iterator) {        
    while (iterator.next()) {  
      Object o = it.next();
      key = o.getField1() + "-" + o.getField2() + "-" o.getField3();
      map.put(key, o.getData());
    }
}

以及我们如何根据三个字段查找数据：

private Stirng getData(String f1, String f2, String f3) {
   String key = f1 + "-" + f2 + "-" f3;
   return map.get(key);
}

原文

We have to lookup some data based on three input data fields. The lookup has to be fast. There are only about 20 possible lookup combinations. We've implemented this using a static HashMap instance where we create a key by concatinating the three data fields. Is there a better way to do this or is this the way to go? Code is below.

Update: I'm not implying that this code is slow. Just curious if there is a better way to do this. I thought there might be a more elegant solution but I'm happy to keep this in place if there are no compelling alternatives!

Create class level static HashMap instance:

private static HashMap map = new HashMap();

How we load data into memory:

private void load(Iterator iterator) {        
    while (iterator.next()) {  
      Object o = it.next();
      key = o.getField1() + "-" + o.getField2() + "-" o.getField3();
      map.put(key, o.getData());
    }
}

And how we look up the data based on the three fields:

private Stirng getData(String f1, String f2, String f3) {
   String key = f1 + "-" + f2 + "-" f3;
   return map.get(key);
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

冰魂雪魄 2024-08-13 09:12:20

好吧，要问自己的问题当然是“它足够快吗？”因为除非您的应用程序需要更快，而这正是瓶颈，否则这并不重要。你所拥有的已经相当有效了。

话虽这么说，如果您想从这个例程中榨取尽可能多的速度（而不用汇编语言重写它；-），您可以考虑使用数组而不是 HashMap，因为只有数量有限的小钥匙。您必须开发某种哈希函数，将每个对象哈希为 0 到 19 之间的唯一数字（或者您实际拥有的元素数量）。您还可以优化该哈希函数的实现，尽管在不知道您正在使用的对象的详细信息的情况下，我无法告诉您具体如何做到这一点。

回复收藏 0 原文

真心难拥有 2024-08-13 09:12:20

您可以创建一个具有三个字符串字段的特殊密钥对象，以避免构建密钥字符串：

class MapKey {
  public final String k1;
  public final String k2;
  public final String k3;

  public MapKey(String k1, String k2, String k3) {
    this.k1 = k1; this.k2 = k2; this.k3 = k3;
  }

  public MapKey(Object o) {
    this.k1 = o.getField1(); this.k2 = o.getField2(); this.k3 = o.getField3();
  }

  public int hashCode() {
    return k1.hashCode();  // if k1 is likely to be the same, also add hashes from k2 and k3
  }
}

You could create a special key object having three String fields to avoid building up the key string:

class MapKey {
  public final String k1;
  public final String k2;
  public final String k3;

  public MapKey(String k1, String k2, String k3) {
    this.k1 = k1; this.k2 = k2; this.k3 = k3;
  }

  public MapKey(Object o) {
    this.k1 = o.getField1(); this.k2 = o.getField2(); this.k3 = o.getField3();
  }

  public int hashCode() {
    return k1.hashCode();  // if k1 is likely to be the same, also add hashes from k2 and k3
  }
}

回复收藏 0 原文

長街聽風 2024-08-13 09:12:20

在你的情况下，我会继续使用你概述的实现。对于映射到常量数据的大量常量键，您可以使用最小完美哈希。由于编写这个代码并不简单，而且我不确定现有的库，因此在使用它之前您必须考虑实现成本。

回复收藏 0 原文

浅唱ヾ落雨殇 2024-08-13 09:12:20

我认为你的方法相当快。通过实现自己的哈希算法获得的任何收益都将非常小，特别是与所需的工作相比。

关于您的密钥格式的一点评论。您最好确保分隔符不能出现在字段 toString() 值中，否则可能会发生按键冲突：

field1="a-", field2="b-", field3="c" -> key="a--b--c"
field1="a", field2="-b", field3="-c" -> key="a--b--c"

I think your approach is pretty fast. Any gains by implementing your own hashing algorithm would be very small, especially compared to the effort required.

One remark about your key format. You better make sure that your separator cannot occur in the field toString() values, otherwise you might get key collisions:

field1="a-", field2="b-", field3="c" -> key="a--b--c"
field1="a", field2="-b", field3="-c" -> key="a--b--c"

回复收藏 0 原文

陌伤ぢ 2024-08-13 09:12:20

连接字符串对于创建键来说是一个坏主意。我的主要目的是不清楚。但实际上，很大一部分实现都存在错误，特别是分隔符实际上可能出现在字符串中。就性能而言，我发现只需将字符串 hack 的密钥更改为有意义的密钥对象，程序的速度就会提高百分之十。（如果您确实需要偷懒，可以使用 Arrays.asList 来创建密钥 - 请参阅 List.equals API 文档。）

回复收藏 0 原文

晚风撩人 2024-08-13 09:12:20

由于您只有 20 种组合，因此基于了解每种组合的特征，手工制作“给我该组合的索引 1..20”可能是可行的。

您能列出确切的组合列表吗？

回复收藏 0 原文

虚拟世界 2024-08-13 09:12:20

完成此操作的另一种方法是创建一个 Object 来处理您的密钥，您可以使用它覆盖 equals() （和 hashCode() ) 对传入密钥进行测试，依次测试 field1、field2 和 field3。

编辑（响应评论）：

由于您的地图使用从 hashCode() 返回的值将您的密钥放入存储桶中，（然后从中测试 equals ），理论上所有键的值都可以相同。但是，我不建议这样做，因为您不会获得 HashMap 性能的好处。您实际上是在迭代存储桶中的所有项目并测试 equals()。

您可以采取的一种方法是将对 hashCode() 的调用委托给密钥容器中的值之一。例如，您始终可以从 field3 返回 hashCode。在这种情况下，您将把密钥分发到可能与 field3 的不同值一样多的存储桶。一旦您的 HashMap 找到存储桶，它仍然需要迭代存储桶中的项目来测试 equals() 的结果，直到找到匹配项。

您可以创建所有字段上 hashCode() 返回的值的总和。正如刚才所讨论的，该值不需要是唯一的。此外，碰撞的可能性以及因此产生的更大的桶要小得多。考虑到这一点，您在 HashMap 上的查找应该会更快。

EDIT2：

该密钥的良好哈希码问题已在一个单独的问题中得到解答这里