Adaptive Map in Scala (or Java) that preserves insertion order

Posted 2024-09-17 12:09:32

I would like to find and reuse (if possible) a map implementation which has the following attributes:

  1. While the number of entries is small, say < 32, the underlying storage should be a single array like [key0, val0, key1, val1, ...]. This storage scheme avoids many small Entry objects and provides extremely fast lookups (even though they are sequential scans) on modern CPUs, because the scan is cache-friendly and avoids pointer indirection into the heap.

  2. The map should maintain insertion order for key/value pairs regardless of the number of entries, similar to LinkedHashMap.
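To make the two requirements concrete, here is a minimal sketch of the kind of map being asked for. It is not an existing library class: the name `AdaptiveOrderedMap`, the threshold of 32, and the choice to fall back to `java.util.LinkedHashMap` once the map grows are all assumptions made for illustration. It supports only non-null keys and has no `remove`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Objects;
import java.util.function.BiConsumer;

/** Hypothetical sketch: flat [k0, v0, k1, v1, ...] storage while small, LinkedHashMap once large. */
final class AdaptiveOrderedMap<K, V> {
    private static final int THRESHOLD = 32;  // assumed cut-over point ("say < 32")

    private Object[] flat = new Object[8];    // interleaved keys and values, in insertion order
    private int size = 0;                     // entry count while in flat mode
    private LinkedHashMap<K, V> big = null;   // non-null once the map has been promoted

    @SuppressWarnings("unchecked")
    V get(K key) {
        if (big != null) return big.get(key);
        for (int i = 0; i < 2 * size; i += 2) {      // sequential scan over the keys
            if (flat[i].equals(key)) return (V) flat[i + 1];
        }
        return null;
    }

    void put(K key, V value) {
        Objects.requireNonNull(key, "key");          // this sketch does not support null keys
        if (big != null) { big.put(key, value); return; }
        for (int i = 0; i < 2 * size; i += 2) {
            if (flat[i].equals(key)) { flat[i + 1] = value; return; }  // overwrite keeps position
        }
        if (size == THRESHOLD) {                     // too large for scanning: switch representation
            promote();
            big.put(key, value);
            return;
        }
        if (2 * size == flat.length) flat = Arrays.copyOf(flat, flat.length * 2);
        flat[2 * size] = key;
        flat[2 * size + 1] = value;
        size++;
    }

    int size() { return big != null ? big.size() : size; }

    /** Visits entries in insertion order in both representations. */
    @SuppressWarnings("unchecked")
    void forEach(BiConsumer<? super K, ? super V> action) {
        if (big != null) { big.forEach(action); return; }
        for (int i = 0; i < 2 * size; i += 2) action.accept((K) flat[i], (V) flat[i + 1]);
    }

    @SuppressWarnings("unchecked")
    private void promote() {
        big = new LinkedHashMap<>(THRESHOLD * 2);
        for (int i = 0; i < 2 * size; i += 2) big.put((K) flat[i], (V) flat[i + 1]);
        flat = null;                                 // release the flat array
    }
}
```

Because entries are copied into the `LinkedHashMap` in array order, insertion order should carry over when the map is promoted.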

We are working on an in-memory representation of huge graphs (millions of nodes/edges) in Scala. Having such a Map would allow us to store Node/Edge attributes, as well as the Edges per node, much more efficiently for the 99%+ of Nodes and Edges that have few attributes or neighbors, while preserving chronological insertion order for both attributes and edges.

If anyone knows of a Scala or Java map with such characteristics, I would be much obliged.

Thanks

Comments (3)

将军与妓 2024-09-24 12:10:07

In Java you can maintain a 2D array (like a spreadsheet). I wrote a program which basically defines a 2D array with 3 columns of data and 3 columns for looking up the data. The three lookup columns are testID, SubtestID and Mode. This lets me look up a value by testID and Mode or any other combination, or reference it directly by its position. The table is loaded into memory at startup and referenced by the program. It can grow as needed, and new values can be added at any time.

If you are interested, I can post a source code example tonight.

Another idea may be to maintain a database in your program. Databases are designed to organize large amounts of data.

李不 2024-09-24 12:10:01

Have you measured with a profiler whether LinkedHashMap is actually too slow for you? Maybe you don't need that new map; premature optimization is the root of all evil.
Anyway, for processing millions or more pieces of data per second, even the best-optimized map can be too slow, because every method call costs performance in that case. Then all you can do is rewrite your algorithms from Java collections to arrays (i.e. int -> object maps).
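As a rough illustration of what an "int -> object map" on top of a plain array could look like, here is a minimal sketch. The class name `IntIndexedStore` is made up for this example, and it assumes the keys are small, dense, non-negative ints (for instance node ids assigned sequentially), so that the id can be used directly as the array index.

```java
import java.util.Arrays;

/** Hypothetical sketch: an int -> object "map" backed by a plain array indexed by id. */
final class IntIndexedStore<V> {
    private Object[] slots = new Object[16];

    /** Stores value under id; assumes id >= 0 and that ids are reasonably dense. */
    void put(int id, V value) {
        if (id >= slots.length) {
            // Grow to at least id + 1 slots, doubling to amortize the copying cost.
            slots = Arrays.copyOf(slots, Math.max(id + 1, slots.length * 2));
        }
        slots[id] = value;
    }

    /** Returns the value stored under id, or null if nothing was stored there. */
    @SuppressWarnings("unchecked")
    V get(int id) {
        return (id >= 0 && id < slots.length) ? (V) slots[id] : null;
    }
}
```

A lookup is then a bounds check plus one array read, with no hashing and no boxing of the key. Whether that matters in practice is exactly what the profiler should tell you.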

云胡 2024-09-24 12:09:55

While I'm not aware of any implementation that exactly fits your requirements, you may be interested in peeking at Flat3Map (source) in the Jakarta Commons library.

Unfortunately, the Jakarta libraries are rather outdated (e.g., no support for generics in the latest stable release, although it is promising to see that this is changing in trunk) and I usually prefer the Google Collections, but it might be worth your time to see how Apache implemented things.

Flat3Map doesn't preserve the order of the keys, unfortunately, but I do have a suggestion regarding your original post. Instead of storing the keys and values in a single array like [key0, val0, key1, val1, ...], I recommend using parallel arrays; that is, one array with [key0, key1, ...] and another with [val0, val1, ...]. Normally I am not a proponent of parallel arrays, but at least this way you can have one array of type K, your key type, and another of type V, your value type. At the Java level this has its own set of warts, as you cannot use the syntax K[] keys = new K[32]; instead you'll need to use a bit of typecasting.
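The typecasting mentioned above could look roughly like the following sketch. The class name `ParallelArrayMap` and its fixed capacity are assumptions for illustration only; the point is the unchecked cast from `Object[]`, which works because of type erasure as long as the arrays never escape the class.

```java
/** Hypothetical sketch: parallel key/value arrays created via an unchecked cast. */
final class ParallelArrayMap<K, V> {
    private final K[] keys;     // [key0, key1, ...] in insertion order
    private final V[] values;   // [val0, val1, ...] at matching indices
    private int size = 0;

    @SuppressWarnings("unchecked")
    ParallelArrayMap(int capacity) {
        // "new K[capacity]" does not compile, so allocate Object[] and cast.
        keys = (K[]) new Object[capacity];
        values = (V[]) new Object[capacity];
    }

    /** Sequential scan over the key array; returns null if the key is absent. */
    V get(K key) {
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) return values[i];
        }
        return null;
    }

    /** Inserts or overwrites; returns false if the fixed capacity is exhausted. */
    boolean put(K key, V value) {
        for (int i = 0; i < size; i++) {
            if (keys[i].equals(key)) { values[i] = value; return true; }
        }
        if (size == keys.length) return false;
        keys[size] = key;
        values[size] = value;
        size++;
        return true;
    }
}
```

Appending at index `size` keeps both arrays in insertion order, which also covers the ordering requirement while the map is small.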
