Java省时稀疏一维数组(双精度)
我需要一个高效的 Java 结构来操作非常稀疏的双精度向量:基本的读/写操作。我用HashMap实现了但是访问太慢了。我应该使用其他数据结构吗?你有推荐免费的图书馆吗?
寻找一些和平的建议:)
非常感谢,
玛丽
I need an efficient Java structure to manipulate very sparse vectors of doubles: basic read / write operations. I implemented it in a HashMap but the access is too slow. Should I use another data structure? Do you recommend any free library?
Looking for some peaceful advice :)
Thanks a lot,
Marie
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
HashMap
是正确的选择。应该不慢。通过分析器运行代码以查看所有时间都花在哪里,然后进行相应的优化。如果您需要优化代码的提示,请在此处发布示例,以便我们帮助解决特定问题。[编辑] 根据索引的大小,您可以使用 Integer.valueOf(int) 中的技术来缓存装箱对象。但这仅在您创建大量地图并且索引在某种程度上有限的范围内时才有效。
或者您可以尝试来自 commons-lang 的
IntHashMap
。使用起来有点困难(它是包私有的),但你可以复制代码。最后,您可以使用自己的基于 int 的 HashMap 实现,并针对您的情况优化值查找。
HashMap
is the way to go. It shouldn't be slow. Run your code through a profiler to see where all the time goes and then optimize accordingly. If you need tips to optimize the code, post an example here so we can help with a specific issue.[EDIT] Depending on the size of the indexes, you can use a technique as in
Integer.valueOf(int)
to cache the objects for boxing. But this will only work when you create lots of maps and the indexes are in a somewhat limited range.Or you can try
IntHashMap
from commons-lang. It's a bit hard to use (it's package private) but you can copy the code.Lastly, you could use your own implementation of an int-based HashMap with optimized value lookup for your case.
您的数据集有多大?比 Integer.MAX_VALUE 大很多吗?问题是 HashSet 由数组支持。碰撞会降低性能。也许不是hashmap的机制太慢,而是你有多次碰撞。也许如果您首先使用另一个哈希函数对数据进行分区(例如),然后将每个数据分区存储在它自己的哈希图中,您会更幸运。
How big is your dataset? Much larger than Integer.MAX_VALUE? the problem is that HashSet is backed by an array. Collisions will slow performance. Perhaps it's not the mechanism of hashmap that is too slow, but the fact that you have multiple collisions. Perhaps if you partitioned your data first (e.g) using another hash function, then stored each partition of data in it's own hashmap you'd have more luck.
您可以从我的 Hapax 项目中复制粘贴稀疏向量: ch.akuhn.matrix.SparseVector
PS:所有其他不理解为什么使用地图太慢的答案和评论。它很慢,因为映射将所有索引装箱为整数对象!
这里提供的稀疏向量对于读取访问和附加值来说很快,但对于放置随机索引来说却不是。它最适合您首先创建 sprase 向量但将值按索引递增的顺序排列,然后主要使用地图进行读取的场景。
稀疏向量类中的重要方法是
You can copy paste the sparse vector from my Hapax project: ch.akuhn.matrix.SparseVector
PS: to all those other answers and comments that dont grok why using a map is too slow. It is slow because a map boxes all indices to Integer objects!
The sparse vector presented here is fast for read access and appending values, but not for putting at random indices. Its is optimal for a scenario where you first create the sprase vector but putting values in order of increasing indices, and later use the map for reading mostly.
Important methods in the sparse vector class are
对于一维稀疏数组,映射通常是最佳选择。仅当库是多维的时才需要使用它。
如果比较映射和数组之间的访问时间,
映射会慢得多。任何图书馆都会有同样的问题。
这就是稀疏数组吗?你用时间换取空间。
For 1D sparse array, map is normally the way to go. You only need to use a library if it's multi-dimension.
If you compare access time between map and array,
map is going to be much slower. Any library would have the same issue.
Is that sparse array all about? You trade time for space.