我有一些由二维坐标(带符号的短范围)标识的项目。每个项目都是包含 64KB 数据的类。任何时候大约有 500-1500 个项目。项目通常以大约 20 个为一组,围绕一个点。我的问题是我应该如何映射它们,这样就不会占用太多内存。项目将缓慢添加/删除(每秒 1-10 个),并且会经常获取,因此从列表中获取元素(指向更大结构的指针)应该尽可能快。
我想到的是,会有一些 gridContainer 类,可以说它会存储 64x64 指针的矩形。我将有主网格容器,它将存储其他 gridContainer,而这个嵌套的 gridContainer 将存储我想要映射的实际项目(这将允许 4096x4096 实际项目)。要访问特定项目,例如 [260, 130],我会将其除以 64 并取商来查找父 gridContainer 位置,并取余数来查找嵌套 gridContainer 位置。因此对于 [270,145] 我将有 [4,2] 和 [14,17]。
我也在考虑使用 std::map
但我不知道它的内部结构,也不知道我应该期望它的性能如何。
对我的方法有什么建议或者有更好的方法吗?
I've got number of items that are identified by its 2-dimensional coordinates (signed short range). Every item is class containing 64KB of data. There are about 500-1500 of items at any moment. Items are usually in groups of ~20 around a point. My question is how should I map them so that it would not take too much of memory. Items will be added/deleted slowly (1-10 per second) and will be fetched very often so fetching element from the list (pointer to bigger structure) should be as fast as possible.
What I came up with is that there would be some gridContainer class with lets say it would store rectangle of 64x64 pointers. I would have main grid container, which would store other gridContainer and this nested gridContainer would store actual items i want to map (this would allow 4096x4096 actual items). To access a particular item, for example [260, 130] I would divide it by 64 and take quotient to find parent gridContainer position and remainder to find nested gridContainer position. So for [270,145] i would have [4,2] and [14,17].
I was also thinking of using std::map
but I don't know the internals of it and I don't know what performance should I expect from it.
Any suggestions on my method or is there any better way to do it?
发布评论
评论(3)
您可以像已经建议的那样使用 std::map,它只是 ab 树容器,或者您可以使用哈希表,它有可能更快。四叉树也是一种选择,它比前两个选项涉及更多,但会提供早期退出选项。
具有最小负载的哈希表很有可能进行 O(1) 查找,但当 n = 存在项时,最坏情况为 O(n)。如果你能将负载保持在 10% 以下,那么你就必须有一个非常糟糕的哈希函数才能变得比 O(1) 更糟糕。这显然会消耗大量额外的内存,但它有可能成为最快的选择。哈希表的比较类型非常复杂。您有一个哈希函数,它接受密钥类型(在本例中为一条短裤)的输入,并返回一个索引。该索引对应于半固定对象数组上的位置。如果那里已经有一个对象,则必须解决碰撞,这可能会导致不利的运行时间,因此越稀疏越好。
四叉树的最佳情况返回时间为 O(1),但仅限于空单元格,而如果存在项目,则其查找时间为 O(log n),其中 n 是所需字段的大小。除非你有很少的对象,否则这有可能比哈希表消耗更多的内存,但如上所述,你可以获得相当好的运行时间,并且不会产生任何结果的查找可以以相对速度终止。四叉树的比较类型通常只是级联布尔检查。
std::map 的查找时间恒定为 O(log n),其中 n 是映射中的元素。这是内存效率最高的选项,但可以保证任何查找的运行时间。 std::map 的比较类型是级联小于操作,在 int ((short1<<16)|short2) 的情况下,也相当快。
这里对您可能会使用的容器进行了很好的概述。您应该使用哪一种取决于您期望桌子的负载程度。如果只有几个对象(< 500),std::map 可能是您最好的选择。对于 500 到大约 5000,您可能应该使用四叉树。除此之外,您将开始为树使用大量的内存,并且可能应该使用哈希表来代替。
You could use a std::map as already suggested, which is just a b tree container, or you could use a hash table, which has the potential to be faster. A quad tree is also an option, which would be far more involved than the previous two options, but would provide early out options.
A hash table with a minimal load has a good chance of being an O(1) lookup, but has a worst case of O(n), when n = items present. If you can keep the load below about 10%, then you'd have to have a really horrible hashing function to get worse than O(1). This obviously chews up a lot of extra memory, but it has the potential to be the fastest option. The comparison type for a hash table is quite involved. You have a hasher function which accepts input of the key type, in this case a pair of shorts, and returns an index. That index corresponds to a location on a semi-fixed array of objects. If there's already an object there, the collision has to be resolved, which can result in detrimental running times, so the sparser, the better.
A quad tree has a best case return time of O(1), but only for empty cells, whereas if there's a item present, it has a lookup time of O(log n) where n is the size of the desired field. Unless you have very few objects, this has the potential to eat up more memory than the hash table, but like stated above, you get a decent running time and a lookup that yields no results can terminate with relative speed. The comparison type for a Quad tree is usually just a cascading boolean check.
A std::map has a constant lookup time of O(log n), where n is the elements in the map. This is the most memory efficient option, but has a guaranteed run time for any lookup. The comparison type for a std::map is a cascading less-than operation, which in the case of an int ((short1<<16)|short2), is also quite fast.
There's a good overview of the containers you'll likely use. The one you ought to use depends on how loaded down you expect your table to be. With just a few objects (< 500), a std::map is probably your best bet. For 500 to about 5000, you should probably use a quad tree. Any more than that, you'll start using a ridiculous amount of memory for the tree, and should probably use a hash table instead.
您始终可以使用
std::map, *item>
这将允许在log(n)
时间内检索和添加,但为了获得更准确的答案,我们必须知道您需要哪些操作才能快速You can always use a
std::map<std::pair<short, short>, *item>
this would allow retrieving and adding inlog(n)
time, but for a more precise answer we have to know what operations you require to be fast四叉树似乎就是您正在寻找的东西。
A Quad Tree seems to be what you are looking for.