Python data structure for a sparse list of x/y co-ordinates
Consider a list of x/y co-ordinates and a byte 'count'. x/y will have a range of perhaps 0 to 5000, which is 25 million cells.
However the data will be quite sparsely populated: there will be at most a few thousand entries, and the majority of co-ordinates will have zero entries.
The structure will occasionally be looked up/added to (e.g. if there is something in x=5 and y=10 then ++), but more frequently converted into a list of x/y/count (sorting is not important).
The fastest data structure for lookup is obviously a 2D array, but you're looking at 24 MB of memory or so, and iterating it to output a list could be expensive. For disk storage you could implement GIF-style compression, where a 0 byte followed by another byte indicates that many empty cells and anything else is a cell value - but this doesn't help the memory situation.
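The zero-run encoding described above could be sketched roughly like this; a minimal hypothetical implementation, assuming counts fit in one byte and a 0 byte marks a run of empty cells:

```python
def rle_encode(cells):
    """Encode a flat iterable of byte counts: a 0 byte followed by a
    length byte stands for that many empty cells (runs capped at 255)."""
    out = bytearray()
    run = 0
    for value in cells:
        if value == 0:
            run += 1
            if run == 255:            # flush a maximal run of zeros
                out += bytes([0, 255])
                run = 0
        else:
            if run:                   # flush any pending run of zeros
                out += bytes([0, run])
                run = 0
            out.append(value)
    if run:
        out += bytes([0, run])
    return bytes(out)

def rle_decode(data):
    """Inverse of rle_encode: expand zero-runs back into cells."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == 0:              # zero marker: next byte is run length
            out.extend([0] * data[i + 1])
            i += 2
        else:
            out.append(data[i])
            i += 1
    return out
```

As noted, this only helps on disk: in memory you would still have to expand it (or scan it linearly) to do a lookup.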
A dictionary of dictionaries would probably be a good balance between lookup/iteration speed and memory usage.
Are there any other suitable data structures I should be considering (either built into Python, existing libraries, or more general data structures)?
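The dictionary-of-dictionaries idea above can be sketched with `collections.defaultdict`; names here are illustrative, not from any particular library:

```python
from collections import defaultdict

# Outer dict keyed by x; inner dicts keyed by y; values are counts.
# Only cells that have been touched consume any memory.
grid = defaultdict(lambda: defaultdict(int))

def increment(x, y):
    grid[x][y] += 1

def to_list(grid):
    """Flatten to a list of (x, y, count) tuples; order is not significant."""
    return [(x, y, c) for x, row in grid.items() for y, c in row.items()]

increment(5, 10)
increment(5, 10)
increment(7, 3)
```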
3 Answers
A dictionary keyed by a point (i.e. a 2-tuple) sounds good to me. It's O(1) like an array, and significantly more compact. As long as you never need to do range queries or the like, it should be fine.
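The tuple-keyed version is even simpler with `collections.Counter`, which defaults missing cells to zero; a minimal sketch:

```python
from collections import Counter

# Keys are (x, y) tuples; missing cells implicitly count as 0.
counts = Counter()

counts[(5, 10)] += 1      # increment the cell at x=5, y=10
counts[(5, 10)] += 1
counts[(0, 4999)] += 1

# Convert to an unsorted x/y/count list.
triples = [(x, y, c) for (x, y), c in counts.items()]
```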
scipy has a range of different sparse arrays.
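Among scipy's sparse formats, the DOK (dictionary-of-keys) format is the natural fit for incremental updates like this; a sketch, assuming scipy is available:

```python
import numpy as np
from scipy.sparse import dok_matrix

# 5000 x 5000 logical grid, stored as a dict of (row, col) -> value,
# so memory scales with the number of nonzero cells only.
grid = dok_matrix((5000, 5000), dtype=np.int32)

grid[5, 10] += 1
grid[5, 10] += 1

# Iterate only the nonzero cells to build the x/y/count list.
triples = [(x, y, int(c)) for (x, y), c in grid.items()]
```

For bulk numeric work you can convert to another format (e.g. `grid.tocsr()`), but for occasional lookups and increments DOK alone may be enough.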
This should be similar to working with sparse matrices at the size of your data range; there's plenty of stuff to chew on here: http://en.wikipedia.org/wiki/Sparse_matrix