x/y 坐标稀疏列表的 Python 数据结构

发布于 2024-11-07 20:51:25 字数 417 浏览 0 评论 0原文

考虑 x/y 坐标列表和字节“计数”。 x/y 的范围可能是 0 到 5000,即 2500 万个单元格。

然而,数据将非常稀疏,最多有几千个条目,并且大多数坐标将有零个条目。

该结构偶尔会被查找/添加(例如,如果 x=5 和 y=10 中存在某些内容,则 ++),但更频繁地转换为 x/y/count 列表(排序并不重要)

最快的数据用于查找的结构显然是一个二维数组,但您正在查看 24 MB 左右的内存,并且输出列表的迭代可能会很昂贵。对于磁盘存储,您可以实现 gif 样式压缩,其中 0 字节后跟另一个字节表示 x 个空单元格,其他任何内容都是单元格值 - 但这对内存情况没有帮助。

字典的字典可能会在查找/迭代速度和内存使用之间取得良好的平衡。

我是否应该考虑任何其他合适的数据结构(内置于 Python、现有库或更通用的数据结构?

Consider list of x/y co-ordinates and a byte 'count'. x/y will have a range of perhaps 0 to 5000 which is 25 million cells.

However the data will be quite sparsely populated, there will be at most a few thousand entries and the majority of co-ordinates will have zero entries.

The structure will be occasionally looked up/added to (e.g. if there is something in x=5 and y=10 then ++) but more frequently converted into a list of x/y/count (sorting is not important)

The fastest data structure is for lookup is obviously a 2d array, but you're looking at 24 MB of memory or so and the iteration to output a list could be expensive. For disk storage you could implement gif style compression where a 0 byte followed by another byte indicates x empty cells and anything else is a cell value - but this doesn't help the memory situation.

A dictionary of dictionary's would probably be a good balance between lookup/iteration speed and memory usage.

Are there any other suitable data structures I should be considering (either built in to Python, existing libraries or more general data structures?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

居里长安 2024-11-14 20:51:25

以点(即二元组)为键的字典对我来说听起来不错。它的复杂度为 O(1),类似于数组,而且更加紧凑。只要您不需要进行范围查询或类似的操作,就应该没问题。

# increment
p = (x, y)
counts[p] = counts.get(p, 0) + 1

# list
for (p, count) in counts.iteritems():
    x, y = p
    print x, y, count

A dictionary keyed by a point (ie a 2-tuple) sound good to me. It's O(1) like an array, and significantly more compact. As long as you never need to do range queries or the like, it should be fine.

# increment
p = (x, y)
counts[p] = counts.get(p, 0) + 1

# list
for (p, count) in counts.iteritems():
    x, y = p
    print x, y, count
无戏配角 2024-11-14 20:51:25

scipy 有一系列不同的稀疏数组

有七种可用的稀疏矩阵类型:
csc_matrix:压缩稀疏列格式
csr_matrix:压缩稀疏行格式
bsr_matrix:块稀疏行格式
lil_matrix:列表格式的列表
dok_matrix:键格式字典
coo_matrix:坐标格式(又名 IJV,三元组格式)
dia_matrix:对角线格式

scipy has a range of different sparse arrays

There are seven available sparse matrix types:
csc_matrix: Compressed Sparse Column format
csr_matrix: Compressed Sparse Row format
bsr_matrix: Block Sparse Row format
lil_matrix: List of Lists format
dok_matrix: Dictionary of Keys format
coo_matrix: COOrdinate format (aka IJV, triplet format)
dia_matrix: DIAgonal format

强者自强 2024-11-14 20:51:25

这应该类似于处理数据范围大小的稀疏矩阵,这里有很多东西需要考虑 http://en.wikipedia.org/wiki/Sparse_matrix

This should be similar to working with sparse matrices of the size of your data range, there's plenty of stuff to chew on here http://en.wikipedia.org/wiki/Sparse_matrix

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文