Optimizing Python dictionaries / storage with negative indices

Posted 2024-10-21 23:09:02

~~Raised by this question's comments~~ (I can see that this is irrelevant), I am now aware that using dictionaries for data that needs to be queried/accessed regularly is not good, speedwise.

I have a situation of something like this:

someDict = {}
someDict[(-2, -2)] = something
someDict[(3, -10)] = something_else

I am mapping coordinate keys to objects that act as arrays of tiles in a game. The coordinates are going to be negative at some point, so I can't use a list or some kind of sparse array (I think that's the term?).

Can I either:

1. Speed up dictionary lookups, so this would not be an issue
2. Find some kind of container that will support sparse, negative indices?

I would use a list, but then the querying would go from O(log n) to O(n) to find the area at (x, y). (I think my timings are off here too).

手心的温暖 2024-10-28 23:09:02

Python dictionaries are very, very fast, and using a tuple of integers as the key is not going to be a problem. However, your use case seems to sometimes need a single-coordinate check, and doing that by traversing the whole dict is of course slow.

Instead of doing a linear search, you can speed up the data structure for the access pattern you need by using three dictionaries:

class Grid:
    """Sparse grid keyed by (i, j), with one index per coordinate."""

    def __init__(self):
        self.data = {}  # (i, j) -> data
        self.cols = {}  # i -> set of j values stored for that i
        self.rows = {}  # j -> set of i values stored for that j

    def __getitem__(self, ij):
        return self.data[ij]

    def __setitem__(self, ij, value):
        i, j = ij
        self.data[ij] = value
        # setdefault creates the set on first use, then returns it
        self.cols.setdefault(i, set()).add(j)
        self.rows.setdefault(j, set()).add(i)

    def getRow(self, i):
        # all stored cells whose first coordinate is i
        return [(i, j, self.data[(i, j)])
                for j in self.cols.get(i, ())]

    def getCol(self, j):
        # all stored cells whose second coordinate is j
        return [(i, j, self.data[(i, j)])
                for i in self.rows.get(j, ())]

Note that many other data structures are possible, depending on exactly what you are trying to do: how frequent reads are, how frequent updates are, whether you query by rectangle, whether you look for the nearest non-empty cell, and so on.
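As one illustration of the "query by rectangle" case, here is a minimal sketch that works on a plain dict keyed by (x, y) tuples. The function name `tiles_in_rect` and the sample tile values are made up for this example; the idea is simply to probe coordinates when the rectangle is small and to scan the stored keys when it is large.

```python
def tiles_in_rect(tiles, x0, y0, x1, y1):
    """Return {(x, y): tile} for stored tiles inside the inclusive rectangle.

    Hypothetical helper, not part of any library.
    """
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    if area <= len(tiles):
        # Small rectangle: probe each coordinate with a hash lookup.
        return {(x, y): tiles[(x, y)]
                for x in range(x0, x1 + 1)
                for y in range(y0, y1 + 1)
                if (x, y) in tiles}
    # Large rectangle: cheaper to scan the stored keys once.
    return {(x, y): tile for (x, y), tile in tiles.items()
            if x0 <= x <= x1 and y0 <= y <= y1}


tiles = {(-2, -2): "grass", (3, -10): "water", (0, 0): "stone"}
nearby = tiles_in_rect(tiles, -3, -3, 1, 1)
single = tiles_in_rect(tiles, 0, 0, 0, 0)
```

Negative coordinates need no special handling here, because the tuples are only hashed and compared, never used as list offsets.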

浮华 2024-10-28 23:09:02

To start off with

Speed up dictionary lookups, so this would not be an issue

Dictionary lookups are pretty fast, O(1) on average, but (from your other question) you're not relying on the dictionary's hash-table lookup; you're relying on a linear search of the dictionary's keys.
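The difference between the two access patterns can be sketched as follows; the tile values are invented for illustration.

```python
tiles = {(-2, -2): "grass", (3, -10): "water", (3, 4): "sand"}

# Direct lookup: the whole tuple is hashed once, O(1) on average.
tile = tiles.get((3, -10))

# Partial-key search: every key has to be inspected, O(n).
column_3 = [key for key in tiles if key[0] == 3]
```

The first form stays fast as the dict grows; the second slows down in proportion to the number of stored tiles.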

Find some kind of container that will support sparse, negative indices?

This isn't indexing into the dictionary. A tuple is an immutable object, and you are hashing the tuple as a whole. The dictionary does not interpret the contents of the keys; it only uses their hash (and an equality check), so negative values inside the tuple make no difference.

I'm going to suggest, as others did, that you restructure your data.

For example, you could create objects that encapsulate the data you need, and arrange them in a balanced binary tree for O(log n) searches. You can even go so far as to wrap the entire thing in a class that will give you the nice if foo in Bar: syntax you're looking for.

You probably need a couple of coordinated structures to accomplish what you want. Here's a simplified example using dicts and sets (tweaking user 6502's suggestion a bit).
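A minimal sketch of the wrapping idea, assuming a plain dict as the backing store rather than a binary tree. The class name `TileMap` is hypothetical; the point is only that defining `__contains__` is what enables the `in` syntax.

```python
class TileMap:
    """Hypothetical wrapper; name and methods are illustrative only."""

    def __init__(self):
        self._tiles = {}  # (x, y) -> tile

    def __setitem__(self, coord, tile):
        self._tiles[coord] = tile

    def __getitem__(self, coord):
        return self._tiles[coord]

    def __contains__(self, coord):
        # Called for `coord in tile_map`.
        return coord in self._tiles


world = TileMap()
world[(-2, -2)] = "grass"
has_tile = (-2, -2) in world
missing = (5, 5) in world
```

Because callers only see the special methods, the internal storage could later be swapped for a tree or another structure without changing the calling code.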

# this will be your dict that holds all the data
matrix = {}
# and each of these will be a dict of sets, pointing to coordinates
cols = {}
rows = {}

def add_data(coord, data):
    matrix[coord] = data
    try:
        cols[coord[0]].add(coord)
    except KeyError:
        # wrap coord in a list to prevent set() from iterating over it
        cols[coord[0]] = set([coord])
    try:
        rows[coord[1]].add(coord)
    except KeyError:
        rows[coord[1]] = set([coord])

# now you can find all coordinates from a row or column quickly
>>> add_data((2, 7), "foo4")
>>> add_data((2, 5), "foo3")
>>> 2 in cols
True
>>> 5 in rows
True
>>> # sets are unordered, so sort for a stable result
>>> sorted(matrix[coord] for coord in cols[2])
['foo3', 'foo4']

Now just wrap that in a class or a module and you'll be off. As always, if it's not fast enough, profile and test before you guess.
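One way to measure rather than guess is the standard library's `timeit` module. The sketch below compares a direct hash lookup with a linear scan over the same dict; the grid size and the function names `by_hash` and `by_scan` are arbitrary choices for this example, and the actual timings will vary by machine.

```python
import timeit

# 100 x 100 grid with negative and positive coordinates.
tiles = {(x, y): x * y for x in range(-50, 50) for y in range(-50, 50)}


def by_hash():
    # Direct lookup by the full (x, y) key.
    return tiles[(49, 49)]


def by_scan():
    # Linear search through every item until the key matches.
    return next(v for k, v in tiles.items() if k == (49, 49))


t_hash = timeit.timeit(by_hash, number=200)
t_scan = timeit.timeit(by_scan, number=200)
print(f"hash lookup: {t_hash:.6f}s, linear scan: {t_scan:.6f}s")
```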

笑红尘 2024-10-28 23:09:02

Dictionary lookups are very fast. Searching for part of the key (e.g. all tiles in row x) is what's not fast. You could use a dict of dicts. Rather than a single dict indexed by a 2-tuple, use nested dicts like this:

somedict = {0: {}, 1: {}}
somedict[0][-5] = "thingy"
somedict[1][4] = "bing"

Then if you want all the tiles in a given "row" it's just somedict[0].

You will need some logic to add the secondary dictionaries where necessary and so on. Hint: check out __getitem__() and setdefault() on the standard dict type, or possibly the collections.defaultdict type.
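Both hinted approaches can be sketched in a few lines. The variable names `rows` and `plain` are chosen for this example only.

```python
from collections import defaultdict

# defaultdict(dict) creates the inner dict the first time a row is touched.
rows = defaultdict(dict)
rows[0][-5] = "thingy"
rows[1][4] = "bing"
row_0 = rows[0]

# The same effect with a plain dict and setdefault().
plain = {}
plain.setdefault(0, {})[-5] = "thingy"
```

One thing to watch for with `defaultdict`: merely reading `rows[99]` inserts an empty inner dict for row 99. Using `rows.get(99, {})` avoids creating entries on read.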

This approach gives you quick access to all tiles in a given row. It's still slow-ish if you want all the tiles in a given column (though at least you won't need to look through every single cell, just every row). However, if needed, you could get around that by having two dicts of dicts (one in column, row order and the other in row, column order). Updating then becomes twice as much work, which may not matter for a game where most of the tiles are static, but access is very easy in either direction.
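The two-index variant might look like the following sketch. The names `by_row`, `by_col`, and `set_tile` are hypothetical; every write goes through one function so the two structures stay in sync.

```python
from collections import defaultdict

by_row = defaultdict(dict)  # row -> {col: tile}
by_col = defaultdict(dict)  # col -> {row: tile}


def set_tile(row, col, tile):
    # Each update touches both indexes, which is the "twice as much work".
    by_row[row][col] = tile
    by_col[col][row] = tile


set_tile(0, -5, "thingy")
set_tile(1, -5, "bing")
set_tile(1, 4, "bong")

row_1 = by_row[1]
col_minus_5 = by_col[-5]
```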

If you only need to store numbers and most of your cells will be 0, check out scipy's sparse matrix classes.

菊凝晚露 2024-10-28 23:09:02

One alternative would be to simply shift the index so it's positive.

E.g. if your indices are contiguous like this:

...
-2 -> a
-1 -> c
0 -> d
1 -> e
2 -> f
...

Just do something like LookupArray[Index + MinimumIndex], where MinimumIndex is the absolute value of the smallest index you would use.

That way, if your minimum were, say, -50, it would map to 0; -20 would map to 30, and so forth.
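In Python, the shifted lookup could be sketched like this, assuming the index range is known in advance. The bounds `MIN_INDEX` and `MAX_INDEX` and the helpers `put` and `get` are made up for the example. The explicit range check matters because a Python list silently accepts negative offsets and reads from the end instead of raising an error.

```python
MIN_INDEX = -50           # assumed smallest index in use
MAX_INDEX = 50            # assumed largest index in use
OFFSET = -MIN_INDEX       # absolute value of the smallest index

lookup = [None] * (MAX_INDEX - MIN_INDEX + 1)


def put(index, value):
    if not MIN_INDEX <= index <= MAX_INDEX:
        raise IndexError(index)
    lookup[index + OFFSET] = value


def get(index):
    if not MIN_INDEX <= index <= MAX_INDEX:
        raise IndexError(index)
    return lookup[index + OFFSET]


put(-50, "a")   # stored at position 0
put(-20, "b")   # stored at position 30
```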

Edit:

An alternative would be to use a trick with how you use the indices. Define the following key function:

Key(n) = 2 * n        (n >= 0)
Key(n) = -2 * n - 1   (n < 0)

This maps all non-negative keys to the even indices, and all negative keys to the odd indices. This may not be practical though, since if you add 100 negative keys, you'd have to expand your array by 200.
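The key function translates directly to Python. The inverse `unkey` is an addition for this sketch, included only to show that the mapping is reversible.

```python
def key(n):
    """Map any integer to a non-negative array position."""
    return 2 * n if n >= 0 else -2 * n - 1


def unkey(k):
    """Inverse of key(): recover the original signed index."""
    return k // 2 if k % 2 == 0 else -(k + 1) // 2


positions = [key(n) for n in range(-3, 4)]
round_trip = [unkey(key(n)) for n in range(-3, 4)]
```

Here `positions` interleaves the two halves: 0, 1, 2, 3 land on 0, 2, 4, 6, while -1, -2, -3 land on 1, 3, 5.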

One other thing to note: if you plan on doing lookups and the number of keys is constant (or changes very slowly), stick with an array. Otherwise, dictionaries aren't bad at all.
