python中对象的高效数据结构,用于基于任何对象成员变量进行查找
我需要存储具有多个(> 2)整数成员变量的对象,并使用任何成员变量作为搜索键进行快速查找。
为了更容易说明,假设对象是 3 个整数的元组,我需要使用元组的任何元素作为此类元组列表中的键进行快速查找:
collection = [(1, 200, 9),
(2, 300, 8),
(3, 400, 7)]
查找将类似于:
collection.lookup_by_first_element(1) # Return (1, 200, 9)
collection.lookup_by_second_element(300) # Return (2, 300, 8)
collection.lookup_by_third_element(250) # Return None
我需要查找快速/高效。到目前为止,我最好的选择是使用内存中的 sqlite 数据库,其中包含三个元组元素的三列,并将索引放在所有三列上。
搜索树也可以,但是它们有一个用于查找的键,我不知道如何基于多个键进行查找。你会怎么做?
I need to store objects that have a number (>2) of integer member variables and do quick look-ups using any member variables as a search key.
For easier illustration let's say the objects are tuples of 3 integers and I need to do quick look-ups using any element of a tuple as a key in a list of such tuples:
collection = [(1, 200, 9),
(2, 300, 8),
(3, 400, 7)]
Look-ups would be like:
collection.lookup_by_first_element(1) # Return (1, 200, 9)
collection.lookup_by_second_element(300) # Return (2, 300, 8)
collection.lookup_by_third_element(250) # Return None
I need the lookups to be fast/efficient. My best bet so far is to use an in memory sqlite database with three columns for the three tuple elements and put indexes on all three columns.
A search tree would also do, but those have one key for lookup and I don't see how could I do lookups based on more than one key. How would you do it?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
这是一个简单的解决方案。您可以轻松地将其放入类中并提供更简洁的界面。
更新:
就其价值而言,上述方法比查找的numpy解决方案:
输出:
This is a simple solution. You could easily put this in a class and provide a neater interface.
Update:
For what it's worth, the above is way faster than the numpy solution for lookup:
output:
senderle 是对的(我投了赞成票),但我想详细说明(不仅仅是在评论中)。
字典查找的时间复杂度为 O(1),而且速度非常快(基本上,您的密钥会变成一个散列,该散列可以寻址内存中的特定位置,以便立即从中检索数据)。
相比之下,通过搜索数组来查找值的速度至少为 O(N),因此对于较大的数组,找到正确的值将花费更长的时间。 (例如,您必须筛选所有 N 个键,找到正确的一个,然后可以返回元组。)
因此,如果您的键如您所说是唯一的,那么创建一个基于每个键进行索引的大型字典是有意义的您可以用来查找的键。当然,您必须在字典中表示 m 项(在您的情况下为 m=3)的每个元组,m 次,而使用单个数组时,只需在数组中表示一次。
所以你想定义一个Collection类
内部
c.collection_dict
是:并且你的查找按照你的要求工作
senderle is right (I upvoted), but I want to elaborate (and more than just in a comment).
Dictionary lookups are O(1) and really fast (basically your key gets turned into a hash that addresses a specific location in memory to immediately retrieve your data from).
In contrast looking for a value by searching through an array is slow at least O(N), so for larger arrays it will take longer to find the right value. (E.g., you have to sift through all N keys, find the right one, and then can return the tuple.)
So if your keys are unique as you say, it makes sense to just create a large dictionary that is indexed based on every key you may use to lookup. Granted is you will have to represent each tuple of m items (m=3 in your case), m times in the dictionary while with a single array it only had to be represented once in the array.
So you want to define a Collection class
The internal
c.collection_dict
is:And your lookups work as you requested
使用numpy:
Using numpy: