在Python中,使用bisect在字典列表中查找项目
我有一个字典列表,如下所示:
test_data = [
{ 'offset':0, 'data':1500 },
{ 'offset':1270, 'data':120 },
{ 'offset':2117, 'data':30 },
{ 'offset':4055, 'data':30000 },
]
字典项根据 'offset'
数据在列表中排序。真实的数据可能会更长。
我想要做的是在给定特定偏移值的列表中查找项目,该偏移值不恰好是这些值之一,但在该范围内。所以,二分搜索就是我想做的。
我现在知道Python bisect
模块,它是一个现成的二分搜索——很好,但不能直接用于这种情况。我只是想知道适应 bisect
满足我的需要。这是我想出的:
import bisect
class dict_list_index_get_member(object):
def __init__(self, dict_list, member):
self.dict_list = dict_list
self.member = member
def __getitem__(self, index):
return self.dict_list[index][self.member]
def __len__(self):
return self.dict_list.__len__()
test_data_index_get_offset = dict_list_index_get_member(test_data, 'offset')
print bisect.bisect(test_data_index_get_offset, 1900)
它打印:
2
我的问题是,这是做我想做的事情的最佳方法,还是有其他更简单、更好的方法?
I have a list of dicts, something like this:
test_data = [
{ 'offset':0, 'data':1500 },
{ 'offset':1270, 'data':120 },
{ 'offset':2117, 'data':30 },
{ 'offset':4055, 'data':30000 },
]
The dict items are sorted in the list according to the 'offset'
data. The real data could be much longer.
What I want to do is lookup an item in the list given a particular offset value, which is not exactly one of those values, but in that range. So, a binary search is what I want to do.
I am now aware of the Python bisect
module, which is a ready-made binary search—great, but not directly usable for this case. I'm just wondering what is the easiest way to adapt bisect
to my needs. Here is what I came up with:
import bisect
class dict_list_index_get_member(object):
def __init__(self, dict_list, member):
self.dict_list = dict_list
self.member = member
def __getitem__(self, index):
return self.dict_list[index][self.member]
def __len__(self):
return self.dict_list.__len__()
test_data_index_get_offset = dict_list_index_get_member(test_data, 'offset')
print bisect.bisect(test_data_index_get_offset, 1900)
It prints:
2
My question is, is this the best way to do what I want, or is there some other simpler, better way?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
这里通常的模式类似于按属性排序、装饰、操作和取消装饰。所以在这种情况下你只需要装饰然后调用。但是,您希望避免这样做,因为装饰将是 O(n),而您希望它是 O(logn)。因此我认为你的方法是最好的。
The usual pattern here is similar to sorting by an attribute, decorate, operate, and undecorate. So in this case you'd just need to decorate and then call. However you'd want to avoid doing this since decorate would be O(n) whereas you want this to be O(logn). Therefore I'd consider your method best.
您可以做的是
这应该允许您创建一个简单的
OffsetWithAttributes
实例的list
。bisect
算法应该非常乐意使用定义的运算符。您可以使用
someOWA.attributes['data']
。或者
这应该使 OffsetWithAttributes 更像一个 dict 。
What you can do is this
This should allow you to create a simple
list
ofOffsetWithAttributes
instances. Thebisect
algorithm should be perfectly happy to use the defined operators.You can use your
someOWA.attributes['data']
.Or
That should make the
OffsetWithAttributes
more like adict
.从 Python 3.10 开始,您可以将 key 函数作为关键字参数传递给 bisect 函数
Starting in Python 3.10 you can pass a key function as a keyword argument to the bisect functions
如果您可以使用元组,则元组可与二等分一起使用...
尽管因为元组是“按字典顺序”(从左到右)进行比较,直到一个元素不等于另一个元素 - 您必须考虑这是否是所需的行为
tuples work with bisect if you are ok using them instead...
although because tuples are compared "lexicographically" (left to right) until an element is not equal to the other - you would have to consider if this is desired behaviour
对于字典列表的范围查询,鸭子会表现良好。它与二分搜索一样快,因为它构建了基于树的索引。
pip install ducks
Ducks还可以通过多个属性进行搜索,例如:
For range queries over a list of dicts, ducks will perform well. It's as fast as a binary search because it builds a tree-based index.
pip install ducks
Ducks can also search by multiple attributes, e.g.:
您还可以使用 Python 的众多 SortedDict 实现之一来管理您的 test_data。排序字典按键对元素进行排序并维护到值的映射。一些实现还支持对键进行二等分操作。例如,Python Sortedcontainers 模块 有一个 SortedDict 满足您的要求。
在您的情况下,它看起来像:
SortedDict 类型有一个二分函数,它返回所需键的二分索引。通过该索引,您可以查找实际的键。通过该密钥,您可以获得该值。
所有这些操作在排序容器中都非常快,这也可以在纯 Python 中方便地实现。还有一个性能比较,其中讨论了其他选择并具有基准数据。
You could also use one of Python's many SortedDict implementations to manage your test_data. A sorted dict sorts the elements by key and maintains a mapping to a value. Some implementations also support a bisect operation on the keys. For example, the Python sortedcontainers module has a SortedDict that meets your requirements.
In your case it would look something like:
The SortedDict type has a bisect function which returns the bisected index of the desired key. With that index, you can lookup the actual key. And with that key you can get the value.
All of these operations are very fast in sortedcontainers which is also conveniently implemented in pure-Python. There's a performance comparison too which discusses other choices and has benchmark data.
当您说实际数据可能更长时,这是否会阻止您保留手头的偏移值列表?
不过你的方法对我来说似乎不错。
When you say the real data could be much longer, does that prevent you from keeping a list of offset values on hand?
Your method seems fine to me though.