在Python中,使用bisect在字典列表中查找项目

发布于 2024-08-03 03:04:39 字数 1161 浏览 6 评论 0原文

我有一个字典列表,如下所示:

test_data = [
    { 'offset':0, 'data':1500 },
    { 'offset':1270, 'data':120 },
    { 'offset':2117, 'data':30 },
    { 'offset':4055, 'data':30000 },
]

字典项根据 'offset' 数据在列表中排序。真实的数据可能会更长。

我想要做的是在给定特定偏移值的列表中查找项目,该偏移值恰好是这些值之一,但在该范围内。所以,二分搜索就是我想做的。

我现在知道Python bisect 模块,它是一个现成的二分搜索——很好,但不能直接用于这种情况。我只是想知道适应 bisect 满足我的需要。这是我想出的:

import bisect

class dict_list_index_get_member(object):
    def __init__(self, dict_list, member):
        self.dict_list = dict_list
        self.member = member
    def __getitem__(self, index):
        return self.dict_list[index][self.member]
    def __len__(self):
        return self.dict_list.__len__()

test_data_index_get_offset = dict_list_index_get_member(test_data, 'offset')
print bisect.bisect(test_data_index_get_offset, 1900)

它打印:

2

我的问题是,这是做我想做的事情的最佳方法,还是有其他更简单、更好的方法?

I have a list of dicts, something like this:

test_data = [
    { 'offset':0, 'data':1500 },
    { 'offset':1270, 'data':120 },
    { 'offset':2117, 'data':30 },
    { 'offset':4055, 'data':30000 },
]

The dict items are sorted in the list according to the 'offset' data. The real data could be much longer.

What I want to do is lookup an item in the list given a particular offset value, which is not exactly one of those values, but in that range. So, a binary search is what I want to do.

I am now aware of the Python bisect module, which is a ready-made binary search—great, but not directly usable for this case. I'm just wondering what is the easiest way to adapt bisect to my needs. Here is what I came up with:

import bisect

class dict_list_index_get_member(object):
    def __init__(self, dict_list, member):
        self.dict_list = dict_list
        self.member = member
    def __getitem__(self, index):
        return self.dict_list[index][self.member]
    def __len__(self):
        return self.dict_list.__len__()

test_data_index_get_offset = dict_list_index_get_member(test_data, 'offset')
print bisect.bisect(test_data_index_get_offset, 1900)

It prints:

2

My question is, is this the best way to do what I want, or is there some other simpler, better way?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

我乃一代侩神 2024-08-10 03:04:40

这里通常的模式类似于按属性排序、装饰、操作和取消装饰。所以在这种情况下你只需要装饰然后调用。但是,您希望避免这样做,因为装饰将是 O(n),而您希望它是 O(logn)。因此我认为你的方法是最好的。

The usual pattern here is similar to sorting by an attribute, decorate, operate, and undecorate. So in this case you'd just need to decorate and then call. However you'd want to avoid doing this since decorate would be O(n) whereas you want this to be O(logn). Therefore I'd consider your method best.

绅刃 2024-08-10 03:04:40

您可以做的是

class OffsetWithAttributes( object ):
    def __init__( self, offset, **kw ):
        self.offset= offset
        self.attributes= kw
    def __eq__( self, other ):
        return self.offset == other.offset
    def __lt__( self, other ):
        return self.offset < other.offset
    def __le__( self, other ):
        return self.offset <= other.offset
    def __gt__( self, other ):
        return self.offset > other.offset
    def __ge__( self, other ):
        return self.offset >= other.offset
    def __ne__( self, other ):
        return self.offset != other.offset

这应该允许您创建一个简单的 OffsetWithAttributes 实例的 listbisect 算法应该非常乐意使用定义的运算符。

您可以使用 someOWA.attributes['data']

或者

    def __getattr__( self, key ):
        return self.attributes[key]

这应该使 OffsetWithAttributes 更像一个 dict 。

What you can do is this

class OffsetWithAttributes( object ):
    def __init__( self, offset, **kw ):
        self.offset= offset
        self.attributes= kw
    def __eq__( self, other ):
        return self.offset == other.offset
    def __lt__( self, other ):
        return self.offset < other.offset
    def __le__( self, other ):
        return self.offset <= other.offset
    def __gt__( self, other ):
        return self.offset > other.offset
    def __ge__( self, other ):
        return self.offset >= other.offset
    def __ne__( self, other ):
        return self.offset != other.offset

This should allow you to create a simple list of OffsetWithAttributes instances. The bisect algorithm should be perfectly happy to use the defined operators.

You can use your someOWA.attributes['data'].

Or

    def __getattr__( self, key ):
        return self.attributes[key]

That should make the OffsetWithAttributes more like a dict.

青衫儰鉨ミ守葔 2024-08-10 03:04:40

从 Python 3.10 开始,您可以将 key 函数作为关键字参数传递给 bisect 函数

>>> bisect.bisect(test_data, 1900, key=lambda x: x["offset"])
2

Starting in Python 3.10 you can pass a key function as a keyword argument to the bisect functions

>>> bisect.bisect(test_data, 1900, key=lambda x: x["offset"])
2
妄司 2024-08-10 03:04:40

如果您可以使用元组,则元组可与二等分一起使用...

import bisect

offset = 0
data = 1
test_data = [
    (0, 1500),
    (1270, 120),
    (2117, 30),
    (4055, 30000),
]

i = bisect.bisect(test_data, (1900,0))
test_data.insert(i, (1900,0))
print(test_data[i][data])

尽管因为元组是“按字典顺序”(从左到右)进行比较,直到一个元素不等于另一个元素 - 您必须考虑这是否是所需的行为

>>> bisect.insort(test_data, (2117,29))
>>> print(test_data)
[(0, 1500), (1270, 120), (2117, 29), (2117, 30), (4055, 30000)]

tuples work with bisect if you are ok using them instead...

import bisect

offset = 0
data = 1
test_data = [
    (0, 1500),
    (1270, 120),
    (2117, 30),
    (4055, 30000),
]

i = bisect.bisect(test_data, (1900,0))
test_data.insert(i, (1900,0))
print(test_data[i][data])

although because tuples are compared "lexicographically" (left to right) until an element is not equal to the other - you would have to consider if this is desired behaviour

>>> bisect.insort(test_data, (2117,29))
>>> print(test_data)
[(0, 1500), (1270, 120), (2117, 29), (2117, 30), (4055, 30000)]
再浓的妆也掩不了殇 2024-08-10 03:04:40

对于字典列表的范围查询,鸭子会表现良好。它与二分搜索一样快,因为它构建了基于树的索引。

pip install ducks

from ducks import Dex

test_data = [
    { 'offset':0, 'data':1500 },
    { 'offset':1270, 'data':120 },
    { 'offset':2117, 'data':30 },
    { 'offset':4055, 'data':30000 },
]

# build index on 'offset'
dex = Dex(test_data, ['offset'])

dex[{'offset': {'>': 1900}}] 
# result: [{'offset': 2117, 'data': 30}, {'offset': 4055, 'data': 30000}]

Ducks还可以通过多个属性进行搜索,例如:

# build a Dex on 'offset' and 'data'
dex = Dex(test_data, ['offset', 'data'])
dex[{'offset': {'>': 1900}, 'data': {'<': 50}}]
# result: [{'offset': 2117, 'data': 30}]

For range queries over a list of dicts, ducks will perform well. It's as fast as a binary search because it builds a tree-based index.

pip install ducks

from ducks import Dex

test_data = [
    { 'offset':0, 'data':1500 },
    { 'offset':1270, 'data':120 },
    { 'offset':2117, 'data':30 },
    { 'offset':4055, 'data':30000 },
]

# build index on 'offset'
dex = Dex(test_data, ['offset'])

dex[{'offset': {'>': 1900}}] 
# result: [{'offset': 2117, 'data': 30}, {'offset': 4055, 'data': 30000}]

Ducks can also search by multiple attributes, e.g.:

# build a Dex on 'offset' and 'data'
dex = Dex(test_data, ['offset', 'data'])
dex[{'offset': {'>': 1900}, 'data': {'<': 50}}]
# result: [{'offset': 2117, 'data': 30}]
℉服软 2024-08-10 03:04:39

您还可以使用 Python 的众多 SortedDict 实现之一来管理您的 test_data。排序字典按键对元素进行排序并维护到值的映射。一些实现还支持对键进行二等分操作。例如,Python Sortedcontainers 模块 有一个 SortedDict 满足您的要求。

在您的情况下,它看起来像:

from sortedcontainers import SortedDict
offset_map = SortedDict((item['offset'], item['data']) for item in test_data)
index = offset_map.bisect(1275)
key = offset_map.iloc[index]
print offset_map[key]
# 120

SortedDict 类型有一个二分函数,它返回所需键的二分索引。通过该索引,您可以查找实际的键。通过该密钥,您可以获得该值。

所有这些操作在排序容器中都非常快,这也可以在纯 Python 中方便地实现。还有一个性能比较,其中讨论了其他选择并具有基准数据。

You could also use one of Python's many SortedDict implementations to manage your test_data. A sorted dict sorts the elements by key and maintains a mapping to a value. Some implementations also support a bisect operation on the keys. For example, the Python sortedcontainers module has a SortedDict that meets your requirements.

In your case it would look something like:

from sortedcontainers import SortedDict
offset_map = SortedDict((item['offset'], item['data']) for item in test_data)
index = offset_map.bisect(1275)
key = offset_map.iloc[index]
print offset_map[key]
# 120

The SortedDict type has a bisect function which returns the bisected index of the desired key. With that index, you can lookup the actual key. And with that key you can get the value.

All of these operations are very fast in sortedcontainers which is also conveniently implemented in pure-Python. There's a performance comparison too which discusses other choices and has benchmark data.

可是我不能没有你 2024-08-10 03:04:39

当您说实际数据可能更长时,这是否会阻止您保留手头的偏移值列表?

offset_values = [i['offset'] for i in test_data]
bisect.bisect(offset_values, 1900)

不过你的方法对我来说似乎不错。

When you say the real data could be much longer, does that prevent you from keeping a list of offset values on hand?

offset_values = [i['offset'] for i in test_data]
bisect.bisect(offset_values, 1900)

Your method seems fine to me though.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文