How to filter a collection of objects by a field value?

Posted 2025-02-09 11:01:59, 590 characters, 2 views, 0 comments

In Python, how can I organize and filter a collection of objects by a field value? I need to filter both by equality with an exact value and by being less than a value.

And how can I do it efficiently? If I store my objects in a list, I need to iterate over the whole list, which may hold hundreds of thousands of objects.

from dataclasses import dataclass


@dataclass
class Person:
  name: str
  salary: float
  is_boss: bool


# if to store objects in a list...
collection = [Person("Jack", 50000, False), ..., Person("Jane", 120000, True)]

# filtering in O(n), sloooooow
target = 100000
filtered_collection = [x for x in collection if x.salary < target]

PS: Actually, my use case is to group by a certain field (is_boss) and filter by another (salary). How can I do that? Should I use itertools.groupby over sorted lists and make my objects comparable?

逆光下的微笑 2025-02-16 11:01:59

If you maintain your list in sorted order (which ideally means few insertions or removals, because mid-list insertion/removal is itself O(n)), you can find the set of Persons below a given salary with the bisect module.

from bisect import bisect_left
from operator import attrgetter

# if to store objects in a list...
collection = [Person("Jack", 50000, False), ..., Person("Jane", 120000, True)]
collection.sort(key=attrgetter('salary'))  # O(n log n) initial sort

# filtering searches in O(log n):
target = 100000
# bisect_left returns the index of the first salary >= target, so the slice
# holds only salaries strictly below it (plain bisect would also include == target)
filtered_collection = collection[:bisect_left(collection, target, key=attrgetter('salary'))]

Note: The key argument to the various bisect module functions is only supported as of Python 3.10. In earlier versions, you'd need to either define the rich comparison operators for Person in terms of salary and search with a dummy Person object, or maintain two separate sorted lists in parallel: one of the salaries alone and one of the Person objects.
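For versions before 3.10, the parallel-list workaround mentioned above might look roughly like this. It is only a sketch: the sample people and the `salaries` and `below` names are made up for illustration, and the key list has to be kept in sync by hand whenever the main list changes.

```python
from bisect import bisect_left
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Person:
    name: str
    salary: float
    is_boss: bool


# Hypothetical sample data, for illustration only.
people = [
    Person("Jane", 120000, True),
    Person("Jack", 50000, False),
    Person("Ann", 100000, False),
]
people.sort(key=attrgetter("salary"))

# Parallel list of sort keys; must be rebuilt (or updated in step)
# every time `people` is modified.
salaries = [p.salary for p in people]

target = 100000
# Index of the first salary >= target, so the slice is strictly below target.
below = people[:bisect_left(salaries, target)]
```

Here `below` should contain only Jack, since Ann's salary equals the target and is excluded by `bisect_left`.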

For adding individual elements to the collection, you could use bisect's insort function. Or you could just append a bunch of items to the end of the list in bulk and re-sort it on the same key as before (Python's sorting algorithm, TimSort, gets near O(n) performance when the collection is mostly in order already, so the cost is not as high as you might think).

I'll note that in practice, this sort of scenario (massive data that can be arbitrarily ordered by multiple fields) usually calls for a database. You might consider sqlite3 (switching later to a more production-grade database like MySQL or PostgreSQL if needed), which, with appropriate indexes defined, lets you do O(log n) SELECTs on any indexed field; you can convert rows to Person objects on extraction, for just the data you actually need to work with. The B-trees that true DBMS solutions provide give you O(log n) inserts, deletes, and selects on the indexed fields, whereas Python's built-in collection types make you choose: only one of insertion/deletion or searching can be truly O(log n), while the other is O(n).
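As a rough illustration of the database route, here is a sketch using the standard-library sqlite3 module with an in-memory database. The table name, index name, and sample rows are assumptions made for this example; the composite index on (is_boss, salary) is one way to serve the "equal on one field, less than on another" query from the question.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    salary: float
    is_boss: bool


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, salary REAL, is_boss INTEGER)")
# Hypothetical index: equality column first, range column second.
conn.execute("CREATE INDEX idx_person_boss_salary ON person (is_boss, salary)")

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("Jack", 50000, 0), ("Ann", 90000, 1), ("Jane", 120000, 1)],
)

rows = conn.execute(
    "SELECT name, salary, is_boss FROM person "
    "WHERE is_boss = ? AND salary < ? ORDER BY salary",
    (1, 100000),
).fetchall()

# Convert only the rows actually needed back into Person objects.
bosses_below = [Person(name, salary, bool(flag)) for name, salary, flag in rows]
conn.close()
```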

当梦初醒 2025-02-16 11:01:59

Lists have a sort method. All you have to do is create a key function that returns the value each object should be sorted by. Let me show you:

class Foo:
    def __init__(self, bar):
        self.bar = bar

fooArray = [Foo(10), Foo(8), Foo(9)]

def sortFoo(foo):
    # key function: return the value to sort by
    return foo.bar

fooArray.sort(key=sortFoo)
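For the group-then-filter use case in the question's PS, the same key-based sort can be combined with a plain dict instead of itertools.groupby. This is just one possible sketch; the `index` structure, the `below` helper, and the sample people are hypothetical names introduced for illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Person:
    name: str
    salary: float
    is_boss: bool


# Hypothetical sample data, for illustration only.
people = [
    Person("Jane", 120000, True),
    Person("Jack", 50000, False),
    Person("Ann", 90000, True),
    Person("Bob", 110000, False),
]

# Group by is_boss in one O(n) pass.
groups = defaultdict(list)
for person in people:
    groups[person.is_boss].append(person)

# Sort each group by salary and keep a parallel list of salaries for bisect.
index = {}
for flag, members in groups.items():
    members.sort(key=attrgetter("salary"))
    index[flag] = (members, [m.salary for m in members])


def below(flag, target):
    """Return members of the given group with salary strictly below target."""
    members, salaries = index.get(flag, ([], []))
    return members[:bisect_left(salaries, target)]
```

With this layout, `below(True, 100000)` looks up the boss group and bisects within it, so the objects themselves never need to be comparable.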
