How to filter a collection of objects by field value?
How do I organize and filter a collection of objects by a field value in Python? I need to filter both by equality with an exact value and by being less than a value.

And how can I do this efficiently? If I store my objects in a list, I need to iterate over the whole list, which could hold hundreds of thousands of objects.

    from dataclasses import dataclass

    @dataclass
    class Person:
        name: str
        salary: float
        is_boss: bool

    # if the objects are stored in a list...
    collection = [Person("Jack", 50000, False), ..., Person("Jane", 120000, True)]

    # filtering is O(n), slow for large collections
    target = 100000
    filtered_collection = [x for x in collection if x.salary < target]

PS: Actually, my use case is to group by one field, i.e. is_boss, and filter by another, i.e. salary. How do I do that? Should I use itertools.groupby over sorted lists and make my objects comparable?

If you maintain your list in sorted order (which ideally means few insertions or removals, because mid-list insertion/removal is itself O(n)), you can find the set of Persons below a given salary with the bisect module.

Note: The key argument to the various bisect module functions is only supported as of Python 3.10. In prior versions, you'd need to define the rich comparison operators for Person in terms of salary and search for a faked-out Person object, or maintain ugly separate sorted lists, one of salary alone, and a parallel list of the Person objects.

For adding individual elements to the collection, you could use bisect's insort function. Or you could just add a bunch of items to the end of the list in bulk and re-sort it on the same key as before (Python's sorting algorithm, TimSort, gets near O(n) performance when the collection is mostly in order already, so the cost is not as high as you might think).

I'll note that in practice, this sort of scenario (massive data that can be arbitrarily ordered by multiple fields) usually calls for a database; you might consider using sqlite3 (eventually switching to a more production-grade database like MySQL or Postgres if needed), which, with appropriate indexes defined, would let you do O(log n) SELECTs on any indexed field; you could convert to Person objects on extraction for the data you actually need to work with. The B-trees that true DBMS solutions provide get you O(log n) effort for inserts, deletes and selects on the indexed fields, where Python's built-in collection types make you choose: only one of insertion/deletion or searching can be truly O(log n), while the other is O(n).
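For the database route, a rough sketch with the standard-library sqlite3 module might look like the following. The table name, index name, sample rows, and the find helper are assumptions for illustration; an in-memory database is used only to keep the example self-contained.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    salary: float
    is_boss: bool


# In-memory database for illustration; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, salary REAL, is_boss INTEGER)")
# Composite index: equality on is_boss first, then a range scan on salary.
conn.execute("CREATE INDEX idx_person_boss_salary ON person (is_boss, salary)")

rows = [("Jack", 50000, 0), ("Jill", 95000, 0), ("Jane", 120000, 1), ("John", 80000, 1)]
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)


def find(is_boss, max_salary):
    """Return Person objects with the given is_boss flag and salary < max_salary."""
    cursor = conn.execute(
        "SELECT name, salary, is_boss FROM person "
        "WHERE is_boss = ? AND salary < ? ORDER BY salary",
        (int(is_boss), max_salary),
    )
    # Convert rows to Person objects only for the data actually needed.
    return [Person(name, salary, bool(flag)) for name, salary, flag in cursor]


bosses_below = find(True, 100000)
staff_below = find(False, 100000)
```

Putting is_boss before salary in the index matches the "exact value on one field, less-than on another" query shape from the question; a different query mix would likely want different indexes.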
Arrays have a sort method. All you have to do is create a function that determines whether one object is greater than another. Let me show you
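The code this answer goes on to show is not included here. As a hedged sketch of the idea in Python, where a list plays the role of the array: list.sort takes a key function instead of a comparator, so a "greater than" style function has to be wrapped with functools.cmp_to_key. The compare_by_salary function and the sample data below are hypothetical.

```python
from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Person:
    name: str
    salary: float
    is_boss: bool


def compare_by_salary(a, b):
    """Return a negative, zero, or positive number, as a comparator is expected to."""
    return (a.salary > b.salary) - (a.salary < b.salary)


# Hypothetical sample data.
people = [
    Person("Jane", 120000, True),
    Person("Jack", 50000, False),
    Person("Jill", 95000, False),
]

# list.sort expects a key function, so the comparator is wrapped with cmp_to_key.
people.sort(key=cmp_to_key(compare_by_salary))
sorted_names = [p.name for p in people]

# The more common idiom is a plain key function, which gives the same order here.
same_order = sorted(people, key=lambda p: p.salary)
```

Sorting alone is still O(n log n) and does not filter anything; it only becomes useful for the "less than" query when combined with a binary search over the sorted list.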