List comprehension: referring to the list being built
In sum: I need to write a list comprehension in which I refer to the list that is being created by that same list comprehension.
This might not be something you need to do every day, but I don't think it's unusual either.
Maybe there's no answer here. Still, please don't tell me I ought to use a for loop. That might be correct, but it's not helpful. The reason is the problem domain: this line of code is part of an ETL module, so performance matters, and so does avoiding a temporary container, hence my wish to code this step as a list comprehension. If a for loop would work for me here, I would just code one.
In any event, I am unable to write this particular list comprehension. The reason: the expression I need to write has this form:
[ some_function(s) for s in raw_data if s not in this_list ]
In that pseudo-code, "this_list" refers to the list created by evaluating that list comprehension. And that's why I'm stuck: this_list isn't built until my list comprehension has been evaluated, so at the point where I need to refer to it, it doesn't exist yet, and I don't know how to refer to it.
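To make the obstacle concrete, here is a minimal sketch. The names `raw_data`, `result` and `try_self_reference` are placeholders, and `str.upper` merely stands in for `some_function`. The assignment target is bound only after the whole comprehension has been evaluated, so using it inside the comprehension fails on the very first item:

```python
raw_data = ["a", "b", "a", "c"]  # placeholder input


def try_self_reference():
    # `result` is assigned only once the comprehension has finished,
    # so the reference inside the `if` clause hits an unbound name.
    result = [s.upper() for s in raw_data if s not in result]
    return result


try:
    try_self_reference()
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    print("cannot refer to the list under construction:", exc)
```

The exact exception message differs between Python versions, but in each case it is a `NameError` or its subclass `UnboundLocalError`.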
What I have considered so far (which might rest on one or more false assumptions, though I don't know exactly where):
- Doesn't the Python interpreter have to give this list-under-construction a name? I think so.
- That temporary name is probably taken from some bound method used to build my list ('sum'?).
- But even if I went to the trouble of finding that bound method, and assuming it is indeed the temporary name the Python interpreter uses to refer to the list while it is under construction, I am pretty sure you can't refer to bound methods directly. I'm not aware of such an explicit rule, but those methods (at least the few that I've actually looked at) are not valid Python syntax. I'm guessing one reason for that is so that we do not write them into our code.
So that's the chain of my so-called reasoning, which has led me to conclude, or at least guess, that I have coded myself into a corner. Still, I thought I ought to verify this with the community before turning around and going in a different direction.
Comments (4)
There used to be a way to do this using the undocumented fact that, while the list was being built, its value was stored in a local variable named `_[1].__self__`. However, that quit working in Python 2.7 (maybe earlier, I wasn't paying close attention).
You can do what you want in a single list comprehension if you set up an external data structure first. Since all your pseudo-code seemed to be doing with `this_list` was checking whether each `s` was already in it, i.e. a membership test, I've changed it into a `set` named `seen` as an optimization (checking for membership in a `list` can be very slow if the list is large).
If you don't have access to `some_function`, you could put a call to it in your own wrapper function that adds its return value to the `seen` set before returning it.
Even though it wouldn't be a list comprehension, I'd encapsulate the whole thing in a function to make reuse easier.
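The code that accompanied this answer is not preserved here, so the following is only a sketch of the two variants it describes, not the author's original. It assumes the values are hashable, and `some_function` here is a hypothetical stand-in that returns its argument unchanged, so that return values can actually match raw items. The names `tracked` and `unique_transformed` are made up for illustration.

```python
def some_function(s):
    # Hypothetical stand-in: returns its argument unchanged.
    return s


raw_data = list("abcdaebfc")

# Variant 1: a single list comprehension plus an external `seen` set.
seen = set()


def tracked(s):
    """Wrapper: call some_function and record its return value in `seen`."""
    value = some_function(s)
    seen.add(value)
    return value


this_list = [tracked(s) for s in raw_data if s not in seen]
print(this_list)


# Variant 2: the same logic wrapped in a reusable function.
def unique_transformed(function, data):
    """Transform each item of `data` not yet seen among the results."""
    result = []
    seen_values = set()
    for s in data:
        if s not in seen_values:
            value = function(s)
            result.append(value)
            seen_values.add(value)
    return result


print(unique_transformed(some_function, raw_data))
```

Both variants test each raw item against the set of values produced so far, which mirrors the pseudo-code's `if s not in this_list` while replacing the linear list scan with a set lookup.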
In either case, I find it odd that the list being built in your pseudo-code, the one you want to reference, isn't composed of a subset of `raw_data` values but of the results of calling `some_function` on each of them, i.e. transformed data. That naturally makes one wonder what `some_function` does such that its return value might match an existing `raw_data` item's value.
I don't see why you need to do this in one go. Either iterate through the initial data first to eliminate duplicates or, even better, convert it to a `set` as KennyTM suggests, then do your list comprehension.
Note that even if you could reference the "list under construction", your approach would still fail, because `s` is not in the list anyway: the result of `some_function(s)` is.
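Both points can be illustrated with a small sketch. Everything here is hypothetical: `some_function` is a stand-in that upper-cases its argument, and `dedupe` is a helper name invented for the example. The first loop is the plain-loop equivalent of the pseudo-code and shows that comparing raw items against transformed values filters nothing; the second part removes duplicate inputs first and then applies the comprehension.

```python
def some_function(s):
    # Hypothetical stand-in for the real transformation.
    return s.upper()


raw_data = ["a", "b", "a", "c"]

# Loop equivalent of the pseudo-code: raw "a" is compared against
# transformed values such as "A", so the duplicate is not filtered out.
naive = []
for s in raw_data:
    if s not in naive:
        naive.append(some_function(s))


def dedupe(items):
    """Drop repeated items, keeping the first occurrence of each in order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# Two steps: remove duplicate inputs first, then transform.
this_list = [some_function(s) for s in dedupe(raw_data)]

print(naive)      # duplicates survive
print(this_list)  # duplicates removed before transforming
```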
As far as I know, there is no way to access a list comprehension as it's being built.
As KennyTM mentioned (and if the order of the entries is not relevant), you can use a `set` instead. If you're on Python 2.7/3.1 or above, you even get set comprehensions.
Otherwise, a `for` loop isn't that bad either (although it will scale terribly).
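The snippets that originally followed these sentences are not preserved, so here is a hedged sketch of both options. `some_function` is a hypothetical stand-in, and the loop tests the transformed value rather than the raw `s`, which is an assumption about the intent rather than a literal copy of the pseudo-code.

```python
def some_function(s):
    # Hypothetical stand-in for the real transformation.
    return s.upper()


raw_data = ["a", "b", "a", "c", "b"]

# Set comprehension: unique results, order not preserved.
unique_results = {some_function(s) for s in raw_data}

# Plain for loop: order preserved, but `value not in this_list` scans
# the whole list on every iteration, so the cost grows quadratically.
this_list = []
for s in raw_data:
    value = some_function(s)
    if value not in this_list:
        this_list.append(value)

print(unique_results)
print(this_list)
```

The list scan inside the loop is what makes this version scale poorly on large inputs; the set comprehension avoids it at the price of losing the original order.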
Why don't you simply do:
[ some_function(s) for s in set(raw_data) ]
That should do what you are asking for, unless you need to preserve the order of the original list.
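As a small illustration of the ordering caveat (with a hypothetical `some_function`, and assuming Python 3.7 or later, where `dict` is guaranteed to keep insertion order), `dict.fromkeys` is one way to remove duplicates while keeping the first occurrence of each item in place. It is a substitute technique, not part of the suggestion above:

```python
def some_function(s):
    # Hypothetical stand-in for the real transformation.
    return s.upper()


raw_data = ["b", "a", "b", "c", "a"]

# set() removes duplicates, but the iteration order is arbitrary.
unordered = [some_function(s) for s in set(raw_data)]

# dict.fromkeys() removes duplicates and keeps first-seen order.
ordered = [some_function(s) for s in dict.fromkeys(raw_data)]

print(sorted(unordered))
print(ordered)
```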