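Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
I want to filter my DataFrame with an `or` condition, keeping the rows whose values in a particular column fall outside the range [-0.25, 0.25]. I tried:
df = df[(df['col'] < -0.25) or (df['col'] > 0.25)]
But I get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().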
If you have more than one value:
If it’s only a single value:
I was getting an error in this command:
But it worked when I changed it to this:
You need to use the bitwise operators `|` instead of `or` and `&` instead of `and` in pandas. You can't simply use Python's Boolean statements.
For more complex filtering, create a mask and apply the mask to the DataFrame. Put all your conditions in the mask and apply it.
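The code from this answer did not survive on this page, so the following is only a minimal sketch of the mask approach applied to the question's condition. The sample values are made up; only the column name `col` comes from the question.

```python
import pandas as pd

# Hypothetical data; only the column name 'col' comes from the question.
df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.2, 0.3]})

# Build the mask element-wise: each comparison yields a Boolean Series,
# and `|` combines them row by row.
mask = (df["col"] < -0.25) | (df["col"] > 0.25)

# Apply the mask to keep only the rows outside [-0.25, 0.25].
outside = df[mask]
print(outside)
```

Keeping the mask in its own variable makes it easy to add further conditions before applying it.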
I faced the same issue while working with a pandas DataFrame, and solved it with numpy.logical_and.
Here I am trying to select the rows whose Id matches 41d7853 and whose degreee_type is not Certification, like below:
If I try to write the condition with Python's `and` instead, like the below:
I get the same error:
Using numpy.logical_and worked for me.
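The original snippets are missing here, so this is a reconstruction under assumptions rather than the author's code. The rows are invented, and the column names `Id` and `degreee_type` are taken from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the column names follow the text above.
df = pd.DataFrame({
    "Id": ["41d7853", "41d7853", "9ab01f2"],
    "degreee_type": ["Certification", "Bachelor", "Master"],
})

# Writing `cond_a and cond_b` here would raise the ValueError, because
# `and` asks each Series for a single truth value.
selected = df[np.logical_and(df["Id"] == "41d7853",
                             df["degreee_type"] != "Certification")]
print(selected)
```

`numpy.logical_and` is a ufunc, so it works row by row and returns a Boolean Series that can be used directly as a filter.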
I'll try to give a benchmark of the three most common approaches (also mentioned above):
Result:
However, `*` is not supported on a pandas Series, and a NumPy array is faster than a pandas DataFrame (pandas is around 1000 times slower, see the numbers):
Result:
Note: adding one line of code, x = x.to_numpy(), takes about 20 µs.
For those who prefer %timeit:
Result:
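The benchmark code and its numbers are not preserved on this page. As a rough sketch of how such a comparison could be set up (not the author's benchmark), the following times the `|` operator and `numpy.logical_or` on a Series and on a plain array. The data size, the repeat count and the condition are assumptions, and no timings are claimed here since they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: one million random values in a single Series.
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(-1.0, 1.0, 1_000_000))
a = s.to_numpy()

# Candidates: the `|` operator and numpy.logical_or, on a Series and on an array.
candidates = {
    "Series |": lambda: (s < -0.25) | (s > 0.25),
    "Series logical_or": lambda: np.logical_or(s < -0.25, s > 0.25),
    "ndarray |": lambda: (a < -0.25) | (a > 0.25),
    "ndarray logical_or": lambda: np.logical_or(a < -0.25, a > 0.25),
}

for name, func in candidates.items():
    # Average wall-clock time over 20 calls.
    seconds = timeit.timeit(func, number=20) / 20
    print(f"{name:20s} {seconds * 1e3:.3f} ms per call")
```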
I encountered the same error and was stuck with a PySpark DataFrame for a few days. I was able to resolve it by filling the NA values with 0, since I was comparing integer values from two fields.
One minor thing that wasted my time:
Put the conditions (when comparing with `==` or `!=`) in parentheses. Failing to do so also raises this exception.
This will work:
This will not:
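Both snippets are missing from this page. A small sketch of the two cases, with made-up integer columns `a` and `b`, might look like this:

```python
import pandas as pd

# Hypothetical integer columns 'a' and 'b'.
df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 2, 4]})

# Works: each comparison is evaluated first, then combined row by row.
works = df[(df["a"] == 1) | (df["b"] == 2)]
print(works)

# Fails: `|` binds tighter than `==`, so Python reads this as the chained
# comparison  df["a"] == (1 | df["b"]) == 2, which needs bool(Series).
try:
    df[df["a"] == 1 | df["b"] == 2]
except ValueError as exc:
    print("ValueError:", exc)
```

The unparenthesized form fails because a chained comparison is evaluated with an implicit `and`, which is exactly the operation that a Series cannot answer.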
In my case, I was getting this error because of a type mismatch. Make sure the comparison operator is given operands of the same data type.
Another situation where this error may show up is when a pandas cell contains NumPy ndarrays and you want to perform comparisons such as `>`, `==`, etc.
A solution is to convert the column into a proper ndarray before performing the comparison, or to access the values using the .str accessor.
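The answer's own code is not preserved. The sketch below illustrates only the first suggestion (converting to a proper ndarray), and it assumes that every cell holds an array of the same length. The column name `vec` and the threshold are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column 'vec' whose cells each hold a small ndarray.
df = pd.DataFrame({"vec": [np.array([1, 5]), np.array([0, 2]), np.array([7, 8])]})

# Stack the cells into one regular 2-D array of shape (rows, 2).
stacked = np.stack(df["vec"].tolist())

# Element-wise comparison on the 2-D array, reduced to one Boolean per row.
mask = (stacked > 4).any(axis=1)
print(df[mask])
```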
The `or` and `and` Python statements require truth values. For pandas, these are considered ambiguous, so you should use the "bitwise" `|` (or) or `&` (and) operations. These are overloaded for these kinds of data structures to yield the element-wise `or` or `and`.
Just to add some more explanation to this statement:
The exception is thrown when you want to get the `bool` of a pandas.Series.
You hit a place where the operator implicitly converted the operands to `bool` (you used `or`, but it also happens for `and`, `if` and `while`).
Besides these four statements, there are several Python functions that hide some `bool` calls (like `any`, `all`, `filter`, ...). These are normally not problematic with pandas.Series, but for completeness I wanted to mention them.
In your case, the exception isn't really helpful, because it doesn't mention the right alternatives. For `and` and `or`, if you want element-wise comparisons, you can use numpy.logical_or or simply the `|` operator, and numpy.logical_and or simply the `&` operator.
If you're using the operators, be sure to set your parentheses correctly because of operator precedence.
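The snippets that accompanied this explanation are missing, so here is a short sketch of the two points made above: `bool()` of a Series raises, while the NumPy function and the operator both work element-wise. The Series `x` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

x = pd.Series([-0.5, 0.1, 0.3])  # hypothetical values

# bool() of a multi-element Series is what `or`, `and`, `if` and `while`
# call implicitly, and it raises the ValueError.
try:
    bool(x > 0.25)
except ValueError as exc:
    print("ValueError:", exc)

# Element-wise alternatives: the function and the operator give the same result.
via_function = np.logical_or(x < -0.25, x > 0.25)
via_operator = (x < -0.25) | (x > 0.25)
print(via_operator.tolist())
```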
There are several logical NumPy functions which should work on pandas.Series.
The alternatives mentioned in the exception are more suited if you encountered it when doing `if` or `while`. I'll briefly explain each of these:
If you want to check whether your Series is empty (.empty):
Python normally interprets the `len`gth of containers (like `list`, `tuple`, ...) as their truth value if they have no explicit Boolean interpretation. So if you want the Python-like check, you could do `if x.size` or `if not x.empty` instead of `if x`.
If your Series contains one and only one Boolean value (.bool()):
If you want to check the first and only item of your Series (.item(), which is like .bool() but works even for non-Boolean contents):
If you want to check whether all or any items are not zero, not empty or not False (.all() and .any()):
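The example for each alternative is missing from this page. The following is a compact sketch of how they could be used in `if` statements; the three Series are invented for illustration. It leaves out `.bool()`, which recent pandas releases deprecate, and uses `.item()` for the single-element case instead.

```python
import pandas as pd

empty = pd.Series([], dtype=float)
single = pd.Series([True])
many = pd.Series([0, 3, 0])

# .empty: True when the Series has no elements.
if empty.empty:
    print("empty Series")

# .item(): the one and only element, whatever its type;
# it raises ValueError if the Series does not have exactly one element.
if single.item():
    print("single element is truthy")

# .any() / .all(): reduce many elements to one Boolean.
print(many.any())  # is at least one element non-zero?
print(many.all())  # is every element non-zero?
```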
Pandas uses the bitwise operators `&` and `|`. Also, each condition should be wrapped in parentheses ( ).
This works:
But the same query without parentheses does not:
For Boolean logic, use `&` and `|`.
To see what is happening, note that you get a column of Booleans for each comparison, e.g.,
When you have multiple criteria, you will get multiple columns returned. This is why the join logic is ambiguous. Using `and` or `or` treats each column as a whole, so you first need to reduce that column to a single Boolean value, for example by checking whether any value or all values in each of the columns is True.
One convoluted way to achieve the same thing is to zip all of these columns together and perform the appropriate logic.
For more details, refer to Boolean Indexing in the documentation.
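The snippets from this answer did not survive, so the sketch below only illustrates the reasoning with made-up values: each comparison is a Boolean column, reducing it with `.any()` answers a different question, and zipping the columns reproduces what `|` does directly.

```python
import pandas as pd

df = pd.DataFrame({"col": [-0.5, 0.1, 0.3]})  # hypothetical values

# Each comparison returns a whole column of Booleans, not a single value.
low = df["col"] < -0.25
high = df["col"] > 0.25
print(low.tolist(), high.tolist())

# Reducing a column to one Boolean makes Python's `or` well defined,
# but it answers a different question (is any row low, or any row high?).
print(bool(low.any()) or bool(high.any()))

# The convoluted route: zip the columns and apply `or` pair by pair.
zipped = [bool(lo) or bool(hi) for lo, hi in zip(low, high)]

# The element-wise operator gives the same per-row answer.
print(zipped == (low | high).tolist())
```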
This is quite a common question for beginners when combining multiple conditions in pandas. Generally speaking, there are two possible conditions causing this error:
Condition 1: Python Operator Precedence
The "Boolean indexing" section of "Indexing and selecting data" in the pandas documentation explains this: the operators are `|` for or, `&` for and, and `~` for not, and the conditions must be grouped with parentheses, because by default Python evaluates an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
There are some possible ways to get rid of the parentheses, and I will cover them later.
Condition 2: Improper Operator/Statement
As explained in that documentation passage, you need to use `|` for `or`, `&` for `and`, and `~` for `not`.
Another possible situation is that you are using a Boolean Series in an `if` statement.
The Python `if` statement accepts a Boolean-like expression rather than a pandas Series. You should use pandas.Series.any or one of the other methods listed in the error message to convert the Series to a single value according to your needs.
Let's talk about ways to avoid the parentheses in the first situation.
Use pandas mathematical functions
Pandas defines a lot of mathematical functions, including comparisons, as follows:
pandas.Series.lt() for less than;
pandas.Series.gt() for greater than;
pandas.Series.le() for less than or equal;
pandas.Series.ge() for greater than or equal;
pandas.Series.ne() for not equal;
pandas.Series.eq() for equal.
As a result, you can write each comparison as a method call, which needs no extra parentheses around it.
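The line of code that followed here is missing. A plausible sketch for the question's condition, with made-up values in `col`, would be:

```python
import pandas as pd

df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.2, 0.3]})  # hypothetical values

# Method calls bind tighter than `|`, so no grouping parentheses are needed.
outside = df[df["col"].lt(-0.25) | df["col"].gt(0.25)]
print(outside)
```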
Use pandas.Series.between()
If you want to select rows in between two values, you can use pandas.Series.between:
df['col'].between(left, right) is equal to (left <= df['col']) & (df['col'] <= right);
df['col'].between(left, right, inclusive='left') is equal to (left <= df['col']) & (df['col'] < right);
df['col'].between(left, right, inclusive='right') is equal to (left < df['col']) & (df['col'] <= right);
df['col'].between(left, right, inclusive='neither') is equal to (left < df['col']) & (df['col'] < right).
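For the question's "outside the range" filter, one way to apply `between()` is to negate it. This is a sketch with invented values, relying on the default behaviour of including both endpoints:

```python
import pandas as pd

df = pd.DataFrame({"col": [-0.5, -0.25, 0.0, 0.25, 0.3]})  # hypothetical values

# between() is inclusive on both ends by default, so negating it with `~`
# keeps exactly the rows with col < -0.25 or col > 0.25.
inside = df["col"].between(-0.25, 0.25)
outside = df[~inside]
print(outside)
```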
Use pandas.DataFrame.query()
The document referenced before has a chapter, The query() Method, that explains this well.
pandas.DataFrame.query() can help you select rows of a DataFrame with a condition string. Within the query string, you can use both the bitwise operators (`&` and `|`) and their Boolean cousins (`and` and `or`). Moreover, you can omit the parentheses, but I don't recommend it for readability reasons.
Use pandas.DataFrame.eval()
pandas.DataFrame.eval() evaluates a string describing operations on DataFrame columns. Thus, we can use this method to build our multiple conditions. The syntax is the same as with pandas.DataFrame.query().
pandas.DataFrame.query() and pandas.DataFrame.eval() can do more things than I describe here. I recommend reading their documentation and having fun with them.
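The examples for `query()` and `eval()` are missing from this page. A minimal sketch of both, using the question's condition on made-up values:

```python
import pandas as pd

df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.2, 0.3]})  # hypothetical values

# query() returns the matching rows directly; `or` is allowed inside the string.
via_query = df.query("(col < -0.25) or (col > 0.25)")

# eval() returns the Boolean Series, which is then used as a mask.
mask = df.eval("(col < -0.25) or (col > 0.25)")
via_eval = df[mask]

print(via_query)
```

The difference in practice is that `query()` filters in one step, while `eval()` hands back the mask so it can be reused or combined with other conditions.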
Or, alternatively, you could use the operator module. More detailed information is in the Python documentation:
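The example from this answer is not preserved. A sketch of what using the standard-library `operator` module could look like for the question's condition (sample values are made up):

```python
import operator

import pandas as pd

df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.2, 0.3]})  # hypothetical values

# operator.or_(a, b) is the function form of `a | b`, so it stays element-wise.
mask = operator.or_(df["col"] < -0.25, df["col"] > 0.25)
print(df[mask])
```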
This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the query method.
See also Indexing and selecting data.
(Some tests with a DataFrame I'm currently working with suggest that this method is a bit slower than using the bitwise operators on Series of Booleans: 2 ms vs. 870 µs.)
A word of warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2), and wanted to perform the following query:
"(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"
I obtained the following exception cascade:
KeyError: 'log2'
UndefinedVariableError: name 'log2' is not defined
ValueError: "log2" is not a supported function
I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.
A possible workaround is proposed here.