Why does the following behave unexpectedly in Python?
>>> a = 256
>>> b = 256
>>> a is b
True # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False # What happened here? Why is this False?
>>> 257 is 257
True # Yet the literal numbers compare properly
I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.
Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the `is` operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?
Take a look at this:
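Here's what I found in the documentation for "Plain Integer Objects":

"The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object."

So, every 256 is the same object, but two 257s need not be. This is a CPython implementation detail, and not guaranteed for other Python implementations.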
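A small experiment along these lines, as a sketch rather than a guarantee: the helper name `identity_report` is made up for this illustration, and the integers are parsed at run time so that the compiler cannot merge equal literals. On CPython the identity result is typically `True` for 256 and `False` for 257, but only the equality result is promised by the language.

```python
def identity_report(text):
    """Return (equal, identical) for two ints parsed separately from text."""
    first = int(text)
    second = int(text)
    return first == second, first is second

for text in ("256", "257"):
    equal, identical = identity_report(text)
    # "identical" is an implementation detail; "equal" is always True here.
    print(text, "equal:", equal, "identical:", identical)
```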
In summary - let me emphasize: Do not use `is` to compare integers.

This isn't behavior you should have any expectations about.

Instead, use `==` and `!=` to compare for equality and inequality, respectively. For example, write `a == 1000` rather than `a is 1000`.

Explanation

To know this, you need to know the following.
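First, what does `is` do? It is a comparison operator. From the documentation:

"The operators `is` and `is not` test for object identity: `x is y` is true if and only if x and y are the same object."

And so `a is b` and `id(a) == id(b)` are equivalent.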
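The equivalence can be checked directly. This is a minimal sketch; `same_object` is a hypothetical helper name, and comparing `id()` values this way is only meaningful while both objects are alive.

```python
def same_object(a, b):
    """Identity test written with id(); valid while both objects are alive."""
    return id(a) == id(b)

first = [1, 2, 3]
alias = first          # a second name for the same list
copy = list(first)     # an equal but separately created list

print(first is alias, same_object(first, alias))   # True True
print(first is copy, same_object(first, copy))     # False False
print(first == copy)                               # True
```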
From the documentation for `id`:

"Return the "identity" of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same `id()` value."

Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for `id`.

So what is the use-case for `is`? PEP8 describes:

"Comparisons to singletons like None should always be done with is or is not, never the equality operators."

The Question

You ask, and state, the following question (with code): why do the `is` checks in your session behave unexpectedly? Taking your three results in turn:

For `a = 256` and `b = 256`, `a is b` being `True` is not an expected result. Why would it be expected? It only means that the integers valued at `256` referenced by both `a` and `b` are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.

But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.

For `a = 257` and `b = 257`, `a is b` is `False`: it looks like we now have two separate instances of integers with the value of `257` in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.

As for `257 is 257` being `True`: well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.

It might be even better if CPython could do this globally, provided it could do so cheaply (there would be a cost in the lookup); perhaps another implementation might.

But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. `==`.

What `is` does

`is` checks that the `id` of two objects are the same. In CPython, the `id` is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code: `a is b` is the same as `id(a) == id(b)`.

Why would we want to use `is` then?

This can be a very fast check relative to, say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for `None`, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with `is`, but these are relatively rare. Here's an example (will work in Python 2 and 3):
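The answer's own code did not survive on this page, so what follows is a reconstruction of the idiom under assumptions, not the original listing: the sentinel name `_SENTINEL` and the returned messages are invented here, while `bar` is the function name the answer uses.

```python
_SENTINEL = object()  # created exactly once; no other object can be this one

def bar(keyword_argument=_SENTINEL):
    """Report whether bar() got no argument, or got one (possibly None)."""
    if keyword_argument is _SENTINEL:
        return "no argument given to bar"
    return "argument to bar: {0!r}".format(keyword_argument)

print(bar())        # no argument given to bar
print(bar(None))    # argument to bar: None
print(bar("baz"))   # argument to bar: 'baz'
```

Because `None` is a legitimate argument here, a default of `None` could not tell the first two calls apart; an identity check against a private object can.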
And so we see, with `is` and a sentinel, we are able to differentiate between when `bar` is called with no arguments and when it is called with `None`. These are the primary use-cases for `is` - do not use it to test for equality of integers, strings, tuples, or other things like these.
I'm late but, you want some source with your answer? I'll try and word this in an introductory manner so more folks can follow along.

A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.

In CPython, the C-API function that handles creating a new `int` object is `PyLong_FromLong(long v)`. The description for this function is:

"The current implementation *keeps an array of integer objects for all integers between -5 and 256*, when you create an int in that range you actually just get back a reference to the existing object."

(My italics)
Don't know about you but I see this and think: Let's find that array!

If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the `Objects` subdirectory of the main source code directory tree.

`PyLong_FromLong` deals with `long` objects so it shouldn't be hard to deduce that we need to peek inside `longobject.c`. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:

Now, we're no C master-code-haxxorz but we're also not dumb, we can see that `CHECK_SMALL_INT(ival);` peeking at us all seductively; we can understand it has something to do with this. Let's check it out:

So it's a macro that calls function `get_small_int` if the value `ival` satisfies the condition `-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS`.

So what are `NSMALLNEGINTS` and `NSMALLPOSINTS`? Macros! `NSMALLNEGINTS` is defined as `5` and `NSMALLPOSINTS` as `257`.

So our condition is `if (-5 <= ival && ival < 257)` call `get_small_int`.

Next let's look at `get_small_int` in all its glory (well, we'll just look at its body because that's where the interesting things are):

Okay, declare a `PyObject`, assert that the previous condition holds and execute the assignment, which indexes into `small_ints`.

`small_ints` looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've known all along: it is the array of integer objects for all integers between -5 and 256.

So yup, this is our guy. When you want to create a new `int` in the range `[-NSMALLNEGINTS, NSMALLPOSINTS)` you'll just get back a reference to an already existing object that has been preallocated.

Since the reference refers to the same object, issuing `id()` directly or checking for identity with `is` on it will return exactly the same thing.

But, when are they allocated??

During initialization in `_PyLong_Init` Python will gladly enter a for loop to do this for you:

Check out the source to read the loop body!

I hope my explanation has made you C things clearly now (pun obviously intended).
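The logic traced through the C source above can be modelled in a few lines of Python. This is a toy model only, not CPython's code: `ToyInt` and `toy_from_long` are invented names, while the two constants and the `small_ints` list mirror the C macros and array named in the walkthrough.

```python
# Toy model of the small-int cache; the real implementation is C in longobject.c.
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257

class ToyInt:
    """Stand-in for PyLongObject; holds one integer value."""
    def __init__(self, value):
        self.value = value

# Preallocated once, playing the role of the loop in _PyLong_Init.
small_ints = [ToyInt(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]

def toy_from_long(ival):
    """Return the cached object for small values, otherwise a fresh one."""
    if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:     # the CHECK_SMALL_INT condition
        return small_ints[ival + NSMALLNEGINTS]    # the lookup get_small_int does
    return ToyInt(ival)

print(toy_from_long(256) is toy_from_long(256))   # True: same cached object
print(toy_from_long(257) is toy_from_long(257))   # False: two fresh objects
```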
But, `257 is 257`? What's up?

This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:

During compilation of this statement, CPython will see that you have two matching literals and will use the same `PyLongObject` representing `257`. You can see this if you do the compilation yourself and examine its contents:

When CPython does the operation, it's now just going to load the exact same object:

So `is` will return `True`.
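One way to do that compilation yourself, as a sketch: it uses two assignments instead of the literal comparison, because newer Python versions warn about `is` with a literal, and the file name `"<demo>"` and variable names are arbitrary. On CPython the constants tuple normally contains `257` only once and the identity check prints `True`; neither is guaranteed by the language.

```python
source = "a = 257\nb = 257\n"
code = compile(source, "<demo>", "exec")   # both literals in one compilation unit

namespace = {}
exec(code, namespace)

print(code.co_consts)                      # constants stored for this unit
print(code.co_consts.count(257))           # how many times 257 was stored
print(namespace["a"] is namespace["b"])    # implementation detail, not a promise
```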
It depends on whether you're looking to see if two things are equal, or the same object.

`is` checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency.

You should use `==` to compare equality of arbitrary objects. You can specify the behavior with the `__eq__` and `__ne__` methods.
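As an illustration of specifying that behavior, here is a minimal sketch; the `Money` class is hypothetical and exists only for this example.

```python
class Money:
    """Hypothetical value class: two instances are equal if their cents match."""
    def __init__(self, cents):
        self.cents = cents

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return self.cents == other.cents

    def __ne__(self, other):   # required on Python 2; Python 3 can derive it
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

first = Money(25700)
second = Money(25700)
print(first == second)   # True: same value
print(first != second)   # False
print(first is second)   # False: two separate objects
```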
As you can check in the source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so it is created as a separate object.

It is better to use `==` for that purpose.
I think your hypothesis is correct. Experiment with `id` (identity of object):

It appears that numbers `<= 255` are treated as literals and anything above is treated differently!
There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty `tuple`, `str`, and `bytes`, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example, in CPython every empty tuple is the same object.

But also, even non-pre-created values can be identical. Consider these examples, where `c` and `d` are given equal values in two separate statements and `e` and `f` in a single statement:

And this isn't limited to `int` values; the same happens with a `float` such as `42.23e100`.

Obviously, CPython doesn't come with a pre-created `float` value for `42.23e100`. So, what's going on here?

The CPython compiler will merge constant values of some known-immutable types like `int`, `float`, `str`, `bytes`, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since `c` and `d` are defined in separate statements, their values aren't merged. Since `e` and `f` are defined in the same statement, their values are merged.

You can see what's going on by disassembling the bytecode. Try defining a function that does `e, f = 128, 128` and then calling `dis.dis` on it, and you'll see that there's a single constant value `(128, 128)`.
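The effect of compilation units can be reproduced outside the interactive interpreter with `compile` and `exec`. This is a sketch under assumptions: `bind` is a made-up helper, and the identity results in the comments describe what CPython usually does, not what any implementation must do.

```python
def bind(*sources):
    """Compile each source string as its own unit and run all in one namespace."""
    namespace = {}
    for source in sources:
        exec(compile(source, "<demo>", "exec"), namespace)
    return namespace

one_unit = bind("c = 42.23e100\nd = 42.23e100\n")
two_units = bind("c = 42.23e100\n", "d = 42.23e100\n")

print(one_unit["c"] is one_unit["d"])     # usually True on CPython: merged constant
print(two_units["c"] is two_units["d"])   # usually False on CPython: separate units
print(one_unit["c"] == two_units["d"])    # True: equality never depends on merging
```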
You may notice that the compiler has stored `128` as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. Which means that (non-empty) tuples actually don't end up merged:

Put that in a function, `dis` it, and look at the `co_consts`: there's a `1` and a `2`, two `(1, 2)` tuples that share the same `1` and `2` but are not identical, and a `((1, 2), (1, 2))` tuple that has the two distinct equal tuples.

There's one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn't restricted to source code literals:

On the other hand, it is limited to the `str` type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.

At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
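Interning can also be requested explicitly, which is the one case where identity of equal strings is documented. The sketch below uses `sys.intern` (Python 3; Python 2 has a built-in `intern`) on strings built at run time. It shows explicit interning only and says nothing about which strings CPython chooses to intern on its own.

```python
import sys

parts = ["py", "thon"]
built_a = "".join(parts)   # built at run time, not a source literal
built_b = "".join(parts)

print(built_a == built_b)                           # True
print(sys.intern(built_a) is sys.intern(built_b))   # True: one interned object
```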
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:

- Do not write code that assumes two equal but separately created immutable values are identical (don't use `x is y`, use `x == y`)
- Do not write code that assumes two equal but separately created immutable values are distinct (don't use `x is not y`, use `x != y`)

Or, in other words, only use `is` to test for the documented singletons (like `None`) or that are only created in one place in the code (like the `_sentinel = object()` idiom).
For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.
`is` is the identity equality operator (functioning like `id(a) == id(b)`); it's just that two equal numbers aren't necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).

PHP's `===` operator, on the other hand, is described as checking equality and type: `x == y and type(x) == type(y)` as per Paulo Freitas' comment. This will suffice for common numbers, but differs from `is` for classes that define `__eq__` in an absurd manner, for example one whose `__eq__` always returns `False`.

PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's `Now` instead of showing that it is an evaluation with `time.time()` I don't know.

Greg Hewgill (OP) made one clarifying comment: "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."
This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with `==` or `is`. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.

We could try to use `isinstance` with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built in number classes have a base class `numbers.Number`, but it has the same problem: number types defined elsewhere are not guaranteed to be registered with it.

By the way, NumPy will produce separate instances of low numbers.
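One possible sketch of the comparison the OP describes, using the `numbers.Number` check discussed above despite its known incompleteness. The helper name `same_thing` is hypothetical, and a number type that is not registered with `numbers.Number` would fall through to the identity branch.

```python
import numbers

def same_thing(a, b):
    """Hypothetical helper: value equality for numbers, identity otherwise."""
    if isinstance(a, numbers.Number) and isinstance(b, numbers.Number):
        return a == b
    return a is b

shared = []
print(same_thing(int("257"), int("257")))   # True: numbers compare by value
print(same_thing(shared, shared))           # True: the very same list
print(same_thing([], []))                   # False: equal lists, different objects
```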
I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call `PyNumber_Check`, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.

In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's `number?`, or Haskell's type class Num. `is` checks object identity, not value equality. PHP has a colorful history as well, where `===` apparently behaves as `is` only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).
It also happens with strings:
Now everything seems fine.
That's expected too.
Now that's unexpected.
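What's New In Python 3.8: Changes in Python behavior: the compiler now produces a SyntaxWarning when identity checks (`is` and `is not`) are used with certain types of literals, such as strings and numbers, and advises using equality tests (`==` and `!=`) instead.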
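The warning can be observed without running any comparison by compiling a snippet and recording what the compiler reports. This is a sketch assuming Python 3.8 or later; the helper name `literal_identity_warnings` and the file name `"<demo>"` are arbitrary, and the exact message text differs between versions.

```python
import warnings

def literal_identity_warnings(source):
    """Compile source and return the SyntaxWarning messages it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")      # make sure nothing is filtered out
        compile(source, "<demo>", "eval")
    return [str(w.message) for w in caught
            if issubclass(w.category, SyntaxWarning)]

print(literal_identity_warnings("x is 257"))   # at least one message on 3.8+
print(literal_identity_warnings("x == 257"))   # []
```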