"is" operator behaves unexpectedly with integers

Posted on 2024-07-09 08:48:05, 571 words, 17 views, 0 comments

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?
>>> 257 is 257
True           # Yet the literal numbers compare properly

I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.

Based on the above, I can hypothesize that Python is internally implemented such that "small" integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don't know in advance whether they are numbers or not?

Comments (11)

萌面超妹 2024-07-16 08:48:05

Take a look at this:

>>> a = 256
>>> b = 256
>>> id(a) == id(b)
True
>>> a = 257
>>> b = 257
>>> id(a) == id(b)
False

Here's what I found in the documentation for "Plain Integer Objects":

The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.

So any two integers 256 are the same object, but two 257s are not. This is a CPython implementation detail, and not guaranteed for other Python implementations.
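A quick way to poke at this yourself, as a rough sketch: build the integers at run time with int() on a string, so the compiler can't share one constant between them. The helper name same_object is my own, and the results noted in the comments are what CPython's small-integer cache gives; other implementations are free to differ.

```python
def same_object(text):
    """Parse the same digits twice and report whether both results are one object."""
    a = int(text)
    b = int(text)
    return a is b

# Equality never depends on caching.
print(int("257") == int("257"))   # True

# Identity does: on CPython the first call is served from the small-int cache,
# while the second creates two separate int objects.
print(same_object("256"))   # CPython: True
print(same_object("257"))   # CPython: False
```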

凉城凉梦凉人心 2024-07-16 08:48:05

Python's “is” operator behaves unexpectedly with integers?

In summary - let me emphasize: Do not use is to compare integers.

This isn't behavior you should have any expectations about.

Instead, use == and != to compare for equality and inequality, respectively. For example:

>>> a = 1000
>>> a == 1000       # Test integers like this,
True
>>> a != 5000       # or this!
True
>>> a is 1000       # Don't do this! - Don't use `is` to test integers!!
False

Explanation

To know this, you need to know the following.

First, what does is do? It is a comparison operator. From the documentation:

The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.

And so the following are equivalent.

>>> a is b
>>> id(a) == id(b)
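To see the two comparisons agree on objects that are both still alive, here is a small sketch with lists (the variable names are just for illustration). Lists are mutable, so each list display is guaranteed to build a new object and the result doesn't depend on any caching.

```python
a = [1, 2, 3]
b = a            # second name for the same list object
c = [1, 2, 3]    # equal contents, but a freshly built list

print(a is b, id(a) == id(b))   # True True
print(a is c, id(a) == id(c))   # False False
print(a == c)                   # True
```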

From the documentation:

id
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.

Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.

So what is the use-case for is? PEP8 describes:

Comparisons to singletons like None should always be done with is or
is not, never the equality operators.

The Question

You ask, and state, the following question (with code):

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result

It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.

But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.

>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?

Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let's hope we're not wasting a lot of it. We're probably not. But this behavior is not guaranteed.

>>> 257 is 257
True           # Yet the literal numbers compare properly

Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the reference implementation of Python, which is CPython. Good for CPython.

It might be even better if CPython could do this globally, provided it could do so cheaply (as there would be a cost in the lookup); perhaps another implementation might.

But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.

What is does

is checks that the ids of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:

>>> a is b

is the same as

>>> id(a) == id(b)

Why would we want to use is then?

This can be a very fast check relative to, say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here's an example (it will work in Python 2 and 3):

SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()

Which prints:

no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz

And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is - do not use it to test for equality of integers, strings, tuples, or other things like these.

你的他你的她 2024-07-16 08:48:05

I'm late but, you want some source with your answer? I'll try and word this in an introductory manner so more folks can follow along.


A good thing about CPython is that you can actually see the source for this. I'm going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.

In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)

(My italics)

Don't know about you but I see this and think: Let's find that array!

If you haven't fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.

PyLong_FromLong deals with long objects so it shouldn't be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we're looking for is chilling at line 230 waiting for us to check it out. It's a smallish function so the main body (excluding declarations) is easily pasted here:

PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v;
    }

    // omitting the multi-digit path that follows
}

Now, we're no C master-code-haxxorz but we're also not dumb; we can see CHECK_SMALL_INT(ival); peeking at us all seductively, and we can understand it has something to do with this. Let's check it out:

#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)

So it's a macro that calls function get_small_int if the value ival satisfies the condition:

if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)

So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

So our condition is if (-5 <= ival && ival < 257) call get_small_int.

Next let's look at get_small_int in all its glory (well, we'll just look at its body because that's where the interesting things are):

PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);

Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:

v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

small_ints looks a lot like that array we've been searching for, and it is! We could've just read the damn documentation and we would've known all along:

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

So yup, this is our guy. When you want to create a new int in the range [NSMALLNEGINTS, NSMALLPOSINTS) you'll just get back a reference to an already existing object that has been preallocated.

Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.
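If the C is hard to follow, the same lookup can be modelled in a few lines of Python. This is only a toy model, not how CPython is written: SmallInt, small_ints and from_long are made-up names standing in for PyLongObject, the small_ints array and PyLong_FromLong.

```python
NSMALLNEGINTS = 5
NSMALLPOSINTS = 257

class SmallInt:
    """Stand-in for PyLongObject: it just wraps a value."""
    def __init__(self, value):
        self.value = value

# Preallocate one object per value in [-5, 257), once, up front.
small_ints = [SmallInt(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]

def from_long(ival):
    """Return the shared object for small values, a new object otherwise."""
    if -NSMALLNEGINTS <= ival < NSMALLPOSINTS:
        # Same index arithmetic as get_small_int.
        return small_ints[ival + NSMALLNEGINTS]
    return SmallInt(ival)

print(from_long(256) is from_long(256))   # True: both come from the array
print(from_long(257) is from_long(257))   # False: two new objects
print(from_long(-5) is small_ints[0])     # True: -5 sits at index 0
```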

But, when are they allocated??

During initialization in _PyLong_Init Python will gladly enter in a for loop to do this for you:

for (ival = -NSMALLNEGINTS; ival <  NSMALLPOSINTS; ival++, v++) {

Check out the source to read the loop body!

I hope my explanation has made you C things clearly now (pun obviously intended).


But, 257 is 257? What's up?

This is actually easier to explain, and I have attempted to do so already; it's due to the fact that Python will execute this interactive statement as a single block:

>>> 257 is 257

During compilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:

>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)

When CPython does the operation, it's now just going to load the exact same object:

>>> import dis
>>> dis.dis(codeObj)
  1           0 LOAD_CONST               0 (257)   # dis
              3 LOAD_CONST               0 (257)   # dis again
              6 COMPARE_OP               8 (is)

So is will return True.
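The same sharing can be checked without reading bytecode. A small sketch (the source string and the "<demo>" filename are arbitrary): compile two assignments of 257 as one unit, count how often 257 appears among the constants, then run the code. The comments show what I'd expect from CPython; none of this is promised by the language.

```python
source = "a = 257\nb = 257\n"
code = compile(source, "<demo>", "exec")

# Both assignments refer to a single entry in the constants table.
print(code.co_consts.count(257))   # CPython: 1

namespace = {}
exec(code, namespace)

# Both names were loaded from that one constant.
print(namespace["a"] is namespace["b"])   # CPython: True
```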

ˇ宁静的妩媚 2024-07-16 08:48:05

It depends on whether you're looking to see if 2 things are equal, or the same object.

is checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency

In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144

You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__ and __ne__ methods.
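As a minimal sketch of that last point, here is a hypothetical Point class (not from any library) that defines __eq__, so two separately built points compare equal even though they are different objects. In Python 3, != falls back to the inverse of __eq__ when __ne__ isn't defined.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    # Defining __eq__ disables the default hash; restore one that agrees with it.
    def __hash__(self):
        return hash((self.x, self.y))

p = Point(1, 2)
q = Point(1, 2)

print(p == q)   # True: same value
print(p != q)   # False
print(p is q)   # False: two separate objects
```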

岁月打碎记忆 2024-07-16 08:48:05

As you can check in the source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring to the cached small integer, not a new object. 257 is not a small integer, so it is created as a different object.

It is better to use == for that purpose.

流绪微梦 2024-07-16 08:48:05

I think your hypothesis is correct. Experiment with id (identity of object):

In [1]: id(255)
Out[1]: 146349024

In [2]: id(255)
Out[2]: 146349024

In [3]: id(257)
Out[3]: 146802752

In [4]: id(257)
Out[4]: 148993740

In [5]: a=255

In [6]: b=255

In [7]: c=257

In [8]: d=257

In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)

It appears that small numbers such as 255 always map to one shared object, whereas 257 gets a new object each time it is created!

巷子口的你 2024-07-16 08:48:05

There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.


For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:

>>> a = ()
>>> b = ()
>>> a is b
True

But also, even non-pre-created values can be identical. Consider these examples:

>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True

And this isn't limited to int values:

>>> g, h = 42.23e100, 42.23e100
>>> g is h
True

Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?

The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
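One way to reproduce the compilation-unit effect outside the interactive prompt, as a sketch assuming CPython: compile each statement separately with compile() to mimic the REPL, and compare that with a single statement that mentions the value twice. The run helper and the value 1000003 are my own choices.

```python
def run(source, namespace):
    """Compile source as its own compilation unit and execute it."""
    exec(compile(source, "<demo>", "exec"), namespace)

ns = {}
run("c = 1000003", ns)               # unit 1
run("d = 1000003", ns)               # unit 2, which gets its own constant
run("e, f = 1000003, 1000003", ns)   # one unit, so the constants are merged

print(ns["c"] == ns["d"], ns["c"] is ns["d"])   # CPython: True False
print(ns["e"] == ns["f"], ns["e"] is ns["f"])   # CPython: True True
```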


You can see what's going on by disassembling the bytecode. Try defining a function that does e, f = 128, 128 and then calling dis.dis on it, and you'll see that there's a single constant value (128, 128)

>>> import dis
>>> def f(): i, j = 128, 128
...
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((128, 128))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> consts = f.__code__.co_consts
>>> id(consts[1]), id(consts[2][0]), id(consts[2][1])
(4305296480, 4305296480, 4305296480)

You may notice that the compiler has stored 128 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. It also means that, at least in the CPython 3.6 used here, (non-empty) tuples don't end up merged:

>>> k, l = (1, 2), (1, 2)
>>> k is l
False

Put that in a function, dis it, and look at the co_consts—there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.


There's one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn't restricted to source code literals:

>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True

On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
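Interning can also be requested explicitly through sys.intern, which is the one case where two equal strings are documented to end up as the same object. A short sketch follows; the strings are built at run time with join so the compiler can't merge them, and the variable names are illustrative. Even so, == remains the right comparison in ordinary code.

```python
import sys

parts = ["some", "str", "with spaces"]
s1 = " ".join(parts)
s2 = " ".join(parts)

print(s1 == s2)   # True: equal text

# sys.intern returns the one canonical object for a given string value.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)   # True
```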


At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.

It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rule is:

  • Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
  • Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)

Or, in other words, only use is to test for the documented singletons (like None) or for objects that are only created in one place in the code (like the _sentinel = object() idiom).

是你 2024-07-16 08:48:05

For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It's better to think about equality. Identity is essentially an implementation detail for value objects - since they're immutable, there's no effective difference between having multiple refs to the same object or multiple objects.

逐鹿 2024-07-16 08:48:05

is is the identity equality operator (functioning like id(a) == id(b)); it's just that two equal numbers aren't necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).

PHP's === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y) as per Paulo Freitas' comment. This will suffice for common numbers, but differ from is for classes that define __eq__ in an absurd manner:

class Unequal:
    def __eq__(self, other):
        return False

PHP apparently allows the same thing for "built-in" classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it's used as a number. Quite why you'd want to emulate Visual Basic's Now instead of showing that it is an evaluation with time.time() I don't know.
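A quick sketch of how is and == part ways for the Unequal class (the variable name u is mine):

```python
class Unequal:
    def __eq__(self, other):
        return False

u = Unequal()

print(u is u)   # True: identity ignores __eq__
print(u == u)   # False: equality asks __eq__
print(u != u)   # True on Python 3, where != defaults to the inverse of __eq__
```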

Greg Hewgill (OP) made one clarifying comment "My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value."

This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.

We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built-in number classes have a base class numbers.Number, but it has the same problem:

import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)

By the way, NumPy will produce separate instances of low numbers.

I don't actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it's certainly not portable. We'll just have to be less particular about what we test for now.

In the end, this issue stems from Python not originally having a type tree with predicates like Scheme's number?, or Haskell's type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).

記憶穿過時間隧道 2024-07-16 08:48:05

It also happens with strings:

>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

Now everything seems fine.

>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

That's expected too.

>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)

>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)

Now that's unexpected.

等风来 2024-07-16 08:48:05

What’s New In Python 3.8: Changes in Python behavior:

The compiler now produces a SyntaxWarning when identity checks (is and
is not) are used with certain types of literals (e.g. strings, ints).
These can often work by accident in CPython, but are not guaranteed by
the language spec. The warning advises users to use equality tests (==
and !=) instead.
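You can observe the warning programmatically. This sketch assumes Python 3.8 or newer and uses only the standard warnings module; the source string and the "<demo>" filename are arbitrary.

```python
import warnings

source = "x = 1000\nresult = x is 1000\n"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")      # make sure nothing is suppressed
    compile(source, "<demo>", "exec")    # the warning is issued at compile time

for w in caught:
    print(w.category.__name__, w.message)

has_warning = any(issubclass(w.category, SyntaxWarning) for w in caught)
print(has_warning)
```

The exact wording of the message differs between Python versions, so the sketch checks only the warning category.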
