Does Python intern strings?

Posted 2025-01-10 19:33:22

In Java, explicitly declared Strings are interned by the JVM, so subsequent declarations of the same String result in two references to the same String instance rather than two separate (but identical) Strings.

For example:

public String baz() {
    String a = "astring";
    return a;
}

public String bar() {
    String b = "astring";
    return b;
}

public void main() {
    String a = baz();
    String b = bar();
    assert a == b; // passes: both refer to the same interned literal
}

My question is, does CPython (or any other Python runtime) do the same thing for strings? For example, if I have some class:

class example():
    def __init__(self):
        self._inst = 'instance'

And create 10 instances of this class, will each one of them have an instance variable referring to the same string in memory, or will I end up with 10 separate strings?

Comments (4)

天生の放荡 2025-01-17 19:33:22

This is called interning, and yes, Python does this to some extent for shorter strings created as string literals. See "About the changing id of an immutable string" for some discussion.

Interning is runtime-dependent; there is no standard for it. Interning is always a trade-off between memory use and the cost of checking whether you are creating a string that already exists. If you are so inclined, you can force the issue with the sys.intern() function, whose documentation describes some of the interning Python does for you automatically:

Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

Note that in Python 2, intern() was a built-in function, so no import was necessary.
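
As a rough illustration of forcing the issue, here is a minimal Python 3 sketch. The helper build_key is hypothetical and exists only to produce strings at runtime rather than as literals; the only behaviour relied on is that sys.intern() returns one canonical object per string value.

```python
import sys

def build_key(parts):
    """Hypothetical helper: assemble a string at runtime, not as a literal."""
    return " ".join(parts)

plain_a = build_key(["hello", "world"])
plain_b = build_key(["hello", "world"])

# Equal values, but nothing guarantees they are the same object.
print(plain_a == plain_b)

interned_a = sys.intern(plain_a)
interned_b = sys.intern(plain_b)

# sys.intern() hands back the one canonical object for a given value.
print(interned_a is interned_b)
```

Whether plain_a and plain_b happen to be the same object is left to the runtime, which is why the sketch only compares them by value.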

北城孤痞 2025-01-17 19:33:22

A fairly easy way to tell is to use id(). However, as @MartijnPieters mentions, the result is runtime-dependent.

class example():

    def __init__(self):
        self._inst = 'instance'

# Python 3 syntax: identical ids mean every instance refers to the same string object
for i in range(10):
    print(id(example()._inst))

戴着白色围巾的女孩 2025-01-17 19:33:22

Some strings are interned in Python. When Python code is compiled, identifiers such as variable names, function names, and class names are interned.

String literals that follow the identifier rules (they start with an underscore or a letter and contain only underscores, letters, and digits) are interned as well:

a="hello"
b="hello"

Since strings are immutable, Python can safely share a single object here, so

a is b ===> True

But if we had

a="hello world"
b="hello world"

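since "hello world" contains a space, it does not meet the identifier rules and is not interned automatically. In CPython, if the two assignments are compiled separately (for example, typed one at a time at the interactive prompt), a and b typically end up as separate objects: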

a is b  ===> False

You can intern those with sys.intern(). Use this method if you have a lot of string repetition in your code.

import sys

a=sys.intern("hello world")
b=sys.intern("hello world")

Now
a is b ===> True
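
To make the "lot of string repetition" case concrete, here is a small sketch under assumed inputs. The record format and the parse_tag helper are invented for illustration; the point is only that interning collapses repeated values into one object per distinct value.

```python
import sys

# Hypothetical raw input: the same few tags repeated many times.
raw_records = ["status=ok", "status=failed", "status=ok"] * 1000

def parse_tag(record):
    """Return the value part; splitting builds a new string at runtime."""
    return record.split("=", 1)[1]

plain = [parse_tag(r) for r in raw_records]
interned = [sys.intern(parse_tag(r)) for r in raw_records]

# Number of distinct string objects kept alive by each list.
print(len({id(s) for s in plain}))      # runtime-dependent
print(len({id(s) for s in interned}))   # 2: one object per distinct value
```

The two lists compare equal; only the number of underlying objects differs, which is where any memory saving would come from.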

带刺的爱情 2025-01-17 19:33:22
  • All strings of length 0 and length 1 are interned.
  • Strings are interned at compile time ('wtf' will be interned, but ''.join(['w', 't', 'f']) will not be).
  • Strings that are not composed solely of ASCII letters, digits, or underscores are not interned. This is why 'wtf!' is not interned: it contains '!'.
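
The compile-time versus runtime distinction in the list above can be sketched as follows. This is an illustration only, and the variable names are arbitrary. The identity of the joined string is an implementation detail, so the sketch prints it without relying on it.

```python
import sys

literal = 'wtf'                      # known to the compiler
joined = ''.join(['w', 't', 'f'])    # assembled only when this line runs

print(joined == literal)             # equal values
print(joined is literal)             # implementation detail; do not rely on it

# Interning the runtime string yields the canonical object for 'wtf'.
canonical = sys.intern(joined)
print(canonical is sys.intern(literal))
```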

https://www.codementor.io/satwikkansal/do-you-really-think-you-know-strings-in-python-fnxh8mtha

The article above explains string interning in Python. It also spells out some exceptions to these rules.
