字符串驻留和文字字符串声明的搜索成本

发布于 2024-10-18 05:46:21 字数 873 浏览 2 评论 0原文

两个问题。

  1. 当我们声明文字字符串时,我们会在堆的字符串池中查找是否有相同的字符串。 这也是一个interning(String类的方法intern)吗?

  2. 在我看来,每个文字字符串声明都需要二进制搜索或其他东西,所以它的成本至少< em>log(n) 当 n 是池中现有字符串的数量时。而且如果池里的字符串很多的话,可能成本会很高。 (也许是搜索成本和内存的权衡?)从这个角度来看,声明 mant 文字字符串可能是危险的。 这种搜索成本有多重要以及为什么java以这种方式设计(声明文字字符串时搜索池)。

以下是我提到的理解背景。


java.lang.String< 的 JavaDoc /code> 类 状态:

字符串是常量;它们的值在创建后就无法更改。字符串缓冲区支持可变字符串。因为 String 对象是不可变的,所以它们可以共享。

http://www.janeg.ca/scjp/lang/strLiteral.html评论:

换句话说,因为编译器知道字符串的原始值一旦创建就无法更改,因此它可以安全地使用现有数据并避免重复项造成内存混乱。

Two Questions.

  1. When we declare literal strings, we search whether there is the same string in string pool of heap. Is this also an interning (method intern of class String)?

  2. In my thought, each literal string declaration needs a binary search or something so it costs at least log(n) when n is number of existing strings in the pool. And if there are many strings in the pool, it may be high cost. (maybe tradeoff of searching cost and memory?) On this point of view, it might be dangerous to declare mant literal strings.
    How significant is this searching cost and why java is designed in this way(searching pool when literal strings are declared).

Following is what I referred to understand background.


The JavaDoc for the java.lang.String class states:

Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared.

http://www.janeg.ca/scjp/lang/strLiteral.html comments:

In other words, because the compiler knows the strings original value cannot be changed once it's created it can safely use existing data and avoid cluttering up memory with duplicates.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

鲜血染红嫁衣 2024-10-25 05:46:21

您将编译时间复杂性与运行时复杂性混淆了。

当加载类时,是的,它会进行搜索以查看每个文字是否已经存在(尽管我想它会使用哈希图进行 O(1) 查找而不是您的建议)。

当代码运行时,它会引用内存中的字符串,因此与非文字相比没有额外的成本。

所以是的,文字被拘留了。根据 String 的 Javadoc,

字符串池最初是空的,由 String 类私有维护。

您可以对字符串调用 intern() 将其添加到此池中。从逻辑上讲,如果 a.equals(b)a.intern() == b.intern(),因为 .intern() > 保证从唯一的池中返回。

示例:

class InternTest {
    // assuming InternTest is the only class, internPool.size = 0
    String x = "ABC"; // interned at class load, internPool.size = 1
    String y = "DEF"; // interned at class load, internPool.size = 2
    String z = "ABC"; // interned at class load, but match found - size = 2 still

    void foo() {
        // random int is just a mechanism to get something that I know won't
        // be interned at loadtime - could have loaded from file or database too
        int i = (new java.util.Random()).nextInt(1000) + 100;
        int j = i;
        String s = String.valueOf(i); // not yet interned, size = 2 still
        String t = String.valueOf(j); // not yet interned, size = 2 still

        String sIntern = s.intern(); // manually interned, size = 3 now
        String tIntern = t.intern(); // manually interned, match found, size = 3 still

        System.out.println("equals: " + (s.equals(t))); // should be true
        System.out.println("== raw: " + (s == t)); // should be false, different variables
        System.out.println("== int: " + (sIntern == tIntern)); // should be true, from unique pool

       System.out.println("x and z: " + (x == z)); // should be true, interned at class load
    }

    public static void main(String[] args) {
        (new InternTest()).foo();
    }

}

运行时的结果:

C:\Documents and Settings\glowcoder\My Documents>java InternTest
equals: true
== raw: false
== int: true
x and z: true

我应该指出该假设永远不会成立。 Java 语言本身有许多 String,在我们的 String 有机会出现之前,它们会被保留。然而,假设所有内容都是按顺序加载的,如果您只考虑被实习生的字符串增量,并假设与现有实习生没有冲突(我们都知道实习生可能很挑剔并且充满戏剧性,对吧?窃笑)那么这些数字确实表示字符串池大小的增量。

You're confusing compile time complexity with runtime complexity.

When the class is loaded, yes it does a search to see if each literal already exists (although I imagine it would use a hashmap for O(1) lookup instead of your proposal).

When the code runs, it has the reference to the string in memory so there is no additional cost than a non-literal.

So yes, literals are interned. According to the Javadoc for String,

A pool of strings, initially empty, is maintained privately by the class String.

You can invoke intern() on a String to add it to this pool. It logically follows that if a.equals(b) then a.intern() == b.intern(), since .intern() guarantees to return from the a unique pool.

Example:

class InternTest {
    // assuming InternTest is the only class, internPool.size = 0
    String x = "ABC"; // interned at class load, internPool.size = 1
    String y = "DEF"; // interned at class load, internPool.size = 2
    String z = "ABC"; // interned at class load, but match found - size = 2 still

    void foo() {
        // random int is just a mechanism to get something that I know won't
        // be interned at loadtime - could have loaded from file or database too
        int i = (new java.util.Random()).nextInt(1000) + 100;
        int j = i;
        String s = String.valueOf(i); // not yet interned, size = 2 still
        String t = String.valueOf(j); // not yet interned, size = 2 still

        String sIntern = s.intern(); // manually interned, size = 3 now
        String tIntern = t.intern(); // manually interned, match found, size = 3 still

        System.out.println("equals: " + (s.equals(t))); // should be true
        System.out.println("== raw: " + (s == t)); // should be false, different variables
        System.out.println("== int: " + (sIntern == tIntern)); // should be true, from unique pool

       System.out.println("x and z: " + (x == z)); // should be true, interned at class load
    }

    public static void main(String[] args) {
        (new InternTest()).foo();
    }

}

Results when run:

C:\Documents and Settings\glowcoder\My Documents>java InternTest
equals: true
== raw: false
== int: true
x and z: true

I should point out that the assumption will never be true. The Java language itself has many Strings that would be interned before our Strings ever got the chance to see the light of day. However, assuming everything is loaded sequentially, if you consider only the delta of Strings being interned, and assume no collisions with existing interns (we all know interns can be fussy and full of drama, right? snicker) then the numbers do indeed indicate the delta of the size of the string pool.

っ左 2024-10-25 05:46:21

1 - 当我们声明文字字符串时,我们会在堆的字符串池中搜索是否有相同的字符串。这也是实习(String类的方法实习生)吗?

是的。这个过程称为实习。然而,它只发生一次......当包含文字的类被加载时。

2 - 在我看来,每个文字字符串声明都需要进行二分搜索或其他操作,因此当 n 是池中现有字符串的数量时,它至少需要 log(n) 。

不,没有。该池是一个哈希表。

...并且如果池中的字符串很多,则可能成本很高。

不,不会的。在字符串池哈希表中查找的成本为O(1)

...从这个角度来看,声明许多文字字符串可能是危险的。

与加载然后 JIT 编译类文件的其他成本相比,该成本并不显着。声明大量文字字符串不存在与性能相关的“危险”。

显然,字符串文字对应的 String 对象“永久”占用内存,您通常不希望不必要地浪费内存。但如果您需要使用这些常量字符串,则必须以某种方式表示它们。其他表示它们的方式要么以其他方式使用内存,要么涉及其他运行时成本;例如,从文件中读取它们或从数据库中检索它们的成本。

驻留字符串文字的好处是堆不会因同一文字字符串的多个副本而变得混乱。对于典型的 SE / EE 应用程序来说,这可能并不重要,但对于 ME 平台来说,堆内存非常宝贵,浪费它是一件坏事。


@RENO 询问字符串被拘留的次数。有两种情况:

  • 显式调用 String.intern() 发生的次数与应用程序选择的次数相同。

  • 对于字符串文字,javac 编译器将确保给定的 .class 文件在其常量池中不包含任何字符串文字的多个副本。这意味着在许多地方具有给定文字的类只会导致在加载该类时该文字被保留一次。但是,如果两个类在各自的源代码中具有相同的文字字符串,则它们都将在各自的常量池中拥有该字符串值,并且在加载各自的类时都会驻留该字符串。

1 - When we declare literal strings, we search whether there is the same string in string pool of heap. Is this also an interning(method intern of class String)?

Yes. This process is called interning. However, it happens just once ... when the class containing the literal is loaded.

2 - In my thought, each literal string declaration needs a binary search or something so it costs at least log(n) when n is number of existing strings in the pool.

No it doesn't. The pool is a hash table.

... And if there are many strings in the pool, it may be high cost.

No it won't. The cost of a lookup in the string pool hash table is O(1).

... On this point of view, it might be dangerous to declare many literal strings.

The cost is not significant compared with the other costs of loading and then JIT compiling a class file. There is no performance related "danger" in declaring lots of literal strings.

Obviously, the String objects corresponding to string literals occupy memory "permanently", and you generally don't want to waste memory unnecessarily. But if you need to use those constant strings, they have to be represented somehow. And other ways of representing them either use the memory in other ways, or involve other runtime costs; e.g. the costs of reading them from a file or retrieving them from database.

The benefit of interning string literals is that the heap does not get cluttered up with multiple copies of the same literal string. This is probably not significant for typical SE / EE applications, but for the ME platforms heap memory is at a premium, and it would be a bad thing to waste it.


@RENO asks about the number of times Strings get interned. There are two cases:

  • Explicit calls to String.intern() happen as many (or as few) times as the application chooses to make.

  • For String literals, the javac compiler will ensure that a given .class file does not contain multiple copies of any String literal in its constant pool. This means that a class that has a given literal in many places will only result in the literal being interned once when the class is loaded. However, if you have two classes with the same literal string in their respective source code, they will both have the string value in their respective constant pools, and both intern the string when the respective classes are loaded.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文