Real-life, practical examples of using String.intern() in Java?

Posted 2024-09-15 00:41:31

I've seen many primitive examples describing how String intern()'ing works, but I have yet to see a real-life use-case that would benefit from it.

The only situation I can think of is a web service that receives a considerable number of requests, each very similar in nature because of a rigid schema. By intern()'ing the request field names in this case, memory consumption can be significantly reduced.

Can anyone provide an example of using intern() in a production environment with great success? Maybe an example of it in a popular open source offering?

Edit: I am referring to manual interning, not the guaranteed interning of String literals, etc.

Comments (5)

风吹短裙飘 2024-09-22 00:41:38

Never, ever use intern() on user-supplied data, as that can enable denial-of-service attacks: every distinct string an attacker sends adds an entry to the pool, and depending on the JVM, interned strings may be freed late or not at all. You can validate the user-supplied strings first, but then again you have already done most of the work needed for intern().
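
The validation idea above can be sketched as an allow-list lookup. This is only an illustration under stated assumptions: the class `KnownFieldNames`, the method `canonicalOrNull`, and the three field names are hypothetical and not taken from any real service. Once a name has been checked against the schema, the lookup already hands back a shared instance, so no call to intern() is needed and unknown strings never enter any pool.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: canonicalizes only field names that appear in a fixed schema.
final class KnownFieldNames {
    // Assumed schema; the field names are made up for illustration.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        for (String name : new String[] {"userId", "orderId", "timestamp"}) {
            CANONICAL.put(name, name);
        }
    }

    /** Returns the shared instance for a known field name, or null if the name is not in the schema. */
    static String canonicalOrNull(String untrusted) {
        return CANONICAL.get(untrusted);
    }

    public static void main(String[] args) {
        // new String(...) simulates a freshly parsed String that is a distinct object.
        String fromRequest = new String("userId");
        String canonical = canonicalOrNull(fromRequest);
        System.out.println(canonical == canonicalOrNull("userId")); // true: same shared instance
        System.out.println(canonicalOrNull("attacker-chosen-name")); // null: unknown names are rejected
    }
}
```

The map is filled once in the static initializer and only read afterwards, so its size is bounded by the schema rather than by what clients send.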

邮友 2024-09-22 00:41:37

Not a complete answer, but additional food for thought (quoted from another source):

Therefore, the primary benefit in this case is that using the == operator on internalized strings is a lot faster than using the equals() method [on non-internalized Strings]. So, use the intern() method if you're going to be comparing strings more than a time or three.
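
A minimal sketch of what the quote relies on: two strings built at run time are equal but are separate objects, while their interned forms are the same object, so == becomes a valid comparison. The class name `InternEqualityDemo` and the helper `build` are made up for this illustration.

```java
final class InternEqualityDemo {
    /** Builds a String at run time so that it is a distinct object from any literal. */
    static String build(String prefix, int n) {
        return new StringBuilder(prefix).append(n).toString();
    }

    public static void main(String[] args) {
        String a = build("id-", 42);
        String b = build("id-", 42);

        System.out.println(a.equals(b));              // true: same characters
        System.out.println(a == b);                   // false: two separate objects
        System.out.println(a.intern() == b.intern()); // true: both map to the one pooled instance
    }
}
```

Note that == is only safe when every string being compared has been interned, and whether it is measurably faster than equals() depends on the workload, so it is worth benchmarking before relying on it.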

如何视而不见 2024-09-22 00:41:36

Examples where interning will be beneficial involve a large number of strings where:

  • the strings are likely to survive multiple GC cycles, and
  • there are likely to be multiple copies of a large percentage of the Strings.

Typical examples involve splitting / parsing a text into symbols (words, identifiers, URIs) and then attaching those symbols to long-lived data structures. XML processing, programming language compilation and RDF / OWL triple stores spring to mind as applications where interning is likely to be beneficial.

But interning is not without its problems, especially if it turns out that the assumptions above are not correct:

  • the pool data structure used to hold the interned strings takes extra space,
  • interning takes time, and
  • interning doesn't prevent the creation of the duplicate string in the first place.

Finally, interning potentially increases GC overheads by increasing the number of objects that need to be traced and copied, and by increasing the number of weak references that need to be dealt with. This increase in overheads has to be balanced against the decrease in GC overheads that results from effective interning.
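
To make the costs listed above concrete, here is a rough sketch that uses a hand-rolled pool in place of String.intern(), purely so the moving parts are visible. The class `TokenPool` and its methods are hypothetical, and a plain `HashMap` stands in for the JVM's own string table, which works differently internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-rolled pool, shown instead of String.intern() so the costs are visible.
final class TokenPool {
    private final Map<String, String> pool = new HashMap<>(); // the pool itself takes extra space

    /** Returns the pooled instance equal to s, adding s to the pool if it is the first of its kind. */
    String canonical(String s) {
        String existing = pool.putIfAbsent(s, s); // hashing and lookup: interning takes time
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    /** Splits text on whitespace and returns the tokens, each replaced by its pooled instance. */
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) { // split() allocates a fresh String per token
            tokens.add(canonical(raw));                // a duplicate becomes garbage right after lookup
        }
        return tokens;
    }

    public static void main(String[] args) {
        TokenPool pool = new TokenPool();
        List<String> tokens = pool.tokenize("a rose is a rose is a rose");
        // prints "8 tokens, 3 distinct strings retained"
        System.out.println(tokens.size() + " tokens, " + pool.size() + " distinct strings retained");
    }
}
```

The saving only materializes because the token list is long-lived and most tokens repeat; if either assumption fails, the pool is pure overhead.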

笙痞 2024-09-22 00:41:35

Interning can be very beneficial if you have N strings that can take only K different values, where N far exceeds K. Now, instead of storing N strings in memory, you will only be storing up to K.

For example, you may have an ID type which consists of 5 digits. Thus, there can only be 10^5 different values. Suppose you're now parsing a large document that has many references/cross references to ID values. Let's say this document has 10^9 references in total (obviously some references are repeated in other parts of the document).

So N = 10^9 and K = 10^5 in this case. If you are not interning the strings, you will be storing 10^9 strings in memory, many of which are equal (by the pigeonhole principle). If you intern() the ID string you get when you're parsing the document, and you don't keep any reference to the uninterned strings you read from the document (so they can be garbage collected), then you will never need to store more than 10^5 strings in memory.
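
The N versus K argument can be checked with a small experiment. The sketch below counts distinct String objects by identity, with and without intern(); the class `InternCountDemo` and the method `distinctObjects` are made up for illustration, and N and K are scaled down from 10^9 and 10^5 so the demo finishes quickly.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class InternCountDemo {
    /** Counts distinct String objects (by identity, not equals) retained for n references over k IDs. */
    static int distinctObjects(int n, int k, boolean intern) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < n; i++) {
            // String.format builds a new String each time, much like a parser reading an ID from input.
            String id = String.format("%05d", i % k);
            identities.add(intern ? id.intern() : id);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        int n = 200_000; // scaled down from 10^9
        int k = 1_000;   // scaled down from 10^5
        System.out.println("without intern(): " + distinctObjects(n, k, false)); // n objects retained
        System.out.println("with intern():    " + distinctObjects(n, k, true));  // k objects retained
    }
}
```

In both runs the same N strings are created; the difference is that with intern() only K of them stay reachable, and the rest can be collected.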

孤千羽 2024-09-22 00:41:35

We had a production system that processes literally millions of pieces of data at a time, many of which have string fields. We should have been interning strings, but there was a bug which meant we were not. By fixing the bug we avoided having to do a very costly (at least 6 figures, possibly 7) server upgrade.
