Why does .NET create new substrings instead of pointing into existing strings?

Posted on 2024-07-26 01:43:58

From a brief look using Reflector, it looks like String.Substring() allocates memory for each substring. Am I correct that this is the case? I thought that wouldn't be necessary since strings are immutable.

My underlying goal is to create an IEnumerable<string> Split(this String, Char) extension method that allocates no additional memory.

Comments (5)

新一帅帅 2024-08-02 01:43:58

One reason most languages with immutable strings create new substrings, rather than referring into existing strings, is that sharing would interfere with garbage collecting those strings later.

What happens if a string is used only for a substring, and the larger string then becomes unreachable except through that substring? The larger string cannot be collected, because that would invalidate the substring. What seemed like a good way to save memory in the short term becomes a memory leak in the long term.
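The retention problem described above can be observed with ReadOnlyMemory<char>, a view type that newer .NET versions provide. This is only a sketch: the class and method names are made up for illustration, and it assumes a runtime where String.AsMemory and MemoryMarshal.TryGetString are available (.NET Core 2.1 or later, or the System.Memory package).

```csharp
using System;
using System.Runtime.InteropServices;

public static class SharedBufferDemo
{
    // Returns the length of the string object that a small slice keeps reachable.
    public static int BackingLengthOfSlice(int bigLength, int sliceLength)
    {
        string big = new string('x', bigLength);

        // AsMemory creates a view over the string; no characters are copied.
        ReadOnlyMemory<char> slice = big.AsMemory(0, sliceLength);

        // The view still references the whole original string object,
        // so the garbage collector cannot reclaim it while the slice is alive.
        if (MemoryMarshal.TryGetString(slice, out var text, out int start, out int length))
        {
            return text.Length;
        }
        return -1; // not expected for a string-backed slice
    }
}
```

Calling SharedBufferDemo.BackingLengthOfSlice(1000000, 5) reports the full one-million-character string behind a five-character view, which is exactly the trade-off a sharing Substring would have to accept.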

小苏打饼 2024-08-02 01:43:58

This is not possible with the String class itself unless you poke around in .NET internals. You would have to pass around references to a mutable array and make sure nobody modifies it.

.NET creates a new string every time you ask for one. The only exception is interned strings: string literals are interned by the compiler, and you can intern strings yourself with String.Intern. An interned string is placed into memory once, and every reference to it points to that single instance, for memory and performance reasons.
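A small sketch of both behaviours, using only String.Substring, String.Intern and Object.ReferenceEquals. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;

public static class InternDemo
{
    // Two Substring calls with the same arguments yield equal but separate objects.
    public static bool SubstringsAreDistinctObjects()
    {
        string source = "Hello, World!";
        string a = source.Substring(0, 5);
        string b = source.Substring(0, 5);
        return a == b && !object.ReferenceEquals(a, b);
    }

    // Interning maps equal contents to one shared instance in the intern pool.
    public static bool InternedCopiesShareOneObject()
    {
        string source = "Hello, World!";
        string a = string.Intern(source.Substring(0, 5));
        string b = string.Intern(source.Substring(0, 5));
        return object.ReferenceEquals(a, b);
    }
}
```

Interning only helps when many equal strings exist; the Substring calls above still allocate before the pooled instance is looked up.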

我的痛♀有谁懂 2024-08-02 01:43:58

Each string has to have its own string data, because of the way the String class is implemented.

You can make your own SubString structure that uses part of a string:

using System;
using System.Text;

public struct SubString {

   private string _str;
   private int _offset, _len;

   public SubString(string str, int offset, int len) {
      _str = str;
      _offset = offset;
      _len = len;
   }

   public int Length { get { return _len; } }

   // The index is relative to the start of the substring.
   public char this[int index] {
      get {
         if (index < 0 || index >= _len) throw new IndexOutOfRangeException();
         return _str[_offset + index];
      }
   }

   // Appends the characters straight from the original string, with no intermediate substring.
   public void WriteToStringBuilder(StringBuilder s) {
      s.Append(_str, _offset, _len);
   }

   // Only this method allocates a new string.
   public override string ToString() {
      return _str.Substring(_offset, _len);
   }

}

You can flesh it out with other methods, such as comparison, which can also be done without extracting the string.
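As a rough sketch of that idea applied to the Split goal from the question: the Segment type, its ContentEquals method and the SplitSegments extension below are hypothetical names invented for this example, not part of .NET. The comparison uses String.CompareOrdinal on the shared data, so no characters are copied until ToString is called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical view type: a (string, offset, length) triple over shared data.
public readonly struct Segment
{
    private readonly string _str;
    private readonly int _offset;
    private readonly int _len;

    public Segment(string str, int offset, int len)
    {
        _str = str;
        _offset = offset;
        _len = len;
    }

    public int Length { get { return _len; } }

    // Ordinal comparison against a string, performed in place.
    public bool ContentEquals(string other)
    {
        return other != null
            && other.Length == _len
            && string.CompareOrdinal(_str, _offset, other, 0, _len) == 0;
    }

    // Copies the characters out; the only member that allocates a string.
    public override string ToString()
    {
        return _str.Substring(_offset, _len);
    }
}

public static class SegmentExtensions
{
    // Lazily yields the pieces of 'source' between occurrences of 'separator'.
    public static IEnumerable<Segment> SplitSegments(this string source, char separator)
    {
        int start = 0;
        while (true)
        {
            int next = source.IndexOf(separator, start);
            if (next < 0)
            {
                // Last piece: everything after the final separator.
                yield return new Segment(source, start, source.Length - start);
                yield break;
            }
            yield return new Segment(source, start, next - start);
            start = next + 1;
        }
    }
}
```

Note that the iterator still allocates one enumerator object per call, so this avoids the per-substring copies rather than all allocation, and every Segment keeps the whole source string alive.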

水染的天色ゝ 2024-08-02 01:43:58

Because strings are immutable in .NET, every string operation that results in a new string object will allocate a new block of memory for the string contents.

In theory, it could be possible to reuse the memory when extracting a substring, but that would make garbage collection very complicated: what if the original string is garbage-collected? What would happen to the substring that shares a piece of it?

Of course, nothing prevents the .NET BCL team from changing this behavior in a future version of .NET. It wouldn't have any impact on existing code.
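For reference, newer .NET versions offer a separate, opt-in way to look at part of a string without copying: ReadOnlySpan<char>, obtained with AsSpan. The sketch below assumes a runtime where that API is available (.NET Core 2.1 or later, or the System.Memory package); the class and method names are made up for illustration.

```csharp
using System;

public static class SpanDemo
{
    // Counts non-empty separator-delimited fields without creating any substring objects.
    public static int CountNonEmptyFields(string source, char separator)
    {
        ReadOnlySpan<char> rest = source.AsSpan();
        int count = 0;
        while (true)
        {
            int next = rest.IndexOf(separator);
            // Slice returns another view over the same characters.
            ReadOnlySpan<char> field = next < 0 ? rest : rest.Slice(0, next);
            if (!field.IsEmpty) count++;
            if (next < 0) return count;
            rest = rest.Slice(next + 1);
        }
    }
}
```

A span is a stack-only type, so it cannot be stored in a field of a class or yielded from an iterator. That restriction is one way of sidestepping the garbage collection question raised above.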

離殇 2024-08-02 01:43:58

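Adding to the point that strings are immutable, you should be aware that each concatenation in the following snippet conceptually produces a new String instance.

String s1 = "Hello", s2 = ", ", s3 = "World!";
String res = s1 + s2 + s3;

s1 + s2 => new string instance (temp1)

temp1 + s3 => new string instance (temp2)

res is a reference to temp2.

Note that the C# compiler lowers this particular expression to a single String.Concat(s1, s2, s3) call, so only the final string is actually allocated here. The intermediate instances do appear when the concatenation is split across statements or performed in a loop.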
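The intermediate instances become visible when the concatenation is written as separate statements. The sketch below shows that case next to a StringBuilder version that builds the result once; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Each '+' evaluated in its own statement produces its own string object.
    public static bool StepwiseConcatCreatesDistinctStrings()
    {
        string s1 = "Hello", s2 = ", ", s3 = "World!";
        string temp1 = s1 + s2;   // new instance: "Hello, "
        string res = temp1 + s3;  // another new instance: "Hello, World!"
        return !object.ReferenceEquals(temp1, res) && res == "Hello, World!";
    }

    // StringBuilder accumulates characters in a buffer and creates the string once.
    public static string JoinWithBuilder(params string[] parts)
    {
        var sb = new StringBuilder();
        foreach (string part in parts) sb.Append(part);
        return sb.ToString();
    }
}
```

For a handful of operands the difference is negligible; the builder mainly pays off when strings are appended repeatedly in a loop.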
