Large strings in the Large Object Heap cause problems - but in any case it has to end up as a string

Posted on 2024-12-10 13:30:09

I am following up on an earlier question of mine.

The problem is that I have some large objects, mainly strings, coming from MSMQ. I have narrowed my memory problems down to these objects being allocated on the Large Object Heap (LOH) and therefore fragmenting it (confirmed with some help from a profiler).

In that earlier question I got some workarounds, mainly splitting the string into char arrays, which I did.

The problem I am facing is that at the end of the string processing (in whatever form that takes) I need to send the string to another system that I have no control over. So I was thinking of the following solutions to keep this string out of the LOH:

  1. Represent it as an array of char arrays, each smaller than 85,000 bytes (the threshold above which objects are placed in the LOH).
  2. Compress it on the sender's end (that is, before it reaches the system discussed here, which is the receiver) and decompress it only just before passing it to the third-party system.
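Option 1 might look roughly like the following. This is only a sketch: the `ChunkedText` class and its method names are made up for illustration, and the 40,000-char chunk size is an assumption chosen so that each array (2 bytes per char) stays below the 85,000-byte threshold.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: holds text as many small char arrays instead of one big string.
public static class ChunkedText
{
    // 40,000 chars * 2 bytes = 80,000 bytes, which is below the 85,000-byte LOH threshold.
    public const int ChunkChars = 40000;

    // Reads everything from the reader into a list of small char arrays.
    public static List<char[]> ReadChunks(TextReader reader)
    {
        var chunks = new List<char[]>();
        var buffer = new char[ChunkChars];
        int filled = 0;
        int read;

        while ((read = reader.Read(buffer, filled, ChunkChars - filled)) > 0)
        {
            filled += read;
            if (filled == ChunkChars)
            {
                // Chunk is full: keep it and start a new one.
                chunks.Add(buffer);
                buffer = new char[ChunkChars];
                filled = 0;
            }
        }

        if (filled > 0)
        {
            // Trim the last, partially filled chunk to its real length.
            var last = new char[filled];
            Array.Copy(buffer, last, filled);
            chunks.Add(last);
        }

        return chunks;
    }

    // Writes the chunks out in order without ever building one large string.
    public static void WriteChunks(IEnumerable<char[]> chunks, TextWriter writer)
    {
        foreach (var chunk in chunks)
        {
            writer.Write(chunk, 0, chunk.Length);
        }
    }
}
```

This only keeps the data out of the LOH while it is held as chunks; as soon as the pieces are joined back into a single string of this size, that string lands in the LOH again.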

Whatever I do, one way or another the string will have to be complete in the end (no char arrays, no compression).

Am I stuck here? I am wondering whether using a managed environment was a mistake and whether we should bite the bullet and move to a C++ kind of environment.

Thanks,
Yannis

EDIT: I have narrowed the problem down to exactly the code posted here.

The large string that comes through is placed in the LOH. I have removed every single processing module from the point where I receive the message onwards, and the memory consumption trend remains the same.

So I guess I need to change the way this WorkContext is passed around between systems.

Comments (3)

把回忆走一遍 2024-12-17 13:30:09

Well, your options depend on how the third-party system receives data. If you can stream to it somehow, you don't have to hold it all in memory at once. In that case compression (which will probably also help your network load if the data compresses well) is great, because you can decompress through a stream and pass the data to the third-party system in chunks.

The same would of course work if you split your strings into pieces below the LOH threshold.
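The streaming idea above could be sketched as follows, assuming the third-party system can be reached through some writable `Stream` (that is an assumption, as is the `StreamingForwarder` name). The sketch uses `GZipStream` from `System.IO.Compression` and a small reusable buffer, so only one buffer's worth of decompressed data is in memory at a time.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper: compresses on the sender side, decompresses in small chunks on the receiver side.
public static class StreamingForwarder
{
    // Sender side: gzip everything from source into destination.
    public static void CompressTo(Stream source, Stream destination, int bufferSize = 8192)
    {
        var buffer = new byte[bufferSize];
        // leaveOpen = true so the caller keeps ownership of the destination stream.
        using (var gzip = new GZipStream(destination, CompressionMode.Compress, true))
        {
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                gzip.Write(buffer, 0, read);
            }
        }
    }

    // Receiver side: decompress chunk by chunk straight into the outgoing stream.
    // Returns the number of decompressed bytes forwarded.
    public static long DecompressTo(Stream compressed, Stream destination, int bufferSize = 8192)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress, true))
        {
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

If the third-party API insists on a single `string` argument, this does not help on its own, because the full string still has to be materialised before the call.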

If not, I would still advocate splitting the payload of the MSMQ message and then using a memory pool of preallocated, reused byte arrays for the reassembly before sending it to the client. Microsoft has an implementation you can use: http://msdn.microsoft.com/en-us/library/system.servicemodel.channels.buffermanager.aspx
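To illustrate the pooling idea without depending on WCF, here is a minimal hand-rolled pool. It is not the `BufferManager` API linked above; the `FixedBufferPool` class and its members are hypothetical and only show the principle: the large buffers are allocated once up front and then reused, so the LOH is not repeatedly carved up by new allocations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-size pool: every buffer has the same length and is reused after being returned.
public sealed class FixedBufferPool
{
    private readonly Stack<byte[]> _free = new Stack<byte[]>();
    private readonly object _gate = new object();
    private readonly int _bufferSize;

    public FixedBufferPool(int bufferSize, int preallocate)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < preallocate; i++)
        {
            _free.Push(new byte[bufferSize]);
        }
    }

    // Hands out a pooled buffer, allocating a new one only if the pool is empty.
    public byte[] Take()
    {
        lock (_gate)
        {
            return _free.Count > 0 ? _free.Pop() : new byte[_bufferSize];
        }
    }

    // Gives a buffer back to the pool so the next message can reuse it.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != _bufferSize)
        {
            throw new ArgumentException("Buffer does not belong to this pool.", "buffer");
        }
        lock (_gate)
        {
            _free.Push(buffer);
        }
    }

    // Copies the message parts, in order, into one pooled buffer and returns the bytes used.
    public static int Reassemble(IEnumerable<byte[]> parts, byte[] target)
    {
        int offset = 0;
        foreach (var part in parts)
        {
            if (offset + part.Length > target.Length)
            {
                throw new InvalidOperationException("Pooled buffer is too small for this message.");
            }
            Buffer.BlockCopy(part, 0, target, offset, part.Length);
            offset += part.Length;
        }
        return offset;
    }
}
```

A caller would `Take()` a buffer, `Reassemble` the split payload into it, send the used range onwards, and then `Return` the buffer. Sizing the buffers for the longest expected message keeps the pool simple.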

The final option I can think of is to handle the MSMQ deserialisation in unmanaged C++ code and create your own custom large-block memory pool, using placement new to deserialise the strings into it. You could keep it relatively simple by making sure your pool buffers are big enough for the longest possible message, rather than trying to be clever and dynamic, which is hard.

遗弃M 2024-12-17 13:30:09

You can try streaming the values using a StringBuilder (the .NET 4.0 version, which uses a rope-like implementation).

This example must be run in Release mode and started with Start Without Debugging (Ctrl+F5). Both Debug mode and Start Debugging interfere too much with the GC.

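using System;
using System.Messaging; // requires a reference to System.Messaging.dll
using System.Text;
using System.Xml;

public class SerializableWork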
{
    // This is very often between 100-120k bytes. This is actually a String - not just for the purposes of this example
    public String WorkContext { get; set; }

    // This is quite large as well but usually less than 85k bytes. This is actually a String - not just for the purposes of this example
    public String ContextResult { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Initial memory: {0}", GC.GetTotalMemory(true));
        var sw = new SerializableWork { WorkContext = new string(' ', 1000000), ContextResult = new string(' ', 1000000) };
        Console.WriteLine("Memory with objects: {0}", GC.GetTotalMemory(true));

        using (var mq = new MessageQueue(@".\Private$\Test1"))
        {
            mq.Send(sw);
        }

        sw = null;

        Console.WriteLine("Memory after collect: {0}", GC.GetTotalMemory(true));

        using (var mq = new MessageQueue(@".\Private$\Test1"))
        {
            StringBuilder sb1, sb2;

            using (var msg = mq.Receive())
            {
                Console.WriteLine("Memory after receive: {0}", GC.GetTotalMemory(true));

                using (var reader = XmlReader.Create(msg.BodyStream))
                {
                    reader.ReadToDescendant("WorkContext");
                    reader.Read();

                    sb1 = ReadContentAsStringBuilder(reader);

                    reader.ReadToFollowing("ContextResult");
                    reader.Read();

                    sb2 = ReadContentAsStringBuilder(reader);

                    Console.WriteLine("Memory after creating sb: {0}", GC.GetTotalMemory(true));
                }
            }

            Console.WriteLine("Memory after freeing mq: {0}", GC.GetTotalMemory(true));

            GC.KeepAlive(sb1);
            GC.KeepAlive(sb2);
        }

        Console.WriteLine("Memory after final collect: {0}", GC.GetTotalMemory(true));
    }

    private static StringBuilder ReadContentAsStringBuilder(XmlReader reader)
    {
        var sb = new StringBuilder();
        char[] buffer = new char[4096];

        int read;

        while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) != 0)
        {
            sb.Append(buffer, 0, read);
        }

        return sb;
    }
}

I read the message's Message.BodyStream directly with an XmlReader, move to the elements I need, and read their data in chunks using XmlReader.ReadValueChunk.

In the end I never use string objects anywhere. The only big block of memory is the Message.
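To stay string-free on the way out as well, the contents of the StringBuilder could be copied out in small pieces instead of calling ToString(). The following is a sketch under the assumption that the receiving side can be fed through a `TextWriter`; the `StringBuilderStreaming` helper is hypothetical, while `StringBuilder.CopyTo` is a standard framework method.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes a StringBuilder to a TextWriter without calling ToString().
public static class StringBuilderStreaming
{
    public static void WriteTo(StringBuilder source, TextWriter writer, int chunkChars = 4096)
    {
        // One small, reusable buffer; it never comes close to the LOH threshold.
        var buffer = new char[chunkChars];

        for (int index = 0; index < source.Length; index += chunkChars)
        {
            int count = Math.Min(chunkChars, source.Length - index);
            // Copy the next slice of the builder into the buffer and forward it.
            source.CopyTo(index, buffer, 0, count);
            writer.Write(buffer, 0, count);
        }
    }
}
```

For example, `StringBuilderStreaming.WriteTo(sb1, someWriter)` would forward the WorkContext value, where `someWriter` stands for whatever writer wraps the outgoing connection.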

女中豪杰 2024-12-17 13:30:09

You could perhaps implement a class (call it LargeString) that reuses previously allocated strings and keeps a small collection of them.

Since strings are normally immutable, you'd have to do every change and new assignment by unsafe pointer juggling. After passing a string to the receiver, you'd need to manually mark it as free for reuse. Different message lengths might also be a problem, unless the receiver can cope with messages that are too long, or you keep a collection of strings of every length.

Probably not a great idea, but it may beat rewriting everything in C++.
