Large strings in the Large Object Heap cause problems, but either way it has to end up as a string
I am following up on an earlier question.
The problem is that I have some large objects, mainly strings, coming from an MSMQ queue. I have narrowed my memory problems down to these objects being allocated in the Large Object Heap (LOH) and therefore fragmenting it (confirmed with some help from a profiler).
In the earlier question I got some workarounds, mainly splitting the string into char arrays, which I did.
The problem I am facing now is that at the end of the string processing (in whatever form that takes) I need to send the string to another system that I have no control over. So I was thinking of the following options to avoid having this string placed in the LOH:
- Represent it as an array of char arrays, each below the 85,000-byte threshold at which objects are placed in the LOH
- Compress it on the sender's end (that is, before it reaches the system we are talking about here, which is the receiver) and decompress it only just before passing it to the third-party system.
Whatever I do, one way or another the string will have to be complete in the end (no char arrays, no compression).
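To make the first option concrete, here is a minimal sketch of reading text into char arrays that each stay below the LOH threshold. The class name `LohChunking`, its method names, and the 40,000-char chunk size are assumptions for illustration only: a char takes 2 bytes, so 40,000 chars is roughly 80,000 bytes, which leaves some headroom below 85,000.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
public static class LohChunking
{
    // Assumption: 40,000 chars * 2 bytes = 80,000 bytes, below the 85,000-byte LOH threshold.
    public const int MaxCharsPerChunk = 40000;

    // Reads all text from the reader into char arrays of at most maxChars elements,
    // so no single array is large enough to be allocated on the LOH.
    public static List<char[]> ReadChunks(TextReader reader, int maxChars = MaxCharsPerChunk)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        if (maxChars <= 0) throw new ArgumentOutOfRangeException("maxChars");

        var chunks = new List<char[]>();
        var buffer = new char[maxChars];
        int filled = 0;
        int read;
        while ((read = reader.Read(buffer, filled, maxChars - filled)) > 0)
        {
            filled += read;
            if (filled == maxChars)
            {
                chunks.Add(buffer);           // chunk is full, start a new one
                buffer = new char[maxChars];
                filled = 0;
            }
        }
        if (filled > 0)
        {
            var last = new char[filled];      // trim the final, partially filled chunk
            Array.Copy(buffer, last, filled);
            chunks.Add(last);
        }
        return chunks;
    }

    // Reassembles the chunks. This allocates one contiguous string again,
    // which for large inputs lands on the LOH.
    public static string Join(IList<char[]> chunks)
    {
        int total = 0;
        foreach (char[] chunk in chunks) total += chunk.Length;
        var sb = new StringBuilder(total);
        foreach (char[] chunk in chunks) sb.Append(chunk);
        return sb.ToString();
    }
}
```

`Join` is included only to show where the approach stops helping: as soon as a complete string is required, one large allocation is made again.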
Am I stuck here? I am wondering whether using a managed environment was a mistake and whether we should bite the bullet and go for a C++ kind of environment.
Thanks,
Yannis
EDIT: I have narrowed the problem down to exactly the code posted here
The large string that comes through is placed in the LOH. I have removed every single processing module from the point where I receive the message onwards, and the memory consumption trend remains the same.
So I guess I need to change the way this WorkContext is passed around between systems.
Comments (3)
Well, your options depend on how the third-party system receives data. If you can stream to it somehow, then you don't have to hold it all in memory at once. In that case compression (which will probably also really help your network load if the data is easily compressible) is great, because you can decompress through a stream and pass the data to the third-party system in chunks.
The same would of course work if you split your strings up so that each piece stays below the LOH threshold.
If not, then I would still advocate splitting the payload of the MSMQ message, and then using a memory pool of preallocated and reused byte arrays for the reassembly before sending it to the client. Microsoft has an implementation you can use: http://msdn.microsoft.com/en-us/library/system.servicemodel.channels.buffermanager.aspx
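The pooling idea could look roughly like the sketch below. It is a hand-rolled stand-in to show the take/return pattern, not the BufferManager class itself; `FixedBufferPool` and its members are hypothetical names. Buffers this large still live on the LOH, but because the same arrays are handed out again and again, the repeated allocate-and-free cycle that fragments it should be avoided.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical fixed-size pool of reusable byte arrays.
public sealed class FixedBufferPool
{
    private readonly ConcurrentBag<byte[]> _free = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    // Preallocates initialCount buffers of bufferSize bytes up front.
    public FixedBufferPool(int bufferSize, int initialCount)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException("bufferSize");
        if (initialCount < 0) throw new ArgumentOutOfRangeException("initialCount");
        _bufferSize = bufferSize;
        for (int i = 0; i < initialCount; i++)
        {
            _free.Add(new byte[bufferSize]);
        }
    }

    public int BufferSize
    {
        get { return _bufferSize; }
    }

    // Hands out a pooled buffer, allocating a new one only if the pool is empty.
    public byte[] Take()
    {
        byte[] buffer;
        return _free.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    // Puts a buffer back so that the next Take can reuse it.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != _bufferSize)
        {
            throw new ArgumentException("Buffer does not belong to this pool.", "buffer");
        }
        _free.Add(buffer);
    }
}
```

A caller would take a buffer, reassemble the message parts into it, send it on, and return the buffer in a finally block so it is never lost from the pool.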
The final option I can think of is to handle the MSMQ deserialisation in unmanaged C++ code and create your own custom large-block memory pool, using placement new to deserialise the strings into it. You could keep it relatively simple by making sure your pool buffers are big enough for the longest possible message, rather than trying to be clever and dynamic, which is hard.
You can try streaming the values using a StringBuilder (the 4.0 version, which uses a rope-like implementation).
This example must be executed in Release mode and with Start Without Debugging (CTRL-F5). Both Debug mode and Start Debugging interfere too much with the GC.
I read the Message.BodyStream of the message directly into an XmlReader, then I go to the elements I need and read the data in chunks using XmlReader.ReadValueChunk.
In the end I don't use string objects anywhere. The only big block of memory is the Message.
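A minimal sketch of that chunked reading, under stated assumptions: it is not the answerer's original example, the class `ChunkedXmlValueReader`, the method `StreamElementText`, and the 16,384-char chunk size are made up for illustration, and the method takes any Stream so that Message.BodyStream could be passed in. It only handles the first text or CDATA node inside the element.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class ChunkedXmlValueReader
{
    // Finds the first element called elementName and passes its text content to
    // onChunk piece by piece, so the full value never exists as a single string.
    // onChunk receives the shared buffer and the number of valid chars in it.
    // Returns the total number of chars read.
    public static long StreamElementText(Stream body, string elementName,
                                         Action<char[], int> onChunk,
                                         int chunkChars = 16384)
    {
        if (body == null) throw new ArgumentNullException("body");
        if (onChunk == null) throw new ArgumentNullException("onChunk");

        long total = 0;
        using (XmlReader reader = XmlReader.Create(body))
        {
            if (!reader.ReadToFollowing(elementName) || reader.IsEmptyElement)
            {
                return 0;
            }

            reader.Read(); // move from the start tag to its first child node
            if (reader.NodeType != XmlNodeType.Text && reader.NodeType != XmlNodeType.CDATA)
            {
                return 0;
            }

            // 16,384 chars is about 32 KB, well below the LOH threshold.
            var buffer = new char[chunkChars];
            int read;
            while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) > 0)
            {
                onChunk(buffer, read);
                total += read;
            }
        }
        return total;
    }
}
```

The callback could append to a StringBuilder, write to a network stream, or feed a compressor. The buffer is reused between calls, so the callback has to copy anything it wants to keep.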
You could maybe implement a class (call it LargeString) that reuses previously allocated strings and keeps a small collection of them.
Since strings are normally immutable, you'd have to make every change and new assignment through unsafe pointer juggling. After passing a string to the receiver, you'd need to manually mark it as free for reuse. Different message lengths might also be a problem, unless the receiver can cope with messages that are too long, or you keep a collection of strings of every length.
Probably not a great idea, but it maybe beats rewriting everything in C++.