Metrics for compressing XML
I have a client-server application that sends XML over TCP/IP from client to server, which then broadcasts it out to the other clients. How do I know the minimum size of XML that would warrant a performance improvement by compressing the XML rather than sending it over the regular stream?
Are there any good metrics on this, or examples?
XML usually compresses very well, as it tends to have a lot of repetition.
Another option would be to swap to a binary format; BinaryFormatter or NetDataContractSerializer are simple options, but both are notoriously incompatible (for example with Java) compared with XML.
Another option would be a portable binary format such as Google's "protocol buffers". I maintain a .NET/C# version of this called protobuf-net. It is designed to be side-by-side compatible with regular .NET approaches (such as XmlSerializer / DataContractSerializer), but is much smaller than XML and requires significantly less processing (CPU etc.) for both serialization and deserialization.
This page shows some numbers for XmlSerializer, DataContractSerializer and protobuf-net; I thought it included stats with/without compression, but they seem to have vanished...
[update] I should have said - there is a TCP/IP example in the QuickStart project.
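For illustration, here is a minimal C# sketch comparing the payload size of the same object serialized with XmlSerializer and with protobuf-net. The `Message` type and its fields are hypothetical stand-ins for whatever the application actually sends; the real gains depend on your own data.

```csharp
// Minimal sketch: the same DTO serialized with XmlSerializer and protobuf-net,
// so the two payload sizes can be compared side by side.
using System;
using System.IO;
using System.Xml.Serialization;
using ProtoBuf;

[ProtoContract]
public class Message
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Body { get; set; }
}

static class SizeComparison
{
    static void Main()
    {
        var msg = new Message { Id = 42, Body = "hello world" };

        using (var xmlStream = new MemoryStream())
        using (var protoStream = new MemoryStream())
        {
            new XmlSerializer(typeof(Message)).Serialize(xmlStream, msg);
            Serializer.Serialize(protoStream, msg);

            Console.WriteLine("XmlSerializer: {0} bytes", xmlStream.Length);
            Console.WriteLine("protobuf-net:  {0} bytes", protoStream.Length);
        }
    }
}
```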
A loose metric would be to compress anything larger than a single packet, but that's just nitpicking.
There is no reason to refrain from using a binary format internally in your application - no matter how long compression takes, the network overhead will be several orders of magnitude slower than the compression itself (unless we're talking about very slow devices).
If these two suggestions don't put you at ease, you can always benchmark to find the threshold at which compressing starts to pay off.
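A rough benchmark along those lines might look like the sketch below (not the answer author's code): gzip a representative XML payload and compare size and elapsed time. The `sample.xml` file name is a placeholder for your own data.

```csharp
// Benchmark sketch: measure how much a representative XML payload shrinks
// under gzip and how long the compression takes.
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Text;

static class CompressionBenchmark
{
    static void Main()
    {
        // Replace with a representative XML document from your application.
        byte[] raw = Encoding.UTF8.GetBytes(File.ReadAllText("sample.xml"));

        var timer = Stopwatch.StartNew();
        byte[] compressed;
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(raw, 0, raw.Length);
            compressed = output.ToArray();
        }
        timer.Stop();

        Console.WriteLine("Raw:        {0} bytes", raw.Length);
        Console.WriteLine("Compressed: {0} bytes", compressed.Length);
        Console.WriteLine("Time:       {0} ms", timer.ElapsedMilliseconds);
    }
}
```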
By all means compress it always.
It will save you bandwidth for anything with more than 2 tags.
To decide whether compression has any benefit for you, you need to run some tests using actual or expected amounts of the kind of data you expect to flow through your system.
Hope this helps.
In the tests that we did, we found a huge benefit; however, be aware of the CPU implications.
On one project that I worked on, we were sending large amounts of XML data (> 10 meg) to clients running .NET. (I'm not recommending this as a way to do things, it's just the situation we found ourselves in!) We found that as the XML files got sufficiently large, the Microsoft XML libraries were unable to parse them (the machines ran out of memory, even machines with > 1 gig). Changing the XML parsing libraries eventually helped, but before we did that we enabled GZIP compression on the data we transferred, which helped us parse the large documents. On our two Linux-based WebSphere servers we were able to generate the XML and then gzip it fairly easily. I think that with 50 users doing this concurrently (loading about 10 to 20 of these files) we were able to do this OK, at about 50% CPU. The compression of the XML seemed to be better handled (i.e. parsing/CPU time) on the servers than on the .NET GUIs, but this was probably due to the above inadequacies of the Microsoft XML libraries being used. As I mentioned, there are better libraries available that are faster and use less memory.
In our case, we got massive improvements in size too -- we were compressing 50 meg XML files down to about 10 meg in some cases. This obviously helped network performance too.
Since we were concerned about the impact, and whether this would have other consequences (our users seemed to do things in large waves, so we were worried we'd run out of CPU), we had a config variable which we could use to turn gzip on/off, as sketched below. I'd recommend that you do this too.
Another thing: we also zipped XML files before persisting them in databases, and this saved about 50% space (the XML files ranged from a few K to a few meg, but most were fairly small). It's probably easier to compress everything than to choose a specific threshold for when to use compression.
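As a rough illustration of that config switch, here is a small C# sketch (not the poster's actual code). The "EnableGzip" appSettings key and the `PayloadWriter` helper are hypothetical names; the receiving side would also need to know whether the payload is compressed.

```csharp
// Sketch of a config-driven gzip toggle for outgoing XML payloads.
using System.Configuration;
using System.IO;
using System.IO.Compression;

static class PayloadWriter
{
    static readonly bool EnableGzip =
        ConfigurationManager.AppSettings["EnableGzip"] == "true";

    // Writes the XML payload to the outgoing stream, gzip-compressed only
    // when the config flag is on, so compression can be disabled under load.
    public static void Write(Stream network, byte[] xmlPayload)
    {
        if (EnableGzip)
        {
            using (var gzip = new GZipStream(network, CompressionMode.Compress,
                                             true /* leave the network stream open */))
            {
                gzip.Write(xmlPayload, 0, xmlPayload.Length);
            }
        }
        else
        {
            network.Write(xmlPayload, 0, xmlPayload.Length);
        }
    }
}
```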