Storing XML documents in a pre-parsed binary format
My application needs to store large amounts of XML-like hierarchical information, with the following requirements:
- Fast to read
- Minimal memory consumption
- Typed data instead of merely text
Any suggestions for a binary format that fulfills these goals?
Comments (6)
You don't specify whether XML is a format requirement; you only say the data needs to be hierarchical like XML.
Without more detail on the kind of data, it's hard to give you much advice. So here's a small list.
Do other applications need to read the stored data, or just yours? Does it need to be a "standard" format?
Fast Infoset meets requirements (1) and (2), although because it's just a binary representation of the XML information model, it's just as untyped as XML. It might be good enough for your purposes, though, in the absence of anything else.
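
To illustrate the "untyped" point: an XML parser hands every value back as text, and a binary encoding of the same information model inherits that limitation. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; the element and attribute names are made up for illustration, and it shows plain XML rather than Fast Infoset itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the names are illustrative only.
doc = '<item count="3"><price>4.5</price></item>'
root = ET.fromstring(doc)

count = root.attrib["count"]     # arrives as the string "3"
price = root.find("price").text  # arrives as the string "4.5"

# The application has to convert to real types itself.
typed = {"count": int(count), "price": float(price)}
```

A format that carries type information would make that last conversion step unnecessary.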
There's too little detail in your requirements to give good suggestions. For example, are you free to pick your storage medium? Will it be a file system, a database or something else?
What does "minimal memory consumption" mean? Are you running on a constrained platform? Must you share resources with other applications? Is a 1 GB footprint small enough if your computer has 4 GB of memory? Will all of your data sit in memory, or only the parts you are working on?
If the platform were Java, I'd start with its standard serialization and then investigate custom serialization if I wasn't happy with the performance.
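
The "only the parts you are working on" option can be sketched with a streaming parser. This is a minimal Python illustration, assuming a hypothetical document made of `<record value="..."/>` elements; it uses `xml.etree.ElementTree.iterparse` from the standard library and is not tied to any particular storage format.

```python
import io
import xml.etree.ElementTree as ET

def sum_values(source):
    """Add up the 'value' attribute of each <record> while streaming the input."""
    total = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            total += int(elem.attrib["value"])
            elem.clear()  # drop this element's attributes and children once used
    return total

# Hypothetical sample data standing in for a large file on disk.
sample = io.BytesIO(
    b'<records><record value="1"/><record value="2"/><record value="3"/></records>'
)
total = sum_values(sample)
```

Note that the root element still keeps references to the emptied records, so very large inputs may need extra cleanup beyond this sketch.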
You could also read the XML into an object graph and store it as Google Protocol Buffers. These are designed to be very efficient.
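
Part of that efficiency comes from the wire format, where integers are stored as base-128 varints so that small values take a single byte. The sketch below hand-implements only that varint encoding for non-negative integers, to show the idea. It is not the Protocol Buffers library API; real code would normally use classes generated by `protoc` from a `.proto` schema.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")
```

With this encoding the value 1 takes one byte and 150 takes two, compared with four or eight bytes for a fixed-width integer.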
If the format is open to discussion, I'd suggest JSON rather than XML. JSON is actually faster to load and write than standard XML.
More about JSON:
http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=060ca7c3-b03f-41aa-937b-c8cba5b7f986
http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=39842a17-781a-45c8-ade5-58286909226b
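
As a rough illustration of how JSON handles the "hierarchical" and "typed" requirements, here is a minimal Python sketch using the standard `json` module. The record contents are invented for the example. JSON is still a text format, and its type system is small (strings, numbers, booleans, null, arrays, objects), so it only partly addresses the original question.

```python
import json

# Hypothetical nested record; the field names are illustrative only.
record = {
    "name": "sensor-1",
    "enabled": True,
    "readings": [1, 2.5, None],
    "meta": {"depth": 3},
}

text = json.dumps(record)    # serialize to a JSON string
restored = json.loads(text)  # parse it back into Python objects
```

After the round trip, numbers, booleans and nested structures come back as the same Python types without any manual conversion.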
Wikipedia's explanation of the issue:
http://en.wikipedia.org/wiki/Binary_XML
Supposedly the recommended organisation, whose Java and .NET SDKs can be downloaded from:
http://www.agiledelta.com/product_efx.html
XML is pure text, but it can be used to represent serialized objects.
Let's presume your serializer is serializing your objects into XML.
You should not try to convert your objects into binary streams yourself, because you would have to tackle endianness (http://en.wikipedia.org/wiki/Endian) and data-representation issues. However, if you insist, you would need to use XDR (http://en.wikipedia.org/wiki/External_Data_Representation) for its data-architecture neutrality.
Otherwise, you should serialize your objects to XML using standard serializers and then convert the XML to binary/compact XML, because libraries and SDKs for that are readily available. Then deserialize by decompacting from the binary XML.
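
To make the endianness issue concrete, here is a small Python sketch using the standard `struct` module. The same integer produces different bytes depending on byte order, and XDR avoids the ambiguity by always using big-endian 4-byte units. The `xdr_int` and `xdr_string` helpers are hand-written for illustration, following the XDR rules for 32-bit integers and length-prefixed, zero-padded strings. They are not a library API and cover only those two types.

```python
import struct

value = 1
little = struct.pack("<i", value)  # little-endian 32-bit integer
big = struct.pack(">i", value)     # big-endian 32-bit integer

def xdr_int(n: int) -> bytes:
    """Encode a signed 32-bit integer the XDR way: 4 bytes, big-endian."""
    return struct.pack(">i", n)

def xdr_string(s: bytes) -> bytes:
    """Encode bytes as an XDR string: 4-byte length, data, zero padding to a multiple of 4."""
    padding = (4 - len(s) % 4) % 4
    return struct.pack(">I", len(s)) + s + b"\x00" * padding

encoded = xdr_int(value) + xdr_string(b"hello")
```

Because the byte order and padding are fixed by the format rather than by the machine, a reader on any architecture can decode `encoded` the same way.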