Storing XML documents in a pre-parsed binary format

Posted on 2024-08-04 10:46:25

My application needs to store large amounts of XML-like hierarchical information with the following requirements:

  1. Fast to read
  2. Minimal memory consumption
  3. Typed data instead of merely text

Any suggestions for a binary format that fulfills these goals?

Comments (6)

白鸥掠海 2024-08-11 10:46:25

You don't specify whether XML is a format requirement; you only say the data needs to be hierarchical like XML.

Without more detail on the kind of data, it's hard to give you very much advice. So here's a small list.

  • B-trees. There are a number of libraries supporting B-tree storage formats in multiple languages. They have fast lookups and are hierarchical in nature.
  • Protocol Buffers from Google. Compact storage optimized for sending over the wire, not necessarily optimized as a storage format. They are typed, though, and will probably do pretty well as a storage format.
  • Zipped text formats. Compact and, depending on the format chosen, typed and hierarchical in nature.
    • YAML (support for some complex typing, hierarchical, human readable)
    • JSON (less typing support, fast parsing, hierarchical, human readable)
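
To make the "zipped text format" option concrete, here is a minimal sketch using JSON compressed with gzip, both from the Python standard library. The sample document and the helper names `pack` and `unpack` are assumptions made for illustration; they are not from the question.

```python
import gzip
import json

# Hypothetical sample document: nested, with typed values (int, float, bool, list).
document = {
    "catalog": {
        "name": "parts",
        "items": [
            {"id": 1, "price": 9.5, "in_stock": True, "tags": ["a", "b"]},
            {"id": 2, "price": 12.0, "in_stock": False, "tags": []},
        ],
    }
}


def pack(doc) -> bytes:
    """Serialize doc as compact JSON text, then gzip-compress it."""
    text = json.dumps(doc, separators=(",", ":"))
    return gzip.compress(text.encode("utf-8"))


def unpack(data: bytes):
    """Decompress gzip data and parse the JSON text back into objects."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


blob = pack(document)
restored = unpack(blob)
```

JSON keeps the basic types (numbers, booleans, strings, lists, objects) across the round trip, but anything richer, such as dates or binary data, would still need a convention of your own.
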
情泪▽动烟 2024-08-11 10:46:25

Do other applications need to read the stored data, or just yours? Does it need to be a "standard" format?

Fast Infoset meets requirements (1) and (2), although because it's just a binary representation of the XML information model, it's just as untyped as XML. Might be good enough for your purposes, though, in the absence of anything else.

无需解释 2024-08-11 10:46:25

There's too little detail in your requirements to give good suggestions. For example, are you free to pick your storage medium? Will it be a file system, a database or something else?

What does "minimal memory consumption" mean? Are you running on a constrained platform? Must you share resources with other applications? Is a 1GB footprint small enough if your computer has 4GB of memory? Will all your data sit in memory, or only the parts you are working on?

If the platform were Java, I'd start with its standard serialization and then investigate custom serialization if I wasn't happy with the performance.

夜吻♂芭芘 2024-08-11 10:46:25

You could also read the XML into an object graph and store it as Google Protocol Buffers. These are designed to be very efficient.
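
As a rough sketch of the first half of that idea, the snippet below parses XML once and converts it into a typed object graph, using only the Python standard library. The XML layout and the `Catalog` and `Item` classes are hypothetical. Writing the graph out as Protocol Buffers would additionally need a `.proto` schema and the classes generated from it, which are not shown here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Hypothetical input document; element and attribute names are illustrative only.
XML_TEXT = """
<catalog name="parts">
  <item id="1" price="9.5" in_stock="true"/>
  <item id="2" price="12.0" in_stock="false"/>
</catalog>
"""


@dataclass
class Item:
    id: int
    price: float
    in_stock: bool


@dataclass
class Catalog:
    name: str
    items: List[Item] = field(default_factory=list)


def parse_catalog(xml_text: str) -> Catalog:
    """Parse the XML once and convert text attributes into typed fields."""
    root = ET.fromstring(xml_text.strip())
    catalog = Catalog(name=root.get("name", ""))
    for node in root.findall("item"):
        catalog.items.append(
            Item(
                id=int(node.get("id")),
                price=float(node.get("price")),
                in_stock=node.get("in_stock") == "true",
            )
        )
    return catalog


catalog = parse_catalog(XML_TEXT)
```

Once the data lives in typed objects like these, the choice of on-disk encoding becomes a separate decision from the XML parsing step.
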

抱着落日 2024-08-11 10:46:25

If the format is negotiable, I'd suggest JSON, not XML. JSON is actually faster to load and write than standard XML.

More about JSON:

http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=060ca7c3-b03f-41aa-937b-c8cba5b7f986
http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=39842a17-781a-45c8-ade5-58286909226b

萌吟 2024-08-11 10:46:25

Wikipedia's explanation of the issue:
http://en.wikipedia.org/wiki/Binary_XML

Supposedly, the recommended organisation's Java and .NET SDKs can be downloaded from:
http://www.agiledelta.com/product_efx.html

XML is pure text, but it can be used to represent serialized objects.
Let's presume your serializer is serializing your objects into XML.

You should not try to convert your objects into binary streams yourself, because you would have to tackle endianness (http://en.wikipedia.org/wiki/Endian) and data-representation issues. However, if you insist, you would need to use XDR (http://en.wikipedia.org/wiki/External_Data_Representation) for its data architecture neutrality.
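
To illustrate the byte-order problem and how an XDR-style encoding sidesteps it, here is a small sketch using Python's `struct` module. It follows the XDR conventions of big-endian integers and doubles and of strings padded to a multiple of four bytes. The helper names are hypothetical, and encoding the string as UTF-8 is an assumption of this sketch.

```python
import struct


def xdr_int(value: int) -> bytes:
    """XDR signed integer: 4 bytes, big-endian."""
    return struct.pack(">i", value)


def xdr_double(value: float) -> bytes:
    """XDR double: 8 bytes, IEEE 754, big-endian."""
    return struct.pack(">d", value)


def xdr_string(text: str) -> bytes:
    """XDR-style string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    raw = text.encode("utf-8")  # assumption: UTF-8 payload
    padding = (-len(raw)) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * padding


def decode_record(data: bytes):
    """Decode one (int, double, string) record written by the helpers above."""
    # ">" means big-endian with no implicit alignment, so the header is 4 + 8 + 4 bytes.
    number, real, length = struct.unpack_from(">idI", data, 0)
    header_size = struct.calcsize(">idI")
    text = data[header_size:header_size + length].decode("utf-8")
    return number, real, text


record = xdr_int(1) + xdr_double(2.5) + xdr_string("abcde")

# The same integer has a different byte layout depending on byte order:
big_endian = struct.pack(">i", 1)     # b'\x00\x00\x00\x01'
little_endian = struct.pack("<i", 1)  # b'\x01\x00\x00\x00'
```

Because the byte order is fixed in the format strings rather than taken from the host CPU, the record decodes to the same values on any machine.
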

Otherwise, you should serialize your objects to XML using standard serializers and then convert the XML to binary/compact XML, since libraries and SDKs for that are readily available. You then deserialize by first decompacting the binary XML.
