Just like with standard library containers, what library you should use depends on your needs. Here's a convenient flowchart:
So the first question is this: What do you need?
I Need Full XML Compliance
OK, so you need to process XML. Not toy XML, real XML. You need to be able to read and write all of the XML specification, not just the low-lying, easy-to-parse bits. You need Namespaces, DocTypes, entity substitution, the works. The W3C XML Specification, in its entirety.
The next question is: Does your API need to conform to DOM or SAX?
I Need Exact DOM and/or SAX Conformance
OK, so you really need the API to be DOM and/or SAX. It can't just be a SAX-style push parser, or a DOM-style retained parser. It must be the actual DOM or the actual SAX, to the extent that C++ allows.
You have chosen:
Xerces
That's your choice. It's pretty much the only C++ XML parser/writer that has full (or as near as C++ allows) DOM and SAX conformance. It also has XInclude support, XML Schema support, and a plethora of other features.
It has no real dependencies. It uses the Apache license.
I Don't Care About DOM and/or SAX Conformance
You have chosen:
LibXML2
LibXML2 offers a C-style interface (if that really bothers you, go use Xerces), though the interface is at least somewhat object-based and easily wrapped. It provides a lot of features, like XInclude support (with callbacks so that you can tell it where it gets the file from), an XPath 1.0 recognizer, RelaxNG and Schematron support (though the error messages leave a lot to be desired), and so forth.
It does have a dependency on iconv, but it can be configured without that dependency, though that means it will be able to parse a more limited set of text encodings.
It uses the MIT license.
I Do Not Need Full XML Compliance
OK, so full XML compliance doesn't matter to you. Your XML documents are either fully under your control or are guaranteed to use the "basic subset" of XML: no namespaces, entities, etc.
So what does matter to you? The next question is: What is the most important thing to you in your XML work?
Maximum XML Parsing Performance
Your application needs to take XML and turn it into C++ datastructures as fast as this conversion can possibly happen.
You have chosen:
RapidXML
This XML parser is exactly what it says on the tin: rapid XML. It doesn't even deal with pulling the file into memory; how that happens is up to you. What it does deal with is parsing that into a series of C++ data structures that you can access. And it does this about as fast as it takes to scan the file byte by byte.
Of course, there's no such thing as a free lunch. Like most XML parsers that don't care about the XML specification, RapidXML doesn't touch namespaces, DocTypes, entities (with the exception of character references and the five predefined XML entities), and so forth. So basically nodes, elements, attributes, and such.
Also, it is a DOM-style parser, so it does require that you read all of the text in. However, what it doesn't do is copy any of that text (usually). The way RapidXML gets most of its speed is by referring to strings in place. This requires more memory management on your part: you must keep that string alive while RapidXML is looking at it.
RapidXML's DOM is bare-bones. You can get string values for things. You can search for attributes by name. That's about it. There are no convenience functions to turn attributes into other values (numbers, dates, etc). You just get strings.
One other downside with RapidXML is that it is painful for writing XML. It requires you to do a lot of explicit memory allocation of string names in order to build its DOM. It does provide a kind of string buffer, but that still requires a lot of explicit work on your end. It's certainly functional, but it's a pain to use.
It uses the MIT licence. It is a header-only library with no dependencies.
I Care About Performance But Not Quite That Much
Yes, performance matters to you. But maybe you need something a bit less bare-bones. Maybe something that can handle more Unicode, or doesn't require so much user-controlled memory management. Performance is still important, but you want something a little less direct.
You have chosen:
PugiXML
Historically, this served as inspiration for RapidXML. But the two projects have diverged, with Pugi offering more features, while RapidXML is focused entirely on speed.
PugiXML offers Unicode conversion support, so if you have some UTF-16 docs around and want to read them as UTF-8, Pugi will provide. It even has an XPath 1.0 implementation, if you need that sort of thing.
But Pugi is still quite fast. Like RapidXML, it has no dependencies and is distributed under the MIT License.
Reading Huge Documents
You need to read documents that are measured in the gigabytes in size. Maybe you're getting them from stdin, being fed by some other process. Or you're reading them from massive files. Or whatever. The point is, what you need is to not have to read the entire file into memory all at once in order to process it.
You have chosen:
LibXML2
Xerces's SAX-style API will work in this capacity, but LibXML2 is here because it's a bit easier to work with. A SAX-style API is a push API: it starts parsing a stream and just fires off events that you have to catch. You are forced to manage context, state, and so forth. Code that consumes a SAX-style API is a lot more spread out than one might hope.
LibXML2's xmlReader object is a pull API. You ask to go to the next XML node or element; you aren't told. This allows you to store context as you see fit, and to handle different entities in a way that's much more readable in code than a bunch of callbacks.
Alternatives
Expat
Expat is a well-known stream-oriented XML parser written in C. It uses a push (callback-based) API in the SAX style. It was written by James Clark.
At the time of writing, it is actively maintained. The most recent version is 2.2.9, released on 2019-09-25.
LlamaXML
It is an implementation of a StAX-style API. It is a pull parser, similar to LibXML2's xmlReader parser. But it hasn't been updated since 2005, so caveat emptor.
XPath Support
XPath is a system for querying elements within an XML tree. It's a handy way of effectively naming an element or collection of elements by common properties, using a standardized syntax. Many XML libraries offer XPath support.
There are effectively three choices here:
Just Get The Job Done
So, you don't care about XML correctness. Performance isn't an issue for you. Streaming is irrelevant. All you want is something that gets XML into memory and allows you to stick it back onto disk again. What you care about is the API.
You want an XML parser that's going to be small, easy to install, trivial to use, and small enough to be irrelevant to your eventual executable's size.
You have chosen:
TinyXML
I put TinyXML in this slot because it is about as braindead simple to use as XML parsers get. Yes, it's slow, but it's simple and obvious. It has a lot of convenience functions for converting attributes and so forth.
Writing XML is no problem in TinyXML. You just new up some objects, attach them together, send the document to a std::ostream, and everyone's happy. There is also something of an ecosystem built around TinyXML, with a more iterator-friendly API, and even an XPath 1.0 implementation layered on top of it.
TinyXML uses the zlib license, which is more or less the MIT License with a different name.
There is another approach to handling XML that you may want to consider, called XML data binding, especially if you already have a formal specification of your XML vocabulary, for example in XML Schema.
XML data binding allows you to use XML without actually doing any XML parsing or serialization. A data binding compiler auto-generates all the low-level code and presents the parsed data as C++ classes that correspond to your application domain. You then work with this data by calling functions, and working with C++ types (int, double, etc) instead of comparing strings and parsing text (which is what you do with low-level XML access APIs such as DOM or SAX).
See, for example, an open-source XML data binding implementation that I wrote, CodeSynthesis XSD, and, for a lighter-weight, dependency-free version, CodeSynthesis XSD/e.
In Secured Globe, Inc. we use rapidxml. We tried all the others but rapidxml seems to be the best choice for us.
Here is an example:
One other note about Expat: it's worth looking at for embedded systems work. However, the documentation you are likely to find on the web is ancient and wrong. The source code actually has fairly thorough function-level comments, but it will take some perusing for them to make sense.
OK then. I've created a new one, since none of the libraries on the list satisfied my needs.
Benefits:
Project home
I'll put mine here as well.
http://www.codeproject.com/Articles/998388/XMLplusplus-version-The-Cplusplus-update-of-my-XML
No XML validation features, but fast.