
XML Module

Published 2019-10-04 14:57:25

This module is part of the Qt Enterprise Edition.

  • Overview of the XML architecture in Qt
  • The Qt SAX2 classes
    • Introduction to SAX2
    • Features
    • Namespace support via features
      • Summary
    • Properties
    • Further reading
  • The Qt DOM classes
    • Introduction to DOM
    • Further reading
  • An introduction to namespaces
    • Conventions used in Qt XML documentation

Overview of the XML architecture in Qt

The XML module provides a non-validating parser for well-formed XML that uses the SAX2 (Simple API for XML) interface, plus an implementation of DOM Level 2 (Document Object Model).

SAX is an event-based standard interface for XML parsers. The Qt interface follows the design of the SAX2 Java implementation. Its naming scheme was adapted to fit the Qt naming conventions. Details on SAX2 can be found at http://www.megginson.com/SAX/.

Support for SAX2 filters and the reader factory is under development. Furthermore, the Qt implementation does not include the SAX1 compatibility classes present in the Java interface.

For an introduction to Qt's SAX2 classes see "The Qt SAX2 classes". A code example is discussed in the "tagreader walkthrough".

DOM Level 2 is a W3C Recommendation for XML interfaces that maps the constituents of an XML document to a tree structure. Details and the specification of DOM Level 2 can be found at http://www.w3.org/DOM/. More information about the DOM classes in Qt is provided in "The Qt DOM classes".

Qt provides the following XML related classes:

  • QDomAttr -- Represents one attribute of a QDomElement
  • QDomCDATASection -- Represents an XML CDATA section
  • QDomCharacterData -- Represents a generic string in the DOM
  • QDomComment -- Represents an XML comment
  • QDomDocument -- The representation of an XML document
  • QDomDocumentFragment -- Tree of QDomNodes which is usually not a complete QDomDocument
  • QDomDocumentType -- The representation of the DTD in the document tree
  • QDomElement -- Represents one element in the DOM tree
  • QDomEntity -- Represents an XML entity
  • QDomEntityReference -- Represents an XML entity reference
  • QDomImplementation -- Information about the features of the DOM implementation
  • QDomNamedNodeMap -- Collection of nodes that can be accessed by name
  • QDomNode -- The base class for all nodes of the DOM tree
  • QDomNodeList -- List of QDomNode objects
  • QDomNotation -- Represents an XML notation
  • QDomProcessingInstruction -- Represents an XML processing instruction
  • QDomText -- Represents textual data in the parsed XML document
  • QXmlAttributes -- XML attributes
  • QXmlContentHandler -- Interface to report logical content of XML data
  • QXmlDeclHandler -- Interface to report declaration content of XML data
  • QXmlDefaultHandler -- Default implementation of all XML handler classes
  • QXmlDTDHandler -- Interface to report DTD content of XML data
  • QXmlEntityResolver -- Interface to resolve external entities contained in XML data
  • QXmlErrorHandler -- Interface to report errors in XML data
  • QXmlInputSource -- The input data for the QXmlReader subclasses
  • QXmlLexicalHandler -- Interface to report lexical content of XML data
  • QXmlLocator -- Provides the XML handler classes with information about the current parsing position
  • QXmlNamespaceSupport -- Helper class for XML readers which want to include namespace support
  • QXmlParseException -- Used to report errors with the QXmlErrorHandler interface
  • QXmlReader -- Interface for XML readers (i.e. for SAX2 parsers)
  • QXmlSimpleReader -- Implementation of a simple XML reader (a SAX2 parser)

The Qt SAX2 classes

Introduction to SAX2

The SAX2 interface is an event-driven mechanism to provide the user with document information. "Event" in this context has nothing to do with the term "event" you probably know from windowing systems; it means that the parser reports certain document information while parsing the document. Each piece of reported information is referred to as an "event".

To make this less abstract, consider the following example:

 
<quote>To make it less abstract consider the following example:</quote>

While reading the above document, a SAX2 parser (usually referred to as a "reader") would trigger three events:

  1. A start tag occurs (<quote>).
  2. Character data (i.e. text) is found.
  3. An end tag is parsed (</quote>).

Each time such an event occurs the parser reports it so that a suitable event handling routine can be invoked.
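The three events can be made visible with a small handler. The following is a minimal sketch and not Qt code: it assumes Python's standard xml.sax module as a stand-in for a SAX2 reader, with its ContentHandler playing the role that QXmlContentHandler plays in Qt. The EventLogger class and the sample text are made up for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX event in the order the reader reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Event 1: a start tag occurs.
        self.events.append(("start", name))

    def characters(self, content):
        # Event 2: character data is found (possibly in several chunks).
        self.events.append(("chars", content))

    def endElement(self, name):
        # Event 3: an end tag is parsed.
        self.events.append(("end", name))


handler = EventLogger()
xml.sax.parseString(b"<quote>A quotation.</quote>", handler)

# Join the character chunks, since a reader may split text arbitrarily.
text = "".join(value for kind, value in handler.events if kind == "chars")
```

Nothing is stored by the reader itself: once the handler functions return, the data is gone unless the handler kept a copy, as EventLogger does here.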

While this is a fast and simple way to read XML documents, manipulating them is difficult because the data is not stored; it is simply handled and discarded serially. This is where the DOM interface comes in handy.

The Qt XML module provides an abstract class, QXmlReader, that defines the interface for potential SAX2 readers. At the moment Qt ships with one reader implementation, QXmlSimpleReader.

The reader reports parsing events through special handler classes. In Qt the following ones are available:

  • QXmlContentHandler reports events related to the content of a document (e.g. the start tag or characters).
  • QXmlDTDHandler reports events related to the DTD (e.g. notation declarations).
  • QXmlErrorHandler reports errors or warnings that occurred during parsing.
  • QXmlEntityResolver reports external entities during parsing and allows users to resolve them themselves instead of leaving it to the reader.
  • QXmlDeclHandler reports further DTD-related events (e.g. attribute declarations). Users are usually not interested in them, but under certain circumstances this class comes in handy.
  • QXmlLexicalHandler reports events related to the lexical structure of the document (the beginning of the DTD, comments etc.). Occasionally this might be useful.

These classes are abstract classes describing the interface. The QXmlDefaultHandler class provides a "do nothing" default implementation for all of them. Users therefore need to reimplement only the QXmlDefaultHandler functions they are interested in.

Input XML data is read through a special class, QXmlInputSource.

Apart from the already mentioned ones the following SAX2 support classes provide the user with useful functionality:

  • QXmlAttributes is used to pass attributes in a start element event.
  • QXmlLocator is used to obtain the current parsing position of an event.
  • QXmlNamespaceSupport is used to easily implement namespace support for a reader. Note that namespaces do not change the parsing behavior. They are only reported through the handler.
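Error reporting and position information work together. The sketch below again assumes Python's xml.sax module as a stand-in rather than the Qt classes: its ErrorHandler corresponds roughly to QXmlErrorHandler, and the exception it receives carries the position that a QXmlLocator would supply in Qt. CollectingErrorHandler is a hypothetical name.

```python
import xml.sax


class CollectingErrorHandler(xml.sax.ErrorHandler):
    """Stores fatal errors together with the line on which they occurred."""

    def __init__(self):
        self.problems = []

    def fatalError(self, exception):
        # The exception knows where in the input the reader gave up.
        self.problems.append((exception.getLineNumber(), exception.getMessage()))
        raise exception  # a fatal error ends the parse


errors = CollectingErrorHandler()
parsed_ok = True
try:
    # The end tag does not match the open <b> element: not well-formed.
    xml.sax.parseString(b"<a><b></a>", xml.sax.ContentHandler(), errors)
except xml.sax.SAXParseException:
    parsed_ok = False
```

The content handler passed here is the library's "do nothing" default, which mirrors the way QXmlDefaultHandler lets users ignore the events they do not care about.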

Features

The behaviour of an XML reader depends on whether it supports certain optional features or not. As an example a reader can have the feature "report attributes used for namespace declarations and prefixes along with the local name of a tag". Like every other feature this has a unique name represented by a URI: it is called http://xml.org/sax/features/namespace-prefixes.

The Qt SAX2 implementation allows you to find out whether the reader has this ability using QXmlReader::hasFeature(). If the return value is TRUE it is possible to turn the relevant feature on and off. To do this use QXmlReader::setFeature(). Whether a supported feature is on or off (TRUE or FALSE) can be queried using QXmlReader::feature().

Consider the example:

<document xmlns:book = 'http://trolltech.com/fnord/book/'
          xmlns      = 'http://trolltech.com/fnord/' >

A reader that does not support the http://xml.org/sax/features/namespace-prefixes feature would report the element name document but not its attributes xmlns:book and xmlns with their values. A reader that supports the feature reports the namespace attributes if QXmlReader::feature() returns TRUE for it and disregards them if it returns FALSE.

Other features include http://xml.org/sax/features/namespaces (namespace processing, implies http://xml.org/sax/features/namespace-prefixes) and http://xml.org/sax/features/validation (the ability to report validation errors).

While SAX2 leaves it to the user to define and implement whatever features are required, support for http://xml.org/sax/features/namespaces (and thus http://xml.org/sax/features/namespace-prefixes) is mandatory. Accordingly QXmlSimpleReader, the implementation of QXmlReader that comes with the Qt XML module, supports both of them and can therefore do namespace processing.

Being a non-validating parser, QXmlSimpleReader does not support http://xml.org/sax/features/validation, among other features.
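Setting and querying features looks much the same in any SAX2 implementation. As a hedged illustration, the sketch below assumes Python's xml.sax reader instead of QXmlSimpleReader: setFeature() and getFeature() correspond to QXmlReader::setFeature() and QXmlReader::feature(), while an unsupported or unknown feature is signalled by an exception where Qt offers QXmlReader::hasFeature(). The made-up feature URI is purely illustrative.

```python
import xml.sax
from xml.sax.handler import feature_namespaces, feature_validation

reader = xml.sax.make_parser()

# Feature names are plain URIs; the module only provides constants for them.
namespaces_uri = feature_namespaces

# Turn a supported feature on and read its state back.
reader.setFeature(feature_namespaces, True)
namespaces_on = bool(reader.getFeature(feature_namespaces))

# A non-validating reader refuses to switch validation on.
try:
    reader.setFeature(feature_validation, True)
    validation_available = True
except xml.sax.SAXNotSupportedException:
    validation_available = False

# A feature name the reader does not know is rejected as well.
try:
    reader.getFeature("http://example.org/features/made-up")
    unknown_recognized = True
except xml.sax.SAXNotRecognizedException:
    unknown_recognized = False
```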

Namespace support via features

As we have seen in the previous section we can configure the behavior of the reader when it comes to namespace processing. This is done by setting and unsetting the http://xml.org/sax/features/namespaces and http://xml.org/sax/features/namespace-prefixes features.

They influence the reporting behavior in the following way:

  1. Namespace prefixes and local parts of elements and attributes can be reported.
  2. The qualified names of elements and attributes are reported.
  3. QXmlContentHandler::startPrefixMapping() and QXmlContentHandler::endPrefixMapping() are called by the reader.
  4. Attributes that declare namespaces (i.e. the attribute xmlns and attributes starting with xmlns: ) are reported.

Consider the following element:

<author xmlns:fnord = 'http://trolltech.com/fnord/'
             title="Ms" 
             fnord:title="Goddess" 
             name="Eris Kallisti"/>

With http://xml.org/sax/features/namespace-prefixes set to TRUE the reader will report four attributes; with the namespace-prefixes feature set to FALSE it will report only three, because the xmlns:fnord attribute defining a namespace is then "invisible" to the reader.

The http://xml.org/sax/features/namespaces feature, on the other hand, is responsible for reporting local names, namespace prefixes and namespace URIs. With http://xml.org/sax/features/namespaces set to TRUE the parser will report title as the local name of the fnord:title attribute, fnord as the namespace prefix and http://trolltech.com/fnord/ as the namespace URI. When http://xml.org/sax/features/namespaces is FALSE none of them are reported.

In the current implementation the Qt XML classes follow the definition that the prefix xmlns itself isn't associated with any namespace at all (see http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using). Therefore, even with http://xml.org/sax/features/namespaces and http://xml.org/sax/features/namespace-prefixes both set to TRUE, the reader will not report a local name, a namespace prefix or a namespace URI for xmlns:fnord.

This might change in the future, following the W3C suggestion at http://www.w3.org/2000/xmlns/ to associate xmlns with the namespace http://www.w3.org/2000/xmlns/.

As the SAX2 standard suggests, QXmlSimpleReader by default has http://xml.org/sax/features/namespaces set to TRUE and http://xml.org/sax/features/namespace-prefixes set to FALSE. When changing this behavior using QXmlSimpleReader::setFeature(), note that the combination of both features set to FALSE is illegal.
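The effect of the namespaces feature on the author element can be tried out with any SAX2 reader. The sketch below is an illustration under stated assumptions, not a description of QXmlSimpleReader: it uses Python's xml.sax reader, which has namespace processing off by default and cannot switch namespace-prefixes on, so it only compares namespaces on and off. AttributeRecorder and report() are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

AUTHOR = (b"<author xmlns:fnord='http://trolltech.com/fnord/' "
          b"title='Ms' fnord:title='Goddess' name='Eris Kallisti'/>")


class AttributeRecorder(ContentHandler):
    """Remembers the attribute names and prefix mappings that are reported."""

    def __init__(self):
        super().__init__()
        self.attribute_names = []
        self.prefix_mappings = []

    def startPrefixMapping(self, prefix, uri):
        self.prefix_mappings.append((prefix, uri))

    def startElement(self, name, attrs):
        # Used when namespace processing is off: names are plain strings.
        self.attribute_names = list(attrs.getNames())

    def startElementNS(self, name, qname, attrs):
        # Used when namespace processing is on: names are (URI, local name).
        self.attribute_names = list(attrs.getNames())


def report(namespaces):
    """Parse AUTHOR with the namespaces feature set as requested."""
    recorder = AttributeRecorder()
    reader = xml.sax.make_parser()
    reader.setFeature(feature_namespaces, namespaces)
    reader.setContentHandler(recorder)
    reader.parse(io.BytesIO(AUTHOR))
    return recorder


with_namespaces = report(True)
without_namespaces = report(False)
```

With namespace processing on, this reader reports three attributes, each identified by namespace URI and local name, plus a prefix mapping for fnord. With it off, the reader reports four attributes by their qualified names, including xmlns:fnord.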

For a practical demonstration of how the two features affect the output of the reader run the tagreader with features example.

Summary

QXmlSimpleReader implements the following behavior:

(namespaces, namespace-prefixes)   Namespace prefix and local part   Qualified names   Prefix mapping   xmlns attributes
(TRUE, FALSE)                      Yes                               Yes*              Yes              No
(TRUE, TRUE)                       Yes                               Yes               Yes              Yes
(FALSE, TRUE)                      No*                               Yes               No*              Yes
(FALSE, FALSE)                     Illegal

For the entries marked with a "*", SAX does not require a particular behavior.

Properties

Properties are a more general concept. They also have a unique name, represented as a URI, but their value is void*. Thus nearly everything can be used as a property value. This concept involves some danger, though: there is no means of ensuring type safety, so the user must take care to pass the correct type. Properties are useful if a reader supports special handler classes.

The URIs used for features and properties often look like URLs, e.g. http://xml.org/sax/features/namespaces. This does not mean that any data has to be available at this address. It is simply a way to define unique names.

Anyone can define and use new SAX2 properties for their readers. Property support is, however, not required.

To set or query properties the following functions are provided: QXmlReader::setProperty(), QXmlReader::property() and QXmlReader::hasProperty().
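As an illustration of a property that carries a special handler, the sketch below assumes Python's xml.sax reader, which accepts a lexical handler through the standard SAX2 property http://xml.org/sax/properties/lexical-handler. This is not how QXmlSimpleReader is configured; it only shows the general mechanism of setProperty() and getProperty(). CommentCollector is a hypothetical class.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class CommentCollector:
    """A lexical handler that keeps comments and ignores everything else."""

    def __init__(self):
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    # The remaining lexical events are accepted but not used in this sketch.
    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


reader = xml.sax.make_parser()
collector = CommentCollector()

# The property value is an arbitrary object; nothing checks its type here.
reader.setProperty(property_lexical_handler, collector)
reader.setContentHandler(ContentHandler())
reader.parse(io.BytesIO(b"<doc><!-- a remark --><item/></doc>"))

current_value = reader.getProperty(property_lexical_handler)
```

Because the reader takes whatever object it is given, a wrongly typed value only shows up when the reader tries to use it, which is the type-safety risk mentioned above.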

Further reading

For a practical example of how to use the Qt SAX2 classes, see the tagreader walkthrough.

More information about XML (e.g. namespaces) can be found in the introduction to the Qt XML module.

The Qt DOM classes

Introduction to DOM

DOM provides an interface to access and change the content and structure of an XML file. It makes a hierarchical view of the document (tree) available with the root element of the XML file serving as its root. Thus -- in contrast to the SAX2 interface -- an object model of the document is resident in memory after parsing which makes manipulation easy.

In the Qt implementation of the DOM all nodes in the document tree are subclasses of QDomNode. The document itself is represented as a QDomDocument object.
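The difference from SAX2 shows in a short manipulation example. The sketch below assumes Python's xml.dom.minidom as a generic DOM implementation; it is not the Qt API, although QDomDocument offers functions with the same names as the W3C interfaces used here (createElement, createTextNode, appendChild). The sample document is made up.

```python
from xml.dom import minidom

# After parsing, the whole document is held in memory as a tree.
doc = minidom.parseString("<book><title>Practical XML</title></book>")
root = doc.documentElement

# Manipulation means creating nodes and attaching them to the tree.
chapter = doc.createElement("chapter")
chapter.appendChild(doc.createTextNode("A Namespace Called fnord"))
root.appendChild(chapter)

child_names = [node.nodeName for node in root.childNodes]
serialized = root.toxml()
```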

Here are the available node classes and their potential child classes:

  • QDomDocument: Possible children are
    • QDomElement (at most one)
    • QDomProcessingInstruction
    • QDomComment
    • QDomDocumentType
  • QDomDocumentFragment: Possible children are
    • QDomElement
    • QDomProcessingInstruction
    • QDomComment
    • QDomText
    • QDomCDATASection
    • QDomEntityReference
  • QDomDocumentType: No children
  • QDomEntityReference: Possible children are
    • QDomElement
    • QDomProcessingInstruction
    • QDomComment
    • QDomText
    • QDomCDATASection
    • QDomEntityReference
  • QDomElement: Possible children are
    • QDomElement
    • QDomText
    • QDomComment
    • QDomProcessingInstruction
    • QDomCDATASection
    • QDomEntityReference
  • QDomAttr: Possible children are
    • QDomText
    • QDomEntityReference
  • QDomProcessingInstruction: No children
  • QDomComment: No children
  • QDomText: No children
  • QDomCDATASection: No children
  • QDomEntity: Possible children are
    • QDomElement
    • QDomProcessingInstruction
    • QDomComment
    • QDomText
    • QDomCDATASection
    • QDomEntityReference
  • QDomNotation: No children

With QDomNodeList and QDomNamedNodeMap two collection classes are provided: QDomNodeList is a list of nodes whereas QDomNamedNodeMap is used to handle unordered sets of nodes (often used for attributes).
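The two kinds of collection can be sketched as follows, again assuming Python's xml.dom.minidom rather than the Qt classes: its NodeList and NamedNodeMap are the W3C counterparts of QDomNodeList and QDomNamedNodeMap. The sample document is made up.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<book><author title="Ms" name="Eris Kallisti"/><chapter/><chapter/></book>'
)

# A node list is ordered and accessed by position.
chapters = doc.getElementsByTagName("chapter")
chapter_count = chapters.length
first_chapter = chapters.item(0)

# A named node map is unordered and accessed by name; attributes use it.
author = doc.getElementsByTagName("author").item(0)
attributes = author.attributes
title_value = attributes.getNamedItem("title").value
name_value = attributes.getNamedItem("name").value
```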

The QDomImplementation class allows the user to query features of the DOM implementation.

Further reading

To get started please refer to the QDomDocument documentation that describes basic usage.

An introduction to namespaces

Parts of the Qt XML module documentation assume that you are familiar with XML namespaces. Here we present a brief introduction; skip to Qt XML documentation conventions if you know this material.

Namespaces are a concept introduced into XML to allow a more modular design. With their help data processing software can easily resolve naming conflicts in XML documents.

Consider the following example:

<document>
<book>
  <title>Practical XML</title>
  <author title="Ms" name="Eris Kallisti"/>
  <chapter>
    <title>A Namespace Called fnord</title>
  </chapter>
</book>
</document>

Here we find three different uses of the name title. If you wish to process this document you will encounter problems because each of the titles should be displayed in a different manner -- even though they have the same name.

The solution is to have some means of identifying the first occurrence of title as the title of a book, i.e. to use the title element of a book namespace, which distinguishes it from, for example, the chapter title:

<book:title>Practical XML</book:title>

book in this case is a prefix denoting the namespace.

Before we can apply a namespace to element or attribute names we must declare it.

Namespaces are URIs like http://trolltech.com/fnord/book/. This does not mean that data must be available at this address; the URI is simply used to provide a unique name.

We declare namespaces in the same way as attributes; strictly speaking they are attributes. To make, for example, http://trolltech.com/fnord/ the document's default XML namespace, we write:

xmlns="http://trolltech.com/fnord/"

To distinguish the http://trolltech.com/fnord/book/ namespace from the default, we have to supply it with a prefix:

xmlns:book="http://trolltech.com/fnord/book/"

A namespace that is declared like this can be applied to element and attribute names by prepending the appropriate prefix and a ":" delimiter. We have already seen this with the book:title element.

Element names without a prefix belong to the default namespace. This rule does not apply to attributes: an attribute without a prefix does not belong to any of the declared XML namespaces at all. Attributes always belong to the "traditional" namespace of the element in which they appear. A "traditional" namespace is not an XML namespace, it simply means that all attribute names belonging to one element must be different. Later we will see how to assign an XML namespace to an attribute.

Because attributes without prefixes are not in any XML namespace, there is no collision between the attribute title (which belongs to the author element) and, for example, the title element within a chapter.

Let's clarify matters with an example:

<document xmlns:book = 'http://trolltech.com/fnord/book/'
          xmlns      = 'http://trolltech.com/fnord/' >
<book>
  <book:title>Practical XML</book:title>
  <book:author xmlns:fnord = 'http://trolltech.com/fnord/'
               title="Ms"
               fnord:title="Goddess"
               name="Eris Kallisti"/>
  <chapter>
    <title>A Namespace Called fnord</title>
  </chapter>
</book>
</document>

Within the document element we have two namespaces declared. The default namespace http://trolltech.com/fnord/ applies to the book element, the chapter element, the title element inside chapter and of course to document itself.

The book:author and book:title elements belong to the namespace with the URI http://trolltech.com/fnord/book/.

The two book:author attributes title and name have no XML namespace assigned. They are only members of the "traditional" namespace of the element book:author, meaning that for example two title attributes in book:author are forbidden.

In the above example we circumvent the last rule by adding a title attribute from the http://trolltech.com/fnord/ namespace to book:author: the fnord:title comes from the namespace with the prefix fnord that is declared in the book:author element.

Clearly the fnord namespace has the same namespace URI as the default namespace. So why didn't we simply use the default namespace we'd already declared? The answer is quite complex:

  • attributes without a prefix don't belong to any XML namespace at all, not even to the default namespace;
  • additionally omitting the prefix would lead to a title-title clash;
  • writing it as xmlns:title would declare a new namespace with the prefix title instead of applying the default xmlns namespace.

With the Qt XML classes elements and attributes can be accessed in two ways: either by referring to their qualified names, consisting of the namespace prefix and the "real" name (or local name), or by the combination of local name and namespace URI.
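Both ways of access can be tried on the example document. The sketch below assumes Python's xml.dom.minidom and is not Qt code; the W3C functions it uses (getElementsByTagName, getElementsByTagNameNS, getAttribute, getAttributeNS) have counterparts in the Qt DOM classes such as QDomElement::attribute() and QDomElement::attributeNS(). The constant names are made up.

```python
from xml.dom import minidom

BOOK_NS = "http://trolltech.com/fnord/book/"
FNORD_NS = "http://trolltech.com/fnord/"

DOCUMENT = """\
<document xmlns:book='http://trolltech.com/fnord/book/'
          xmlns='http://trolltech.com/fnord/'>
<book>
  <book:title>Practical XML</book:title>
  <book:author xmlns:fnord='http://trolltech.com/fnord/'
               title='Ms' fnord:title='Goddess' name='Eris Kallisti'/>
  <chapter>
    <title>A Namespace Called fnord</title>
  </chapter>
</book>
</document>
"""

doc = minidom.parseString(DOCUMENT)

# Way 1: the qualified name, exactly as written in the document.
author = doc.getElementsByTagName("book:author")[0]
by_qualified_name = author.getAttribute("fnord:title")

# Way 2: the namespace URI together with the local name.
same_author = doc.getElementsByTagNameNS(BOOK_NS, "author")[0]
by_uri_and_local_name = author.getAttributeNS(FNORD_NS, "title")

# The unprefixed attribute is in no XML namespace and is a different attribute.
plain_title = author.getAttribute("title")
```

The second way keeps working if the document author picks a different prefix for the same namespace URI, which is why it is usually the more robust choice.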

More information on XML namespaces can be found at http://www.w3.org/TR/REC-xml-names/.

Conventions used in Qt XML documentation

The following terms are used to distinguish the parts of names within the context of namespaces:

  • The qualified name is the name as it appears in the document. (In the above example book:title is a qualified name.)
  • A namespace prefix in a qualified name is the part to the left of the ":". (book is the namespace prefix in book:title.)
  • The local part of a name (also referred to as the local name) appears to the right of the ":". (Thus title is the local part of book:title.)
  • The namespace URI ("Uniform Resource Identifier") is a unique identifier for a namespace. It looks like a URL (e.g. http://trolltech.com/fnord/ ) but does not require data to be accessible by the given protocol at the named address.

Elements without a ":" (like chapter in the example) do not have a namespace prefix. In this case the local part and the qualified name are identical (i.e. chapter).
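These terms map directly onto node properties in a DOM implementation. As a small illustration, the sketch below assumes Python's xml.dom.minidom, whose tagName, prefix, localName and namespaceURI properties correspond to the qualified name, namespace prefix, local part and namespace URI described above. It is not Qt code, and the shortened sample document is made up.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<document xmlns:book='http://trolltech.com/fnord/book/'>"
    "<book:title>Practical XML</book:title><chapter/></document>"
)
title, chapter = doc.documentElement.childNodes

# (qualified name, namespace prefix, local part, namespace URI)
title_parts = (title.tagName, title.prefix, title.localName, title.namespaceURI)

# Without a ":" there is no prefix, and local part and qualified name agree.
chapter_parts = (chapter.tagName, chapter.prefix, chapter.localName)
```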
