Validating XML with XSD... but still allowing extensibility

Posted on 2024-09-11 20:20:57

Maybe it's me, but it appears that if you have an XSD

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="User">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="GivenName" />
                <xs:element name="SurName" />
            </xs:sequence>
            <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
        </xs:complexType>
    </xs:element>
</xs:schema>

that defines the schema for this document

<?xml version="1.0" encoding="utf-8" ?>
<User ID="1">
    <GivenName></GivenName>
    <SurName></SurName>
</User>

It would fail to validate if you added another element, say EmailAddress, and mixed up the order

<?xml version="1.0" encoding="utf-8" ?>
<User ID="1">
    <SurName></SurName>
    <EmailAddress></EmailAddress>
    <GivenName></GivenName>
</User>

I don't want to add EmailAddress to the document and have it be marked optional.

I just want an XSD that validates the bare minimum requirements that the document must meet.

Is there a way to do this?

EDIT:

marc_s pointed out below that you can use xs:any inside xs:sequence to allow more elements. Unfortunately, you then have to maintain the order of the elements.

Alternatively, I can use xs:all, which doesn't enforce the order of elements but, alas, doesn't allow me to place xs:any inside it.

Comments (5)

月下客 2024-09-18 20:20:57

Your issue has a resolution, but it will not be pretty. Here's why:

Violation of the deterministic content model rule

You've touched on the very soul of W3C XML Schema. What you are asking for (variable order and variable unknown elements) violates the hardest, yet most basic, principle of XSD: the rule of non-ambiguity or, more formally, the Unique Particle Attribution constraint:

A content model must be formed such
that during validation [..] each item
in the sequence can be uniquely
determined without examining the
content or attributes of that item,
and without any information about the
items in the remainder of the
sequence.

In plain English: when an XML document is validated and the XSD processor encounters <SurName>, it must be able to validate it without first checking whether it is followed by <GivenName>, i.e., no looking ahead. In your scenario, this is not possible. This rule exists to allow implementations through finite state machines, which should make implementations rather trivial and fast.

This is one of the most-debated issues. It is a heritage of SGML and DTD (where content models must be deterministic) and of XML, which by default treats the order of elements as significant (thus doing the opposite, making the order unimportant, is hard).

As marc_s already suggested, RELAX NG is an alternative that allows non-deterministic content models. But what can you do if you're stuck with W3C XML Schema?

Non-working semi-valid solutions

You've already noticed that xs:all is very restrictive. The reason is simple: the same determinism rule applies, and that's why xs:any, min/maxOccurs larger than one, and sequences are not allowed inside it.

Also, you may have tried all sorts of combinations of choice, sequence and any. The error that the Microsoft XSD processor throws when it encounters such an invalid situation is:

Error: Multiple definition of element
'http://example.com/Chad:SurName'
causes the content model to become
ambiguous. A content model must be
formed such that during validation of
an element information item sequence,
the particle contained directly,
indirectly or implicitly therein with
which to attempt to validate each item
in the sequence in turn can be
uniquely determined without examining
the content or attributes of that
item, and without any information
about the items in the remainder of
the sequence.

This is excellently explained in O'Reilly's XML Schema (yes, the book has its flaws). Fortunately, parts of the book are available online. I highly recommend reading section 7.4.1.3 on the Unique Particle Attribution rule; its explanations and examples are much clearer than mine could ever be.

One working solution

In most cases it is possible to go from a non-deterministic design to a deterministic one. This usually doesn't look pretty, but it's a solution if you have to stick with W3C XML Schema and/or if you absolutely must allow non-strict rules for your XML. The nightmare with your situation is that you want to enforce one thing (two predefined elements) and at the same time want it to be very loose (order doesn't matter and anything can go between, before and after). If I don't try to give you good advice but just take you directly to a solution, it looks as follows:

<xs:element name="User">
    <xs:complexType>
        <xs:sequence>
            <xs:any minOccurs="0" processContents="lax" namespace="##other" />
            <xs:choice>
                <xs:sequence>                        
                    <xs:element name="GivenName" />
                    <xs:any minOccurs="0" processContents="lax" namespace="##other" />
                    <xs:element name="SurName" />
                </xs:sequence>
                <xs:sequence>
                    <xs:element name="SurName" />
                    <xs:any minOccurs="0" processContents="lax" namespace="##other" />
                    <xs:element name="GivenName" />
                </xs:sequence>
            </xs:choice>
            <xs:any minOccurs="0" processContents="lax" namespace="##any" />
        </xs:sequence>
        <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
    </xs:complexType>
</xs:element>

The code above actually just works, but there are a few caveats. The first concerns xs:any with ##other as its namespace. You cannot use ##any, except for the last one, because that would allow elements like GivenName to match the wildcard as well, which would make the definition of User ambiguous. Note also that each xs:any here keeps the default maxOccurs of 1, so at most one extra element is accepted at each position.
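To see why this schema satisfies the non-ambiguity rule, it can help to write its content model out as a state table. The following is only a rough sketch, not how any particular XSD processor is implemented; the symbols "G", "S" and "O" and the state numbers are made up for illustration. Because the table is a dictionary keyed by (state, symbol), every incoming element leads to exactly one next state, with no need to look ahead.

```python
# Hypothetical symbols for the children of <User>:
#   "G" = GivenName, "S" = SurName (both in the schema's target namespace)
#   "O" = an element from another namespace (matches namespace="##other")
TRANSITIONS = {
    (0, "O"): 1, (0, "G"): 2, (0, "S"): 3,  # optional leading wildcard
    (1, "G"): 2, (1, "S"): 3,
    (2, "O"): 4, (2, "S"): 6,               # branch GivenName ... SurName
    (4, "S"): 6,
    (3, "O"): 5, (3, "G"): 6,               # branch SurName ... GivenName
    (5, "G"): 6,
}
CHOICE_DONE = 6  # both required elements have been seen
FINISHED = 7     # the trailing ##any wildcard has been used


def accepts(symbols):
    """Return True if the sequence of child symbols fits the content model."""
    state = 0
    for symbol in symbols:
        if state == CHOICE_DONE:
            # The last wildcard is ##any, so it takes any single element.
            state = FINISHED
        elif (state, symbol) in TRANSITIONS:
            state = TRANSITIONS[(state, symbol)]
        else:
            return False  # no transition: the document is invalid
    return state in (CHOICE_DONE, FINISHED)


print(accepts(["S", "O", "G"]))       # SurName, extension, GivenName
print(accepts(["G", "O", "O", "S"]))  # two extensions in a row
```

If the wildcards in the middle used ##any instead, state 2 would need two different entries for "S" (wildcard or SurName), which is exactly the ambiguity the processor complains about.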

The second caveat is that if you want to use this trick with more than two or three required elements, you'll have to write down all combinations. A maintenance nightmare. That's why I came up with the following:
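To get a feeling for how quickly this grows, here is a small sketch that counts the orderings you would have to spell out, one xs:sequence branch per ordering. The function name and the Field0, Field1, ... element names are invented for the example.

```python
from itertools import permutations


def choice_branches(required):
    """Return every ordering of the required elements.

    Each ordering would need its own xs:sequence branch in the xs:choice.
    """
    return list(permutations(required))


for count in range(2, 6):
    names = [f"Field{i}" for i in range(count)]  # hypothetical element names
    print(count, "required elements ->", len(choice_branches(names)), "branches")
```

The number of branches is the factorial of the number of required elements, so the two-element case in the question is about the only one that stays readable.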

A suggested solution, a variant of a Variable Content Container

Change your definition. This has the advantage of being clearer to your readers or users, and it is easier to maintain. A whole string of solutions is explained on XFront (a less readable version is linked from Oleg's post, which you may have already seen). It's an excellent read, but most of it does not take into account that you have a minimum requirement of two elements inside the variable content container.

The current best-practice approach for your situation (which happens more often than you may imagine) is to split your data between the required and non-required fields. You can add an element <Required>, or do the opposite, add an element <ExtendedInfo> (or call it Properties, or OptionalData). This looks as follows:

<xs:element name="User2">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="GivenName" />
            <xs:element name="SurName" />
            <xs:element name="ExtendedInfo" minOccurs="0">
                <xs:complexType>
                    <xs:sequence>
                        <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax" namespace="##any" />
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>

This may seem less than ideal at the moment, but let it grow a bit. Having an ordered set of fixed elements isn't that big a deal. You're not the only one who'll be complaining about this apparent deficiency of W3C XML Schema, but as I said earlier, if you have to use it, you'll have to live with its limitations, or accept the burden of developing around these limitations at a higher cost of ownership.

Alternative solution

I'm sure you know this already, but the order of attributes is by default undetermined. If all your content is of simple types, you can alternatively choose to make a more abundant use of attributes.
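As a quick illustration of that point, here is a sketch using Python's standard xml.etree.ElementTree. The attribute-based layout and the sample values are assumptions made up for the example; they are not part of the original schema.

```python
import xml.etree.ElementTree as ET


def same_attributes(xml_a, xml_b):
    """Compare the root attributes of two documents, ignoring their order."""
    return ET.fromstring(xml_a).attrib == ET.fromstring(xml_b).attrib


# Hypothetical attribute-based variants of the User document.
first = '<User ID="1" GivenName="Ada" SurName="Lovelace" />'
second = '<User SurName="Lovelace" ID="1" GivenName="Ada" />'

print(same_attributes(first, second))
```

The parser hands the attributes over as a mapping, so the two documents compare as equal even though the attributes were written in a different order.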

A final word

Whatever approach you take, you will lose a lot of verifiability of your data. It's often better to allow content providers to add content types, but only when they can be verified. You can do this by switching from lax to strict processing and by making the types themselves stricter. But being too strict isn't good either. The right balance depends on your ability to judge the use cases you're up against and to weigh them against the trade-offs of each implementation strategy.

热风软妹 2024-09-18 20:20:57

After reading marc_s's answer and your discussion in the comments, I decided to add a little.

It seems to me there is no perfect solution to your problem, Chad. There are some approaches to implementing an extensible content model in XSD, but all the implementations I know of have some restrictions. Because you didn't write about the environment where you plan to use the extensible XSD, I can only recommend some links that will probably help you choose an approach that can be implemented in your environment:

  1. http://www.xfront.com/ExtensibleContentModels.html (or http://www.xfront.com/ExtensibleContentModels.pdf) and http://www.xfront.com/VariableContentContainers.html
  2. http://www.xml.com/lpt/a/993 (or http://www.xml.com/pub/a/2002/07/03/schema_design.html)
  3. http://msdn.microsoft.com/en-us/library/ms950793.aspx
千仐 2024-09-18 20:20:57

You should be able to extend your schema with the <xs:any> element for extensibility - see W3Schools for details.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="User">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="GivenName" />
                <xs:element name="SurName" />
                <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax" />
            </xs:sequence>
            <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
        </xs:complexType>
    </xs:element>
</xs:schema>

With processContents="lax" added, .NET XML validation should succeed on it.

See MSDN docs on xs:any for more details.

Update: if you require more flexibility and less stringent validation, you might want to look at other methods of defining schemas for your XML - something like RelaxNG. XML Schema is - on purpose - rather strict about its rules, so maybe that's just the wrong tool for this job at hand.

打小就很酷 2024-09-18 20:20:57

Well, you can always use DTD :-) except that DTD also prescribes ordering. Validation with an "unordered" grammar is terribly expensive. You could play with xsd:choice and min and max occurs, but it's probably going to balk as well. You could also write XSD extensions / derived schemas.

The way you posed the problem, it looks like you don't really want XSD at all. You can just load the document and then validate whatever minimum you want with XPath expressions. But just protesting against XSD, so many years after it became an omnipresent standard, is really, really not going to get you anywhere.
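A minimal sketch of that load-and-check approach, using Python's standard xml.etree.ElementTree with simple path lookups. The function name and the messages are made up for the example, and the ID check is a simplified stand-in for xs:unsignedByte (an integer from 0 to 255).

```python
import xml.etree.ElementTree as ET

REQUIRED_CHILDREN = ("GivenName", "SurName")  # taken from the question's schema


def check_user(xml_text):
    """Return a list of problems; an empty list means the minimum is met."""
    root = ET.fromstring(xml_text)
    if root.tag != "User":
        return ["root element must be User"]

    problems = []
    raw_id = root.get("ID")
    if raw_id is None:
        problems.append("missing required attribute ID")
    else:
        value = raw_id.strip()
        # Simplified xs:unsignedByte check: ASCII digits, value at most 255.
        if not (value.isascii() and value.isdigit() and int(value) <= 255):
            problems.append("ID must be an integer between 0 and 255")

    # Order and extra elements are ignored; only the required children count.
    for name in REQUIRED_CHILDREN:
        count = len(root.findall(name))
        if count != 1:
            problems.append(f"expected exactly one {name}, found {count}")
    return problems


doc = """<User ID="1">
    <SurName></SurName>
    <EmailAddress></EmailAddress>
    <GivenName></GivenName>
</User>"""

print(check_user(doc))
```

This accepts the reordered document with the extra EmailAddress element from the question, at the price of maintaining the rules in code instead of in a schema.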

潇烟暮雨 2024-09-18 20:20:57

RelaxNG will solve this problem succinctly, if you can use it. Determinism isn't a requirement for schemas. You can translate an RNG or RNC schema into XSD, but it will approximate in this case. Whether that's good enough for your use is up to you.

The RNC schema for this case is:

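start = User
User = element User {
   attribute ID { xsd:unsignedByte },
   # "&" is interleave: the three parts may appear in any order
   ( element GivenName { text } &
     element SurName { text } &
     # any number of other elements, each with arbitrary content
     element * - (SurName | GivenName) { anyContent }* )
}

# any attributes, text and nested elements, in any mix
anyContent = (attribute * { text } | text | any)*
any = element * { anyContent }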

The any rule matches any well-formed XML fragment. So this will require the User element to contain GivenName and SurName elements containing text in any order, and allow any other elements containing pretty much anything.
