
Why are nondeterministic element declarations disallowed in DTD and XSD schemas?

Posted on 2024-10-09 04:20:11

The following declaration:

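According to validators and a quick check of the spec, both it and its XSD equivalent are invalid because they are not deterministic. However, every nondeterministic finite automaton (NFA) has an equivalent deterministic finite automaton (DFA), and algorithms exist for converting an NFA into a DFA. What, then, is the reason for forbidding nondeterministic declarations?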
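The premise of the question can be illustrated with the textbook subset construction. The declaration from the question is not preserved on this page, so the sketch below uses an assumed stand-in model of the shape ((a,b)|(a,c)). It hand-encodes an NFA for that model and converts it into a DFA whose states are sets of NFA states. The names `NFA`, `determinize`, and `accepts` are illustrative only and do not come from any validator.

```python
from collections import deque

# Hypothetical NFA for an illustrative content model ((a,b)|(a,c)).
# State 0 is the start state; on "a" it may move to 1 (expecting b)
# or to 2 (expecting c). State 3 is accepting.
NFA = {
    0: {"a": {1, 2}},
    1: {"b": {3}},
    2: {"c": {3}},
    3: {},
}
NFA_START = 0
NFA_FINALS = {3}


def determinize(nfa, start, finals):
    """Subset construction: each DFA state is a frozenset of NFA states.

    This NFA has no epsilon transitions, so no epsilon-closure is needed.
    """
    start_set = frozenset({start})
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # already expanded via another path
        moves = {}
        for state in current:
            for symbol, targets in nfa[state].items():
                moves.setdefault(symbol, set()).update(targets)
        dfa[current] = {sym: frozenset(t) for sym, t in moves.items()}
        queue.extend(t for t in dfa[current].values() if t not in dfa)
    dfa_finals = {s for s in dfa if s & finals}
    return dfa, start_set, dfa_finals


def accepts(dfa, start, finals, children):
    """Check a sequence of child element names by walking the DFA."""
    state = start
    for name in children:
        state = dfa[state].get(name)
        if state is None:
            return False  # no transition: this child is not allowed here
    return state in finals


dfa, dfa_start, dfa_finals = determinize(NFA, NFA_START, NFA_FINALS)
print(accepts(dfa, dfa_start, dfa_finals, ["a", "b"]))  # True
print(accepts(dfa, dfa_start, dfa_finals, ["a", "c"]))  # True
print(accepts(dfa, dfa_start, dfa_finals, ["a", "a"]))  # False
```

The resulting DFA has three states: {0}, {1, 2}, and {3}. Merging the two `a` branches into the single state {1, 2} does mechanically what a schema author does by hand when rewriting the model as a,(b|c).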


递刀给你 2024-10-16 04:20:11


There are two classes of possible answer to a question like this: technical and historical.

There is no sound technical reason for the determinism rules of XML DTDs or XSD.

It is because of the determinism rule that the intersection and union of two XSD content models are not guaranteed to be describable with a legal XSD content model. It is because of the determinism rule that some regular languages cannot be expressed as content models. It is because of the determinism rule that abstract types and substitution groups in XSD fall so far short of their potential in making vocabularies easily extensible. In short, the determinism rule contributes nothing to SGML, XML DTDs, or XSD but pointless complication (and, in the cases of SGML and XSD, crackpot terminology: ambiguity that is actually not ambiguity, and unique particle attribution instead of strong determinism -- why use five syllables when you can use nine?).
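The determinism rule can be checked mechanically. The sketch below is a minimal illustration that assumes a simplified content-model syntax: element particles, sequence, choice, `*`, and `?`, with no occurrence counts, wildcards, or substitution groups. It builds the Glushkov (position) automaton and rejects the model if any set of candidate next particles contains two particles for the same element name, which is the standard one-unambiguity test. The tuple encoding and the function names are hypothetical.

```python
from itertools import count

# A hypothetical, simplified encoding of content models:
#   ("el", name)   element particle      ("seq", p, q)  p , q
#   ("alt", p, q)  p | q                 ("star", p)    p*      ("opt", p)  p?


def el(name):
    return ("el", name)


def seq(p, q):
    return ("seq", p, q)


def alt(p, q):
    return ("alt", p, q)


def star(p):
    return ("star", p)


def opt(p):
    return ("opt", p)


def glushkov(model):
    """Number every element particle (a 'position') and compute which
    positions can come first and which can follow each position."""
    counter = count()
    symbol = {}  # position -> element name
    follow = {}  # position -> set of positions that may come next

    def walk(node):
        """Return (nullable, first positions, last positions) for a subtree."""
        kind = node[0]
        if kind == "el":
            pos = next(counter)
            symbol[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind in ("seq", "alt"):
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            if kind == "alt":
                return n1 or n2, f1 | f2, l1 | l2
            # every position that can end the left part may be followed
            # by any first position of the right part
            for p in l1:
                follow[p] |= f2
            first = f1 | f2 if n1 else f1
            last = l1 | l2 if n2 else l2
            return n1 and n2, first, last
        if kind in ("star", "opt"):
            _, f, l = walk(node[1])
            if kind == "star":  # a repetition may loop back to its own start
                for p in l:
                    follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node kind: {kind!r}")

    _, first, _ = walk(model)
    return symbol, first, follow


def is_deterministic(model):
    """True if no candidate set holds two particles with the same name, i.e.
    each child element maps to exactly one particle without lookahead."""
    symbol, first, follow = glushkov(model)
    for candidates in [first, *follow.values()]:
        names = [symbol[p] for p in candidates]
        if len(names) != len(set(names)):
            return False
    return True


ab = seq(el("a"), el("b"))
ac = seq(el("a"), el("c"))
print(is_deterministic(ab), is_deterministic(ac))  # True True
print(is_deterministic(alt(ab, ac)))  # False
print(is_deterministic(seq(el("a"), alt(el("b"), el("c")))))  # True
```

Both (a,b) and (a,c) pass individually, but their naive union fails because both branches start with an `a` particle. In this small case the union can be rewritten as the deterministic model a,(b|c). The paragraph's point is that such a rewrite is not always available.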

That leaves the historical answer.

I preface these remarks by saying that for my part in this sorry history I apologize to those who care about these things. I tried to persuade the WG to get rid of the determinism rules in 1996 when we were designing XML DTDs, and I failed. I tried to persuade the XML Schema WG not to adopt them in 1998-2001 when we were designing XSD 1.0, and I failed. I tried to persuade that WG to get rid of them in 2001-2012 when XSD 1.1 was gradually (very, very gradually) coming into being, and I failed again. Sorry; I did try.

Those in the SGML working group which originally created the rule (in ISO 8879) and those in the XML and XML Schema working groups who voted to retain the rule have offered a variety of rationalizations when I have asked them over the years.

Some in the XML WG argued that the determinism rules offer a useful simplification of the parser-writer's task. Cynics have repeatedly suggested (in private) that the rule arose originally because influential members of the SGML WG couldn't figure out how to make backtracking work correctly; others in that WG have hotly denied the claim. (Interestingly, my recollection is that in the XML WG those who argued for determinism as simplifying the parser's task included James Clark, who criticizes the determinism rule in the message cited by chris in his answer to this question. I wish he had changed his mind before it was too late.) Others in the WG thought the determinism rule was bogus, but that we couldn't change it without an unacceptable incompatibility with SGML.

In the XML Schema WG, some WG members tell me they thought determinism was a good idea because it would mean XSD validators could use the content-model validation code base from existing DTD validators. (Truth! This suggests a rather touching faith in code reuse, or possibly excessive consumption of hallucinogens.) At the time, many WG members gave me the impression that they didn't feel they understood the issue clearly enough to have an independent view and thought that backward compatibility with DTDs would be safer than making a change which might have consequences they could not foresee. It would be easy (they thought) to change later if it proved unnecessary or unhelpful. (Later, during the work on XSD 1.1, vendors resisted any attempt to eliminate the determinism rule as if they were repelling an attack on their virtue. So much for "We can always relax the constraint later".)

Some people (both in the SGML and in the XSD WGs) have suggested the determinism rule is useful because it allows annotations to be associated with particular positions in the content model. In the XSD case, this strikes me as fighting a lost battle -- it is just as easy to annotate based on position in the instance, and the existing XPath infrastructure and mindshare makes that a far preferable course. In the SGML case, that argument doesn't apply since XPath hadn't yet been invented, but in SGML annotations are not allowed on individual particles of a content model, so the idea was a non-starter in any case.

This idea survives, though, in the ability of XSD schema authors to add xs:annotation elements to individual particles in a content model. I have been asking for fourteen years or so now without finding anyone who has used this facility in a production schema, anyone who has used this facility in an experimental test schema, or anyone who has heard of anyone using this facility in any kind of schema at all. Nor have I found anyone able to provide a coherent account of a concrete application in which it would be helpful. (They, in response, have pointed out to me that I have never provided an ironclad proof that it could never ever under any circumstances be helpful, in triplicate with endorsements from five full professors of software engineering. They're right; I guess I'm just lazy. But I have never understood why the bar for including stupidities in XSD was so low, and the bar for eliminating them so high.)

The only halfway plausible argument I have ever heard came from a few (very few) thoughtful WG members (very few -- well, one) who made clear that they understood the technical issues perfectly well, but that in the transaction-server deployment scenarios they had in mind, validation speed was essential even in situations where schema pre-compilation and caching would be infeasible. So they wanted to retain the determinism constraint so as to avoid (a) the cost of validating with an NFA instead of a DFA, and (b) the quadratic cost of determinizing an NFA. I don't actually think this is a compelling argument (why should schema caching be impossible for a transaction server, for heaven's sake?), but the person who made it undeniably knows more about transaction servers than I do.
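The speed argument can be weighed against a minimal sketch. The automaton and the `LazyValidator` class below are hypothetical illustrations and are not taken from any WG member's implementation. The validator runs directly against a nondeterministic automaton by tracking the set of states that are still possible. It memoizes every (state set, element name) transition, so determinization happens lazily, one reached state at a time. The example automaton is the position automaton of the nondeterministic model (a|b)*, a, (a|b).

```python
# Hypothetical position automaton (NFA) for the nondeterministic model
# (a|b)*, a, (a|b): it accepts child sequences whose next-to-last element
# is "a". States 1 and 2 belong to (a|b)*, state 3 is the fixed "a",
# and states 4 and 5 are the final (a|b).
POSITION_NFA = {
    0: {"a": {1, 3}, "b": {2}},
    1: {"a": {1, 3}, "b": {2}},
    2: {"a": {1, 3}, "b": {2}},
    3: {"a": {4}, "b": {5}},
    4: {},
    5: {},
}
START = 0
FINALS = frozenset({4, 5})


class LazyValidator:
    """Simulate the NFA as a set of states, memoizing each transition.

    Each distinct (state set, element name) pair is computed once. Later
    validations reuse the cached result, so only the DFA states that real
    instances actually reach are ever built.
    """

    def __init__(self, nfa, start, finals):
        self.nfa = nfa
        self.start = frozenset({start})
        self.finals = finals
        self.cache = {}  # (frozenset of NFA states, name) -> frozenset of NFA states

    def step(self, states, name):
        key = (states, name)
        if key not in self.cache:
            targets = set()
            for s in states:
                targets |= self.nfa[s].get(name, set())
            self.cache[key] = frozenset(targets)
        return self.cache[key]

    def validate(self, children):
        states = self.start
        for name in children:
            states = self.step(states, name)
            if not states:
                return False  # no particle can accept this child
        return bool(states & self.finals)


validator = LazyValidator(POSITION_NFA, START, FINALS)
print(validator.validate(["b", "a", "b"]))  # True
print(validator.validate(["a", "b", "b"]))  # False
print(len(validator.cache))  # 5
```

Once the cache is warm, each child element costs one dictionary lookup, as it would with a precomputed DFA. Keeping the cache alongside the compiled schema is one form of the schema caching the paragraph asks about. The sketch makes no claim about actual performance in a transaction server.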

In sum: SGML invented the determinism rule for reasons lost in the mists of time; no two members of the WG tell the same story, and I conclude that there was no consensus there on the reason to have the rule, only a lot of individual reasons. XML DTDs retained the rule for compatibility with SGML (it was not enough that legal SGML DTDs be legal XML, we wanted all legal XML DTDs to be legal SGML -- the WG was made up of SGML people). And XSD got the determinism rule from XML DTDs and then retained it for no particular reasons but fear, uncertainty, and doubt.

Sigh.
