如何确定给定 DTD 是否是另一个 DTD 的子集?
我需要验证“简化”DTD 确实是较大 DTD 的子集,即根据“简化”DTD 有效的文档根据较大(或“主”)DTD 也始终有效。
现在正在编写简化的 DTD——它源自主 DTD(如果是相反的话,可以简单地将较小的 DTD 包含到较大的 DTD 中)。
我如何确定简化 DTD 是否源自主 DTD?
I need to verify that a "simplified" DTD is really a subset of a larger DTD, I.e. that documents that are valid according to the "simplified" DTD will also always be valid according to the larger (or "master") DTD.
The simplified DTD is being written now -- it's derived from the master DTD (were it the other way around, one could simply include the smaller DTD into the larger one).
How would I be able to determine if the simplified DTD is derived from the master DTD?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
DTD 实际上只是伪装的上下文无关语法。语法 G 表示一组可能的合法字符串,这些字符串包含语法所表示的未声明语言 L(G)。
你问的问题相当于确定是否有G1和G2,L(G1)是否是L(G2)的子集。我的语言理论已经生锈了,我不记得这是否是一般可计算的,但我猜这真的很难,因为你必须证明 G1 中的任意推导总是在 G2 中具有推导。
您也许能够回答 G1 的结构是否可以通过证明 G1 的每个元素与 G2 的每个元素兼容来证明 L(G1) 是 L(G2) 的子集,本质上是这样的通过显示每个语法规则
G1 中的 G2 中也有相应的规则,但删除了元素。您区分 DTD 的想法似乎是沿着这条线的,但前提是如果差异很大,您将陷入一般问题而不是更简单的问题。至少你描述问题的方式(G2 源自主 DTD)我认为你有机会。
差异的目的是通过找到最小差异来识别兼容规则。
如果你有语法规则 g2 = A ;另一个你声称相关的 g1 = A 和
您想要检查的,
你首先必须证明 A 在 G1 中派生的字符串标记是超集
G2 中派生的标记 A 字符串的组成。这看起来就像比较两种语言的原始无约束问题;我们现在只是比较两条规则 g1 和 g2 的子语言。
所以现在我认为你必须坚持 g1 可到达的每个子规则在结构上与 g2 中的相应子规则兼容,才能使其切实可行。
我认为您可以编写一个递归过程来检查这一点。此过程最需要的帮助是您通常在 LALR 解析器生成器中找到的所有集合运算符(FirstOf,..)。
另一方面,我的公司制作了 Smart Differencer 工具,用于计算语言结构的增量语言元素以及对这些元素的编辑操作。它由语言定义参数化。 SmartDifference 目前适用于多种传统语言(C、C++、C#、COBOL、Java、PHP、Python...)。 XML(和 DTD)也是一种语言,我们为其提供了语言定义,并且我们构建了一个实验性的 XML Smart Differencer 工具。它应该可以很好地处理 DTD。如果您有进一步的直接兴趣,请离线联系我(参见简历)。
编辑:只是为了笑,我尝试了以下两个 DTD,一个派生自另一个:
orderform.xml:
和 orderform2.xml:
[看看你是否能发现差异首先是你自己:-)
并运行 XML SmartDifferencer:
是的,这就是我为获得派生的结果所做的事情。 (符号 NM 表示“N 行,M 列”)。
DTDs are really just context-free grammars in disguise. A grammar G represents the set of possible legal strings that comprise the unstated language L(G) that grammar represents.
What you are asking is tantamount to determining if you have G1 and G2, whether L(G1) is a subset of L(G2). My language theory is getting rusty and I don't remember if this is computable in general or not, but my guess this is really hard, because you have to demonstrate that an arbitrary derivation in G1 always has a derivation in G2.
You might be able to answer the question of whether G1 is structured in such a way that you can demonstrate that L(G1) is a subset of L(G2) by demonstrating that each element of G1 is compatible with each element of G2, essentialy by showing that each grammar rule
in G1 has a corresponding rule in G2 with elements dropped. Your idea of diffing DTDs seems to be along this line, with the proviso that if the the diffs are large you are stuck with the general problem rather than the simpler one. At least the way you've characterized the problem (G2 is derived from the master DTD) I think you have chance.
The purpose of the diff would be to identify compatible rules by finding the least differences.
If you have grammar rule g2 = A ; and another g1 = A that you'd claim are related and
that you'd like to check,
you'd first have to demonstrate that the string tokens that A derived in G1 is a superset
of the string of tokens A derived in G2. This looks just like the original unconstrained problem of comparing two langauges; we're now just comparing the sublanguages for the two rules g1 and g2.
So now I think you have to insist that each subrule reachable by g1 is compatible structurally with a corresponding subrule in g2 to make this practical.
I think you can probably write a recursive procedure to check this. What this procedure mostly needs as help is all the set operators (FirstOf, ..) that you tend to find in an LALR parser generator.
On a different front, my company makes Smart Differencer tools, that compute deltas over language constructs in terms of langauge elements and editing operations on those elements. It is parameterized by language definitions. The SmartDifference presently works for a variety of conventional languages (C, C++, C#, COBOL, Java, PHP, Python, ....). XML (and DTDs) are a language, too, for which we have a language definition, and we've built an experimental XML Smart Differencer tools. It ought to work on DTDs just fine. Contact me offline (see bio) if you have further direct interest.
EDIT: Just for grins, I tried the following two DTDs, one derived from the other:
orderform.xml:
and orderform2.xml:
[See if you can spot the differences yourself, first :-)
And ran the XML SmartDifferencer:
Yep, that's what I did to get the derived one. (The notation N.M means "line N, column M").