How does a SAX parser validate against a DTD?
I have an XML file and a DTD defined for it. My understanding of a SAX parser is that it processes events instead of storing the entire XML document in memory (as DOM does). Say I have an XML file with content like <name> ... // some 2 million lines here </name>. What will the SAX parser store in memory in this case? How does it know that the end tag </name> will occur? And now the real question: how does a SAX parser validate against a DTD? I am not looking for an in-depth explanation, just the general idea of how validation occurs.
Comments (1)
Typically the DTD is converted into a set of finite state automata. There's a standard algorithm for converting a grammar of this kind (each content model is essentially a regular expression over element names) to a deterministic FSA, which is found in compiler textbooks such as Aho and Ullman. This produces one FSA for the content model of each element. The current state of parsing/validation is thus represented by a stack holding one FSA (with its current state) for each open element. When the parser encounters a start tag, it checks whether that start tag represents a valid transition in the topmost FSA, and if so makes the transition, changing the current state of that FSA; it also pushes a new FSA onto the stack, namely the one for the content model of the new element. When it sees an end tag, it checks whether the current state of the topmost FSA is a final state, and pops this FSA off the stack.
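To make the stack-of-FSAs idea concrete, here is a minimal sketch in Python. It is not how any particular parser is implemented. The DTD, the hand-built transition tables in `DFAS`, the `#document` pseudo-element used to check the root, and the names `StackValidator` and `validate` are all assumptions made for illustration; a real validator would compile the automata from the DTD itself and would also check attributes and character data, which this sketch ignores.

```python
import xml.sax

# Hypothetical DTD, hand-compiled to one DFA per element (illustration only):
#   <!ELEMENT book (title, author+)>
#   <!ELEMENT title (#PCDATA)>
#   <!ELEMENT author (#PCDATA)>
# Each DFA is (transitions, final_states); the start state is always 0.
# "#document" is an invented pseudo-element that only allows <book> as root.
DFAS = {
    "#document": ({(0, "book"): 1}, {1}),
    "book": ({(0, "title"): 1, (1, "author"): 2, (2, "author"): 2}, {2}),
    "title": ({}, {0}),
    "author": ({}, {0}),
}
EMPTY = ({}, {0})  # fallback for undeclared elements: no child elements allowed


class StackValidator(xml.sax.ContentHandler):
    """Keeps one [element name, current DFA state] pair per open element."""

    def __init__(self):
        super().__init__()
        self.stack = [["#document", 0]]
        self.errors = []

    def startElement(self, name, attrs):
        top = self.stack[-1]
        transitions, _ = DFAS.get(top[0], EMPTY)
        next_state = transitions.get((top[1], name))
        if next_state is None:
            self.errors.append(f"<{name}> not allowed here inside {top[0]}")
        else:
            top[1] = next_state  # take the transition in the parent's DFA
        self.stack.append([name, 0])  # new DFA for the new element's content

    def endElement(self, name):
        element, state = self.stack.pop()
        _, finals = DFAS.get(element, EMPTY)
        if state not in finals:
            self.errors.append(f"content of <{element}> is incomplete")


def validate(xml_text):
    """Parse xml_text with SAX and return the list of validity errors."""
    handler = StackValidator()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.errors


print(validate("<book><title>T</title><author>A</author></book>"))
print(validate("<book><title>T</title></book>"))
```

In this sketch the only state the validator keeps is the stack, whose size follows the nesting depth of the open elements rather than the length of the document, which is why two million lines between a start tag and its end tag do not have to be held in memory.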