(Finite State Machine) - Implementing an XML Schema Validator in JavaScript
I have been working for about a month on an XML Schema (XSD) validator written in JavaScript. I am really close, but I keep running into problems.
The only part that works well is normalizing schema structures into an FSA that I store in the DOM. I have tried several methods of validating my XML structures against the FSA and have come up short each time.
The validator drives a client-side WYSIWYG XML editor, so it has to meet the following requirements:
- Must be efficient (< 15 ms to validate an element's child node pattern, even with complex models)
- Must expose a Post-Schema-Validation Infoset (PSVI) that can be queried to determine which elements can be inserted into or removed from the document at various points while keeping the document valid.
- Must be able to validate an XML child node structure and, if it is invalid, return what content was EXPECTED or what content is UNEXPECTED.
-- More info: consider the following example --
First, I convert schema structures to a general FSA representation, normalizing out things like xs:group and xs:import with respect to namespaces. For instance, consider:
<xs:group name="group1">
<xs:choice minOccurs="2">
<xs:element name="e2" maxOccurs="3"/>
<xs:element name="e3"/>
</xs:choice>
</xs:group>
<xs:complexType>
<xs:sequence>
<xs:element name="e1"/>
<xs:group ref="group1"/>
</xs:sequence>
</xs:complexType>
This would be converted into a generalized structure similar to:
<seq>
<e name="e1"/>
<choice minOccurs="2">
<e name="e2" maxOccurs="3"/>
<e name="e3"/>
</choice>
</seq>
I do all of this server-side with XQuery and XSLT.
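As a minimal sketch, the same generalized structure could also be handed to the client as a plain JavaScript object rather than as DOM nodes. The field names (`type`, `name`, `min`, `max`, `children`) and the `describe` helper are hypothetical, chosen only for illustration. The model mirrors the xs:sequence example (e1 followed by the group1 choice); since that example does not state a maxOccurs for the choice, the sketch assumes it is unbounded.

```javascript
// Hypothetical object form of the generalized structure (all field names are assumptions).
// "min" and "max" default to 1 when omitted; Infinity stands for maxOccurs="unbounded".
const model = {
  type: "seq",
  children: [
    { type: "e", name: "e1" },
    {
      type: "choice",
      min: 2,
      max: Infinity, // assumption: the example schema does not state maxOccurs here
      children: [
        { type: "e", name: "e2", max: 3 },
        { type: "e", name: "e3" },
      ],
    },
  ],
};

// Render a particle as a regex-like string, which makes a model easy to eyeball.
function describe(particle) {
  const min = particle.min === undefined ? 1 : particle.min;
  const max = particle.max === undefined ? 1 : particle.max;
  let body;
  if (particle.type === "e") {
    body = particle.name;
  } else {
    const separator = particle.type === "seq" ? " " : " | ";
    body = "(" + particle.children.map(describe).join(separator) + ")";
  }
  if (min === 1 && max === 1) return body;
  return body + "{" + min + "," + (max === Infinity ? "" : max) + "}";
}

console.log(describe(model)); // (e1 (e2{1,3} | e3){2,})
```

Printing the model in this form is only a debugging aid, but it makes it obvious that a content model is a regular expression over element names.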
My first attempt at building a validator used recursive functions in JavaScript. Along the way, whenever I found content that could exist, I added it to a global PSVI to signal that it could be added at a specified point in the hierarchy.
My second attempt was iterative and much faster, but it suffered from the same problem.
Both attempts correctly validated simple content models, but they failed as soon as the models became more complex and deeply nested.
I think I am approaching this problem from completely the wrong direction. From what I have read, most FSAs are processed by pushing states onto a stack, but I am not sure how to do this in my situation.
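For what it is worth, a stack is usually associated with pushdown automata. The children of a single element form a regular language, so a plain finite automaton can be run by tracking a set of active states. The sketch below assumes a hand-written automaton for the illustrative model `e1 (e2 | e3)+` (not the model from the example above). The `automaton` layout and the function names are hypothetical. It shows how the same walk can report both the UNEXPECTED child and the EXPECTED element names.

```javascript
// Hand-written automaton for the illustrative content model  e1 (e2 | e3)+
// States are numbers; the table layout is an assumption made for this sketch.
const automaton = {
  start: 0,
  accepting: new Set([2]),
  // transitions[state][elementName] -> array of next states
  transitions: {
    0: { e1: [1] },
    1: { e2: [2], e3: [2] },
    2: { e2: [2], e3: [2] },
  },
};

// Element names that may legally appear next from any of the active states.
function expectedFrom(fsa, states) {
  const names = new Set();
  for (const s of states) {
    for (const name of Object.keys(fsa.transitions[s] || {})) names.add(name);
  }
  return [...names].sort();
}

// Walk the child element names once, keeping a set of active states instead of a stack.
function validateChildren(fsa, childNames) {
  let states = new Set([fsa.start]);
  for (let i = 0; i < childNames.length; i++) {
    const next = new Set();
    for (const s of states) {
      const targets = (fsa.transitions[s] || {})[childNames[i]] || [];
      for (const t of targets) next.add(t);
    }
    if (next.size === 0) {
      // No active state can consume this child: report it and what was allowed instead.
      return { valid: false, index: i, unexpected: childNames[i], expected: expectedFrom(fsa, states) };
    }
    states = next;
  }
  const complete = [...states].some((s) => fsa.accepting.has(s));
  return complete
    ? { valid: true, expected: expectedFrom(fsa, states) } // names that could still be appended
    : { valid: false, index: childNames.length, unexpected: null, expected: expectedFrom(fsa, states) };
}

console.log(validateChildren(automaton, ["e1", "e2", "e3"]));
console.log(validateChildren(automaton, ["e1", "e1"]));
```

Each child is visited once and the work per child is bounded by the number of states, so the cost grows linearly with the number of children. Whether that fits a 15 ms budget would still need measuring on real models. Nesting in the document is then handled by validating each element's children against that element's own automaton.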
I need advice on the following questions:
- Is a state machine the right solution here? Will it accomplish the goals stated at the top?
- If I use a state machine, what is the best method for converting the schema structure to a DFA? Thompson's algorithm? Do I need to optimize the DFA for this to work?
- What is the best (or most efficient) way to implement all of this in JavaScript? (Note: optimizations and pre-processing can be done on the server.)
Thanks,
Casey
Additional Edits:
I have been looking at the tutorial here: http://www.codeproject.com/KB/recipes/OwnRegExpressionsParser.aspx which focuses on building a parser for regular expressions. It seems very similar to what I need, and it brings up some interesting thoughts.
I am thinking that XML Schema breaks down into only a few operators:
sequence -> Concatenation
choice -> Union
minOccurs/maxOccurs -> Probably needs more than Kleene closure; I am not sure of the best way to represent this operator.
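One possible way to represent the occurrence operator is to unroll it: minOccurs required copies of the particle, followed by either (maxOccurs - minOccurs) optional copies or, for unbounded, a Kleene-style loop. The sketch below is a Thompson-style epsilon-NFA construction under that assumption. The particle shape (`type`, `name`, `min`, `max`, `children`) and the function names are hypothetical, and this is not necessarily how any existing validator does it.

```javascript
// Build an epsilon-NFA from a particle tree. Assumed particle shapes:
//   { type: "e", name }  or  { type: "seq" | "choice", children: [...] },
// each with optional min/max (default 1; Infinity means unbounded).
function compile(model) {
  const eps = [];   // eps[s]   -> states reachable without consuming a child
  const moves = []; // moves[s] -> { elementName: [targetStates] }
  const newState = () => { eps.push([]); moves.push({}); return eps.length - 1; };

  // One occurrence of the particle, ignoring its min/max.
  function once(p) {
    const start = newState(), end = newState();
    if (p.type === "e") {
      moves[start][p.name] = [end];
    } else if (p.type === "seq") {           // sequence -> concatenation
      let prev = start;
      for (const child of p.children) {
        const f = build(child);
        eps[prev].push(f.start);
        prev = f.end;
      }
      eps[prev].push(end);
    } else {                                  // choice -> union
      for (const child of p.children) {
        const f = build(child);
        eps[start].push(f.start);
        eps[f.end].push(end);
      }
    }
    return { start, end };
  }

  // Apply min/max by unrolling: required copies first, then optional copies or a loop.
  function build(p) {
    const min = p.min === undefined ? 1 : p.min;
    const max = p.max === undefined ? 1 : p.max;
    const start = newState(), end = newState();
    let prev = start;
    for (let i = 0; i < min; i++) {           // required copies
      const f = once(p);
      eps[prev].push(f.start);
      prev = f.end;
    }
    if (max === Infinity) {                   // zero or more further copies
      const f = once(p);
      eps[prev].push(f.start);
      eps[f.end].push(prev);
    } else {
      for (let i = min; i < max; i++) {       // optional copies, each skippable
        const f = once(p);
        eps[prev].push(f.start);
        eps[prev].push(end);
        prev = f.end;
      }
    }
    eps[prev].push(end);
    return { start, end };
  }

  const top = build(model);
  return { eps, moves, start: top.start, accept: top.end };
}

// All states reachable from the given ones through epsilon moves only.
function closure(nfa, states) {
  const seen = new Set(states);
  const pending = [...states];
  while (pending.length) {
    for (const t of nfa.eps[pending.pop()]) {
      if (!seen.has(t)) { seen.add(t); pending.push(t); }
    }
  }
  return seen;
}

// True when the sequence of child element names matches the compiled model.
function accepts(nfa, childNames) {
  let states = closure(nfa, [nfa.start]);
  for (const name of childNames) {
    const next = [];
    for (const s of states) next.push(...(nfa.moves[s][name] || []));
    states = closure(nfa, next);
    if (states.size === 0) return false;
  }
  return states.has(nfa.accept);
}
```

Unrolling keeps the construction simple, but the number of states grows with the occurrence bounds, so a large finite maxOccurs would probably call for counters instead. Because pre-processing can happen on the server, the resulting automaton could also be determinized there and shipped as a table.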
Comments (1)
When I was going through the same learning process I found that I needed to spend some time studying books on compiler-writing (for example Aho & Ullman). The construction of a finite state machine to implement a grammar is standard textbook stuff; it's not easy or intuitive, but it is thoroughly described in the literature - except perhaps for numeric minOccurs/maxOccurs, which don't occur in typical BNF language grammars, but are well covered by Thompson and Tobin.