Techniques for writing an XML parser in JavaScript
Many questions have already asked how to write an XML parser, mostly for websites or other applications.
There are also tutorials that have proved useful, including:
http://www.switchonthecode.com/tutorials/xml-parsing-with-jquery
However, I am trying to write a parser for the SBML file format (Systems Biology Markup Language):
Specifications - http://sbml.org/Documents/Specifications
I have been hardcoding the parser, and while it works for my case, it will not work for every section.
$(document).ready(function () {
    // Fetch the SBML file; with dataType "xml", jQuery hands parseXml a parsed XML document.
    $.ajax({
        type: "GET",
        url: "sbml.xml",
        dataType: "xml",
        success: parseXml
    });
});

function parseXml(xml) {
    $("#output").append("Output loaded <br />");
    $(xml).find("model").each(function () {
        $("#output").append("Found model <br />");
        // Search only inside the current <model>, not the whole document.
        $(this).find("listOfCompartments").each(function () {
            $("#output").append("List of Compartments found <br />");
            // Each child of <listOfCompartments> is a <compartment> element.
            $(this).children().each(function () {
                var id = $(this).attr("id");
                var size = $(this).attr("size");
                $("#output").append("Compartment <br />");
                $("#output").append("Id: " + id + ", Size: " + size + "<br />");
            });
        });
    });
}
As the specification is quite large (8 pages) and subject to change, is there a better way to write a parser for this case?
Would it be possible to make an array of all the possible nodes and loop through it rather than hardcoding everything? Would this be more efficient?
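The array idea can work. A minimal sketch, assuming the document has already been parsed (for example by jQuery's `dataType: "xml"` or by `DOMParser`): the container names live in one array, and a loop reads every listed container that is a direct child of `<model>`, copying each item's attributes into a plain object. `LIST_NAMES`, `attributesOf` and `readModel` are hypothetical names, and the array only covers a few of the SBML `listOf...` containers.

```javascript
// Hypothetical table of SBML container elements to read; extend it as the spec grows.
const LIST_NAMES = ["listOfCompartments", "listOfSpecies", "listOfParameters", "listOfReactions"];

// Copy every attribute of an element into a plain object, e.g. { id: "cell", size: "1" }.
function attributesOf(el) {
  const out = {};
  for (const attr of Array.from(el.attributes)) {
    out[attr.name] = attr.value;
  }
  return out;
}

// Read each listed container that is a direct child of <model>.
// Returns e.g. { listOfCompartments: [ { id: "cell", size: "1" } ], ... }.
function readModel(modelEl) {
  const result = {};
  for (const child of Array.from(modelEl.children)) {
    if (LIST_NAMES.includes(child.localName)) {
      result[child.localName] = Array.from(child.children).map(attributesOf);
    }
  }
  return result;
}
```

With the jQuery code above, `readModel($(xml).find("model")[0])` would pass in the raw `<model>` element. Only attributes are copied, so nested content (such as the reactants inside each `<reaction>`) would still need its own handling.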
Comments (2)
Do not write an XML parser unless there is no alternative. The XML spec contains many features (such as parameter entities and internal subsets) that you must handle and that are quite involved. Every language already has excellent parsers, and you should use one of them.
If you write it yourself, you will end up with a parser that implements only part of the spec. It will certainly break in the future, and that will cause problems for you and your collaborators.
UPDATE:
Distinguish between PARSING and manipulating the DOM. You do not want to parse the XML yourself; you want the browser to do it for you (and it will). What you want is to manipulate the DOM, perhaps with XPath.
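A minimal browser sketch of that split, assuming the SBML text is already in a string: the browser's `DOMParser` does all the parsing, and `document.evaluate` runs the XPath query. `parseXmlText` and `selectNodes` are hypothetical helper names.

```javascript
// Let the browser's built-in parser do the parsing; we only check whether it succeeded.
function parseXmlText(text) {
  const doc = new DOMParser().parseFromString(text, "application/xml");
  // On malformed input, browsers return a document containing a <parsererror> element.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("Input is not well-formed XML");
  }
  return doc;
}

// Run an XPath 1.0 expression against a parsed document and return the matches as an array.
function selectNodes(doc, expression) {
  const result = doc.evaluate(expression, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

For example, `selectNodes(parseXmlText(text), "//*[local-name()='compartment']")` would return every `<compartment>` element. The `local-name()` test is used because SBML puts its elements in a default namespace, which an unprefixed XPath step such as `//compartment` does not match.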
UPDATE 2:
I am not an expert, but here is a fairly recent example of a parser in a Microsoft environment.
Other browsers will have similar parsers.
UPDATE 3:
"Did you create a parser for CML..."? Not really. I took part in the development of XML and its parsers in 1997 (with Norbert Mikula, Tim Bray and others). In fact, we redesigned XML because parsing it was so difficult.
XML parsers create either a SAX event stream or a DOM, and in theory all parsers should produce the same information. This is referred to as the Infoset. It removes all the syntactic variations in XML (quoting, CDATA, entities, etc.). In practice it is often just called the DOM.
I think you mean: "How do I turn the infoset into something specialised for my application?" If so, yes: I have written extensive code to manipulate the raw infoset. In my case, it creates specialised subclasses of XML Elements, so I have CMLMolecule, CMLAtom, etc. The code is in JUMBO (CMLXOM): https://bitbucket.org/wwmm/cmlxom
This is the same philosophy adopted by, for example, MathML and SVG: they have specialised subclasses.
It's quite a lot of work; I have used both automatic and handcrafted approaches. I don't like the W3C DOM as a base, and I'd advise a DOM where you can subclass Element. But if you intend to write the definitive SBML JavaScript DOM, I would not discourage you.
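Elements created by the browser's XML parser are plain `Element` objects, so in browser JavaScript the specialised-class idea is usually approximated differently. This sketch swaps in a common alternative: thin wrapper classes chosen from a registry keyed by element name. `SbmlElement`, `SbmlCompartment`, `SbmlSpecies`, `WRAPPERS` and `wrap` are hypothetical names, not part of JUMBO or any SBML library.

```javascript
// Generic wrapper: keeps the underlying DOM element and reads attributes from it.
class SbmlElement {
  constructor(el) {
    this.el = el;
  }
  get id() {
    return this.el.getAttribute("id");
  }
}

// Hypothetical specialised wrappers, analogous in spirit to CMLMolecule/CMLAtom.
class SbmlCompartment extends SbmlElement {
  // Attributes are strings; convert size to a number (NaN if the attribute is absent).
  get size() {
    return Number.parseFloat(this.el.getAttribute("size"));
  }
}

class SbmlSpecies extends SbmlElement {
  get compartment() {
    return this.el.getAttribute("compartment");
  }
}

// Registry from element local name to wrapper class; unknown names fall back to SbmlElement.
const WRAPPERS = new Map([
  ["compartment", SbmlCompartment],
  ["species", SbmlSpecies],
]);

function wrap(el) {
  const Wrapper = WRAPPERS.get(el.localName) || SbmlElement;
  return new Wrapper(el);
}
```

Because each wrapper holds a reference to the element instead of copying it, changes made through the DOM stay visible, and supporting another SBML element means adding one class and one registry entry.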
I did this for CML in JavaScript some time ago, but the browsers had flaky DOMs and I may need to revisit it. It's almost essential for doing interactive graphics.
I look forward to hearing from you.
The browser can parse XML, so let it do that for you. The browser's XML parser is probably correct; then you just have to work with the DOM.
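Working with the DOM does not require knowing every element name in advance. As a minimal sketch (the function name `toPlainObject` is hypothetical), this walks any parsed element recursively and converts it into plain objects, so elements added in later SBML versions are picked up without code changes:

```javascript
// Recursively convert a DOM element into { name, attributes, children, text }.
// Only element children are followed; text is kept for leaf elements only.
function toPlainObject(el) {
  const attributes = {};
  for (const attr of Array.from(el.attributes)) {
    attributes[attr.name] = attr.value;
  }
  const children = Array.from(el.children).map(toPlainObject);
  const node = { name: el.localName, attributes, children };
  if (children.length === 0 && el.textContent.trim() !== "") {
    node.text = el.textContent.trim();
  }
  return node;
}
```

In a browser, `toPlainObject(doc.documentElement)` would convert a whole parsed SBML document. Mixed content (text interleaved with child elements, as in XHTML notes or MathML) is simplified here, since text is kept only for leaf elements.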