How can I make XmlDocument handle XML with unquoted attributes?
I have an ASP.NET VB project that needs to parse some raw XML coming out of a database. The XML is laid out like this:
<HTML><HEAD><TITLE></TITLE></HEAD><BODY><STRONG><A name=SN>AARTS</A>, <A name=GN>Michelle Marie</A>, </STRONG><A name=HO>B.Sc.</A>, <A name=HO>M.Sc.</A>, <A name=HO>Ph.D.</A>; <A name=OC>scientist, professor</A>; b. <A name=BC>St. Marys</A>, Ont. <A name=BY>1970</A>; <A name=PA>d. Wm. and H. Aarts</A>; <A name=ED>e. Univ. of Western Ont. B.Sc.(Hons.) 1994, M.Sc. 1997</A>; <A name=ED>McGill Univ. Ph.D. 2002</A>; <A name=MA>m. L. MacManus</A>; two children; <A name=PO>CANADA RESEARCH CHAIR IN SIGNAL TRANSDUCTION IN ISCHEMIA</A> and <A name=PO>ASST. PROF., DEPT. OF BIOL. SCI., UNIV. OF TORONTO SCARBOROUGH 2006– </A>; Postdoctoral Fellow, Toronto Western Hosp. 2000–06; Expert Cons., Auris Med. SAS, Montpellier, France; mem., Centre for the Neurobiol. of Stress; named INMHA Brainstar of the Year 2003; Bd. of Dirs. & Fundraising Chair, N'Sheemaehn Childcare; mem., Soc. for Neurosci.; Cdn. Physiol. Soc.; Cdn. Assn. for Neurosci.; <A name=WK>co-author: 'Therapeutic Tools in Brain Damage' in <EM>Proteomics and Protein Interactions: Biology, Chemistry, Bioinformatics and Drug Design </EM>2005; 18 pub. journal articles</A>; Office: <A name=OF1_L1>1265 Military Trail</A>, <A name=OF1_CT>Scarborough</A>, <A name=OF1_PR>Ont.</A> <A name=OF1_PC>M1C 1A4</A>. </BODY></HTML>
And the code-behind I'm using is this:
Dim FullBio As New System.Xml.XmlDocument
Dim NodeList As System.Xml.XmlNodeList
Dim Node As System.Xml.XmlNode
FullBio.LoadXml(bio.Item(11))
NodeList = FullBio.SelectNodes("a")
For Each Node In NodeList
Dim name = Node.Attributes.GetNamedItem("name").Value()
lblEducation.Text = lblEducation.Text + name.ToString() + Node.InnerText + "<br />"
Next
So the XML loaded into the Xml Document at
FullBio.LoadXml(bio.Item(11))
is the XML I provided at the top. I am getting this error message:
'SN' is an unexpected token. The expected token is '"' or '''. Line 1, position 49.
I know that the error occurs because the attribute values are not quoted. Is there any way to make XmlDocument understand the attributes anyway, or an easy way to use a regular expression to add quotes to the attributes before loading the string into the XmlDocument?
Comments (3)
What you have is invalid XML. An XmlDocument expects its input to be well-formed XML. I would recommend using an HTML parser such as Html Agility Pack to parse the HTML (which is what you actually have as input). For example, if you wanted to list the name attribute values of all anchors, it takes only a few lines with that library.
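The code that originally accompanied this answer is not preserved here. As a minimal sketch of the idea, assuming the Html Agility Pack assembly is referenced and using a shortened copy of the markup from the question (the module name and the `raw` variable are illustrative, not from the original answer), listing the name attributes might look like this:

```vbnet
Imports System

Module HtmlAgilityPackSketch
    Sub Main()
        ' Shortened sample in the same style as the question's markup (unquoted attributes).
        Dim raw As String = "<HTML><BODY><A name=SN>AARTS</A>, <A name=GN>Michelle Marie</A></BODY></HTML>"

        ' Html Agility Pack parses lenient HTML, so unquoted attribute values are accepted.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(raw)

        ' Element names are exposed in lower case, hence "//a" rather than "//A".
        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim anchors As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@name]")
        If anchors IsNot Nothing Then
            For Each anchor As HtmlAgilityPack.HtmlNode In anchors
                ' The second argument is the default returned if the attribute is missing.
                Console.WriteLine(anchor.GetAttributeValue("name", "") & ": " & anchor.InnerText)
            Next
        End If
    End Sub
End Module
```

In the page from the question, the `Console.WriteLine` call would be replaced by appending to `lblEducation.Text`.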
I would write some logic to insert quotes around the attribute values. The document will fail to load with errors if the XML isn't well-formed.
You can use the Html2Xhtml library for this. Here is a link:
http://corsis.sourceforge.net/index.php/Html2Xhtml
And you should be able to use the library to put the contents into an XDocument.
I believe Html2Xhtml supports .NET Framework 2.0 and above; if not, I'm pretty sure one of the previous versions will. Failing that, you can use this:
http://www.codeproject.com/KB/XML/HTML2XHTML.aspx
This article uses HTML Tidy, and the source code from this article should work in 2.0.
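The quote-insertion logic suggested at the start of this answer could be sketched roughly as follows. This is a heuristic under stated assumptions, not a general HTML fixer: `QuoteBareAttributes` is a hypothetical helper name, the regular expressions only cover simple markup like the sample in the question, and they can misfire on attribute values that themselves contain `name=value` text. Note that the question's sample also contains a bare `&` ("Bd. of Dirs. & Fundraising Chair"), which is not well-formed XML either, so the sketch escapes that as well.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports System.Xml

Module QuoteAttributesSketch
    ' Hypothetical helper: makes simple HTML-style markup loadable by XmlDocument.
    Function QuoteBareAttributes(ByVal markup As String) As String
        ' Wrap unquoted attribute values (name=SN) in double quotes. The lookahead
        ' requires a ">" before any "<", so only text inside a tag is touched.
        Dim quoted As String = Regex.Replace(markup, "(\s[\w:.-]+)=([^\s""'<>]+)(?=[^<>]*>)", "$1=""$2""")

        ' Escape ampersands that do not already start an entity or character reference.
        ' HTML-only entities such as &nbsp; would still be rejected by XmlDocument.
        Return Regex.Replace(quoted, "&(?!#?\w+;)", "&amp;")
    End Function

    Sub Main()
        Dim raw As String = "<HTML><BODY><A name=SN>AARTS</A>; Bd. of Dirs. & Fundraising Chair</BODY></HTML>"

        Dim doc As New XmlDocument()
        doc.LoadXml(QuoteBareAttributes(raw))

        ' XML is case-sensitive and the anchors are nested, so the XPath is "//A", not "a".
        For Each node As XmlNode In doc.SelectNodes("//A[@name]")
            Console.WriteLine(node.Attributes("name").Value & ": " & node.InnerText)
        Next
    End Sub
End Module
```

A real HTML parser or converter is the safer choice when the markup is not as regular as the sample.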
You can also try SgmlReader, which is great for this kind of problem.
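A minimal sketch of that approach, assuming the SgmlReader assembly (namespace `Sgml`) is referenced; the module name and the shortened `raw` sample are illustrative. SgmlReader is an XmlReader, so it can be handed straight to `XmlDocument.Load`:

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module SgmlReaderSketch
    Sub Main()
        ' Shortened sample in the same style as the question's markup.
        Dim raw As String = "<HTML><BODY><A name=SN>AARTS</A>, <A name=GN>Michelle Marie</A></BODY></HTML>"

        ' SgmlReader reads lenient HTML and presents it to consumers as an XML stream.
        Dim reader As New Sgml.SgmlReader()
        reader.DocType = "HTML"
        reader.CaseFolding = Sgml.CaseFolding.ToLower   ' normalise tag names to lower case
        reader.InputStream = New StringReader(raw)

        Dim doc As New XmlDocument()
        doc.Load(reader)

        ' With lower-case folding the anchors are matched by "//a".
        For Each node As XmlNode In doc.SelectNodes("//a[@name]")
            Console.WriteLine(node.Attributes("name").Value & ": " & node.InnerText)
        Next
    End Sub
End Module
```

The rest of the question's code can then keep working against an ordinary XmlDocument.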