How do I get the content of an XML element using XmlSerializer?
I have an XML reader on this XML string:
<?xml version="1.0" encoding="UTF-8" ?>
<story id="1224488641nL21535800" date="20 Oct 2008" time="07:44">
<title>PRESS DIGEST - PORTUGAL - Oct 20</title>
<text>
<p> LISBON, Oct 20 (Reuters) - Following are some of the main
stories in Portuguese newspapers on Monday. Reuters has not
verified these stories and does not vouch for their accuracy. </p>
<p>More HTML stuff here</p>
</text>
</story>
I created an XSD and a corresponding class for deserialization.
[System.Xml.Serialization.XmlRootAttribute(Namespace="", IsNullable=false)]
public class story {
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string id;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string date;
    [System.Xml.Serialization.XmlAttributeAttribute()]
    public string time;
    public string title;
    public string text;
}
I then create an instance of the class using the Deserialize method of XmlSerializer.
XmlSerializer ser = new XmlSerializer(typeof(story));
return (story)ser.Deserialize(xr);
Now, the text member of story is always null. How do I change my story class so that the XML is parsed as expected?
EDIT:
Using an XmlText does not work and I have no control over the XML I'm parsing.
I found a very unsatisfactory solution.
Change the class like this (ugh!)
And change the calling code like this (yuck!)
This is a really bad way of doing it because it breaks encapsulation. Is there a better way of doing it?
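The code for this answer did not survive on this page. A later comment mentions handling the UnknownElement event, so the workaround may have looked roughly like the sketch below. This is an assumption, not the author's original code: the text member is marked [XmlIgnore] so that the serializer reports the <text> element as unknown, and the calling code copies its inner XML across. The StoryLoader helper is a hypothetical name introduced for illustration.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot(Namespace = "", IsNullable = false)]
public class story
{
    [XmlAttribute] public string id;
    [XmlAttribute] public string date;
    [XmlAttribute] public string time;
    public string title;

    // Assumption: hide the member from the serializer so that <text>
    // is reported through the UnknownElement event instead.
    [XmlIgnore]
    public string text;
}

// Hypothetical helper; the calling code has to know about the workaround,
// which is the encapsulation problem the answer complains about.
public static class StoryLoader
{
    public static story Load(XmlReader xr)
    {
        var ser = new XmlSerializer(typeof(story));
        ser.UnknownElement += (sender, e) =>
        {
            var target = e.ObjectBeingDeserialized as story;
            if (target != null && e.Element.Name == "text")
            {
                // Keep the nested <p> markup as a raw string.
                target.text = e.Element.InnerXml;
            }
        };
        return (story)ser.Deserialize(xr);
    }
}
```

It would be called as StoryLoader.Load(xr) in place of the original two lines that create the serializer and call Deserialize.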
The suggestion that I was going to make, if the text tag only ever contained p tags, was the following. It may be useful in the short term.
Instead of story having the text field as a string, you could have it as an array of strings. You could then use the right XmlArray attributes (can't remember the exact names, something like XmlArrayItemAttribute), with the right parameters to make it look like:
Which is a step closer, but not completely what you need.
Another option is to make a class like:
And again use the XmlArray attributes to get it to look right, not sure if they are as configurable as that because I've only used them for simple types before.
Edit:
Using:
Works well with the supplied XML, but having the class seems a little more complicated. It ends up as something similar to:
which is obviously not what is desired.
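The code samples from this answer are missing here. The first idea (an array of strings for text) might be sketched as follows, assuming that <text> contains only <p> children and that each <p> holds plain text without nested markup. The attribute names are the real XmlArray and XmlArrayItem attributes; the StoryParagraphs helper is a hypothetical name added for illustration.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

[XmlRoot(Namespace = "", IsNullable = false)]
public class story
{
    [XmlAttribute] public string id;
    [XmlAttribute] public string date;
    [XmlAttribute] public string time;
    public string title;

    // <text> becomes the array wrapper and each <p> one string item.
    [XmlArray("text")]
    [XmlArrayItem("p")]
    public string[] text;
}

// Hypothetical helper that deserializes a story from an XML string.
public static class StoryParagraphs
{
    public static story Parse(string xml)
    {
        var ser = new XmlSerializer(typeof(story));
        using (var xr = XmlReader.Create(new StringReader(xml)))
        {
            return (story)ser.Deserialize(xr);
        }
    }
}
```

This yields one string per paragraph rather than the original markup, which matches the answer's own caveat that it is only a step closer to what the question asks for.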
You could implement IXmlSerializable for your class and handle the inner elements there. This means that you keep the code for deserializing your data inside the target class (thus avoiding your problem with encapsulation). It's a simple enough data type that the code should be trivial to write.
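A minimal sketch of what such an IXmlSerializable implementation could look like is given below. It assumes the element and attribute names from the question and reads the content of <text> as a raw string with XmlReader.ReadInnerXml. The WriteXml side is included only for completeness and is an assumption about how the data should be written back.

```csharp
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

[XmlRoot("story", Namespace = "")]
public class story : IXmlSerializable
{
    public string id, date, time, title, text;

    // Returning null is the documented convention for this method.
    public XmlSchema GetSchema() { return null; }

    public void ReadXml(XmlReader reader)
    {
        reader.MoveToContent();
        id = reader.GetAttribute("id");
        date = reader.GetAttribute("date");
        time = reader.GetAttribute("time");

        if (reader.IsEmptyElement) { reader.Read(); return; }

        reader.ReadStartElement();                  // consume <story>
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            switch (reader.LocalName)
            {
                case "title":
                    title = reader.ReadElementContentAsString();
                    break;
                case "text":
                    text = reader.ReadInnerXml();   // keeps the nested <p> markup
                    break;
                default:
                    reader.Skip();                  // ignore anything unexpected
                    break;
            }
        }
        reader.ReadEndElement();                    // consume </story>
    }

    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("id", id);
        writer.WriteAttributeString("date", date);
        writer.WriteAttributeString("time", time);
        writer.WriteElementString("title", title);
        writer.WriteStartElement("text");
        writer.WriteRaw(text ?? "");                // text already contains markup
        writer.WriteEndElement();
    }
}
```

With this in place the calling code from the question stays unchanged: new XmlSerializer(typeof(story)) followed by Deserialize(xr).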
It looks to me like the XML is incorrect.
Since you use HTML tags within the text tag, the HTML tags are interpreted as XML.
You should use CDATA so that the data is interpreted correctly, or escape < and >.
Since you do not have control over the XML you could use StreamReader instead.
XmlReader interprets the HTML tags as XML which is not what you want.
XmlSerializer will however strip the HTML tags within the text tag.
Perhaps using the XmlAnyElement attribute instead of handling the UnknownElement event may be more elegant.
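As a sketch of this suggestion, the <text> element can be captured as an XmlElement with [XmlAnyElement("text")] and exposed as a string through a read-only property. The member name textElement is a hypothetical name chosen for illustration, and the string property is an assumption about how a caller would want to read the markup.

```csharp
using System.Xml;
using System.Xml.Serialization;

[XmlRoot(Namespace = "", IsNullable = false)]
public class story
{
    [XmlAttribute] public string id;
    [XmlAttribute] public string date;
    [XmlAttribute] public string time;
    public string title;

    // Captures the whole <text> element, including its child markup.
    [XmlAnyElement("text")]
    public XmlElement textElement;

    // Convenience view of the captured markup; ignored by the serializer.
    [XmlIgnore]
    public string text
    {
        get { return textElement == null ? null : textElement.InnerXml; }
    }
}
```

The calling code from the question does not need to change, so the handling of <text> stays inside the class.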
Have you tried xsd.exe? It allows you to create XSDs from XML documents and then generate classes from the XSD that should be ready for XML deserialization.
I encountered this same issue after using XSD.exe to generate an XSD from XML and then classes from the XSD. I added an [XmlText] attribute before the class of the object in the generated class file (called P in my case because of the <p> tag it was inferring as an XML node) and it worked instantly, pulling in the complete HTML content that was inside the parent node and putting it in that P object, which I then renamed to something more useful.