When an RSS document contains a script block inside an element

The code:

using (XmlReader xmlr = XmlReader.Create(new StringReader(allXml)))
{
    var items = from item in SyndicationFeed.Load(xmlr).Items
        select item;
}

The exception:

Exception: System.Xml.XmlException: Unexpected node type Element. 
   ReadElementString method can only be called on elements with simple or empty content. Line 11, position 25.
   at System.Xml.XmlReader.ReadElementString()
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadXml(XmlReader reader, SyndicationFeed result)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFeed(XmlReader reader)
   at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFrom(XmlReader reader)
   at System.ServiceModel.Syndication.SyndicationFeed.Load[TSyndicationFeed](XmlReader reader)
   at System.ServiceModel.Syndication.SyndicationFeed.Load(XmlReader reader)
   at Ionic.ToolsAndTests.ReadRss.Run() in c:\dev\dotnet\ReadRss.cs:line 90

The XML content:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/roller-ui/styles/rss.xsl" media="screen"?><rss version="2.0" 
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
  <title>Software architecture, software engineering, and Renaissance Jazz</title>
  <link>https://www.ibm.com/developerworks/mydeveloperworks/blogs/gradybooch</link>
  <atom:link rel="self" type="application/rss+xml" href="https://www.ibm.com/developerworks/mydeveloperworks/blogs/gradybooch/feed/entries/rss?lang=en" />
  <description>Software architecture, software engineering, and Renaissance Jazz</description>
  <language>en-us</language>
  <copyright>Copyright <script type='text/javascript'> document.write(blogsDate.date.localize (1273534889181));</script></copyright>
  <lastBuildDate>Mon, 10 May 2010 19:41:29 -0400</lastBuildDate>

As you can see, on line 11, at position 25, there's a script block inside the <copyright> element.
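The following minimal sketch reproduces the failure outside SyndicationFeed. It uses a trimmed-down fragment of the feed above, and the names `ComplexContentDemo`, `IsWellFormed` and `TryReadCopyright` are hypothetical. The markup reads cleanly as well-formed XML. The exception only appears when `XmlReader.ReadElementString()` is called on an element that has a child element, which is the call the stack trace shows `Rss20FeedFormatter.ReadXml` making.

```csharp
using System;
using System.IO;
using System.Xml;

static class ComplexContentDemo
{
    // True if the whole document can be read node by node, i.e. it is well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader r = XmlReader.Create(new StringReader(xml)))
            {
                while (r.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Positions the reader on the first <copyright> element and calls
    // ReadElementString(), the call that fails in the stack trace above.
    public static string TryReadCopyright(string xml)
    {
        using (XmlReader r = XmlReader.Create(new StringReader(xml)))
        {
            if (!r.ReadToFollowing("copyright"))
                return "no <copyright> element";
            try
            {
                return r.ReadElementString();
            }
            catch (XmlException ex)
            {
                return "XmlException: " + ex.Message;
            }
        }
    }
}
```

With the fragment `<channel><copyright>Copyright <script type='text/javascript'>document.write(1);</script></copyright></channel>`, `IsWellFormed` returns true, while `TryReadCopyright` reports the same "simple or empty content" XmlException seen in the question.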

Other people have reported similar errors with other XML documents.

I worked around this by calling StreamReader.ReadToEnd, running Regex.Replace on the result to strip out the script block, and then passing the modified string to XmlReader.Create(). It feels like a hack.
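The regex workaround described above might look roughly like the following sketch. It is not the poster's actual code. The class name `ScriptStripper`, the method `StripScriptBlocks` and the exact pattern are assumptions, and a pattern like this only handles simple, non-nested `<script>` blocks.

```csharp
using System.IO;
using System.Text.RegularExpressions;

static class ScriptStripper
{
    // Matches <script ...>...</script>: lazy match, case-insensitive,
    // with Singleline so '.' also spans newlines inside the block.
    private static readonly Regex ScriptBlock = new Regex(
        @"<script\b[^>]*>.*?</script\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Reads the whole feed into memory, then removes every script block.
    public static string StripScriptBlocks(TextReader input)
    {
        string allXml = input.ReadToEnd();
        return ScriptBlock.Replace(allXml, string.Empty);
    }
}
```

The cleaned string would then go to `XmlReader.Create(new StringReader(...))` as in the original snippet. `ReadToEnd` still materializes the entire feed as one string, which is the cost question 1 below objects to.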


  1. Has anyone got a better approach? I don't like this one because I have to read a 125k string into memory.

  2. Is it valid RSS to include "complex content" like that, i.e. a script block inside an element?

掀纱窥君容 2024-09-08 16:48:46

You can subclass XmlTextReader and override ReadElementString to skip or modify the offending element as it is read. It still feels like a hack, but at least it avoids pre-processing the feed with a regex.

Here's a simple implementation that gets the job done:


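using System;
using System.IO;
using System.Xml;

// Works around feeds whose <copyright> element contains child elements
// (such as a <script> block) that Rss20FeedFormatter cannot read.
class BrokenFeedXmlReader : XmlTextReader
{
    // Additional XmlTextReader constructors can be added in
    // similar fashion as needed
    public BrokenFeedXmlReader(TextReader input)
        : base(input)
    {
    }

    // The formatter reads <copyright> through ReadElementString(), which
    // throws on complex content. Skip the whole element instead and
    // report an empty copyright string.
    public override string ReadElementString()
    {
        if ("copyright" == Name)
        {
            base.Skip();
            return String.Empty;
        }

        return base.ReadElementString();
    }
}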

Your example code would then look something like this:


using (XmlReader xmlr = new BrokenFeedXmlReader(new StringReader(allXml)))
{
    var items = from item in SyndicationFeed.Load(xmlr).Items
                select item;
} 
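A possible variation, which is not part of the answer above: instead of discarding `<copyright>` entirely, flatten any element that `ReadElementString()` is asked to read into its text content and drop `<script>` children along the way. The class name `FlatteningFeedXmlReader` is hypothetical. The sketch assumes, as the stack trace indicates, that the formatter reads simple elements through `ReadElementString()`.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical variant of BrokenFeedXmlReader: rather than skipping a known
// element, it tolerates complex content in any element read through
// ReadElementString() by concatenating its text and skipping <script> children.
class FlatteningFeedXmlReader : XmlTextReader
{
    public FlatteningFeedXmlReader(TextReader input)
        : base(input)
    {
    }

    public override string ReadElementString()
    {
        // Not on an element: defer to the base class, which throws as usual.
        if (MoveToContent() != XmlNodeType.Element)
            return base.ReadElementString();

        if (IsEmptyElement)
        {
            Read(); // consume <element/>
            return string.Empty;
        }

        int depth = Depth;
        var text = new StringBuilder();
        Read(); // step past the start tag

        // Walk the children until the matching end tag.
        while (!(NodeType == XmlNodeType.EndElement && Depth == depth))
        {
            if (NodeType == XmlNodeType.Element && LocalName == "script")
            {
                Skip(); // drops the whole <script>...</script> subtree
                continue;
            }

            switch (NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    text.Append(Value);
                    break;
            }

            if (!Read())
                throw new XmlException("Unexpected end of document inside an element.");
        }

        Read(); // consume the end tag, as the base implementation does
        return text.ToString();
    }
}
```

It would be used exactly like `BrokenFeedXmlReader`. Because both constructors accept any `TextReader`, either reader can also be given a `StreamReader` over the downloaded feed instead of a `StringReader` over `allXml`, so the document never has to be materialized as one large string first.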