使用 Javascript 从 RSS XML 中提取 CDATA

发布于 2024-10-20 18:55:49 字数 3716 浏览 6 评论 0原文

我已经使用 JS 提取了 RSS 提要内容，但是“描述”节点包含 CDATA，我想将其拆分出来。

例如，对于 Item 下的每个 Description 节点，我只想提取从 Brief Description: 到第一个 的内容。 。

这可能吗？下面是我迄今为止所拥有的示例以及来自下面 RSS 提要的 xml。

希望有人可以提供帮助:)

脚本示例

<SCRIPT type=text/javascript>
if (window.XMLHttpRequest)
  {// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
  }
else
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }

xmlhttp.open("GET","help/Sandbox/XML%20Playground/_layouts/listfeed.aspx?List=%7B1D503F3E%2D4BFF%2D4248%2D848D%2DE12B5B67DAEC%7D",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;



function media(){

description=xmlDoc.getElementsByTagName('description');
a=2;
b=1;

for (i=0;i<18;i++)
{



document.write('<p>' + description[b].childNodes[0].nodeValue + '</p>');

b++;
a++;

};

};



</SCRIPT>

RSS XML FEED

<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Windows SharePoint Services V3 RSS Generator on 8/03/2011 10:51:51 AM-->
<?xml-stylesheet type="text/xsl" href="/help/Sandbox/XML Playground/_layouts/RssXslt.aspx?List=1d503f3e-4bff-4248-848d-e12b5b67daec" version="1.0"?>
<rss version="2.0">
  <channel>
    <title>XML Playground: Media News</title>
    <link>/help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    <description>RSS feed for the Media News list.</description>
    <lastBuildDate>Mon, 07 Mar 2011 23:51:51 GMT</lastBuildDate>
    <generator>Windows SharePoint Services V3 RSS Generator</generator>
    <ttl>60</ttl>
    <image>
      <title>XML Playground: Media News</title>
      <url>/help/Sandbox/XML Playground/_layouts/images/homepage.gif</url>
      <link>help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    </image>
    <item>
      <title>new Item</title>
      <link>/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>
<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>
]]></description>
      <author>WALKER,Andrew</author>
      <pubDate>Mon, 07 Mar 2011 05:43:19 GMT</pubDate>
      <guid isPermaLink="true">http:/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</guid>
    </item>
    <item>
      <title>My School 2.0 launched</title>
      <link>http://dnet.hosts.network/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>
<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>
<div><b>Release Date:</b> 16/03/2011</div>
]]></description>
                <pubDate>Fri, 04 Mar 2011 04:34:11 GMT</pubDate>
      <guid isPermaLink="true">/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</guid>
    </item>
  </channel>
</rss>

原文

I have extracted RSS feed content using JS, however the 'Description' node contains CDATA and I want to split this out.

For example, for each Description node under Item I would like to extract only the content that is from <b>Brief Description:</b> to the first </div> .

Is this possible? Below is an exmaple of what I have thus far and also the xml from the RSS feed below.

Hope someone can help :)

Script Example

<SCRIPT type=text/javascript>
if (window.XMLHttpRequest)
  {// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
  }
else
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }

xmlhttp.open("GET","help/Sandbox/XML%20Playground/_layouts/listfeed.aspx?List=%7B1D503F3E%2D4BFF%2D4248%2D848D%2DE12B5B67DAEC%7D",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;



function media(){

description=xmlDoc.getElementsByTagName('description');
a=2;
b=1;

for (i=0;i<18;i++)
{



document.write('<p>' + description[b].childNodes[0].nodeValue + '</p>');

b++;
a++;

};

};



</SCRIPT>

RSS XML FEED

<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Windows SharePoint Services V3 RSS Generator on 8/03/2011 10:51:51 AM-->
<?xml-stylesheet type="text/xsl" href="/help/Sandbox/XML Playground/_layouts/RssXslt.aspx?List=1d503f3e-4bff-4248-848d-e12b5b67daec" version="1.0"?>
<rss version="2.0">
  <channel>
    <title>XML Playground: Media News</title>
    <link>/help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    <description>RSS feed for the Media News list.</description>
    <lastBuildDate>Mon, 07 Mar 2011 23:51:51 GMT</lastBuildDate>
    <generator>Windows SharePoint Services V3 RSS Generator</generator>
    <ttl>60</ttl>
    <image>
      <title>XML Playground: Media News</title>
      <url>/help/Sandbox/XML Playground/_layouts/images/homepage.gif</url>
      <link>help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    </image>
    <item>
      <title>new Item</title>
      <link>/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>
<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>
]]></description>
      <author>WALKER,Andrew</author>
      <pubDate>Mon, 07 Mar 2011 05:43:19 GMT</pubDate>
      <guid isPermaLink="true">http:/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</guid>
    </item>
    <item>
      <title>My School 2.0 launched</title>
      <link>http://dnet.hosts.network/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>
<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>
<div><b>Release Date:</b> 16/03/2011</div>
]]></description>
                <pubDate>Fri, 04 Mar 2011 04:34:11 GMT</pubDate>
      <guid isPermaLink="true">/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</guid>
    </item>
  </channel>
</rss>

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

停滞 2024-10-27 18:55:49

CDATA 部分内容只是文本，因此您无法使用 DOM 进一步解析其内容。您可以使用 DOMParser() 将 CDATA 部分的字符串内容重建回 XML 并使用其中的 DOM 方法，或者使用正则表达式。

要使用后一种方法，请将 document.write() 行更改为：

// Slice off 5 characters to get rid of the parent <div> and use [\s\S] to mean
//   any character including newlines up until the first closing div tag
document.write('<p>' + description[b].childNodes[0].nodeValue.slice(5).match(/[\s\S]*?<\/div>/) + '</p>');

要使用前一种方法（在本例中不太理想，但在其他情况下可能会有所帮助），您可以这样做在 for 循环内：

var cdataContent = new DOMParser().parseFromString('<div xmlns="http://www.w3.org/1999/xhtml">'+description[b].childNodes[0].nodeValue+'</div>', 'text/xml').documentElement;
document.body.appendChild(cdataContent.firstChild);

...但请确保仅在 DOM 内容加载后调用 media() 。

也许你有一些充分的理由，但根据你提供的代码，这样做会简单得多：

for (i=1; i<description.length; i++) {

...并忘记 a 和 b （即，将 b 更改为 i）

还有一个提示：如果您自己构建 RSS，请注意您将无法使用嵌套在 CDATA 节中的 CDATA 节。

CDATA section content is just text, so you can't parse its contents further using the DOM. You can either use DOMParser() to reconstruct the string contents of the CDATA section back into XML and use DOM methods from there, or else use regular expressions.

To use the latter approach, change your document.write() line to this:

// Slice off 5 characters to get rid of the parent <div> and use [\s\S] to mean
//   any character including newlines up until the first closing div tag
document.write('<p>' + description[b].childNodes[0].nodeValue.slice(5).match(/[\s\S]*?<\/div>/) + '</p>');

To use the former approach, which is less than ideal in this case but could be helpful in other situations, you could do this inside the for loop:

var cdataContent = new DOMParser().parseFromString('<div xmlns="http://www.w3.org/1999/xhtml">'+description[b].childNodes[0].nodeValue+'</div>', 'text/xml').documentElement;
document.body.appendChild(cdataContent.firstChild);

...but being sure to only invoke media() after the DOM content has loaded.

And maybe you have some good reason for it, but based on the code you supplied, it'd be a lot simpler just to do this: