Extracting CDATA from RSS XML using JavaScript
I have extracted RSS feed content using JS; however, the 'description' node contains CDATA, and I want to split this out.
For example, for each description node under item, I would like to extract only the content from <b>Brief Description:</b> to the first </div>.
Is this possible? Below is an example of what I have so far, followed by the XML from the RSS feed.
Hope someone can help :)
Script Example
<SCRIPT type=text/javascript>
if (window.XMLHttpRequest)
  {// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
  }
else
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }
xmlhttp.open("GET","help/Sandbox/XML%20Playground/_layouts/listfeed.aspx?List=%7B1D503F3E%2D4BFF%2D4248%2D848D%2DE12B5B67DAEC%7D",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;
function media(){
  description=xmlDoc.getElementsByTagName('description');
  a=2;
  b=1;
  for (i=0;i<18;i++)
    {
    document.write('<p>' + description[b].childNodes[0].nodeValue + '</p>');
    b++;
    a++;
    };
};
</SCRIPT>
RSS XML FEED
<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Windows SharePoint Services V3 RSS Generator on 8/03/2011 10:51:51 AM-->
<?xml-stylesheet type="text/xsl" href="/help/Sandbox/XML Playground/_layouts/RssXslt.aspx?List=1d503f3e-4bff-4248-848d-e12b5b67daec" version="1.0"?>
<rss version="2.0">
<channel>
<title>XML Playground: Media News</title>
<link>/help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
<description>RSS feed for the Media News list.</description>
<lastBuildDate>Mon, 07 Mar 2011 23:51:51 GMT</lastBuildDate>
<generator>Windows SharePoint Services V3 RSS Generator</generator>
<ttl>60</ttl>
<image>
<title>XML Playground: Media News</title>
<url>/help/Sandbox/XML Playground/_layouts/images/homepage.gif</url>
<link>help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
</image>
<item>
<title>new Item</title>
<link>/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</link>
<description><![CDATA[<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>
<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>
]]></description>
<author>WALKER,Andrew</author>
<pubDate>Mon, 07 Mar 2011 05:43:19 GMT</pubDate>
<guid isPermaLink="true">http:/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</guid>
</item>
<item>
<title>My School 2.0 launched</title>
<link>http://dnet.hosts.network/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</link>
<description><![CDATA[<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>
<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>
<div><b>Release Date:</b> 16/03/2011</div>
]]></description>
<pubDate>Fri, 04 Mar 2011 04:34:11 GMT</pubDate>
<guid isPermaLink="true">/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</guid>
</item>
</channel>
</rss>
CDATA section content is just text, so you can't parse its contents further using the DOM. You can either use DOMParser() to reconstruct the string contents of the CDATA section back into XML and use DOM methods from there, or else use regular expressions.
To use the latter approach, change your document.write() line to this:
To use the former approach, which is less than ideal in this case but could be helpful in other situations, you could do this inside the for loop:
...but being sure to invoke media() only after the DOM content has loaded.
And maybe you have some good reason for it, but based on the code you supplied, it would be a lot simpler just to do this:
...and forget about a and b (i.e., change b to i).
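Below is a minimal sketch of both approaches. It is not the answerer's own code. The names BRIEF_RE, extractBrief, extractBriefWithDOMParser and collectBriefs are hypothetical. It assumes, as in both sample items in the feed, that the "Brief Description:" label is followed by a nested <div> that holds the text. collectBriefs also illustrates the simpler loop: it walks every node returned by getElementsByTagName('description') up to .length instead of a hard-coded 18, and reads textContent, which includes the text of CDATA sections.

```javascript
// Regular-expression approach. Assumption: "<b>Brief Description:</b>" is
// followed by a nested <div> holding the text, as in the sample feed.
const BRIEF_RE = /<b>Brief Description:<\/b>\s*<div>([\s\S]*?)<\/div>/;

// Returns the content of the first <div> after the label, or null when the
// string has no "Brief Description:" label.
function extractBrief(cdataText) {
  const match = BRIEF_RE.exec(cdataText);
  return match ? match[1] : null;
}

// DOMParser approach (browser only; DOMParser is not built into Node.js).
// The CDATA text has two top-level <div>s, so it is not a single well-formed
// XML document; this sketch parses it as HTML instead.
function extractBriefWithDOMParser(cdataText) {
  const doc = new DOMParser().parseFromString(cdataText, "text/html");
  const label = Array.from(doc.getElementsByTagName("b")).find(
    (b) => b.textContent.trim() === "Brief Description:"
  );
  // The label's parent is the outer <div>; its first descendant <div> is the text.
  const inner = label ? label.parentNode.querySelector("div") : null;
  return inner ? inner.textContent : null;
}

// Loops over all <description> nodes, so no separate a/b counters are needed.
// The channel-level <description> has no label, so extractBrief() returns
// null for it and it is skipped.
function collectBriefs(descriptionNodes) {
  const briefs = [];
  for (let i = 0; i < descriptionNodes.length; i++) {
    const brief = extractBrief(descriptionNodes[i].textContent);
    if (brief !== null) {
      briefs.push(brief);
    }
  }
  return briefs;
}
```

In the question's page, media() could then pass xmlDoc.getElementsByTagName('description') to collectBriefs and document.write('<p>' + brief + '</p>') for each result. The regex version is shorter but depends on the exact markup SharePoint emits; the DOMParser version tolerates attribute or whitespace changes at the cost of parsing each description.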
And one tip: if you construct the RSS yourself, note that you won't be able to use CDATA sections nested within CDATA sections.
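CDATA sections cannot nest because the first "]]>" ends the enclosing section. If you generate the RSS yourself and the text may contain a CDATA section (or any "]]>"), a common workaround is to close the section between "]]" and ">" and open a new one. A parser concatenates adjacent sections, so the original text comes back intact. A minimal sketch, with wrapInCdata as a hypothetical helper name:

```javascript
// Wraps arbitrary text in CDATA. Each "]]>" in the text is split across two
// sections ("]]" ends the first, ">" starts the second), so the markup stays
// well-formed even when the text itself contains a CDATA section.
function wrapInCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}
```

For example, wrapInCdata("a]]>b") produces <![CDATA[a]]]]><![CDATA[>b]]>, which an XML parser reads back as the text a]]>b.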