How to get text from the CDATA section of an XML file

Posted on 2024-11-11 17:09:30

<text>
    <![CDATA[
        <img style="vertical-align: middle;" src="http://www.bjp.org/images/stories/economic_cell_1.jpg" width="600" />
        <img style="vertical-align: middle;" src="http://www.bjp.org/images/stories/economic_cell_2.jpg" width="600" />
    ]]>
</text>
</description>

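This is my RSS feed. I want to fetch the description from it using a SAX parser, but I have not been able to. Please help and suggest all the possible ways to do this. Thanks in advance.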

审判长 2024-11-18 17:09:30

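CDATA just tells the parser not to treat angle brackets as XML markup. You get the content just like any other character data inside an element. Since you didn't mention a language, here's Python: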

import xml.sax
from io import BytesIO

class Handler(xml.sax.handler.ContentHandler):
    def characters(self, content):
        # Called for the text inside an element, CDATA included.
        # The parser may deliver one text run in several chunks.
        print(content)

rss = b'<text><![CDATA[<img style="vertical-align: middle;" src="http://www.bjp.org/images/stories/economic_cell_1.jpg" width="600" /><img style="vertical-align: middle;" src="http://www.bjp.org/images/stories/economic_cell_2.jpg" width="600" />]]></text>'

# Python 3: feed the parser a byte stream
xml.sax.parse(BytesIO(rss), Handler())
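
Since characters() can fire more than once for a single element, printing each chunk may split the text. A minimal sketch of buffering the chunks between the start and end of the <text> element follows; the TextCollector class name and the shortened sample feed are made up for illustration, and the element name "text" is taken from the question.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collect the character data of every <text> element."""

    def __init__(self):
        super().__init__()
        self.texts = []      # one string per <text> element
        self._buffer = None  # None while we are outside <text>

    def startElement(self, name, attrs):
        if name == "text":
            self._buffer = []

    def characters(self, content):
        # May be called several times per element, so accumulate.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "text" and self._buffer is not None:
            self.texts.append("".join(self._buffer).strip())
            self._buffer = None


# Shortened, made-up sample in the shape of the feed from the question.
rss = b'<item><text><![CDATA[<img src="a.jpg" /><img src="b.jpg" />]]></text></item>'

collector = TextCollector()
xml.sax.parseString(rss, collector)
print(collector.texts[0])
```

The collected string is the raw CDATA content, so the img tags arrive as plain text and still need a second parsing step if you want the individual URLs.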

苹果你个爱泡泡 2024-11-18 17:09:30

Don't know which language you want to use for parsing. Since I'm working only in C++, here is a parser for CDATA written using the AXE parser generator:

std::string cdata;
auto cdata_rule = "<![CDATA[" & *(axe::r_any() - "]]>") >> cdata & "]]>";
// now do the parsing of input
cdata_rule(input.begin(), input.end());

// parse img elements
std::vector<std::string> sources; // all your img sources will be here
auto src_rule = "src=\"" & *(r_any() - '"') >> r_push_back(sources) & '"';
auto ignore = *(r_any() - "src=");
auto tail = *(r_any() - "/>") & "/>" & *r_any(" \t\n");
auto img_rule = *("<img" & ignore & src_rule & tail);
auto result = img_rule(cdata.begin(), cdata.end());

Disclaimer: I didn't test the code above, minor errors are possible.
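
For comparison, the second step (pulling the src values out of the CDATA text) could be sketched in Python with the standard library's html.parser instead of a hand-written grammar. This is only an illustration under the assumption that the CDATA text has already been extracted into a string; ImgSrcCollector and extract_img_sources are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Record the src attribute of every <img> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        if tag == "img":
            for key, value in attrs:
                if key == "src" and value:
                    self.sources.append(value)


def extract_img_sources(fragment):
    """Return the src values of all <img> tags in an HTML fragment."""
    parser = ImgSrcCollector()
    parser.feed(fragment)
    parser.close()
    return parser.sources


# The CDATA content from the question, already extracted as a string.
cdata = """
    <img style="vertical-align: middle;" src="http://www.bjp.org/images/stories/economic_cell_1.jpg" width="600" />
    <img style="vertical-align: middle;" src="http://www.bjp.org/images/stories/economic_cell_2.jpg" width="600" />
"""

for src in extract_img_sources(cdata):
    print(src)
```

An HTML parser is used here rather than an XML one because the CDATA content is an HTML fragment with no single root element.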
