Forcing feedparser to sanitize all content types

Posted on 2025-01-07 02:30:02

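For a project, I want to use feedparser. Basically, I have it working.

The sanitization section of the documentation says that not all content types are sanitized. How can I force feedparser to sanitize every content type?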

思念绕指尖 2025-01-14 02:30:02

I think the feedparser doc page you referenced gives good advice:

*It is recommended that you check the content type in e.g. entries[i].summary_detail.type. If it is text/plain then it has not been sanitized (and you should perform HTML escaping before rendering the content).*

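import html  # cgi.escape was removed in Python 3.8; html.escape replaces it
import feedparser

d = feedparser.parse('http://rss.slashdot.org/Slashdot/slashdot')

# Iterate through the entries. feedparser has already sanitized HTML content,
# so only summaries whose type is not text/html need to be HTML-escaped.
for entry in d.entries:
    if entry.summary_detail.type != 'text/html':
        print(html.escape(entry.summary))
    else:
        print(entry.summary)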

Of course, there are dozens of ways you can iterate through the entries depending on what you want to do with them once they are clean.
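
As one of those ways, the check can be wrapped in a small helper that returns a string that is safe to render. This is only a sketch: `safe_summary` and `SANITIZED_TYPES` are hypothetical names, not part of feedparser, and the set assumes that `text/html` and `application/xhtml+xml` are the types feedparser sanitizes on its own. The stand-in entries below mimic the attribute layout of feedparser entries so the example runs without a network connection.

```python
import html
from types import SimpleNamespace

# Assumption: content of these types has already been sanitized by feedparser.
SANITIZED_TYPES = {"text/html", "application/xhtml+xml"}


def safe_summary(entry):
    """Return the entry's summary in a form that is safe to embed in HTML."""
    detail = getattr(entry, "summary_detail", None)
    content_type = getattr(detail, "type", None)
    summary = getattr(entry, "summary", "")
    if content_type in SANITIZED_TYPES:
        # Already sanitized markup: pass it through unchanged.
        return summary
    # text/plain, unknown, or missing type: escape it ourselves.
    return html.escape(summary)


# Stand-in entries (hypothetical data, not taken from a real feed).
plain = SimpleNamespace(
    summary="<b>1 < 2</b>",
    summary_detail=SimpleNamespace(type="text/plain"),
)
rich = SimpleNamespace(
    summary="<b>bold</b>",
    summary_detail=SimpleNamespace(type="text/html"),
)

print(safe_summary(plain))
print(safe_summary(rich))
```

With a real feed, the same helper could be called on each item of `d.entries`. Escaping whenever the type is missing or unrecognized is a deliberately conservative choice: the worst case is that markup shows up as literal text.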
