BeautifulSoup: printing multiple tags/attributes

Posted 2024-11-07 05:58:55

First of all, this is my first attempt at Python. So far it looks pretty easy to use, but I have run into a problem.

I am trying to convert an XML file to RSS XML.
The original XML source looks like this:

<news title="Random Title" date="Date and Time" subtitle="The article txt"></news>

It should eventually look like this:

<item>
<pubDate>Date and Time</pubDate>
<title>Random Title</title>
<content:encoded>The article txt</content:encoded>
</item>

I am trying to do this with Python and BeautifulSoup, using the following script:

from BeautifulSoup import BeautifulSoup
import re

doc = [
'<news post_title="Random Title" post_date="Date and Time" post_content="The article txt">''</news></p>'
    ]
soup = BeautifulSoup(''.join(doc))

print soup.prettify()

posttitle = soup.news['post_title']
postdate = soup.news['post_date']
postcontent = soup.news['post_content']

print "<item>"
print "<pubDate>"
print postdate
print "</pubDate>"
print "<title>"
print posttitle
print "</title>"
print "<content:encoded>"
print postcontent
print "</content:encoded>"
print "</item>"

The problem is that it only retrieves the topmost entry in the XML, and not the others.
Can anybody give me some pointers on fixing this?

Cheers :)

Comments (2)

潜移默化 2024-11-14 05:58:55

Your example doc variable only holds one <news> element.

In general, though, you need to loop over the news elements.

Something like:

for news in soup.findAll('news'):
    posttitle = news['post_title']
    postdate = news['post_date']
    postcontent = news['post_content']
    print "<item>"
    print "<pubDate>"
    print postdate
    print "</pubDate>"
    print "<title>"
    print posttitle
    print "</title>"
    print "<content:encoded>"
    print postcontent
    print "</content:encoded>"
    print "</item>"
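
For readers on Python 3, the same loop can be sketched with the bs4 package (BeautifulSoup 4) instead of the BeautifulSoup 3 import used in this thread. This is only a sketch under assumptions: the two-element sample input and the helper name news_to_items are made up for illustration, and find_all is the bs4 spelling of findAll. The point it shows is that soup.news returns only the first match, while find_all('news') yields every <news> element.

```python
from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed

# Hypothetical sample input with more than one <news> element.
doc = (
    '<news post_title="First Title" post_date="Date 1" post_content="Text 1"></news>'
    '<news post_title="Second Title" post_date="Date 2" post_content="Text 2"></news>'
)

soup = BeautifulSoup(doc, "html.parser")


def news_to_items(soup):
    """Return one RSS <item> string per <news> element (hypothetical helper)."""
    items = []
    # find_all returns every matching tag; soup.news would give only the first.
    for news in soup.find_all("news"):
        items.append(
            "<item>\n"
            f"<pubDate>{news['post_date']}</pubDate>\n"
            f"<title>{news['post_title']}</title>\n"
            f"<content:encoded>{news['post_content']}</content:encoded>\n"
            "</item>"
        )
    return items


items = news_to_items(soup)
print("\n".join(items))
```

Collecting the items in a list rather than printing inside the loop makes it easy to write them to a file or wrap them in a <channel> element later.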

不…忘初心 2024-11-14 05:58:55

Stealing the code and correcting it:

for news in soup.findAll('news'):
    posttitle = news['post_title']
    postdate = news['post_date']
    postcontent = news['post_content']
    print "<item>"
    print "<pubDate>"
    print postdate
    print "</pubDate>"
    print "<title>"
    print posttitle
    print "</title>"
    print "<content:encoded>"
    print postcontent
    print "</content:encoded>"
    print "</item>"
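
One thing the loop leaves open is escaping: attribute values are printed straight into the output, so a title containing & or < would produce malformed RSS. The following is a rough sketch of one way to handle that using only the Python 3 standard library (xml.etree.ElementTree and xml.sax.saxutils.escape) rather than BeautifulSoup. The <root> wrapper, the sample values, and the helper name build_items are assumptions for illustration, not part of the original question.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical input: the <news> elements are wrapped in a single <root>
# so that the standard-library parser accepts the text as well-formed XML.
xml_source = """<root>
<news post_title="Tom &amp; Jerry" post_date="Date 1" post_content="1 &lt; 2"></news>
<news post_title="Second Title" post_date="Date 2" post_content="Text 2"></news>
</root>"""


def build_items(source):
    """Return one escaped RSS <item> string per <news> element (hypothetical helper)."""
    root = ET.fromstring(source)
    items = []
    for news in root.iter("news"):
        # The parser decodes entities in attribute values, so escape them
        # again before placing them in element text.
        items.append(
            "<item>\n"
            "<pubDate>{}</pubDate>\n"
            "<title>{}</title>\n"
            "<content:encoded>{}</content:encoded>\n"
            "</item>".format(
                escape(news.get("post_date", "")),
                escape(news.get("post_title", "")),
                escape(news.get("post_content", "")),
            )
        )
    return items


items = build_items(xml_source)
print("\n".join(items))
```

Using news.get with a default of "" means a <news> element that lacks one of the attributes yields an empty tag instead of raising an error; whether that is the desired behavior depends on the feed.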