BeautifulSoup: get contents[] as a single string

Posted on 2024-10-08 03:05:46


Anyone know an elegant way to get the entire contents of a soup object as a single string?

At the moment I'm getting contents, which is of course a list, and then iterating over it:

notices = soup.find("div", {"class" : "middlecontent"})
con = ""
for content in notices.contents:
    con += str(content)
print con

Thanks!


Comments (4)

昔梦 2024-10-15 03:05:46


What about contents = str(notices)?

Or maybe contents = notices.renderContents(), which will omit the enclosing div tag.
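
For comparison, a minimal sketch of the difference between the two (written against the modern bs4 package; under the old BeautifulSoup 3 module the import differs):

from bs4 import BeautifulSoup

soup = BeautifulSoup('<div class="middlecontent"><p>Hi</p> there</div>',
                     "html.parser")
notices = soup.find("div", {"class": "middlecontent"})

print(str(notices))               # whole element, <div> wrapper included
print(notices.renderContents())   # inner HTML only (bytes in bs4)
print(notices.decode_contents())  # inner HTML as text (bs4's newer spelling)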

千柳 2024-10-15 03:05:46


You can use the join() method:

notices = soup.find("div", {"class": "middlecontent"})
contents = "".join([str(item) for item in notices.contents])

Or, using a generator expression:

contents = "".join(str(item) for item in notices.contents)
清风疏影 2024-10-15 03:05:46
#!/usr/bin/env python
# coding: utf-8
# Python 2 / BeautifulSoup 3 style code.
__author__ = 'spouk'

import BeautifulSoup
import requests


def parse_contents_href(url, url_args=None, check_content_find=None, tag='a'):
    """
    Fetch a page, collect every <tag> element that has an href attribute,
    optionally keep only those whose href contains check_content_find,
    and return the matches joined into a single string (False if none).
    """
    html = requests.get(url, params=url_args)
    page = BeautifulSoup.BeautifulSoup(html.text)
    alllinks = page.findAll(tag, href=True)
    # when a needle is given, keep only links whose href contains it
    result = check_content_find and filter(
        lambda x: check_content_find in x['href'], alllinks) or alllinks
    # join the matching tags into one string, as in the question
    return result and "".join(map(str, result)) or False


url = 'https://vk.com/postnauka'
print parse_contents_href(url)
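
For readers on current versions, a rough Python 3 / bs4 port of the same idea (a sketch assuming requests and bs4 are installed; the names mirror the snippet above):

from bs4 import BeautifulSoup  # bs4 replaces the old BeautifulSoup module
import requests

def parse_contents_href(url, needle=None, tag='a'):
    page = BeautifulSoup(requests.get(url).text, "html.parser")
    links = page.find_all(tag, href=True)  # find_all is bs4's findAll
    if needle:
        links = [a for a in links if needle in a['href']]
    return "".join(str(a) for a in links) or False

print(parse_contents_href('https://vk.com/postnauka'))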
记忆で 2024-10-15 03:05:46


But the list is recursive, so...
I think this will work.
I'm new to Python, so the code may look a little weird.

getString = lambda x: (
    x if type(x).__name__ == 'NavigableString'
    else "".join(getString(t) for t in x))

contents = getString(notices)
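
Note that this recursion keeps only the NavigableStrings, so it strips all markup rather than reproducing the inner HTML. If plain text is what you want, the modern bs4 package has this built in (a one-line sketch, assuming bs4):

contents = notices.get_text()  # concatenates every text node, markup removed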