Extract data from a website list without the superfluous tags
Working code: a Google dictionary lookup via Python and Beautiful Soup -> simply execute and enter a word.
I've quite simply extracted the first definition from a specific list item. However, to get the plain data, I've had to split the data at the line break and then strip it to remove the superfluous list tag.
My question is: is there a way to extract the data contained within a specific list without doing the above string manipulation - perhaps a function in Beautiful Soup that I have yet to see?
This is the relevant section of code:
# Retrieve HTML and parse with BeautifulSoup.
doc = userAgentSwitcher().open(queryURL).read()
soup = BeautifulSoup(doc)
# Extract the first list item -> and encode it.
definition = soup('li', limit=2)[0].encode('utf-8')
# Format the return as word:definition removing superfluous data.
print word + " : " + definition.split("<br />")[0].strip("<li>")
Comments (1)
I think you are looking for findAll(text=True); this will extract the text from the tags. It will return a list of all the text content, broken at the tag boundaries.
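The suggestion above can be sketched as follows. This is a minimal, self-contained example using the modern bs4 package, where the `text=True` argument is named `string=True` (the older spelling still works as an alias); the HTML snippet is a made-up stand-in for the fetched dictionary page, not Google's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the fetched dictionary page.
html = ("<ul>"
        "<li>a domesticated carnivorous mammal<br />related forms</li>"
        "<li>second item</li>"
        "</ul>")

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li")

# find_all(string=True) returns the bare text nodes inside the tag,
# split at tag boundaries -- no markup left to strip off by hand.
texts = li.find_all(string=True)
print(texts)  # two text nodes, split at the <br /> tag

# The first text node is the definition itself, with no <li> or <br />
# to split or strip away:
definition = texts[0]
print(definition)
```

This replaces both the `split("<br />")` and the `strip("<li>")` from the question: the parser, rather than string manipulation, separates the text from the markup.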