Using Beautiful Soup to retrieve a product code from HTML

Posted 2024-09-14 13:04:34

A webpage has a product code I need to retrieve, and it is in the following HTML section:

<table...>
<tr>
 <td>
 <font size="2">Product Code#</font>
 <br>
 <font size="1">2342343</font>
 </td>

</tr>
</table>

So I guess the best way to do this is to first find the HTML element whose text is 'Product Code#', and then take the second font tag in the same TD.

Ideas?

む无字情书 2024-09-21 13:04:34

My strategy is:

Find text nodes matching the string "Product Code#"
For each such node, get the parent <font> element and find the parent's next sibling <font> element
Insert the contents of the sibling element into a list

The code:

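from bs4 import BeautifulSoup  # Beautiful Soup 4 (the old BeautifulSoup 3 import is obsolete)


with open("products.html") as f:
    soup = BeautifulSoup(f, "html.parser")

# For each "Product Code#" text node, take the next sibling <font> of its
# parent <font> and keep that sibling's first child (the code text).
product_codes = [tag.parent.find_next_siblings('font')[0].contents[0]
                 for tag in
                 soup.find_all(string='Product Code#')]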
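
The three steps above can also be sketched as a small function that skips labels with no value next to them instead of raising IndexError. This is only a sketch under assumptions: it uses Beautiful Soup 4 (bs4) with the built-in html.parser, the inline HTML sample is modeled on the question with the table attributes left out, and find_product_codes is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the table attributes are omitted.
HTML = """
<table>
<tr>
 <td>
 <font size="2">Product Code#</font>
 <br>
 <font size="1">2342343</font>
 </td>
</tr>
</table>
"""


def find_product_codes(html, label="Product Code#"):
    """Return the text of the <font> that follows each label <font>."""
    soup = BeautifulSoup(html, "html.parser")
    codes = []
    for text_node in soup.find_all(string=label):
        # text_node.parent is the <font> element wrapping the label text.
        sibling = text_node.parent.find_next_sibling("font")
        if sibling is not None:  # skip labels that have no value after them
            codes.append(sibling.get_text(strip=True))
    return codes


product_codes = find_product_codes(HTML)
print(product_codes)
```

Matching on the label text rather than on the font size keeps the lookup tied to what the page says, which may hold up better if the styling attributes change.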

自找没趣 2024-09-21 13:04:34

Assuming soup is your BeautifulSoup instance:

int(''.join(soup("font", size="1")[0](text=True)))

Or, if you need to get multiple product codes:

[int(''.join(font(text=True))) for font in soup("font", size="1")]
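
A slightly more defensive variant of the same idea, assuming Beautiful Soup 4 and assuming that every product code is a run of digits inside a font tag with size="1". The helper name codes_by_font_size and the sample string are illustrative only; other size-1 fonts that do not hold a number are skipped rather than passed to int().

```python
from bs4 import BeautifulSoup


def codes_by_font_size(html, size="1"):
    """Collect integer codes from every <font size=...> that holds only digits."""
    soup = BeautifulSoup(html, "html.parser")
    codes = []
    for font in soup.find_all("font", size=size):
        text = font.get_text(strip=True)
        if text.isdigit():  # ignore fonts of that size that are not numeric
            codes.append(int(text))
    return codes


sample = ('<td><font size="2">Product Code#</font><br>'
          '<font size="1">2342343</font></td>')
print(codes_by_font_size(sample))
```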

萌逼全场 2024-09-21 13:04:34

You can use this regular expression (or something like it):

\n\ Product\ Code\#\n\ \ n\ (?.+?)\n\

You can probably remove some of the escapes, depending on your regex engine... I was being cautious.

如痴如狂 2024-09-21 13:04:34

Don't use regular expressions to parse HTML. I would use the following XPath expressions for this task:

//TABLE/TR/TD/FONT[@size='1']

Or, if the font size attribute is not guaranteed to be there and equal to 1:

//FONT[text()='Product Code#']/parent::*/FONT[2]
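
As a rough illustration, both expressions can be tried with Python's standard xml.etree.ElementTree, which supports only a subset of XPath. The assumptions here are significant: the fragment must be well-formed XML (note <br/> instead of <br>), tag names are written in lowercase to match the sample, text()='...' is expressed as [.='...'] (available in Python 3.7 and later), and paths are relative to the table root. Real-world HTML would need a lenient HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the question's fragment: <br/> instead of <br>.
FRAGMENT = """
<table>
<tr>
 <td>
 <font size="2">Product Code#</font>
 <br/>
 <font size="1">2342343</font>
 </td>
</tr>
</table>
"""

root = ET.fromstring(FRAGMENT)

# Counterpart of //TABLE/TR/TD/FONT[@size='1'], relative to the <table> root.
by_size = [el.text for el in root.findall("./tr/td/font[@size='1']")]

# Counterpart of //FONT[text()='Product Code#']/parent::*/FONT[2]:
# find the label <font>, step up to its parent, take the second <font> child.
by_label = [el.text
            for el in root.findall(".//font[.='Product Code#']/../font[2]")]

print(by_size, by_label)
```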
