How to extract a CSS property from an inline style using BeautifulSoup

Posted on 2025-01-06 01:56:12

I have something like this:

<img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/>

I am using BeautifulSoup to parse the HTML. Is there a way to pull out the "url" in the "background" CSS property?

烂柯人 2025-01-13 01:56:12

You've got a couple of options: quick and dirty, or the Right Way. The quick and dirty way (which will break easily if the markup changes) looks like this:

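>>> # BeautifulSoup 4 (the bs4 package) on Python 3; the old BeautifulSoup 3 import no longer applies
>>> from bs4 import BeautifulSoup
>>> import re
>>> soup = BeautifulSoup('<html><body><img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/></body></html>', 'html.parser')
>>> # The inline style comes back as a plain string
>>> style = soup.find('img')['style']
>>> # Raw string so the backslashes reach the regex engine unchanged
>>> urls = re.findall(r'url\((.*?)\)', style)
>>> urls
['/theRealImage.jpg']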

To make this work with multiple img tags, you'll have to loop over every matching tag and apply the same extraction to each one's style attribute.
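
As a rough sketch of handling several img tags, assuming BeautifulSoup 4 (the bs4 package) is installed: the helper name background_urls and the quote-stripping step are my own additions for illustration, not something from the original answer.

```python
import re

from bs4 import BeautifulSoup

# Same quick-and-dirty pattern as above: capture whatever sits inside url(...)
URL_PATTERN = re.compile(r'url\((.*?)\)')


def background_urls(html):
    """Collect every url(...) value found in the inline styles of <img> tags.

    Hypothetical helper for illustration; still regex-based, so it shares
    the fragility of the quick-and-dirty approach.
    """
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    # style=True keeps only the <img> tags that actually carry a style attribute
    for img in soup.find_all('img', style=True):
        for raw in URL_PATTERN.findall(img['style']):
            # CSS permits url('...') and url("..."), so drop surrounding quotes
            urls.append(raw.strip().strip('\'"'))
    return urls
```

Calling background_urls(page_html) on a whole page returns a flat list of URLs in document order; tags without a style attribute are skipped rather than raising a KeyError.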

The Right Way, since I'd feel awful suggesting that someone use regex on a CSS string :), is to use a CSS parser. cssutils, a library I just found on Google and which is available on PyPI, looks like it might do the job.
