Scraping image URLs with Beautiful Soup

Posted 2025-02-02 13:55:19 · 1372 characters · 4 views · 0 comments

Trying to learn something today and doing a bit of scraping.

I am trying to list product names and corresponding image URLs into a spreadsheet.

I managed to store the names but the images don't seem to work. Hopefully you can help!

Here is the code I use for extracting the text:

results[0].find('p', {'class': 'product-card__name'}).get_text()

Here is what I thought would extract the image:

results[0].find('img', {'class':'product-card__image'}).get_src()

This is obviously not working. It returns "'NoneType' object is not callable".

Any pointers?

For reference, below is the source I am trying to scrape.

<li class="product-grid__item"><a href="/p/63818/bumbu-the-original-rum-glass-pack" class="product-card" title=" Bumbu The Original Rum Glass Pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])"><div class="product-card__image-container"><img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" alt="Bumbu The Original Rum Glass Pack" class="product-card__image" loading="lazy" width="3" height="4"></div><div class="product-card__content"><p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p><p class="product-card__meta"> 70cl / 40% </p></div><div class="product-card__data"><p class="product-card__price"> £39.95 </p><p class="product-card__unit-price"> (£57.07 per litre) </p></div></a></li>

Comments (1)

衣神在巴黎 2025-02-09 13:55:19

To grab the image URL, call .get('src') instead of .get_src(). A Beautiful Soup tag has no get_src method; HTML attributes are read with .get().

results[0].find('img', {'class':'product-card__image'}).get('src')
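
The error message in the question follows from how Beautiful Soup resolves attribute access on a tag: a name it does not recognise, such as get_src, is treated as a search for a child tag with that name. The img tag has no such child, so the lookup gives None, and calling None raises the TypeError. The sketch below illustrates this on a shortened copy of the img tag from the question; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the <img> tag from the question.
snippet = '<img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" class="product-card__image">'
img = BeautifulSoup(snippet, "html.parser").find("img", {"class": "product-card__image"})

# Unknown attribute names are looked up as child tags; <img> has no <get_src> child.
missing = img.get_src
print(missing)  # None, so img.get_src() ends up calling None

# Two ways to read an HTML attribute from a tag:
url_via_get = img.get("src")   # gives None if the attribute is absent
url_via_index = img["src"]     # raises KeyError if the attribute is absent
print(url_via_get)
```

Which of the two to use is a judgment call: .get() is more forgiving when some cards lack an image, while indexing fails loudly.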

Example:

html='''
<li class="product-grid__item">
 <a class="product-card" href="/p/63818/bumbu-the-original-rum-glass-pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])" title=" Bumbu The Original Rum Glass Pack">
  <div class="product-card__image-container">
   <img alt="Bumbu The Original Rum Glass Pack" class="product-card__image" height="4" loading="lazy" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" width="3"/>
  </div>
  <div class="product-card__content">
   <p class="product-card__name">
    Bumbu The Original Rum
    <span class="product-card__name-secondary">
     Glass Pack
    </span>
   </p>
   <p class="product-card__meta">
    70cl / 40%
   </p>
  </div>
  <div class="product-card__data">
   <p class="product-card__price">
    £39.95
   </p>
   <p class="product-card__unit-price">
    (£57.07 per litre)
   </p>
  </div>
 </a>
</li>
'''

from bs4 import BeautifulSoup

# Parse the sample markup and print the src attribute of the product image
soup = BeautifulSoup(html, "html.parser")
# print(soup.prettify())
print(soup.find('img', {'class': 'product-card__image'}).get('src'))

Output:

https://img.thewhiskyexchange.com/480/rum_bum4.jpg
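
For the stated goal of a spreadsheet of names and image URLs, the same two lookups could be applied to every product card and written out with the standard csv module. This is only a sketch under assumptions: it guesses that results in the question is a list of li.product-grid__item elements, and the helper name extract_rows, the second product entry, and the file name products.csv are made up for illustration.

```python
import csv
import io
from bs4 import BeautifulSoup

# Trimmed-down markup modelled on the question's source.
# The second card is hypothetical and only shows the missing-image case.
html = '''
<ul>
 <li class="product-grid__item"><a class="product-card">
  <img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" class="product-card__image">
  <p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p>
 </a></li>
 <li class="product-grid__item"><a class="product-card">
  <p class="product-card__name"> Hypothetical product without an image</p>
 </a></li>
</ul>
'''

def extract_rows(markup):
    """Return (name, image_url) pairs, one per product card."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for card in soup.find_all("li", {"class": "product-grid__item"}):
        name_tag = card.find("p", {"class": "product-card__name"})
        img_tag = card.find("img", {"class": "product-card__image"})
        # get_text(" ", strip=True) joins the main and secondary name with a space.
        name = name_tag.get_text(" ", strip=True) if name_tag else ""
        # Guard against cards with no image so one gap does not stop the run.
        url = img_tag.get("src", "") if img_tag else ""
        rows.append((name, url))
    return rows

rows = extract_rows(html)

# Written to an in-memory buffer here; a real file opened with
# open("products.csv", "w", newline="") can be passed to csv.writer the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "image_url"])
writer.writerows(rows)
print(buffer.getvalue())
```

Checking each find() result for None before using it avoids a crash on cards that are missing a name or an image.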