Extracting image links from a div and srcset with BS4
Example div tag within the HTML:
[<div class="event-info-and-content">
<picture content="https://img.example.image.link.here/954839">
<source sizes="720px" srcset="
https://img.example.image.link.here/954839 480w,
https://img.example.image.link.here/954839 600w,
https://img.example.image.link.here/954839 800w,
https://img.example.image.link.here/954839 1080w
">
<img alt="" class="event-info-and-content" data-automation="event-hero-image"/>
</source></picture>
</div>]
Desired outcome (srcset):
https://img.example.image.link.here/954839
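A srcset value is a comma-separated list of candidates. Each candidate is a URL followed by an optional width (w) or pixel-density (x) descriptor. Every candidate in this example points at the same image, so the single desired link can be recovered by splitting the string and de-duplicating the URLs. Here is a minimal plain-Python sketch. It assumes the URLs themselves contain no commas, which holds here but not for every srcset in the wild. The helper name parse_srcset is hypothetical.

```python
def parse_srcset(srcset):
    """Split a srcset value into (url, descriptor) pairs.

    Simplified sketch: assumes no URL contains a comma.
    """
    candidates = []
    for entry in srcset.split(","):
        parts = entry.split()  # splits on any whitespace, including newlines
        if not parts:
            continue  # skip empty pieces, e.g. from a trailing comma
        url = parts[0]
        descriptor = parts[1] if len(parts) > 1 else ""
        candidates.append((url, descriptor))
    return candidates


srcset = """
https://img.example.image.link.here/954839 480w,
https://img.example.image.link.here/954839 600w,
https://img.example.image.link.here/954839 800w,
https://img.example.image.link.here/954839 1080w
"""

pairs = parse_srcset(srcset)
# dict.fromkeys keeps the first occurrence of each URL, in order
unique_urls = list(dict.fromkeys(url for url, _ in pairs))
print(unique_urls)  # ['https://img.example.image.link.here/954839']
```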
My function:
from bs4 import BeautifulSoup

def extract_img_link(html):
    with open(html, 'rb') as file:
        content = BeautifulSoup(file, 'html.parser')
    for image in content.findAll('div', attrs={'class': 'event-info-and-content'}):
        print(image.get("srcset"))
    return image

# calling out the html and function
html = 'data/website/events.html'
print(extract_img_link(html))
My function simply returns the entire tag I was looking for, rather than the specific link within:
[<div class="event-info-and-content">
<picture content="https://img.example.image.link.here/954839">
<source sizes="720px" srcset="
https://img.example.image.link.here/954839 480w,
https://img.example.image.link.here/954839 600w,
https://img.example.image.link.here/954839 800w,
https://img.example.image.link.here/954839 1080w
">
<img alt="" class="event-info-and-content" data-automation="event-hero-image"/>
</source></picture>
</div>]
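The loop prints None and returns the whole tag because findAll yields the div tags themselves. Tag.get() only looks at the attributes of that exact tag and returns None when the attribute is missing. The srcset belongs to the nested source element. The sketch below reproduces this against a trimmed copy of the example markup. It assumes the html.parser backend, since the question does not name one.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the example markup from the question.
html = """
<div class="event-info-and-content">
  <picture content="https://img.example.image.link.here/954839">
    <source sizes="720px" srcset="
      https://img.example.image.link.here/954839 480w,
      https://img.example.image.link.here/954839 1080w
    ">
    <img alt="" class="event-info-and-content" data-automation="event-hero-image"/>
  </picture>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"class": "event-info-and-content"})

# The div only carries a class attribute, so asking it for srcset gives None.
div_srcset = div.get("srcset")
print(div_srcset)  # None

# The srcset attribute lives on the <source> nested inside <picture>.
source = div.find("source")
first_url = source.get("srcset").split()[0]  # first whitespace-separated token
print(first_url)  # https://img.example.image.link.here/954839
```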
Comments (2)
You forgot about an extra layer inside, namely the <picture> inside the <div>. The following worked for me.
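The answer's own code was not preserved. The following is a hedged reconstruction of the idea rather than the answerer's exact code. It descends from each event div into its picture element and reads that element's content attribute, which in the example already holds the single desired URL. It keeps the question's extract_img_link name but returns a list. The helper links_from_markup is hypothetical, added so the logic can be tried on a string.

```python
from bs4 import BeautifulSoup


def links_from_markup(markup):
    """Collect the content URL of each <picture> inside an event div."""
    soup = BeautifulSoup(markup, "html.parser")
    links = []
    for div in soup.find_all("div", attrs={"class": "event-info-and-content"}):
        picture = div.find("picture")  # the extra layer between div and source
        if picture is not None and picture.get("content"):
            links.append(picture["content"])
    return links


def extract_img_link(path):
    """Read an HTML file from disk and return the image links found in it."""
    with open(path, "rb") as file:
        return links_from_markup(file.read())


sample = """
<div class="event-info-and-content">
  <picture content="https://img.example.image.link.here/954839">
    <source sizes="720px" srcset="https://img.example.image.link.here/954839 480w">
  </picture>
</div>
"""
print(links_from_markup(sample))  # ['https://img.example.image.link.here/954839']
```

With the question's file, the call would be print(extract_img_link('data/website/events.html')).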
To get the image path, change your selection and take the single URL from either the <picture> (its content attribute) or the <source> (its srcset attribute).
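The example and output from this answer were not preserved. Here is a minimal sketch of both options using CSS selectors, assuming a Beautiful Soup version that provides select_one. Option 1 reads the picture's content attribute directly. Option 2 takes the first candidate from the source srcset and drops its width descriptor. Both return the same URL for the example markup.

```python
from bs4 import BeautifulSoup

html = """
<div class="event-info-and-content">
  <picture content="https://img.example.image.link.here/954839">
    <source sizes="720px" srcset="
      https://img.example.image.link.here/954839 480w,
      https://img.example.image.link.here/954839 1080w
    ">
    <img alt="" class="event-info-and-content" data-automation="event-hero-image"/>
  </picture>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: the <picture> holds a single URL in its content attribute.
picture = soup.select_one("div.event-info-and-content > picture")
from_picture = picture["content"]

# Option 2: take the first candidate in the <source> srcset.
source = soup.select_one("div.event-info-and-content picture source")
first_candidate = source["srcset"].split(",")[0]  # URL plus "480w", with whitespace
from_source = first_candidate.split()[0]          # keep only the URL token

print(from_picture)  # https://img.example.image.link.here/954839
print(from_source)   # https://img.example.image.link.here/954839
```

Option 1 is simpler when the page always sets content on the picture element. Option 2 relies only on standard srcset syntax.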