In Python and BeautifulSoup, how do I get the URL from the style attribute instead of the href?

Posted 2025-02-12 20:37:42 | 3273 characters | 0 views | 0 comments

Using the code below:

prop_img = prop_lst.find_all('a',{'class':'mpi_img_link'})

I get the following list as output:

[<a class="mpi_img_link" href="/homedetail/5324-palm-royale-blvd-sugar-land-tx-77479/2504873" style="background-image:url(https://photos.harstatic.com/205383563/lr/img-1.jpeg?ts=2022-03-02T17:03:25.300);"></a>,
 <a class="mpi_img_link" href="/homedetail/27-riverstone-island-dr-sugar-land-tx-77479/13157541" style="background-image:url(https://photos.harstatic.com/184277385/lr/img-1.jpeg?ts=2020-04-07T14:50:50.960);"></a>,
 <a class="mpi_img_link" href="/homedetail/23-beacon-hl-sugar-land-tx-77479/2507526" style="background-image:url(https://photos.harstatic.com/205977706/lr/img-1.jpeg?ts=2022-03-20T23:10:23.777);"></a>,
 <a class="mpi_img_link" href="/homedetail/0-hagerson-rd-sugar-land-tx-77479/2375356" style="background-image:url(https://photos.harstatic.com/205015725/lr/img-1.jpeg?ts=2022-02-16T10:20:54.847);"></a>,
 <a class="mpi_img_link" href="/homedetail/5-cypress-valley-ct-sugar-land-tx-77479/2505565" style="background-image:url(https://photos.harstatic.com/208809599/lr/img-1.jpeg?ts=2022-06-27T14:34:55.917);"></a>,
 <a class="mpi_img_link" href="/homedetail/21-grand-mnr-sugar-land-tx-77479/2506201" style="background-image:url(https://photos.harstatic.com/201552628/lr/img-1.jpeg?ts=2021-10-21T10:55:15.270);"></a>,
 <a class="mpi_img_link" href="/homedetail/427-w-alkire-lake-dr-sugar-land-tx-77478/10240223" style="background-image:url(https://photos.harstatic.com/203290759/lr/img-1.jpeg?ts=2022-01-02T15:48:57.463);"></a>,
 <a class="mpi_img_link" href="/homedetail/1309-n-horseshoe-dr-sugar-land-tx-77478/2390056" style="background-image:url(https://photos.harstatic.com/209561396/lr/img-1.jpeg?ts=2022-06-27T21:04:42.547);"></a>,
 <a class="mpi_img_link" href="/homedetail/1217-n-horseshoe-dr-sugar-land-tx-77478/10101841" style="background-image:url(https://photos.harstatic.com/207957668/lr/img-1.jpeg?ts=2022-06-12T19:32:34.500);"></a>,
 <a class="mpi_img_link" href="/homedetail/1990-hagerson-rd-sugar-land-tx-77479/15860752" style="background-image:url(https://photos.harstatic.com/208557478/lr/img-1.jpeg?ts=2022-06-02T16:14:04.770);"></a>,
 <a class="mpi_img_link" href="/homedetail/15202-old-richmond-rd-sugar-land-tx-77498/2387668" style="background-image:url(https://photos.harstatic.com/194038254/lr/img-1.jpeg?ts=2021-03-03T18:51:20.263);"></a>,
 <a class="mpi_img_link" href="/homedetail/323-w-alkire-lake-dr-sugar-land-tx-77478/2390859" style="background-image:url(https://photos.harstatic.com/206236194/lr/img-1.jpeg?ts=2022-03-29T10:03:14.540);"></a>]

This is great! Now I want to get the URL inside each 'style="background-image:url(...)"' attribute, that is, the following output. I do NOT want the href links.

https://photos.harstatic.com/205383563/lr/img-1.jpeg?ts=2022-03-02T17:03:25.300
https://photos.harstatic.com/184277385/lr/img-1.jpeg?ts=2020-04-07T14:50:50.960
:
:
:
https://photos.harstatic.com/206236194/lr/img-1.jpeg?ts=2022-03-29T10:03:14.540

I think you are supposed to use the CSS selector like below, but I am still not able to achieve the end goal. Can anyone help with this please? Thank you!

for img in prop_img:
    print(prop_lst.select('style'))


Comments (1)

只为守护你 2025-02-19 20:37:42

Almost there. Try:

for img in prop_img:
    print(img['style'])
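
A likely reason the attempt in the question printed nothing useful: `select('style')` is a CSS type selector, so it looks for `<style>` elements, not `style` attributes. The sketch below illustrates the difference on a small sample. The two-anchor HTML snippet and its shortened `href` values are made up for illustration; only the class name and the first image URL come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the question's output; the second anchor
# deliberately has no style attribute.
html = """
<div id="listing">
  <a class="mpi_img_link" href="/homedetail/a/1" style="background-image:url(https://photos.harstatic.com/205383563/lr/img-1.jpeg?ts=2022-03-02T17:03:25.300);"></a>
  <a class="mpi_img_link" href="/homedetail/b/2"></a>
</div>
"""
prop_lst = BeautifulSoup(html, "html.parser")

# select('style') matches <style> elements, not style attributes.
style_tags = prop_lst.select("style")

# An attribute selector keeps only anchors that actually carry a style attribute.
styled_links = prop_lst.select("a.mpi_img_link[style]")
styles = [a["style"] for a in styled_links]

# Tag.get() returns None instead of raising KeyError when the attribute is absent.
all_links = prop_lst.find_all("a", {"class": "mpi_img_link"})
maybe_styles = [a.get("style") for a in all_links]

print(style_tags)
print(styles)
print(maybe_styles)
```

Indexing with `img['style']` is fine when every matched tag has the attribute, as in the question's list; `get('style')` or the `[style]` selector is the safer choice when some tags might not.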

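To pull out the URL part:

import re

for img in prop_img:
    # Capture everything between the parentheses of url(...).
    # The raw string keeps \( and \) from being read as string escapes.
    url = re.search(r'\((.*)\)', img['style']).group(1)
    print(url)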
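
The greedy `(.*)` works for the styles shown in the question, but it would capture too much if a style string ever contained a second pair of parentheses. A slightly more defensive variant is sketched below; the helper name `background_url` and the pattern name `URL_IN_STYLE` are hypothetical, and the pattern assumes the URL itself contains no closing parenthesis.

```python
import re

# Matches url(...) with optional single or double quotes around the address.
# The lazy (.*?) stops at the first closing parenthesis.
URL_IN_STYLE = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def background_url(style):
    """Return the first url(...) value in a style string, or None."""
    match = URL_IN_STYLE.search(style or "")
    return match.group(1) if match else None


sample = (
    "background-image:url(https://photos.harstatic.com/205383563/lr/"
    "img-1.jpeg?ts=2022-03-02T17:03:25.300);"
)
print(background_url(sample))

# A style with a second pair of parentheses, where a greedy pattern
# would run on to the last ')'.
print(background_url("background-image:url('a.png'); transform:rotate(5deg)"))

# No url(...) at all, or no style attribute (None), yields None.
print(background_url("color:red"))
print(background_url(None))
```

With the list from the question, this could be applied as `[background_url(img.get('style')) for img in prop_img]`.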