How to write CSS/XPath selectors for elements whose classes change dynamically?

Published 2025-02-09 02:22:49 · 2 views · 0 comments


I am using Beautiful Soup, and below is the selector I use to scrape the href.

from bs4 import BeautifulSoup

html = '''<a data-testid="Link" class="sc-pciXn eUevWj JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP" href="https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m">'''

soup = BeautifulSoup(html, "lxml")

jobs = soup.find_all("a", class_="sc-pciXn eUevWj JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP")

for job in jobs:
    job_url = job.get("href")

I am using find_all because there are three elements with hrefs in total.

The above method works, but the website changes the class names daily, so I need a different way to design the CSS/XPath selector.


Comments (1)

梨涡少年 2025-02-16 02:22:49


Try:

import requests
from bs4 import BeautifulSoup


url = "https://join.com/companies/talpasolutions"
soup = BeautifulSoup(requests.get(url).content, "lxml")

for a in soup.select("a:has(h3)"):
    print(a.get("href"))

Prints:

https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m
https://join.com/companies/talpasolutions/4925936-senior-data-engineer-d-f-m
https://join.com/companies/talpasolutions/4926107-senior-data-scientist-d-f-m
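Another option, if you would rather not depend on the `<h3>` structure: the `data-testid="Link"` attribute in the question's HTML is not auto-generated, so an attribute selector sidesteps the daily class changes entirely. A minimal sketch against the snippet from the question (assuming the site keeps `data-testid="Link"` stable):

```python
from bs4 import BeautifulSoup

# Snippet from the question; the generated class names ("sc-pciXn eUevWj ...")
# are the part that changes daily, so we ignore them here.
html = '''<a data-testid="Link" class="sc-pciXn eUevWj" href="https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m">'''

soup = BeautifulSoup(html, "html.parser")

# Select by the stable data-testid attribute instead of the volatile classes.
links = [a["href"] for a in soup.select('a[data-testid="Link"]')]
print(links)
```

The equivalent XPath, for lxml or Selenium, would be `//a[@data-testid="Link"]`.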