How to write CSS/XPath selectors for elements whose classes change dynamically?

Published 2025-02-09 02:22:49 · 2 views · 0 comments


I am using Beautiful Soup, and below is the selector I use to scrape the href.

from bs4 import BeautifulSoup

html = '''<a data-testid="Link" class="sc-pciXn eUevWj JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP" href="https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m">'''

soup = BeautifulSoup(html, "lxml")

jobs = soup.find_all("a", class_="sc-pciXn eUevWj JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP JobTile___StyledJobLink-sc-1nulpkp-0 gkKKqP")

for job in jobs:
    job_url = job.get("href")

I am using find_all because there are three elements with hrefs in total.

The above method works, but the website changes the class names daily, so I need a different way to design the CSS/XPath selector.


Comments (1)

梨涡少年 2025-02-16 02:22:49


Try:

import requests
from bs4 import BeautifulSoup


url = "https://join.com/companies/talpasolutions"
soup = BeautifulSoup(requests.get(url).content, "lxml")

for a in soup.select("a:has(h3)"):
    print(a.get("href"))

Prints:

https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m
https://join.com/companies/talpasolutions/4925936-senior-data-engineer-d-f-m
https://join.com/companies/talpasolutions/4926107-senior-data-scientist-d-f-m
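Another option, if you would rather not depend on the `<h3>` structure: the `data-testid="Link"` attribute in the question's HTML is not auto-generated, so an attribute selector sidesteps the daily class changes entirely. A minimal sketch against the snippet from the question (assuming the site keeps `data-testid="Link"` stable):

```python
from bs4 import BeautifulSoup

# Snippet from the question; the generated class names ("sc-pciXn eUevWj ...")
# are the part that changes daily, so we ignore them here.
html = '''<a data-testid="Link" class="sc-pciXn eUevWj" href="https://join.com/companies/talpasolutions/4978529-project-customer-success-manager-heavy-industries-d-f-m">'''

soup = BeautifulSoup(html, "html.parser")

# Select by the stable data-testid attribute instead of the volatile classes.
links = [a["href"] for a in soup.select('a[data-testid="Link"]')]
print(links)
```

The equivalent XPath, for lxml or Selenium, would be `//a[@data-testid="Link"]`.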