Python/Selenium - Can't access elements inside a section tag
I'm using Selenium to scrape a web page for product model numbers. The page has two product grids with a card between them. I can grab the model numbers from the first section, "browse-search-pods-1", but I can't access the elements in the bottom half of the page, in the second section, "browse-search-pods-2"; that section is ignored. There are 24 products, but only the first 12, from the first section, are returned. How can I access both sections?
Here's the website:
https://www.homedepot.com/b/Building-Materials-Drywall/N-5yc1vZar3d?catStyle=ShowProducts
Here's a sample of the HTML, showing one product from each section:
<div class="grid">
<section id="browse-search-pods-1" class="grid">
<div class="browse-search__pod col__true-12 col__6-12--xs col__4-12--sm col__3-12--md col__3-12--lg" data-lg-name="Product Pod: 0">
<div class="desktop product-pod" data-automation-id="podnode" data-type="product">
<div class="product-pod--padding">
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243" target="_blank" rel="noopener noreferrer" class="super-sku__inline-swatch__mini-swatch__more-options">More Options</a>
<div class="product-pod__title product-pod__title__product">
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243" class="header product-pod--ie-fix">
<div class="product-pod--ie-fix product-pod__title-control">
<h2 class="product-pod__title product-pod__title__product"><span class="product-pod__title__brand--bold">USG Sheetrock Brand</span><span class="product-pod__title__product">1/2 in. x 4 ft. x 8 ft. UltraLight Drywall</span></h2>
</div>
</a>
</div>
<div class="ratings-and-model-number-container">
<div class="product-pod-list__identifiers">
<div class="product-identifier product-identifier__model">Model# 14113411708</div>
</div>
<a href="/p/USG-Sheetrock-Brand-1-2-in-x-4-ft-x-8-ft-UltraLight-Drywall-14113411708/202530243#ratings-and-reviews" data-testid="product-pod__ratings-link">
<div class="ratings--6r7g3">
<div class="reviews--c43xm reviews--no-margin--c43xm" title=""><span class="stars--c43xm" style="width:89.80600000000001%"></span></div>
<span class="ratings__count--6r7g3">
(<!-- -->3753<!-- -->)
</span>
</div>
</a>
</div>
</div>
</div>
</div>
</section>
<section id="browse-search-pods-2" class="grid">
<div class="category-cards col__12-12" data-lg-name="Product Pod: 0">
<div class="category-cards__zone-wrapper category-cards__zone-card">
<section class="zone-card__zone1">
<div class="zone-card__header-wrapper">
<h2 class="zone-card__header u__bold">Project Guide</h2>
<p class="zone-card__header-text">Installing Drywall Project Guide</p>
</div>
<div class="zone-card-details">
<div class="zone-card-details__image"><img src="https://www.homedepot.com/hdus/en_US/DTCCOMNEW/fetch/FetchRules/FetchPN/how-to-install-drywall-professional-steps-HT-PG-BM.jpg" alt="" class="stretchy" height="1" width="1" loading="lazy"></div>
<div class="zone-card-details__description">
<div class="zone-card-details__text category-cards-details__text--truncate">Hanging drywall is not difficult if you have patience, the right tools and a friend to help. Follow our instructions to learn more</div>
<div class="zone-card-details__actions"><a class="bttn-outline bttn-outline--primary bttn--inline zone-card-details__btn" href="//www.homedepot.com/c/how_to_install_drywall_professional_steps_HT_PG_BM"><span class="bttn__content">Read Our Guide</span></a></div>
</div>
</div>
</section>
<section class="zone-card__zone2">
<div class="zone-card__header-wrapper">
<h2 class="u__truncate zone-card__header u__bold">Buying Guide</h2>
<p class="zone-card__header-text">Types of Drywall</p>
</div>
<div class="zone-card__video-wrapper">
<a class="zone-card__vidcap-link" href="//www.homedepot.com/c/ab/types-of-drywall/9ba683603be9fa5395fab90c24feaae">
<div class="zone-card-details__image zone-card-details__image--vidcap" style="background-image: url('https://i3.ytimg.com/vi/4hF9_z3IqaA/mqdefault.jpg');"></div>
</a>
</div>
<a class="zone-card__video-link" href="//www.homedepot.com/c/ab/types-of-drywall/9ba683603be9fa5395fab90c24feaae">See Our Tips</a>
</section>
</div>
</div>
<div class="browse-search__pod col__true-12 col__6-12--xs col__4-12--sm col__3-12--md col__3-12--lg">
<div class="desktop product-pod" data-automation-id="podnode" data-type="product">
<div class="product-pod--padding">
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411" target="_blank" rel="noopener noreferrer" class="super-sku__inline-swatch__mini-swatch__more-options">More Options</a>
<div class="product-pod__title product-pod__title__product">
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411" class="header product-pod--ie-fix">
<div class="product-pod--ie-fix product-pod__title-control">
<h2 class="product-pod__title product-pod__title__product"><span class="product-pod__title__brand--bold">Westpac Materials</span><span class="product-pod__title__product">18 lb. Fast Set 20 Lite Setting-Type Joint Compound</span></h2>
</div>
</a>
</div>
<div class="ratings-and-model-number-container">
<div class="product-pod-list__identifiers">
<div class="product-identifier product-identifier__model">Model# 22165H</div>
</div>
<a href="/p/Westpac-Materials-18-lb-Fast-Set-20-Lite-Setting-Type-Joint-Compound-22165H/100320411#ratings-and-reviews" data-testid="product-pod__ratings-link">
<div class="ratings--6r7g3">
<div class="reviews--c43xm reviews--no-margin--c43xm" title=""><span class="stars--c43xm" style="width: 94.16%;"></span></div>
<span class="ratings__count--6r7g3">(226)</span>
</div>
</a>
</div>
</div>
</div>
</div>
</section>
</div>
Here's the code I've tried for accessing the second section, but I still get the model numbers from the first:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
options = Options()
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get('https://www.homedepot.com/b/Building-Materials-Drywall/N-5yc1vZar3d?catStyle=ShowProducts')
section_two = driver.find_element(By.XPATH, "//section[contains(@id, 'browse-search-pods-2')]")
product_model = section_two.find_elements(By.XPATH, "//div[contains(@class, 'product-identifier product-identifier__model')]")
for model in product_model:
    print(model.text)
Comments (1)
Try scrolling to the element browse-search-pods-2 and then locate the products inside it.
For scrolling you can try the ActionChains class, which is how org.openqa.selenium.interactions.Actions is reflected in the Python bindings.
Or, you can also "scroll into view" via scrollIntoView().
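The code that originally accompanied this answer did not survive, so the following is only a minimal sketch of the two scrolling approaches it names, not the answerer's own snippet. The helper names (scroll_to_section, model_numbers, main), the use_action_chains flag and the 10-second timeout are hypothetical, and the sketch assumes the second grid is rendered only once it has been scrolled into the viewport. One detail may matter independently of scrolling: an XPath that starts with "//" is evaluated against the whole document even when it is called on an element, which could explain why the question's code returns the first section's models. The sketch uses ".//" to stay inside the section.

```python
# Relative XPath: the leading "." restricts the search to the given element.
MODEL_XPATH = ".//div[contains(@class, 'product-identifier__model')]"


def scroll_to_section(driver, section_id="browse-search-pods-2"):
    """Find the section by id and scroll it into the viewport."""
    # "id" is the string value of By.ID, so no Selenium import is needed here.
    section = driver.find_element("id", section_id)
    driver.execute_script("arguments[0].scrollIntoView();", section)
    return section


def model_numbers(section):
    """Return the text of every model-number element inside `section`."""
    # "xpath" is the string value of By.XPATH.
    return [el.text for el in section.find_elements("xpath", MODEL_XPATH)]


def main(use_action_chains=False):
    # Selenium is imported here so the helpers above can be exercised
    # with a stand-in driver object.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get("https://www.homedepot.com/b/Building-Materials-Drywall/"
                   "N-5yc1vZar3d?catStyle=ShowProducts")
        if use_action_chains:
            # Approach 1: move the pointer to the element, which scrolls to it.
            section = driver.find_element("id", "browse-search-pods-2")
            ActionChains(driver).move_to_element(section).perform()
        else:
            # Approach 2: scrollIntoView() through JavaScript.
            section = scroll_to_section(driver)
        # Assumption: the pods appear shortly after scrolling; wait up to 10 s
        # until at least one model-number element is found.
        WebDriverWait(driver, 10).until(lambda d: model_numbers(section))
        for text in model_numbers(section):
            print(text)
    finally:
        driver.quit()
```

Calling main() runs the scrollIntoView() variant against the live page, and main(use_action_chains=True) runs the ActionChains variant. To collect all 24 products, the same model_numbers helper can be applied to the "browse-search-pods-1" section as well.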