Click on multiple divs with the same class name using a loop
I'm trying to click on multiple divs with the same class name, parse the HTML page, extract some information, and get back to the same page.
On this page:
- Select an item and extract the relevant information
- Get back to the same page
- Click on the next item.
This works perfectly outside the for loop.
WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH,'//*[@class="product__wrapper"][1]'))).click()
But when I use the above command inside my loop, it throws an InvalidSelectorException.
for i in range(1, len(all_profile_url)):
    WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH,'//*[@class="product__wrapper"][{i}]'))).click()
    time.sleep(10)
    wd.execute_script('window.scrollTo(0,1000)')
    page_source = BeautifulSoup(wd.page_source, 'html.parser')
    info_div = page_source.find('div', class_='ProductInfoCard__Breadcrumb-sc-113r60q-4 cfIqZP')
    info_block = info_div.find_all('a')
    try:
        info_category = info_block[1].get_text().strip()
    except IndexError:
        info_category = "Null"
    wd.back()
    time.sleep(5)
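(Note: the selector above is a plain string, not an f-string, so the literal characters {i} reach the XPath engine, and curly braces are not valid XPath syntax, which is what raises InvalidSelectorException. A minimal sketch of the interpolated form, parenthesizing the class match so [{i}] indexes the i-th match document-wide rather than among siblings:)

for i in range(1, len(all_profile_url)):
    # The f-string interpolates the loop index into the XPath predicate
    xpath = f'(//*[@class="product__wrapper"])[{i}]'
    WebDriverWait(wd, 20).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()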
What I want to extract from each page, using the code below:
page_source = BeautifulSoup(wd.page_source, 'html.parser')
info_div = page_source.find('div', class_='ProductInfoCard__Breadcrumb-sc-113r60q-4 cfIqZP')
info_block = info_div.find_all('a')
try:
    info_category = info_block[1].get_text().strip()
except IndexError:
    info_category = "Null"
try:
    info_sub_category = info_block[2].get_text().strip()
except IndexError:
    info_sub_category = 'Null'
try:
    info_product_name = info_div.find_all('span')[0].get_text().strip()
except IndexError:
    info_product_name = 'null'

# Extract brand name
info_div_1 = page_source.find('div', class_='ProductInfoCard__BrandContainer-sc-113r60q-9 exyKqL')
try:
    info_brand = info_div_1.find_all('a')[0].get_text().strip()
except IndexError:
    info_brand = 'null'

# Extract details for the rest of the page
info_div_2 = page_source.find('div', class_='ProductDetails__RemoveMaxHeight-sc-z5f4ag-3 fOPLcr')
info_block_2 = info_div_2.find_all('div', class_='ProductAttribute__ProductAttributesDescription-sc-dyoysr-2 lnLDYa')
try:
    info_shelf_life = info_block_2[0].get_text().strip()
except IndexError:
    info_shelf_life = 'null'
try:
    info_country_of_origin = info_block_2[3].get_text().strip()
except IndexError:
    info_country_of_origin = 'null'
try:
    info_weight = info_block_2[9].get_text().strip()
except IndexError:
    info_weight = 'null'
try:
    info_expiry_date = info_block_2[7].get_text().strip()
except IndexError:
    info_expiry_date = 'null'

# Extract MRP and price
info_div_3 = page_source.find('div', class_='ProductVariants__VariantDetailsContainer-sc-1unev4j-7 fvkqJd')
info_block_3 = info_div_3.find_all('div', class_='ProductVariants__PriceContainer-sc-1unev4j-9 jjiIua')
info_price_raw = info_block_3[0].get_text().strip()
info_price = info_price_raw[1:3]
info_MRP = info_price_raw[-2:]
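(Aside: the same try/except IndexError pattern repeats for every field, so it could be collapsed into one small helper. A minimal sketch; safe_text is a hypothetical name, not part of the original code:)

def safe_text(elements, index, default='null'):
    # Return the stripped text of elements[index], or default when the
    # index is out of range or the lookup returned nothing usable
    try:
        return elements[index].get_text().strip()
    except (IndexError, TypeError, AttributeError):
        return default

info_shelf_life = safe_text(info_block_2, 0)
info_country_of_origin = safe_text(info_block_2, 3)
info_weight = safe_text(info_block_2, 9)
info_expiry_date = safe_text(info_block_2, 7)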
1 Answer
We don't need to use BeautifulSoup to parse the data. Selenium has methods that will be sufficient for our use case.
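(The answer's own code block did not survive here. A minimal sketch of the Selenium-only flow it describes, assuming the question's locators and using product_details as a hypothetical variable holding the raw details text:)

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wrappers = wd.find_elements(By.CLASS_NAME, 'product__wrapper')
for i in range(1, len(wrappers) + 1):
    # Re-locate by index on every pass, because wd.back() reloads the page
    # and references to the previously found elements would go stale
    WebDriverWait(wd, 20).until(EC.element_to_be_clickable(
        (By.XPATH, f'(//*[@class="product__wrapper"])[{i}]'))).click()
    # .text returns the rendered text of the whole details block
    product_details = wd.find_element(
        By.CLASS_NAME, 'ProductDetails__RemoveMaxHeight-sc-z5f4ag-3').text
    wd.back()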
P.S.: Please note that product_details is not exactly a structured element, just text that we need to parse using regex if we want to generalize it for all URLs; hence you will have to do some exception handling while indexing the list product_details, which you have done in your code.
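(A sketch of what that regex parsing could look like, assuming product_details holds the raw text and that labels such as 'Shelf Life' and 'Country of Origin' actually appear in it; both the labels and the line layout are assumptions about the page, not something the answer shows:)

import re

def field(label, text, default='null'):
    # Grab the line that follows a label; the label/value layout is a guess
    m = re.search(rf'{label}\s*\n(.+)', text)
    return m.group(1).strip() if m else default

info_shelf_life = field('Shelf Life', product_details)
info_country_of_origin = field('Country of Origin', product_details)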