How do I get the href links from a tags with Beautiful Soup?
I am scraping this product page: https://www.hugoboss.com/us/interlock-cotton-t-shirt-with-exclusive-artwork/hbna50487153_739.html
I want the link for each color of this product, taken from the color-swatch a tags in the page's HTML.
Current code:
import numpy as np
import pandas as pd
import time
import requests
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as soup

driverfile = r'C:\Users\Main\Documents\Work\Projects\Scraping Websites\extra\chromedriver'
#driver.implicitly_wait(10)
url = "https://www.hugoboss.com/us/interlock-cotton-t-shirt-with-exclusive-artwork/hbna50487153_739.html"

def make_soup(url):
    page = requests.get(url)
    page_soup = soup(page.content, 'lxml')
    return page_soup

product_page_soup = make_soup(url)
print(product_page_soup.select('a.slides__slide slides__slide--color-selector.js-slide.js-product-swatch.widget-initialized'))
Current output: an empty list []
Expected output: the HTML of the matching a tag
FYI: selecting another a tag on the same product page works, e.g. print(product_page_soup.select('a.dch-links-item.dch-links-item--released.dch-links-item--unstyled-selector.dch-links-item--bold--underscore.dch-links-item-tracking')[0].text.strip()) outputs the desired text using the same method, so I am confused why it does not work for the tag in question, 'a.slides__slide slides__slide--color-selector.js-slide.js-product-swatch.widget-initialized'.
I also tried product_page_soup.findAll('a', {"class": 'slides__slide.slides__slide--color-selector.js-slide.js-product-swatch.widget-initialized'}) but got the same empty list.
2 Answers
The following CSS expression with bs4 will grab the desired links.
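The answer's original code block was not preserved, so here is a hedged sketch of the approach it describes, run against a minimal HTML snippet that mimics the swatch markup (class names taken from the question, with the plain "widget" class that appears in the raw page source; 'html.parser' is used to avoid the lxml dependency, and the hrefs are invented for illustration):

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the product page's color-swatch markup.
html = '''
<div class="slides">
  <a class="slides__slide slides__slide--color-selector js-slide js-product-swatch widget"
     href="/us/product_red.html">Red</a>
  <a class="slides__slide slides__slide--color-selector js-slide js-product-swatch widget"
     href="/us/product_blue.html">Blue</a>
</div>
'''

page_soup = BeautifulSoup(html, 'html.parser')

# Chain every class with a dot (no spaces) so select() treats them all
# as a compound selector on a single <a> element.
links = [a['href'] for a in page_soup.select(
    'a.slides__slide.slides__slide--color-selector.js-slide.js-product-swatch.widget')]
print(links)  # ['/us/product_red.html', '/us/product_blue.html']
```

Against the live page you would build the soup from requests.get(url).content as in the question, and the same select() call would return the swatch links.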
In the page source the link has @class "widget". I guess it is replaced with "widget-initialized" after the page is rendered, so try "widget" instead of "widget-initialized". The complete selector should then be 'a.slides__slide.slides__slide--color-selector.js-slide.js-product-swatch.widget'. Also, for better readability, I would recommend using a CSS selector rather than passing the full class string to findAll.
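This answer's point can be demonstrated offline with a small sketch (class names assumed from the question; the href is invented for illustration): on the raw HTML that requests fetches, a selector using the post-render class "widget-initialized" matches nothing, while the page-source class "widget" matches, because requests never executes the page's JavaScript.

```python
from bs4 import BeautifulSoup

# Static HTML as requests sees it: the swatch link still carries the
# plain "widget" class; "widget-initialized" only exists after the
# page's JavaScript runs.
source_html = '''
<a class="slides__slide slides__slide--color-selector js-slide js-product-swatch widget"
   href="/us/color-1.html">Color 1</a>
'''

page_soup = BeautifulSoup(source_html, 'html.parser')

# Selector with the post-render class name finds nothing...
rendered = page_soup.select(
    'a.slides__slide.slides__slide--color-selector.js-slide.js-product-swatch.widget-initialized')
# ...while the page-source class name matches.
static = page_soup.select(
    'a.slides__slide.slides__slide--color-selector.js-slide.js-product-swatch.widget')

print(len(rendered))  # 0
print(len(static))    # 1
```

To select the rendered class names instead, the page would have to be loaded through Selenium (which the question already imports) so the JavaScript actually runs before the soup is built.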