I have the following code:

import requests
from bs4 import BeautifulSoup

response = requests.get(item_url, headers=headers).text
soup = BeautifulSoup(response, 'lxml')
print(soup)

product = soup.find_all('a', class_='shelfProductTile-descriptionLink')
print(product)

price_per_weight = soup.find_all('div', class_='shelfProductTile-cupPrice ng-star-inserted')
print(price_per_weight)
from the url: https://www.woolworths.com.au/shop/search/products?searchTerm=uncle%20tobys%20oats%20500g&sortBy=TraderRelevance
I have tried both the lxml and html.parser parsers, but the classes above never appear in the HTML that requests returns. I have also tried cloudscraper, as suggested in Beautiful Soup find_all return None, but I still get an empty list for both product and price_per_weight.
Can this information be scraped with Beautiful Soup, or do I need another tool such as Scrapy? (I'd prefer not to use Selenium if possible.)
The data you see is loaded from an external URL via JavaScript, so BeautifulSoup never sees it in the initial HTML that requests downloads. To get the data, you can call that URL directly and parse the JSON response instead of scraping the page.
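A minimal sketch of that approach follows. The endpoint `/apis/ui/Search/products` and the payload fields (`SearchTerm`, `PageNumber`, `PageSize`) as well as the response keys (`Products`, `DisplayName`, `CupString`) are assumptions based on what the browser's network tab typically shows for this site, not a documented API, so check the actual request your browser makes before relying on them:

```python
# Sketch: fetch the product data from the (assumed) JSON search API directly,
# since the HTML returned by requests.get() never contains it.
import json
import requests

# Assumed endpoint, taken from the browser's network tab -- not a documented API.
API_URL = "https://www.woolworths.com.au/apis/ui/Search/products"


def build_payload(search_term: str) -> dict:
    """Build the JSON body the search API appears to expect (assumed shape)."""
    return {"SearchTerm": search_term, "PageNumber": 1, "PageSize": 24}


def search_products(search_term: str) -> list:
    """POST the search and return (name, cup-price) pairs from the JSON."""
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Content-Type": "application/json",
    }
    with requests.Session() as session:
        # Visit the home page first so the session picks up any cookies
        # the API checks for.
        session.get("https://www.woolworths.com.au/", headers=headers)
        resp = session.post(
            API_URL,
            headers=headers,
            data=json.dumps(build_payload(search_term)),
        )
        resp.raise_for_status()
        data = resp.json()

    results = []
    # Assumed response layout: groups of products under "Products".
    for group in data.get("Products") or []:
        for product in group.get("Products") or []:
            results.append((product.get("DisplayName"), product.get("CupString")))
    return results


if __name__ == "__main__":
    for name, cup_price in search_products("uncle tobys oats 500g"):
        print(name, cup_price)
```

This replaces HTML scraping entirely: the JSON already contains the product name and cup price that the `shelfProductTile-*` classes would have displayed, so no Selenium is needed.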