lxml: grab all items that share the same XPath



I'm trying to grab all prices from a website using XPath. All prices have the same XPath, and only [0] (I assume that's the 1st item) works... let me show you:

webpage = requests.get(URL, headers=HEADERS)

soup = BeautifulSoup(webpage.content, "html.parser")

dom = etree.HTML(str(soup))

print(dom.xpath('/html/body/div[1]/div[5]/div/div/div/div[1]/ul/li[1]/article/div[1]/div[2]/div')[0].text)

This successfully prints the 1st price!
I tried changing the [0] in "[0].text" to [1] to print the 2nd item, but it returned "list index out of range".
Then I was trying to think of some for loop that would print all items, so I could compute an average.

Any help would be greatly appreciated!

I apologize; edited in below is the full code:

from bs4 import BeautifulSoup
from lxml import etree
import requests

URL = "https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709"

# HEADERS = add your own request headers here; the site won't let me post mine.

webpage = requests.get(URL, headers=HEADERS)

soup = BeautifulSoup(webpage.content, "html.parser")

dom = etree.HTML(str(soup))

print(dom.xpath('/html/body/div[10]/div[4]/section/div/div/div[2]/div/div/div/div[2]/div/div[2]/div[2]/div[1]/div/div[2]/ul/li[3]/strong')[0].text)
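
For context on why [1] is out of range: the absolute path above drills down to one specific item container through its indexed div/li steps, so dom.xpath() returns a list with a single element and only [0] exists. A more robust way to collect every price with lxml is to match on class names instead of positions. Here is a minimal sketch, assuming the price sits in an element carrying the price-current class with a nested strong (the same markup the answer below selects on) and that HEADERS is filled in as in the question:

from bs4 import BeautifulSoup
from lxml import etree
import requests

URL = "https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # placeholder; substitute your own headers

webpage = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(webpage.content, "html.parser")
dom = etree.HTML(str(soup))

# select every node whose class list contains "price-current",
# rather than indexing a single item container by position
nodes = dom.xpath('//li[contains(concat(" ", normalize-space(@class), " "), " price-current ")]/strong')

# <strong> typically holds the dollar portion of the price (cents usually sit in a sibling <sup>)
prices = [float(n.text.replace(",", "")) for n in nodes if n.text]
print(prices)
if prices:
    print("average:", sum(prices) / len(prices))

Matching on class names survives small layout changes that would break a fully indexed absolute path.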


岁月如刀 2025-02-11 04:43:00


You could just use CSS selectors, which in this instance are a lot more readable. I would also remove some of the offers info to leave just the actual price.

import requests
from bs4 import BeautifulSoup as bs
from pprint import pprint

r = requests.get("https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709", headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.text, features="lxml")
prices = {}

for i in soup.select('.item-container'):
    # strip the offers info nested inside the price element, if present
    if a := i.select_one('.price-current-num'):
        a.decompose()
    # map product title -> current price text (the slice drops the trailing character)
    prices[i.select_one('.item-title').text] = i.select_one('.price-current').get_text(strip=True)[:-1]

pprint(prices)

Prices as a list of floats:

import requests, re
from bs4 import BeautifulSoup as bs
from pprint import pprint

r = requests.get("https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709", headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.text, features="lxml")
prices = []

for i in soup.select('.item-container'):
    # strip the offers info nested inside the price element, if present
    if a := i.select_one('.price-current-num'):
        a.decompose()
    # drop the trailing character, remove "$" and thousands separators, convert to float
    prices.append(float(re.sub(r'\$|,', '', i.select_one('.price-current').get_text(strip=True)[:-1])))

pprint(prices)
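
Since the original goal was an average, the float list makes that a one-liner; a small follow-on sketch reusing the prices list built by the loop above:

from statistics import mean

if prices:  # guard against an empty result if the selectors match nothing
    print(f"average of {len(prices)} prices: ${mean(prices):,.2f}")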