lxml: grab all items that share the same XPath



I'm trying to grab all prices from a website using XPath. All prices have the same XPath, and only [0] (I assume that's the 1st item) works... let me show you:

webpage = requests.get(URL, headers=HEADERS)

soup = BeautifulSoup(webpage.content, "html.parser")

dom = etree.HTML(str(soup))

print(dom.xpath('/html/body/div[1]/div[5]/div/div/div/div[1]/ul/li[1]/article/div[1]/div[2]/div')[0].text)

This successfully prints the 1st price!
I tried changing the [0] in "[0].text" to [1] to print the 2nd item, but it returned "list index out of range".
Then I was trying to think of some for loop that would print all items, so I could compute an average.

Any help would be greatly appreciated!

I apologize; edited in below is the full code:

from bs4 import BeautifulSoup
from lxml import etree
import requests

URL = "https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709"

# HEADERS = add your own request headers here; the site won't let me post mine.

webpage = requests.get(URL, headers=HEADERS)

soup = BeautifulSoup(webpage.content, "html.parser")

dom = etree.HTML(str(soup))

print(dom.xpath('/html/body/div[10]/div[4]/section/div/div/div[2]/div/div/div/div[2]/div/div[2]/div[2]/div[1]/div/div[2]/ul/li[3]/strong')[0].text)
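
For context on why [1] is out of range: the absolute path above drills down to one specific item container through its indexed div/li steps, so dom.xpath() returns a list with a single element and only [0] exists. A more robust way to collect every price with lxml is to match on class names instead of positions. Here is a minimal sketch, assuming the price sits in an element carrying the price-current class with a nested strong (the same markup the answer below selects on) and that HEADERS is filled in as in the question:

from bs4 import BeautifulSoup
from lxml import etree
import requests

URL = "https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # placeholder; substitute your own headers

webpage = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(webpage.content, "html.parser")
dom = etree.HTML(str(soup))

# select every node whose class list contains "price-current",
# rather than indexing a single item container by position
nodes = dom.xpath('//li[contains(concat(" ", normalize-space(@class), " "), " price-current ")]/strong')

# <strong> typically holds the dollar portion of the price (cents usually sit in a sibling <sup>)
prices = [float(n.text.replace(",", "")) for n in nodes if n.text]
print(prices)
if prices:
    print("average:", sum(prices) / len(prices))

Matching on class names survives small layout changes that would break a fully indexed absolute path.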


岁月如刀 2025-02-11 04:43:00


You could just use CSS selectors, which in this instance are a lot more readable. I would also remove some of the offers info to leave just the actual price.

import requests
from bs4 import BeautifulSoup as bs
from pprint import pprint

r = requests.get("https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709", headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.text, features="lxml")
prices = {}

for i in soup.select('.item-container'):
    # strip the offers info nested inside the price element, if present
    if a := i.select_one('.price-current-num'):
        a.decompose()
    # map product title -> current price text (the slice drops the trailing character)
    prices[i.select_one('.item-title').text] = i.select_one('.price-current').get_text(strip=True)[:-1]

pprint(prices)

Prices as a list of floats:

import requests, re
from bs4 import BeautifulSoup as bs
from pprint import pprint

r = requests.get("https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709", headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.text, features="lxml")
prices = []

for i in soup.select('.item-container'):
    # strip the offers info nested inside the price element, if present
    if a := i.select_one('.price-current-num'):
        a.decompose()
    # drop the trailing character, remove "$" and thousands separators, convert to float
    prices.append(float(re.sub(r'\$|,', '', i.select_one('.price-current').get_text(strip=True)[:-1])))

pprint(prices)
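
Since the original goal was an average, the float list makes that a one-liner; a small follow-on sketch reusing the prices list built by the loop above:

from statistics import mean

if prices:  # guard against an empty result if the selectors match nothing
    print(f"average of {len(prices)} prices: ${mean(prices):,.2f}")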