Cannot find the correct XPath for a website in Python

Posted on 2025-02-08 18:38:36

I am trying to extract the EPS Normalized (Annual) value, 1,010.14, from https://www.reuters.com/markets/companies/AALI.JK/key-metrics/per-share-data using XPath.
Please see the inspector screenshot:

https://i.sstatic.net/gnFrN.png
Based on the inspector, I use the XPath shown below, but it returns ['1,833.60', '741.35', '113.12', '1,702.00', '1.0498', '1.2223', '0.0074', '0.1489', '3.231', '1.667', '-0.029', '2.493', '-0.007', '0.42 ', '+0.195', '3,674.84', '3,438.46', '7,016.25', '25,963.00'], which is not the data I am looking for.

from lxml import html
import requests

page = requests.get('https://www.reuters.com/markets/companies/AALI.JK/key-metrics/per-share-data')
data = html.fromstring(page.content)
data.xpath('//td/span/text()')

How can I get the EPS Normalized (Annual) value, 1,010.14, in a robust way?

Comments (1)

夏有森光若流苏 2025-02-15 18:38:36

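This site loads its content with JavaScript, so the HTML that requests receives can differ from what the inspector shows.

If you are going to use Selenium, you can search for the th tag whose text is "EPS Basic Excl. Extra Items (Annual)", step up to its parent row with /.., and read the value cells from that row (substitute the label of the row you need, such as "EPS Normalized (Annual)"):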

//th[contains(text(), "EPS Basic Excl. Extra Items (Annual)")]/../td/span
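The label-then-parent-row idea can be tried offline before involving a browser. The following is only a minimal sketch under stated assumptions: the HTML fragment is made up (it is not the real Reuters markup), the helper name value_for_label is hypothetical, and it uses the standard library's xml.etree.ElementTree instead of Selenium or lxml. ElementTree supports only a subset of XPath and has no contains(), so the sketch matches the th text exactly.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment that imitates a label/value table row.
# The second row and its value are placeholders, not real data.
SAMPLE = """
<table>
  <tr><th>EPS Normalized (Annual)</th><td><span>1,010.14</span></td></tr>
  <tr><th>Some Other Metric (Annual)</th><td><span>0.00</span></td></tr>
</table>
"""


def value_for_label(table, label):
    """Return the span texts from the row whose <th> text equals `label`."""
    # th with exact text -> parent tr (..) -> td cells -> span values.
    # Assumes `label` contains no single quote, since it is placed
    # inside a single-quoted XPath string.
    spans = table.findall(f".//th[.='{label}']/../td/span")
    return [span.text for span in spans]


table = ET.fromstring(SAMPLE)
values = value_for_label(table, "EPS Normalized (Annual)")
print(values)

# Strip the thousands separator before converting to a number.
print([float(v.replace(",", "")) for v in values])
```

Anchoring on the row label rather than on the position of the cell means the lookup keeps working if rows are added or reordered, as long as the label text itself stays the same.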


If you would rather keep using requests, here is a temporary workaround:

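import re

import requests
from lxml import html

page = requests.get('https://www.reuters.com/markets/companies/AALI.JK/key-metrics/per-share-data')
data = html.fromstring(page.content)

# Read the text of the element with id "fusion-metadata", where the value is searched for.
txt = data.xpath('//*[@id="fusion-metadata"]/text()')[0]

# Pull the quoted value stored under "eps_normalized_annual" out of that text.
eps_norm_annual = float(re.search(r'"eps_normalized_annual":"([0-9.]+)"', txt).group(1))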
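
The last line above raises an AttributeError if the key is missing, because re.search then returns None. A slightly more defensive variant could look like the sketch below. The helper name extract_metric and the sample string are hypothetical and only imitate the "key":"value" shape that the regular expression above expects. The real page text may differ.

```python
import re


def extract_metric(text, key):
    """Return the number stored under `key` in JSON-like text, or None if absent."""
    # Accept an optional minus sign, thousands separators, and a decimal part.
    pattern = r'"' + re.escape(key) + r'"\s*:\s*"(-?[0-9][0-9,]*\.?[0-9]*)"'
    match = re.search(pattern, text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


# Hypothetical sample text, not copied from the real page.
sample = '{"eps_normalized_annual":"1010.14","other_key":"-0.029"}'

print(extract_metric(sample, "eps_normalized_annual"))
print(extract_metric(sample, "not_there"))
```

Returning None instead of raising lets the caller decide how to react when the site changes its key names, which is the most likely way for this workaround to break.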