How do I return the href attribute from a path in a string using lxml?

Posted on 2025-02-09 18:03:24

I have working code that prints the link text matched by

'//*[@id="all_TorontoBlueJayspitching"]/div/table/tbody/tr/th/a/text()'

from the page https://www.baseball-reference.com/boxes/CHA/CHA202206200.shtml

using this script:

import requests
from lxml import html

boxScore = "CHA/CHA202206200"
url = "https://www.baseball-reference.com/boxes/" + boxScore + ".shtml"
page = requests.get(url)

# Drop every line that opens or closes an HTML comment, so the tables the
# site wraps in comments become part of the parsed tree.
tree = html.fromstring(b''.join(line for line in page.content.splitlines() if b'<!--' not in line and b'-->' not in line))

# Team names as shown in the scorebox, e.g. "Toronto Blue Jays".
getTeams = tree.xpath('//*[@class="scorebox"]/div/div/strong/a/text()')

for team in getTeams:
    # The pitching table id is "all_" + team name without spaces + "pitching".
    team = team.replace(" ", "")
    stringy = '"all_' + team + 'pitching"'
    stringx = '//*[@id=' + stringy + ']/div/table/tbody/tr/th/a/text()'

    tambellini = tree.xpath(stringx)
    print(tambellini)

The problem is that I do not want to print this text; I want to print the link path (the href) instead. In other words, I am trying to get to

'//*[@id="all_TorontoBlueJayspitching"]/div/table/tbody/tr/th/a'

and then read the href value of that a element (which in this case is href="/players/b/berrijo01.shtml").

Any guidance here would be helpful. I know how to print an element's text, but I don't know how to get the link path itself into a variable. Thank you.

Comments (1)

优雅的叶子 2025-02-16 18:03:24

Change stringx so that it ends in /@href instead of /text():

stringx = '//*[@id=' + stringy + ']/div/table/tbody/tr/th/a/@href'

This should output

[
  '/players/l/lynnla01.shtml', 
  '/players/l/lopezre01.shtml', 
  '/players/g/graveke01.shtml', 
  '/players/k/kellyjo05.shtml'
]
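
To try the difference between /text() and /@href without hitting the site, here is a minimal offline sketch. The markup is a hypothetical stand-in for the real page: only the element nesting and the id pattern follow the XPath in the question, while the link texts ("Pitcher A", "Pitcher B") and the second href are made up for illustration. It also shows an alternative, which is to select the a elements themselves and call .get('href') on each, and then resolve the relative paths against the site root with urljoin.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical markup: same nesting as the XPath expects, placeholder content.
SAMPLE = """
<div id="all_TorontoBlueJayspitching">
  <div>
    <table>
      <tbody>
        <tr><th><a href="/players/b/berrijo01.shtml">Pitcher A</a></th></tr>
        <tr><th><a href="/players/x/example01.shtml">Pitcher B</a></th></tr>
      </tbody>
    </table>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE)
base = '//*[@id="all_TorontoBlueJayspitching"]/div/table/tbody/tr/th/a'

# Option 1: let XPath return the attribute values directly.
hrefs = tree.xpath(base + '/@href')

# Option 2: select the <a> elements, then read text and href from each one.
links = [(a.text, a.get('href')) for a in tree.xpath(base)]

# The hrefs are site-relative; join them with the site root if full URLs are needed.
absolute = [urljoin('https://www.baseball-reference.com', h) for h in hrefs]

print(hrefs)
print(links)
print(absolute)
```

Option 2 is handy when both the name and the link are needed together, since the two separate /text() and /@href queries would otherwise have to be zipped back up.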