python mechanize find_link: find the last matching link

Posted 2024-10-28 12:25:15


I've got a page with one or more links that have "Display charges" in their text.
I can find the first such link with

firstLink = br.find_link(text_regex=re.compile("Display charges"),nr=0)

I'd love to be able to find the final link. I hoped this would work

lastLink = br.find_link(text_regex=re.compile("Display charges"),nr=-1)

but it fails when there is only one matching link.
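As the question's own nr=0 call shows, nr counts matching links from zero. If that count is implemented as a simple countdown over the matches (an assumption here, not something the question confirms), a negative nr would never reach zero, so no link would ever be selected. The hypothetical nth_match function below is a minimal sketch of such a countdown over plain strings, not mechanize's actual code:

```python
import re
from typing import Iterable, Optional


def nth_match(texts: Iterable[str], pattern: str, nr: int = 0) -> Optional[str]:
    """Return the nr-th (0-based) text containing `pattern`, or None.

    Hypothetical stand-in for a count-based link filter; not mechanize's code.
    """
    regex = re.compile(pattern)
    remaining = nr
    for text in texts:
        if not regex.search(text):
            continue  # not a match, so it doesn't count toward nr
        if remaining:
            remaining -= 1  # skip this match and keep counting down
            continue
        return text
    return None  # ran out of matches before reaching index nr


texts = ["Display charges (Jan)", "Home", "Display charges (Feb)"]
print(nth_match(texts, "Display charges", nr=0))   # Display charges (Jan)
print(nth_match(texts, "Display charges", nr=1))   # Display charges (Feb)
print(nth_match(texts, "Display charges", nr=-1))  # None: -1 is truthy and only moves further from 0
```

In a countdown like this, -1 has no "from the end" meaning; selecting the last match needs all matches (or at least the most recent one) to be seen first.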

Please note: I'm a Python and mechanize beginner, but I have discovered help(mechanize.Browser), which was a big breakthrough :)


Answer by 追我者格杀勿论, 2024-11-04 12:25:15


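You could use br.links(), which accepts the same text_regex filter, to generate all matching links, then use list(...)[-1] to pick off the last one. This works whether there is one match or several; if nothing matches, the [-1] index raises IndexError: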

lastLink = list(br.links(text_regex=re.compile("Display charges")))[-1]
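If the page might contain many links, the same idea can be expressed without building a full list. The sketch below is a small, hypothetical helper, last_item, that assumes only that br.links(...) returns an iterable (the session shows it returns a generator). It uses collections.deque with maxlen=1 so that only the most recent element is kept while the iterable is consumed:

```python
from collections import deque
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def last_item(items: Iterable[T], default: Optional[T] = None) -> Optional[T]:
    """Return the final element of an iterable, or `default` if it is empty.

    A deque with maxlen=1 discards older elements as new ones arrive, so the
    whole iterable is consumed without storing every item in a list.
    """
    tail = deque(items, maxlen=1)
    return tail[0] if tail else default


print(last_item(iter(["first", "middle", "last"])))  # last
print(last_item(iter([]), default="no match"))      # no match
```

With a browser already on the page, this would be called as last_item(br.links(text_regex=re.compile("Display charges"))). Returning a default instead of raising is a design choice; a caller who prefers an exception could check for None and raise mechanize.LinkNotFoundError itself.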

For example:

In [29]: import mechanize

In [30]: import re

In [31]: br=mechanize.Browser()

In [32]: br.open('http://www.example.com')
Out[32]: <response_seek_wrapper at 0xa2b59ec whose wrapped object = <closeable_response at 0xa2b554c whose fp = <socket._fileobject object at 0xa3143ac>>>

In [33]: br.links()
Out[33]: <generator object __call__ at 0xa289af4>

In [34]: list(br.links())
Out[34]: 
[Link(base_url='http://www.iana.org/domains/example/', url='/', text='Homepage[IMG]', tag='a', attrs=[('href', '/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/domains/', text='Domains', tag='a', attrs=[('href', '/domains/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/numbers/', text='Numbers', tag='a', attrs=[('href', '/numbers/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/protocols/', text='Protocols', tag='a', attrs=[('href', '/protocols/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/about/', text='About IANA', tag='a', attrs=[('href', '/about/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/go/rfc2606', text='RFC 2606', tag='a', attrs=[('href', '/go/rfc2606')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/about/', text='About', tag='a', attrs=[('href', '/about/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/domains/', text='Domains', tag='a', attrs=[('href', '/domains/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/protocols/', text='Protocols', tag='a', attrs=[('href', '/protocols/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/numbers/', text='Number Resources', tag='a', attrs=[('href', '/numbers/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='http://www.icann.org/', text='Internet Corporation for Assigned Names and Numbers', tag='a', attrs=[('href', 'http://www.icann.org/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='mailto:[email protected]?subject=General%20website%20feedback', text='[email protected]', tag='a', attrs=[('href', 'mailto:[email protected]?subject=General%20website%20feedback')])]

In [35]: list(br.links(text_regex=re.compile("About")))
Out[35]: 
[Link(base_url='http://www.iana.org/domains/example/', url='/about/', text='About IANA', tag='a', attrs=[('href', '/about/')]),
 Link(base_url='http://www.iana.org/domains/example/', url='/about/', text='About', tag='a', attrs=[('href', '/about/')])]
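The session also shows that the pattern "About" matched both "About IANA" and "About", which is consistent with the regex being searched for anywhere in the link text. The sketch below reproduces that filtering on a hypothetical FakeLink stand-in (holding a subset of the link texts from the session), picks the last match with [-1], and shows that anchoring the pattern as ^About$ narrows it to the exact text. Whether mechanize's text_regex behaves identically is an assumption based on that output:

```python
import re
from typing import List, NamedTuple


class FakeLink(NamedTuple):
    """Hypothetical stand-in for mechanize.Link with just the fields used here."""
    url: str
    text: str


# A subset of the link texts from the example.com session above.
links: List[FakeLink] = [
    FakeLink("/", "Homepage[IMG]"),
    FakeLink("/about/", "About IANA"),
    FakeLink("/go/rfc2606", "RFC 2606"),
    FakeLink("/about/", "About"),
    FakeLink("/domains/", "Domains"),
]


def matching(candidates: List[FakeLink], pattern: str) -> List[FakeLink]:
    """Keep links whose text contains a match for `pattern` (re.search semantics)."""
    regex = re.compile(pattern)
    return [link for link in candidates if regex.search(link.text)]


print([link.text for link in matching(links, "About")])    # ['About IANA', 'About']
print(matching(links, "About")[-1].text)                  # About
print([link.text for link in matching(links, "^About$")])  # ['About']
```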