How do I get Mechanize objects from Mechanize::Page's search method?

Posted 2025-01-02 11:02:42

I'm trying to scrape a site where I can only rely on classes and the element hierarchy to find the right nodes. But Mechanize::Page#search returns Nokogiri::XML::Elements, which I can't use to fill in and submit forms, etc.

I'd really like to use pure CSS selectors, although matching on classes seems pretty straightforward with the various _with methods too. However, matching things like :not(.class) that way is quite verbose compared to simply using a CSS selector, and I have no idea how to match on element hierarchy at all.

Is there a way to convert Nokogiri elements back to Mechanize objects or, even better, to get them straight from the search method?


与酒说心事 2025-01-09 11:02:42

As stated in this answer, you can simply construct a new Mechanize::Form object from a Nokogiri::XML::Element retrieved via Mechanize::Page#search or Mechanize::Page#at:

require 'mechanize'

agent = Mechanize.new
page = agent.get 'https://stackoverflow.com/'

# Get the search form via its ID as a Nokogiri::XML::Element
node = page.at '#search'

# Convert it back to a Mechanize::Form object
form = Mechanize::Form.new node, agent, page

# Use it!
form.q = 'Foobar'
result = form.submit

Note: You have to provide the Mechanize object and the Mechanize::Page object to the constructor to be able to submit the form. Otherwise it would just be a Mechanize::Form object without context.


There seems to be no central utility function that converts Nokogiri::XML::Elements to Mechanize elements; instead, the conversions are implemented wherever they are needed. Consequently, a method that searches the document by CSS or XPath and returns Mechanize elements where applicable would require a fairly large switch-case on the node type. Not exactly what I had imagined.
