How do I get Mechanize objects from Mechanize::Page's search method?
I'm trying to scrape a site where I can only rely on classes and element hierarchy to find the right nodes. But `Mechanize::Page#search` returns `Nokogiri::XML::Element`s, which I can't use to fill in and submit forms, etc.

I'd really like to use pure CSS selectors, but matching on classes seems to be pretty straightforward with the various `_with` methods too. However, matching things like `:not(.class)` is pretty verbose compared to simply using CSS selectors, and I have no idea how to match on element hierarchy.

Is there a way to convert Nokogiri elements back to Mechanize objects or, even better, get them straight from the `search` method?
Comments (1)
As stated in this answer, you can simply construct a new `Mechanize::Form` object from the `Nokogiri::XML::Element` retrieved via `Mechanize::Page#search` or `Mechanize::Page#at`.
Note: You have to pass the `Mechanize` object and the `Mechanize::Page` object to the constructor to be able to submit the form. Otherwise it would just be a `Mechanize::Form` object without context.

There seems to be no central utility function to convert `Nokogiri::XML::Element`s to Mechanize elements; instead, the conversions are implemented where they are needed. Consequently, writing a method that searches the document by CSS or XPath and returns Mechanize elements where applicable would require a pretty big switch-case on the node type. Not exactly what I imagined.