Automating forms and scraping a website that uses frames (with Mechanize)

Posted on 2024-11-26 15:51:28


I am trying to enter data into a form and then scrape the results on a site that uses frames. I've been using Mechanize (a Ruby gem) to fill in the forms, which works fine. The problem is that Mechanize treats frames as links: to load a frame and see the forms it contains, you have to "click" the frame, which loads its content as if it were a separate HTML page.

This site uses separate frames for authentication, the search form, and the results. Once I click into one frame and fill in its form, I am stuck in that frame and cannot get to the results frame to see the data the form generates. If I try to go back by loading the original URL, I lose what I did in the previous frame.

If there is an app that loads all the content from all the frames without having to click on them, that would be perfect. I haven't found one yet.

Is there a way to do this in Ruby, or is there a tool that does what Mechanize does (and works with Nokogiri) but also loads frames?


Comments (2)

他是夢罘是命 2024-12-03 15:51:28


Mechanize has some support for sessions. Does the website not keep you logged in if you click into the login page, then call back() and click into the search page?

When forms have frustrated me in the past, I've often resorted to using LiveHTTPHeaders (or a similar plugin) to detect the POSTs that are being carried out when logging in and searching, and then performing those without going through the pages themselves.

I'm not sure how well that will work with the authentication, though.
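The "replay the POSTs directly" idea could be sketched roughly as follows, using only Ruby's standard library rather than Mechanize. This is a minimal sketch, not the approach either poster actually ran: the helper names (`build_form_post`, `session_cookie`, `send_request`), the URLs, and the form field names are all hypothetical, and it assumes the site tracks the login with ordinary cookies.

```ruby
require "net/http"
require "uri"

# Build a form POST that mirrors one observed in the browser.
# `fields` is a hash of form field names to values (hypothetical names).
def build_form_post(url, fields, cookie = nil)
  request = Net::HTTP::Post.new(URI.parse(url))
  request.set_form_data(fields)            # URL-encodes the body and sets Content-Type
  request["Cookie"] = cookie if cookie     # carry the session from an earlier response
  request
end

# Reduce Set-Cookie header values to a single Cookie header value,
# keeping only the "name=value" part of each one.
def session_cookie(set_cookie_headers)
  Array(set_cookie_headers).map { |header| header.split(";").first.strip }.join("; ")
end

# Send a prepared request to the host named in its URI.
def send_request(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
end

# Usage (assumed URLs and field names, so left commented out):
#   login  = send_request(build_form_post("http://example.com/login",
#                                         "user" => "me", "pass" => "secret"))
#   cookie = session_cookie(login.get_fields("Set-Cookie"))
#   result = send_request(build_form_post("http://example.com/search",
#                                         { "q" => "term" }, cookie))
#   puts result.body   # HTML of the results, ready to hand to a parser
```

Because each request goes straight to the URL a frame's form posts to, there is no frame to click into or back out of. Whether the site accepts this depends on how it authenticates, as noted above.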

最舍不得你 2024-12-03 15:51:28


To elaborate on Ben's response, I thought I would post my solution to the problem of Mechanize not being able to enter a frame and then navigate back out of it, since my particular site deauthenticates you when you navigate back. His suggestion of calling back() probably works for most sites, but I ended up taking a different route.

I used Firewatir to pass data to forms through the Firefox browser. The code to access an element in a frame looks like this:

    b.frame(:name, "frame_name").field_type(:name, "field_name").action

Since you don't have to navigate into a frame this way, you don't have to worry about being deauthenticated, or about dependent frames reloading, as you move back and forth. Although Mechanize is a useful tool, I found Firewatir to be the better option for working with frames under the conditions described above.
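For a slightly fuller picture, the pattern above might look something like this. It is only a sketch under assumptions: it uses the hash-style locators of the current Watir gem rather than the older Firewatir call shown above, and the frame names ("search", "results"), the field name "q", and the button name "go" are hypothetical placeholders for whatever the real site uses.

```ruby
# Fill in a form in one frame and read the HTML of another frame,
# without ever navigating away from the frameset page.
# `browser` is expected to behave like a Watir browser object.
def search_through_frames(browser, query)
  search = browser.frame(name: "search")     # hypothetical frame name
  search.text_field(name: "q").set(query)    # hypothetical field name
  search.button(name: "go").click            # hypothetical button name
  browser.frame(name: "results").html        # HTML of the results frame
end

# Usage against a real browser (needs the watir gem and a browser driver,
# so left commented out; the URL is a placeholder):
#   require "watir"
#   browser = Watir::Browser.new
#   browser.goto("http://example.com/frameset.html")
#   html = search_through_frames(browser, "some query")
#   browser.close
```

The returned HTML string can then be passed to Nokogiri for scraping, which keeps the parsing side the same as it would be with Mechanize.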
