Automated browser navigation and data extraction

Posted 2024-07-27 00:50:17

I am trying to automate data extraction from a website and I really don't know where to start. One of our suppliers gives us access to some equipment logging data through a "Business Objects 11" online application. If you are not familiar with this online app, think of it as a web-based report generator. The problem is that I am trying to monitor a lot of equipment and this supplier has only created a request that extracts one log at a time. This request takes the equipment number, the start date and the end date. To make matters worse, we can only export to the binary Excel format, since the "csv" export is broken and they refuse to fix it, so we are limited by Excel's 65,536-row limit (that amounts to 3-4 days of data recording in my case). I can't create a new request, as only the supplier has the necessary admin rights.

What do you think would be the most elegant way of running a lot of requests (around 800) through a web GUI? I guess I could hardcode mouse positions, click events, and keystrokes with delays and everything... But there has to be a better way.

I read about AutoHotKey and AutoIt scripting but they seem to be limited as to what they can do on the web. Also... I am stuck with IE6... But if you know a way that involves another browser, I am still very interested in your answer.

(Once I have the log files locally, extracting the data is not a problem.)

Comments (3)

記憶穿過時間隧道 2024-08-03 00:50:17

There are a few things you might try. If the site is plain HTML and reports can be requested with a simple POST or GET, then the urllib/urllib2 and cookielib Python modules should be enough to fetch an Excel document.
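
As a rough sketch of that suggestion in Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar: the opener below keeps the session cookie across requests, and build_report_url encodes the three inputs the report takes. The base URL and the parameter names (equipment, start, end) are invented for illustration; the real ones would have to be read from the requests the Business Objects page actually sends, and the report may well need a POST instead.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the URL the report page really requests.
BASE_URL = "http://reports.example.com/export"


def make_opener():
    """Return an opener that stores and resends cookies (e.g. a login session)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_report_url(equipment, start, end, base_url=BASE_URL):
    """Encode the report inputs as a GET query string (parameter names are assumed)."""
    query = urllib.parse.urlencode(
        {"equipment": equipment, "start": start, "end": end}
    )
    return base_url + "?" + query


def fetch_report(opener, url, dest_path):
    """Download one Excel export to dest_path using the cookie-aware opener."""
    with opener.open(url) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

Calling make_opener() once and reusing the opener for every fetch_report() call is what lets a login performed with the first request carry over to the later ones.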

Then you can try xlrd to extract the data from the Excel files.
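
A minimal sketch of the xlrd step, assuming the log sits on the first worksheet under a single header row (both are guesses about the export layout). read_rows uses only the basic xlrd calls open_workbook, sheet_by_index, nrows and row_values; merge_rows is a plain-Python helper, not part of xlrd, for stitching several date-limited exports back together.

```python
def read_rows(path, header_rows=1):
    """Read the first worksheet of an .xls file into a list of row lists."""
    import xlrd  # third-party package; imported here so merge_rows works without it

    book = xlrd.open_workbook(path)
    sheet = book.sheet_by_index(0)
    return [sheet.row_values(i) for i in range(header_rows, sheet.nrows)]


def merge_rows(chunks):
    """Concatenate rows from several exports, dropping exact duplicates.

    Adjacent date windows may both include the boundary record, so a row
    that was already seen is skipped. Order of first appearance is kept.
    """
    seen = set()
    merged = []
    for rows in chunks:
        for row in rows:
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged
```

For example, merge_rows(read_rows(p) for p in paths) would give one combined list for a piece of equipment, where paths is a hypothetical list of the downloaded files in date order.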

Also, take a look at: http://pamie.sourceforge.net/. I have never tried it myself, but it looks promising and easy to use.

云醉月微眠 2024-08-03 00:50:17

Normally, I would suggest not using IE (or any browser) at all. Remember, a web browser is just a proxy program for making HTTP requests and displaying the results in a meaningful way. There are other ways to make similar HTTP requests and process the responses. Almost every modern language has this built into its API somewhere. This is called screen scraping or web scraping.

But to complete this suggestion I need to know more about your programming environment: i.e., in what programming language do you envision writing this script?

A typical example using C#, where you just get the HTML result as a string, would look like this:

string html = new System.Net.WebClient().DownloadString("http://example.com");

You then parse the string to find any fields you need and send another request. The WebClient class also has a .DownloadFile() method that you might find useful for retrieving the Excel files.
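
The "send another request" loop could be planned up front. The sketch below is in Python rather than C# and only builds the list of requests: one per piece of equipment and date window. The 3-day window comes from the 3-4 days that fit under the Excel row limit in the question; the equipment IDs are made up, and treating both the start and end dates as inclusive is an assumption about how the report filters.

```python
from datetime import date, timedelta


def date_windows(start, end, days=3):
    """Yield (window_start, window_end) pairs covering start..end inclusive."""
    step = timedelta(days=days)
    current = start
    while current <= end:
        # The last window is clipped so it never runs past the requested end date.
        window_end = min(current + step - timedelta(days=1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


def build_jobs(equipment_ids, start, end, days=3):
    """List every (equipment, window_start, window_end) request to send."""
    return [
        (equipment, w_start, w_end)
        for equipment in equipment_ids
        for w_start, w_end in date_windows(start, end, days)
    ]
```

Each tuple in the result maps to one download, so the list also works as a checklist for retrying requests that fail partway through a long run.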

鹤仙姿 2024-08-03 00:50:17

Since you can use .NET, you should consider using the Windows Forms WebBrowser control. You can automate it to navigate to the site, press buttons, etc. Once the report page is loaded, you can use code to navigate the HTML DOM to find the data you want - no regular expressions involved.
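
The same "walk the document structure instead of using regular expressions" idea can be sketched without the WebBrowser control. This is a swap, not the approach described above: it uses Python's standard html.parser on HTML that has already been fetched, and collects the input fields of a page (for instance hidden fields that must be sent back with the report request). The field names in the usage note are hypothetical.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from <input> elements in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are recorded as empty strings.
            self.fields[name] = attributes.get("value") or ""


def extract_form_fields(html):
    """Return a dict of every named <input> found in the HTML text."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Given a page containing a hidden input named token and a text input named equipment, extract_form_fields returns both names with their current values, ready to be filled in and posted back.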

I did something like this years ago, to extract auction data from eBay.
