Screen scraping in PHP after logging in

Posted on 2024-08-11 02:05:09

Looking around for a solution to this, I have found different methods. Some use regex, some use DOM scripting or something.

I want to go to a site, log in, fill out a form, and then check whether the form was sent. Logging in is the part I can't find anything on.

Anyone know of an easy way to do this?

Comments (5)

万劫不复 2024-08-18 02:05:09

I'd agree with Les. cURL + Charles (or Fiddler, Firefox's Tamper Data extension, Wireshark, etc.) is the way I've always done this. The one trick I've found is that some sites require a three-step process:

  1. Hit the login page with a GET request first to get any session IDs, cookies, and/or required fields (e.g. ASP.NET sites have __VIEWSTATE and __EVENTVALIDATION).
  2. Once you have these values, POST them to the login page along with your credentials.
  3. Finally, request whatever resource you're after.
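
For illustration, the three steps above might look roughly like this in PHP with the cURL extension. Treat it as a sketch under assumptions: the function names `extractHiddenFields`, `fetch` and `loginAndFetch`, the URLs, and the credential field names are all made up for the example. For brevity it also relies on cURL's in-memory cookie engine, which this answer is skeptical of, so parsing the cookies by hand may turn out to be necessary on some sites.

```php
<?php
// Sketch only: every URL and form field name used with these helpers is a placeholder.

/** Collect name => value for every <input type="hidden"> found in an HTML page. */
function extractHiddenFields(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser errors silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

/** Run one request on a shared cURL handle; sends a POST when $post is given. */
function fetch($ch, string $url, ?array $post = null): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($post === null) {
        curl_setopt($ch, CURLOPT_HTTPGET, true);
    } else {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return $body;
}

/** GET the login page, POST the credentials, then GET the target page. */
function loginAndFetch(string $loginUrl, string $targetUrl, array $credentials): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // empty string: keep cookies in memory for this handle
    ]);

    // Step 1: GET the login page to pick up the session cookie and hidden fields.
    $hidden = extractHiddenFields(fetch($ch, $loginUrl));

    // Step 2: POST the hidden fields together with the credentials.
    fetch($ch, $loginUrl, array_merge($hidden, $credentials));

    // Step 3: request the page you are actually after, using the same session.
    return fetch($ch, $targetUrl);
}

// Hypothetical usage:
// $html = loginAndFetch(
//     'https://example.com/login',
//     'https://example.com/account',
//     ['username' => 'alice', 'password' => 'secret']
// );
```

The same `fetch` helper could be used to submit the form afterwards; checking whether it was sent then comes down to looking for a confirmation marker in the returned HTML, which depends entirely on the site.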

Don't plan on curl's cookie jar and cookie file being much help. You'll probably be best off parsing out the session id and cookies from the headers using a simple regex.
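
As a rough idea of what that regex parsing could look like, here is a minimal sketch. The function names `parseCookies` and `cookieHeader` are invented for the example, and it only keeps each cookie's name and value, ignoring attributes such as path and expiry.

```php
<?php
/**
 * Pull name => value pairs out of the Set-Cookie lines of a raw header block.
 * Attributes such as Path, Expires and HttpOnly are ignored.
 */
function parseCookies(string $rawHeaders): array
{
    $cookies = [];
    // One match per Set-Cookie line: the name runs up to '=', the value up to ';' or line end.
    $pattern = '/^Set-Cookie:[ \t]*([^=;\s]+)=([^;\r\n]*)/mi';
    if (preg_match_all($pattern, $rawHeaders, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $cookies[$match[1]] = trim($match[2]);
        }
    }
    return $cookies;
}

/** Build the value of a Cookie request header from name => value pairs. */
function cookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}
```

With cURL, setting `CURLOPT_HEADER` to `true` includes the response headers in the returned string, and the result of `cookieHeader()` can be passed to `CURLOPT_COOKIE` on the next request.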

Hope this helps!

吻安 2024-08-18 02:05:09

You might be better off with some sort of scriptable browser if you need to do a lot of GUI stuff. If you need to use PHP, check out curl: https://www.php.net/curl

生生漫 2024-08-18 02:05:09

What I usually do is fire up Charles, go through the login process in a browser, and record the raw requests. Then I copy and paste the requests and throw them through fopen or curl (with some small adjustments according to the responses).
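
A rough sketch of replaying a recorded request with PHP's cURL extension might look like the following. It makes several assumptions: the helper names `parseRawRequest` and `replay` are made up, the recorded request line is assumed to carry a path (not a full URL) alongside a `Host` header, and the site is assumed to be served over https.

```php
<?php
/**
 * Split a raw HTTP request, as copied from a proxy's raw view, into its parts.
 * Assumes a blank line separates the headers from the body.
 */
function parseRawRequest(string $raw): array
{
    $raw = str_replace("\r\n", "\n", $raw);
    $parts = explode("\n\n", $raw, 2);
    $lines = explode("\n", $parts[0]);
    $body = $parts[1] ?? '';

    // Request line, e.g. "POST /login HTTP/1.1".
    [$method, $path] = explode(' ', array_shift($lines), 3);

    $host = '';
    $headers = [];
    foreach ($lines as $line) {
        if (stripos($line, 'Host:') === 0) {
            $host = trim(substr($line, 5));
        } elseif (stripos($line, 'Content-Length:') !== 0) {
            // Content-Length is dropped so cURL can recompute it for the body.
            $headers[] = $line;
        }
    }

    return [
        'method'  => $method,
        'url'     => 'https://' . $host . $path, // https is an assumption
        'headers' => $headers,
        'body'    => $body,
    ];
}

/** Send a parsed request with cURL and return the response body. */
function replay(array $request): string
{
    $ch = curl_init($request['url']);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CUSTOMREQUEST  => $request['method'],
        CURLOPT_HTTPHEADER     => $request['headers'],
    ]);
    if ($request['body'] !== '') {
        curl_setopt($ch, CURLOPT_POSTFIELDS, $request['body']);
    }
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return $response;
}
```

The "small adjustments" would typically mean swapping the recorded cookie and token values in the headers and body for fresh ones taken from earlier responses before calling `replay()`.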

音栖息无 2024-08-18 02:05:09

You may want to take a look at Perl's LWP library (I know it isn't PHP, but it's very useful for screen scraping, web unit testing, and such):

末骤雨初歇 2024-08-18 02:05:09

I have a fair bit of experience with this. I used to use cURL, but it is no fun to use. In particular, sites often exchange XSRF tokens, pass hidden variables, or set all kinds of cookies. Tracking all of this with cURL becomes difficult, at least for me.

I then explored Selenium, and I love it. There are two things to set up: 1) install Selenium IDE (works only in Firefox); 2) install Selenium RC Server.

After starting Selenium IDE, go to the site that you are trying to automate and start recording the actions you perform on it. Think of it as recording a macro in the browser. Afterwards, you get the code output in the language you want.

Just so you know, Browsermob uses Selenium for load testing and for automating tasks in the browser.

I've uploaded a PPT that I made a while back. It should save you a good amount of time: http://www.4shared.com/get/tlwT3qb_/SeleniumInstructions.html

At the above link, select the regular download option.
