Screen scraping in PHP after logging in
Looking around for a solution to this, I have found different methods. Some use regex, some use DOM scripting or something.
I want to go to a site, log in, fill out a form, and then check whether the form was sent. Logging in is the part I can't find anything on.
Anyone know of an easy way to do this?
Comments (5)
I'd agree with Les. Curl + Charles (or Fiddler, Firefox's Tamper Data extension, Wireshark, etc.) is the way I've always done this. The one trick I've found is that some sites require a three-step process:
Don't plan on curl's cookie jar and cookie file being much help. You'll probably be best off parsing out the session id and cookies from the headers using a simple regex.
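A minimal sketch of the header-parsing approach described above, assuming the raw response headers are already in a string (for example from a curl call made with CURLOPT_HEADER enabled). The function names `parse_set_cookies` and `build_cookie_header` are made up for illustration, and cookie attributes such as Path or Expires are simply ignored:

```php
<?php
// Collect name => value pairs from every Set-Cookie line in a raw header block.
// Attributes such as Path, Expires and HttpOnly are deliberately ignored.
function parse_set_cookies(string $rawHeaders): array
{
    $cookies = [];
    $pattern = '/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/mi';
    if (preg_match_all($pattern, $rawHeaders, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $cookies[$m[1]] = trim($m[2]);
        }
    }
    return $cookies;
}

// Turn the collected pairs back into the value of a Cookie request header.
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}
```

The string returned by `build_cookie_header` could then be passed to the next request, for instance through curl's CURLOPT_COOKIE option.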
Hope this helps!
You might be better off with some sort of scriptable browser if you need to do a lot of GUI stuff. If you need to use PHP, check out curl: https://www.php.net/curl
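For the PHP route, a rough sketch of what a curl-based session might look like. It assumes the site uses an ordinary form login with cookies; the helper names, URLs, and field names are placeholders rather than anything from a real site, and the cookie file is just one way to carry the session between requests:

```php
<?php
// Build the cURL options for one request that shares a cookie file with
// the other requests in the same session.
function session_request_options(string $url, string $cookieFile, ?array $postFields = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow the redirect many sites send after login
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies stored here are sent with the request
    ];
    if ($postFields !== null) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }
    return $options;
}

// Run one request and return the response body, or throw on a transport error.
// The handle is freed when the function returns, which flushes the cookie jar.
function session_request(string $url, string $cookieFile, ?array $postFields = null): string
{
    $ch = curl_init();
    curl_setopt_array($ch, session_request_options($url, $cookieFile, $postFields));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical usage; the URLs, field names and confirmation text are placeholders:
// $jar = tempnam(sys_get_temp_dir(), 'cookies');
// session_request('https://example.com/login', $jar, ['username' => 'me', 'password' => 'secret']);
// $result = session_request('https://example.com/form', $jar, ['comment' => 'hello']);
// $sent = strpos($result, 'Thank you') !== false;
```

Checking whether the form was sent comes down to looking for whatever confirmation text or redirect the target site actually produces.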
What I usually do is fire up Charles, go through the login process in a browser, and record the raw requests. Then I copy and paste the requests and send them through fopen or curl (with some small adjustments according to the responses).
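As an illustration of the fopen variant, here is a sketch that replays one recorded request through an HTTP stream context. It assumes allow_url_fopen is enabled; the function names are invented for this example, and the headers and body would be whatever the proxy captured:

```php
<?php
// Build stream context options that replay a recorded request: same method,
// same headers, same body. Headers are given as name => value pairs.
function replay_context_options(string $method, array $headers, string $body = ''): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return [
        'http' => [
            'method'        => strtoupper($method),
            'header'        => implode("\r\n", $lines),
            'content'       => $body,
            'ignore_errors' => true, // still return the body on 4xx/5xx responses
        ],
    ];
}

// Send the replayed request through fopen and return the response body.
function replay_request(string $url, string $method, array $headers, string $body = ''): string
{
    $context = stream_context_create(replay_context_options($method, $headers, $body));
    $stream = fopen($url, 'r', false, $context);
    if ($stream === false) {
        throw new RuntimeException('Could not open ' . $url);
    }
    $response = stream_get_contents($stream);
    fclose($stream);
    if ($response === false) {
        throw new RuntimeException('Could not read from ' . $url);
    }
    return $response;
}
```

The "small adjustments" usually mean swapping the recorded session cookie or token for the one the server handed out in the current run.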
You may want to take a look at Perl's LWP library (I know it isn't PHP, but it's very useful for screen scraping, web unit testing, and such):
I have a fair bit of experience with this. I used to use Curl, but it is no fun to work with. In particular, sites often exchange XSRF tokens, pass hidden variables, or set all kinds of cookies. Tracking all of this with Curl becomes difficult, at least for me.
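To give a sense of the bookkeeping involved, here is a small sketch of pulling the hidden variables (XSRF tokens included) out of a fetched page so they can be posted back with the form. It assumes PHP's DOM extension is available; `extract_hidden_fields` is a made-up helper name:

```php
<?php
// Return name => value for every <input type="hidden"> in the HTML.
function extract_hidden_fields(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if (strtolower($input->getAttribute('type')) === 'hidden' && $name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}
```

The returned array would be merged with the visible form fields before each POST, and the page re-fetched whenever the site rotates its token.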
I then explored Selenium and I love it. There are two things to set up: 1) install Selenium IDE (works only in Firefox), and 2) install Selenium RC Server.
After starting Selenium IDE, go to the site you are trying to automate and start recording the actions you perform on it. Think of it as recording a macro in the browser. Afterwards, you get the code output in the language you want.
Just so you know, Browsermob uses Selenium for load testing and for automating tasks in the browser.
I've uploaded a ppt that I made a while back. This should save you a good amount of time: http://www.4shared.com/get/tlwT3qb_/SeleniumInstructions.html
At the link above, choose the regular download option.