How to download a secure web page
I wish to programmatically download a webpage which requires a login to view. Is there any sane way of doing this? By looking at HTTP headers and such, I can see the username / password being passed as POST data, but requesting a page with this info attached isn't good enough. I think cookies are involved too, and it looks like they contain some kind of encrypted authorisation data.
Is there any way of faking this? Language isn't too important here, but something like Perl that can be run on Linux with relative ease would be nice. Or maybe a command line browser could be scripted?
Yes, you can do this via the `curl` command-line tool or the libcurl library. You need to figure out what's supposed to be in the cookies, and then pass them with `curl`'s `-b` option or the equivalent libcurl API. You can also perform HTTP Basic authentication via cURL.
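Rather than reconstructing the cookie values by hand, it is often simpler to let the server issue them: POST the login form once with a cookie-aware client, then request the protected page with the same client. The following is a minimal sketch of that idea using only Python's standard library (`urllib.request` with `http.cookiejar`), playing the role that `curl -c`/`-b` plays on the command line. The function names `make_session`, `login`, `fetch` and `basic_auth_header` are hypothetical helpers for illustration, and the login URL and form field names must be read off the real site; nothing here is specific to any particular site.

```python
import base64
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): an opener that stores cookies from every response
    in `jar` and sends them back on later requests, much like pairing
    `curl -c jar.txt` with `curl -b jar.txt`."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login(opener, login_url, fields):
    """POST the login form fields as application/x-www-form-urlencoded, the way
    a browser submits an ordinary HTML form. Any Set-Cookie headers in the
    response are stored in the opener's cookie jar. Returns the HTTP status;
    urllib raises HTTPError for 4xx/5xx responses."""
    # The field names (e.g. "user", "pass") are assumptions: copy the real
    # ones from the POST data you saw in the browser's request headers.
    body = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(login_url, data=body) as resp:
        return resp.status


def fetch(opener, url):
    """GET `url`, sending whatever cookies the jar currently holds, and return
    the body decoded as text."""
    with opener.open(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def basic_auth_header(user, password):
    """Build an Authorization header value for HTTP Basic authentication
    (what `curl -u user:password` sends). Only useful if the site really uses
    Basic auth rather than a login form."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

Typical use would be `opener, jar = make_session()`, then `login(opener, login_url, {"user": ..., "pass": ...})`, then `fetch(opener, protected_url)`. Because the jar captures whatever the server sets, the cookie contents can stay opaque (encrypted or not); the client only has to replay them.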
If the page is really sophisticated, you'll have to do HTML parsing or even JS interpretation to extract the cookie data beforehand. That's still doable, but not with CURL alone.
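One common reason for parsing the HTML first (an assumption about the target site, not something the question states) is that the login form carries hidden inputs, such as an anti-CSRF token, which the server expects back in the POST along with the username and password. A minimal sketch using the standard library's `html.parser` to collect them; `HiddenFieldParser` and `hidden_fields` are hypothetical names. This does not cover cookies computed by JavaScript, which would need a real JS engine or a scripted browser.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values keep
        # their case and are None when the attribute has no value.
        if tag != "input":
            return
        a = dict(attrs)
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields found in an HTML document."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

The resulting dictionary would then be merged with the credentials before submitting the login form, so the POST looks like the one the browser sends.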
As a general note, anything a web browser can do can be scripted. Turing-completeness and all that. "Unscriptable" captive portals like the ones BlueSocket sells are a load of bunk; they're basically just obfuscated web pages. They'll slow you down but can never, ever stop you: they have to give you the keys in order to work!
PHP's cURL extension would do it. Also check here to see whether this solution is right for you.