Using Wget to download PDF files from a site that requires cookies to be set

Posted on 2024-11-29 18:06:48

I want to access a newspaper site and download its e-paper editions (in PDF). The site requires me to log in with my email address and password before it lets me access the PDF URLs.

I'm having trouble setting up my session in wget. When I log in to the site from my browser, it sets two cookie values:

[email protected]
Password=12345

I tried:

wget --post-data "[email protected]&Password=12345" http://epaper.abc.com/login.aspx

However, that just downloaded the login page and saved it locally.

The form on the login page has two fields:

txtUserID
txtPassword

and radio buttons like this:

<input id="rbtnManchester" type="radio" checked="checked" name="txtpub" value="44">

and another radio button:

<input id="rbtnLondon" type="radio" name="txtpub" value="64">

If I post these form fields to the login.aspx page, I get the same output:

wget --post-data "txtUserID=[email protected]&txtPassword=12345&txtpub=44" http://epaper.abc.com/login.aspx
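One thing that may be worth checking, stated as an assumption since the post does not show the full page source: ASP.NET Web Forms pages usually include hidden inputs such as __VIEWSTATE and __EVENTVALIDATION, and the server often expects them back in the POST. If login.aspx has them, a POST that sends only the visible fields may simply get the login page again. A minimal shell sketch for pulling such a value out of a saved copy of the page follows. The function name `hidden_field` and the file name `login.html` are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the value of a hidden <input> from a saved HTML page.
# Assumes name="..." comes before value="..." inside a single <input> tag
# that sits on one line, which is how ASP.NET typically renders these
# fields. The real login page may differ.

hidden_field() {
    # $1 = name of the hidden input (e.g. __VIEWSTATE)
    # $2 = path to the saved HTML file
    # Prints the first matching value, or nothing if the field is absent.
    sed -n "s/.*name=\"$1\"[^>]*value=\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# Usage (not executed here; login.html is a hypothetical file name):
#   wget -O login.html http://epaper.abc.com/login.aspx
#   viewstate=$(hidden_field __VIEWSTATE login.html)
#   validation=$(hidden_field __EVENTVALIDATION login.html)
# The extracted values contain characters such as / + = and must be
# percent-encoded before they are appended to --post-data.
```

If the function prints nothing for both names, the page probably does not use these fields and the cause lies elsewhere.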

If I add:

--save-cookies abc_cookies.txt

the cookie file doesn't seem to contain anything other than the default header.

Finally, if I also pass --debug, it prints:

...
Set-Cookie: ASP.NET_SessionId=05kphcn4hjmblq45qgnjoe41; path=/; HttpOnly
...
Stored cookie epaper.abc.com -1 (ANY) / <session> <insecure> [expiry none] ASP.NET_SessionId 05kphcn4hjmblq45qgnjoe41
Length: 107253 (105K) [text/html]
Saving to: `login.aspx'
...
Saving cookies to abc_cookies.txt.

However, abc_cookies.txt contains only the following:

# HTTP cookie file.
# Generated by Wget on 2011-08-16 08:03:05.
# Edit at your own risk.
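A possible explanation, offered as a sketch rather than a confirmed fix: the debug output marks ASP.NET_SessionId as a <session> cookie with [expiry none], and wget's --save-cookies skips session cookies unless --keep-session-cookies is also given. That would leave only the header comments in the file. The shell sketch below saves the session cookie at login and reuses it for a later download. The function names, login_response.html, and the PDF URL argument are hypothetical; the login URL and cookie file name come from the commands above.

```shell
#!/bin/sh
# Sketch: keep the ASP.NET session cookie across wget invocations.
# Nothing here touches the network until login or fetch_pdf is called.

LOGIN_URL='http://epaper.abc.com/login.aspx'
COOKIE_JAR='abc_cookies.txt'

login() {
    # $1 = POST body, already percent-encoded,
    #      e.g. 'txtUserID=...&txtPassword=...&txtpub=44'
    # --keep-session-cookies makes --save-cookies write cookies that
    # have no expiry date, such as ASP.NET_SessionId.
    wget --save-cookies "$COOKIE_JAR" --keep-session-cookies \
         --post-data "$1" -O login_response.html "$LOGIN_URL"
}

fetch_pdf() {
    # $1 = URL of a PDF (hypothetical; the post does not give one)
    wget --load-cookies "$COOKIE_JAR" "$1"
}

cookie_count() {
    # $1 = cookie file in the Netscape format that wget writes
    # Prints the number of lines that are neither comments nor blank,
    # i.e. the number of stored cookies.
    grep -c -v -e '^#' -e '^[[:space:]]*$' "$1" || true
}

# Usage (not executed here):
#   login 'txtUserID=...&txtPassword=...&txtpub=44'
#   cookie_count "$COOKIE_JAR"    # 0 means no cookie was saved
#   fetch_pdf 'http://epaper.abc.com/...'
```

A saved session cookie only proves that the session was recorded, not that the login succeeded, so login_response.html is still worth inspecting.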

Comments (1)

花伊自在美 2024-12-06 18:06:48

Just a suggestion: did you try using query string variables (not very secure, obviously)?

wget "http://epaper.abc.com/[email protected]&Password=12345"

You might have to escape the special characters depending on your shell / OS.
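To illustrate the escaping point, here is a minimal sketch in POSIX shell. Two separate layers are involved: the shell, where quoting the whole argument stops & from being treated as a command separator, and the URL or form body, where characters such as @ and & inside a value are normally percent-encoded. The helper name `urlencode` and the sample address me@example.com are hypothetical, and the function only handles ASCII input.

```shell
#!/bin/sh
# Sketch: percent-encode a single form value (ASCII input only).

urlencode() (
    # Runs in a subshell so LC_ALL and the work variables do not leak.
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            # RFC 3986 unreserved characters pass through unchanged
            [A-Za-z0-9._~-]) out=$out$c ;;
            # everything else becomes %XX (hex of the character code)
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)

# Usage (not executed here): encode each value separately, then join
# the pairs with literal & characters and quote the result for the shell.
#   user=$(urlencode 'me@example.com')
#   pass=$(urlencode '12345')
#   wget --post-data "txtUserID=${user}&txtPassword=${pass}&txtpub=44" \
#        http://epaper.abc.com/login.aspx
```

Encoding each value on its own, rather than the whole string, keeps the & and = separators between fields intact.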
