wget downloads "downloading.aspx" instead of "helloworld.doc"?

Posted 2024-10-09 12:38:39

I have created two files for wget: log.txt and docs.txt.

log.txt:

    --2010-12-27 23:17:12--  http://www.xyz.dk/docs/Getpaper.aspx?id=133337
    Resolving www.xyz.com... 194.152.xx.xxx
    Connecting to www.xyz.com|194.152.xx.xxx|:80... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: /Members/Login.aspx?uri=%2fdocs%2fGetpaper.aspx%3fid%3d133337 [following]
    --2010-12-27 23:17:13--  http://www.xyz.com/Members/Login.aspx?uri=%2fdocs%2fGet$
    Reusing existing connection to www.xyz.dk:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 22162 (22K) [text/html]
    Saving to: `Getpaper.aspx?id=133337'

         0K .......... .......... .                               100%  131K=0,2s

    2010-12-27 23:17:13 (131 KB/s) - `Getpaper.aspx?id=133337' saved [22162/22162]

    FINISHED --2010-12-27 23:17:13--
    Downloaded: 1 files, 22K in 0,2s (131 KB/s)
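
The log shows what wget actually did: the request got a `302 Found` redirect to `/Members/Login.aspx`, and the 22 KB `text/html` page it then followed was saved under the name `Getpaper.aspx?id=133337`. So the saved file appears to be the login page, which suggests the server did not accept the session from the cookie file. A minimal sketch for spotting this in a log written by `wget -o`, assuming the login URL always contains `Login.aspx` (`is_login_redirect` is a hypothetical helper, not a wget feature):

```shell
# Hypothetical helper (not part of wget): exits 0 when the given wget log
# contains a redirect ("Location:" line) pointing at a login page.
# ASSUMPTION: the site's login URL contains "Login.aspx", as in the log above.
is_login_redirect() {
    grep -q '^[[:space:]]*Location: .*Login\.aspx' "$1"
}

# Usage sketch against the log file from the question:
# if is_login_redirect /home/repsak/project/log.txt; then
#     echo "wget was sent to the login page; the cookies were not accepted" >&2
# fi
```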

docs.txt consists of the exact links to the documents I want to download, e.g.:

http://www.xyz.com/Docs/Download.aspx?id=133337

wget command:

    wget -c -o /home/repsak/project/log.txt -i /home/repsak/project/files.txt --load-cookies cookies.txt &
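
`--load-cookies` reads a cookie file in the format of Netscape's `cookies.txt`: one cookie per line, seven TAB-separated fields (domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp, name, value), with `#` lines treated as comments. The sketch below writes such a file by hand. The helper `write_cookie_file`, the cookie name `EXAMPLE_SESSION` and its value are hypothetical placeholders; the real name and value would have to be copied from the browser after logging in. It also uses an absolute path, like the `-o` and `-i` arguments, because a relative `cookies.txt` is resolved against whatever directory wget is started from.

```shell
# Sketch: write a Netscape-format cookie file for `wget --load-cookies`.
# Fields (TAB-separated): domain, include-subdomains flag, path, secure flag,
# expiry (Unix time), name, value.
# ASSUMPTION: EXAMPLE_SESSION / PASTE_VALUE_HERE are placeholders; replace them
# with the session cookie the browser holds after logging in.
COOKIE_FILE=/home/repsak/project/cookies.txt  # absolute, like the -o/-i paths

write_cookie_file() {
    {
        printf '# Netscape HTTP Cookie File\n'
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
            'www.xyz.com' 'FALSE' '/' 'FALSE' '2147483647' \
            'EXAMPLE_SESSION' 'PASTE_VALUE_HERE'
    } > "$1"
}

# Usage sketch:
# write_cookie_file "$COOKIE_FILE"
# wget -c -o /home/repsak/project/log.txt -i /home/repsak/project/files.txt \
#      --load-cookies "$COOKIE_FILE" &
```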

If I paste that link into Firefox, it immediately saves the document to my hard drive.

Why can't I use wget to download the files?

Is it because the documents are wrapped in some JavaScript or something? The saved file contains:

    <script src="/JS/SessionKeepAlive.js" type="text/javascript"></script><script type="text/javascript">rootPath = '/';</script><script src="/JS/global.js" type="text/javascript"></script>
    <script src="/ScriptResource.axd?d=AJEskFl7ncVJKI8lj-G6W4sh9UGvD53tzD78i10-xRRoHxUqVDJRIljLnDg0DEOQRGYBjddkMr2m5q0TwJMrPlh2lCGMwy6AbdKWsBMN3um5o4LnDIcgMmg4eL168e7m3B43U83ZbaGc8s_xVt_vZ2hFvfwQMI0SKfKljmKnwNaVA8FQ0&t=2610f696" type="text/javascript"></script>
    <script src="/ScriptResource.axd?d=rrofmak1_XsHrfW-6KRAeAdeT0EQDJa_q326hm5CN4J0GBPhTP11DXVt2G_-CIqhw_AA2r6ANXbqrNBAfZJb87vm9NR16ygl56psOrqjJelzOVuoWI0lSRCCDSP-4J5YJvgxzae_vsLvVJpMHrQ7mXX8_hI1&t=7788d8db" type="text/javascript"></script>
    <script src="/ScriptResource.axd?d=xJYsnwNJqSs6T1dY1nb07O05PCnCv0HENyX8m8oYEpsDloRSVSOdAx2cDOoU25vJhhL8LPwNLRO05Ulu2LIX_37t_cIxCJEobVObp6psUp6DSGmAF0PYpHNyMJzhHg_vUB-QYAsvFcfw6L2JtytlVP2VdrU1&t=7788d8db" type="text/javascript"></script>
    <script src="/ScriptResource.axd?d=FEHpzPdSdYxI2gNT2bbhrwdiJC8UROU832G8EAnbu2x7ZhR5aHCI5gjoLx78FweIGYgh25_PhqZh8pJTkP-Lje-U1nV_YbChpYqSa_Xr0dJziG_pcM9-dscW_4SZOmuS9BJrt1XWtwpxC3ojfQpgFkjvMOM1&t=7788d8db" type="text/javascript"></script>
    <script src="/ScriptResource.axd?d=LMfuWgyYbfToT4AgZFNAGy1lJGHuFpVQsof2PGpAYw3XskDgdZvPcmO4fiqhPgD_Vi435ME1eUBlcv6AZmiY9FidHgAcyhXENHzwrFShjiIALwQTXY5RKXj7HMPZCxSbJ0Cfwgm4Ui03QN33_WpQWHrsuTDHCihktG9HezEENpgLTl-70&t=7788d8db" type="text/javascript"></script>
    <script src="/ScriptResource.axd?d=Mv3loikajFWni9j8CFnlwfW8Axn-lseUIp6tmfkC39-ni6FDU6ysiEhZMCZuzzXVNbXgsmxORQPQMpRI95K7JYp9fVr0FnSeWIkZJwqH_kJi0XG6K51DJIZZss513XX_cJ4bWJyyfNxFjhC1vX1ppFxpXGeb5otT2SRFS6Poz5FXo3aH0&t=7788d8db" type="text/javascript"></script>
    <script src="/ScriptResource.axd?d=Z4q30yfKzHtiFQj5EoeFk6Gjm42O-f-IQalIUwEu9yg-i18Twv_g-ZWXS7yH8Zkccsa1131CCYafYIOA8kV121H6c3gJhGDAmK7rsFyZqQckD1t1J8dNgnYu4SVsHDfwVs62Kq1DtBRqOThGJplgrMusl-vWwm7aUR_gQfvBWsjYcIBv0&t=7788d8db" type="text/javascript"></script>
    <script src="/ScriptResource.axd?d=SyKM6N5Suv8StxxiOiHTS-b1d5EIBE9i29YIsIhsZsfnNa2wLBxbZxEUH4t2c3xu0IDhDtwtjNlNW_6f0it8BvT_DzWDZTUvavqE6DeLeJY4PURzs1ZqWTWwJARu3Pgs3hQiqo8Z-pEomZIjXmJgNn-39teo7ZhXx2E2k_Rh3u0uARjN0&t=7788d8db" type="text/javascript"></script>
    <script src="../Webservices/MemberVisitcard.asmx/js" type="text/javascript"></script>
    <script src="../Webservices/ToggleNotification.asmx/js" type="text/javascript"></script>
    <script type="text/javascript" language="JavaScript" src="http://www2.xyz.com/EAS_tag.1.0.js"></script>
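
These `<script>` tags come from the HTML page that wget saved; `.aspx` pages, `ScriptResource.axd` and `.asmx` services are ASP.NET resources, so the file looks like an ordinary web page (plausibly the login page from the log) rather than the requested document. One quick check, sketched below, is to look for an HTML marker near the start of each saved file. `looks_like_html` is a hypothetical helper, and the 512-byte window is an arbitrary choice.

```shell
# Hypothetical helper: exits 0 if an HTML marker appears in the first
# 512 bytes of FILE, i.e. wget probably saved a web page (such as a
# login form) instead of the requested document.
# ASSUMPTION: the 512-byte window is arbitrary; real documents (.doc, .pdf)
# normally start with a binary or "%PDF" header instead.
looks_like_html() {
    head -c 512 "$1" | grep -qi -e '<!doctype html' -e '<html'
}

# Usage sketch: flag every saved file that is really an HTML page.
# for f in Getpaper.aspx*; do
#     looks_like_html "$f" && echo "$f is an HTML page, not a document"
# done
```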

Comments (1)

半暖夏伤 2024-10-16 12:38:40

You can use Phantom.js for this purpose: http://www.phantomjs.org/

Here is the documentation for scraping: http://code.google.com/p/phantomjs/wiki/QuickStart#DOM_Manipulation
