wget downloads "downloading.aspx" instead of "helloworld.doc"?
I have created two files for wget: log.txt and docs.txt.
log.txt:
--2010-12-27 23:17:12-- http://www.xyz.dk/docs/Getpaper.aspx?id=133337
Resolving www.xyz.com... 194.152.xx.xxx
Connecting to www.xyz.com|194.152.xx.xxx|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /Members/Login.aspx?uri=%2fdocs%2fGetpaper.aspx%3fid%3d133337 [following]
--2010-12-27 23:17:13-- http://www.xyz.com/Members/Login.aspx?uri=%2fdocs%2fGet$
Reusing existing connection to www.xyz.dk:80.
HTTP request sent, awaiting response... 200 OK
Length: 22162 (22K) [text/html]
Saving to: `Getpaper.aspx?id=133337'
0K .......... .......... . 100% 131K=0,2s
2010-12-27 23:17:13 (131 KB/s) - `Getpaper.aspx?id=133337' saved [22162/22162]
FINISHED --2010-12-27 23:17:13--
Downloaded: 1 files, 22K in 0,2s (131 KB/s)
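One way to read the log above: the first request is answered with 302 Found and a Location header pointing at /Members/Login.aspx, and the 22K file that ends up saved is text/html. That pattern suggests wget received the login page rather than the document, although the log alone does not show why. Below is a minimal Python sketch for spotting such redirects in a wget log. The function name find_login_redirects and the "login" substring heuristic are assumptions made for illustration, not part of wget.

```python
import re


def find_login_redirects(log_text):
    """Return (requested_url, redirect_target) pairs where the target looks like a login page."""
    redirects = []
    current_url = None
    for line in log_text.splitlines():
        # Request lines look like: --2010-12-27 23:17:12-- http://...
        request = re.match(r"--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)", line)
        if request:
            current_url = request.group(1)
            continue
        # Redirect lines look like: Location: /path [following]
        location = re.match(r"Location:\s+(\S+)", line)
        # Heuristic (assumption): a target containing "login" is a login page.
        if location and current_url and "login" in location.group(1).lower():
            redirects.append((current_url, location.group(1)))
    return redirects


# Excerpt of the log shown above.
sample_log = """\
--2010-12-27 23:17:12-- http://www.xyz.dk/docs/Getpaper.aspx?id=133337
HTTP request sent, awaiting response... 302 Found
Location: /Members/Login.aspx?uri=%2fdocs%2fGetpaper.aspx%3fid%3d133337 [following]
--2010-12-27 23:17:13-- http://www.xyz.com/Members/Login.aspx?uri=%2fdocs%2fGet$
HTTP request sent, awaiting response... 200 OK
Length: 22162 (22K) [text/html]
"""

for url, target in find_login_redirects(sample_log):
    print(f"{url} was redirected to a login page: {target}")
```

If this reports any URL, the saved file is probably the HTML of the login form and not the requested document.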
docs.txt consists of the exact links to the documents I want to download, e.g.
http://www.xyz.com/Docs/Download.aspx?id=133337
wget cmd:
wget -c -o /home/repsak/project/log.txt -i /home/repsak/project/files.txt --load-cookies cookies.txt &
If I paste that link into Firefox, it immediately saves the document to my hard drive.
Why can't I use wget to download the files?
Is it because the documents are wrapped in some kind of JavaScript?
<script src="/JS/SessionKeepAlive.js" type="text/javascript"></script><script type="text/javascript">rootPath = '/';</script><script src="/JS/global.js" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=AJEskFl7ncVJKI8lj-G6W4sh9UGvD53tzD78i10-xRRoHxUqVDJRIljLnDg0DEOQRGYBjddkMr2m5q0TwJMrPlh2lCGMwy6AbdKWsBMN3um5o4LnDIcgMmg4eL168e7m3B43U83ZbaGc8s_xVt_vZ2hFvfwQMI0SKfKljmKnwNaVA8FQ0&t=2610f696" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=rrofmak1_XsHrfW-6KRAeAdeT0EQDJa_q326hm5CN4J0GBPhTP11DXVt2G_-CIqhw_AA2r6ANXbqrNBAfZJb87vm9NR16ygl56psOrqjJelzOVuoWI0lSRCCDSP-4J5YJvgxzae_vsLvVJpMHrQ7mXX8_hI1&t=7788d8db" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=xJYsnwNJqSs6T1dY1nb07O05PCnCv0HENyX8m8oYEpsDloRSVSOdAx2cDOoU25vJhhL8LPwNLRO05Ulu2LIX_37t_cIxCJEobVObp6psUp6DSGmAF0PYpHNyMJzhHg_vUB-QYAsvFcfw6L2JtytlVP2VdrU1&t=7788d8db" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=FEHpzPdSdYxI2gNT2bbhrwdiJC8UROU832G8EAnbu2x7ZhR5aHCI5gjoLx78FweIGYgh25_PhqZh8pJTkP-Lje-U1nV_YbChpYqSa_Xr0dJziG_pcM9-dscW_4SZOmuS9BJrt1XWtwpxC3ojfQpgFkjvMOM1&t=7788d8db" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=LMfuWgyYbfToT4AgZFNAGy1lJGHuFpVQsof2PGpAYw3XskDgdZvPcmO4fiqhPgD_Vi435ME1eUBlcv6AZmiY9FidHgAcyhXENHzwrFShjiIALwQTXY5RKXj7HMPZCxSbJ0Cfwgm4Ui03QN33_WpQWHrsuTDHCihktG9HezEENpgLTl-70&t=7788d8db" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=Mv3loikajFWni9j8CFnlwfW8Axn-lseUIp6tmfkC39-ni6FDU6ysiEhZMCZuzzXVNbXgsmxORQPQMpRI95K7JYp9fVr0FnSeWIkZJwqH_kJi0XG6K51DJIZZss513XX_cJ4bWJyyfNxFjhC1vX1ppFxpXGeb5otT2SRFS6Poz5FXo3aH0&t=7788d8db" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=Z4q30yfKzHtiFQj5EoeFk6Gjm42O-f-IQalIUwEu9yg-i18Twv_g-ZWXS7yH8Zkccsa1131CCYafYIOA8kV121H6c3gJhGDAmK7rsFyZqQckD1t1J8dNgnYu4SVsHDfwVs62Kq1DtBRqOThGJplgrMusl-vWwm7aUR_gQfvBWsjYcIBv0&t=7788d8db" type="text/javascript"></script>
<script src="/ScriptResource.axd?d=SyKM6N5Suv8StxxiOiHTS-b1d5EIBE9i29YIsIhsZsfnNa2wLBxbZxEUH4t2c3xu0IDhDtwtjNlNW_6f0it8BvT_DzWDZTUvavqE6DeLeJY4PURzs1ZqWTWwJARu3Pgs3hQiqo8Z-pEomZIjXmJgNn-39teo7ZhXx2E2k_Rh3u0uARjN0&t=7788d8db" type="text/javascript"></script>
<script src="../Webservices/MemberVisitcard.asmx/js" type="text/javascript"></script>
<script src="../Webservices/ToggleNotification.asmx/js" type="text/javascript"></script>
<script type="text/javascript" language="JavaScript" src="http://www2.xyz.com/EAS_tag.1.0.js"></script>
Comments (1)
You can use PhantomJS for this purpose: http://www.phantomjs.org/
Here is the documentation on scraping: http://code.google.com/p/phantomjs/wiki/QuickStart#DOM_Manipulation
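Before reaching for a headless browser, it may be worth checking whether the cookies.txt passed to --load-cookies actually holds an unexpired cookie for the host, since wget expects that file in the Netscape cookies.txt format. The following is a minimal sketch using Python's standard http.cookiejar module. The helper name cookies_for_host and the sample cookie names and values are hypothetical, and the real file would come from the logged-in browser session.

```python
import http.cookiejar
import os
import tempfile
import time


def cookies_for_host(cookie_file, host):
    """Return {cookie_name: is_expired} for cookies whose domain covers host."""
    jar = http.cookiejar.MozillaCookieJar()
    # Load everything, including session and expired cookies, so they can be reported.
    jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    now = time.time()
    found = {}
    for cookie in jar:
        domain = cookie.domain.lstrip(".")
        if host == domain or host.endswith("." + domain):
            found[cookie.name] = cookie.expires is not None and cookie.expires < now
    return found


# Hypothetical cookies.txt contents; real names and values depend on the site.
sample = (
    "# Netscape HTTP Cookie File\n"
    ".xyz.com\tTRUE\t/\tFALSE\t4102444800\tsession\tabc123\n"
    ".xyz.com\tTRUE\t/\tFALSE\t1\told\tzzz\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write(sample)
    path = fh.name

try:
    result = cookies_for_host(path, "www.xyz.com")
finally:
    os.remove(path)

print(result)
```

An empty result, or only expired entries, would be consistent with the server redirecting wget to the login page. It would not rule out other causes, such as the site tying the session to something other than cookies.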