I was assigned to automate the download of several CSV files from a site that relies heavily on JavaScript (https://www.quantcast.com/download/plannerCSV?&d0Id=01&sc=1&mr=10000&rs2=-1). There is a convenient 'download' button that prompts the download. I found its URL using Firebug:
https://www.quantcast.com/download/plannerCSV?&d0Id=10&sc=1&mr=10000
However, to the best of my knowledge, the URL must end in .csv for Python's urllib2 module to read it and write it to my disk. Is there any way of obtaining a URL for this file that ends in .csv so I can automate the download using Python?
I'd prefer to write the script in Python, but if there is a more sensible or straightforward approach to this task, I would love to know about it. Any guidance is appreciated! Thanks =)
(edit) to clarify: The data I'm trying to download consists of a giant (regularly-updated) list, which can be filtered in many different ways based on which checkboxes are checked. The URL changes based on the checkboxes, so I believe the URL may invoke a Javascript call that generates a new csv file each time.
Does appending a dummy GET variable work? Some services that require a file extension get one by passing GET variables that the server never uses. The most obvious example to me is the Google Code wiki's handling of image files. Something like:
https://www.quantcast.com/download/plannerCSV?&d0Id=10&sc=1&mr=10000&foo=.csv
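A minimal Python 3 sketch of that suggestion follows. It assumes the server ignores the extra parameter. `with_dummy_param` and `download` are hypothetical helper names, `foo` is an arbitrary parameter name, and `urllib.request` is the Python 3 successor to `urllib2`. The local file name is chosen by the script, not taken from the URL. The sketch sends no login cookies, so if Quantcast requires an account for this download, the request may return an error or a login page instead of CSV data.

```python
import shutil
from urllib.parse import urlencode
from urllib.request import urlopen


def with_dummy_param(url, name="foo", value=".csv"):
    """Append an unused query parameter so the URL ends in `value`."""
    # Assumption: the server silently ignores parameters it does not recognise.
    sep = "&" if "?" in url else "?"
    return url + sep + urlencode({name: value})


def download(url, path):
    """Stream the response body at `url` into the local file `path`."""
    # Hypothetical helper: no cookies or credentials are sent, so a
    # login-protected download would not return the CSV.
    with urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, `download(with_dummy_param("https://www.quantcast.com/download/plannerCSV?&d0Id=10&sc=1&mr=10000"), "planner.csv")` would request the URL shown above and save whatever the server returns as `planner.csv`.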
I'd check it out, but I don't have a Quantcast account.