How to determine which headers are needed when submitting a form
I am writing a Python script to collect a large amount of data. Unfortunately, this involves submitting a few web forms to get the information I am interested in. Since this has to be repeated a few thousand times, it makes sense to automate it.
I have never dealt with the forms side of this sort of thing, so I have a few questions. Here is a snippet of the code I am using:
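import urllib
import urllib2

url = 'http://example.com/result.do'
# Placeholder field names. A dict cannot hold the same key twice, so each form field needs a distinct name.
values = {'id1': 'value', 'id2': 'value', 'submit': 'Submit'}
data = urllib.urlencode(values)   # URL-encode the fields for the request body
req = urllib2.Request(url, data)  # passing data makes this a POST request
response = urllib2.urlopen(req)
the_page = response.read()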
My Questions:
Do I need to pass all the form fields even if I don't select them?
Do I need to pass HTTP headers with this? If so, could someone link me to a good explanation of this?
Am I missing anything?
Comments (1)
I'd recommend taking a look at the Mechanize module. It allows you to select a form, fill in fields, select options, and click buttons. It automatically generates the form data that the form would have submitted in a normal browser. Here's a nice little tutorial/example: http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
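To make the "generates the form data a browser would have submitted" idea concrete, here is a rough sketch of that behaviour using only the Python 3 standard library. It is not Mechanize and not how Mechanize is implemented; the names FormFieldCollector and build_form_request, the sample HTML, and the User-Agent string are all made up for illustration. It only handles <input> elements (no <select> or <textarea>) and only the first form on the page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormFieldCollector(HTMLParser):
    """Collect the action URL and default field values of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form:
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            if not name:
                return  # inputs without a name are never submitted
            if kind in ("submit", "button", "image", "reset", "file"):
                return  # buttons are only sent when clicked; the caller adds one
            if kind in ("checkbox", "radio"):
                if "checked" not in attrs:
                    return  # unchecked boxes are left out by browsers
                self.fields[name] = attrs.get("value") or "on"
            else:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_form_request(page_url, html, overrides, headers=None):
    """Build (but do not send) a POST request for the first form in html."""
    collector = FormFieldCollector()
    collector.feed(html)
    fields = dict(collector.fields)  # defaults, including hidden fields
    fields.update(overrides)         # values you want to fill in or "click"
    body = urlencode(fields).encode("ascii")
    target = urljoin(page_url, collector.action or "")
    return Request(target, data=body, headers=headers or {})


# Invented sample page, standing in for the real form.
SAMPLE_HTML = """
<form action="/result.do" method="post">
  <input type="hidden" name="token" value="abc123">
  <input type="text" name="query">
  <input type="checkbox" name="exact" value="1">
  <input type="submit" name="submit" value="Submit">
</form>
"""

req = build_form_request(
    "http://example.com/search.html",
    SAMPLE_HTML,
    {"query": "python", "submit": "Submit"},
    headers={"User-Agent": "form-sketch/0.1"},
)
print(req.get_method(), req.full_url)
print(req.data)
```

The request is only constructed here, never sent; passing it to urllib.request.urlopen would actually submit it. In this sketch the hidden token field is picked up automatically and the unchecked checkbox is left out, which is roughly what a browser would do, and extra headers are only added if you pass them in.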