Proxies with Python 'Requests' module
Just a short, simple one about the excellent Requests module for Python.
I can't seem to find in the documentation what the variable 'proxies' should contain. When I send it a dict with a standard "IP:PORT" value, it rejects it, asking for 2 values.
So, I guess (because this doesn't seem to be covered in the docs) that the first value is the ip and the second the port?
The docs mention this only:
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
So I tried this... what should I be doing?
proxy = { ip: port}
and should I convert these to some type before putting them in the dict?
r = requests.get(url, headers=headers, proxies=proxy)
12 Answers
The 'proxies' dict syntax is {"protocol": "scheme://ip:port", ...}. With it you can specify different (or the same) proxies for requests using the http, https, and ftp protocols:
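For example (a minimal sketch; the proxy addresses are placeholders):

import requests

# placeholder proxy addresses; substitute your own
http_proxy = "http://10.10.1.10:3128"
https_proxy = "http://10.10.1.10:1080"
ftp_proxy = "ftp://10.10.1.10:3128"

proxies = {
    "http": http_proxy,
    "https": https_proxy,
    "ftp": ftp_proxy,
}

r = requests.get("http://example.org", proxies=proxies)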
Deduced from the requests documentation. On Linux you can also do this via the HTTP_PROXY, HTTPS_PROXY, and FTP_PROXY environment variables, and on Windows via the same variables with set:
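A sketch of both, setting the variables from Python via os.environ so the example stays self-contained (requests picks these up automatically when no proxies argument is passed); the shell equivalents are shown as comments:

import os
import requests

# Linux shell:   export HTTP_PROXY="http://10.10.1.10:3128"
# Windows shell: set HTTP_PROXY=http://10.10.1.10:3128
os.environ["HTTP_PROXY"] = "http://10.10.1.10:3128"
os.environ["HTTPS_PROXY"] = "http://10.10.1.10:1080"
os.environ["FTP_PROXY"] = "ftp://10.10.1.10:3128"

r = requests.get("http://example.org")  # proxy taken from the environment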
You can refer to the proxy documentation here.
If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:
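For example (a sketch in the spirit of the docs; addresses are placeholders):

import requests

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
requests.get("http://example.org", proxies=proxies)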
To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax:
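For example (hypothetical credentials):

import requests

# credentials are embedded in the proxy URL itself
proxies = {"http": "http://user:password@10.10.1.10:3128"}
requests.get("http://example.org", proxies=proxies)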
I have found that urllib has some really good code to pick up the system's proxy settings and they happen to be in the correct form to use directly. You can use this like:
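A minimal sketch of that approach in Python 3:

import urllib.request
import requests

# getproxies() reads the system proxy settings and returns them as a
# scheme-to-URL dict, which is exactly the shape requests expects
proxies = urllib.request.getproxies()
r = requests.get("http://example.org", proxies=proxies)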
It works really well and urllib knows about getting Mac OS X and Windows settings as well.
The accepted answer was a good start for me, but I kept getting an error. The fix was to specify the http:// in the proxy URL, thus:
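For example (address is a placeholder):

import requests

# note the explicit http:// scheme on the proxy address itself
proxies = {"http": "http://10.10.1.10:3128"}
requests.get("http://example.org", proxies=proxies)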
I'd be interested as to why the original works for some people but not me.
Edit: I see the main answer is now updated to reflect this :)
If you'd like to persist cookies and session data, you'd best do it like this:
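A minimal sketch with placeholder addresses:

import requests

session = requests.Session()
session.proxies.update({
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
})
r = session.get("http://example.org")  # cookies persist across calls on the session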
8 years late. But I like:
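The snippet itself isn't shown here; one guess at a compact inline style, with placeholder addresses:

import requests

r = requests.get(
    "http://example.org",
    proxies=dict(http="http://10.10.1.10:3128",
                 https="http://10.10.1.10:1080"),
)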
The documentation gives a very clear example of proxies usage. What isn't documented, however, is that you can even configure proxies for individual URLs, even if the scheme is the same! This comes in handy when you want to use different proxies for different websites you wish to scrape.
Additionally, requests.get essentially uses a requests.Session under the hood, so if you need more control, use it directly. I use it to set a fallback (a default proxy) that handles all traffic that doesn't match the schemes/URLs specified in the dictionary:
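A sketch of both ideas with placeholder addresses; requests matches proxy keys of the form scheme://host before falling back to the bare scheme or 'all':

import requests

session = requests.Session()
session.proxies.update({
    "https://example.org": "http://10.10.1.10:8080",  # this site only
    "all": "http://10.10.1.11:3128",                  # fallback for everything else
})
r = session.get("https://example.org")     # routed through 10.10.1.10
r = session.get("https://httpbin.org/ip")  # falls back to 10.10.1.11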
I just made a proxy grabber that can also connect through the same grabbed proxy, without any input. Here it is:
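The original code isn't shown here; a minimal sketch of such a grabber, using a hypothetical plain-text proxy list URL:

import random
import requests

# hypothetical source serving one "ip:port" proxy per line
PROXY_SOURCE = "https://example.com/proxies.txt"

def grab_proxy():
    lines = requests.get(PROXY_SOURCE, timeout=10).text.splitlines()
    return random.choice([line.strip() for line in lines if line.strip()])

proxy = grab_proxy()
r = requests.get("http://example.org",
                 proxies={"http": "http://" + proxy,
                          "https": "http://" + proxy},
                 timeout=10)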
Here is my basic class in Python for the requests module, with some proxy configs and a stopwatch!
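The class itself isn't shown here; a minimal sketch of what it might look like (names and defaults are assumptions):

import time
import requests

class ProxyClient:
    # hypothetical reconstruction: proxy config plus a per-request stopwatch
    def __init__(self, proxies=None):
        self.session = requests.Session()
        if proxies:
            self.session.proxies.update(proxies)
        self.elapsed = 0.0  # seconds spent in the most recent request

    def get(self, url, **kwargs):
        start = time.monotonic()
        try:
            return self.session.get(url, **kwargs)
        finally:
            self.elapsed = time.monotonic() - start

client = ProxyClient({"http": "http://10.10.1.10:3128"})
resp = client.get("http://example.org", timeout=10)
print(resp.status_code, client.elapsed)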
Already tested; the following code works. You need to use HTTPProxyAuth.
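A sketch of that approach (credentials and addresses are placeholders):

import requests
from requests.auth import HTTPProxyAuth

proxies = {"http": "http://10.10.1.10:3128"}
auth = HTTPProxyAuth("user", "password")  # sets the Proxy-Authorization header
r = requests.get("http://example.org", proxies=proxies, auth=auth)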
It's a bit late, but here is a wrapper class that simplifies scraping proxies and then making an HTTP POST or GET:
ProxyRequests
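Assumed usage of the third-party proxy-requests package (pip install proxy-requests); the import and API below follow its README from memory and may differ between versions:

from proxy_requests import ProxyRequests

# the wrapper scrapes a free proxy by itself and routes the request through it
r = ProxyRequests("https://api.ipify.org")
r.get()
print(r)  # response body, i.e. the IP the target saw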
I'm sharing some code for fetching proxies from the site "https://free-proxy-list.net" and storing the data in a file compatible with tools like "Elite Proxy Switcher" (format IP:PORT):
##PROXY_UPDATER - get free proxies from https://free-proxy-list.net/
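A minimal sketch of such an updater, assuming the site still serves its proxy table as plain HTML with the IP and port in adjacent <td> cells:

import re
import requests

def fetch_free_proxies():
    html = requests.get("https://free-proxy-list.net/", timeout=10).text
    pairs = re.findall(r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td><td>(\d{2,5})</td>", html)
    return ["%s:%s" % (ip, port) for ip, port in pairs]

# Elite Proxy Switcher-compatible output: one IP:PORT per line
with open("proxies.txt", "w") as f:
    f.write("\n".join(fetch_free_proxies()))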