Using urllib2 with a SOCKS proxy

Posted on 2024-08-26 13:15:29

Is it possible to fetch pages with urllib2 through a SOCKS proxy on a one-SOCKS-server-per-opener basis? I've seen the solution using the setdefaultproxy method, but I need different SOCKS proxies in different openers.

There is the SocksiPy library, which works great, but it has to be used this way:

import socks
import socket
socket.socket = socks.socksocket
import urllib2
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "x.x.x.x", y)

That is, it sets the same proxy for ALL urllib2 requests. How can I have different proxies for different openers?

Comments (7)

梦在深巷 2024-09-02 13:15:29

Try with pycurl:

import pycurl
c1 = pycurl.Curl()
c1.setopt(pycurl.URL, 'http://www.google.com')
c1.setopt(pycurl.PROXY, 'localhost')
c1.setopt(pycurl.PROXYPORT, 8080)
c1.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

c2 = pycurl.Curl()
c2.setopt(pycurl.URL, 'http://www.yahoo.com')
c2.setopt(pycurl.PROXY, 'localhost')
c2.setopt(pycurl.PROXYPORT, 8081)
c2.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

c1.perform() 
c2.perform() 
明明#如月 2024-09-02 13:15:29

Yes, you can. I'll repeat my answer to "How can I use a SOCKS 4/5 proxy with urllib2?"
You need to create an opener for every proxy, just as you would with an HTTP proxy. The code that adds this feature to SocksiPy is available on GitHub at https://gist.github.com/869791, and using it is as simple as:

opener = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS4, 'localhost', 9999))
print opener.open('http://www.whatismyip.com/automation/n09230945.asp').read()

For more information I've written an example running multiple Tor instances to behave like a rotating proxy: Distributed Scraping With Multiple Tor Circuits
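
For readers who cannot open the gist, here is a rough sketch of the general idea behind a per-opener handler. It is not the gist's code. It is written for Python 3, where urllib2's functionality lives in urllib.request, and it uses only the standard library. The names FactoryHTTPConnection, FactoryHandler and build_opener_for are made up for illustration, and the socket factory is an assumption: it stands in for something that returns a SOCKS-aware socket configured for one proxy.

```python
import http.client
import urllib.request


class FactoryHTTPConnection(http.client.HTTPConnection):
    """HTTP connection whose socket comes from a caller-supplied factory."""

    def __init__(self, host, socket_factory, **kwargs):
        super().__init__(host, **kwargs)
        self._socket_factory = socket_factory

    def connect(self):
        # Assumption: with a SOCKS library, the factory would return a socket
        # that is already configured for one specific proxy.
        self.sock = self._socket_factory()
        if isinstance(self.timeout, (int, float)):
            self.sock.settimeout(self.timeout)
        self.sock.connect((self.host, self.port))


class FactoryHandler(urllib.request.HTTPHandler):
    """HTTP handler bound to one socket factory, hence to one proxy."""

    def __init__(self, socket_factory):
        super().__init__()
        self._socket_factory = socket_factory

    def make_connection(self, host, **kwargs):
        return FactoryHTTPConnection(host, self._socket_factory, **kwargs)

    def http_open(self, req):
        # do_open() calls make_connection(host, timeout=...) for each request.
        return self.do_open(self.make_connection, req)


def build_opener_for(socket_factory):
    """Return an opener whose plain-HTTP requests use socket_factory."""
    return urllib.request.build_opener(FactoryHandler(socket_factory))
```

Because FactoryHandler subclasses HTTPHandler, build_opener uses it in place of the default HTTP handler, so each opener carries its own factory and nothing global is patched. With SocksiPy, the factory would presumably create a socks.socksocket and point it at one proxy before returning it. This sketch covers plain HTTP only; HTTPS would need a matching handler.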

方觉久 2024-09-02 13:15:29

You have only one socket class for all openers, and SOCKS is implemented at the socket level. So, you can't.
I suggest you use the pycurl library; it is much more flexible.
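
To illustrate why replacing socket.socket affects every opener, here is a small standard-library sketch, written for Python 3 (where urllib2 became urllib.request and httplib became http.client). MarkerSocket, patched_socket_class and classes_seen_by_stdlib are hypothetical names; MarkerSocket merely stands in for a SOCKS-aware socket class such as the one SocksiPy provides.

```python
import contextlib
import http.client
import socket
import urllib.request


class MarkerSocket(socket.socket):
    """Stand-in for a SOCKS-aware socket class; adds no behaviour."""


@contextlib.contextmanager
def patched_socket_class(cls):
    """Temporarily replace socket.socket, restoring it afterwards."""
    original = socket.socket
    socket.socket = cls
    try:
        yield
    finally:
        socket.socket = original


def classes_seen_by_stdlib():
    # Both modules hold a reference to the same `socket` module object,
    # so they observe whatever socket.socket currently is.
    return http.client.socket.socket, urllib.request.socket.socket
```

Inside `with patched_socket_class(MarkerSocket):`, both values returned by classes_seen_by_stdlib() are MarkerSocket. The patch is visible process-wide, regardless of which opener issues the request.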

淡笑忘祈一世凡恋 2024-09-02 13:15:29

== EDIT == (old HTTP-Proxy example was here..)

My fault: urllib2 has no built-in support for SOCKS proxying.

There are some 'hacks' adding SOCKS to urllib2 (or the socket object in general) here.
But I doubt that this will work with multiple proxies as you require.

Unless you want to hook into or subclass urllib2.ProxyHandler, I would suggest going with pycurl.

铁轨上的流浪者 2024-09-02 13:15:29

You might be able to use threading locks if there aren't too many connections being made at once, and you need to access from multiple threads:

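import socks
import socket
import thread

lock = thread.allocate_lock()
# Route every socket created in this process through SocksiPy.
socket.socket = socks.socksocket
import urllib2

def GetConn(*args, **kwargs):
    # Hold the lock while the process-wide default proxy is set and the
    # connection is opened, so threads cannot switch proxies mid-request.
    lock.acquire()
    try:
        # "x.x.x.x" and y are placeholders for the proxy address and port.
        socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "x.x.x.x", y)
        # Arguments are passed straight through to urllib2.urlopen.
        return urllib2.urlopen(*args, **kwargs)
    finally:
        # Release the lock even if urlopen raises.
        lock.release()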

You might also be able to use something like this every time you need to get a connection:

urllib2 = execfile('urllib2.py')
urllib2.socket = dummy_class() # dummy_class needs the socket module's methods

These are obviously not fantastic solutions, but I've put in my 2¢ anyway :-)

时光无声 2024-09-02 13:15:29

A cumbersome but working solution for using a SOCKS proxy is to set up privoxy with proxy chaining and then point HTTP_PROXY at the HTTP proxy that privoxy provides, via an environment variable or any other way.
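
To apply this per opener rather than through one environment variable, one option is a separate HTTP ProxyHandler for each opener. The sketch below is for Python 3 (urllib.request is the successor of urllib2). It assumes two privoxy instances, each chained to a different SOCKS proxy; the addresses 127.0.0.1:8118 and 127.0.0.1:8119 and the helper name opener_via_http_proxy are hypothetical.

```python
import urllib.request


def opener_via_http_proxy(proxy_url):
    """Build an opener that sends plain-HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical addresses of two privoxy instances, each assumed to forward
# to a different SOCKS proxy.
PRIVOXY_A = "http://127.0.0.1:8118"
PRIVOXY_B = "http://127.0.0.1:8119"

opener_a = opener_via_http_proxy(PRIVOXY_A)
opener_b = opener_via_http_proxy(PRIVOXY_B)
```

Building the openers does not open any connection. Requests made later with opener_a.open(...) and opener_b.open(...) would go through different privoxy instances, and therefore through different SOCKS proxies, provided those instances are actually running.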

来日方长 2024-09-02 13:15:29

You could do it by setting the environment variable HTTP_PROXY in the following format:

user:pass@proxy:port

or if you use bat/cmd, add before calling script:

set HTTP_PROXY=user:pass@proxy:port

I use a cmd file like this to make easy_install work behind a proxy.
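
As a quick way to check what the environment variable does, here is a Python 3 sketch using urllib.request.getproxies(), which is how the standard library discovers proxies from the environment. The helper name proxies_seen_with and the proxy URL in the usage note are hypothetical. Note that this configures an HTTP proxy for the whole process; it is neither SOCKS nor per-opener.

```python
import os
import urllib.request


def proxies_seen_with(value):
    """Return the proxy mapping urllib reports while http_proxy is set to value."""
    saved = os.environ.get("http_proxy")
    os.environ["http_proxy"] = value
    try:
        return urllib.request.getproxies()
    finally:
        # Restore the previous environment so the change does not leak.
        if saved is None:
            del os.environ["http_proxy"]
        else:
            os.environ["http_proxy"] = saved
```

For example, proxies_seen_with("http://user:pass@proxy.example:3128") returns a mapping whose "http" entry is that URL. Default openers pick up this mapping automatically, which is why every request in the process ends up using the same proxy.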
