Opening Tor Browser from Python with the selenium package

Posted 2025-01-12 19:40:46

I am trying to scrape websites using Tor Browser.
I managed to open a page in it with this code:

import webbrowser
url = 'http://www.google.com/'
webbrowser.register('firefox', None, webbrowser.BackgroundBrowser(r"C:\Users\Lenovo\Bureau\Tor Browser\Browser\firefox.exe"))
webbrowser.get('firefox').open(url)

However, I'm more familiar with the selenium library when it comes to web scraping.
I tried the following code, but it raises a WebDriverException.

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary("C:/Users/Lenovo/Bureau/Tor Browser/Browser/firefox.exe")
driver = webdriver.Firefox(firefox_binary = binary)
url = 'https://www.google.com/'
driver.get(url)

I wonder what causes this error and how I can solve it.

Here is the full error I encountered:

WebDriverException                        Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17288/4279882525.py in <module>
      1 url = 'https://www.google.com/'
----> 2 driver.get(url)

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in get(self, url)
    434         Loads a web page in the current browser session.
    435         """
--> 436         self.execute(Command.GET, {'url': url})
    437 
    438     @property

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    422         response = self.command_executor.execute(driver_command, params)
    423         if response:
--> 424             self.error_handler.check_response(response)
    425             response['value'] = self._unwrap_value(
    426                 response.get('value', None))

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    245                 alert_text = value['alert'].get('text')
    246             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 247         raise exception_class(message, screen, stacktrace)
    248 
    249     def _value_or_default(self, obj: Mapping[_KT, _VT], key: _KT, default: _VT) -> _VT:

WebDriverException: Message: Reached error page: about:neterror?e=proxyConnectFailure&u=https%3A//www.google.com/&c=UTF-8&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server%20that%20is%20refusing%20connections.
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:181:5
UnknownError@chrome://remote/content/shared/webdriver/Errors.jsm:488:5
checkReadyState@chrome://remote/content/marionette/navigate.js:64:24
onNavigation@chrome://remote/content/marionette/navigate.js:312:39
emit@resource://gre/modules/EventEmitter.jsm:160:20
receiveMessage@chrome://remote/content/marionette/actors/MarionetteEventsParent.jsm:42:25

Comments (1)

久光 2025-01-19 19:40:46

You didn't show the error message, so I don't know what your problem is.

When I try this with Tor Browser on Linux, the browser opens without errors
(only with the warning "firefox_binary has been deprecated", which is not a problem),
but afterwards get(url) doesn't load the page and no error is shown.
Maybe Tor Browser, being a hardened browser, blocks some functions that Selenium needs to control the browser.
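
The WebDriverException message quoted in the question carries its cause inside an about:neterror URL. The sketch below only decodes that URL; the helper name decode_neterror is made up for illustration and is not part of selenium. The decoded fields say that Firefox is configured to use a proxy that refuses connections. A likely explanation (an assumption, not something the traceback proves) is that the Tor Browser firefox.exe expects a local Tor SOCKS proxy that was not running when Selenium started the browser directly.

```python
from urllib.parse import parse_qs


def decode_neterror(message: str) -> dict:
    """Return the decoded query fields of an about:neterror URL found in message.

    Hypothetical helper for illustration; returns {} if no such URL is present.
    """
    start = message.find("about:neterror?")
    if start == -1:
        return {}
    # The URL runs up to the first whitespace character.
    url = message[start:].split()[0]
    query = url.split("?", 1)[1]
    # parse_qs percent-decodes the values; keep the first value of each field.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Message copied from the traceback in the question.
message = (
    "Reached error page: about:neterror?e=proxyConnectFailure"
    "&u=https%3A//www.google.com/&c=UTF-8"
    "&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server"
    "%20that%20is%20refusing%20connections."
)

info = decode_neterror(message)
print(info["e"])  # error code reported by Firefox
print(info["d"])  # human-readable description
```

Here the e field is proxyConnectFailure, which points at the proxy configuration rather than at Selenium itself.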


But if the Tor network service is running, you can use it as a proxy server with a normal Firefox.

If the page http://127.0.0.1:9050 shows "This is a SOCKS proxy, not an HTTP proxy."
then Tor is running and you can do:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType

# send the browser's traffic through the local Tor SOCKS5 proxy
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'socksProxy': '127.0.0.1:9050',
    'socksVersion': 5,
})

options = Options()
options.proxy = proxy
#options.binary_location = '/home/furas/bin/tor'  # doesn't work
#options.binary_location = '/path/to/normal/firefox'  # works

driver = webdriver.Firefox(options=options)  # uses the standard `Firefox`

#url = 'https://www.google.com/'
url = 'https://icanhazip.com'     # it shows your IP
#url = 'https://httpbin.org/get'  # it shows your IP and headers/cookies

driver.get(url)
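
As a possible variation on the proxy setup above, the same SOCKS settings can be written as Firefox preferences, which also allows DNS lookups to be sent through the proxy (network.proxy.socks_remote_dns). This is only a sketch: the helper tor_socks_prefs is a hypothetical name, the default port 9050 is taken from the example above, and whether you need remote DNS depends on your setup.

```python
def tor_socks_prefs(host: str = "127.0.0.1", port: int = 9050) -> dict:
    """Build Firefox proxy preferences for a local SOCKS5 proxy (hypothetical helper)."""
    return {
        "network.proxy.type": 1,                 # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve host names via the proxy
    }


prefs = tor_socks_prefs()
for name, value in prefs.items():
    print(name, "=", value)

# Applying them needs selenium and Firefox installed, so it is left commented out:
# from selenium import webdriver
# from selenium.webdriver.firefox.options import Options
# options = Options()
# for name, value in prefs.items():
#     options.set_preference(name, value)
# driver = webdriver.Firefox(options=options)
```

Keeping the preferences in a plain dict makes it easy to switch the port without touching the rest of the script.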

PS. Sometimes Tor uses port 9150 instead of 9050.
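
Since the port can differ, a quick check of which one accepts connections can save some guessing. The sketch below uses only the standard library; find_open_port is a hypothetical helper name, and an open port only shows that something is listening there, not that it is Tor.

```python
import socket


def find_open_port(host="127.0.0.1", ports=(9050, 9150), timeout=1.0):
    """Return the first port in ports that accepts a TCP connection, else None."""
    for port in ports:
        try:
            # create_connection raises OSError if nothing is listening
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None


port = find_open_port()
if port is None:
    print("Nothing is listening on 9050 or 9150; is Tor running?")
else:
    print(f"Something is listening on 127.0.0.1:{port}")
```

The returned port can then be used in the 'socksProxy' value of the example above.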
