Opening Tor Browser from Python with the selenium package
I am trying to scrape websites through Tor Browser.
So far I have managed to open a page in it with this code:
import webbrowser
url = 'http://www.google.com/'
webbrowser.register('firefox', None, webbrowser.BackgroundBrowser(r"C:\Users\Lenovo\Bureau\Tor Browser\Browser\firefox.exe"))
webbrowser.get('firefox').open(url)
However, I'm more familiar with the selenium library when it comes to web scraping.
I tried the following code, but it raises a WebDriverException.
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary("C:/Users/Lenovo/Bureau/Tor Browser/Browser/firefox.exe")
driver = webdriver.Firefox(firefox_binary = binary)
url = 'https://www.google.com/'
driver.get(url)
I'd like to know what causes this error and how I can solve it.
Here is the full error I encountered:
WebDriverException Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17288/4279882525.py in <module>
1 url = 'https://www.google.com/'
----> 2 driver.get(url)
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in get(self, url)
434 Loads a web page in the current browser session.
435 """
--> 436 self.execute(Command.GET, {'url': url})
437
438 @property
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
422 response = self.command_executor.execute(driver_command, params)
423 if response:
--> 424 self.error_handler.check_response(response)
425 response['value'] = self._unwrap_value(
426 response.get('value', None))
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
245 alert_text = value['alert'].get('text')
246 raise exception_class(message, screen, stacktrace, alert_text) # type: ignore[call-arg] # mypy is not smart enough here
--> 247 raise exception_class(message, screen, stacktrace)
248
249 def _value_or_default(self, obj: Mapping[_KT, _VT], key: _KT, default: _VT) -> _VT:
WebDriverException: Message: Reached error page: about:neterror?e=proxyConnectFailure&u=https%3A//www.google.com/&c=UTF-8&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server%20that%20is%20refusing%20connections.
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:181:5
UnknownError@chrome://remote/content/shared/webdriver/Errors.jsm:488:5
checkReadyState@chrome://remote/content/marionette/navigate.js:64:24
onNavigation@chrome://remote/content/marionette/navigate.js:312:39
emit@resource://gre/modules/EventEmitter.jsm:160:20
receiveMessage@chrome://remote/content/marionette/actors/MarionetteEventsParent.jsm:42:25
Comments (1)
You didn't show the error message, so I don't know what your problem is.
When I try to use Tor Browser on Linux, it opens without errors (only with the warning "firefox_binary has been deprecated", which is not a problem), but afterwards it doesn't load the page with get(url), and it doesn't show an error either. Maybe Tor Browser, being a security-focused browser, blocks some functions that Selenium needs to control the browser.
But if you run the Tor network, you can use it as a proxy server with a normal Firefox. If the page http://127.0.0.1:9050 shows "This is a SOCKs proxy, not an HTTP proxy.", then the Tor network is running and you can configure the Firefox that Selenium starts to use it as a SOCKS proxy.
PS. Sometimes Tor may use port 9150 instead of 9050.
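The proxy approach described above could look roughly like the following. This is only a sketch, assuming Selenium 4, a regular Firefox with geckodriver available, and Tor listening on 127.0.0.1. The helper names find_tor_port, tor_proxy_prefs and make_driver are made up for illustration; the preference keys are Firefox's standard manual-proxy settings.

```python
import socket

TOR_HOST = "127.0.0.1"  # assumption: Tor runs on the same machine


def find_tor_port(ports=(9050, 9150), host=TOR_HOST, timeout=1.0):
    """Return the first port in `ports` that accepts a TCP connection, else None."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            # Nothing is listening on this port; try the next candidate.
            continue
    return None


def tor_proxy_prefs(port, host=TOR_HOST):
    """Build the Firefox preferences that route traffic through a SOCKS5 proxy."""
    return {
        "network.proxy.type": 1,                 # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve DNS through the proxy too
    }


def make_driver(port):
    """Start a normal Firefox (not Tor Browser) that uses Tor as its proxy."""
    # Imported here so the helpers above also work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in tor_proxy_prefs(port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

With Tor running, driver = make_driver(find_tor_port() or 9050) followed by driver.get("https://check.torproject.org/") should load a page that reports whether the traffic is going through Tor. If find_tor_port() returns None, nothing is listening on either port, which would match the proxyConnectFailure seen when the proxy refuses connections.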