Web scraping with Selenium
I am trying to scrape the list of DAOs from messari.io, but I am having trouble because I get the following errors:
DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
DevTools listening on ws://127.0.0.1:56691/devtools/browser/b4609671-5e6e-4d25-b09e-4116b3dde4bf
[0525/100030.252:INFO:CONSOLE(1)] "enabling sentry error tracker", source: https://messari.io/static/js/main.977a4794.chunk.js (1)
[0525/100030.951:INFO:CONSOLE(2)] "Unable to refresh token: Login required", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
[0525/100031.065:INFO:CONSOLE(2)] "
88b d88 88
888b d888 ""
88'8b d8'88
88 '8b d8' 88 ,adPPYba, ,adPPYba, ,adPPYba, ,adPPYYba, 8b,dPPYba, 88
88 '8b d8' 88 a8P_____88 I8[ "" I8[ "" "" 'Y8 88P' "Y8 88
88 '8b d8' 88 8PP""""""" '"Y8ba, '"Y8ba, ,adPPPPP88 88 88
88 '888' 88 "8b, ,aa aa ]8I aa ]8I 88, ,88 88 88
88 '8' 88 '"Ybbd8"' '"YbbdP"' '"YbbdP"' '"8bbdP"Y8 88 88
", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
[0525/100031.069:INFO:CONSOLE(2)] "Interested in a CHALLENGE? Check out: https://messari.io/quiz", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
Traceback (most recent call last):
File "c:/Users/Student/webScrape/scraper.py", line 21, in <module>
matches = WebDriverWait(driver, 10).until(
File "C:\Users\Student\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\support\wait.py", line 89, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
Backtrace:
Ordinal0 [0x0096B8F3+2406643]
Ordinal0 [0x008FAF31+1945393]
Ordinal0 [0x007EC748+837448]
Ordinal0 [0x008192E0+1020640]
Ordinal0 [0x0081957B+1021307]
Ordinal0 [0x00846372+1205106]
Ordinal0 [0x008342C4+1131204]
Ordinal0 [0x00844682+1197698]
Ordinal0 [0x00834096+1130646]
Ordinal0 [0x0080E636+976438]
Ordinal0 [0x0080F546+980294]
GetHandleVerifier [0x00BD9612+2498066]
GetHandleVerifier [0x00BCC920+2445600]
GetHandleVerifier [0x00A04F2A+579370]
GetHandleVerifier [0x00A03D36+574774]
Ordinal0 [0x00901C0B+1973259]
Ordinal0 [0x00906688+1992328]
Ordinal0 [0x00906775+1992565]
Ordinal0 [0x0090F8D1+2029777]
BaseThreadInitThunk [0x777BFA29+25]
RtlGetAppContainerNamedObjectPath [0x77B77A7E+286]
RtlGetAppContainerNamedObjectPath [0x77B77A4E+238]
I know there is an API for messari.io, but I am almost certain it is only for their assets and not their list of DAOs. I tried using Selenium since it is a dynamic page but I am still having trouble. Here is my code:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import requests
url = 'https://messari.io/governor/daos'
DRIVER_PATH = 'PATH_TO_DRIVER_ON_MY_PC'
options = Options()
options.headless = True
options.add_argument("--window-size=1920, 1200")
# s = Service('PATH_TO_DRIVER_ON_MY_PC')
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get('https://messari.io/governor/daos')
try:
    matches = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "td")))
    # for match in matches:
    #     print(match.text)
finally:
    driver.quit()
Update: I fixed the executable_path warning, but I am still getting the same TimeoutException error. When I run it without headless mode, I also get the following message:
DevTools listening on ws://127.0.0.1:57773/devtools/browser/4450b78d-3a9f-401a-b39c-2c716ecad924
[9628:20616:0525/102300.840:ERROR:device_event_log_impl.cc(214)] [10:23:00.840] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[9628:20616:0525/102300.841:ERROR:device_event_log_impl.cc(214)] [10:23:00.841] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
I assume this part is more of a hardware message that I shouldn't worry about, based on similar questions, because when I unplugged my mouse one of the two messages went away.
Comments (2)
This page doesn't use <td> to display the list of DAOs. It uses <div> (with CSS) to lay it out like a table, and it keeps the name of each DAO in <h4>. At least that is what it uses in my Firefox on a laptop with Linux.
Full working code (tested on Linux Mint, Python 3.8, Selenium 4.x, Chrome 101.x)
I used the webdriver_manager module so that it automatically downloads a fresh driver whenever Linux installs a newer version of Chrome.
I have to use find_elements() (with an s in the word elements) or presence_of_all_elements_located() to get all <h4> elements.
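A minimal sketch of the approach described above, not the answer's original listing. It assumes a Selenium 4 driver that has already been created and that the DAO names are still rendered in <h4> elements; the function names texts_of and dao_names are made up for illustration.

```python
def texts_of(elements):
    """Return the stripped, non-empty text of each element."""
    stripped = (element.text.strip() for element in elements)
    return [text for text in stripped if text]


def dao_names(driver, url="https://messari.io/governor/daos", timeout=10):
    """Load the page and return the text of every <h4> found on it."""
    # Imported inside the function so texts_of() works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # The plural condition returns a list of elements; the singular
    # presence_of_element_located() would return only the first match.
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.TAG_NAME, "h4"))
    )
    return texts_of(elements)
```

Calling dao_names(driver) replaces the wait on "td" from the question. If the page uses <h4> for other headings as well, the returned list may need further filtering.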
EDIT:
To get all values you may have to scroll the page, because JavaScript adds new items as you scroll.
There are answers that use a while loop with execute_script() to run JavaScript that scrolls to the bottom and reads the current height. If the height differs from the height before the scroll, you have to scroll again; if it is the same, you have reached the end of the page and can now get all items.
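The scroll loop described above could be sketched as follows. It assumes that the whole document scrolls (rather than an inner container) and that a short pause is enough for new items to load; the name scroll_to_bottom and the pause and max_rounds parameters are illustrative.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:  # upper bound guards against endless feeds
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give JavaScript time to append new items
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # height unchanged, so nothing more was loaded
        last_height = new_height
    return rounds
```

Call scroll_to_bottom(driver) after driver.get(...) and before collecting the <h4> elements, so that all items are present when they are read.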
With Selenium 4, the executable_path keyword argument is deprecated. You have to pass an instance of the Service() class instead, for example together with the ChromeDriverManager().install() command.
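A minimal sketch of that pattern, assuming Selenium 4 and the webdriver_manager package are installed. The helper names chrome_arguments and make_driver are hypothetical, and driver_path stands in for a locally downloaded chromedriver such as the question's DRIVER_PATH.

```python
def chrome_arguments(headless=True, width=1920, height=1200):
    """Build Chrome command-line arguments (no space inside --window-size)."""
    args = [f"--window-size={width},{height}"]
    if headless:
        args.append("--headless")
    return args


def make_driver(driver_path=None, headless=True):
    """Create a Chrome driver using a Service object instead of executable_path."""
    # Imported inside the function so chrome_arguments() works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)

    if driver_path is None:
        # Download (or reuse a cached) driver matching the installed Chrome.
        from webdriver_manager.chrome import ChromeDriverManager
        driver_path = ChromeDriverManager().install()

    return webdriver.Chrome(service=Service(driver_path), options=options)
```

make_driver() uses webdriver_manager, while make_driver("PATH_TO_DRIVER_ON_MY_PC") keeps a manually managed driver. Either way the deprecation warning goes away, but this alone does not fix the TimeoutException, which comes from waiting for a <td> that the page never renders.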