How to scrape all the company names within the Stockrover website using Selenium and Python
If the code below scrapes the first company name, IBM, from a table, how would I change it to scrape all the company names from the first column of the table?
Pertinent Code:
table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#gridview-1070-record-2989')))
For instance, the next one I need is #gridview-1070-record-2990 and so on.
Current Result:
IBM
Desired Results:
IBM
Microsoft Corporation
Apple Corporation
Google
Tesla
etc.
Full Code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
import pandas as pd
# Configure Chrome to start maximized and to hide the usual automation flags.
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
ser = Service("./chromedriver.exe")
# Pass the options to the driver; otherwise the settings above are never applied.
driver = webdriver.Chrome(service=ser, options=options)
# Make navigator.webdriver return undefined on every new document.
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
wait = WebDriverWait(driver, 30)
driver.get("https://stockrover.com")
# Open the login form.
wait.until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[1]/div/section[2]/div/ul/li[2]"))).click()
user = driver.find_element(By.NAME, "username")
password = driver.find_element(By.NAME, "password")
user.clear()
user.send_keys("vibajajo64")
password.clear()
password.send_keys("vincer64")
driver.find_element(By.NAME, "Sign In").click()
# This selector matches only the single record 2989, so only IBM is printed.
table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '#gridview-1070-record-2989')))
for tab in table:
    print(tab.text)
Comments (2)
To extract and print the texts (e.g. IBM, Microsoft Corporation, etc.) from all of the <table> elements within the stockrover website, you need to induce WebDriverWait for visibility_of_all_elements_located() (see https://stackoverflow.com/a/64770041/7429447) instead of presence_of_all_elements_located(). You can use either of the following locator strategies:
Using CSS_SELECTOR:
Using XPATH:
Note: You have to add the following imports:
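The snippets that belonged under the headings above are not shown here, so the following is only a sketch of what such a wait might look like, not the answerer's original code. It assumes every row shares the id prefix `gridview-1070-record-` seen in the question (auto-generated grid ids like this may change between sessions, so the prefix may need adjusting). `company_names`, `ROW_CSS` and `ROW_XPATH` are hypothetical names. The Selenium imports are the ones the note refers to; they sit inside the function only so that the locator constants can be inspected without Selenium installed.

```python
# Assumption: every row id starts with the prefix seen in the question.
ROW_ID_PREFIX = "gridview-1070-record-"

# CSS attribute selector: any element whose id starts with the prefix.
ROW_CSS = f"[id^='{ROW_ID_PREFIX}']"
# Equivalent XPath using starts-with() on the id attribute.
ROW_XPATH = f"//*[starts-with(@id, '{ROW_ID_PREFIX}')]"


def company_names(driver, use_xpath=False, timeout=20):
    """Wait until all matching rows are visible and return their texts."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, ROW_XPATH) if use_xpath else (By.CSS_SELECTOR, ROW_CSS)
    rows = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located(locator)
    )
    return [row.text for row in rows]
```

After logging in, `for name in company_names(driver): print(name)` would print one entry per matched row. If a row's text contains more than the company name, the locator would need narrowing to the first cell, which depends on markup not shown in the question.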
You can put the pertinent code in a for loop and format the selector string according to an index, like so:
This will give you an array of the companies.
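The loop itself is missing from this answer. One way to read it, assuming the record numbers are consecutive and start at 2989 as the question suggests, is sketched below. `record_selectors` and `scrape_companies` are hypothetical names, and the choice to stop at the first record that never appears is an assumption rather than part of the answer.

```python
def record_selectors(first_record, count, grid_id="gridview-1070"):
    """Build one CSS id selector per consecutive record number."""
    return [
        f"#{grid_id}-record-{number}"
        for number in range(first_record, first_record + count)
    ]


def scrape_companies(driver, first_record=2989, count=50, timeout=10):
    """Collect the text of each record in turn, stopping at the first miss."""
    # Imported here so record_selectors() works without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    companies = []
    for selector in record_selectors(first_record, count):
        try:
            row = wait.until(
                EC.presence_of_element_located((By.CSS_SELECTOR, selector))
            )
        except TimeoutException:
            # Assumption: a missing record means the end of the table.
            break
        companies.append(row.text)
    return companies
```

Calling `scrape_companies(driver)` after login would return a list of row texts. Because each missing record costs one full timeout, a short `timeout` keeps the final failed lookup from stalling the script.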