Scraping problem: I only get "Loading..."

Posted on 2025-02-13 12:11:30

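I tried the following code, but I only get "Loading..." instead of the page content.

My code is:

    from pyquery import PyQuery as pq
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 20)  # created but never used below
    driver.get("https://www.college.upenn.edu/majors-list")
    #print(driver.title)
    td5 = pq(driver.page_source)  # parses the top-level page right away, without waiting

The output looks like this: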

Penn List of College Majors\nLoading...  List of College Majors\nLoading... List of College Majors\nLoading... 

I need to get the list of College majors. Please help me.

I have already tried PyQuery and Selenium, but both failed.

The information that I want!


Comments (1)

╰沐子 2025-02-20 12:11:30
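Switch into the iframe first, then wait for the headers:

    # Assumes `driver` is an existing webdriver.Chrome() instance and the imports listed under "Imports" below.
    wait = WebDriverWait(driver, 30)
    driver.get("https://www.college.upenn.edu/majors-list")
    # The list is rendered inside an iframe, so wait for it and switch into it.
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@title='Fission Embed']")))
    # Wait until the .major_header elements are present, then collect their text.
    elems = [x.text for x in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".major_header")))]
    print(elems)

  1. Your elements are inside an iframe, so switch to it first.
  2. The list contains 63 entries. Wait for all elements with the class major_header and grab their text. There is an extra span in there, but it has no text.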
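
The two steps above can be put together in one script. This is only a minimal sketch, not the answerer's exact code: the names `fetch_majors` and `clean_texts` are made up for illustration, and it assumes that Chrome and a matching driver are available locally and that the page still embeds the list in an iframe titled 'Fission Embed'.

```python
def clean_texts(texts):
    """Strip whitespace and drop empty strings (e.g. from text-less spans)."""
    return [t.strip() for t in texts if t and t.strip()]


def fetch_majors(url="https://www.college.upenn.edu/majors-list", timeout=30):
    """Open the page, switch into the iframe and return the header texts."""
    # Selenium is imported lazily so clean_texts() works without it installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    try:
        wait = WebDriverWait(driver, timeout)
        driver.get(url)
        # Step 1: the list lives inside an iframe, so wait for it and switch into it.
        wait.until(EC.frame_to_be_available_and_switch_to_it(
            (By.XPATH, "//iframe[@title='Fission Embed']")))
        # Step 2: wait for the headers to be present, then read their text.
        headers = wait.until(EC.presence_of_all_elements_located(
            (By.CSS_SELECTOR, ".major_header")))
        return clean_texts(h.text for h in headers)
    finally:
        # Always close the browser, even if a wait times out.
        driver.quit()
```

Calling `print(fetch_majors())` should print a list like the one under "Output" below, provided the page structure has not changed. The `try`/`finally` is a design choice so that a timeout does not leave a browser window open.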

Output:

['Africana Studies', 'Ancient History', 'Anthropology', 'Architecture', 'Asian American Studies (minor)', 'Biochemistry', 'Biology', 'Biophysics', 'Chemistry', 'Cinema and Media Studies', 'Classical Studies', 'Cognitive Science', 'Communication', 'Comparative Literature', 'Criminology', 'Design', 'Digital Humanities', 'Earth Science', 'East Asian Languages and Civilizations', 'Economics', 'Engineering Major', 'English', 'Environmental Studies', 'Fine Arts', 'French and Francophone Studies', "Gender, Sexuality and Women's Studies", 'German', 'Health and Societies', 'Hispanic Studies', 'History', 'History of Art', 'Huntsman Program in International Studies and Business', 'Individualized Major', 'International Relations', 'Italian Studies', 'Jewish Studies', 'Latin American and Latinx Studies', 'Linguistics', 'Logic, Information and Computation', 'Mathematical Economics', 'Mathematics', 'Modern Middle Eastern Studies', 'Music', 'Near Eastern Languages and Civilizations', 'Neuroscience', 'Nutrition Science', 'Philosophy', 'Philosophy, Politics and Economics', 'Physics and Astronomy', 'Political Science', 'Psychology', 'Religious Studies', 'Romance Languages Dual Major', 'Russian and East European Studies', 'Science, Technology and Society', 'Sociology', 'South Asia Studies', 'Theatre Arts', 'Urban Studies', 'Vagelos Integrated Program in Energy Research', 'Vagelos Program in Life Sciences and Management', 'Vagelos Scholars Program in Molecular Life Sciences', 'Visual Studies']

Imports:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
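
The question parsed `driver.page_source` offline with PyQuery. As far as I know, once Selenium has switched into the iframe, `driver.page_source` returns the frame's document rather than the outer page, so the same idea can work after the switch and the wait. The sketch below uses the standard library's `html.parser` instead of PyQuery so that it runs without extra packages. `texts_by_class` and the sample HTML are hypothetical, and the real markup inside the frame may differ.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _ClassTextCollector(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while inside a matching element
        self.parts = []     # text fragments of the current match
        self.results = []   # non-empty texts found so far

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1           # entering a new match
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:          # leaving the matching element
            text = "".join(self.parts).strip()
            if text:
                self.results.append(text)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def texts_by_class(html, class_name):
    """Return the stripped, non-empty texts of elements with class_name."""
    parser = _ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical stand-in for the iframe's page source.
sample = ('<div><h3 class="major_header">Biology<span></span></h3>'
          '<h3 class="major_header">Chemistry<br></h3>'
          '<p class="note">Loading...</p></div>')
print(texts_by_class(sample, "major_header"))  # ['Biology', 'Chemistry']
```

With a live driver, the call would be `texts_by_class(driver.page_source, "major_header")`, made after switching into the iframe and waiting for the headers. With PyQuery, something like `[e.text() for e in pq(driver.page_source)(".major_header").items()]` should be roughly equivalent.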