How can I scrape Google's "People also ask" questions and answers with Selenium and Python, and get more of them than Google shows by default?

Posted 2025-01-28 23:07:28

I found a good solution, but it only returns as many questions and answers as Google shows by default, and I need more.

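I'm new to Python development. How can I get more questions and answers? Do I first have to click to expand as many questions as I need and then parse them?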


Answer by 掩于岁月, 2025-02-04 23:07:28

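The following code parses the questions currently shown on the page and then asks whether you want to parse more. If you enter y, it clicks the last question, which makes Google load more questions into the page. The questions are stored in the questions list and the answers in the answers list.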

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

your_path = '...'  # path to your chromedriver executable
driver = webdriver.Chrome(service=Service(your_path))

driver.get('https://www.google.com/search?q=How%20to%20make%20bakery%3F&source=hp&ei=j0aZYYjRAvja2roPrcWcyAU&iflsig=ALs-wAMAAAAAYZlUn4NMUPjfIpQmrXSmjIDnaWjJXWIJ&ved=0ahUKEwjI1JDn0Kf0AhV4rVYBHa0iB1kQ4dUDCAc&uact=5&oq=How%20to%20make%20bakery%3F&gs_lcp=Cgdnd3Mtd2l6EAMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBMyBAgAEBNQAFgAYJMDaABwAHgAgAF-iAF-kgEDMC4xmAEAoAECoAEB&sclient=gws-wiz')

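questions, answers = [], []
while True:
    # all "People also ask" entries currently in the page
    blocks = driver.find_elements(By.CSS_SELECTOR, "div[id*='RELATED_QUESTION']")
    for question in blocks[len(questions):]:  # skip already parsed questions
        questions.append(question.text)
        txt = ''
        for answer in question.find_elements(By.CSS_SELECTOR, "div[id*='WEB_ANSWERS_RESULT']"):
            txt += answer.get_attribute('innerText')
        answers.append(txt)
    if not blocks:
        break  # nothing found, so there is no question to click
    inp = input(f'{len(questions)} questions parsed, continue? (y/n) ')
    if inp == 'y':
        # expanding the last question makes Google load more questions below it
        blocks[-1].click()
        time.sleep(2)  # fixed wait for the new questions to load
    else:
        break

driver.quit()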
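The same loop can also run unattended: instead of asking y/n, keep clicking the last question until a target number has been parsed, and instead of the fixed time.sleep(2), poll until the number of question elements grows. This is a minimal sketch, assuming (like the answer) that Google adds new questions after the last one clicked; expand_until and its parameters are hypothetical names introduced here, not a Selenium API, and the element lookup is passed in as a callable so the loop itself does not import Selenium.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def expand_until(find_blocks: Callable[[], Sequence], parse: Callable[[object], T],
                 target: int, timeout: float = 10.0, poll: float = 0.5) -> List[T]:
    """Click the last question block until `target` blocks have been parsed.

    find_blocks: returns the question elements currently in the page; each
        one needs a .click() method, as Selenium WebElements have.
    parse: turns one element into a result (e.g. a (question, answer) pair).
    Stops early when no element is found, or when a click loads nothing new
    within `timeout` seconds.
    """
    results: List[T] = []
    while True:
        blocks = find_blocks()
        # Assumes new questions are appended after the existing ones,
        # so only the tail beyond what was already parsed is new.
        for block in blocks[len(results):]:
            results.append(parse(block))
        if len(results) >= target or not blocks:
            return results[:target]
        before = len(blocks)
        blocks[-1].click()  # expanding the last entry loads more below it
        deadline = time.monotonic() + timeout
        # Poll until the element count grows instead of sleeping a fixed time.
        while len(find_blocks()) <= before:
            if time.monotonic() >= deadline:
                return results  # the click loaded nothing new
            time.sleep(poll)


# Hypothetical usage with the driver and selectors from the answer above
# (requires Selenium and a loaded results page):
#
# from selenium.webdriver.common.by import By
#
# def parse_block(block):
#     answer = ''.join(a.get_attribute('innerText') for a in block.find_elements(
#         By.CSS_SELECTOR, "div[id*='WEB_ANSWERS_RESULT']"))
#     return block.text, answer
#
# pairs = expand_until(
#     lambda: driver.find_elements(By.CSS_SELECTOR, "div[id*='RELATED_QUESTION']"),
#     parse_block, target=20)
```

Selenium's own WebDriverWait (in selenium.webdriver.support.ui) could replace the hand-written polling loop; it is written out here only to keep the sketch free of extra dependencies.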
