How to get all comments in 9GAG using Selenium?

Posted 2025-02-05 04:09:13 · 8028 characters · 1 view · 0 comments

I'm working on scraping memes and all their comments from 9GAG.
I used the code below, but I only get a few extra comments.

actions = ActionChains(driver)
link = driver.find_element(By.XPATH, "//button[@class='comment-list__load-more']")
actions.move_to_element(link).click(on_element=link).perform()

I would also like to access the subcomments under each comment by simulating a click on "view more replies".

From the HTML I found that the element located by driver.find_element(By.XPATH, "//div[@class='vue-recycle-scroller ready page-mode direction-vertical']") holds the comments section, but I'm not sure how to iterate through each comment in this element and simulate these clicks.

This code should work as-is if you want to test it, provided the necessary libraries are installed.

Please help me with the following tasks:

  1. Getting all the comments after clicking "view all comments"
  2. Iterating through each comment and clicking "view more replies" to get all the subcomments

My Code

import time
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import undetected_chromedriver as uc

if __name__ == '__main__':

    options = Options()
    # options.headless = True
    options.add_argument("start-maximized")  # ensure window is full-screen
    driver = uc.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get("https://9gag.com/gag/a5EAv9O")
    prev_h = 0
    for i in range(10):
        height = driver.execute_script("""
                   function getActualHeight() {
                       return Math.max(
                           Math.max(document.body.scrollHeight, document.documentElement.scrollHeight),
                           Math.max(document.body.offsetHeight, document.documentElement.offsetHeight),
                           Math.max(document.body.clientHeight, document.documentElement.clientHeight)
                       );
                   }
                   return getActualHeight();
               """)
        # scrollTo takes (x, y), so keep x at 0 and only move vertically
        driver.execute_script(f"window.scrollTo(0, {prev_h + 200})")
        time.sleep(1)
        prev_h += 200
        if prev_h >= height:
            break
    time.sleep(5)
    title = driver.title[:-7]
    try:
        upvotes_count = \
        driver.find_element(By.XPATH, "//meta[@property='og:description']").get_attribute("content").split(' ')[0]
        comments_count = \
        driver.find_element(By.XPATH, "//meta[@property='og:description']").get_attribute("content").split(' ')[3]
        upvotes_count = int(upvotes_count) if len(upvotes_count) <= 3 else int("".join(upvotes_count.split(',')))
        comments_count = int(comments_count) if len(comments_count) <= 3 else int("".join(comments_count.split(',')))
        date_posted = driver.find_element(By.XPATH, "//p[@class='message']")
        date_posted = date_posted.text.split("·")[1].strip()
        # actions = ActionChains(driver)
        # link = driver.find_element(By.XPATH, "//button[@class='comment-list__load-more']")
        # actions.move_to_element(link).click(on_element=link).perform()
        element = driver.find_element(By.XPATH,
                                      "//div[@class='vue-recycle-scroller ready page-mode direction-vertical']")
        print(element.text)
    except Exception as err:  # also covers NoSuchElementException
        print(err)
    finally:
        driver.quit()  # close the browser even if a lookup failed

Output

Edit:

I managed to make the code work better. It scrolls through the page until it has seen all the comments, and it clicks "view more replies" where there are subcomments.

But it can only read the comments from the middle to the end. Maybe the initial comments are hidden dynamically as the page scrolls down; I don't know how to overcome this. Also, clicking "view more replies" stops working after a few clicks and throws this error:

selenium.common.exceptions.MoveTargetOutOfBoundsException: Message: move target out of bounds

Here's the updated code

from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import time
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException, ElementClickInterceptedException
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import undetected_chromedriver as uc

def scroll_page(scrl_hgt):
    prev_h = 0
    for i in range(10):
        height = driver.execute_script("""
                       function getActualHeight() {
                           return Math.max(
                               Math.max(document.body.scrollHeight, document.documentElement.scrollHeight),
                               Math.max(document.body.offsetHeight, document.documentElement.offsetHeight),
                               Math.max(document.body.clientHeight, document.documentElement.clientHeight)
                           );
                       }
                       return getActualHeight();
                   """)
        # scrollTo takes (x, y), so keep x at 0 and only move vertically
        driver.execute_script(f"window.scrollTo(0, {prev_h + scrl_hgt})")
        time.sleep(1)
        prev_h += scrl_hgt
        if prev_h >= height:
            break

if __name__ == '__main__':
    options = Options()
    # options.headless = True
    driver = uc.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.maximize_window()
    driver.get("https://9gag.com/gag/a5EAv9O")
    time.sleep(5)

    # click on I accept cookies
    actions = ActionChains(driver)
    consent_button = driver.find_element(By.XPATH, '//*[@id="qc-cmp2-ui"]/div[2]/div/button[2]')
    actions.move_to_element(consent_button).click().perform()

    scroll_page(150)
    time.sleep(2)

    # click on the "fresh" comments section
    fresh_comments = driver.find_element(By.XPATH, '//*[@id="page"]/div[1]/section[2]/section/header/div/button[2]')
    actions.move_to_element(fresh_comments).click(on_element=fresh_comments).perform()

    time.sleep(5)

    # getting meta data
    title = driver.title[:-7]
    upvotes_count = driver.find_element(By.XPATH, "//meta[@property='og:description']").get_attribute("content").split(' ')[0]
    comments_count = driver.find_element(By.XPATH, "//meta[@property='og:description']").get_attribute("content").split(' ')[3]
    upvotes_count = int(upvotes_count) if len(upvotes_count) <= 3 else int("".join(upvotes_count.split(',')))
    comments_count = int(comments_count) if len(comments_count) <= 3 else int("".join(comments_count.split(',')))
    date_posted = driver.find_element(By.XPATH, "//p[@class='message']")
    date_posted = date_posted.text.split("·")[1].strip()

    time.sleep(3)

    # click on the "load more comments" button to load all the comments
    load_more_comments = driver.find_element(By.XPATH, "//button[@class='comment-list__load-more']")
    actions.move_to_element(load_more_comments).click(on_element=load_more_comments).perform()

    scroll_page(500)

    print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.comment-list-item__text")])

    comments = driver.find_elements(By.CSS_SELECTOR, "div.vue-recycle-scroller__item-view")
    for item in comments:
        html = item.get_attribute("innerHTML")
        if "comment-list-item__text" in html:
            print(item.find_element(By.CSS_SELECTOR, "div.comment-list-item__text").text)
        elif "comment-list-item__deleted-text" in html:
            print(item.find_element(By.CSS_SELECTOR, "div.comment-list-item__deleted-text").text)

        # get sub comments
        if "comment-list-item__replies" in html:
            #item.find_element(By.CSS_SELECTOR, "div.comment-list-item__replies").click()
            sub_comments = item.find_element(By.CSS_SELECTOR, "div.comment-list-item__replies")
            actions.move_to_element(sub_comments).click(on_element=sub_comments).perform()
        time.sleep(2)
    driver.quit()
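
The MoveTargetOutOfBoundsException mentioned above is typically raised when ActionChains.move_to_element targets an element that lies outside the current viewport. A minimal sketch of one possible workaround, assuming the element is still attached to the DOM: scroll it into view with JavaScript and click it through JavaScript as well, bypassing ActionChains. The helper name safe_click is hypothetical; it relies only on driver.execute_script.

```python
def safe_click(driver, element):
    """Scroll `element` to the middle of the viewport, then click it via JavaScript."""
    # Bring the element into view so it is rendered and inside the viewport.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    # A JavaScript click does not depend on pointer coordinates.
    driver.execute_script("arguments[0].click();", element)
```

In the loop above, safe_click(driver, sub_comments) could stand in for the actions.move_to_element(...).click(...).perform() call. Whether this is enough on 9GAG is not verified here.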


PS: My goal is to get every single comment and all of its subcomments (whether they are text, images, GIFs, etc.) in the order they appear and save them somewhere, so that I can recreate the comments section later.
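
If the comment list is a recycling scroller that only keeps the currently visible rows in the DOM (the class name vue-recycle-scroller suggests this, but it is an assumption), reading everything once at the end will miss the earlier comments. A rough sketch of an alternative: scroll in small steps and record each comment text the first time it shows up. The function name collect_comments and its parameters are hypothetical, and de-duplicating by text alone would merge two identical comments, so treat this as a starting point rather than a finished scraper.

```python
import time

CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def collect_comments(driver, step=400, pause=1.0, max_steps=200):
    """Scroll in small steps; record each comment text the first time it is seen."""
    seen, ordered = set(), []

    def harvest():
        visible = driver.find_elements(CSS, "div.comment-list-item__text")
        # Recycled nodes may not be in visual order, so sort by page position.
        for el in sorted(visible, key=lambda e: e.location["y"]):
            text = el.text
            if text and text not in seen:
                seen.add(text)
                ordered.append(text)

    for _ in range(max_steps):
        harvest()
        # Scroll down by `step` pixels and report whether the bottom was reached.
        at_bottom = driver.execute_script(
            "window.scrollBy(0, arguments[0]);"
            "return window.innerHeight + window.scrollY"
            " >= document.documentElement.scrollHeight - 1;",
            step,
        )
        time.sleep(pause)
        if at_bottom:
            harvest()  # pick up whatever rendered after the last scroll
            break
    return ordered
```

Rows can be recycled between find_elements and the .text read, which may raise StaleElementReferenceException on a live page; a real run would probably need to catch that and retry. Image and GIF comments have no text and would need a separate selector.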


Comments (1)

靑春怀旧 2025-02-12 04:09:13

To extract and print the comment texts, use WebDriverWait with element_to_be_clickable() to wait for the "load more" button, click it, and then collect the comment elements. You can use the following locator strategy:

driver.get("https://9gag.com/gag/a5EAv9O")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.comment-list__load-more"))).click()
print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.comment-list-item__text")])

Console Output:

['Man, the battle of the cults is getting interesting now.', 'rent free in your head', 'Sorry saving all my money up for the Joe Biden Depends Multipack and the Karmella knee pads.', "It's basically a cult now.", "I'll take one. I'm not even American", '', 'that eagle looks familiar.', "Who doesn't want a trump card?"]

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
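
The snippet above clicks the load-more button once. If the button reappears after each batch of comments (an assumption about the page, not something confirmed here), the click can be repeated until the button is gone. A minimal sketch with a hypothetical helper load_all_comments; it uses find_elements, which returns an empty list instead of raising when nothing matches.

```python
import time

CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def load_all_comments(driver, pause=1.0, max_clicks=100):
    """Click the load-more button until it is no longer found; return the click count."""
    clicks = 0
    while clicks < max_clicks:
        buttons = driver.find_elements(CSS, "button.comment-list__load-more")
        if not buttons:
            break  # nothing left to load
        driver.execute_script("arguments[0].click();", buttons[0])
        clicks += 1
        time.sleep(pause)  # crude wait for the next batch to render
    return clicks
```

The max_clicks cap is only a guard against looping forever if the button never disappears.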