Selenium "move_to_element" action runs in a loop

Posted 2025-01-31 11:24:08


I'm trying to web-scrape the like counts on Instagram posts.

I've tried going through each post, but after a certain number of posts Instagram stops responding to the requests.

So now I'm trying to scrape the like count without opening each post, i.e. from the like and comment counts that appear when you hover the mouse over a post thumbnail.
(Reference image of a thumbnail.)

For this I use the move_to_element function as follows:

1 - Search the list of posts.
2 - Use move_to_element to hover over those posts.
3 - Scrape the data.
4 - Scroll.
Repeat steps 1-4.

My program goes through the first group without any problem and scrolls.
But then it starts going through the posts from the 1st one again, not from where it stopped (or from the first element loaded after scrolling).

Code (Simplified):

newPost = True
while newPost:
    try:
        action = webdriver.ActionChains(driver)
        newPost = False
        var = WebDriverWait ...
        container = driver.find_elements ..  # get the list of posts
        for post in container:
            action.move_to_element(post).perform()
            link = post.find_element(By.TAG_NAME, 'a').get_attribute("href")
            likes = ...

            if link not in postData:  # check whether this is a new post
                postData[link] = likes
                newPost = True

        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        sleep(5)
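The restart behaviour can be reproduced without a browser. The sketch below is a pure-Python stand-in (a plain list plays the role of the lazily loaded page, and `hovers` counts what would be `move_to_element` calls; none of these names are Selenium API): every pass of the outer loop fetches the full, growing list and iterates it from index 0, so old thumbnails are re-hovered on every round.

```python
# Simulate the question's loop: snapshot the whole list each round,
# iterate from the start, then "scroll" to load 5 more posts.
posts = [f"post-{i}" for i in range(10)]   # what the page has loaded so far
post_data = {}
hovers = 0

for round_ in range(3):
    container = list(posts)                # snapshot, like driver.find_elements
    for post in container:                 # always starts again at container[0]
        hovers += 1                        # stands in for move_to_element(...).perform()
        if post not in post_data:          # dedupe check from the original code
            post_data[post] = True
    posts.extend(f"post-{i}" for i in range(len(posts), len(posts) + 5))  # "scroll"

print(len(post_data), hovers)
```

After three rounds only 20 distinct posts have been seen, yet 45 hovers were performed: the per-round work grows with every scroll, which is why progress appears to restart at the first post.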


1 answer

笑饮青盏花 2025-02-07 11:24:08


The problem is that you are looping over a variable (container) that is not updated as more posts are loaded. In particular,

container = driver.find_elements ...

is run before the loop, so it is not updated as you scroll down and new posts are loaded into the webpage. Say that when you open the webpage for the first time, 10 posts are loaded; then container will be a list of 10 WebElements. When Python scrolls down and more posts are loaded, they are not added to container. To solve this problem, use logic like this:

number_of_posts_to_scrape = 50
for idx in range(number_of_posts_to_scrape):
    container = driver.find_elements ..  # re-fetch the list of posts on every iteration
    post = container[idx]
    action.move_to_element(post).perform()
    link = post.find_element(By.TAG_NAME, 'a').get_attribute("href")
    likes = ...
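The index-based logic above can also be sketched without Selenium. Everything here is illustrative (`FakeDriver`, its `scroll` method, and the 3-posts-per-scroll rate are assumptions, not Selenium API); it only demonstrates the point of the fix: re-fetch the list on every index so newly loaded posts are picked up where the previous pass left off.

```python
class FakeDriver:
    """Simulates a page that lazy-loads 3 more posts each time we scroll."""
    def __init__(self):
        self._posts = [f"post-{i}" for i in range(10)]  # 10 posts loaded initially

    def find_elements(self):
        return list(self._posts)  # a snapshot, like Selenium's find_elements

    def scroll(self):
        start = len(self._posts)
        self._posts.extend(f"post-{i}" for i in range(start, start + 3))

driver = FakeDriver()
number_of_posts_to_scrape = 20
scraped = []
for idx in range(number_of_posts_to_scrape):
    container = driver.find_elements()   # re-fetch the list on every iteration
    if idx >= len(container):            # not enough posts loaded yet: scroll
        driver.scroll()
        container = driver.find_elements()
    scraped.append(container[idx])       # always a fresh element at position idx

print(scraped[-1])
```

Because the list is refreshed each iteration, each post is visited exactly once, and scrolling only happens when the index runs past what is currently loaded.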
    