Extracting tweets from Twitter using Selenium

Posted on 2025-02-06 20:10:12, 1596 characters, 2 views, 0 comments

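Hello all, I have a problem extracting tweets from Twitter. I wrote a script that opens one of the trending pages on Twitter and scrolls down N times, extracting tweets as it scrolls. This works fine at first, but after a certain number of scrolls the page stops loading new tweets, the scrolling stops, and no new tweets appear.

When I set N = 1000, for example, it works fine until it reaches about 400 to 600 scrolls; then the scrolling stops and no more tweets appear. I would be very happy if anyone could help. Thanks a lot.

My code is: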

import time

import tqdm
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

SCROLL_PAUSE_TIME = 0.3  # seconds to wait after each scroll step


def scrap_tweets_without(url, no_scroll):
    # Selenium 4 takes the chromedriver path through a Service object
    drive = webdriver.Chrome(service=Service(r'C:\selinum\chromedriver.exe'))
    drive.get(url)
    texts = []

    # Give the first batch of tweets time to load
    time.sleep(3)

    # Start scrolling through the tweets
    for i in tqdm.tqdm(range(no_scroll)):
        # Scroll down by 200 pixels
        drive.execute_script("window.scrollBy(0,200)", "")

        # Wait for the page to load
        time.sleep(SCROLL_PAUSE_TIME)
        try:
            # Get the Arabic tweets currently in the DOM
            tweets = drive.find_elements(By.XPATH, '//div[@data-testid="tweetText" and @lang="ar"]')

            # Add each tweet text that has not been seen yet
            for tx in tweets:
                if tx.text not in texts:
                    texts.append(tx.text)
        except WebDriverException:
            # Skip this batch if an element went stale while it was being read
            pass
    return texts


url = 'https://twitter.com/search?q="جمال علام"&src=trend_click&pt=1535911024460718080&vertical=trends'
data = scrap_tweets_without(url, 1000)


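After about 600 scrolls the Selenium browser window cannot scroll any further, which leaves me with around 450 tweets. I believe a single hashtag or search page holds more than 400 tweets. Can anyone explain why the page does not load more than that?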
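
One way to confirm where the page stops moving, instead of running a fixed number of scrolls, is to compare the vertical scroll offset before and after each step. The following is only a minimal sketch under assumptions: `scroll_until_stalled`, `step`, `patience` and `pause` are hypothetical names that are not part of the original script, and `driver` is assumed to be any object with Selenium's `execute_script` method.

```python
import time


def scroll_until_stalled(driver, step=200, max_scrolls=1000, patience=20, pause=0.3):
    """Scroll by `step` pixels until the page stops moving.

    `driver` is assumed to expose Selenium's execute_script(script) method.
    Returns the number of scroll steps that actually moved the page.
    """
    moved = 0    # steps that changed the scroll offset
    stalled = 0  # consecutive steps that did not change it
    last_y = driver.execute_script("return window.pageYOffset")
    for _ in range(max_scrolls):
        driver.execute_script(f"window.scrollBy(0, {step})")
        time.sleep(pause)
        y = driver.execute_script("return window.pageYOffset")
        if y == last_y:
            # The offset did not change: the page may still be loading, or it has ended
            stalled += 1
            if stalled >= patience:
                break
        else:
            stalled = 0
            moved += 1
            last_y = y
    return moved
```

If the returned count is far below `max_scrolls`, the page stopped growing early, which matches the behaviour described above. A larger `patience` gives a slow network more time before the loop gives up.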



Comments (1)

陌上芳菲 2025-02-13 20:10:12

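After searching many sources, I found that my problem was that Twitter detected I was a Selenium bot rather than a real user, so it stopped loading more tweets as I scrolled down. I added this function, and it helped: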

from fake_headers import Headers  # assumption: Headers comes from the fake-headers package
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


def initilaize_driver():
    options = webdriver.ChromeOptions()
    # Generate a realistic User-Agent string for the browser
    header = Headers().generate()['User-Agent']
    options.add_argument('--headless')  # runs browser in headless mode
    options.add_argument('--no-sandbox')
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--disable-gpu')
    options.add_argument('--log-level=3')
    options.add_argument('--disable-notifications')
    options.add_argument('--disable-popup-blocking')
    options.add_argument('--user-agent={}'.format(header))

    # Selenium 4 takes the driver path through a Service object
    drive = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                             options=options)
    return drive
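
To use a driver built this way with the scraping loop from the question, the loop can take the driver as a parameter instead of creating its own. This is a rough sketch, not the original author's code: `collect_tweets` and `TWEET_XPATH` are hypothetical names, the string `"xpath"` is the value of Selenium's `By.XPATH`, and handling of stale elements is left out for brevity.

```python
import time

# XPath from the question: Arabic tweet text nodes
TWEET_XPATH = '//div[@data-testid="tweetText" and @lang="ar"]'


def collect_tweets(driver, no_scroll, pause=0.3):
    """Scroll `no_scroll` times and return the unique tweet texts in first-seen order.

    `driver` is assumed to be a Selenium WebDriver that has already loaded the page.
    """
    seen = set()  # constant-time duplicate check
    texts = []    # keeps the order in which tweets first appeared
    for _ in range(no_scroll):
        driver.execute_script("window.scrollBy(0, 200)")
        time.sleep(pause)
        for element in driver.find_elements("xpath", TWEET_XPATH):
            text = element.text
            if text not in seen:
                seen.add(text)
                texts.append(text)
    return texts


# Possible usage with the function above (not executed here):
# drive = initilaize_driver()
# drive.get(url)
# data = collect_tweets(drive, 1000)
```

Passing the driver in keeps the browser setup and the scraping separate, so the options can be changed without touching the loop.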

