Extracting tweets from Twitter using Selenium
Hello all, I have a problem extracting tweets from Twitter. I wrote a script that opens one of the trending pages on Twitter, scrolls down N times, and extracts the tweets as it scrolls. This works fine at first, but after a certain number of scrolls the page stops loading new tweets, the scrolling stops, and no new tweets appear.

For example, when I set N = 1000 it works fine until it reaches roughly 400 to 600 scrolls; then the scrolling stops and no more tweets appear.

I would be very grateful if anyone could help. Thanks a lot.

My code is:

    import time

    import tqdm
    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    SCROLL_PAUSE_TIME = 0.3  # seconds to wait after each scroll step


    def scrap_tweets_without(url, no_scroll):
        drive = webdriver.Chrome(service=Service(r'C:\selinum\chromedriver.exe'))
        drive.get(url)
        texts = []
        time.sleep(3)  # give the first batch of tweets time to load
        # Scroll down no_scroll times, collecting tweets after each step
        for i in tqdm.tqdm(range(no_scroll)):
            # Scroll down by 200 pixels
            drive.execute_script("window.scrollBy(0,200)")
            # Wait for the page to load
            time.sleep(SCROLL_PAUSE_TIME)
            try:
                # Get the Arabic-language tweets currently on the page
                tweets = drive.find_elements(
                    By.XPATH, '//div[@data-testid="tweetText" and @lang="ar"]')
                # Add each tweet text that has not been collected yet
                for tx in tweets:
                    text = tx.text
                    if text not in texts:
                        texts.append(text)
            except StaleElementReferenceException:
                # An element can go stale if the page re-renders mid-read
                pass
        return texts


    url = 'https://twitter.com/search?q="جمال علام"&src=trend_click&pt=1535911024460718080&vertical=trends'
    data = scrap_tweets_without(url, 1000)

This is the state of the Selenium browser after about 600 scrolls: the page cannot scroll any further, which gives me around 450 tweets. I believe a single hashtag or search page contains more than 400 tweets. Can anyone explain why the page does not load more than that?
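
To pin down where loading stalls, it may help to stop as soon as several scrolls in a row produce nothing new, instead of always scrolling a fixed N times. The following is only a minimal sketch of that idea, not a fix for the stall itself. The names `scroll_until_stalled`, `get_texts`, and `patience` are hypothetical and do not appear in the script above; `get_texts` is assumed to be any callable that returns the tweet texts currently on the page.

```python
import time


def scroll_until_stalled(driver, get_texts, max_scrolls=1000, patience=20,
                         pause=0.3, step=200):
    """Scroll step by step and stop early once no new text has appeared
    for `patience` consecutive scrolls. Returns (texts, scrolls_done)."""
    seen = set()      # fast duplicate check
    texts = []        # keeps first-seen order
    idle = 0          # consecutive scrolls that produced nothing new
    scrolls_done = 0
    for _ in range(max_scrolls):
        # Selenium passes extra arguments to the script as arguments[0], ...
        driver.execute_script("window.scrollBy(0, arguments[0])", step)
        scrolls_done += 1
        time.sleep(pause)
        new_found = False
        for text in get_texts(driver):
            if text not in seen:
                seen.add(text)
                texts.append(text)
                new_found = True
        idle = 0 if new_found else idle + 1
        if idle >= patience:
            break  # nothing new for a while: the page has likely stalled
    return texts, scrolls_done
```

With Selenium, `get_texts` could wrap the same XPath lookup used in the script above and return `[e.text for e in elements]`. The returned scroll count then shows roughly where new tweets stopped arriving.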
Comments (1)
After searching a lot of sources, I found that my problem was that Twitter detects that I am a Selenium bot rather than a real user, so it stops loading more tweets when I scroll down. Adding a function to work around this helped me.
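
The function this answer refers to is not shown in the post. As an illustration only, here is a sketch of one commonly used approach: starting Chrome with options that expose fewer automation hints. The helper names `stealth_settings` and `make_driver` are hypothetical, and there is no guarantee that these settings are what the answer used or that they are enough for Twitter to keep loading tweets.

```python
AUTOMATION_FLAG = "--disable-blink-features=AutomationControlled"


def stealth_settings(user_agent=None):
    """Return (arguments, experimental_options) for Chrome.

    Plain data only, so it can be inspected without starting a browser.
    """
    arguments = [AUTOMATION_FLAG]
    if user_agent:
        # Optional: present a specific user agent string
        arguments.append(f"--user-agent={user_agent}")
    experimental = {
        # Drop the switch that marks the browser as automated
        "excludeSwitches": ["enable-automation"],
    }
    return arguments, experimental


def make_driver(user_agent=None):
    """Build a Chrome driver with the settings above (needs Selenium and Chrome)."""
    # Imported here so stealth_settings() stays usable without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    arguments, experimental = stealth_settings(user_agent)
    for arg in arguments:
        options.add_argument(arg)
    for name, value in experimental.items():
        options.add_experimental_option(name, value)
    return webdriver.Chrome(options=options)
```

A driver built this way could replace the `webdriver.Chrome(...)` call in the question's script. Slower, less regular scrolling may also matter, but that is an assumption rather than something the answer states.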