Selenium Python can't scroll down in TikTok while fetching videos
I am trying to use Selenium with Python to open a TikTok user page and scroll down to load all of the user's videos.
I can open the URL and get the page source, including the data for all videos loaded so far, but when I scroll down, sleep for a while, and get the source again, the page source is the same, with the same videos and nothing new loaded.
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import json
from bs4 import BeautifulSoup
import time
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver',options=options)
wd.get("https://www.tiktok.com/@tiktok")
time.sleep(20)
#wd.implicitly_wait(10)
#print(wd.page_source)
SCROLL_PAUSE_TIME = 20
# Get scroll height (use wd throughout; no variable named driver is defined)
last_height = wd.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = wd.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
print(wd.page_source)
I also tried this code to scroll down:
wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(10)
print(wd.page_source)
but again nothing new is loaded in the page source.
I am using Google Colab. Any help?
- Update: changed the variable "driver" to "wd".
- Update: this is the install code for the Chromium driver:
# install chromium, its driver, and selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
Comments (2)
I had the same problem, although I am using playwright with playwright-stealth, not selenium. The problem is that TikTok detects that you are headless and throttles you. Or at least that was my problem.
Simply adding a browser flag, "--headless=new", fixed it. This argument makes the browser use the recently released new version of headless Chromium, which is much less detectable. Just make sure you use a recent version of Chromium.
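For the Selenium setup in the question, the same flag could be passed through ChromeOptions. This is only a sketch, assuming a recent Chromium and a chromedriver that is already on the PATH (as in the Colab install steps); the helper names build_chrome_args and make_driver are made up for illustration and are not part of Selenium.

```python
def build_chrome_args(headless=True):
    """Return the Chrome command-line flags used in the question,
    with the old '--headless' flag swapped for '--headless=new'."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        # New headless mode, as suggested in the answer above.
        args.insert(0, "--headless=new")
    return args


def make_driver(headless=True):
    """Create a Chrome WebDriver with the flags above (hypothetical helper)."""
    # Imported here so build_chrome_args works even without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(headless):
        options.add_argument(arg)
    # Assumes chromedriver can be found on the PATH.
    return webdriver.Chrome(options=options)
```

Calling wd = make_driver() would then replace the options and webdriver.Chrome lines in the question. Whether this is enough to stop the throttling on TikTok is not guaranteed; the answer only reports that it worked with playwright.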
In the while loop you start using a nonexistent variable called driver. I changed it to wd and it scrolled down, but the page showed that there was a problem trying to load from there, and the code also throws an error:
[9612:864:0614/164525.919:ERROR:util.cc(127)] Can't create base directory: C:\Program Files\Google\GoogleUpdater
I searched for this error and it seems to be related to the versions of Chrome and chromedriver, as stated here: https://www.reddit.com/r/selenium/comments/uqt9z9/cant_create_base_directory/
That's as far as I got, hope it helps. :)
Here's my current code
I took out the headless argument to see the results more clearly
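The following is not the answerer's original code, only a rough sketch of the fix described in this answer: the scroll loop from the question with a single driver variable used throughout. The function name scroll_to_bottom and the parameters pause and max_rounds are made up for illustration; max_rounds is an added safety limit that is not in the question.

```python
import time


def scroll_to_bottom(wd, pause=20.0, max_rounds=50):
    """Scroll until the page height stops growing or max_rounds is reached.

    wd is any object with an execute_script(script) method, such as a
    Selenium WebDriver. Returns the number of scrolls performed.
    """
    last_height = wd.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Scroll down to the bottom of the page.
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait for new content to load.
        time.sleep(pause)
        rounds += 1
        # Stop when the height no longer changes.
        new_height = wd.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    return rounds
```

With the question's setup this would be called as scroll_to_bottom(wd, pause=SCROLL_PAUSE_TIME) before printing wd.page_source. If the page is being throttled, the height never grows and the loop ends after the first scroll, which matches the behavior described in the question.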