Scrape all the reviews of an IMDB movie with R
I wrote code to scrape the reviews and the detailed reviews for a movie.
But it only scrapes the information that has already been loaded on the page. (For example, if there are 1000 reviews, the page shows only the first 10; the others appear only after clicking "Load more".)
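library(rvest)
library(dplyr)

# read_html() only sees the HTML served on the first request (the first 10 reviews)
MOVIE_URL <- read_html("https://www.imdb.com/title/tt0167260/reviews?ref_=tt_urv")
# Text of the links inside each review item (e.g. the review title)
ex_review <- MOVIE_URL %>% html_nodes(".lister-item a") %>%
  html_text()
# Full text of each review
detailed <- MOVIE_URL %>% html_nodes(".content") %>%
  html_text()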
Is there a way to scrape every review?
This is similar to a previous question (How to scrape all the movie reviews from IMDB using rvest), though the answer no longer works.
Now, when you are looking at a single page of reviews, say https://www.imdb.com/title/tt0167260/reviews, you can load the next page of reviews via the URL:
movie_url <- paste0("https://www.imdb.com/title/tt0167260/reviews/_ajax?&paginationKey=", pagination_key)
where pagination_key is the data-key hidden in the HTML under:
<div class="load-more-data" data-key="g4xolermtiqhejcxxxgs753i36t52q343andv6xeade6qp6qwx57ziim2edmxvqz2tftug54" data-ajaxurl="/title/tt0167260/reviews/_ajax">
So if you retrieve the HTML from
movie_url <- "https://www.imdb.com/title/tt0167260/reviews/_ajax?&paginationKey=g4xolermtiqhejcxxxgs753i36t52q343andv6xeade6qp6qwx57ziim2edmxvqz2tftug54"
you will get the second page of reviews. To access the third page, repeat the process: look for the pagination key in this second page, request the next URL with it, and so on.
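The repeat-until-no-key loop described above can be sketched in R with rvest. This is a minimal sketch, not a tested scraper for the current IMDB site. It assumes that the _ajax?&paginationKey= endpoint and the load-more-data div still behave as described, that the last page has no data-key (or an empty one), and that .content (the selector from the question) still matches the review bodies. get_pagination_key(), scrape_all_reviews() and their fetch, delay and max_pages arguments are hypothetical names introduced here. html_element() and html_elements() require rvest 1.0 or later.

```r
library(rvest)

# Hypothetical helper: read the key from <div class="load-more-data" data-key="...">.
# Returns NA_character_ if the div (or its data-key attribute) is missing.
get_pagination_key <- function(page) {
  html_attr(html_element(page, "div.load-more-data"), "data-key")
}

# Hypothetical helper: follow the paginationKey chain and collect review text.
# fetch defaults to read_html; it is an argument only so saved pages can stand in for IMDB.
scrape_all_reviews <- function(title_id, fetch = read_html, delay = 1, max_pages = Inf) {
  base_url <- paste0("https://www.imdb.com/title/", title_id, "/reviews")
  page <- fetch(base_url)
  reviews <- character(0)
  n_pages <- 0
  repeat {
    n_pages <- n_pages + 1
    # Collect review text from the current page (selector taken from the question)
    reviews <- c(reviews, html_text(html_elements(page, ".content"), trim = TRUE))
    key <- get_pagination_key(page)
    # Stop when there is no further key (assumed last page) or the page cap is reached
    if (is.na(key) || !nzchar(key) || n_pages >= max_pages) break
    Sys.sleep(delay)  # pause between requests
    page <- fetch(paste0(base_url, "/_ajax?&paginationKey=", key))
  }
  reviews
}

# Usage (needs network access):
# all_reviews <- scrape_all_reviews("tt0167260")
```

Passing fetch as an argument keeps the network call in one place, so the pagination logic can be checked against saved HTML (for example pages built with minimal_html()) before pointing it at IMDB; max_pages caps the number of requests while experimenting.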