Scraping a list of websites simultaneously with rvest
I am trying to scrape multiple product catalogues; each link points to a different product.
`webpages` is a data frame containing the links:
```
webpages
"https............"
"https............"
"https............"
```
I have the following code:
```r
for (i in webpages){
  book_page <- read_html(link)
}
```
I get this error: `Error: x must be a string of length 1`

How can I resolve it?
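The error comes from what the loop iterates over. `for (i in webpages)` walks over the *columns* of a data frame, so each iteration passes a whole vector of URLs, while `read_html()` accepts a single string. The loop body also reads `link` rather than `i`, and it overwrites `book_page` on every pass. A minimal sequential sketch, assuming the URLs sit in a column hypothetically named `url` and using a hypothetical helper `read_all()`, reads one URL at a time and keeps every result:

```r
library(rvest)  # read_html() is re-exported from xml2

# Hypothetical helper: parse each URL in turn and keep every document.
# `urls` is assumed to be something like webpages$url (hypothetical column).
read_all <- function(urls) {
  urls <- as.character(urls)  # guard against factor columns
  pages <- vector("list", length(urls))
  for (j in seq_along(urls)) {
    # urls[[j]] is a single string, which is what read_html() expects
    pages[[j]] <- read_html(urls[[j]])
  }
  pages
}
```

Call it as `book_pages <- read_all(webpages$url)`; `lapply(webpages$url, read_html)` is an equivalent one-liner. Both still fetch the pages one after another.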
A `for` loop does not download multiple websites at the same time, as the title of your question asks for. However, you can use a parallelization package such as pbmcapply:
Created on 2022-03-01 by the reprex package (v2.0.1)
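`read_html()` must be executed in the main process to avoid pointer errors: the parsed document is backed by an xml2 external pointer, which is no longer valid once it has been returned from a worker process.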
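A minimal sketch of this approach, assuming the pbmcapply package is installed and the URLs are in a column hypothetically named `url`: `pbmclapply()` (a variant of `parallel::mclapply()` with a progress bar) downloads the raw HTML of each page in forked worker processes, and `read_html()` then parses the returned strings in the main process. `fetch_html()` and `scrape_parallel()` are hypothetical helpers, and downloading with base `readLines()` is an assumed choice. Because it relies on forking, expect it to run in parallel only on Unix-like systems.

```r
library(rvest)      # read_html(), re-exported from xml2
library(pbmcapply)  # pbmclapply(): mclapply() with a progress bar

# Hypothetical helper: download one page and return its HTML as a single
# character string. Plain strings can be sent back from a worker safely.
fetch_html <- function(link) {
  paste(readLines(link, warn = FALSE), collapse = "\n")
}

# Hypothetical helper: fetch all URLs in parallel, then parse them in the
# main process, so no pointer-backed xml2 document ever leaves a worker.
scrape_parallel <- function(urls, fetch = fetch_html,
                            cores = getOption("mc.cores", 2L)) {
  urls <- as.character(urls)  # guard against factor columns
  raw_html <- pbmclapply(urls, fetch, mc.cores = cores)
  lapply(raw_html, read_html)
}
```

Usage: `book_pages <- scrape_parallel(webpages$url, cores = 4)`. The results come back in the same order as the URLs; a page whose download failed in a worker will make `read_html()` stop, so add error handling around `fetch` if some links may be dead.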