What is wrong with this URL loop?

Posted 2025-01-14 23:36:32

The code works for a single URL, but for multiple URLs in a list it fails with an error. I'm new to R, please help.

library(rvest)

for (url in data_list) {
  webpage <- read_html(url)
  extracted_urls <- webpage %>%
    rvest::html_nodes("a") %>%
    rvest::html_attr("href")
  extracted_urls <- extracted_urls[grep("roster", extracted_urls)]
  extracted_urls
}

Error:

x must be a string of length 1


Edit

Links from the OP's comment.

data_list <- c(
  "ephsports.williams.edu", 
  "wilsonphoenix.com", 
  "wingatebulldogs.com", 
  "ycpspartans.com"
)
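That exact message comes from xml2 (which rvest uses for parsing): read_html() accepts only a single string, so one way to reproduce the error is to hand it a character vector of length greater than 1 instead of one element at a time. A minimal offline sketch, using inline HTML snippets as stand-ins for the live sites:

```r
library(xml2)

# Two inline HTML snippets standing in for live pages
urls <- c("<p>a</p>", "<p>b</p>")

# read_html(urls)               # errors: `x` must be a string of length 1
docs <- lapply(urls, read_html) # parse one element at a time instead
length(docs)  # 2
```

Iterating element-wise (a for loop, lapply(), or purrr::map()) avoids the error, which is what both answers below do.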


Comments (2)

臻嫒无言 2025-01-21 23:36:33

Variables created inside a for loop are overwritten on each iteration, so here extracted_urls gets clobbered repeatedly. Create a receiver object outside the loop (try r <- list()) so results can be added to it step by step; that object lives in the global environment and remains accessible after the loop finishes.

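The receiver-object pattern can be sketched as follows; the list names (pages, results) and the inline HTML snippets are hypothetical stand-ins so the example runs offline:

```r
library(rvest)

# Stand-in pages: inline HTML instead of live URLs
pages <- list(
  siteA = "<html><body><a href='/sports/roster'>Roster</a><a href='/news'>News</a></body></html>",
  siteB = "<html><body><a href='/team/roster.aspx'>Team</a></body></html>"
)

results <- list()  # receiver object, created once, outside the loop
for (name in names(pages)) {
  webpage <- read_html(pages[[name]])
  links <- webpage %>%
    rvest::html_nodes("a") %>%
    rvest::html_attr("href")
  results[[name]] <- links[grep("roster", links)]
}
results  # a named list, one element per page, kept across iterations
```

Each iteration writes to a different slot of results, so nothing is lost when the loop variable is reassigned.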
病毒体 2025-01-21 23:36:33

Since some of the URLs are not working, we can skip them using the possibly function.

library(rvest)
library(tidyverse)

data_list <- c(
  "https://wilsonphoenix.com",
  "https://wingatebulldogs.com",
  "https://ycpspartans.com/sorry.ashx"
)
# the third link is broken

# a function to get the required info
roster <- function(x) {
  webpage <- read_html(x)
  extracted_urls <- webpage %>%
    rvest::html_nodes("a") %>%
    rvest::html_attr("href")
  extracted_urls[grep("roster", extracted_urls)]
}

Now we loop over the vector of URLs, data_list, skipping any that error.

df <- map(data_list, possibly(roster, otherwise = NA_character_))
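A minimal offline sketch of how possibly() behaves, with a hypothetical parser (parse_num) standing in for roster(): any element that would raise an error becomes the otherwise value instead of stopping map().

```r
library(purrr)

# Hypothetical parser: errors on non-numeric input
parse_num <- function(x) {
  n <- suppressWarnings(as.numeric(x))
  if (is.na(n)) stop("not a number")
  n
}

safe_parse <- possibly(parse_num, otherwise = NA_real_)
out <- map(c("1", "oops", "3"), safe_parse)
# "oops" would have aborted map(); with possibly() its slot is NA_real_
```

The result is a list the same length as the input, with NA in the positions that failed, so downstream code can filter those out rather than crash mid-scrape.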
