Filter a tibble column to include only values found in a separate tibble
Question
I have a tibble containing basic stock symbol information (available here as a .csv file: https://www.nasdaq.com/market-activity/stocks/screener).
How do I filter this tibble (call it symbolData) for only the companies listed in a second, much smaller tibble (call it DowJones)? Note that the company names are not perfectly consistent between the datasets (e.g. 'Apple Inc. - Common Stock' in symbolData vs. 'Apple Inc.' in DowJones).
Reprex
#packages
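library(dplyr)
library(tibble)
library(httr)
library(utils)
library(reshape2)
library(xml2)
library(rvest)
library(stringr)  # str_detect(), used in the attempts below
library(purrr)    # map2_lgl(), used in the attempts below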
remove_arrows <- function(x) {sub("[[:space:]]↑", "", x)}
DowJones <- "https://en.wikipedia.org/wiki/Historical_components_of_the_Dow_Jones_Industrial_Average" %>%
  GET(config = config(ssl_verifypeer = FALSE)) %>%
  read_html() %>%
  html_node(".wikitable") %>%
  html_table(fill = TRUE) %>%
  as_tibble() %>%
  filter(!grepl('↓|Dropped', X1)) %>%
  rowid_to_column("index") %>%
  melt(id.vars = "index", value.name = "Dow Jones Industrial Average") %>%
  select(-c("variable", "index")) %>%
  mutate(across("Dow Jones Industrial Average", remove_arrows)) %>%
  as_tibble()
symbolData <- read.csv("~/nasdaq_screener.csv") %>% as_tibble()
> head(DowJones)
# A tibble: 6 × 1
`Dow Jones Industrial Average`
<chr>
1 3M Company
2 American Express Company
3 Amgen Inc.
4 Apple Inc.
5 The Boeing Company
6 Caterpillar Inc.
> head(symbolData)
# A tibble: 6 × 11
Symbol Name Last.Sale Net.Change X..Change Market.Cap Country IPO.Year Volume Sector Industry
<chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <int> <int> <chr> <chr>
1 A "Agilent Technologies Inc. Common Stock" "$133.73 " 5.58 4.35% 40167959890 "United States" 1999 3144474 "Capi… "Electr…
2 AA "Alcoa Corporation Common Stock " "$77.85 " 4.55 6.21% 14332165382 "" 2016 7327361 "Basi… "Metal …
3 AAC "Ares Acquisition Corporation Class A Ordinary Shares" "$9.76 " 0.01 0.10% 1220000000 "" 2021 99883 "Fina… "Busine…
4 AACG "ATA Creativity Global American Depositary Shares" "$1.36 " 0.02 1.49% 42672611 "China" NA 7920 "Misc… "Servic…
5 AACI "Armada Acquisition Corp. I Common Stock" "$9.81 " 0.01 0.10% 203160195 "United States" 2021 264 "" ""
6 AACIW "Armada Acquisition Corp. I Warrant" "$0.23 " -0.0599 -20.66% 0 "United States" 2021 184363 "" ""
Previous Attempts
I've already tried a number of methods, including %in%/%chin%, grep/grepl, agrepl/agrep, str_detect, turning the DowJones data frame into a list, and all sorts of other methods I can't remember. Everything I've tried so far has returned either an empty tibble or an error message related to differing column lengths. Some examples:
filter(symbolData, sapply(1:nrow(.), function(i) grepl(DowJones$`Dow Jones Industrial Average`[i], symbolData$Security.Name[i])))
#returns empty tibble
filter(symbolData, str_detect(symbolData$Security.Name, DowJones$`Dow Jones Industrial Average`) == TRUE)
Warning message:
In stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
longer object length is not a multiple of shorter object length
#returns empty tibble
filter(symbolData, unlist(Map(function(x, y) grepl(x, y), DowJones$`Dow Jones Industrial Average`, symbolData$Security.Name)))
Warning message:
In mapply(FUN = f, ..., SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
#returns empty tibble
filter(symbolData, map2_lgl(symbolData$Security.Name, DowJones$`Dow Jones Industrial Average`, str_detect))
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `map2_lgl(...)`.
x Mapped vectors must have consistent lengths:
* `.x` has length 5587
* `.y` has length 30
filter(symbolData, agrepl(DowJones$`Dow Jones Industrial Average`, symbolData$Security.Name, ignore.case = T, fixed = F))
#returns empty tibble
Comments (1)
How many should it return? This returns 7
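For illustration, here is a minimal sketch of one way to approach this kind of partial-name filter; it is not necessarily the approach that produced the count above. The attempts in the question pair the two vectors element by element (5587 names against 30), whereas the goal is to test each screener row against any of the 30 Dow names. The rows below are hypothetical toy data, and the sketch assumes the name column is called Name, as in the head(symbolData) output, rather than Security.Name as in the attempts. It uses base R's startsWith() so that characters such as "." are treated literally rather than as regex metacharacters.

```r
# Toy stand-ins for the two tables (hypothetical rows, not real screener data).
symbolData <- data.frame(
  Symbol = c("AAPL", "AMGN", "AA", "BA"),
  Name   = c("Apple Inc. Common Stock",
             "Amgen Inc. Common Stock",
             "Alcoa Corporation Common Stock",
             "Boeing Company (The) Common Stock"),
  stringsAsFactors = FALSE
)
DowJones <- data.frame(
  `Dow Jones Industrial Average` = c("Amgen Inc.", "Apple Inc.", "The Boeing Company"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

dow_names <- DowJones[["Dow Jones Industrial Average"]]

# For each Dow name, build a logical vector over ALL screener rows
# (TRUE where the screener name starts with that Dow name), then
# combine the vectors with OR so a row is kept if any Dow name matched.
keep <- Reduce(`|`, lapply(dow_names, function(nm) startsWith(symbolData$Name, nm)))

matched <- symbolData[keep, ]
print(matched$Symbol)
```

In this toy data the Boeing row is not matched, because 'The Boeing Company' is not a prefix of 'Boeing Company (The) Common Stock'. Names that differ in word order like this would need extra normalisation or fuzzy matching, so the result of any such filter is worth checking against the 30 expected companies.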