在 dbplyr SQL 查询中的 string::str_detect 中使用带有正则表达式的变量

发布于 2025-01-09 00:15:42 字数 1772 浏览 0 评论 0原文

我想根据正则表达式是否出现在任何列中来过滤 SQL 数据库。我想将正则表达式指定为变量;但是它被读取为文字字符串。我无法将正则表达式作为变量输入。感谢您的帮助!

我查阅过的资源:

注意:我在使用 mtcars 数据集制作 reprex 时遇到了问题 https://www.tidyverse.org/blog/2018/01/dbplyr-1-2/。我收到错误:“错误:str_detect() 在此 SQL 变体中不可用”。我无法使用我的实际 SQL 数据库共享 reprex。因此,下面是一个伪表示。

library(dplyr)
library(stringr)

# Variable with regex (either lower- or uppercase "m")
my_string <- "(?i)m"

# WITHOUT SQL DATABASE ----------------------------------------------------

# This runs
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_length(rowname) > 5)

# This runs with STRING
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_detect(rowname, "(?i)m"))

# This runs with VARIABLE
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_detect(rowname, my_string))

# WITH SQL DATABASE -------------------------------------------------------

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mtcars_db <- copy_to(con, tibble::rownames_to_column(mtcars), "mtcars")

# This runs
tbl(con, "mtcars") %>% 
  filter(str_length(rowname) > 5)

# This *should* run with STRING -- pretend it does ;)
tbl(con, "mtcars") %>%
  filter(str_detect(rowname, "M"))

# This does NOT run with VARIABLE
tbl(con, "mtcars") %>%
  filter(str_detect(rowname, my_string))

I would like to filter a SQL database based whether a regular expression appears within any column. I would like to specify the regex as a variable; however it is read as a literal string. I am having trouble getting the regex in as a variable. Thank you for your help!

Resources I've consulted:

Note: I had trouble making a reprex using the mtcars dataset, following https://www.tidyverse.org/blog/2018/01/dbplyr-1-2/. I get the error: "Error: str_detect() is not available in this SQL variant". I cannot share a reprex using my actual SQL database. As such, below is a pseudo-reprex.

library(dplyr)
library(stringr)

# Variable with regex (either lower- or uppercase "m")
my_string <- "(?i)m"

# WITHOUT SQL DATABASE ----------------------------------------------------

# This runs
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_length(rowname) > 5)

# This runs with STRING
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_detect(rowname, "(?i)m"))

# This runs with VARIABLE
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_detect(rowname, my_string))

# WITH SQL DATABASE -------------------------------------------------------

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mtcars_db <- copy_to(con, tibble::rownames_to_column(mtcars), "mtcars")

# This runs
tbl(con, "mtcars") %>% 
  filter(str_length(rowname) > 5)

# This *should* run with STRING -- pretend it does ;)
tbl(con, "mtcars") %>%
  filter(str_detect(rowname, "M"))

# This does NOT run with VARIABLE
tbl(con, "mtcars") %>%
  filter(str_detect(rowname, my_string))

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

瀞厅☆埖开 2025-01-16 00:15:42

这可能在很大程度上取决于您所使用的 SQL 风格。 问题提到了 str_detect 的翻译,并且还提供了选择。

测试 SQL Server:

library(dbplyr)
library(dplyr)
library(stringr)

data(mtcars)
df_sim = tbl_lazy(mtcars, con = simulate_mssql())
my_string <- "(?i)m"

df_sim %>%
  filter(str_detect(my_string, gear)) %>%
  show_query()
# Error: str_detect() is not available in this SQL variant

df_sim %>%
  filter(gear %like% my_string) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE (`gear` like '(?i)m')

看来 str_detect 无法转换为 SQL Server。但您可以使用 %like% 作为解决方法。

MySQL 测试:

library(dbplyr)
library(dplyr)
library(stringr)

data(mtcars)
df_sim = tbl_lazy(mtcars, con = simulate_mysql()) # changed to mysql
my_string <- "(?i)m"

df_sim %>%
  filter(str_detect(my_string, gear)) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE ('(?i)m' REGEXP `gear`)

df_sim %>%
  filter(gear %like% my_string) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE (`gear` like '(?i)m')

看来 str_detect 可以正确翻译为 MySQL。

在每种情况下,my_string 都会被转换为查询。

其他一些需要检查的事情:

  • 更新 dbplyr 的版本。旧版本没有新版本中的所有翻译。
  • 尝试使用行名称以外的列。 R 中的数据框可以有行名,但 SQL 中的表只能有列。

This might depend a lot on the flavour of SQL you are using. This issue mentions a translation for str_detect and also provides an alternative.

Testing for SQL Server:

library(dbplyr)
library(dplyr)
library(stringr)

data(mtcars)
df_sim = tbl_lazy(mtcars, con = simulate_mssql())
my_string <- "(?i)m"

df_sim %>%
  filter(str_detect(my_string, gear)) %>%
  show_query()
# Error: str_detect() is not available in this SQL variant

df_sim %>%
  filter(gear %like% my_string) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE (`gear` like '(?i)m')

So it appears str_detect can not translate to SQL Server. But you can use %like% as a work around.

Testing for MySQL:

library(dbplyr)
library(dplyr)
library(stringr)

data(mtcars)
df_sim = tbl_lazy(mtcars, con = simulate_mysql()) # changed to mysql
my_string <- "(?i)m"

df_sim %>%
  filter(str_detect(my_string, gear)) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE ('(?i)m' REGEXP `gear`)

df_sim %>%
  filter(gear %like% my_string) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE (`gear` like '(?i)m')

So it appears str_detect can be translated correctly for MySQL.

In every case my_string is translated into the query.

A couple of other things to check:

  • Updating the version of dbplyr. Older versions do not have all the translations in newer versions.
  • Try a column other than rownames. Data frames in R can have rownames but tables in SQL can only have columns.
不离久伴 2025-01-16 00:15:42

在同事的帮助下,我有了一个解决方案,可以在 string::str_detect 中使用正则表达式强制评估变量:

tbl(con, "mtcars") %>%
  filter(str_detect(rowname, {{my_string}}))

With the help of a colleague, I have a solution to force evaluation of a variable with regex in string::str_detect:

tbl(con, "mtcars") %>%
  filter(str_detect(rowname, {{my_string}}))
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文