将动态值添加到 RMySQL getQuery 中

发布于 2024-10-04 20:46:29 字数 328 浏览 11 评论 0原文

是否可以将 RMySQL 包中的值传递到 dbGetQuery 中的查询中。

例如，如果我在字符向量中有一组值：

df <- c('a','b','c')

并且我想循环遍历这些值以从数据库中为每个值提取特定值。

library(RMySQL)    
res <- dbGetQuery(con, "SELECT max(ID) FROM table WHERE columna='df[2]'")

当我尝试添加对值的引用时，出现错误。想知道是否可以在查询中添加 R 对象的值。

原文

Is it possible to pass a value into the query in dbGetQuery from the RMySQL package.

For example, if I have a set of values in a character vector:

df <- c('a','b','c')

And I want to loop through the values to pull out a specific value from a database for each.

library(RMySQL)    
res <- dbGetQuery(con, "SELECT max(ID) FROM table WHERE columna='df[2]'")

When I try to add the reference to the value I get an error. Wondering if it is possible to add a value from an R object in the query.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

醉生梦死 2024-10-11 20:46:29

一种选择是在循环内操作 SQL 字符串。目前您有一个字符串文字，'df[2]' 不会被 R 解释为字符以外的任何内容。我的答案中会有一些含糊之处，因为 Q 中的 df 显然不是数据框（它是字符向量！）。像这样的东西会做你想做的事。

将输出存储在数字向量中：

require(RMySQL)
df <- c('a','b','c')
out <- numeric(length(df))
names(out) <- df

现在我们可以循环 df 的元素来执行查询三次。我们可以通过两种方式设置循环：i) 将 i 作为数字，用于引用 df 和 out 的元素，或者ii) 将i依次作为df的每个元素（即a，然后b，...）。我将在下面展示这两个版本。

## Version i
for(i in seq_along(df)) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", df[i], "';", sep = "")
    out[i] <- dbGetQuery(con, SQL)
    dbDisconnect(con)
}

或者：

## Version ii
for(i in df) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", i, "';", sep = "")
    out[i] <- dbGetQuery(con, SQL)
    dbDisconnect(con)
}

您使用哪种取决于个人品味。第二 (ii) 版本要求您在输出向量 out 上设置与 out 内的数据相同的名称。

话虽如此，假设您的实际 SQL 查询与您发布的查询类似，您不能在单个 SQL 语句中使用 GROUP BY 子句在计算之前对数据进行分组吗？代码>最大（ID）？像这样在数据库中做简单的事情可能会快得多。不幸的是，我没有可用的 MySQL 实例，而且我的 SQL-fu 目前很弱，所以我无法给出这样的例子。

One option is to manipulate the SQL string within the loop. At the moment you have a string literal, the 'df[2]' is not interpreted by R as anything other than characters. There are going to be some ambiguities in my answer, because df in your Q is patently not a data frame (it is a character vector!). Something like this will do what you want.

Store the output in a numeric vector:

require(RMySQL)
df <- c('a','b','c')
out <- numeric(length(df))
names(out) <- df

Now we can loop over the elements of df to execute your query three times. We can set the loop up two ways: i) with i as a number which we use to reference the elements of df and out, or ii) with i as each element of df in turn (i.e. a, then b, ...). I will show both versions below.

## Version i
for(i in seq_along(df)) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", df[i], "';", sep = "")
    out[i] <- dbGetQuery(con, SQL)
    dbDisconnect(con)
}

OR:

## Version ii
for(i in df) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", i, "';", sep = "")
    out[i] <- dbGetQuery(con, SQL)
    dbDisconnect(con)
}

Which you use will depend on personal taste. The second (ii) version requires you to set names on the output vector out that are the same as the data inside out.

Having said all that, assuming your actual SQL Query is similar to the one you post, can't you do this in a single SQL statement, using the GROUP BY clause, to group the data before computing max(ID)? Doing simple things in the data base like this will likely be much quicker. Unfortunately, I don't have a MySQL instance around to play with and my SQL-fu is weak currently, so I can't given an example of this.

回复收藏 0 原文