在关系数据库中存储 R 对象

发布于 2024-08-03 18:19:31 字数 204 浏览 3 评论 0原文

我经常对从关系数据库中提取的数据创建非参数统计数据(黄土、内核密度等)。为了使数据管理更容易,我想将 R 输出存储回我的数据库中。对于简单的数字或文本数据框来说,这很容易,但我还没有弄清楚如何将 R 对象存储回我的关系数据库中。那么有没有一种方法可以将内核密度向量存储回关系数据库中?

现在,我通过将 R 对象保存到网络驱动器空间来解决这个问题,以便其他人可以根据需要加载对象。

I frequently create nonparametric statistics (loess, kernel densities, etc) on data I pull out of a relational database. To make data management easier I would like to store R output back inside my DB. This is easy with simple data frames of numbers or text, but I have not figured out how to store R objects back in my relational database. So is there a way to store a vector of kernel densities, for example, back into a relational database?

Right now I work around this by saving the R objects to a network drive space so others can load the objects as needed.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

尐偏执 2024-08-10 18:19:31

使用序列化功能将任何 R 对象转换为(原始或字符)字符串,然后存储该字符串。请参阅帮助(序列化)

反转此操作以进行检索:获取字符串,然后将 unserialize() 转换为 R 对象。

Use the serialization feature to turn any R object into a (raw or character) string, then store that string. See help(serialize).

Reverse this for retrieval: get the string, then unserialize() into a R object.

许一世地老天荒 2024-08-10 18:19:31

一个 R 变量示例,相当复杂:

library(nlme)
model <- lme(uptake ~ conc + Treatment, CO2, random = ~ 1 | Plant / Type)

R 变量的最佳存储数据库方法取决于您想要如何使用它。

我需要对值进行数据库内分析

在这种情况下,您需要将对象分解为数据库可以本地处理的值。这通常意味着将其转换为一个或多个数据帧。最简单的方法是使用 broom 包。

library(broom)
coefficients_etc <- tidy(model)
model_level_stats <- glance(model)
row_level_stats <- augment(model)

我只想要存储

在这种情况下,您想要序列化 ​​R 变量。也就是说,将它们转换为字符串或二进制 blob。有几种方法可以实现这一点。


我的数据必须可由 R 以外的程序访问,并且需要易于人类阅读

您应该以跨平台文本格式存储数据;可能是 JSON 或 YAML。 JSON 不支持一些重要概念,例如 Inf; YAML 更通用,但 R 中的支持还不够成熟。 XML 也是可能的,但过于冗长,无法用于存储大型数组。

library(RJSONIO)
model_as_json <- toJSON(model)
nchar(model_as_json) # 17916

library(yaml)
# yaml package doesn't yet support conversion of language objects,
# so preprocessing is needed
model2 <- within(
  model,
  {
     call <- as.character(call)
     terms <- as.character(terms)
  }
)
model_as_yaml <- as.yaml(model2) 
nchar(model_as_yaml) # 14493

我的数据必须可由 R 以外的程序访问,并且不需要人类可读

您可以将数据写入开放的跨平台二进制格式,例如 HFD5。目前支持 HFD5 文件(通过 rhdf5) 是有限的,因此不支持复杂的对象。 (您可能需要unclass

library(rhdf5)
h5save(rapply(model2, unclass, how = "replace"), file = "model.h5")
bin_h5 <- readBin("model.h5", "raw", 1e6)
length(bin_h5) # 88291 not very efficient in this case

feather 包让您能够以 R 和 Python 都可读的格式保存数据帧。要使用它,您首先必须将模型对象转换为数据帧,如答案前面的扫帚部分所述。

library(feather)
library(broom)
write_feather(augment(model), "co2_row.feather")  # 5474 bytes
write_feather(tidy(model), "co2_coeff.feather")   # 2093 bytes
write_feather(glance(model), "co2_model.feather") #  562 bytes

另一种替代方法是将变量的文本版本(请参阅上一节)保存到压缩文件并将其字节存储在数据库中。

writeLines(model_as_json)
tar("model.tar.bz", "model.txt", compression = "bzip2")
bin_bzip <- readBin("model.tar.bz", "raw", 1e6)
length(bin_bzip) # only 42 bytes!

我的数据只需可由 R 访问,并且需要可读

将变量转换为字符串有两种选择: 序列化deparse

p <- function(x)
{
  paste0(x, collapse = "\n")
}

serialize 需要发送到文本连接,您可以写入控制台并捕获它,而不是写入文件。

 model_serialized <- p(capture.output(serialize(model, stdout())))
 nchar(model_serialized) # 23830

deparsecontrol = "all" 结合使用,可以在稍后重新解析时最大化可逆性。

model_deparsed <- p(deparse(model, control = "all"))
nchar(model_deparsed) # 22036

我的数据只需可由 R 访问,不需要人类可读

前面部分中显示的相同类型的技术也可以应用于此处。您可以压缩序列化或解析的变量并将其重新读取为原始向量。

serialize 还可以以二进制格式写入变量。在这种情况下,最容易与其包装器 saveRDS 一起使用。

saveRDS(model, "model.rds")
bin_rds <- readBin("model.rds", "raw", 1e6)
length(bin_rds) # 6350

An example R variable, that's fairly complex:

library(nlme)
model <- lme(uptake ~ conc + Treatment, CO2, random = ~ 1 | Plant / Type)

The best storage database method for R variables depends upon how you want to use it.

I need to do in-database analytics on the values

In this case, you need to break the object down into values that the database can handle natively. This usually means converting it into one or more data frames. The easiest way to do this is to use the broom package.

library(broom)
coefficients_etc <- tidy(model)
model_level_stats <- glance(model)
row_level_stats <- augment(model)

I just want storage

In this case you want to serialize your R variables. That is, converting them to be a string or a binary blob. There are several methods for this.


My data has to be accessible by programs other than R, and needs to be human-readable

You should store your data in a cross-platform text format; probably JSON or YAML. JSON doesn't support some important concepts like Inf; YAML is more general but the support in R isn't as mature. XML is also possible, but is too verbose to be useful for storing large arrays.

library(RJSONIO)
model_as_json <- toJSON(model)
nchar(model_as_json) # 17916

library(yaml)
# yaml package doesn't yet support conversion of language objects,
# so preprocessing is needed
model2 <- within(
  model,
  {
     call <- as.character(call)
     terms <- as.character(terms)
  }
)
model_as_yaml <- as.yaml(model2) 
nchar(model_as_yaml) # 14493

My data has to be accessible by programs other than R, and doesn't need to be human-readable

You could write your data to an open, cross-platform binary format like HFD5. Currently support for HFD5 files (via rhdf5) is limited, so complex objects are not supported. (You'll probably need to unclass everything.)

library(rhdf5)
h5save(rapply(model2, unclass, how = "replace"), file = "model.h5")
bin_h5 <- readBin("model.h5", "raw", 1e6)
length(bin_h5) # 88291 not very efficient in this case

The feather package let's you save data frames in a format readable by both R and Python. To use this, you would first have to convert the model object into data frames, as described in the broom section earlier in the answer.

library(feather)
library(broom)
write_feather(augment(model), "co2_row.feather")  # 5474 bytes
write_feather(tidy(model), "co2_coeff.feather")   # 2093 bytes
write_feather(glance(model), "co2_model.feather") #  562 bytes

Another alternative is to save a text version of the variable (see previous section) to a zipped file and store its bytes in the database.

writeLines(model_as_json)
tar("model.tar.bz", "model.txt", compression = "bzip2")
bin_bzip <- readBin("model.tar.bz", "raw", 1e6)
length(bin_bzip) # only 42 bytes!

My data only needs to be accessible by R, and needs to be human-readable

There are two options for turning a variable into a string: serialize and deparse.

p <- function(x)
{
  paste0(x, collapse = "\n")
}

serialize needs to be sent to a text connection, and rather than writing to file, you can write to the console and capture it.

 model_serialized <- p(capture.output(serialize(model, stdout())))
 nchar(model_serialized) # 23830

Use deparse with control = "all" to maximise the reversibility when re-parsing later.

model_deparsed <- p(deparse(model, control = "all"))
nchar(model_deparsed) # 22036

My data only needs to be accessible by R, and doesn't need to be human-readable

The same sorts of techniques shown in the previous sections can be applied here. You can zip a serialized or deparsed variable and re-read it as a raw vector.

serialize can also write variables in a binary format. In this case, it is most easily used with its wrapper saveRDS.

saveRDS(model, "model.rds")
bin_rds <- readBin("model.rds", "raw", 1e6)
length(bin_rds) # 6350
爱情眠于流年 2024-08-10 18:19:31

对于sqlite(可能还有其他):

CREATE TABLE data (blob BLOB);

现在在R中:

RSQLite::dbGetQuery(db.conn, 'INSERT INTO data VALUES (:blob)', params = list(blob = list(serialize(some_object)))

请注意some_object周围的list包装器。 serialize 的输出是一个原始向量。如果没有 list,将为每个向量元素执行 INSERT 语句。将其包装在列表中允许 RSQLite::dbGetQuery 将其视为一个元素。

要从数据库取回对象:

some_object <- unserialize(RSQLite::dbGetQuery(db.conn, 'SELECT blob FROM data LIMIT 1')$blob[[1]])

这里发生的事情是获取字段 blob (这是一个列表,因为 RSQLite 不知道查询将返回多少行)。由于 LIMIT 1 确保仅返回 1 行,因此我们将其与 [[1]] 一起使用,这是原始的原始向量。然后您需要反序列化原始向量来获取您的对象。

For sqlite (and possibly others):

CREATE TABLE data (blob BLOB);

Now in R:

RSQLite::dbGetQuery(db.conn, 'INSERT INTO data VALUES (:blob)', params = list(blob = list(serialize(some_object)))

Note the list wrapper around some_object. The output of serialize is a raw vector. Without list, the INSERT statement would be executed for each vector element. Wrapping it in a list allows RSQLite::dbGetQuery to see it as one element.

To get the object back from the database:

some_object <- unserialize(RSQLite::dbGetQuery(db.conn, 'SELECT blob FROM data LIMIT 1')$blob[[1]])

What happens here is you take the field blob (which is a list since RSQLite doesn't know how many rows will be returned by the query). Since LIMIT 1 assures only 1 row is returned, we take it with [[1]], which is the original raw vector. Then you need to unserialize the raw vector to get your object.

等风也等你 2024-08-10 18:19:31

使用 textConnection / saveRDS / loadRDS 也许是最通用和最高级别的:

zz<-textConnection('tempConnection', 'wb')
saveRDS(myData, zz, ascii = T)
TEXT<-paste(textConnectionValue(zz), collapse='\n')

#write TEXT into SQL
...
closeAllConnections()  #if the connection persists, new data will be appended

#reading back:
#1. pull from SQL into queryResult
...
#2. recover the object
recoveredData <- readRDS(textConnection(queryResult$TEXT))

Using textConnection / saveRDS / loadRDS is perhaps the most versatile and high level:

zz<-textConnection('tempConnection', 'wb')
saveRDS(myData, zz, ascii = T)
TEXT<-paste(textConnectionValue(zz), collapse='\n')

#write TEXT into SQL
...
closeAllConnections()  #if the connection persists, new data will be appended

#reading back:
#1. pull from SQL into queryResult
...
#2. recover the object
recoveredData <- readRDS(textConnection(queryResult$TEXT))
东走西顾 2024-08-10 18:19:31

[100% 正常运行 - 2020 年 2 月 27 日]

说明:
如果要将模型存储到 POSTGRES 表中,然后查询并加载它,请执行以下步骤。一个重要的部分是ascii = TRUE,否则在序列化时会产生错误

db <- pgsql_connect #connection to your database

serialized_model <- rawToChar(serialize(model_fit, NULL, ascii=TRUE))

insert_query <-'INSERT INTO table (model) VALUES ($1)'
rs <- dbSendQuery(db, insert_query, list(serialized_model))
dbClearResult(rs)

serialized_model <- dbGetQuery(db, "select model from table order by created_at desc limit 1")

model_fit2 <- unserialize(charToRaw(as.character(serialized_model[,c('model')])))
model_fit2

[100% WORKING - 27 Feb 2020]

Description:
Here are the steps if you want to store your model into a POSTGRES table, then query it and load it. An important part is the ascii = TRUE, which otherwise would produce errors when serializing

db <- pgsql_connect #connection to your database

serialized_model <- rawToChar(serialize(model_fit, NULL, ascii=TRUE))

insert_query <-'INSERT INTO table (model) VALUES ($1)'
rs <- dbSendQuery(db, insert_query, list(serialized_model))
dbClearResult(rs)

serialized_model <- dbGetQuery(db, "select model from table order by created_at desc limit 1")

model_fit2 <- unserialize(charToRaw(as.character(serialized_model[,c('model')])))
model_fit2
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文