RODBC 查询调优
我正在尝试使用 RODBC 将数据从 Oracle 数据集市提取到 R 中。我已经解决了一些问题,但我有一个基本问题。
如何保留数据库中的格式?
使用以下命令
uapp <- sqlQuery(ch, SQL, stringsAsFactors=F)
我的数据按预期检索,但某些字段在 R 中一次转换为数字和逻辑,而它们应该是字符。如果我使用以下方法,我可以解决这个问题:
uapp <- sqlQuery(ch, SQL, stringsAsFactors=F, as.is=T)
但是当我希望它们成为日期时,我“丢失”了被格式化为日期的列。
I am trying to pull data from our Oracle datamart into R using RODBC. I have been able to work through some of my issues, but I have one basic question.
How can I retain the formatting as it sits in the database?
Using the following command
uapp <- sqlQuery(ch, SQL, stringsAsFactors=F)
My data are retrieved as expected, but some fields are converted to numeric and logical once in R when they should be characters. I can get around this if I use:
uapp <- sqlQuery(ch, SQL, stringsAsFactors=F, as.is=T)
But I "lose" columns being formatted as dates when I would like them to be.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
你也陷入了我也曾挣扎过的困境。我无法评论什么是“正确”或“最好”,而只能评论我过去所做的事情。
我通常会像您在第一个示例中所做的那样,只在类型更改进入 R 后处理类型更改。如果您想执行后一种方法,您可以在日期进入 R 后转换日期。我的 Oracle 系统似乎总是被设置最多以“22-NOV-10”格式返回日期,这解析起来很烦人。因此,我会在查询中使用 Oracle to_char() 函数将日期格式化为 R 可以轻松识别的格式。
因此,例如,我可能在我的 SELECT 语句中包含此内容:
然后我将其放入名为 myData 的数据框中并执行以下操作:
是否处理修复日期或处理修复其他字段实际上取决于您拥有多少个日期字段以及第一种方法弄乱了多少非日期字段。但根据我的经验,我最终要么修复其中一个,要么修复另一个。
使用方法 1 时您可能需要考虑的事项:尝试在 SQL 中使用cast() 来强制字段转换为特定类型。唯一一次我遇到 RODBC 干扰我的数据类型的麻烦是当服务器端的类型不明确时。这通常是 SQL 端 CASE 语句或类似语句的结果。
You're caught in a gap I've struggled with as well. I can't opine on what is "right" or "best" but only what I've done in the past.
I usually do what you did in the first example and just deal with type changes once they get into R. If you wanted to do the latter method, you could convert the date once it was in R. My Oracle systems seem to always be set up to return dates in the "22-NOV-10" format which is annoying as heck to parse. So I would use the Oracle to_char() function in my query to format my dates into something R can easily recognize.
So, for example, I might have this in my SELECT statement:
then I pull that into a data frame called myData and do this:
Whether to deal with fixing dates or deal with fixing other fields really depends on how many date fields you have and how many non-date fields the first method messes up. But in my experience I either end up fixing one, or the other.
Something you might consider when using method 1: Try using cast() in your SQL in order to force a field into a particular type. The only times I've had trouble with RODBC molesting my data types is when the type is ambiguous on the server side. This is usually the result of CASE statements or somesuch on the SQL end.
as.is
参数可以是布尔向量。因此,如果您的结果集由两个日期列和一个字符列组成,您可以执行以下操作:
编辑:按照 Kalin 的建议,您还可以引用应“按原样”的列。例如,
将“按原样”保留第二列和第四列。
The
as.is
argument can be a boolean vector.So if your result set consists of, say, two date columns followed by one character column, you can do:
EDIT: as suggested by Kalin you can also refer to the columns that should be "as is". For example
will leave columns two and four "as is".