sqldf package in R, querying a data frame
I'm trying to rewrite some code using the sqldf library in R, which should allow me to run SQL queries on data frames. However, whenever I try to run a query, R seems to query the actual MySQL database connection that I use and look for a table with the name of the data frame that I am trying to search.
When I run this:
sqldf("SELECT COUNT(*) from work.class_scores")
I get:
Error in mysqlNewConnection(drv, ...) :
RS-DBI driver: (Failed to connect to database: Error: Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
)
When I try to specify the location in two different ways (the first from the googlecode page, the second which should be right based on the docs):
> sqldf("SELECT COUNT(*) from work.class_scores", sqldf.driver = "SQLite")
Error in sqldf("SELECT COUNT(*) from work.class_scores", sqldf.driver = "SQLite") :
unused argument(s) (sqldf.driver = "SQLite")
> sqldf("SELECT COUNT(*) from work.class_scores", drv = "SQLite")
Loading required package: tcltk
Loading Tcl/Tk interface ... Error : .onLoad failed in loadNamespace() for 'tcltk', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared library '/Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so':
dlopen(/Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so, 10): Library not loaded: /usr/local/lib/libtcl8.5.dylib
Referenced from: /Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so
Reason: image not found
Error: require(tcltk) is not TRUE
So I'm thinking it might be a problem with this package tcltk, which I have never heard of, so I try to take care of that and run into some issues:
> install.packages("tcltk")
Warning in install.packages :
argument 'lib' is missing: using '/Users/michaeldiscenza/Library/R/2.11/library'
Warning in install.packages :
package ‘tcltk’ is not available
> install.packages("tcltk2", lib="/Applications/RStudio.app/Contents/Resources/R/library")
trying URL 'http://lib.stat.cmu.edu/R/CRAN/bin/macosx/leopard/contrib/2.11/tcltk2_1.1-5.tgz'
Content type 'application/x-gzip' length 940835 bytes (918 Kb)
opened URL
==================================================
downloaded 918 Kb
The downloaded packages are in
/var/folders/Y1/Y1gdz9tKFiSnWsGP9+BDcU+++TI/-Tmp-//RtmpL07KTL/downloaded_packages
> library("tcltk")
Loading Tcl/Tk interface ... Error : .onLoad failed in loadNamespace() for 'tcltk', details:
call: dyn.load(file, DLLpath = DLLpath, ...)
error: unable to load shared library '/Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so':
dlopen(/Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so, 10): Library not loaded: /usr/local/lib/libtcl8.5.dylib
Referenced from: /Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so
Reason: image not found
Error: package/namespace load failed for 'tcltk'
Error in !dbPreExists : invalid argument type
Here, I really don't know what the issue is. Do I need to move something around?
Another approach that I tried was, before running the query on the data frame object, setting my database connection so that R would look there rather than trying to connect to the actual local MySQL database. But that didn't work; it led back to the problem with the socket (even though I can query the local DB itself without any issues).
> con <- sqldf()
Error in mysqlNewConnection(drv, ...) :
RS-DBI driver: (Failed to connect to database: Error: Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
)
Eventually, I want to query to get the count of records where the value for C is larger than 2 for example, and I feel comfortable doing that. The only problem is I don't know if there is another way to specify that what I am querying is a data frame and not an actual db. Am I missing something really silly and easy here?
Thanks!
Comments (2)
This answer has been transferred from my earlier comments.
The post and comments indicate that:
(1) it is desired to use SQLite with sqldf even though RMySQL is loaded,
(2) there was a message about tcltk being missing, and
(3) there was a problem regarding
sqldf("select count(*) from work.class_scores")
where work.class_scores is a data frame.
On the sqldf home page, FAQ #7 addresses (1) above and FAQ #5 addresses (2). (3) is due to the fact that the dot is an SQL operator, so such data frame names need to be quoted or else renamed to remove the dot.
Below we provide a reproducible example that implements the above three solutions.
The sqldf.driver option is used to force SQLite to be used even though RMySQL is loaded.
Regarding tcltk, there are three approaches: (i) The gsubfn.engine option causes R code to be used in place of tcltk, so that the tcltk package won't be needed. (ii) Alternatively, install tcltk. (iii) This question was asked when sqldf 0.4-4 was the current version. Now that sqldf 0.4-5 is out, note that additional tcltk package detection has been added, which makes it more likely that sqldf will handle all this automatically, without the user having to set any options or install tcltk. Thus the easiest solution may be to just upgrade to sqldf 0.4-5 or later.
For the dot, we either quote the data frame name that has a dot in it or replace the data frame name with a name not containing a dot.
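The three fixes can be sketched together as follows. This is a minimal sketch, assuming sqldf (with RSQLite) is installed; the contents of work.class_scores and the names n_quoted and n_renamed are made up for illustration and are not the asker's data. The where clause uses the count of records with C larger than 2 mentioned in the question.

```r
# Set both options before loading sqldf so they are in effect when
# sqldf and gsubfn are loaded.
options(sqldf.driver = "SQLite")  # FAQ #7: use SQLite even if RMySQL is loaded
options(gsubfn.engine = "R")      # FAQ #5: use R code instead of tcltk

library(sqldf)

# Hypothetical stand-in for the asker's data frame (illustrative values only).
work.class_scores <- data.frame(C = c(1, 2, 3, 4, 5))

# Option A: quote the dotted name so SQL does not treat the dot as an operator.
n_quoted <- sqldf('select count(*) as n from "work.class_scores" where C > 2')

# Option B: copy the data frame to a name that does not contain a dot.
work_class_scores <- work.class_scores
n_renamed <- sqldf("select count(*) as n from work_class_scores where C > 2")

print(n_quoted)
print(n_renamed)
```

Both queries run against a temporary SQLite database that sqldf creates from the data frame, so no MySQL connection is involved.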
EDIT:
Added info about sqldf 0.4-5.
Can you try installing the tcl package from here? (This assumes you are on a Mac.)