查询SPSS文件
R版本:2.12.2 /操作系统:Windows XP
我需要查询和操作的数据存储在SPSS .sav文件中(大小〜112MB),并且我对SPSS语法有非常基本的了解。我对 R 更加熟悉(通常更喜欢用它进行数据操作和分析),因此我将 SPSS 文件保存为制表符分隔文件(大小约为 85MB),并尝试使用 RODBC 包中的功能对其进行查询。
使用 Windows 控制面板上管理工具组中的“数据源 (ODBC)”应用程序,我使用 Microsoft 文本驱动程序设置了一个包含表(制表符分隔文件)的数据库(Windows 文件夹)。
当前感兴趣的表有一个“日期”字段(称为“DateTime”),其中日期格式为“%Y/%m/%d”。当我第一次尝试使用该字段的条件查询数据库时,我返回了一个具有零观察值的结果集。因此,我将“日期时间”更改为数据类型“字符”,并再次尝试查询(无济于事)。
我能够成功查询数据库,即
sqlQuery(channel = cn, query = "SELECT * FROM large_table.txt", as.is = TRUE, stringsAsFactors = FALSE, max = 20)
返回结果。以下查询返回具有零观察值的结果集(我已验证至少有一条记录的“DateTime”字段中的值为“2011/04/01”)。
sqlQuery(channel = cn, query = "SELECT DateTime FROM large_table.txt WHERE DateTime = {d '2011-04-01'}", as.is = TRUE, stringsAsFactors = FALSE)
sqlQuery(channel = cn, query = "从 large_table.txt 中选择日期时间,其中日期时间 = '2011/04/01'", as.is = TRUE, stringsAsFactors = FALSE)
谁能告诉我为什么结果集没有观察值?
非常感谢任何改进我的数据库连接或查询的建议。我更喜欢使用 RODBC 包而不是另一个数据库查询包,但如果有必要,我愿意尝试新的东西。
R version: 2.12.2 / OS: Windows XP
Data that I need to query and manipulate is stored in a SPSS .sav file (size ~ 112MB) and I have a very basic knowledge of SPSS syntax. I am much more comfortable with R (and generally prefer it for data manipulation and analysis), so I saved the SPSS file as a tab-delimited file (size ~ 85MB) and have attempted to query it using functionality from the RODBC package.
Using the 'Data Sources (ODBC)' application within the Administrative Tools group on Windows Control Panel, I setup a database (Windows folder) which has tables (tab-delimited files) with the Microsoft Text Driver.
The current table of interest has a 'Date' field (called 'DateTime') where dates are formatted '%Y/%m/%d'. When I first tried to query the database with conditions on this field, I was returned a result set with zero observations. So, I changed 'DateTime' to datatype 'Character' and tried my query again (to no avail).
I am able to successfully query the database, i.e.
sqlQuery(channel = cn, query = "SELECT * FROM large_table.txt", as.is = TRUE, stringsAsFactors = FALSE, max = 20)
returns results. The following queries return result sets with zero observations (I have verified that there is at least one record with the the value '2011/04/01' in its 'DateTime' field).
sqlQuery(channel = cn, query = "SELECT DateTime FROM large_table.txt WHERE DateTime = {d '2011-04-01'}", as.is = TRUE, stringsAsFactors = FALSE)
sqlQuery(channel = cn, query = "SELECT DateTime FROM large_table.txt WHERE DateTime = '2011/04/01'", as.is = TRUE, stringsAsFactors = FALSE)
Can anyone tell me why the result set has no observations?
Any suggestions to improve my database connection or query are greatly appreciated. I prefer to use the RODBC package instead of another database-query package, but am willing to try something new if necessary.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
通过将 SPSS 文件直接读入 R,您可以省去很多麻烦。
这将返回一个 data.frame,您可以使用所有常规 R 工具进行分析。
作为附带好处,您还将在 data.frame 的属性中包含调查问卷文本。
有关更多详细信息,请参阅
?read.spss
。You can save yourself a lot of hassle by reading the SPSS file directly into R.
This will return a data.frame that you can analyse using all of the normal R tools.
As a side benefit, you will also have the questionnaire text in the attributes of your data.frame.
See
?read.spss
for more details.我会为此使用“memisc”包。为您的 3 个 SPSS 文件设置 spss.system.file() 导入器,使用 memisc 版本的子集将所需的变量加载到 R 中,包括处理用户缺失的级别。
SPSS 文件的设置方式存在一些奇怪之处 - 具有数值变量的 SPSS 文件经常会设置用户缺失值,这可能需要在 R 端排序为具有适当 NA 的区间缩放。
I'd use the 'memisc' package for this. set up spss.system.file() importers for your 3 SPSS files, subset using the memisc version to load in to R just the variables that are needed, including handling user-missing levels.
There are some oddities around how SPSS files are set up - it is fairly frequent for SPSS files with numeric variables to have user-missings set, which may need sorting at the R end into an interval scaling with appropriate NAs.