RODBC sqlSave() 在违反 PK 时停止插入查询
我开发了一项在线调查,将我的数据存储在 Microsoft SQL 2005 数据库中。我在 R 中对数据编写了一组离群值检查。这些脚本的一般工作流程是:
- 使用 sqlQuery() 从 SQL 数据库读取数据
- 执行离群值分析
- 使用 sqlSave() 将违规受访者写回单独表中的数据库
表 I我回信的结构如下:
CREATE TABLE outliers2(
modelid int
, password varchar(50)
, reason varchar(50),
Constraint PK_outliers2 PRIMARY KEY(modelid, reason)
)
GO
如您所见,我已将主键设置为 modelid 和 Reason。同一受访者可能是多次检查的异常值,但我不想为任何受访者插入相同的 modelid 和原因组合。
由于我们仍在收集数据,因此当我开发根据数据进行估计的模型时,我希望能够每天/每周更新这些脚本。以下是我正在使用的 sqlSave()
命令的一般形式:
sqlSave(db, db.insert, "outliers2", append = TRUE, fast = FALSE, rownames = FALSE)
其中 db
是有效的 ODBC 连接,db.insert
具有 表单
> head(db.insert)
modelid password reason
1 873 abkd WRONG DIRECTION
2 875 ab9d WRONG DIRECTION
3 890 akdw WRONG DIRECTION
4 905 pqjd WRONG DIRECTION
5 941 ymne WRONG DIRECTION
6 944 okyt WRONG DIRECTION
当 sqlSave()
尝试插入违反主键约束的行并且不继续插入其他记录时, sqlSave() 会阻塞。我本以为设置 fast = FALSE
会缓解这个问题,但事实并非如此。
关于如何解决这个问题有什么想法吗?我总是可以在第一个脚本的开头删除
该表,但这似乎相当严厉,并且无疑会导致以后出现问题。
I have developed an online survey that stores my data in a Microsoft SQL 2005 database. I have written a set of outlier checks on my data in R. The general workflow for these scripts is:
- Read data from SQL database with sqlQuery()
- Perform outlier analysis
- Write offending respondents back to database in separate table using sqlSave()
The table I am writing back to has the structure:
CREATE TABLE outliers2(
modelid int
, password varchar(50)
, reason varchar(50),
Constraint PK_outliers2 PRIMARY KEY(modelid, reason)
)
GO
As you can see, I've set the primary key to be modelid and reason. The same respondent may be an outlier for multiple checks, but I do not want to insert the same modelid and reason combo for any respondent.
Since we are still collecting data, I would like to be able to update these scripts on a daily / weekly basis as I develop the models I am estimating on the data. Here is the general form of the sqlSave()
command I'm using:
sqlSave(db, db.insert, "outliers2", append = TRUE, fast = FALSE, rownames = FALSE)
where db
is a valid ODBC Connection and db.insert
has the form
> head(db.insert)
modelid password reason
1 873 abkd WRONG DIRECTION
2 875 ab9d WRONG DIRECTION
3 890 akdw WRONG DIRECTION
4 905 pqjd WRONG DIRECTION
5 941 ymne WRONG DIRECTION
6 944 okyt WRONG DIRECTION
sqlSave()
chokes when it tries to insert a row that violates the primary key constraint and does not continue with the other records for the insert. I would have thought that setting fast = FALSE
would have alleviated this problem, but it doesn't.
Any ideas on how to get around this problem? I could always drop
the table at the beginning of the first script, but that seems pretty heavy handed and will undoubtedly lead to problems down the road.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
在这种情况下,一切都按预期进行。您将所有内容批量上传,SQL Server 一旦发现错误就会停止该批处理。不幸的是,我不知道有什么优雅的内置解决方案。但是,我认为可以在数据库中构建一个系统来更有效地处理这个问题。我喜欢在数据库中而不是在 R 中进行数据存储/管理,因此我的解决方案非常依赖数据库。其他人可能会为您提供更面向 R 的解决方案。
首先,创建一个没有约束的简单表来保存新行并相应地调整 sqlSave 语句。这是 R 将信息上传到的位置。
将信息放入此表的查询应假设“重复”列为“否”。我使用一种模式,其中 1=Y & 5=N。您也可以只标记那些异常值,但我倾向于明确我的逻辑。
您还需要一个地方来转储所有违反异常值 PK 的行。
好的。现在您需要做的就是创建一个触发器以将新行从 tblTemp 移动到 outliers2。该触发器会将所有重复行移动到 tblDuplicates 以便以后处理、删除等。
我不会遍历并编写整个触发器。我没有 SQL Server 2005 来测试它,我可能会犯语法错误,并且我不想给您错误的代码,但触发器需要执行以下操作:
这样做的好处是 sqlSave() 不会仅仅因为你违反了你的 PK 而出错,并且你可以稍后处理比赛,比如明天。 :-)
In this case, everything is working as expected. You uploading everything as a batch and SQL Server is stopping the batch as soon as it finds an error. Unfortunately, I don't know of a graceful built-in solution. But, I think it is possible to build a system in the database to handle this more efficiently. I like doing data storage/management in databases rather than within R, so my solution is very database heavy. Others may offer you a solution that is more R oriented.
First, create a simple table, without constraints, to hold your new rows and adjust your sqlSave statement accordingly. This is where R will upload the information to.
Your query to put information into this table should assume 'No' for the column 'duplicate'. I use a pattern where 1=Y & 5=N. You could also only mark those that are outliers but I tend to prefer to be explicit with my logic.
You will also need a place to dump all rows which violate the PK in outliers2.
OK. Now all you need to do is to create a trigger to move the new rows from tblTemp to outliers2. This trigger will move all duplicate rows to tblDuplicates for later handling, deletion, whatever.
I'm not going to go through and write the entire trigger. I don't have a SQL Server 2005 to test it against and I would probably make a syntax error and I don't want to give you bad code, but here's what the trigger needs to do:
The nice part about doing it this way is sqlSave() won't error out just because you have a violation of your PK and you can deal with the matches at a later time, like tomorrow. :-)