RODBC sqlSave() 在违反 PK 时停止插入查询

发布于 2024-10-01 15:32:07 字数 1257 浏览 6 评论 0原文

我开发了一项在线调查,将我的数据存储在 Microsoft SQL 2005 数据库中。我在 R 中对数据编写了一组离群值检查。这些脚本的一般工作流程是:

  1. 使用 sqlQuery() 从 SQL 数据库读取数据
  2. 执行离群值分析
  3. 使用 sqlSave() 将违规受访者写回单独表中的数据库

表 I我回信的结构如下:

CREATE TABLE outliers2(
    modelid int
    , password varchar(50)
    , reason varchar(50),
Constraint PK_outliers2 PRIMARY KEY(modelid, reason)
)
GO

如您所见,我已将主键设置为 modelid 和 Reason。同一受访者可能是多次检查的异常值,但我不想为任何受访者插入相同的 modelid 和原因组合。

由于我们仍在收集数据,因此当我开发根据数据进行估计的模型时,我希望能够每天/每周更新这些脚本。以下是我正在使用的 sqlSave() 命令的一般形式:

sqlSave(db, db.insert, "outliers2", append = TRUE, fast = FALSE, rownames = FALSE)

其中 db 是有效的 ODBC 连接,db.insert 具有 表单

> head(db.insert)
  modelid password          reason
1     873       abkd WRONG DIRECTION
2     875       ab9d WRONG DIRECTION
3     890       akdw WRONG DIRECTION
4     905       pqjd WRONG DIRECTION
5     941       ymne WRONG DIRECTION
6     944       okyt WRONG DIRECTION

sqlSave() 尝试插入违反主键约束的行并且不继续插入其他记录时, sqlSave() 会阻塞。我本以为设置 fast = FALSE 会缓解这个问题,但事实并非如此。

关于如何解决这个问题有什么想法吗?我总是可以在第一个脚本的开头删除该表,但这似乎相当严厉,并且无疑会导致以后出现问题。

I have developed an online survey that stores my data in a Microsoft SQL 2005 database. I have written a set of outlier checks on my data in R. The general workflow for these scripts is:

  1. Read data from SQL database with sqlQuery()
  2. Perform outlier analysis
  3. Write offending respondents back to database in separate table using sqlSave()

The table I am writing back to has the structure:

CREATE TABLE outliers2(
    modelid int
    , password varchar(50)
    , reason varchar(50),
Constraint PK_outliers2 PRIMARY KEY(modelid, reason)
)
GO

As you can see, I've set the primary key to be modelid and reason. The same respondent may be an outlier for multiple checks, but I do not want to insert the same modelid and reason combo for any respondent.

Since we are still collecting data, I would like to be able to update these scripts on a daily / weekly basis as I develop the models I am estimating on the data. Here is the general form of the sqlSave() command I'm using:

sqlSave(db, db.insert, "outliers2", append = TRUE, fast = FALSE, rownames = FALSE)

where db is a valid ODBC Connection and db.insert has the form

> head(db.insert)
  modelid password          reason
1     873       abkd WRONG DIRECTION
2     875       ab9d WRONG DIRECTION
3     890       akdw WRONG DIRECTION
4     905       pqjd WRONG DIRECTION
5     941       ymne WRONG DIRECTION
6     944       okyt WRONG DIRECTION

sqlSave() chokes when it tries to insert a row that violates the primary key constraint and does not continue with the other records for the insert. I would have thought that setting fast = FALSE would have alleviated this problem, but it doesn't.

Any ideas on how to get around this problem? I could always drop the table at the beginning of the first script, but that seems pretty heavy handed and will undoubtedly lead to problems down the road.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

So尛奶瓶 2024-10-08 15:32:07

在这种情况下,一切都按预期进行。您将所有内容批量上传,SQL Server 一旦发现错误就会停止该批处理。不幸的是,我不知道有什么优雅的内置解决方案。但是,我认为可以在数据库中构建一个系统来更有效地处理这个问题。我喜欢在数据库中而不是在 R 中进行数据存储/管理,因此我的解决方案非常依赖数据库。其他人可能会为您提供更面向 R 的解决方案。

首先,创建一个没有约束的简单表来保存新行并相应地调整 sqlSave 语句。这是 R 将信息上传到的位置。

CREATE TABLE tblTemp(
    modelid int
    , password varchar(50)
    , reason varchar(50)
    , duplicate int()
)
GO

将信息放入此表的查询应假设“重复”列为“否”。我使用一种模式,其中 1=Y & 5=N。您也可以只标记那些异常值,但我倾向于明确我的逻辑。

您还需要一个地方来转储所有违反异常值 PK 的行。

CREATE TABLE tblDuplicates(
    modelid int
    , password varchar(50)
    , reason varchar(50)
)
GO

好的。现在您需要做的就是创建一个触发器以将新行从 tblTemp 移动到 outliers2。该触发器会将所有重复行移动到 tblDuplicates 以便以后处理、删除等。

CREATE TRIGGER FindDups
ON tblOutliersTemp
AFTER INSERT
AS 

我不会遍历并编写整个触发器。我没有 SQL Server 2005 来测试它,我可能会犯语法错误,并且我不想给您错误的代码,但触发器需要执行以下操作:

  1. 识别 tblTemp 中违反的所有行异常值2中的PK。如果发现重复项,请将重复项更改为 1。这可以通过 UPDATE 语句来完成。
  2. 将重复项 = 1 的所有行复制到 tblDuplicates。您可以使用 INSERT INTO tblDuplicates 来执行此操作......
  3. 现在使用 INSERT INTO 语句将非重复行复制到 outliers2,该语句看起来几乎与步骤 2 中使用的语句完全相同。
  4. 从 tblTemp 中删除所有行,以清除它等待您的下一批更新。这一步很重要。

这样做的好处是 sqlSave() 不会仅仅因为你违反了你的 PK 而出错,并且你可以稍后处理比赛,比如明天。 :-)

In this case, everything is working as expected. You uploading everything as a batch and SQL Server is stopping the batch as soon as it finds an error. Unfortunately, I don't know of a graceful built-in solution. But, I think it is possible to build a system in the database to handle this more efficiently. I like doing data storage/management in databases rather than within R, so my solution is very database heavy. Others may offer you a solution that is more R oriented.

First, create a simple table, without constraints, to hold your new rows and adjust your sqlSave statement accordingly. This is where R will upload the information to.

CREATE TABLE tblTemp(
    modelid int
    , password varchar(50)
    , reason varchar(50)
    , duplicate int()
)
GO

Your query to put information into this table should assume 'No' for the column 'duplicate'. I use a pattern where 1=Y & 5=N. You could also only mark those that are outliers but I tend to prefer to be explicit with my logic.

You will also need a place to dump all rows which violate the PK in outliers2.

CREATE TABLE tblDuplicates(
    modelid int
    , password varchar(50)
    , reason varchar(50)
)
GO

OK. Now all you need to do is to create a trigger to move the new rows from tblTemp to outliers2. This trigger will move all duplicate rows to tblDuplicates for later handling, deletion, whatever.

CREATE TRIGGER FindDups
ON tblOutliersTemp
AFTER INSERT
AS 

I'm not going to go through and write the entire trigger. I don't have a SQL Server 2005 to test it against and I would probably make a syntax error and I don't want to give you bad code, but here's what the trigger needs to do:

  1. Identify all rows in tblTemp that would violate the PK in outliers2. Where duplicates are found, change the duplicates to 1. This would be done with an UPDATE statement.
  2. Copy all rows where duplicate=1 to tblDuplicates. You would do this with an INSERT INTO tblDuplicates ......
  3. Now copy the non-duplicate rows to outliers2 with an INSERT INTO statement that looks almost exactly like the one used in step 2.
  4. DROP all rows from tblTemp, to clear it out for your next batch of updates. This step is important.

The nice part about doing it this way is sqlSave() won't error out just because you have a violation of your PK and you can deal with the matches at a later time, like tomorrow. :-)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文