Insert/delete operation in Python without MERGE

Posted on 2025-02-13 03:51:22

I am quite new to Python. I have a table that I want to update daily. I receive a CSV file with a large amount of data, about 15,000 entries. Each row from the CSV file has to be inserted into my table, but if a value in the file matches the primary key of an existing row, I want to delete that row from the table and insert the corresponding row from the CSV file instead. For example, if my CSV file looks like this:

001|test1|test11|test111
002|test2|test22|test222
003|test3|test33|test333

and my table has a row with primary key value 002, then that row should be deleted and the corresponding row from the file inserted.

I have no idea how many rows in each day's CSV will have values matching an existing primary key. I know this can be done with a MERGE query, but I am not sure whether it would take longer than other methods. It would also require me to create a temp table and truncate it every time. The same applies if I use WHERE EXISTS: I would need a temp table.

What is the most efficient way to do this task?

I am using Python 2.7.5 and SQL Server 2017.

初相遇 2025-02-20 03:51:22

I think a MERGE statement is the best solution. Create a staging table that matches your target table, truncate it, and load the CSV into it. If your SQL Server instance has access to the file, you can load it with BULK INSERT or OPENROWSET; otherwise, use Python. To move the staged data into the target table, use a MERGE statement.
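For the Python route, the staging load might look roughly like the sketch below. It assumes a DB-API cursor such as one from pyodbc (the driver is not named in this thread, so that is an assumption), `?` parameter markers, a staging table called dbo.stage_MyTable with columns Id, Col1, Col2, Col3, and the pipe-delimited layout from the question. `parse_rows` and `load_stage` are hypothetical helper names.

```python
import csv


def parse_rows(lines):
    """Parse pipe-delimited lines into tuples, skipping blank lines."""
    reader = csv.reader(lines, delimiter="|")
    return [tuple(row) for row in reader if row]


def load_stage(cursor, rows):
    """Empty the staging table, then insert every parsed row into it.

    `cursor` is assumed to be a DB-API cursor (for example from pyodbc)
    that accepts `?` parameter markers.
    """
    cursor.execute("TRUNCATE TABLE dbo.stage_MyTable")
    if rows:
        cursor.executemany(
            "INSERT INTO dbo.stage_MyTable (Id, Col1, Col2, Col3) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Here `lines` would typically be the open CSV file object and `cursor` would come from `connection.cursor()`. With pyodbc, setting `cursor.fast_executemany = True` may speed up the insert if your driver version supports it.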

If your table has the columns Id, Col1, Col2, Col3, the MERGE statement would look something like this:

MERGE INTO dbo.MyTable AS TargetTable
USING
(
    SELECT Id, Col1, Col2, Col3
    FROM dbo.stage_MyTable
) AS SourceTable
ON TargetTable.Id = SourceTable.Id
WHEN MATCHED THEN UPDATE SET
    Col1 = SourceTable.Col1,
    Col2 = SourceTable.Col2,
    Col3 = SourceTable.Col3
WHEN NOT MATCHED BY TARGET THEN INSERT
    (Id, Col1, Col2, Col3)
VALUES
    (SourceTable.Id, SourceTable.Col1, SourceTable.Col2, SourceTable.Col3);

The benefit of this approach is that the MERGE runs as a single atomic statement, so if it fails, for example because the staging data contains duplicate keys, the table is rolled back to its previous state.
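The rollback behaviour can also be made explicit when the statement is issued from Python. This is a minimal sketch, assuming a DB-API connection (such as pyodbc's) with autocommit turned off; `run_merge` and `MERGE_SQL` are hypothetical names, and the SQL is the statement from above.

```python
MERGE_SQL = """
MERGE INTO dbo.MyTable AS TargetTable
USING
(
    SELECT Id, Col1, Col2, Col3
    FROM dbo.stage_MyTable
) AS SourceTable
ON TargetTable.Id = SourceTable.Id
WHEN MATCHED THEN UPDATE SET
    Col1 = SourceTable.Col1,
    Col2 = SourceTable.Col2,
    Col3 = SourceTable.Col3
WHEN NOT MATCHED BY TARGET THEN INSERT
    (Id, Col1, Col2, Col3)
VALUES
    (SourceTable.Id, SourceTable.Col1, SourceTable.Col2, SourceTable.Col3);
"""


def run_merge(connection):
    """Execute the MERGE and commit; roll back and re-raise on any error.

    `connection` is assumed to be a DB-API connection with autocommit off.
    Returns the row count reported by the cursor after the MERGE.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(MERGE_SQL)
        affected = cursor.rowcount
        connection.commit()
        return affected
    except Exception:
        # Undo any partial work so the target table keeps its previous state.
        connection.rollback()
        raise
    finally:
        cursor.close()
```

Loading the staging table and calling `run_merge` on the same connection before committing would put both steps in one transaction, although TRUNCATE TABLE needs ALTER permission on the staging table.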
