How do I keep the row order with SqlBulkCopy?

Posted 2024-07-06 15:18:56

I'm exporting data programmatically from Excel to SQL Server 2005 using SqlBulkCopy. It works great; the only problem I have is that it doesn't preserve the row sequence I have in the Excel file. I don't have a column to order by, I just want the records to be inserted in the same order they appear in the Excel spreadsheet.

I can't modify the Excel file, and have to work with what I've got. Sorting by any of the existing columns will break the sequence.

Please help.

P.S. I ended up inserting an ID column into the spreadsheet; it looks like there's no way to keep the order during export/import.

Answers (4)

清醇 2024-07-13 15:18:56

I don't think that row ordering is specified or guaranteed by SQL unless you use an "ORDER BY" clause.

From a post by Bill Vaughn (http://betav.com/blog/billva/2008/08/sql_server_indexing_tips_and_t.html):

Using Order By: Even when a table has a clustered index (which stores the data in physical order), SQL Server does not guarantee that rows will be returned in that (or any particular) order unless an ORDER BY clause is used.

Another link with info:

http://sqlblogcasts.com/blogs/simons/archive/2007/08/21/What-is-the-position-of-a-row--.aspx

留蓝 2024-07-13 15:18:56

After lots of research it seems evident that there's no way to retain row order with the Bulk Insert command as Microsoft has written it. You either have to add an ID column yourself directly in the import file, use a shell or other external script, or do without. It seems like it would be a needed (and easy) feature for Microsoft to add, but after more than a decade of nothing from them, it's not going to happen.

Yet I needed to preserve the actual record order from the import file after importing, because records higher up would supersede those lower down if a set column had the same value.

So I went a different route. My constraints were:

  • I couldn't change the source file at all. (and set a bad precedent!)
  • I couldn't use an external script. Too complicated. It had to be a simple T-Sql based solution, no CMD executions. This needed to go into a single procedure so it could be automated.

I liked the logic of using Powershell to create ordered insert statements for each row, then running them in SQL. It was essentially queuing each record up for an individual insert rather than a BULK insert. Yes, it would work, but it would also be very slow. I often have files with 500K+ rows in them. I needed something FAST.

So I ran across XML. Bulk upload the file directly into a single XML variable. This would retain the order of the records as each is added to the XML. Then parse the XML variable and insert the results into a table, adding an identity column at the same time.

There is an assumption that the import file is a standard text file, with each record ending in a carriage return + line feed (Char(13)+Char(10)).

My approach has 2 steps:

  1. Execute the IMPORT SQL statement (using OPENROWSET), encapsulating each record with XML tags. Capture the results into an XML variable.

  2. Parse the variable by the XML tags into a table, adding an incrementing [ID] column.

    ---------------------------------
    Declare @X xml;
    ---------------------------------
    SELECT @X=Cast('<X>'+Replace([BulkColumn],Char(13)+Char(10),'</X><X>')+'</X>' as XML)
    FROM OPENROWSET (BULK N'\\FileServer\ImportFolder\ImportFile_20170120.csv',SINGLE_CLOB) T
    ---------------------------------
    SELECT [Record].[X].query('.').value('.','varchar(max)') [Record]
    ,ROW_NUMBER() OVER (ORDER BY (SELECT 100)) [ID]
    --Into #TEMP 
    FROM @X.nodes('X') [Record](X);
    ---------------------------------
    
    • The XML tags replace each CR+LF line break.

    • If the file ends with a line break, this will cause a blank row to be added at the end. Simply delete the last row.

I wrote this into my procedure using dynamic SQL so I could pass in the FileName and set the ID to begin at 1 or 0 (in case there's a header row).

I was able to run this against a file of 300K records in about 5 seconds.
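Stripped of the T-SQL, the core of the trick is just "split on CR+LF and number the pieces in file order". A minimal Python sketch of that one step (the function name and the trailing-row cleanup are illustrative, not from the procedure above):

```python
def number_records(raw: str) -> list[tuple[int, str]]:
    """Split a CRLF-delimited file body into (ID, record) pairs,
    numbered in original file order starting at 1."""
    records = raw.split("\r\n")
    # A file that ends in CRLF yields one empty trailing record --
    # drop it, mirroring the "simply delete the last row" note above.
    if records and records[-1] == "":
        records.pop()
    return list(enumerate(records, start=1))

print(number_records("a,1\r\nb,2\r\n"))  # -> [(1, 'a,1'), (2, 'b,2')]
```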

夜巴黎 2024-07-13 15:18:56

You might also be able to define an identity column in your table that auto-increments during data load. That way, you can sort on it later when you want the records in the same order again.
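The mechanism is easy to demonstrate with any engine that auto-numbers rows. Below is a hedged sketch using Python's built-in sqlite3 as a stand-in for SQL Server (where the equivalent column would be declared `INT IDENTITY(1,1)` and left for the server to fill during the bulk load); sorting on the auto-assigned key recovers the insert order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY AUTOINCREMENT, val TEXT)")

# Insert rows without specifying id -- the engine numbers them in arrival order.
rows = ["first", "second", "third"]
conn.executemany("INSERT INTO staging (val) VALUES (?)", [(r,) for r in rows])

# Sorting on the auto-assigned key reproduces the original sequence.
restored = [v for (v,) in conn.execute("SELECT val FROM staging ORDER BY id")]
print(restored)  # -> ['first', 'second', 'third']
```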

メ斷腸人バ 2024-07-13 15:18:56

If you can save the Excel spreadsheet as a CSV, it is very easy to generate a list of INSERT statements with any scripting language, and they will be executed in the exact same order as the spreadsheet. Here's a quick example in Groovy, but any scripting language will do it just as easily:

def file1 = new File('c:\\temp\\yourSpreadsheet.csv')
def file2 = new File('c:\\temp\\yourInsertScript.sql')

def reader = new FileReader(file1)
def writer = new FileWriter(file2)

// transformLine writes the closure's result for each input line,
// so the INSERTs come out in the same order as the CSV rows.
// Note: split(',') is a naive CSV parse -- it breaks on quoted commas.
reader.transformLine(writer) { line ->
    def fields = line.split(',')
    """INSERT INTO table1 (col1, col2, col3) VALUES ('${fields[0]}', '${fields[1]}', '${fields[2]}');"""
}

You can then execute "yourInsertScript.sql" against your database, and the rows will be inserted in the same order as in your spreadsheet.
