Simultaneous SQL BULK INSERTs fail when using the ErrorFile option
Windows Server 2008 R2 Enterprise, SQL Server 2008 X64, SP3, Developer edition
I build and dynamically execute (via sp_executesql) a BULK INSERT command. The general form is:
BULK INSERT #HeaderRowCheck
from "\\Server\Share\Develop\PKelley\StressTesting\101\DataSet.csv"
with
(
lastrow = 1
,rowterminator = '\n'
,tablock
,maxerrors = 0
,errorfile = 'C:\SQL_Packages\TempFiles\#HeaderRowCheck_257626FB-A5CD-41B8-B862-FAF8C591C7A9.log'
)
(The errorfile name is based on a configured local folder, the table being loaded, and a guid generated freshly for every bulk insert run -- it's a subroutine wrapped in its own stored procedure.)
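For illustration, here is a minimal sketch of how that subroutine builds the name (the variable names and folder path are hypothetical, not our actual code):

declare @ErrorFolder nvarchar(260) = N'C:\SQL_Packages\TempFiles\'  -- configured local folder
declare @TargetTable sysname = N'#HeaderRowCheck'                   -- table being loaded
declare @ErrorFile nvarchar(400)
declare @sql nvarchar(max)

-- fresh guid for every bulk insert run
set @ErrorFile = @ErrorFolder + @TargetTable + N'_' + cast(newid() as nvarchar(36)) + N'.log'

set @sql = N'BULK INSERT ' + @TargetTable + N'
from ''\\Server\Share\Develop\PKelley\StressTesting\101\DataSet.csv''
with (lastrow = 1, rowterminator = ''\n'', tablock, maxerrors = 0
     ,errorfile = ''' + @ErrorFile + N''')'

exec sp_executesql @sql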
An outside process (was SQL Agent, is now a WCF service) launches DTEXEC, which starts an SSIS package, which calls stored procedures in the database that loop through sets, build the query, and run it for each. Up to four loads could be running at the same time from/into a given database, and multiple databases on the SQL instance could be running this at the same time – though historically, volume has been low, and we've generally only had one instance of this running at a time. We do this a lot, and it has worked all but flawlessly for well over two years – security is properly configured, necessary files and folders exist, all the usual. (Luck? I like to think not.)
We are now anticipating some serious workloads, so we're doing some stress testing, in which I launch 8 runs, each with four processes, where a set of four will divide up and one by one process the files to be loaded (i.e. up to 32 simultaneous bulk inserts being performed. Like I said, stress testing.) Lo and behold, when launched, one or more will fail during the course of execution, with an error message like:
Error #4861 encountered while loading header information from file "DataSet.csv": Cannot bulk load because the file "C:\SQL_Packages\TempFiles\#HeaderRowCheck_D0070742-76A5-4175-A1A7-16494103EF25.log" could not be opened. Operating system error code 80(The file exists.).
From run to run, the error does not occur for the same file, data set, or point-in-overall-processing.
On the surface, it sounds like two processes are trying to access the same error file, which would mean that they're independently generating the same guid(!). My understanding is that's supposed to be all but impossible. An alternate theory is that so much is going on simultaneously (potentially up to 32 simultaneous BULK INSERT commands running) that SQL and/or the OS is getting confused somehow (I'm a DBA, not a network admin). I could do a workaround, building out my try-catch block to check for error 4861 and retry up to three times, but I'd rather avoid such kludgery.
I have since tossed in a routine that logs the name of the error file (with the guid) to a table before it is used. After many runs and several fails, I see that (a) the failed file + guid is being logged in my table, and (b) there are no duplicate guids being logged.
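(For the record, the logging routine is nothing fancy; a sketch along these lines, with hypothetical table and column names:)

-- table and column names are illustrative only
create table dbo.ErrorFileLog
(
    ErrorFile nvarchar(400) not null
   ,LoggedAt datetime not null default getdate()
)

-- called immediately before each bulk insert runs; a duplicate guid
-- would show up here as a duplicate ErrorFile value (none ever did)
declare @ErrorFile nvarchar(400) = N'C:\SQL_Packages\TempFiles\#HeaderRowCheck_'
                                 + cast(newid() as nvarchar(36)) + N'.log'
insert dbo.ErrorFileLog (ErrorFile) values (@ErrorFile)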
Anyone know what might be going on?
Philip
I opened a case with Microsoft Tech Support, and after no small amount of back-and-forth, Pradeep M.M. (SQL Server Support Technical Lead) worked it all out.
The general process: read in a list of files in a folder, and one by one perform a series of bulk inserts on those files (first to read the first line, which we parse for columns, and then to read data from the second+ lines). All bulk inserts utilize the “ErrorFile” option, so as to provide users with what information we can when their data is mis-formatted. Process has worked for 3+ years, but under recent stress testing conditions (up to 8 simultaneous runs performed by a single SQL Server instance, with all files properly formatted), we got the errors listed above.
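To make the two-pass pattern concrete, here is a simplified sketch (file, folder, and table names are placeholders, and <guid> stands in for the per-run guid):

-- pass 1: read only the header row, which we parse for columns
BULK INSERT #HeaderRowCheck
from '\\Server\Share\SomeFolder\SomeData.csv'
with (lastrow = 1, rowterminator = '\n', tablock, maxerrors = 0
     ,errorfile = 'C:\SQL_Packages\TempFiles\HeaderPass_<guid>.log')

-- pass 2: load the data rows, skipping the header
BULK INSERT dbo.SomeTargetTable
from '\\Server\Share\SomeFolder\SomeData.csv'
with (firstrow = 2, rowterminator = '\n', tablock, maxerrors = 0
     ,errorfile = 'C:\SQL_Packages\TempFiles\DataPass_<guid>.log')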
We initially thought there were errors with generating the GUID, because of that "already open" error, but that idea was eventually discarded -- if newid() weren't functioning properly, a lot more people would be having much more serious issues.
As per Pradeep, here is a Step By Step Process of how Bulk Insert Works:
1. A BULK INSERT command is submitted.
2. During the compilation phase we create a plan for the same.
3. If the command has the ERRORFILE parameter then we will create the ErrorFile.log and ErrorFile.Error.Txt in the folder location specified (the important thing to understand here is that the files will be 0 KB in size).
4. We then close and delete both of those files via Windows API calls.
5. We move to the execution phase and try to execute the BULK INSERT command; as a part of it we will re-create the ErrorFile.log and ErrorFile.Error.Txt in the folder location specified (per the Books Online documentation, the error files should not be present in this location or else we will fail the execution: http://msdn.microsoft.com/en-us/library/ms188365.aspx).
6. Any errors during the bulk insert are logged into the error files created; if there are no errors, these two files are deleted.
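(Step 5 is easy to verify directly: if a file with the ERRORFILE name already exists, the bulk insert itself fails. A quick repro sketch, with illustrative paths and table name:)

-- leave a stale file where the ERRORFILE is about to go
exec xp_cmdshell 'echo stale > C:\SQL_Packages\TempFiles\Stale.log', no_output

BULK INSERT dbo.SomeTargetTable
from '\\Server\Share\SomeFolder\SomeData.csv'
with (errorfile = 'C:\SQL_Packages\TempFiles\Stale.log')
-- fails with error 4861: Operating system error code 80(The file exists.).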
Running ProcMon (Process Monitor) during failed runs revealed that the ErrorFile was successfully created and opened in step 3, but was NOT closed in step 4, resulting in step 5 generating the error we were seeing. (For successful runs, the file was created and closed as expected.)
Further analysis of the ProcMon traces showed that another process running CMD.EXE was issuing "close handle" operations on the file after the bulk insert attempt. We use a routine involving xp_cmdshell to retrieve the list of files to be processed, and that would account for the CMD.EXE process. Here's the kicker:
…there is some business logic which launches CMD.EXE inside SQL Server, and since CMD.EXE is a child process it inherits all the handles opened by the parent process (so probably this is some kind of timing issue wherein CMD.EXE holds handles for files which are open when it got launched, and all those files whose handles are inherited by CMD.EXE cannot be deleted, and can only be released after CMD.EXE is destroyed)
And that was it. A single run never hits this problem, as its xp_cmdshell call is completed before the bulk inserts are issued. But with parallel runs, particularly with many parallel runs (I only hit the problem with 5 or more going), a timing issue occurred such that:
1. Connection 1 runs the stored procedure, which internally uses XP_CMDSHELL and launches CMD.EXE to enumerate the files.
2. Connection 1 then starts the bulk insert activity, and is in the compilation phase for the BULK INSERT command.
3. Connection 1 creates its ERRORFILE during the compilation phase, and will delete it after the compilation phase is done.
4. Meanwhile, Connection 2 runs a stored procedure which internally uses XP_CMDSHELL and launches CMD.EXE to enumerate all the files.
5. That CMD.EXE is a child of the process SQLServr.exe, so by default it inherits all the handles created by SQLServr.exe (so this process gets all the handles for the ERRORFILE that have been created by BULK INSERT in the first connection).
6. Connection 1 finishes compilation, and hence we are trying to delete the file, during which we have to close all the handles. We do see that CMD.EXE is holding a handle to the file and it's still open, and hence we cannot delete the file. So without deleting the file we move on to the execution phase, and in the execution phase we try to create a new ERRORFILE with the same name; but since the file already exists, we fail with the error "Operating system error code 80(The file exists.)."
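For context, our file-list routine is the usual xp_cmdshell pattern, roughly the sketch below (folder path illustrative). Each such call is what spawns the handle-inheriting CMD.EXE:

-- each call spawns a CMD.EXE child of SQLServr.exe, which inherits
-- SQL Server's open file handles (including any in-flight ERRORFILEs)
create table #FileList (FileName nvarchar(260))

insert #FileList (FileName)
exec xp_cmdshell 'dir /b "\\Server\Share\Develop\PKelley\StressTesting\101\*.csv"'

delete #FileList where FileName is null  -- xp_cmdshell returns a trailing NULL row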
My short-term workaround was to (1) implement a retry loop, generating a new ErrorFile name and attempting a new bulk insert up to three times before giving up, and (2) build another routine into our nightly processes to delete any files found in our "ErrorFile folder".
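The retry loop amounts to the following sketch (simplified from the production procedure, with the file path hard-coded for illustration; note RAISERROR rather than THROW, since this is SQL Server 2008):

declare @Attempt int = 1, @Done bit = 0
declare @ErrorFile nvarchar(400), @sql nvarchar(max), @msg nvarchar(2048)

while @Attempt <= 3 and @Done = 0
begin
    begin try
        -- fresh ErrorFile name on every attempt
        set @ErrorFile = N'C:\SQL_Packages\TempFiles\#HeaderRowCheck_'
                       + cast(newid() as nvarchar(36)) + N'.log'
        set @sql = N'BULK INSERT #HeaderRowCheck
from ''\\Server\Share\Develop\PKelley\StressTesting\101\DataSet.csv''
with (lastrow = 1, rowterminator = ''\n'', tablock, maxerrors = 0
     ,errorfile = ''' + @ErrorFile + N''')'
        exec sp_executesql @sql
        set @Done = 1
    end try
    begin catch
        -- retry only on error 4861; re-raise anything else, or the third failure
        if error_number() <> 4861 or @Attempt >= 3
        begin
            set @msg = error_message()
            raiserror(@msg, 16, 1)
            return
        end
        set @Attempt += 1
    end catch
end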
The long-term fix is to revise our code to not list files via xp_cmdshell. This would seem to be feasible, since the whole ETL process is wrapped in and managed by an SSIS package; alternatively, CLR routines could be built and worked in. For now, given our anticipated workload, the workaround is sufficient (particularly given everything else we're working on just now), so it may be a bit before we implement the final fix.
Posted for posterity, in case it ever happens to you!