ADO.NET DataTable/DataRow Thread Safety
Introduction
A user reported to me this morning that he was getting inconsistent results (namely, column values sometimes coming out null when they should not be) from some parallel execution code that we provide as part of an internal framework. This code has worked fine in the past and has not been changed lately, but it got me thinking about the following snippet:
Code Sample
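DataRow newRow;
// First lock: create the row.
lock (ResultTable)
{
    newRow = ResultTable.NewRow();
}
// Unlocked: fill in the column values.
newRow["Key"] = currentKey;
foreach (KeyValuePair<string, object> output in outputs)
{
    // KeyValuePair exposes Key and Value; the key is the column name.
    object resultValue = output.Value;
    newRow[output.Key] = resultValue != null ? resultValue : DBNull.Value;
}
// Second lock: add the row to the table.
lock (ResultTable)
{
    ResultTable.Rows.Add(newRow);
}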
(No guarantees that this compiles; it was hand-edited to mask proprietary information.)
Explanation
We have this cascading type of locking code in other places in our system, and it works fine, but this is the first instance of cascading locking code I have come across that interacts with ADO.NET. As we all know, members of framework objects are usually not thread safe (which is the case in this situation), but the cascading locking should ensure that we are not reading and writing to ResultTable.Rows concurrently. We are safe, right?
Hypothesis
Well, the cascading lock code does not ensure that we are not reading from or writing to ResultTable.Rows at the same time that we are assigning values to columns in the new row. What if ADO.NET uses some kind of buffer for assigning column values that is not thread safe, even when different object types are involved (DataTable vs. DataRow)?
Has anyone run into anything like this before? I thought I would ask here at StackOverflow before beating my head against this for hours on end :)
Conclusion
Well, the consensus appears to be that changing the cascading lock to a full lock has resolved the issue. That is not the result that I expected, but the full lock version has not produced the issue after many, many, many tests.
The lesson: be wary of cascading locks used on APIs that you do not control. Who knows what may be going on under the covers!
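For comparison, here is a minimal sketch of what the "full lock" version might look like. The helper names (FullLockSketch, AddResultRow, BuildTable) and the column names are made up for illustration; the only point is that the row is created, filled, and added inside a single lock, so no write to the row can overlap with another write to the same table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;

public static class FullLockSketch
{
    // Hypothetical helper: creates, fills, and adds the row under ONE lock,
    // instead of releasing the lock while the column values are assigned.
    public static void AddResultRow(DataTable resultTable, object currentKey,
                                    IEnumerable<KeyValuePair<string, object>> outputs)
    {
        lock (resultTable)
        {
            DataRow newRow = resultTable.NewRow();
            newRow["Key"] = currentKey;
            foreach (KeyValuePair<string, object> output in outputs)
            {
                // Store DBNull.Value for missing results, as in the original snippet.
                newRow[output.Key] = output.Value ?? DBNull.Value;
            }
            resultTable.Rows.Add(newRow);
        }
    }

    // Demo only: fills a small table from many threads at once.
    // The "Key" and "Value" columns are assumptions made for this sketch.
    public static DataTable BuildTable(int rowCount)
    {
        var table = new DataTable("Results");
        table.Columns.Add("Key", typeof(int));
        table.Columns.Add("Value", typeof(object));

        Parallel.For(0, rowCount, i =>
        {
            var outputs = new Dictionary<string, object> { { "Value", i * 2 } };
            AddResultRow(table, i, outputs);
        });
        return table;
    }
}
```

The trade-off is that the whole row construction is serialized, so this gives up whatever parallelism the cascading version was trying to keep for the column assignments.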
Comments (4)
Allen,
I could not find any specific problems with your approach, not that my testing was exhaustive. Here are some ideas that we stick with (all of our applications are thread centric):
Whenever possible:
[1] Make all data access completely atomic, since data sharing in multi-threaded applications is an excellent place for all kinds of unforeseen thread interaction.
[2] Avoid locking on a type. If the type is not known to be thread safe, write a wrapper.
[3] Include structures that allow for the fast identification of threads that are accessing a shared resource. If system performance allows, log this information above the debug level and below usual operation log levels.
[4] Any code, including System.* et al., not explicitly documented internally as Thread Safe Tested is not Thread Safe. Hearsay and the word of others do not count. Test it and write it down.
Hope this is of some value.
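As one possible reading of points [1] and [2], a wrapper could keep the DataTable private and route every access through a single private lock object. This is only a sketch; the class name SynchronizedResultTable and its members are hypothetical and are not taken from the question.

```csharp
using System;
using System.Data;

// Hypothetical wrapper: the DataTable never leaves this class, and every
// access goes through one private lock object that outside code cannot lock on.
public sealed class SynchronizedResultTable
{
    private readonly object _gate = new object();
    private readonly DataTable _table = new DataTable("Results");

    public SynchronizedResultTable(params string[] columnNames)
    {
        foreach (string name in columnNames)
        {
            _table.Columns.Add(name, typeof(object));
        }
    }

    // Adds one row atomically; values are matched to columns by position.
    public void AddRow(params object[] values)
    {
        lock (_gate)
        {
            _table.Rows.Add(values);
        }
    }

    public int Count
    {
        get { lock (_gate) { return _table.Rows.Count; } }
    }

    // Returns a copy so that callers never read the shared table directly.
    public DataTable Snapshot()
    {
        lock (_gate)
        {
            return _table.Copy();
        }
    }
}
```

Callers would use something like new SynchronizedResultTable("Key", "Value") and then call AddRow(key, value) from any thread, without having to know which lock protects the table.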
I read an article once that said the internals use a common row for insert operations in the DataTable. Multiple threads creating new records at the same time will overlay data on the common row and corrupt each other's data, causing the problem. The fix is to lock the table when adding rows so that only one thread can add a new row at a time.
Your code looks fine to me too, but I suggest that you lock on ResultTable.Rows.SyncRoot before adding the newly created row, so that the rest of the ResultTable object is free to be accessed by other threads.
This bit of .NET may have changed in the past seven (!) years but, to answer the question, the hypothesis of column value buffering is incorrect as of .NET 4.7.1. From a look at the source in corefx/DataRow.cs, the issue is a race condition around the _tempRecord field, which stores the row's position in the data table. This field can potentially be modified by any write that triggers a call to BeginEditInternal(), which includes value updates. When two writes collide, one can end up following the value of _tempRecord set by the other and therefore update a different row than expected. This is consistent with Microsoft's documentation stating any write must be synchronized (emphasis added). Tony's earlier answer describes a subset of this behaviour.
As an example, I recently broke code following the locking approach shown in the code sample above by making performance improvements. The code was stable and ran without issue for 1.5 years but, somewhere above 2000 new rows per second, at least one out of a few tens of thousands of writes consistently ends up on the wrong row.
One possible fix is to lock on every write, but group the writes so that the number of locks, and therefore the performance impact, stays small. Another is to give each thread its own table to update and merge the results later. In my case, the performance-critical section had been a candidate to move off DataTable for some time, so it got recoded with more scalable data structures.
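The second fix mentioned above (one table per thread, merged afterwards) could be sketched roughly as follows. The class name, the schema, and the use of Parallel.For with thread-local state are assumptions made for this illustration, not the code from the answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Threading.Tasks;

public static class PerThreadTablesSketch
{
    // Hypothetical schema; every partial table is built with the same columns.
    public static DataTable CreateSchema()
    {
        var table = new DataTable("Results");
        table.Columns.Add("Key", typeof(int));
        table.Columns.Add("Value", typeof(int));
        return table;
    }

    public static DataTable Build(int rowCount)
    {
        var partials = new ConcurrentBag<DataTable>();

        Parallel.For(0, rowCount,
            () => CreateSchema(),               // one private table per worker
            (i, loopState, local) =>
            {
                local.Rows.Add(i, i * 2);       // no lock: only this worker writes to 'local'
                return local;
            },
            local => partials.Add(local));      // hand the finished partial table over

        // Merge on a single thread once all workers are done.
        DataTable result = CreateSchema();
        foreach (DataTable partial in partials)
        {
            result.Merge(partial);
        }
        return result;
    }
}
```

Because the tables have no primary key, Merge appends the rows from each partial table. Row order in the merged table is not deterministic, so this only fits cases where order does not matter or is restored by sorting afterwards.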