How to execute subtasks of a SQL procedure in parallel
I have a process that analyzes audit data from one system to build reporting data for another system. A managing procedure loops over each day to be analyzed and calls an entity-specific procedure with the current iteration's day. Some entities take less than a second to process, while others can take minutes. Running serially, as it does in T-SQL, CPU utilization never rises above 8% on the 16-core server. The entity-specific procedures do not depend on one another; the only requirement is that all of the entities for a given day are complete before the next day is started.
My idea is to have a CLR managing procedure that starts the longer-running procedures for the day on their own threads and then, once the quick ones are done, calls Thread.Join() on the long-running threads to wait for all entities to complete for that day before moving on to the next.
Below is my attempt at the simplest thing that could work, with just one worker thread. Calling Start on that thread does not result in the static method being called: I have set a breakpoint in the HelloWorld method and it is never hit.
I have tried something very similar in a console application and it worked, as does calling the method on the same thread (the commented-out line at the start of AsyncHelloWorld). Is there something different about threading within SQL CLR procedures?
using System.Threading;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [SqlProcedure]
    public static void AsyncHelloWorld()
    {
        // HelloWorld(SqlContext.Pipe);
        var worker = new Thread(HelloWorld);
        worker.Start(SqlContext.Pipe);
        worker.Join();
    }

    public static void HelloWorld(object o)
    {
        var pipe = o as SqlPipe;
        if (pipe != null)
            pipe.Send("Hello World!");
    }
}
Comments (1)
You absolutely cannot do that. A SqlPipe is very strongly tied to the context of the thread you were invoked on. While you can, technically, launch threads from SQLCLR, all interaction with the caller must happen on the original thread. But even so, launching CLR threads inside the SQL hosted environment is a very bad idea (and I won't go into the details of why).
Instead, separate your logic into procedures that can be invoked in parallel, and invoke these procedures in parallel from the client. You can use Asynchronous procedure execution as a pattern for scheduling procedures to be launched asynchronously, and queue-based activation has built-in support for parallelism via the MAX_QUEUE_READERS setting.
But most likely your procedures do not need explicit parallelism. T-SQL loads that can benefit from explicit, user-controlled parallelism are so rare that they are not worth mentioning (not to mention that getting transactional semantics right across parallel tasks is beyond mere mortals). T-SQL can leverage internal statement parallelism to process data in parallel, so there is never a need for explicit parallelism.
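If explicit parallelism does turn out to be warranted, the "invoke these procedures in parallel from the client" suggestion could look roughly like the sketch below. It is only an illustration under assumptions: the connection string, the procedure names (dbo.ProcessEntityA and so on) and the @Day parameter are hypothetical placeholders, not taken from the question, and System.Data.SqlClient is assumed to be available (it ships with .NET Framework; on newer runtimes it is a separate package). Each procedure runs on its own connection, and Task.WhenAll plays the role that Thread.Join() had in the original idea.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelAuditRunner
{
    // Hypothetical placeholders: adjust to the real server and database.
    private const string ConnectionString =
        "Server=.;Database=AuditDb;Integrated Security=true";

    // Hypothetical entity-specific procedures, each assumed to take a @Day parameter.
    private static readonly string[] EntityProcedures =
    {
        "dbo.ProcessEntityA",
        "dbo.ProcessEntityB",
        "dbo.ProcessEntityC"
    };

    // Runs one procedure on its own connection for the given day.
    private static async Task RunProcedureAsync(string procedureName, DateTime day)
    {
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand(procedureName, connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.CommandTimeout = 0; // no timeout: some entities take minutes
            command.Parameters.Add("@Day", SqlDbType.Date).Value = day;

            await connection.OpenAsync();
            await command.ExecuteNonQueryAsync();
        }
    }

    // Processes days in order; every entity for a day must finish
    // before the next day is started.
    public static async Task RunAsync(DateTime firstDay, DateTime lastDay)
    {
        for (var day = firstDay.Date; day <= lastDay.Date; day = day.AddDays(1))
        {
            // ToList() forces all calls to start before we wait on them.
            var tasks = EntityProcedures
                .Select(procedure => RunProcedureAsync(procedure, day))
                .ToList();
            await Task.WhenAll(tasks);
        }
    }

    public static void Main()
    {
        // Example date range only.
        RunAsync(new DateTime(2024, 1, 1), new DateTime(2024, 1, 7))
            .GetAwaiter().GetResult();
    }
}
```

Because each call uses a separate connection, the procedures also run in separate transactions, which is exactly where the transactional-semantics caveat above applies: if one entity fails, the others for that day are not rolled back automatically.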
So you would do better to explain what you are really trying to solve, and perhaps we can help.