SQL Server 和 更新(或插入)并行性

发布于 2024-07-29 08:29:36 字数 401 浏览 7 评论 0原文

我收到了一项大型转换工作 - 将数据库中已有的 299Gb JPEG 图像转换为缩略图,以用于报告和带宽目的。

我编写了一个线程安全的 SQLCLR 函数来完成图像重新采样的工作,干得漂亮。

问题是,当我在 UPDATE 语句(从 PhotoData 字段到 ThumbData 字段)中执行它时,它会线性执行以防止竞争条件,仅使用一个处理器对图像进行重新采样。

那么,我如何最好地利用该数据库机器拥有的 12 个核心和 phat raid 设置呢? 是在update语句的FROM子句中使用子查询吗? 这就是启用此类操作的并行性所需的全部内容吗?

无论如何,操作被分成批次,每批次大约 4000 个图像(在大约 391k 图像的窗口查询中),这台机器有足够的资源可以消耗。

I got a large conversion job- 299Gb of JPEG images, already in the database, into thumbnail equivalents for reporting and bandwidth purposes.

I've written a thread safe SQLCLR function to do the business of re-sampling the images, lovely job.

Problem is, when I execute it in an UPDATE statement (from the PhotoData field to the ThumbData field), this executes linearly to prevent race conditions, using only one processor to resample the images.

So, how would I best utilise the 12 cores and phat raid setup this database machine has? Is it to use a subquery in the FROM clause of the update statement? Is this all that is required to enable parallelism on this kind of operation?

Anyway the operation is split into batches, around 4000 images per batch (in a windowed query of about 391k images), this machine has plenty of resources to burn.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

圈圈圆圆圈圈 2024-08-05 08:29:36

请检查 SQL Server 上最大并行度 (MAXDOP) 的配置设置。 您还可以设置 MAXDOP 的值。

此链接可能对您有用 http://www.mssqltips.com/tip.asp ?tip=1047

干杯

Please check the configuration setting for Maximum Degree of Parallelism (MAXDOP) on your SQL Server. You can also set the value of MAXDOP.

This link might be useful to you http://www.mssqltips.com/tip.asp?tip=1047

cheers

白昼 2024-08-05 08:29:36

您不能将查询拆分为多个批次,并在单独的连接上单独执行每个批次吗? SQL Server 仅在需要时才在查询中使用并行性,尽管您可以通过将并行性选项的成本阈值更改为 O 来停止它,甚至鼓励它(一点点),但我认为它相当碰运气。

值得注意的一件事是,它只会在编译查询时决定是否使用并行性。 另外,如果查询是在 CPU 负载较高的时候编译的,SQL Server 不太可能考虑并行性。

Could you not split the query into batches, and execute each batch separately on a separate connection? SQL server only uses parallelism in a query when it feels like it, and although you can stop it, or even encourage it (a little) by changing the cost threshold for parallelism option to O, but I think its pretty hit and miss.

One thing thats worth noting is that it will only decide whether or not to use parallelism at the time that the query is compiled. Also, if the query is compiled at a time when the CPU load is higher, SQL server is less likely to consider parallelism.

一笔一画续写前缘 2024-08-05 08:29:36

我也推荐 kragen2uk 和 onupdatecascade 提倡的“循环”方法(我正在投票支持它们)。 我知道我读过一些关于 CLR 例程和 SQL 并行性的令人恼火的内容,但我忘记了刚才是什么......但我认为它们不能很好地协同工作。

我过去在类似任务中所做的工作是建立一个表格,列出每批要完成的工作。 对于您启动的每个连接,它都会转到此表,获取下一批,将其标记为正在处理,处理它,将其更新为“完成”,然后重复。 这使您可以衡量性能、管理扩展、允许停止和重新启动而无需重新开始,并为您提供一些信息来显示任务的完成程度(更不用说显示它实际上正在做任何事情)。

I too recommend the "round-robin" methodology advocated by kragen2uk and onupdatecascade (I'm voting them up). I know I've read something irritating about CLR routines and SQL paralellism, but I forget what it was just now... but I think they don't play well together.

The bit I've done in the past on similar tasks it to set up a table listing each batch of work to be done. For each connection you fire up, it goes to this table, gest the next batch, marks it as being processed, processes it, updates it as Done, and repeats. This allows you to gauge performance, manage scaling, allow stops and restarts without having to start over, and gives you something to show how complete the task is (let alone show that it's actually doing anything).

纸伞微斜 2024-08-05 08:29:36

找到一些标准将集合分成不同的行子集(1-100、101-200,等等),然后同时从多个连接调用更新语句,其中每个连接处理表中的一个行子集。 所有连接应并行运行。

Find some criteria to break the set into distinct sub-sets of rows (1-100, 101-200, whatever) and then call your update statement from multiple connections at the same time, where each connection handles one subset of rows in the table. All the connections should run in parallel.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文