在数据仓库场景中使用WITH(NOLOCK)有什么缺点

发布于 2024-07-25 03:28:09 字数 390 浏览 11 评论 0原文

我有一个 Kimball 风格的 DW(星型模型中的事实和维度 - 没有迟到的事实行或列,没有列在维度上发生变化,除了作为类型 2 缓慢变化维度的一部分到期之外),每天进行繁重的处理来插入和更新行(新日期)以及月度和每日报告流程。 事实表按日期分区,以便轻松滚滚旧数据。

我知道 WITH(NOLOCK) 可能会导致读取未提交的数据,但是,我也不希望创建任何会导致 ETL 进程失败或阻塞的锁。

在所有情况下,当我们从 DW 读取数据时,我们从事实表中读取不会更改的日期(事实表按日期分区)和维度表,这些数据表所链接的事实的属性不会发生更改。

那么 - 有什么缺点吗? - 可能在执行计划中或在此类仅在同一表上并行运行的SELECT查询的操作中。

I have a Kimball-style DW (facts and dimensions in star models - no late-arriving facts rows or columns, no columns changing in dimensions except expiry as part of Type 2 slowly changing dimensions) with heavy daily processing to insert and update rows (on new dates) and monthly and daily reporting processes. The fact tables are partitioned by the dates for easy rolloff of old data.

I understand the WITH(NOLOCK) can cause uncommitted data to be read, however, I also do not wish to create any locks which would cause the ETL processes to fail or block.

In all cases, when we are reading from the DW, we are reading from fact tables for a date which will not change (the fact tables are partitioned by date) and dimension tables which will not have attributes changing for the facts they are linked to.

So - are there any disadvantages? - perhaps in the execution plans or in the operation of such SELECT-only queries running in parallel off the same tables.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

请止步禁区 2024-08-01 03:28:09

这就是您可能需要的:

`ALTER DATABASE AdventureWorks
设置 READ_COMMITTED_SNAPSHOT ON;

更改数据库 AdventureWorks
将 ALLOW_SNAPSHOT_ISOLATION 设置为 ON;
`

然后继续

使用SET TRANSACTION ISOLATION LEVEL READ COMMITTED

在查询中 。 根据 BOL:

READ COMMITTED 的行为取决于 READ_COMMITTED_SNAPSHOT 数据库选项的设置:

如果 READ_COMMITTED_SNAPSHOT 设置为 OFF(默认值),则数据库引擎使用共享锁来防止其他事务在当前事务运行时修改行。读操作。 共享锁还阻止语句读取其他事务修改的行,直到其他事务完成为止。 共享锁的类型决定了何时释放它。 在处理下一行之前释放行锁。 当读取下一页时,页锁被释放,当语句完成时,表锁被释放。

如果 READ_COMMITTED_SNAPSHOT 设置为 ON,则数据库引擎使用行版本控制为每个语句提供事务上一致的数据快照(与语句开始时存在的数据相同)。 锁不用于保护数据免遭其他事务的更新。

希望这有帮助。
拉吉

This is what you probably need:

`ALTER DATABASE AdventureWorks
SET READ_COMMITTED_SNAPSHOT ON;

ALTER DATABASE AdventureWorks
SET ALLOW_SNAPSHOT_ISOLATION ON;
`

Then go ahead and use

SET TRANSACTION ISOLATION LEVEL READ COMMITTED

in your queries. According to BOL:

The behavior of READ COMMITTED depends on the setting of the READ_COMMITTED_SNAPSHOT database option:

If READ_COMMITTED_SNAPSHOT is set to OFF (the default), the Database Engine uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. The shared lock type determines when it will be released. Row locks are released before the next row is processed. Page locks are released when the next page is read, and table locks are released when the statement finishes.

If READ_COMMITTED_SNAPSHOT is set to ON, the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.

Hope this help.
Raj

剪不断理还乱 2024-08-01 03:28:09

只要都是不更新的数据就没有坏处,但如果也有很多好处,我会感到惊讶。 我想说值得一试。 最糟糕的情况是,如果您正在进行批量插入,您将获得不完整和/或不一致的数据,但您可以决定这是否会使任何有用的内容无效。

As long as it's all no-update data there's no harm, but I'd be surprised if there's much benefit either. I'd say it's worth a try. The worst that will happen is that you'll get incomplete and/or inconsistent data if you are in the middle of a batch insert, but you can decide if that invalidates anything useful.

柠檬色的秋千 2024-08-01 03:28:09

您是否考虑过创建 DW 的数据库快照并运行报告关掉吗?

Have you considered creating a DATABASE SNAPSHOT of your DW and run your reports off it?

风和你 2024-08-01 03:28:09

是的。 你的 SQL 的可读性会大大降低。 您将不可避免地错过一些 NOLOCK 提示,因为使用 NOLOCK 策略的 SQL SELECT 命令必须将其放在各处。

通过设置隔离级别也可以得到同样的效果

< a href="http://msdn.microsoft.com/en-us/library/aa259216(SQL.80).aspx" rel="nofollow noreferrer">设置事务隔离级别读取未提交

最后你获得 10% 的性能提升(抱歉,我太懒了,也查了一下文章,但它就在那里)

我想说 10% 的提升不值得降低可读性。

Yes. Your SQL will be far less readable. You will inevitably miss some NOLOCK hints because SQL SELECT commands using the NOLOCK strategy have to put it all over the place.

You can get the same thing by setting the isolation level

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

In the end you get a 10% performance boost (sorry I'm too lazy too look up the article for it, but it's out there)

I'd say a 10% gain isn't worth reducing readability.

许一世地老天荒 2024-08-01 03:28:09

如果可以使整个数据库只读,那么这是一个更好的选择。 您将获得未提交读的性能,而无需修改所有代码。

ALTER DATABASE adventureworks SET read_only

If making the whole database read-only is possbile, Then this is a better option. You'll get read-uncommitted performance without having to modify all your code.

ALTER DATABASE adventureworks SET read_only
海未深 2024-08-01 03:28:09

NOLOCK 执行“脏读”(不雅的 READ UNCOMMITTED 与 NOLOCK 执行相同的操作)。 如果在您读取时数据库正在更新,则存在返回不一致数据的危险。 唯一的选择是接受锁定并因此阻塞,或者选择 SQL 2005 及以后提供的两个新隔离级别之一 此处讨论

NOLOCK performs a ‘dirty read’ (indecently READ UNCOMMITTED does the same thing as NOLOCK). If the database is being updated as you read there is a danger that you will get inconsistent data back. The only option is to either accept locking and hence blocking, or to pick one of the two new isolation levels offered in SQL 2005 onwards discussed here.

带刺的爱情 2024-08-01 03:28:09

Kimball-DWH 中应该只有一项可以操纵数据的服务 - etl 进程 - 本身。

如果您有完整的端到端 etl 作业,您将永远不会遇到锁(当您正确设置子任务的依赖关系时)。

但是:如果您有独立的作业,这些作业正在端到端更新数据管道,从采购到星星、模型和报告,您需要一个概念来确保共享资源/工件的并发作业的一致性和可访问性。 一个好的建议是对表进行分区,更新克隆表,并在一个短事务中一起切换所涉及表的更新分区(在 etl 过程之后)。 因此主表应该与其他表保持一致并且始终可访问。

这种模式是一种最佳实践,但并非没有障碍 - 如果你用谷歌搜索一下 - 你会同意的。

There should be only one service in a Kimball-DWH that manipulationg data - the etl-process - himself.

If you have a full end-to-end etl-job you will never ever encounter locks (wehen you set the dependecies of the sub-tasks correct).

But: If you have independent jobs, which are updating data-pipelines end-2-end from sourcing up to the stars and models and reports, you need a concept to ensure consistency and accessibility for concurrent jobs sharing ressources/artefacts. A good advice is partitioned tables and updating cloned tables and switch the updated partitions of involved tables in a short transaction together (after the etl process). so the main-table should be consistent with the others and accessible all the time.

this pattern is a best practise but not without stones in your road - if you google a bit - you will agree.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文