Is a critical section always faster?
I was debugging a multi-threaded application and came across the internal structure of CRITICAL_SECTION. I found the data member LockSemaphore of CRITICAL_SECTION an interesting one.

It looks like LockSemaphore is an auto-reset event (not a semaphore, as the name suggests), and the operating system creates this event silently the first time a thread waits on a Critical Section that is locked by some other thread.

Now, I am wondering: is a Critical Section always faster? The event is a kernel object, and each Critical Section object is associated with an event object, so how can a Critical Section be faster compared to other kernel objects like a Mutex? Also, how does the internal event object actually affect the performance of the Critical Section?

Here is the structure of CRITICAL_SECTION:
struct RTL_CRITICAL_SECTION
{
    PRTL_CRITICAL_SECTION_DEBUG DebugInfo;
    LONG LockCount;
    LONG RecursionCount;
    HANDLE OwningThread;
    HANDLE LockSemaphore;
    ULONG_PTR SpinCount;
};
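For context, here is a minimal sketch of how this structure is normally used; the fields are opaque to callers and are only manipulated through the Win32 API:

#include <windows.h>

CRITICAL_SECTION g_cs;   // lives in plain process memory, no HANDLE needed
long g_counter = 0;

DWORD WINAPI Worker(LPVOID)
{
    for (int i = 0; i < 100000; ++i)
    {
        EnterCriticalSection(&g_cs);   // uncontended case stays in user mode
        ++g_counter;                   // protected region
        LeaveCriticalSection(&g_cs);
    }
    return 0;
}

int main()
{
    InitializeCriticalSection(&g_cs);
    HANDLE h[2];
    h[0] = CreateThread(NULL, 0, Worker, NULL, 0, NULL);
    h[1] = CreateThread(NULL, 0, Worker, NULL, 0, NULL);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);
    CloseHandle(h[0]);
    CloseHandle(h[1]);
    DeleteCriticalSection(&g_cs);   // also releases the lazily created event
    return 0;
}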
7 Answers
When they say that a critical section is "fast", they mean "it's cheap to acquire one when it isn't already locked by another thread".

[Note that if it is already locked by another thread, then it doesn't matter nearly so much how fast it is.]

The reason why it's fast is that, before going into the kernel, it uses the equivalent of InterlockedIncrement on one of those LONG fields (probably the LockCount field), and if that succeeds it considers the lock acquired without ever having gone into the kernel. The InterlockedIncrement API is, I think, implemented in user mode as a LOCK INC opcode ... in other words, you can acquire an uncontested critical section without doing any ring transition into the kernel at all.
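To make that fast-path idea concrete, here is a toy lock built on the same principle. This is a simplified illustration only, not the real NT algorithm (which handles recursion, spinning, and many corner cases): an interlocked operation in user mode acquires the uncontended lock, and an auto-reset event, playing the role of LockSemaphore, is created lazily only when contention forces a kernel wait.

#include <windows.h>

struct ToyLock
{
    volatile LONG lockCount;  // 0 = free, >0 = held (plus any waiters)
    HANDLE event;             // created lazily, like LockSemaphore
};

void ToyLockInit(ToyLock* l) { l->lockCount = 0; l->event = NULL; }

static HANDLE ToyLockEvent(ToyLock* l)
{
    // Create the auto-reset event on first contention; a loser of the
    // creation race closes its extra handle.
    if (l->event == NULL)
    {
        HANDLE e = CreateEvent(NULL, FALSE, FALSE, NULL);
        if (InterlockedCompareExchangePointer((PVOID volatile*)&l->event,
                                              e, NULL) != NULL)
            CloseHandle(e);
    }
    return l->event;
}

void ToyLockEnter(ToyLock* l)
{
    // Fast path: one LOCK-prefixed increment, no ring transition.
    if (InterlockedIncrement(&l->lockCount) == 1)
        return;
    // Slow path: somebody holds the lock; wait on the kernel event.
    WaitForSingleObject(ToyLockEvent(l), INFINITE);
}

void ToyLockLeave(ToyLock* l)
{
    // If other threads are queued, wake exactly one of them.
    if (InterlockedDecrement(&l->lockCount) > 0)
        SetEvent(ToyLockEvent(l));
}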
In performance work, few things fall into the "always" category :) If you implement something yourself that is similar to an OS critical section using other primitives, the odds are that it will be slower in most cases.

The best way to answer your question is with performance measurements. How OS objects perform is very dependent on the scenario. For example, critical sections are generally considered 'fast' if contention is low. They are also considered fast if the lock time is less than the spin-count time.

The most important thing to determine is whether contention on a critical section is the first-order limiting factor in your application. If not, then simply use a critical section normally and work on your application's primary bottleneck (or bottlenecks).

If critical section performance is critical, then you can consider the following.

In summary - tuning scenarios that have lock contention can be challenging (but interesting!) work. Focus on measuring your application's performance and understanding where your hot paths are. The xperf tools in the Windows Performance Toolkit are your friend here :) We just released version 4.5 in the Microsoft Windows SDK for Windows 7 and .NET Framework 3.5 SP1 (ISO is here, web installer here). You can find the forum for the xperf tools here. V4.5 fully supports Win7, Vista, Windows Server 2008 - all versions.
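If you want a quick starting point for such measurements, a rough single-threaded (uncontended) comparison can look like the sketch below. The iteration count is arbitrary and the printed numbers are only illustrative; real results vary a lot by machine and Windows version:

#include <windows.h>
#include <stdio.h>

int main()
{
    const int N = 1000000;
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);
    HANDLE mtx = CreateMutex(NULL, FALSE, NULL);

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t0);
    for (int i = 0; i < N; ++i)
    {
        EnterCriticalSection(&cs);   // user-mode fast path when uncontended
        LeaveCriticalSection(&cs);
    }
    QueryPerformanceCounter(&t1);
    printf("critical section: %.0f ns per lock/unlock pair\n",
           (t1.QuadPart - t0.QuadPart) * 1e9 / freq.QuadPart / N);

    QueryPerformanceCounter(&t0);
    for (int i = 0; i < N; ++i)
    {
        WaitForSingleObject(mtx, INFINITE);  // kernel transition every time
        ReleaseMutex(mtx);
    }
    QueryPerformanceCounter(&t1);
    printf("mutex:            %.0f ns per lock/unlock pair\n",
           (t1.QuadPart - t0.QuadPart) * 1e9 / freq.QuadPart / N);

    CloseHandle(mtx);
    DeleteCriticalSection(&cs);
    return 0;
}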
Critical sections are faster, but InterlockedIncrement/InterlockedDecrement is faster still. See this usage sample of the LightweightLock implementation (full copy).
The CriticalSection will spin a short while (a few ms) and keep checking whether the lock is free. After the spin count 'times out', it falls back to the kernel event. So in the case where the holder of the lock gets out quickly, you never have to make the expensive transition into kernel code.

EDIT: Went and found some comments in my code: apparently the MS Heap Manager uses a spin count of 4000 (integer increments, not ms).
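Both the spinning behavior and that figure are under your control through documented Win32 calls. A small sketch (the 4000 here just echoes the heap-manager value quoted above, not a recommendation; tune it by measurement):

#include <windows.h>

CRITICAL_SECTION cs;

void InitWithSpin()
{
    // Set the spin count at creation time...
    InitializeCriticalSectionAndSpinCount(&cs, 4000);

    // ...or adjust it later; the previous spin count is returned.
    DWORD oldSpin = SetCriticalSectionSpinCount(&cs, 4000);
    (void)oldSpin;   // note: on single-processor machines the spin count is ignored
}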
Here's a way to look at it:

If there's no contention, then the spin lock is really fast compared to going to kernel mode for a Mutex.

When there is contention, a CriticalSection is slightly more expensive than using a Mutex directly (because of the extra work to detect the spinlock state).

So it boils down to a weighted average, where the weights depend on the specifics of your calling pattern. That being said, if you have little contention, then a CriticalSection is a big win. If, on the other hand, you consistently have lots of contention, then you'd be paying a very small penalty over using a Mutex directly. But in that case, what you'd gain by switching to a Mutex is small, so you'd probably be better off trying to reduce the contention.
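As a purely illustrative back-of-the-envelope version of that weighted average: if the uncontended user-mode path costs about 25 ns, a kernel wait about 1000 ns, and 99% of acquisitions are uncontended, then the critical section averages roughly 0.99 × 25 + 0.01 × 1000 ≈ 35 ns per acquisition, while the mutex pays on the order of the full kernel cost every single time. (The numbers are made up; only the shape of the trade-off matters.)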
A critical section is faster than a mutex because a critical section is not a kernel object; it is part of the memory of the current process. A mutex actually resides in the kernel, and creating a mutex object requires a kernel transition, whereas creating a critical section does not. But even though a critical section is fast, there will still be a kernel transition while using a critical section when threads go into a wait state, because thread scheduling happens on the kernel side.
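That difference in where the two objects live shows up directly in the creation calls; a minimal sketch:

#include <windows.h>

int main()
{
    HANDLE mtx = CreateMutex(NULL, FALSE, NULL); // kernel object: returns a
                                                 // HANDLE, can be named and
                                                 // shared across processes
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);              // plain struct in process
                                                 // memory: no HANDLE, usable
                                                 // only within this process

    CloseHandle(mtx);             // kernel handle must be closed
    DeleteCriticalSection(&cs);   // releases the lazily created event, if any
    return 0;
}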
From my experience and experiments, CRITICAL_SECTION is extremely slow compared to a pthreads implementation. Extremely means around 10 times slower at switching threads when the number of locks/unlocks is large, comparing the same code against a pthread implementation.

I thus never use Critical Sections again; pthreads are also available on MS Windows, and the performance nightmares are finally over.
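For completeness, here is what the pthreads equivalent looks like. The assumption is that a pthreads implementation for Windows is installed (for example pthreads-win32 or MinGW's winpthreads), since Windows does not ship one:

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;

void* worker(void*)
{
    for (int i = 0; i < 100000; ++i)
    {
        pthread_mutex_lock(&m);     // same lock/unlock pattern as above
        ++counter;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

int main()
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld\n", counter);
    return 0;
}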