Thread pool across AppDomains without context switches
Our application is heavily CPU-bound: it has to process billions of tuples of data as fast as possible. It already uses multiple cores and will soon be distributed across machines in the cloud.
So the goal is to use the CPU as efficiently as possible. This question is about how to maintain that level of performance while allowing plugins to be loaded and unloaded dynamically at run time.
Please understand that while it's easy to communicate across AppDomains, none of the "easy" ways will meet the performance requirements above. So this question covers why the common techniques are too slow, our further requirements, and specific questions about the ideas we have for solving this.
To achieve this performance, the application is designed so that components communicate via "message passing". We have a user-mode task scheduler that keeps its threads busy without (except when necessary) giving up time slices or incurring context switches in the operating system.
The ONLY way to unload DLLs in .NET is via AppDomains. However, maintaining the performance described above means that the thread pool (we have our own home-grown thread pool) must be able to run tasks in many different AppDomains.
More specifically, performance will be terrible if each of dozens of AppDomains has its own threads that then compete for CPU. For CPU-bound work, having more threads than cores kills performance because the operating system spends a tremendous amount of time context switching among threads.
After preliminary research, there seems to be no efficient way for our own thread pool to cross into another AppDomain with any kind of decent performance. Remoting and serialization are both impossibly slow, even for methods with zero arguments.
Please note that this is not about data sharing. We don't want to use cross-AppDomain calls to pass data. Instead, we expect to share data among AppDomains via sockets and memory-mapped files for top-flight performance, just like proper interprocess communication. So this question is only about getting threads to work across AppDomains.
The following question here on Stack Overflow, which is over 2 years old, hints at capitalizing on the CLR's built-in thread pool and states that it crosses into other AppDomains to run tasks. MS documentation also confirms that the CLR thread pool operates across all AppDomains.
.Net How to create a custom ThreadPool shared across all the AppDomain of a process?
Still, after reading the documentation: how can we use the built-in thread pool across AppDomains while NEVER incurring a context switch when crossing AppDomains?
So the design goal is to rotate the threads (one per core) so that each frequently checks the "run queue" of tasks in an AppDomain to see if there is work to do there, then moves on to the next AppDomain, and so on, looping through each AppDomain's scheduler. How can we do that without incurring any context-switching overhead, remoting, or marshalling?
Note, of course, that we'll apply some cleverness in deciding which AppDomains are assigned to each thread, to avoid L1 cache misses and similar hardware bottlenecks.
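The polling loop described above could be sketched roughly as follows. This is only an illustration of the scheduling shape, written in C++ because it needs nothing beyond the standard library. Ordinary in-process queues stand in for the per-AppDomain run queues, so the sketch deliberately sidesteps the actual problem of entering another AppDomain cheaply. `RunQueue` and `run_until_drained` are hypothetical names, and the fixed "queue i belongs to worker i % worker_count" assignment is just one assumed way of pinning domains to threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one plugin domain's run queue. In the real
// system each queue would live inside its own AppDomain; here it is an
// ordinary in-process object, so no domain boundary is crossed.
class RunQueue {
public:
    void push(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }

    // Returns false right away when the queue is empty, so a worker never
    // waits for work to arrive on any single domain.
    bool try_pop(std::function<void()>& task) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        task = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> tasks_;
};

// Runs the queued tasks on `worker_count` threads and returns how many
// tasks were executed. Queue i is only ever served by worker
// i % worker_count (an assumed, very simple affinity scheme).
std::size_t run_until_drained(std::vector<std::unique_ptr<RunQueue>>& queues,
                              unsigned worker_count) {
    if (worker_count == 0) worker_count = 1;  // hardware_concurrency() may report 0
    std::atomic<std::size_t> executed{0};
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < worker_count; ++w) {
        workers.emplace_back([&queues, &executed, w, worker_count] {
            bool found_work = true;
            // A real scheduler would keep polling; this sketch stops after
            // one full pass that finds nothing, so that it terminates.
            while (found_work) {
                found_work = false;
                for (std::size_t i = w; i < queues.size(); i += worker_count) {
                    assert(queues[i] != nullptr);
                    std::function<void()> task;
                    // At most one task per visit, then on to the next domain.
                    if (queues[i]->try_pop(task)) {
                        task();
                        ++executed;
                        found_work = true;
                    }
                }
            }
        });
    }
    for (auto& t : workers) t.join();
    return executed.load();
}
```

A caller would fill one `RunQueue` per domain and then call something like `run_until_drained(queues, std::thread::hardware_concurrency())` to get one worker per core. Because a worker exits after its first empty pass, tasks pushed later (for example by tasks running on another worker) can be missed. A production scheduler would need a proper idle and shutdown policy instead.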
Another idea we are wondering about is writing our own custom CLR host. It appears that the C++ API allows implementing our own thread pool. Does anyone know whether that would allow the capabilities above? If so, is unmanaged code the only way to do it?