Does it make sense to use a pool of Actors?
I'm just learning, and really liking, the Actor pattern. I'm using Scala right now, but I'm interested in the architectural style in general, as it's used in Scala, Erlang, Groovy, etc.
The case I'm thinking of is where I need to do things concurrently, such as, let's say "run a job".
With threading, I would create a thread pool and a blocking queue, and have each thread poll the blocking queue, and process jobs as they came in and out of the queue.
With actors, what's the best way to handle this? Does it make sense to create a pool of actors, and somehow send messages to them containing the jobs? Maybe with a "coordinator" actor?
Note: An aspect of the case which I forgot to mention was: what if I want to constrain the number of jobs my app will process concurrently? Maybe with a config setting? I was thinking that a pool might make it easy to do this.
Thanks!
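For concreteness, the thread-pool-plus-blocking-queue setup described above could be sketched in Scala like this (the `ThreadPoolSketch` name, pool size, and job count are my own; note that `Executors.newFixedThreadPool` is already backed internally by a blocking work queue, so you don't have to manage the queue yourself):

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object ThreadPoolSketch {
  // Run `jobs` trivial jobs on a fixed pool of `nThreads` threads;
  // returns how many jobs completed.
  def runJobs(nThreads: Int, jobs: Int): Int = {
    val pool = Executors.newFixedThreadPool(nThreads) // internal blocking queue holds pending jobs
    val done = new AtomicInteger(0)
    for (_ <- 1 to jobs)
      pool.execute(new Runnable {
        def run(): Unit = { done.incrementAndGet(); () } // "run a job"
      })
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    done.get()
  }

  def main(args: Array[String]): Unit =
    println(runJobs(4, 10)) // prints 10
}
```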
3 Answers
A pool is a mechanism you use when the cost of creating and tearing down a resource is high. In Erlang this is not the case so you should not maintain a pool.
You should spawn processes as you need them and destroy them when you have finished with them.
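In Scala terms, that spawn-as-you-need advice might be sketched like this (the `SpawnPerJob` name is made up, and plain JVM threads stand in for Erlang processes — JVM threads are far heavier, so on the JVM this shape is usually expressed with actors or futures instead):

```scala
object SpawnPerJob {
  // One short-lived worker per job, destroyed when its job is done —
  // the Erlang idiom of spawning a process per task.
  def runAll(jobs: Seq[() => Int]): Seq[Int] = {
    val results = new Array[Int](jobs.length)
    val workers = jobs.zipWithIndex.map { case (job, i) =>
      val t = new Thread(new Runnable {
        def run(): Unit = results(i) = job()
      })
      t.start()
      t
    }
    workers.foreach(_.join()) // wait for every worker to finish
    results.toSeq
  }

  def main(args: Array[String]): Unit =
    println(runAll(Seq(() => 1, () => 2, () => 3))) // one worker per job
}
```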
Sometimes it makes sense to limit how many worker processes are operating concurrently on a large task list, because the task each process is spawned to complete involves resource allocation. At the very least processes use up memory, but they could also keep open files and/or sockets, which tend to be limited to only thousands and fail miserably and unpredictably once you run out.
To have a pull-driven task pool, one can spawn N linked processes that ask for a task, and hand each of them a function they can spawn_monitor. As soon as the monitored process has ended, they come back for the next task. Specific needs drive the details, but that is the outline of one approach.
The reason I would let each task spawn a new process is that processes do have some state, and it is nice to start off with a clean slate. It's a common fine-tuning to set the min-heap size of a process so as to minimize the number of GCs needed during its lifetime. It is also a very efficient form of garbage collection to free all of a process's memory and start a fresh process for the next task.
Does it feel weird to use twice the number of processes like that? It's a feeling you need to overcome in Erlang programming.
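A rough Scala analogue of the pull-driven pool described above — N workers that each pull a task, run it, and come back for the next until the queue is drained (the `PullDrivenPool` name is illustrative, and threads again stand in for Erlang's linked, monitored processes):

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object PullDrivenPool {
  // N workers repeatedly pull tasks from a shared queue until it is empty —
  // the pull-driven shape: workers ask for work instead of being handed it.
  def run(nWorkers: Int, tasks: Seq[Runnable]): Unit = {
    val queue = new ConcurrentLinkedQueue[Runnable]()
    tasks.foreach(queue.add)
    val workers = (1 to nWorkers).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          var task = queue.poll()
          while (task != null) { // come back for the next task when done
            task.run()
            task = queue.poll()
          }
        }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
  }
}
```

With this shape, the number of workers bounds concurrency while the queue length is unbounded, which is exactly the separation the question was after.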
There is no best way for all cases. The decision depends on the number, duration, arrival, and required completion time of the jobs.
The most obvious difference between just spawning off actors and using pools is that in the former case your jobs will all finish at nearly the same time, while in the latter case completion times will be spread out over time. The average completion time will be the same, though.
The advantage of just using actors is the simplicity of coding, as it requires no extra handling. The trade-off is that your actors will be competing for your CPU cores. You cannot have more truly parallel jobs than CPU cores (or hardware threads), no matter what programming paradigm you use.
As an example, imagine that you need to execute 100'000 jobs, each taking one minute, and the results are due next month. You have four cores. Would you spawn off 100'000 actors, each competing over the resources for a month, or would you just queue your jobs up and execute four at a time?
As a counterexample, imagine a web server running on the same machine. If you have five requests, would you prefer to serve four users in T time and one in 2T, or serve all five in 1.2T time?
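The queue-up-and-run-four-at-a-time idea, with the configurable limit the question asked about, might be sketched in Scala with a `Semaphore` (the `BoundedJobs` name and the limit value are assumptions; in a real app the limit would come from a config setting):

```scala
import java.util.concurrent.{Executors, Semaphore, TimeUnit}
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

object BoundedJobs {
  // Submit `jobs` jobs but allow at most `limit` of them to run at once.
  // Returns true iff the observed concurrency never exceeded `limit`.
  def run(limit: Int, jobs: Int): Boolean = {
    val permits = new Semaphore(limit)   // the configurable concurrency cap
    val running = new AtomicInteger(0)
    val withinLimit = new AtomicBoolean(true)
    val pool = Executors.newCachedThreadPool()
    for (_ <- 1 to jobs) pool.execute(new Runnable {
      def run(): Unit = {
        permits.acquire() // blocks while `limit` jobs are already in flight
        try {
          if (running.incrementAndGet() > limit) withinLimit.set(false)
          Thread.sleep(10) // stand-in for real work
          running.decrementAndGet()
          ()
        } finally permits.release()
      }
    })
    pool.shutdown()
    pool.awaitTermination(30, TimeUnit.SECONDS)
    withinLimit.get()
  }

  def main(args: Array[String]): Unit =
    println(run(4, 20)) // prints true: never more than 4 jobs at once
}
```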