Can I use Scala's parallel collections when I want to call several expensive operations on the same input and then collect the results?

Posted on 2024-12-11 06:14:57

I found a similar question but it has what seems to be a simpler case, where the expensive operation is always the same. In my case, I want to collect a set of results of some expensive API calls that I'd like to execute in parallel.

Say I have:

def apiRequest1(q: Query): Option[Result]
def apiRequest2(q: Query): Option[Result]

where q is the same value.

I'd like a List[Result] or similar (obviously List[Option[Result]] is fine) and I'd like the two expensive operations to happen in parallel.

Naturally a simple List constructor doesn't execute in parallel:

List(apiRequest1(q), apiRequest2(q))

Can the parallel collections help? Or should I be looking to futures and the like instead? The only approach I can think of using parallel collections seems hacky:

 List(q, q).par.zipWithIndex.flatMap((q) =>
   if (q._2 % 2 == 0) apiRequest1(q._1) else apiRequest2(q._1)
 )

Actually, all things being equal, maybe that isn't so bad...
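
For comparison, the futures route mentioned above might look roughly like the sketch below. It uses only `scala.concurrent` from the standard library. `Query`, `Result`, the sleeping bodies of `apiRequest1` and `apiRequest2`, the `fetchAll` helper, and the 5-second timeout are all placeholders invented for illustration, not part of the original code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical stand-ins for the real types and API calls.
  final case class Query(text: String)
  final case class Result(source: String)

  def apiRequest1(q: Query): Option[Result] = { Thread.sleep(200); Some(Result("api1")) }
  def apiRequest2(q: Query): Option[Result] = { Thread.sleep(200); None }

  def fetchAll(q: Query): List[Result] = {
    // Both futures are created (and so submitted for execution) before either is awaited.
    val f1 = Future(apiRequest1(q))
    val f2 = Future(apiRequest2(q))

    // Future.sequence turns List[Future[Option[Result]]] into
    // Future[List[Option[Result]]]; flatten then drops the None values.
    val combined: Future[List[Result]] = Future.sequence(List(f1, f2)).map(_.flatten)

    // Blocking is only for demonstration; the timeout value is arbitrary.
    Await.result(combined, 5.seconds)
  }

  def main(args: Array[String]): Unit =
    println(fetchAll(Query("example")))
}
```

Keeping the work inside a method rather than in the object's initializer avoids blocking on futures while the enclosing object is still being initialized. Whether this is preferable to the parallel-collections version is mostly a matter of taste for two calls; futures tend to pay off when the calls need timeouts or error handling of their own.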

Comments (3)

残月升风 2024-12-18 06:14:57

Why don't you write:

List(apiRequest1 _, apiRequest2 _).par.map(_(q))
余生一个溪 2024-12-18 06:14:57

Quick and dirty solution:

scala> def apiRequest1(q: Query): Option[Result] = { Thread.sleep(1000); Some(new Result) }
apiRequest1: (q: Query)Option[Result]

scala> def apiRequest2(q: Query): Option[Result] = { Thread.sleep(3000); Some(new Result) }
apiRequest2: (q: Query)Option[Result]

scala> val f = List(() => apiRequest1(q), () => apiRequest2(q)).par.map(_())
f: scala.collection.parallel.immutable.ParSeq[Option[Result]] = ParVector(Some(Result@1f24908), Some(Result@198c0b5))
︶ ̄淡然 2024-12-18 06:14:57

I'm not sure it would actually run in parallel if you have only two calls, or a small number of them. There is a threshold for parallelization, and with so small a collection it would probably run sequentially, on the grounds that it is not worth the parallelization overhead. (The library can't know whether it is worth it, since that depends on the operation you want to run, but having a threshold on collection operations is reasonable.)
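
One way to settle this for a given Scala version and machine is simply to time it. The sketch below is a rough check, not a benchmark: `slowCall1`, `slowCall2`, and `timed` are made-up helpers that sleep for one second each. An elapsed time near one second would suggest the two calls overlapped, and a time near two seconds would suggest they ran one after the other. It assumes Scala 2.13 or later with the separate scala-parallel-collections module on the classpath; on 2.12 and earlier `.par` is built in and the import should be dropped.

```scala
// Assumption: Scala 2.13+ with the scala-parallel-collections module available.
// On Scala 2.12 and earlier, remove this import; .par is part of the standard library there.
import scala.collection.parallel.CollectionConverters._

object ParTimingSketch {
  // Hypothetical stand-ins for expensive calls; each just sleeps for one second.
  def slowCall1(): Int = { Thread.sleep(1000); 1 }
  def slowCall2(): Int = { Thread.sleep(1000); 2 }

  // Runs both calls through a two-element parallel collection and
  // returns the results together with the elapsed wall-clock time in milliseconds.
  def timed(): (List[Int], Long) = {
    val start = System.nanoTime()
    val results = List(() => slowCall1(), () => slowCall2()).par.map(_()).toList
    val elapsedMs = (System.nanoTime() - start) / 1000000
    (results, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    val (results, elapsedMs) = timed()
    println(s"results = $results, elapsed = $elapsedMs ms")
  }
}
```

The outcome also depends on how many worker threads the default task support has available, so a single-core environment may show sequential timing regardless of any threshold.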
