How to efficiently parallelize looping over an array and control the degree of parallelism?
I have a resourceId array which I need to loop over in parallel, generating a URL for each resource and then putting it into a map whose key is the resourceId and whose value is the URL.

I have the code below, which does the job, but I am not sure it is the right way to do it. I am using sizedwaitgroup here to parallelize over the resourceId list, and I also take a lock on the map while writing data to it. I am sure this isn't efficient code, since using a lock together with a sized wait group will cause some performance problems.

What is the best and most efficient way to do this? Should I use channels here? I want to control how much parallelism there is, rather than running one goroutine per element of the resourceId list. If URL generation fails for any resourceId, I want to log an error for that resourceId, but without disrupting the other goroutines running in parallel that generate URLs for the other resourceIds.

For example: if there are 10 resources and 2 fail, log errors for those 2, and the map should have entries for the remaining 8.
// running 20 threads in parallel
swg := sizedwaitgroup.New(20)
var mutex = &sync.Mutex{}
start := time.Now()
m := make(map[string]*customerPbV1.CustomerResponse)
for _, resources := range resourcesList {
	swg.Add()
	go func(resources string) {
		defer swg.Done()
		customerUrl, err := us.GenerateUrl(clientId, resources, appConfig)
		if err != nil {
			errs.NewWithCausef(err, "Could not generate the url for %s", resources)
		}
		mutex.Lock()
		m[resources] = customerUrl
		mutex.Unlock()
	}(resources)
}
swg.Wait()
elapsed := time.Since(start)
fmt.Println(elapsed)
Note: The above code will be called at high throughput from multiple reader threads, so it needs to perform well.
Comments (3)
I'm not sure what sizedwaitgroup is, and it isn't explained, but overall this approach doesn't look very typical of Go. For that matter, "best" is a matter of opinion, but the most typical approach in Go would be something along these lines.

(Though, based on the name, I would assume errs.NewWithCausef doesn't actually handle errors but returns one, in which case the current code is dropping them on the floor, and a proper solution would need an additional chan error for handling errors.)
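A minimal sketch of that typical approach, assuming a stand-in generateURL in place of us.GenerateUrl (the failing ID, URL prefix, and helper names here are illustrative, not from the original code): concurrency is bounded with a semaphore channel, workers report over a results channel carrying both URL and error, and a single consumer owns the map, so no mutex is needed.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// generateURL stands in for us.GenerateUrl from the question; it fails
// for one hard-coded ID to exercise the error path.
func generateURL(resourceID string) (string, error) {
	if resourceID == "res-3" {
		return "", fmt.Errorf("backend rejected %s", resourceID)
	}
	return "https://example.com/customers/" + resourceID, nil
}

type result struct {
	id  string
	url string
	err error
}

// buildURLMap generates a URL per ID with at most `parallelism`
// generateURL calls in flight at once.
func buildURLMap(resourceIDs []string, parallelism int) map[string]string {
	sem := make(chan struct{}, parallelism) // bounds in-flight work
	results := make(chan result)

	var wg sync.WaitGroup
	for _, id := range resourceIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			url, err := generateURL(id)
			results <- result{id: id, url: url, err: err}
		}(id)
	}

	// Close results once every worker has reported.
	go func() {
		wg.Wait()
		close(results)
	}()

	// A single consumer owns the map, so no mutex is needed.
	m := make(map[string]string)
	for r := range results {
		if r.err != nil {
			log.Printf("could not generate the url for %s: %v", r.id, r.err)
			continue // skip failed IDs, keep the rest
		}
		m[r.id] = r.url
	}
	return m
}

func main() {
	ids := []string{"res-1", "res-2", "res-3", "res-4"}
	m := buildURLMap(ids, 2)
	fmt.Println(len(m)) // res-3 failed, so 3 entries remain
}
```

Because failures are logged and skipped rather than aborting anything, 10 inputs with 2 failures still yield a map of 8 entries, as the question requires.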
I have created example code with comments; please read the comments.

Playground: https://go.dev/play/p/LeyE9n1hh81
Here is a pure channel solution (playground). I think the performance really depends on GenerateUrl (generateURL in my code). Also, one more thing I would like to point out: the correct term for this is concurrency, not parallelism.
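The playground code itself isn't reproduced above, but a pure channel solution along those lines could look like the sketch below (generateURL, the failing ID, and the worker count are assumptions for illustration): a fixed pool of workers drains a jobs channel, so the degree of concurrency stays constant no matter how many IDs are passed in.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// generateURL is a stand-in for the GenerateUrl call in the question.
func generateURL(resourceID string) (string, error) {
	if resourceID == "bad" {
		return "", fmt.Errorf("cannot build url for %s", resourceID)
	}
	return "https://example.com/" + resourceID, nil
}

type result struct {
	id  string
	url string
	err error
}

// urlMap starts exactly `workers` goroutines that drain a jobs channel,
// so concurrency is fixed regardless of len(ids).
func urlMap(ids []string, workers int) map[string]string {
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				url, err := generateURL(id)
				results <- result{id, url, err}
			}
		}()
	}

	// Feed the jobs, then close results once all workers finish.
	go func() {
		for _, id := range ids {
			jobs <- id
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	// Failures are logged per ID; successes land in the map.
	m := make(map[string]string)
	for r := range results {
		if r.err != nil {
			log.Printf("%s: %v", r.id, r.err)
			continue
		}
		m[r.id] = r.url
	}
	return m
}

func main() {
	fmt.Println(len(urlMap([]string{"a", "bad", "c"}, 4))) // prints 2
}
```

Compared with the semaphore variant, this launches only `workers` goroutines up front instead of one per ID, which can matter at the high call rates the question mentions.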