Can Parallel.ForEach be used safely with CloudTableQuery?
I have a reasonable number of records in an Azure Table that I'm attempting to do some one-time data encryption on. I thought that I could speed things up by using a Parallel.ForEach. Also, because there are more than 1K records and I don't want to mess around with continuation tokens myself, I'm using a CloudTableQuery to get my enumerator.
My problem is that some of my records have been double encrypted, and I realised that I'm not sure how thread-safe the enumerator returned by CloudTableQuery.Execute() is. Has anyone else out there had any experience with this combination?
Comments (2)
I would be willing to bet that Execute returning a thread-safe IEnumerator implementation is highly unlikely. That said, this sounds like yet another case for the producer-consumer pattern.
In your specific scenario, I would have the original thread that called Execute read the results off sequentially and stuff them into a BlockingCollection<T>. Before you start doing that, though, you want to start a separate Task that will control the consumption of those items using Parallel.ForEach. You will probably also want to look into using the GetConsumingPartitioner method of the ParallelExtensions library in order to be most efficient, since the default partitioner will create more overhead than you want in this case. You can read more about this from this blog post.
An added bonus of using BlockingCollection<T> over a raw ConcurrentQueue<T> is that it offers the ability to set bounds, which can help block the producer from adding more items to the collection than the consumers can keep up with. You will of course need to do some performance testing to find the sweet spot for your application.
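For what it's worth, here is a minimal sketch of that producer-consumer arrangement. It has not been run against Azure Table storage: the IEnumerable<string> source stands in for whatever CloudTableQuery.Execute() returns, and Encrypt, ProcessAll and the capacity of 100 are made-up placeholders. Instead of GetConsumingPartitioner from the ParallelExtensions samples, it uses the built-in Partitioner.Create overload with EnumerablePartitionerOptions.NoBuffering (.NET 4.5 and later), which similarly avoids the default chunk buffering. Error handling is deliberately left out: if the consumer fails, the producer could block on a full queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    // Hypothetical stand-in for the real per-record encrypt-and-save step.
    public static string Encrypt(string value) => "enc(" + value + ")";

    // Enumerates `source` on the calling thread only; the work runs in parallel.
    public static List<string> ProcessAll(IEnumerable<string> source, int boundedCapacity)
    {
        var results = new ConcurrentBag<string>();

        using (var queue = new BlockingCollection<string>(boundedCapacity))
        {
            // Start the consumer first so the bounded queue drains while it fills.
            Task consumer = Task.Run(() =>
            {
                // NoBuffering hands items to workers one at a time instead of in chunks.
                var partitioner = Partitioner.Create(
                    queue.GetConsumingEnumerable(),
                    EnumerablePartitionerOptions.NoBuffering);

                Parallel.ForEach(partitioner, item => results.Add(Encrypt(item)));
            });

            try
            {
                // Single-threaded producer: the source enumerator is never shared.
                foreach (string item in source)
                {
                    queue.Add(item); // blocks while the queue is at capacity
                }
            }
            finally
            {
                // Tell the consumers no more items are coming, then wait for them.
                queue.CompleteAdding();
                consumer.Wait();
            }
        }

        return results.OrderBy(r => r, StringComparer.Ordinal).ToList();
    }

    public static void Main()
    {
        IEnumerable<string> records = Enumerable.Range(1, 2500).Select(i => "row" + i);
        List<string> processed = ProcessAll(records, boundedCapacity: 100);
        Console.WriteLine(processed.Count);
    }
}
```

The bound on the BlockingCollection is the knob to tune: a small value keeps memory use low but can starve the workers, so measure before settling on a number.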
Despite my best efforts, I've been unable to replicate my original problem. My conclusion is therefore that it is perfectly OK to use Parallel.ForEach loops with CloudTableQuery.Execute().