A reliable and efficient way to handle Azure Table batch updates
I have an IEnumerable<T> that I'd like to add to an Azure Table in the most efficient way possible. Since every batch write has to be directed to the same PartitionKey, and each write is limited to 100 rows...
Does anyone want to take a crack at implementing this the "right" way, as referenced in the TODO comment? I'm not sure why MSFT didn't finish the task here...
Also, I'm not sure whether error handling complicates this, or what the correct way to implement it would be. Here is the code from the Microsoft patterns & practices team's Windows Azure "Tailspin Toys" demo:
public void Add(IEnumerable<T> objs)
{
    // todo: Optimize: The Add method that takes an IEnumerable parameter
    // should check the number of items in the batch and the size of the
    // payload before calling the SaveChanges method with the
    // SaveChangesOptions.Batch option. For more information about batches
    // and Windows Azure table storage, see the section "Transactions in
    // aExpense" in Chapter 5, "Phase 2: Automating Deployment and Using
    // Windows Azure Storage," of the book Windows Azure Architecture Guide,
    // Part 1: Moving Applications to the Cloud, available at
    // http://msdn.microsoft.com/en-us/library/ff728592.aspx.
    TableServiceContext context = this.CreateContext();

    foreach (var obj in objs)
    {
        context.AddObject(this.tableName, obj);
    }

    var saveChangesOptions = SaveChangesOptions.None;
    if (objs.Distinct(new PartitionKeyComparer()).Count() == 1)
    {
        saveChangesOptions = SaveChangesOptions.Batch;
    }

    context.SaveChanges(saveChangesOptions);
}

private class PartitionKeyComparer : IEqualityComparer<TableServiceEntity>
{
    public bool Equals(TableServiceEntity x, TableServiceEntity y)
    {
        return string.Compare(x.PartitionKey, y.PartitionKey, true,
            System.Globalization.CultureInfo.InvariantCulture) == 0;
    }

    public int GetHashCode(TableServiceEntity obj)
    {
        return obj.PartitionKey.GetHashCode();
    }
}
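For reference, here is one way the TODO could be addressed. This is only a sketch, not the patterns & practices team's implementation: it assumes T derives from TableServiceEntity (so PartitionKey is accessible), that CreateContext and tableName exist as in the surrounding class, and that the hypothetical helper SaveBatch is free to be added. It groups entities by PartitionKey and splits each group into chunks of at most 100 entities, the entity group transaction limit; the 4 MB payload-size check is left out for brevity.

```csharp
// Sketch: split the input into per-partition batches of at most
// 100 entities, the maximum for a single entity group transaction.
public void Add(IEnumerable<T> objs)
{
    const int maxBatchSize = 100;

    // All entities in one batch must share the same PartitionKey.
    foreach (var partitionGroup in objs.GroupBy(o => o.PartitionKey))
    {
        var batch = new List<T>(maxBatchSize);
        foreach (var obj in partitionGroup)
        {
            batch.Add(obj);
            if (batch.Count == maxBatchSize)
            {
                SaveBatch(batch);
                batch.Clear();
            }
        }
        if (batch.Count > 0)
        {
            SaveBatch(batch);
        }
    }
}

// Hypothetical helper: each batch gets a fresh context so that a
// failure in one batch does not poison the others.
private void SaveBatch(IEnumerable<T> batch)
{
    TableServiceContext context = this.CreateContext();
    foreach (var obj in batch)
    {
        context.AddObject(this.tableName, obj);
    }
    context.SaveChanges(SaveChangesOptions.Batch);
}
```

Note the trade-off: entities spread across multiple batches are no longer saved in a single atomic transaction, so a failure partway through leaves earlier batches committed. Error handling (retries, surfacing partial failures) would sit around each SaveBatch call.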
Well, we (the patterns & practices team) just optimized for showing other things we considered useful. The code above is not really a "general purpose library", but rather a specific method for the sample that uses it.
At the time we thought that adding that extra error handling would not add much, and we decided to keep it simple, but... we might have been wrong.
Anyway, if you follow the link in the //TODO:, you will find another section of a previous guide we wrote that talks a bit more about error handling in "complex" storage transactions (not in the "ACID" sense, though, since transactions "a la DTC" are not supported in Windows Azure Storage).
The link is: http://msdn.microsoft.com/en-us/library/ff803365.aspx
The limitations are listed there in more detail.
Adding some extra error handling should not overcomplicate things too much, but it depends on the type of app you are building on top of this and on your preference for handling this higher or lower in your app stack. In our example, the app would never expect more than 100 entities anyway, so it simply bubbles the exception up if that situation happens (because it should be truly exceptional). The same goes for the total payload size. The use cases implemented in the app make it impossible to have the same entity twice in the same collection, so again, that should never happen (and if it does, it would simply throw).
All "entity group transactions" limitations are documented here: http://msdn.microsoft.com/en-us/library/dd894038.aspx
Let us know how it goes! I'm also interested to know if other pieces of the guide were useful for you.