A reliable and efficient way to handle Azure Table batch updates
I have an IEnumerable<T> that I'd like to add to an Azure Table in the most efficient way possible. Since every batch write has to be directed to the same PartitionKey, and each write is limited to 100 rows...
Does anyone want to take a crack at implementing this the "right" way, as referenced in the TODO comment? I'm not sure why MSFT didn't finish the task here...
Also, I'm not sure whether error handling complicates this, or what the correct way to implement it would be. Here is the code from the Microsoft patterns & practices team's Windows Azure "Tailspin Toys" demo:
public void Add(IEnumerable<T> objs)
{
    // todo: Optimize: The Add method that takes an IEnumerable parameter
    // should check the number of items in the batch and the size of the
    // payload before calling the SaveChanges method with the
    // SaveChangesOptions.Batch option. For more information about batches
    // and Windows Azure table storage, see the section "Transactions in
    // aExpense" in Chapter 5, "Phase 2: Automating Deployment and Using
    // Windows Azure Storage," of the book Windows Azure Architecture Guide,
    // Part 1: Moving Applications to the Cloud, available at
    // http://msdn.microsoft.com/en-us/library/ff728592.aspx.
    TableServiceContext context = this.CreateContext();

    foreach (var obj in objs)
    {
        context.AddObject(this.tableName, obj);
    }

    var saveChangesOptions = SaveChangesOptions.None;
    if (objs.Distinct(new PartitionKeyComparer()).Count() == 1)
    {
        saveChangesOptions = SaveChangesOptions.Batch;
    }

    context.SaveChanges(saveChangesOptions);
}

private class PartitionKeyComparer : IEqualityComparer<TableServiceEntity>
{
    public bool Equals(TableServiceEntity x, TableServiceEntity y)
    {
        return string.Compare(x.PartitionKey, y.PartitionKey, true,
            System.Globalization.CultureInfo.InvariantCulture) == 0;
    }

    public int GetHashCode(TableServiceEntity obj)
    {
        return obj.PartitionKey.GetHashCode();
    }
}
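For reference, here is one way the TODO could be addressed. This is only a sketch, not the patterns & practices team's implementation: it assumes T derives from TableServiceEntity (so PartitionKey is accessible), that CreateContext and tableName exist as in the surrounding class, and that the hypothetical helper SaveBatch is free to be added. It groups entities by PartitionKey and splits each group into chunks of at most 100 entities, the entity group transaction limit; the 4 MB payload-size check is left out for brevity.

```csharp
// Sketch: split the input into per-partition batches of at most
// 100 entities, the maximum for a single entity group transaction.
public void Add(IEnumerable<T> objs)
{
    const int maxBatchSize = 100;

    // All entities in one batch must share the same PartitionKey.
    foreach (var partitionGroup in objs.GroupBy(o => o.PartitionKey))
    {
        var batch = new List<T>(maxBatchSize);
        foreach (var obj in partitionGroup)
        {
            batch.Add(obj);
            if (batch.Count == maxBatchSize)
            {
                SaveBatch(batch);
                batch.Clear();
            }
        }
        if (batch.Count > 0)
        {
            SaveBatch(batch);
        }
    }
}

// Hypothetical helper: each batch gets a fresh context so that a
// failure in one batch does not poison the others.
private void SaveBatch(IEnumerable<T> batch)
{
    TableServiceContext context = this.CreateContext();
    foreach (var obj in batch)
    {
        context.AddObject(this.tableName, obj);
    }
    context.SaveChanges(SaveChangesOptions.Batch);
}
```

Note the trade-off: entities spread across multiple batches are no longer saved in a single atomic transaction, so a failure partway through leaves earlier batches committed. Error handling (retries, surfacing partial failures) would sit around each SaveBatch call.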
Well, we (the patterns & practices team) just optimized for showing other things we considered useful. The code above is not really a "general purpose library", but rather a specific method for the sample that uses it.
At the time we thought that adding that extra error handling would not add much, and we decided to keep it simple, but... we might have been wrong.
Anyway, if you follow the link in the //TODO:, you will find another section of a previous guide we wrote that talks a bit more about error handling in "complex" storage transactions (not in the "ACID" sense, though, since transactions "a la DTC" are not supported in Windows Azure Storage).
The link is: http://msdn.microsoft.com/en-us/library/ff803365.aspx
The limitations are listed there in more detail.
Adding some extra error handling should not overcomplicate things too much, but it depends on the type of app you are building on top of this and on your preference for handling this higher or lower in your app stack. In our example, the app would never expect more than 100 entities anyway, so it simply bubbles the exception up if that situation happens (because it should be truly exceptional). The same goes for the total payload size. The use cases implemented in the app make it impossible to have the same entity twice in the same collection, so again, that should never happen (and if it does, it would simply throw).
All "entity group transactions" limitations are documented here: http://msdn.microsoft.com/en-us/library/dd894038.aspx
Let us know how it goes! I'm also interested to know if other pieces of the guide were useful for you.