Azure Storage voting patterns
I'm investigating the use of Azure Storage for an application I have in mind. Part of it involves some SO-like functionality for voting and favourites. As with SO, I'd like to allow a user to vote or add a favourite only once, and to use these votes for scoring/weighting purposes later on.
How would one go about doing this using Azure Storage, or AWS SimpleDB for that matter? Are there patterns emerging for this type of scenario?
Comments (4)
Are you using Table Storage? The tricky thing to keep in mind with Azure Storage is the lack of functions such as Count.
To prevent someone from voting on or favoriting something twice, you'll need to make the primary key include the content's ID and the user's ID. Let's say a User can vote on a Comment. We'll create a table called CommentVotes with a PartitionKey of "UserID" and a RowKey of "CommentID". Now any duplicate insert will fail with an error, which prevents the second vote. The problem is calculating the Count without grabbing all the rows. You'll need to create another table that stores aggregated results and gets incremented whenever an insert succeeds. That table might look like PK "Comments", RK "CommentID", TotalVotes "5".
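To make the two-table idea concrete, here is a minimal sketch in Python. It does not call the real Azure SDK: `InMemoryTable`, `EntityAlreadyExists`, and `vote` are hypothetical names, and the table is just a dict keyed by (PartitionKey, RowKey) that rejects duplicate inserts the way a table store would.

```python
class EntityAlreadyExists(Exception):
    """Stand-in for the conflict error a table store raises on a duplicate key."""


class InMemoryTable:
    """Hypothetical stand-in for one table, keyed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._rows = {}

    def insert(self, partition_key, row_key, **props):
        key = (partition_key, row_key)
        if key in self._rows:
            raise EntityAlreadyExists(key)
        self._rows[key] = dict(props)

    def get(self, partition_key, row_key):
        return self._rows.get((partition_key, row_key))

    def upsert(self, partition_key, row_key, **props):
        self._rows[(partition_key, row_key)] = dict(props)


comment_votes = InMemoryTable()   # PK = user ID, RK = comment ID
vote_totals = InMemoryTable()     # PK = "Comments", RK = comment ID


def vote(user_id, comment_id):
    """Record one vote; return False if this user already voted on the comment."""
    try:
        comment_votes.insert(user_id, comment_id)
    except EntityAlreadyExists:
        return False
    # Only bump the aggregate after the vote row was inserted successfully.
    row = vote_totals.get("Comments", comment_id) or {"TotalVotes": 0}
    vote_totals.upsert("Comments", comment_id, TotalVotes=row["TotalVotes"] + 1)
    return True


print(vote("alice", "c1"))                # True: first vote is accepted
print(vote("alice", "c1"))                # False: duplicate is rejected
print(vote("bob", "c1"))                  # True
print(vote_totals.get("Comments", "c1"))  # {'TotalVotes': 2}
```

Note that the read-then-write on the totals row is not atomic in this sketch. Against a real store you would probably want a conditional update and a retry so that two simultaneous voters don't overwrite each other's increment.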
You can choose a couple of ways to leverage storage: BLOBs (binary large objects) or Tables. Blobs just store text or binary data. Tables provide a little more structure. Both come with REST services to manage them.
If you need "persistent and durable" storage options, Microsoft says that Azure Storage is perfect for that. However, if your application is subject to change in the way that most applications are, I'd recommend using SQL Azure. SQL is much more common for storing application data. Azure Storage is more useful for things like diagnostic logs, where you don't want to set up a SQL db (or where you need to record problems with connecting to the database itself).
Another use for storage could be partitioning your information so that different people can access different parts of it. For example, you put all the diagnostics and error information into one location for an Ops team and then create another storage location for Managers who need a simple file containing a report that your application generates. Each storage location can have its own identifier and connection hash string (sorry, I don't know what they're officially called).
Also, you can use storage for deployment and build purposes. I believe Visual Studio uses storage for propping a deployment when configuring it to deploy via the IDE. My point is, more people find SQL useful for application data and storage for operational data.
One pattern that I like but haven't seen implemented yet is the use of Azure Queues. The usefulness becomes apparent when you want to scale your application with a couple of different types of roles. For example, a web role can use the Queue to add transactions and have two or more worker roles picking transactions off the queue for processing and storage. Typically, the bottleneck in many Azure applications is the database, so moving the processing of data out of the web roles and into worker roles is useful when scalability is a concern.
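As a rough illustration of the web role / worker role split, here is a sketch that uses Python's standard `queue.Queue` and threads in place of an Azure Queue and real role instances. The names `web_role_submit`, `worker_role`, and `totals` are made up for the example, and the `None` shutdown marker is just a convenience for the demo.

```python
import queue
import threading

vote_queue = queue.Queue()     # stand-in for an Azure Queue
totals = {}                    # stand-in for the database
totals_lock = threading.Lock()


def web_role_submit(user_id, comment_id):
    """The web role only enqueues the message and returns immediately."""
    vote_queue.put({"user": user_id, "comment": comment_id})


def worker_role():
    """Each worker pulls messages until it sees the None shutdown marker."""
    while True:
        message = vote_queue.get()
        if message is None:
            vote_queue.task_done()
            break
        with totals_lock:
            comment = message["comment"]
            totals[comment] = totals.get(comment, 0) + 1
        vote_queue.task_done()


workers = [threading.Thread(target=worker_role) for _ in range(2)]
for w in workers:
    w.start()

for i in range(10):
    web_role_submit(f"user{i}", "c1")

for _ in workers:
    vote_queue.put(None)       # one shutdown marker per worker
vote_queue.join()
for w in workers:
    w.join()

print(totals)                  # {'c1': 10}
```

The point of the split is that the web side does no database work at all, so you can add workers when the queue backs up without touching the front end.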
The simplest solution consists of using 1 blob per counter (in Blob Storage). The blob would actually contain not just the final count but the identifiers of the voting users as well. This would ensure no double voting. Open source libraries such as Lokad.Cloud can help you with this (disclaimer: I work at Lokad).
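A minimal sketch of the one-blob-per-counter idea, assuming writes are guarded by an ETag-style optimistic concurrency check. `InMemoryBlob`, `PreconditionFailed`, and `vote` are hypothetical stand-ins written for this example, not Lokad.Cloud or Azure SDK APIs.

```python
import json


class PreconditionFailed(Exception):
    """Stand-in for the error returned when a conditional write loses a race."""


class InMemoryBlob:
    """Hypothetical stand-in for one blob whose ETag changes on every write."""

    def __init__(self):
        self.content = json.dumps({"count": 0, "voters": []})
        self.etag = 0

    def read(self):
        return self.content, self.etag

    def write_if_match(self, content, etag):
        if etag != self.etag:
            raise PreconditionFailed()
        self.content = content
        self.etag += 1


def vote(blob, user_id, max_retries=5):
    """Add user_id to the blob; return the new count, or None if already voted."""
    for _ in range(max_retries):
        content, etag = blob.read()
        state = json.loads(content)
        if user_id in state["voters"]:
            return None
        state["voters"].append(user_id)
        state["count"] = len(state["voters"])
        try:
            blob.write_if_match(json.dumps(state), etag)
            return state["count"]
        except PreconditionFailed:
            continue  # someone else wrote first: re-read and try again
    raise RuntimeError("too much contention on this counter")


counter = InMemoryBlob()
print(vote(counter, "alice"))  # 1
print(vote(counter, "alice"))  # None: alice has already voted
print(vote(counter, "bob"))    # 2
```

Because every vote rewrites the whole blob and may have to retry, throughput on a single counter is limited, which is the trade-off for getting the count and the duplicate check in one place.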
One drawback of this approach is that your counter won't scale above ~10 votes per second, which is already a lot for most web apps. If you are really thinking about super-heavy-duty counters, you should look at sharded counters, which can be implemented with either Table Storage or Blob Storage.
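For the sharded variant, the idea is to spread increments over several independent counters and sum them on read. The sketch below keeps the shards in a plain list; in practice each shard would be its own blob or table row, and `ShardedCounter` is a name invented for this example.

```python
import random


class ShardedCounter:
    """Spread writes over several shards; a read sums all of them."""

    def __init__(self, num_shards=8):
        self.shards = [0] * num_shards

    def increment(self):
        # Picking a random shard means concurrent writers rarely collide.
        index = random.randrange(len(self.shards))
        self.shards[index] += 1

    def total(self):
        return sum(self.shards)


counter = ShardedCounter(num_shards=4)
for _ in range(1000):
    counter.increment()
print(counter.total())  # 1000
```

This only covers the counting side. The check that a user votes once still has to live somewhere else, for example in a per-user vote record.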
Another angle is to think CQRS: let the vote issue a command message for async processing while JavaScript takes care of giving immediate feedback to the user. The most notable benefit here is that it becomes possible to have a single blob representing the entire state of the page, voting counters along with everything else, to speed up reads. Check Lokad.CQRS to do this.
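Here is a very small sketch of that command/read-model split. It does not use Lokad.CQRS: the `deque` stands in for the command queue, `page_blob` for the single blob that holds the page state, and all the function names are assumptions made for the example.

```python
import json
from collections import deque

commands = deque()                       # stand-in for the command queue
voters = {}                              # write side: comment ID -> set of user IDs
page_blob = json.dumps({"votes": {}})    # read side: one document per page


def send_vote(user_id, comment_id):
    """Called by the web tier; it only records the intent to vote."""
    commands.append({"type": "Vote", "user": user_id, "comment": comment_id})


def process_commands():
    """Run asynchronously by a worker: apply commands, then rebuild the page blob."""
    global page_blob
    while commands:
        cmd = commands.popleft()
        voters.setdefault(cmd["comment"], set()).add(cmd["user"])
    page_blob = json.dumps({"votes": {c: len(u) for c, u in voters.items()}})


send_vote("alice", "c1")
send_vote("alice", "c1")   # duplicate collapses in the set
send_vote("bob", "c1")
process_commands()
print(page_blob)           # {"votes": {"c1": 2}}
```

Reads then cost a single blob fetch, at the price of the page being slightly stale until the worker has caught up.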
One of the challenges when you plan a distributed system is convincing yourself that you should replicate information in different places; that's data denormalization. Most of us are used to data normalization, in databases and all that, trying to keep each type of information in one single place. But now we have to do the opposite for the sake of performance and distribution.
As @Vyrotek has already pointed out, you should keep the "count" information stored somewhere else and update it yourself every time an element is voted on.