Windows Azure Table Storage LINQ Operators
Currently Table Storage supports From, Where, Take, and First.
Are there plans to support any of the other 29 operators?
Are there architectural or design practices with regard to storage that one should follow in order to implement things like COUNT, SUM, GROUP BY, etc.?
If we have to code these ourselves, how much of a performance difference are we looking at compared to doing something similar via SQL and SQL Server? Do you see it being somewhat comparable, or will it be far slower if I need to do a Count or Sum or Group By over a gigantic dataset?
I like the Azure platform and the idea of cloud-based storage. I like Table Storage for the amount of data it can store and its schema-less nature. SQL Azure just won't work due to the high cost of storage space.
Comments (2)
Ryan,
As Steve said, aggregations are resolved "client side", which might lead to bad performance if your datasets are too large.
An alternative is to think about the problem in a different way. You might want to pre-compute those values so they are readily available. For example, if you have master-detail data (like the proverbial purchase order + line items), you might want to store the "sum of line items" in the header. This might appear to be "redundant" (and it is), but de-normalization is something you will have to consider.
These pre-computations can be done synchronously or asynchronously. In some situations you can afford approximations, so delaying the computation might be beneficial from a performance perspective.
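To make the purchase order example concrete, here is a minimal sketch of the synchronous variant. It is plain Python with an in-memory dict standing in for the table, not a call into any Azure SDK; the entity layout, the property names (`LineItemCount`, `LineItemTotal`, `Amount`) and the helpers `create_order` and `add_line_item` are all made up for illustration.

```python
from decimal import Decimal

# Hypothetical in-memory stand-in for a table:
# (PartitionKey, RowKey) -> entity properties.
table = {}

def create_order(order_id):
    """Create the header entity with its pre-computed aggregates at zero."""
    table[(order_id, "header")] = {
        "LineItemCount": 0,
        "LineItemTotal": Decimal("0"),
    }

def add_line_item(order_id, line_no, amount):
    """Store a line item and update the header aggregates in the same step."""
    amount = Decimal(amount)
    table[(order_id, f"line-{line_no:04d}")] = {"Amount": amount}
    header = table[(order_id, "header")]
    header["LineItemCount"] += 1       # redundant on purpose: denormalized count
    header["LineItemTotal"] += amount  # redundant on purpose: denormalized sum

create_order("PO-1001")
add_line_item("PO-1001", 1, "19.99")
add_line_item("PO-1001", 2, "5.01")

# Reading the aggregate is now a single-entity lookup, not a scan.
header = table[("PO-1001", "header")]
print(header["LineItemCount"], header["LineItemTotal"])
```

In a real table you would also have to think about how the line item write and the header update stay consistent with each other. The asynchronous variant would instead record the line item and let a background worker update the header later, which is where the "approximation" comes in.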
The only alternative is to pull everything down locally and run Count() or Sum() over the local objects. Because you have to transfer the entire contents of your table before doing the count, this will certainly be much slower than doing something server-side like with SQL. How much slower depends on the size of your data.
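A rough sketch of what that client-side aggregation looks like, again in plain Python rather than against a real storage client. `query_pages` is a hypothetical stand-in for a paged table query (the service returns results in pages, so a full scan means many round trips), and the `Amount` and `Category` properties are invented for the example.

```python
from collections import defaultdict

def query_pages(rows, page_size=1000):
    """Hypothetical stand-in for a paged table query: yields one page at a time."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]

def aggregate(pages):
    """Compute COUNT, SUM and a GROUP BY sum while streaming over the pages."""
    count = 0
    total = 0
    by_category = defaultdict(int)
    for page in pages:          # each page would be one network round trip
        for entity in page:
            count += 1
            total += entity["Amount"]
            by_category[entity["Category"]] += entity["Amount"]
    return count, total, dict(by_category)

# Fake data: 2500 entities, so the scan takes three pages.
rows = [{"Category": "a" if i % 2 else "b", "Amount": i} for i in range(2500)]
count, total, by_category = aggregate(query_pages(rows))
print(count, total, by_category)
```

Streaming page by page keeps memory flat, but it does not change the fact that every entity still has to cross the wire, which is why the cost grows with the size of the table.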