Fastest nosql option for number crunching?
I had always thought that Mongo had excellent performance with its map/reduce functionality, but am now reading that it is a slow implementation of it. So if I had to pick an alternative to benchmark against, what should it be?
My software will be such that users will often have millions of records, and will often be sorting and crunching through unpredictable subsets of tens or hundreds of thousands of records. Most of the analysis that uses the full millions of records can be done in summary tables and the like. I'd originally thought Hypertable was a viable alternative, but in doing research I saw in their documents a mention that Mongo would be a more performant option, while Hypertable had other benefits. But for my application, speed is my number one initial priority.
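For context, the "summary tables" approach mentioned above can be sketched as follows: rather than re-scanning millions of records for every report, a running aggregate is updated as records arrive. This is a minimal, illustrative sketch in plain Python; the `daily_totals`-style structure and field names are hypothetical, not from any particular library.

```python
# Sketch of the "summary tables" idea: maintain a running aggregate on
# write so common reports never need a full-collection scan.
# The keying by day and the count/total fields are hypothetical examples.
summary = {}  # stands in for a "daily_totals" summary table keyed by day

def record_event(day, amount):
    entry = summary.setdefault(day, {"count": 0, "total": 0})
    entry["count"] += 1    # one more record for this day
    entry["total"] += amount

record_event("2011-06-01", 10)
record_event("2011-06-01", 5)
record_event("2011-06-02", 7)
print(summary["2011-06-01"])  # {'count': 2, 'total': 15}
```

In a document store this would typically be an upsert with atomic increments against a small summary collection, which stays fast regardless of how large the raw data grows.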
1 Answer
First of all, it's important to decide what is "fast enough". Undoubtedly there are faster solutions than MongoDB's map/reduce, but in most cases you will be looking at significantly higher development cost.
That said, MongoDB's map/reduce runs, at the time of writing, on a single thread, which means it will not utilize all the CPU available to it. Also, MongoDB has very little in the way of native aggregation functionality. This will change from version 2.1 onwards, which should improve performance (see https://jira.mongodb.org/browse/SERVER-447 and http://www.slideshare.net/cwestin63/mongodb-aggregation-mongosf-may-2011).
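To make the distinction concrete, here is a toy model in plain Python (hypothetical `user`/`amount` fields) of what a map/reduce totalling amounts per key computes. MongoDB executes the map and reduce functions in a single JavaScript thread, whereas the native aggregation arriving after 2.1 expresses the same computation declaratively, roughly as `[{"$group": {"_id": "$user", "total": {"$sum": "$amount"}}}]`, and runs it in the server's C++ core:

```python
from collections import defaultdict

# Toy documents; field names are hypothetical.
docs = [
    {"user": "a", "amount": 5},
    {"user": "b", "amount": 3},
    {"user": "a", "amount": 2},
]

def map_reduce(docs):
    emitted = defaultdict(list)
    for d in docs:                      # map phase: emit(user, amount)
        emitted[d["user"]].append(d["amount"])
    # reduce phase: fold the emitted values per key
    return {k: sum(v) for k, v in emitted.items()}

print(map_reduce(docs))  # {'a': 7, 'b': 3}
```

The computation itself is trivial; the performance difference comes from where and how it executes (interpreted JS on one thread vs. native code), which is why the aggregation framework was expected to be the faster path.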
Now, what MongoDB is good at is scaling out easily, especially when it comes to reads. And this is important, because the best solution for number crunching on large datasets is definitely a map/reduce cloud like the one Augusto suggested. Let such an m/r setup do the number crunching while MongoDB makes the required data available at high speed. If database query throughput is too low, that is easily solved by adding more mongo shards; if number-crunching/aggregation performance is too slow, that is solved by adding more m/r boxes. Basically, performance becomes a function of the number of instances you reserve for the problem, and thus of cost.