Choosing a database for an ads/analytics service
I have a project with an ad exchange service (something like Google DoubleClick), and I have to pick a highly scalable database. I'm considering MongoDB or Cassandra.
Cassandra:
- Fits our write-intensive system. (+)
- Looks hard to do aggregations, which are very important for analytics. Is there a good way? I just read the slides about Twitter's Rainbird, and it seems good. (?)
- I don't like Java much. (-)
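For reference, the usual workaround for Cassandra's lack of ad-hoc aggregation is to pre-aggregate at write time. Each event increments one counter for every level of a hierarchical key and for every time granularity, so an analytics read becomes a single lookup instead of a scan. The sketch below is a minimal in-memory illustration of that idea. The dotted key format and the granularities are hypothetical, and a Python `Counter` stands in for Cassandra counter columns. It is not Rainbird's actual code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical time granularities: one counter per bucket at each level.
GRANULARITIES = {
    "minute": "%Y-%m-%dT%H:%M",
    "hour": "%Y-%m-%dT%H",
    "day": "%Y-%m-%d",
}

def hierarchical_keys(key: str) -> list[str]:
    """'ads.campaign42.banner7' -> ['ads', 'ads.campaign42', 'ads.campaign42.banner7']"""
    parts = key.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]

def record(counters: Counter, key: str, ts: datetime, delta: int = 1) -> None:
    """Increment every (key prefix, granularity, time bucket) counter for one event."""
    for prefix in hierarchical_keys(key):
        for name, fmt in GRANULARITIES.items():
            counters[(prefix, name, ts.strftime(fmt))] += delta

counters: Counter = Counter()
record(counters, "ads.campaign42.banner7", datetime(2011, 6, 1, 10, 15))
record(counters, "ads.campaign42.banner9", datetime(2011, 6, 1, 10, 47))
record(counters, "ads.campaign7.banner1", datetime(2011, 6, 1, 11, 5))

# Reads are now direct lookups of pre-computed totals.
print(counters[("ads.campaign42", "hour", "2011-06-01T10")])  # 2
print(counters[("ads", "day", "2011-06-01")])  # 3
```

The trade-off is more writes per event (here 3 prefixes × 3 granularities = 9 increments). That suits a write-optimized store, but it means you must decide up front which roll-ups you want.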
MongoDB:
- Seems easier for analytics (has built-in aggregate functions). (+)
- More RAM-consuming? (document-oriented vs. Cassandra's key-value model) (?)
- Write performance compared to Cassandra? (?)
- JavaScript shell, a natural fit with Node.js (an important part of our project). (+)
- This article (http://pastebin.com/raw.php?i=FD3xe6Jt) makes me cautious. (-)
Can you help me pick one, or answer some of my questions above?
Thanks.
Answers (3)
I don't know about Cassandra, but MongoDB has some advantages for analytics: high concurrency, sharding, storing everything about an event in a single document, and features like upsert and $inc.
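To show why upsert plus $inc is handy here, the sketch below models that write pattern in memory. If the per-ad, per-day document doesn't exist, it is created. Each (possibly dotted) counter field is then incremented. The whole thing needs no prior read. The schema (one document per ad per day, counters nested by hour) and the `daily_stats` collection name are hypothetical. The real call is only shown in a comment, as pymongo's `update_one(filter, update, upsert=True)`.

```python
from datetime import datetime

def inc_upsert(collection: dict, key: tuple, inc: dict) -> None:
    """In-memory stand-in for an upsert with $inc: create the document if it is
    missing, then add each delta to its (possibly dotted, nested) field."""
    doc = collection.setdefault(key, {})
    for path, delta in inc.items():
        *parents, leaf = path.split(".")
        target = doc
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = target.get(leaf, 0) + delta

def track(collection: dict, ad_id: str, event: str, ts: datetime) -> None:
    # Hypothetical schema: one document per ad per day, counters nested by hour.
    # With pymongo the equivalent single write would be roughly:
    #   db.daily_stats.update_one({"ad": ad_id, "day": day}, {"$inc": inc}, upsert=True)
    day = ts.strftime("%Y-%m-%d")
    inc = {f"{event}.total": 1, f"{event}.hourly.{ts.hour}": 1}
    inc_upsert(collection, (ad_id, day), inc)

stats: dict = {}
track(stats, "ad42", "impressions", datetime(2011, 6, 1, 10, 5))
track(stats, "ad42", "impressions", datetime(2011, 6, 1, 10, 40))
track(stats, "ad42", "clicks", datetime(2011, 6, 1, 10, 41))
print(stats[("ad42", "2011-06-01")])
# {'impressions': {'total': 2, 'hourly': {'10': 2}}, 'clicks': {'total': 1, 'hourly': {'10': 1}}}
```

Unlike this dict-keyed stand-in, a real upsert would also copy the filter fields (`ad`, `day`) into the newly created document.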
For more detailed explanations, check the following resources:
MongoDB Analytics - videos
http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics
http://www.mongodb.org/display/DOCS/Use+Cases
http://www.slideshare.net/jrosoff/scalable-event-analytics-with-mongodb-ruby-on-rails
http://nosql.mypopescu.com/post/3508305955/fast-asynchronous-analytics-with-mongodb
http://blog.opengovernment.org/2011/02/24/fast-asynchronous-analytics-with-mongodb/
http://blog.10gen.com/post/4416876632/london-startup-ubervu-on-storing-5tb-of-data-in-mongodb
It depends a lot on your domain; in most cases one would probably choose Mongo.
For example, http://square.github.com/cube/ is built on Mongo.
Most use cases of Cassandra stem from the need for high availability, which is its main feature AFAIK. Your needs seem to center on a cheap way to shove queryable data into a scale-out DB, and Mongo almost matches an RDBMS in terms of querying. Mongo is also probably easier to deal with.
I think Cassandra is a good fit for this problem.
You don't need to know much Java to get it running (other than installing Java), as long as there is a client library for your chosen language.
Cassandra 0.8+ now has atomic counter support, which is perfect for impression/click tracking.
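To illustrate why counters suit impression/click tracking, here is a minimal in-memory sketch of a common wide-row counter layout. The row key is the ad id, the column name is `<metric>:<hour bucket>`, and the value is a counter. This naming is a hypothetical schema, and nested `defaultdict`s stand in for a counter column family. No Cassandra client API is used.

```python
from collections import defaultdict

def new_table() -> defaultdict:
    # Stand-in for a counter column family:
    # row key = ad id, column name = "<metric>:<hour bucket>", value = counter.
    return defaultdict(lambda: defaultdict(int))

def increment(table: defaultdict, ad_id: str, metric: str, hour: str, delta: int = 1) -> None:
    # The write carries only the delta; the client never reads the current value
    # first, so concurrent ad servers don't need a read-modify-write cycle.
    table[ad_id][f"{metric}:{hour}"] += delta

events = [("ad42", "impressions", "2011-06-01T10")] * 5
events += [("ad42", "clicks", "2011-06-01T10")] * 2

# Increments commute: the same deltas applied in a different order give the same totals.
forward, backward = new_table(), new_table()
for e in events:
    increment(forward, *e)
for e in reversed(events):
    increment(backward, *e)

print(dict(forward["ad42"]))  # {'impressions:2011-06-01T10': 5, 'clicks:2011-06-01T10': 2}
assert forward == backward
```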
You could also run Hadoop on top of Cassandra, giving you a proven platform for writing MapReduce jobs to do analytics/aggregations and store the results back in Cassandra.
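As a rough picture of what such a job computes, the sketch below runs the three MapReduce phases (map, shuffle/group by key, reduce) in plain Python to derive per-ad impressions, clicks and click-through rate from raw events. The event format is hypothetical. This only illustrates the MapReduce model; it does not use the Hadoop or Cassandra APIs, and a real job would write `results` back to a Cassandra column family instead of returning a dict.

```python
from collections import defaultdict

# Hypothetical raw event rows, as they might be read out of Cassandra.
EVENTS = [
    {"ad": "ad42", "type": "impression"},
    {"ad": "ad42", "type": "impression"},
    {"ad": "ad42", "type": "click"},
    {"ad": "ad7", "type": "impression"},
]

def map_phase(event: dict):
    # Emit (ad, (impressions, clicks)) for each event.
    if event["type"] == "impression":
        yield event["ad"], (1, 0)
    elif event["type"] == "click":
        yield event["ad"], (0, 1)

def shuffle(pairs) -> dict:
    # Group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(ad: str, values: list):
    impressions = sum(v[0] for v in values)
    clicks = sum(v[1] for v in values)
    ctr = clicks / impressions if impressions else 0.0
    return ad, {"impressions": impressions, "clicks": clicks, "ctr": ctr}

def run_job(events: list) -> dict:
    pairs = (pair for e in events for pair in map_phase(e))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

results = run_job(EVENTS)
print(results["ad42"])  # {'impressions': 2, 'clicks': 1, 'ctr': 0.5}
```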
Check out this slideshow about Cassandra and Hadoop: http://www.slideshare.net/jeromatron/cassandrahadoop-4399672
I hope that helps.