Choosing a database technology
We're setting out to build an online platform (API, Servers, Data, Wahoo!). For context, imagine that we need to build something like Twitter, but with the comments (tweets) organized around a live event. Information about the live event itself must be delivered to clients as quickly and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes.
Scalability is very important. We want to start out renting VPS slices and scale from there. I'm a big fan of the cloud and would like to remain there as long as possible. We'll probably be using Ruby.
I'm convinced that I want to try a document store instead of an RDBMS. I like the idea of schema-less storage and the promises of easier scalability by focusing on key-value.
The problem is I don't know which technology is the most appropriate for our platform. I've looked at Couch, Mongo, Tokyo Cabinet, Cassandra, and an RDBMS with blobbed documents. Any help picking the right tool for this particular job?
Comments (3)
Check out the NoSQL alternatives comparison by BJ Clark.
Then you need to consider the excerpts from his blog:
Doesn't scale (clustering & replication)
Also consider HyperTable. It is a serious contender among the NoSQL alternatives: an open-source implementation of Google's BigTable concept.
I believe it scales well because it's extensively used by the Chinese search engine Baidu and entertainment portal Rediff.
You were saying:
This is similar to Twitter's approach. Your choice of programming language also matters: Twitter initially used Ruby for back-end message delivery, but they later said it was not the right choice and moved the entire message delivery system to Scala.
They are still using Ruby for their front end. If you want a highly reliable, fault-tolerant system that is well suited to scalable environments, you should consider Scala or Erlang.
Ramesh has a good summary. I would add that Cassandra has a richer data model than vanilla Dynamo clones (like Voldemort or Dynomite): rows with named, sorted columns rather than just key/value. Cassandra is being used by Twitter, Mahalo, Ooyala, SimpleGeo, WebEx, and others (http://n2.nabble.com/Cassandra-users-survey-td4040068.html), at least some of which are running Cassandra clusters on EC2 or Rackspace cloud servers.
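To make "rows with named, sorted columns" more concrete, here is a rough in-memory sketch in Python. It is not Cassandra's API or storage engine; `WideRow`, `ROWS`, `add_comment`, and `column_slice` are hypothetical names invented for illustration. The assumption is that each live event is one row and each comment is a column whose name is its timestamp, so a time range of comments comes back already in order.

```python
from bisect import bisect_left, insort


class WideRow:
    """Toy model of one row whose column names are kept in sorted order."""

    def __init__(self):
        self._names = []    # column names, always sorted
        self._values = {}   # column name -> value

    def put(self, name, value):
        # Insert a new column in sorted position, or overwrite an existing one.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def column_slice(self, start, end):
        """Return (name, value) pairs with start <= name < end, in name order."""
        lo = bisect_left(self._names, start)
        hi = bisect_left(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical store: row key (event id) -> WideRow of comments.
ROWS = {}


def add_comment(event_id, timestamp, text):
    # One row per event; the comment's timestamp is used as the column name.
    ROWS.setdefault(event_id, WideRow()).put(timestamp, text)
```

With a plain key/value store you would have to fetch and re-sort a whole blob of comments yourself. Here, something like `ROWS["event-1"].column_slice(10, 30)` returns only the comments in that time window, in order, no matter what order they were written in.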
If you want to scale horizontally (distribute your data over more than one node) you have to take the CAP theorem into account.
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
This is not easy material, but you have to choose: there is always some kind of trade-off.
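One way to see the trade-off in practice is the quorum rule used by Dynamo-style stores: with N replicas, a read that contacts R of them is guaranteed to overlap the most recent write acknowledged by W of them only when R + W > N. The Python sketch below checks that rule by brute force. The function name and the example settings are assumptions for illustration, not a recommendation from the linked article, and overlap alone is not a full consistency guarantee.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Check by enumeration that every read set of size r shares at least
    one replica with every write set of size w, out of n replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# Hypothetical settings for the platform in the question (N = 3 replicas):
# event information favours consistency, comments favour latency.
EVENT_INFO = {"n": 3, "r": 2, "w": 2}   # R + W > N: reads always see the latest write
COMMENTS = {"n": 3, "r": 1, "w": 1}     # R + W <= N: faster, but reads may be stale
```

Calling `quorums_always_overlap(**EVENT_INFO)` returns `True` and `quorums_always_overlap(**COMMENTS)` returns `False`, which matches the question's requirement that event data be consistent while comments can lag a little.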