Choosing a database technology

Posted on 2024-08-18 17:46:30

We're setting out to build an online platform (API, Servers, Data, Wahoo!). For context, imagine that we need to build something like Twitter, but with the comments (tweets) organized around a live event. Information about the live event itself must be delivered to clients as quickly and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes.

Scalability is very important. We want to start out renting VPS slices and scale from there. I'm a big fan of the cloud and would like to remain there as long as possible. We'll probably be using Ruby.

I'm convinced that I want to try a document store instead of an RDBMS. I like the idea of schema-less storage and the promises of easier scalability by focusing on key-value.

The problem is I don't know which technology is the most appropriate for our platform. I've looked at Couch, Mongo, Tokyo Cabinet, Cassandra, and an RDBMS with blobbed documents. Any help picking the right tool for this particular job?

Comments (3)

枫林﹌晚霞¤ 2024-08-25 17:46:30

Check out the NoSQL alternatives comparison by BJ Clark.

You wrote: "Scalability is very important."

In that case, consider these excerpts from his blog:

  1. Tokyo Cabinet - Doesn't scale
  2. Redis - Doesn't scale
  3. Project Voldemort - Scales
  4. MongoDB - Limited (sharding is being implemented)
  5. Cassandra - Scales
  6. Amazon S3 - Scales
  7. Couch - Doesn't scale (Clustering & replication)
  8. MySQL - Doesn't scale

Also consider HyperTable, another serious contender among the NoSQL alternatives. It's an open source implementation of Google's BigTable concept.
I believe it scales well because it's used extensively by the Chinese search engine Baidu and the entertainment portal Rediff.

You were saying:

"Information about the live event itself must be delivered to clients as fast and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes."

This is similar to Twitter's approach. Your choice of programming language is also very important: Twitter initially used Ruby for back-end message delivery, but they later said it was not the right choice and moved the entire message delivery system to Scala.

They are still using Ruby for their front-end. If you want a highly reliable, fault-tolerant system that is well suited to scalable environments, you should consider Scala or Erlang.

甩你一脸翔 2024-08-25 17:46:30

Ramesh has a good summary. I would add that Cassandra has a richer data model than vanilla Dynamo clones (like Voldemort or Dynomite): rows with named, sorted columns rather than just key/value. Cassandra is being used by Twitter, Mahalo, Ooyala, SimpleGeo, WebEx, and others (http://n2.nabble.com/Cassandra-users-survey-td4040068.html), at least some of which are running Cassandra clusters on EC2 or Rackspace cloud servers.
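
To make "rows with named, sorted columns" a bit more concrete, here is a minimal in-memory sketch in Python. It is not Cassandra's API or storage engine; the class `WideRowStore`, its methods, and the `event:42` key layout are all hypothetical names chosen for illustration. The idea it shows is that one row (a live event) can hold many columns (comments) kept in column-name order, so a time-range read is a contiguous slice rather than a scan over unrelated keys.

```python
from bisect import bisect_left, bisect_right, insort


class WideRowStore:
    """Toy model (hypothetical, not Cassandra's API): row key -> sorted columns."""

    def __init__(self):
        # row key -> (sorted list of column names, dict of name -> value)
        self._rows = {}

    def insert(self, row_key, column_name, value):
        names, values = self._rows.setdefault(row_key, ([], {}))
        if column_name not in values:
            insort(names, column_name)  # keep column names in sorted order
        values[column_name] = value

    def slice(self, row_key, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        names, values = self._rows.get(row_key, ([], {}))
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, values[name]) for name in names[lo:hi]]


# Assumed layout: one row per event, column name = (timestamp, author).
store = WideRowStore()
store.insert("event:42", (1003, "carol"), "Great goal!")
store.insert("event:42", (1001, "alice"), "Kickoff")
store.insert("event:42", (1002, "bob"), "Here we go")

# Comments with timestamps 1002..1003, returned in column order.
recent = store.slice("event:42", (1002, ""), (1003, "\uffff"))
recent_texts = [text for _, text in recent]
```

With a plain key/value store, the same read would typically need either one key per comment plus a separate index, or one large value per event that is rewritten on every comment.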

余罪 2024-08-25 17:46:30

If you want to scale horizontally (distribute your data over more than one node), you have to take the CAP theorem into account.

http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

It is not easy stuff, but you have to choose; there is always some kind of trade-off.
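
As a rough illustration of that trade-off, here is a toy two-replica store in Python. Everything in it (`TwoNodeStore`, the `"CP"`/`"AP"` mode strings, the `partitioned` flag) is a made-up simplification, not how any of the databases mentioned here are implemented. It only shows the choice you face when the replicas cannot talk to each other: refuse the write and stay consistent, or accept it and let the other replica serve stale data.

```python
class TwoNodeStore:
    """Toy model (hypothetical): two replicas, writes arrive at replica a."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = {}               # replica a's copy of the data
        self.b = {}               # replica b's copy of the data
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Prefer consistency: reject rather than let replicas diverge.
                raise RuntimeError("unavailable: cannot reach replica b")
            # Prefer availability: accept locally, replica b becomes stale.
            self.a[key] = value
            return
        self.a[key] = value
        self.b[key] = value

    def read_from_b(self, key):
        return self.b.get(key)


# Consistency-first: the write during the partition is rejected.
cp = TwoNodeStore("CP")
cp.write("score", "0-0")
cp.partitioned = True
try:
    cp.write("score", "1-0")
    cp_rejected = False
except RuntimeError:
    cp_rejected = True

# Availability-first: the write is accepted, but replica b still has old data.
ap = TwoNodeStore("AP")
ap.write("score", "0-0")
ap.partitioned = True
ap.write("score", "1-0")
stale_read = ap.read_from_b("score")
```

For the platform in the question, this suggests the event data (which must be consistent) and the comments (which can arrive late) may reasonably sit at different points on this trade-off.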
