Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 11 years ago.
I just finished reviewing several similar databases. I ended up with Mongo for different reasons. Riak and Cassandra are both modeled on Amazon's Dynamo, and either could do a good job of that. The Riak site has good comparisons of Riak with a few other databases. For your specific question, I think both Riak and Cassandra accept writes on any node; to handle conflicts, Riak uses vector clocks and Cassandra uses timestamps.
Other than that, you have a few other choices that may make sense:
I'm not sure that's a complete answer. My search took several weeks and about 50 pages of notes, but if large, distributed, and safe writes are the big criteria, that should move you along.
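To make the vector-clock versus timestamp distinction above concrete, here is a minimal Python sketch. It is not Riak's or Cassandra's actual code; the names `compare` and `last_write_wins` are made up for illustration, and the tie-breaking rule in the timestamp version is an arbitrary assumption.

```python
from typing import Dict, Tuple

VectorClock = Dict[str, int]  # node name -> number of updates seen from that node


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify clock a relative to clock b.

    Returns "equal", "before", "after", or "concurrent".
    "concurrent" means neither write saw the other, so both versions
    have to be kept (as siblings) until something merges them.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def last_write_wins(a: Tuple[int, str], b: Tuple[int, str]) -> Tuple[int, str]:
    """Timestamp-style resolution on (timestamp, value) pairs.

    The write with the higher timestamp silently replaces the other.
    On a tie this sketch keeps `a`; that choice is an assumption.
    """
    return a if a[0] >= b[0] else b


# Two nodes accepted writes without seeing each other's update.
print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 2}))  # concurrent
print(last_write_wins((10, "x"), (12, "y")))            # (12, 'y')
```

The practical difference: the vector-clock approach can detect that two writes conflict and leave the decision to the application, while the timestamp approach always picks a winner and discards the other write.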
If your concern is about a single point of failure:
MongoDB uses replica sets to distribute reads and sharding to distribute writes. To achieve what you are looking for, you can shard your system with each shard being a replica set. If the primary in a shard dies, a new primary is automatically elected, so it is not a single point of failure.
Note: MongoDB does not support multi-master replication
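The layout described above (shards that are each a replica set, with one primary per shard taking writes) can be modeled in a few lines of Python. This is a toy sketch and not MongoDB code: `ReplicaSet`, `Router`, and the member names are hypothetical, and the "election" here simply promotes the first live member, whereas a real replica set elects a primary by majority vote.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ReplicaSet:
    """Hypothetical stand-in for one shard: a single primary plus secondaries."""
    members: List[str]
    down: Set[str] = field(default_factory=set)

    def primary(self) -> str:
        # Simplification: the first member that is still up acts as primary.
        for member in self.members:
            if member not in self.down:
                return member
        raise RuntimeError("no live member left in this replica set")

    def fail(self, member: str) -> None:
        self.down.add(member)


class Router:
    """Routes a shard key to a shard by key range, loosely like mongos."""

    def __init__(self, split_points: List[str], shards: List[ReplicaSet]):
        # N split points divide the key space into N + 1 ranges.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per key range")
        self.split_points = split_points
        self.shards = shards

    def write_target(self, key: str) -> str:
        # A range covers [lower split point, upper split point).
        shard = self.shards[bisect_right(self.split_points, key)]
        return shard.primary()


router = Router(
    ["m"],
    [ReplicaSet(["a1", "a2", "a3"]), ReplicaSet(["b1", "b2", "b3"])],
)
print(router.write_target("alice"))  # a1
router.shards[0].fail("a1")          # the primary of the first shard dies
print(router.write_target("alice"))  # a2
print(router.write_target("zed"))    # b1
```

Each key still has exactly one writable node at any moment, which is why this is failover rather than multi-master.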
Depends on how you want to distribute your writes.
Sharding: If you are looking to distribute writes on a key, MongoDB has a great auto-sharding feature. For redundancy, you would create multiple replica (master-slave) pairs and then assign each of them a key range through a central service (mongos). Reads would be distributed statically by key range.
Multi-Master:
If your system is small enough (GB, not TB), CouchDB has one of the more sophisticated merge-replication schemes and is built for fast, reliable recovery in the event of node failure. With CouchDB, every node has a complete copy of the data, and all nodes in a cluster can be both writable and readable.
If you are pulling in millions of rows per hour, Cassandra uses a peer-based replication scheme that will allow you to scale writes far beyond CouchDB if you're willing to give a little on the read performance.
HBase also scales writes and reads, but is better-suited to a batch-oriented write function (loading log files), as it sits on HDFS and writes need to be close to the minimum block size (64MB, 128MB...) before a write can be committed to disk.
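As a rough illustration of the peer-based replication mentioned for Cassandra, the sketch below places nodes on a hash ring and picks the replicas for a row key by walking clockwise from the key's position. It is a simplified model under stated assumptions: `Ring` and `token` are hypothetical names, each node gets a single token, and MD5 is used only because it is in the standard library; a real cluster's partitioner and token assignment differ.

```python
import hashlib
from bisect import bisect_left
from typing import List


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: every node is a peer, and no node is a master."""

    def __init__(self, nodes: List[str], replication_factor: int = 3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds the number of nodes")
        self.rf = replication_factor
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str) -> List[str]:
        # First node whose token is >= the key's token, wrapping around,
        # then the next rf - 1 nodes clockwise.
        start = bisect_left(self.tokens, token(key)) % len(self.ring)
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.rf)
        ]


ring = Ring(["n1", "n2", "n3", "n4", "n5"])
print(ring.replicas("row-42"))  # three distinct nodes that store this row
```

Because any node can compute the same replica list, any node can accept a write and forward it, so adding nodes adds write capacity.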
Hope this helps.
I'm a fan of CouchDB.
Sorry, I got cut off before I could expand on this.
1) Firstly, Couch is easy to distribute geographically: you talk to it over HTTP, which is great for distributed projects.
2) Couch has replication built in.
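Both points above show up in how replication is triggered: it is an HTTP request to the server's `_replicate` endpoint. The sketch below only builds that request and does not send it. The host names and the `orders` database are placeholders, `build_replication_request` is a made-up helper, and a real deployment will usually also need credentials.

```python
import json
from urllib.request import Request


def build_replication_request(server: str, source: str, target: str,
                              continuous: bool = False) -> Request:
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = json.dumps({
        "source": source,          # database to copy from
        "target": target,          # database to copy to
        "continuous": continuous,  # keep replicating as new changes arrive
    }).encode("utf-8")
    return Request(
        url=server.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder hosts; passing the result to urllib.request.urlopen would send it.
req = build_replication_request(
    "http://localhost:5984",
    "orders",
    "http://replica.example.com:5984/orders",
    continuous=True,
)
print(req.get_method(), req.full_url)
```

Running the same request in the opposite direction on the other node gives two-way replication between the pair.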
Better yet, you may find that BigCouch is even more suitable, as it is specifically designed with clustering in mind.
I spent several weeks evaluating Mongo, Cassandra, Couch, and others, and decided that on balance Couch is well suited to a wide range of applications.
I suppose you should also look at Amazon SimpleDB. When it comes to distributed, eventually consistent databases, it certainly fits the bill. I've been using it on a number of projects for a couple of years and it does what it says on the tin. My only concern is that you are basically putting all your data into a third party's black box, but it certainly works, scales, and ticks all of your boxes.
Hope that helps flesh things out a bit.
You can use a product like CloudTran to handle very fast distributed transactions across common databases like MySQL, Oracle, SQL Server, etc.
This is one of the design goals of NuoDB, and the product does this today.
You can read (QUERY), write (INSERT, UPDATE, DELETE), or do anything else transactionally across multiple datacenters as though the database is in a single location. NuoDB is truly consistent, not eventually consistent. It guarantees ACID transactions using optimistic asynchronous messaging and distributed versioning. And NuoDB has rich support for standard SQL.