When NOT to use Cassandra?

Posted 2024-08-29 00:01:05, 1431 words, 6 views, 0 comments

Comments (18)

熟人话多 2024-09-05 00:01:05

There is no silver bullet: everything is built to solve specific problems and has its own pros and cons. It comes down to what your problem statement is and which solution fits that problem best.

I will try to answer your questions one by one, in the order you asked them. Since Cassandra belongs to the NoSQL family of databases, it's important to understand why you would use a NoSQL database before I answer them.

Why use NoSQL

In the case of RDBMSs, making a choice is fairly easy, because all the databases in this category, like MySQL, Oracle, MS SQL, and PostgreSQL, offer almost the same kind of solution, oriented toward ACID properties. With NoSQL, the decision becomes difficult, because every NoSQL database offers a different kind of solution and you have to understand which one best suits your app/system requirements. For example, MongoDB fits use cases where your system needs a schema-less document store. HBase might fit search engines, log-data analysis, or any place where scanning huge, two-dimensional, join-less tables is a requirement. Redis is built to provide in-memory storage and lookup for a variety of data structures like trees, queues, linked lists, etc., and can be a good fit for real-time leaderboards and pub-sub style systems. Similarly, there are other databases in this category (including Cassandra) that fit different problem statements. Now let's move to the original questions and answer them one by one.

When to use Cassandra

Being part of the NoSQL family, Cassandra offers a solution for problems where one of your requirements is a very write-heavy system, and you want a quite responsive reporting system on top of the stored data. Consider web analytics, where log data is stored for each request and you want to build an analytics platform around it that counts hits per hour, by browser, by IP, etc., in real time. You can refer to this blog post to understand more about the use cases where Cassandra fits.
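A minimal sketch of the web-analytics idea, assuming a hypothetical counter table `hits_by_hour` whose partition key is `(hour, browser)`: every request becomes one cheap counter increment, and the hourly report is a single lookup by the full partition key. The Python below only simulates that layout in memory (a dict stands in for the table); it does not talk to Cassandra, and all names are illustrative.

```python
from collections import defaultdict
from datetime import datetime

# In-memory stand-in for a hypothetical Cassandra counter table, roughly:
#   CREATE TABLE hits_by_hour (hour text, browser text, hits counter,
#                              PRIMARY KEY ((hour, browser)));
# Each (hour, browser) pair plays the role of one partition.
hits_by_hour: dict[tuple[str, str], int] = defaultdict(int)


def record_request(ts: datetime, browser: str) -> None:
    """Write path: one counter increment per incoming request."""
    hour = ts.strftime("%Y-%m-%d %H:00")
    hits_by_hour[(hour, browser)] += 1  # ~ UPDATE hits_by_hour SET hits = hits + 1 WHERE ...


def hits_for(hour: str, browser: str) -> int:
    """Report path: a single lookup by the full partition key."""
    return hits_by_hour.get((hour, browser), 0)


record_request(datetime(2024, 8, 29, 10, 5), "firefox")
record_request(datetime(2024, 8, 29, 10, 40), "firefox")
record_request(datetime(2024, 8, 29, 11, 1), "chrome")
print(hits_for("2024-08-29 10:00", "firefox"))  # 2
print(hits_for("2024-08-29 11:00", "chrome"))   # 1
```

The design choice is that the table is shaped by the report query (hits per hour per browser), not by the raw log; a per-IP report would, under this approach, get its own table.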

When to use an RDBMS instead of Cassandra

Cassandra is a NoSQL database and does not provide ACID guarantees or relational data properties. If you have a strong requirement for ACID properties (for example, financial data), Cassandra is not a fit. You can build workarounds, but you will end up writing lots of application code to simulate ACID properties, and your time to market will suffer badly. Managing that kind of system on Cassandra would also be complex and tedious.
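To make the ACID point concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for any transactional RDBMS; the `accounts` table and the `transfer` helper are hypothetical. The credit and the debit either commit together or not at all: if the debit would overdraw the source account, the whole transfer is rolled back, including the credit that had already been applied.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory SQLite database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst and debit src atomically; return False if it was rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK (balance >= 0) constraint failed
        return False


def balance(conn: sqlite3.Connection, account: str) -> int:
    row = conn.execute("SELECT balance FROM accounts WHERE id = ?", (account,)).fetchone()
    return row[0]


conn = make_db()
print(transfer(conn, "A", "B", 70))            # True
print(transfer(conn, "A", "B", 70))            # False: A would go negative
print(balance(conn, "A"), balance(conn, "B"))  # 30 70 (B's credit was undone too)
```

Reproducing this guarantee across independent Cassandra partitions is exactly the "lots of application code" the answer warns about.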

When not to use Cassandra

If the explanation above makes sense, I don't think this one needs a separate answer.

誰認得朕 2024-09-05 00:01:05

When evaluating distributed data systems, you have to consider the CAP theorem - you can pick two of the following: consistency, availability, and partition tolerance.

Cassandra is an available, partition-tolerant system that supports eventual consistency. For more information see this blog post I wrote: Visual Guide to NoSQL Systems.
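A toy illustration of eventual consistency, under a deliberately simplified model: three replicas each store a value with its write timestamp, a write may reach only some replicas at first, and replicas later converge by keeping the newest timestamp (last-write-wins, which is how Cassandra resolves conflicting versions of a cell). The `Replica` class and the `anti_entropy` pass are hypothetical and much simpler than Cassandra's real mechanisms (hinted handoff, read repair, and anti-entropy repair).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: str
    timestamp: int  # write timestamp; on conflict the newest one wins


class Replica:
    def __init__(self) -> None:
        self.data: dict[str, Cell] = {}

    def write(self, key: str, cell: Cell) -> None:
        """Apply a write unless a newer version is already held (ties keep the existing one)."""
        current = self.data.get(key)
        if current is None or cell.timestamp > current.timestamp:
            self.data[key] = cell

    def read(self, key: str) -> Optional[str]:
        cell = self.data.get(key)
        return cell.value if cell is not None else None


def anti_entropy(replicas: list[Replica]) -> None:
    """Hypothetical repair pass: push every cell to every replica; LWW settles conflicts."""
    for source in replicas:
        for key, cell in list(source.data.items()):
            for target in replicas:
                target.write(key, cell)


replicas = [Replica(), Replica(), Replica()]
for r in replicas:  # the first write reached every replica
    r.write("user:42:email", Cell("old@example.com", timestamp=1))
# The second write only reached replica 0 before a (simulated) partition.
replicas[0].write("user:42:email", Cell("new@example.com", timestamp=2))
print([r.read("user:42:email") for r in replicas])  # replicas 1 and 2 return stale data
anti_entropy(replicas)
print([r.read("user:42:email") for r in replicas])  # all replicas now agree
```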

娜些时光,永不杰束 2024-09-05 00:01:05

Cassandra is the answer to a particular problem: what do you do when you have so much data that it does not fit on one server? How do you store all your data on many servers without breaking the bank or driving your developers insane? Facebook gets 4 TB of new compressed data EVERY DAY, and this number will most likely more than double within a year.

If you do not have this much data, or if you have millions to pay for an enterprise Oracle/DB2 cluster installation and the specialists required to set it up and maintain it, then you are fine with a SQL database.

However, Facebook no longer uses Cassandra: it now uses MySQL almost exclusively, moving partitioning up into the application stack for faster performance and better control.
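The "too much data for one server" problem is solved by partitioning: each partition key is hashed to a token, and each node owns a range of a token ring. A minimal consistent-hashing sketch, assuming MD5 as the hash and one token per node derived from the node's name (Cassandra's default partitioner uses Murmur3, tokens are assigned rather than hashed from names, and nodes usually own many virtual-node tokens); node names are hypothetical. Replicas are taken as the next distinct nodes clockwise, similar in spirit to Cassandra's `SimpleStrategy`.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key onto a 64-bit ring. MD5 is a stand-in for Cassandra's Murmur3."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: list[str]) -> None:
        # Simplification: one token per node, derived by hashing its name.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str, rf: int) -> list[str]:
        """Walk clockwise from the key's token, taking the next rf distinct nodes."""
        start = bisect.bisect_left(self.tokens, token(key))
        n = len(self.entries)
        return [self.entries[(start + i) % n][1] for i in range(min(rf, n))]

    def owner(self, key: str) -> str:
        """The first replica: the node whose range contains the key's token."""
        return self.replicas(key, 1)[0]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
for user in ["alice", "bob", "carol"]:
    print(user, "->", ring.replicas(user, rf=3))
```

With one token per node, adding a node only takes over part of a single neighbour's range, so only the keys in that range move; that property is what lets a cluster grow one server at a time.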

简单气质女生网名 2024-09-05 00:01:05

The general idea of NoSQL is that you should use whichever data store is the best fit for your application. If you have a table of financial data, use SQL. If you have objects that would require complex/slow queries to map to a relational schema, use an object or key/value store.

Of course, just about any real-world problem you run into is somewhere in between those two extremes, and neither solution will be perfect. You need to consider the capabilities of each store and the consequences of using one over the other, which will be very specific to the problem you are trying to solve.

梦年海沫深 2024-09-05 00:01:05

Besides the answers given above about when to use and when not to use Cassandra, if you do decide to use Cassandra, you may want to consider not using Cassandra itself but one of its many cousins.

Some answers above already pointed to various "NoSQL" systems which share many properties with Cassandra, with some small or large differences, and may be better than Cassandra itself for your specific needs.

Additionally, recently (several years after this question was originally asked), a Cassandra clone called Scylla (see https://en.wikipedia.org/wiki/Scylla_(database)) was released. Scylla is an open-source re-implementation of Cassandra in C++, which claims to have significantly higher throughput and lower latencies than the original Java Cassandra, while being mostly compatible with it (in features, APIs, and file formats). So if you're already considering Cassandra, you may want to consider Scylla as well.

撞了怀 2024-09-05 00:01:05

I will focus here on some of the important aspects that can help you decide whether you really need Cassandra. The list is not exhaustive, just some of the points at the top of my mind:

  • Don't consider Cassandra as the first choice when you have strict requirements on relationships across your dataset.

  • By default, Cassandra is an AP system (in CAP terms). However, it supports tunable consistency, which means it can also be configured to behave as CP. So don't dismiss it just because you read somewhere that it's AP and you are looking for a CP system. Cassandra is more accurately termed “tuneably consistent,” which means it lets you easily decide the level of consistency you require, balanced against the level of availability.

  • Don't use Cassandra if your scale is small or if a non-distributed DB will do.

  • Think harder if your team believes all your problems will be solved by using a distributed DB like Cassandra. Getting started with these DBs is very simple, since they come with many defaults, but optimizing and mastering them to solve a specific problem requires a good amount (if not a lot) of engineering effort.

  • Cassandra is often described as column-oriented, but each row also has a unique key. So it might be helpful to think of it as an indexed, row-oriented store. You can even use it as a document store.

  • Cassandra doesn't force you to define every field up front, so if you are in startup mode or your features are evolving (as in agile), Cassandra accommodates that. Even so, it's better to think about your queries first and then about the data needed to answer them.

  • Cassandra is optimized for really high write throughput. If your use case is read-heavy (like a cache), then Cassandra might not be an ideal choice.
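The "tunable consistency" bullet rests on simple arithmetic. With replication factor RF, if the number of replicas that must acknowledge a write (W) plus the number consulted on a read (R) exceeds RF, every read overlaps at least one replica holding the latest acknowledged write. A small sketch for a single data center, where QUORUM means floor(RF/2) + 1 replicas; the helper names are illustrative, not a driver API.

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond at a consistency level (single data center)."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """R + W > RF means every read replica set overlaps every write replica set."""
    return replicas_for(write_level, rf) + replicas_for(read_level, rf) > rf


def tolerated_down(level: str, rf: int) -> int:
    """How many replicas can be unavailable while requests at `level` still succeed."""
    return rf - replicas_for(level, rf)


RF = 3
for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(w, r, read_sees_latest_write(w, r, RF))
# ONE ONE False
# QUORUM QUORUM True
# ONE ALL True
print(tolerated_down("QUORUM", RF), tolerated_down("ALL", RF))  # 1 0
```

This is the availability/consistency balance in miniature: QUORUM reads and writes at RF=3 overlap and still survive one replica being down, while ALL gives up all failure tolerance for that operation.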

海的爱人是光 2024-09-05 00:01:05

Right. It makes sense to use Cassandra when you have a huge amount of data and a huge number of queries, but very little variety in those queries. Cassandra basically works by partitioning and replicating. If all your queries are based on the same partition key, Cassandra is your best bet. If you get a query on an attribute that is not the partition key, the usual approach is to duplicate the whole data set into another table that uses that attribute as its partition key. So now you have 2 copies of the same data with 2 different partition keys.

Which brings me to your next question: when not to use Cassandra. As I mentioned, Cassandra handles a new access pattern by duplicating the complete data set under a new partition key, and you can't keep making new copies again and again. So when you have a high variety of queries, i.e. each query has a different column in the WHERE clause, Cassandra is not a good option.
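A sketch of that trade-off, assuming a hypothetical `users` data set that must be queried both by `user_id` and by `email`. Each access pattern gets its own "table" keyed by the attribute it filters on, so every write fans out to all copies, and a query on any other column (here `country`) has no partition to go to and degrades to a full scan. Plain dicts stand in for Cassandra tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    country: str


# One "table" per query, each keyed by the attribute that query filters on.
users_by_id: dict[str, User] = {}     # partition key: user_id
users_by_email: dict[str, User] = {}  # partition key: email (a second full copy)


def save_user(user: User) -> None:
    """Each write fans out to every query table; the app keeps them in sync."""
    users_by_id[user.user_id] = user
    users_by_email[user.email] = user


def find_by_country(country: str) -> list[User]:
    """No table is keyed by country, so this has to scan every partition."""
    return [u for u in users_by_id.values() if u.country == country]


save_user(User("u1", "ann@example.com", "DE"))
save_user(User("u2", "raj@example.com", "IN"))
print(users_by_email["raj@example.com"].user_id)   # u2 (direct lookup by key)
print([u.user_id for u in find_by_country("DE")])  # ['u1'] (only via a full scan)
```

Every extra WHERE-clause pattern means another copy to write and keep consistent, which is why a high variety of ad-hoc queries fits this model poorly.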

Now for the third question. The whole point of using an RDBMS is getting ACID properties. If you are building something like a payment service and want each transaction to be isolated, each transaction to either complete or not happen at all, changes to persist despite system failures, and the money to be consistent across bank accounts before and after the transaction completes, an RDBMS is the only option that will help you achieve this.

This article explains the whole thing, especially the part of the question about when to use Cassandra or not (as opposed to some other NoSQL option): Choosing the best Database. Do check it out.

EDIT: To answer the question in the comments by proximab: when we think of banking systems, we immediately think "ACID is the best solution". But even banking systems are made up of several subsystems, some of which might not deal with any transaction-related data at all, like account holders' personal information, account statements, credit card details, credit histories, etc.

All of this information needs to be stored in some database or another. If you store account-related information like the account balance, it needs to be consistent at all times. For example, if you send money from account A to account B, the money that disappears from account A should instantaneously show up in account B, and it cannot be present in both accounts at the same time. This system cannot be inconsistent at any point. This is where ACID is of utmost importance.

On the other hand, if you are saving credit card details or credit histories, which should not get into the wrong hands, you need something that allows access only to authorised users. That, I believe, is supported by Cassandra. That said, data like credit histories and credit card transactions is ever-increasing. Also, there is only so much you can query on this data, i.e. it has a very finite number of queries. These two conditions make Cassandra a perfect solution.

他不在意 2024-09-05 00:01:05

I talked with someone in the midst of deploying Cassandra: it doesn't handle many-to-many relationships well. They are doing a hack job for their initial testing. I spoke with a Cassandra consultant about this, and he said he wouldn't recommend it if you had this problem set.
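One common way to model many-to-many in Cassandra, sketched here under assumptions: two denormalized lookup tables, one partitioned by user and one by group (hypothetical `groups_by_user` and `users_by_group`), with the application responsible for updating both. Dicts of sets stand in for the tables. In real Cassandra the two writes could be sent as a logged batch so that both are eventually applied, but there is no join to fall back on, which is the awkwardness the answer describes.

```python
from collections import defaultdict

groups_by_user: dict[str, set[str]] = defaultdict(set)  # partition key: user
users_by_group: dict[str, set[str]] = defaultdict(set)  # partition key: group


def join_group(user: str, group: str) -> None:
    """One logical change, two physical writes (one per lookup table)."""
    groups_by_user[user].add(group)
    users_by_group[group].add(user)


def leave_group(user: str, group: str) -> None:
    # Forgetting either half leaves the two "tables" disagreeing.
    groups_by_user[user].discard(group)
    users_by_group[group].discard(user)


join_group("ann", "admins")
join_group("ann", "editors")
join_group("raj", "editors")
leave_group("ann", "editors")
print(sorted(groups_by_user["ann"]))      # ['admins']
print(sorted(users_by_group["editors"]))  # ['raj']
```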

怎言笑 2024-09-05 00:01:05

You should ask yourself the following questions:

  1. (Volume, Velocity) Will you be writing and reading TONS of information, so much that no single computer could handle the writes?
  2. (Global) Will you need this writing and reading capability around the world, so that writes in one part of the world are accessible in another?
  3. (Reliability) Do you need this database to be up and running all the time and never go down, regardless of which cloud or country, and whether it runs on VMs, containers, or bare metal?
  4. (Scalability) Do you need this database to keep growing easily and scale linearly?
  5. (Consistency) Do you need TUNABLE consistency, where some writes can happen asynchronously whereas others need to be certified?
  6. (Skill) Are you willing to do what it takes to learn this technology and the data modeling that goes with a globally distributed database that can be fast for everyone, everywhere?

If for any of these questions you thought "maybe" or "no," you should use something else. If you had "hell yes" as an answer to all of them, then you should use Cassandra.

Use an RDBMS when you can do everything on one box. It's probably easier than the alternatives, and anyone can work with it.

々眼睛长脚气 2024-09-05 00:01:05

Heavy single queries vs. a gazillion light queries is another point to consider, in addition to the other answers here. It's inherently harder to automatically optimize a single query in a NoSQL-style DB. I've used MongoDB and ran into performance issues when trying to compute a complex query. I haven't used Cassandra, but I expect it to have the same issue.

On the other hand, if your load is expected to be very many small queries, and you want to be able to scale out easily, you could take advantage of the eventual consistency offered by most NoSQL DBs. Note that eventual consistency is not really a feature of a non-relational data model, but it is much easier to implement and set up in a NoSQL-based system.

For a single, very heavy query, any modern RDBMS engine can do a decent job of parallelizing parts of the query and taking advantage of as much CPU and memory as you throw at it (on a single machine). NoSQL databases don't have enough information about the structure of the data to make the assumptions that allow truly intelligent parallelization of a big query. They do let you easily scale out to more servers (or cores), but once a query hits a certain complexity level, you are basically forced to split it manually into parts that the NoSQL engine knows how to handle intelligently.

In my experience with MongoDB, because of the complexity of the query, there wasn't much Mongo could do to optimize it and run parts of it in parallel over the data. Mongo parallelizes multiple queries but isn't so good at optimizing a single one.

我不会写诗 2024-09-05 00:01:05

Let's read some real world cases:

http://planetcassandra.org/apache-cassandra-use-cases/

In this article: http://planetcassandra.org/blog/post/agentis-energy-stores-over-15-billion-records-of-time-series-usage-data-in-apache-cassandra

They explain that they didn't choose MySQL because DB synchronization was too slow.

(Also because of two-phase commit, FKs, and PKs.)


Cassandra is based on the Amazon Dynamo paper.

Features:

Stability

High availability

Backup performs well

Reads and writes perform better than in HBase (a BigTable clone in Java).

Wiki: http://en.wikipedia.org/wiki/Apache_Cassandra

Their conclusion:

We looked at HBase, Dynamo, Mongo and Cassandra. 

Cassandra was simply the best storage solution for the majority of our data.

As of 2018,

I would recommend using ScyllaDB to replace classic Cassandra if you need back support.

A Postgres kv plugin is also faster than Cassandra. However, it won't have multi-instance scalability.

末が日狂欢 2024-09-05 00:01:05

Another situation that makes the choice easier is when you want to use aggregate functions like sum, min, max, etc., and complex queries (like in the financial system mentioned above). Then a relational database is probably more convenient than a NoSQL database, since neither is really practical on a NoSQL database unless you use a lot of inverted indexes. When you do use NoSQL, you have to compute the aggregates in code or store them separately in their own column family, but this makes everything quite complex and reduces the performance you gained by using NoSQL.
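A sketch of the two options described above, using hypothetical payment rows: an RDBMS stores raw rows and computes `SUM`/`MIN`/`MAX` at query time (Python's built-in `sqlite3` stands in), while a Cassandra-style design keeps a pre-aggregated summary per partition and updates it on every write. Newer Cassandra versions (2.2+) do include basic native aggregates such as `sum`, `min`, and `max`, but they are generally only cheap within a single partition.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

rows = [("acct-1", 120), ("acct-1", 30), ("acct-2", 500), ("acct-1", 75)]

# RDBMS style: store raw rows, aggregate at query time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (account TEXT, amount INTEGER)")
db.executemany("INSERT INTO payments VALUES (?, ?)", rows)
sql_result = db.execute(
    "SELECT SUM(amount), MIN(amount), MAX(amount) FROM payments WHERE account = ?",
    ("acct-1",),
).fetchone()


# Cassandra style (simulated): maintain the aggregate at write time, one per partition.
@dataclass
class Summary:
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def add(self, amount: int) -> None:
        self.total += amount
        self.minimum = amount if self.minimum is None else min(self.minimum, amount)
        self.maximum = amount if self.maximum is None else max(self.maximum, amount)


summary_by_account: dict[str, Summary] = {}
for account, amount in rows:  # in a real system this runs on every write
    summary_by_account.setdefault(account, Summary()).add(amount)

s = summary_by_account["acct-1"]
print(sql_result)                       # (225, 30, 120)
print((s.total, s.minimum, s.maximum))  # (225, 30, 120)
```

The summary only answers the questions it was designed for; a new aggregate means a new table and a backfill. In Cassandra itself, a running total maps naturally onto a counter column, but min/max need a read-before-write, which is part of the complexity the answer warns about.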

甚是思念 2024-09-05 00:01:05

Cassandra is a good choice if:

  1. You don't require the ACID properties from your DB.

  2. There will be a massive number of writes to the DB.

  3. There is a requirement to integrate with Big Data tooling such as Hadoop, Hive, and Spark.

  4. There is a need for real-time data analytics and report generation.

  5. There is a requirement for an impressive fault-tolerance mechanism.

  6. There is a requirement for a homogeneous system.

  7. There is a requirement for lots of customisation for tuning.

洛阳烟雨空心柳 2024-09-05 00:01:05

If you need a fully consistent database with SQL semantics, Cassandra is NOT the solution for you. Cassandra supports key-value lookups; it does not support SQL queries. Data in Cassandra is "eventually consistent": concurrent lookups of the same data may be inconsistent, but eventually lookups are consistent.

If you need strict semantics and support for SQL queries, choose another solution such as MySQL or Postgres, or combine Cassandra with Solr.

画▽骨i 2024-09-05 00:01:05

Apache Cassandra is a distributed database for managing large amounts of structured data across many commodity servers, while providing highly available service with no single point of failure.

The architecture is built around the AP side of the CAP theorem, i.e. availability and partition tolerance, with (interestingly) eventual consistency.

Don't use it if you're not storing large volumes of data across racks of clusters,
Don't use it if you're not storing time-series data,
Don't use it if you're not partitioning your servers,
Don't use it if you require strong consistency.
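Why time-series data fits well: a Cassandra primary key has a partition key (which picks the partition and the nodes holding it) and clustering columns (which keep rows sorted inside the partition), so "readings for one sensor between t1 and t2" is a contiguous range read. A sketch under simplifying assumptions: hypothetical `sensor_id`/`ts` names, a dict of sorted lists standing in for partitions, and `bisect` standing in for the on-disk sort order. Writing the same primary key twice overwrites the row (an upsert), as in Cassandra.

```python
import bisect
from collections import defaultdict

# Partition key: sensor_id. Clustering column: ts (rows stay sorted by ts).
partitions: dict[str, list[tuple[int, float]]] = defaultdict(list)


def insert_reading(sensor_id: str, ts: int, value: float) -> None:
    rows = partitions[sensor_id]
    i = bisect.bisect_left(rows, (ts,))
    if i < len(rows) and rows[i][0] == ts:
        rows[i] = (ts, value)  # same primary key: overwrite (an upsert)
    else:
        rows.insert(i, (ts, value))  # keep clustering order


def readings_between(sensor_id: str, start: int, end: int) -> list[tuple[int, float]]:
    """Contiguous slice of one partition with start <= ts < end."""
    rows = partitions.get(sensor_id, [])
    lo = bisect.bisect_left(rows, (start,))
    hi = bisect.bisect_left(rows, (end,))
    return rows[lo:hi]


for ts, v in [(30, 21.5), (10, 20.0), (20, 20.7), (40, 22.1), (20, 20.9)]:
    insert_reading("sensor-7", ts, v)
print(readings_between("sensor-7", 10, 30))  # [(10, 20.0), (20, 20.9)]
```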

携君以终年 2024-09-05 00:01:05

MongoDB has very powerful aggregate functions and an expressive aggregation framework. It has many of the features developers are accustomed to from the relational database world. Its document data/storage structure allows for more complex data models than Cassandra, for example.

All this comes with trade-offs of course. So when you select your database (NoSQL, NewSQL, or RDBMS) look at what problem you are trying to solve and at your scalability needs. No one database does it all.

有木有妳兜一样 2024-09-05 00:01:05

According to DataStax, Cassandra is not the best fit when there is a need for:

1- High-end hardware devices.
2- ACID compliance with no rollback (bank transactions).

放我走吧 2024-09-05 00:01:05
  • It does not support complete transaction management across tables.
  • Secondary indexes are limited (they are local to each node and perform poorly on high-cardinality columns).
  • You may have to rely on Elasticsearch/Solr for secondary indexing, and write a custom sync component yourself.
  • It is not an ACID-compliant system.
  • Query support is limited.
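A minimal sketch of the "custom sync component" idea, with everything hypothetical: a primary store keyed by `order_id` (standing in for Cassandra) and an in-memory inverted index on a non-key field (standing in for Elasticsearch/Solr), kept in step by a write function that must also remove stale index entries when the indexed field changes.

```python
from collections import defaultdict

# Primary store keyed by order_id (the Cassandra side of the picture).
orders: dict[str, dict[str, str]] = {}
# External secondary index on a non-key field (the Elasticsearch/Solr side).
ids_by_status: dict[str, set[str]] = defaultdict(set)


def upsert_order(order_id: str, fields: dict[str, str]) -> None:
    """Write to the primary store, then sync the index, removing stale entries."""
    old = orders.get(order_id)
    orders[order_id] = fields
    if old is not None:
        ids_by_status[old["status"]].discard(order_id)
    ids_by_status[fields["status"]].add(order_id)


def orders_with_status(status: str) -> list[str]:
    return sorted(ids_by_status.get(status, set()))


upsert_order("o1", {"status": "pending"})
upsert_order("o2", {"status": "pending"})
upsert_order("o1", {"status": "shipped"})
print(orders_with_status("pending"))  # ['o2']
print(orders_with_status("shipped"))  # ['o1']
```

Because the two writes are not atomic, a crash between them leaves the index stale; a real sync component would typically add retries, a periodic reconciliation job, or a change feed (Cassandra has a Change Data Capture feature) to catch up.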