Examples of tasks that a NoSQL database can't handle (if any)
I would like to test the NoSQL world. This is just curiosity, not an absolute need (yet).
I have read a few things about the differences between SQL and NoSQL databases. I'm convinced of the potential advantages, but I'm a little worried about cases where NoSQL is not applicable. If I understand correctly, NoSQL databases essentially lack ACID properties.
Can someone give an example of a real-world operation (for example on an e-commerce site, or in a scientific application, or...) that an ACID relational database can handle but where a NoSQL database could fail miserably, either systematically through some kind of race condition, or because of a power outage, etc.?
The perfect example will be something where there can't be any workaround without modifying the database engine. Examples where a NoSQL database just performs poorly will eventually be another question, but here I would like to see when theoretically we just can't use such technology.
Maybe finding such an example is database specific. If this is the case, let's take MongoDB to represent the NoSQL world.
Edit:
To clarify: I don't want a debate about which kind of database is better for certain cases. I want to know whether this technology can be an absolute dead end in some cases because, no matter how hard we try, some feature that a SQL database provides cannot be implemented on top of NoSQL stores.
Since there are many NoSQL stores available, I am happy to pick an existing one as a reference, but what interests me most is the minimum subset of features a store should provide to make higher-level features possible (for example, can transactions be implemented on a store that doesn't provide X?).
Comments (6)
This question is a bit like asking what kind of program cannot be written in an imperative/functional language. Any Turing-complete language can express every program that can be solved by a Turing machine. The question is whether you, as a programmer, really want to write an accounting system for a Fortune 500 company in non-portable machine instructions.
In the end, NoSQL can do anything SQL-based engines can; the difference is that you as a programmer may be responsible for logic in something like Redis that MySQL gives you for free. SQL databases take a very conservative view of data integrity. The NoSQL movement relaxes those standards to gain better scalability and to make tasks that are common in web applications easier.
MongoDB (my current preference) makes replication and sharding (horizontal scaling) easy, makes inserts very fast, and drops the requirement for a strict schema. In exchange, users of MongoDB must code around slower queries when an index is not present, implement transactional logic in the app (perhaps with three-phase commits), and take a hit on storage efficiency.
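To make "transactional logic in the app" a bit more concrete, here is a minimal sketch of a journal-based transfer between two accounts. It is a simplified two-phase pattern rather than a full three-phase commit, and it runs against plain Python dicts, not a real MongoDB API: `accounts`, `transactions`, `transfer` and `apply_transaction` are hypothetical names made up for illustration. The pattern only works if the store can update a single record atomically (here, the balance together with the pending tag), which is roughly the minimum feature the question asks about. Note that MongoDB releases from 4.0 onward ship multi-document transactions, so this reflects the situation as it was when the answer was written.

```python
# Hypothetical in-memory stand-ins for two collections; not a MongoDB API.
accounts = {"A": {"balance": 100, "pending": []},
            "B": {"balance": 50, "pending": []}}
transactions = {}


def transfer(txn_id, src, dst, amount):
    """Move `amount` from src to dst using a transaction record as a journal."""
    # Step 1: record the intent before touching any account.
    transactions[txn_id] = {"src": src, "dst": dst, "amount": amount,
                            "state": "pending"}
    apply_transaction(txn_id)


def apply_transaction(txn_id):
    """Apply (or resume) a transaction; every step is safe to repeat."""
    txn = transactions[txn_id]
    if txn["state"] == "pending":
        # Step 2: apply each side once, tagging the account with the txn id
        # so a retry after a crash does not apply it a second time.
        # In a real store, balance + tag must change in ONE atomic update.
        for name, delta in ((txn["src"], -txn["amount"]),
                            (txn["dst"], txn["amount"])):
            account = accounts[name]
            if txn_id not in account["pending"]:
                account["balance"] += delta
                account["pending"].append(txn_id)
        txn["state"] = "applied"
    if txn["state"] == "applied":
        # Step 3: clear the tags, then mark the transaction done.
        for name in (txn["src"], txn["dst"]):
            if txn_id in accounts[name]["pending"]:
                accounts[name]["pending"].remove(txn_id)
        txn["state"] = "done"


transfer("t1", "A", "B", 30)
# Re-running recovery on a finished transaction changes nothing.
apply_transaction("t1")
```

A recovery job would scan `transactions` for records that are not yet "done" and call `apply_transaction` on each; because every step checks before it acts, repeating it after a crash is harmless.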
CouchDB has similar trade-offs but also sacrifices ad-hoc queries for the ability to work with data off-line then sync with a server.
Redis and other key-value stores require the programmer to write much of the index and join logic that is built into SQL databases. In exchange, an application can leverage domain knowledge about its data to make indexes and joins more efficient than the general solution that SQL would require. Redis also requires all data to fit in RAM, but in exchange gives performance on par with Memcache.
In the end you really can do everything MySQL or Postgres do with nothing more than the OS file system commands (after all, that is how the people who wrote these database engines did it). It all comes down to what you want the data store to do for you and what you are willing to give up in return.
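As a rough illustration of what "writing the index and join logic yourself" means, the sketch below keeps a secondary index and performs a join by hand. A plain Python dict stands in for the key-value store; the key naming scheme (`user:`, `order:`, `idx:`) and the function names are assumptions made up for this example, not a Redis API.

```python
# A plain dict stands in for a key-value store such as Redis (assumption).
store = {}


def put_user(user_id, name, city):
    """Write a user record and keep the city index in step with it."""
    old = store.get(f"user:{user_id}")
    if old is not None:
        # Remove the stale index entry when the city changes.
        store[f"idx:city:{old['city']}"].discard(user_id)
    store[f"user:{user_id}"] = {"name": name, "city": city}
    store.setdefault(f"idx:city:{city}", set()).add(user_id)


def put_order(order_id, user_id, total):
    """Write an order and index it by the user who placed it."""
    store[f"order:{order_id}"] = {"user_id": user_id, "total": total}
    store.setdefault(f"idx:orders_by_user:{user_id}", set()).add(order_id)


def orders_in_city(city):
    """Hand-written join: users in a city -> their orders."""
    rows = []
    for user_id in sorted(store.get(f"idx:city:{city}", set())):
        user = store[f"user:{user_id}"]
        order_ids = store.get(f"idx:orders_by_user:{user_id}", set())
        for order_id in sorted(order_ids):
            rows.append((user["name"], order_id,
                         store[f"order:{order_id}"]["total"]))
    return rows


put_user(1, "Ann", "Oslo")
put_user(2, "Bob", "Rome")
put_order(10, 1, 25.0)
put_order(11, 1, 5.0)
put_order(12, 2, 9.0)
put_user(2, "Bob", "Oslo")  # moving Bob must also update the index
```

Every write path has to remember to maintain the index; a SQL engine does the equivalent bookkeeping automatically once an index is declared.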
Good question. First a clarification. While the field of relational stores is held together by a rather solid foundation of principles, with each vendor choosing to add value in features or pricing, the non-relational (nosql) field is far more heterogeneous.
There are document stores (MongoDB, CouchDB) which are great for content management and similar situations where you have a flat set of variable attributes that you want to build around a topic. Take site-customization. Using a document store to manage custom attributes that define the way a user wants to see his/her page is well suited to the platform. Despite their marketing hype, these stores don't tend to scale into terabytes that well. It can be done, but it's not ideal. MongoDB has a lot of features found in relational databases, such as dynamic indexes (up to 40 per collection/table). CouchDB is built to be absolutely recoverable in the event of failure.
There are key/value stores (Cassandra, HBase...; strictly speaking these are wide-column stores) that are great for highly distributed storage: Cassandra for low latency, HBase for higher latency. The trick with these is that you have to define your query needs before you start putting data in. They're not efficient for dynamic queries against arbitrary attributes. For instance, if you are building a customer event logging service, you'd want to set your key on the customer's unique attribute. From there, you could push various log structures into your store and retrieve all logs by customer key on demand. It would be far more expensive, however, to go through the logs looking for events where the type was "failure", unless you decided to make that your secondary key. One other thing: the last time I looked at Cassandra, you couldn't run a regexp inside the M/R query. This means that, if you wanted to look for patterns in a field, you'd have to pull all instances of that field and then run them through a regexp to find the tuples you wanted.
Graph databases are very different from the two above. Relations between items (objects, tuples, elements) are fluid. They don't scale into terabytes, but that's not what they are designed for. They are great for asking questions like "hey, how many of my users like the color green? Of those, how many live in California?" With a relational database, you would have a static structure. With a graph database (I'm oversimplifying, of course), you have attributes and objects. You connect them as makes sense, without schema enforcement.
I wouldn't put anything critical into a non-relational store. Take commerce, for instance, where you want guarantees that a transaction is complete before delivering the product. You want guaranteed integrity (or at least the best chance of guaranteed integrity). If a user loses his/her site-customization settings, no big deal. If you lose a commerce transaction, big deal. There may be some who disagree.
I also wouldn't put complex structures into any of the above non-relational stores. They don't do joins well at scale. And that's okay, because it's not the way they're supposed to work. Where you might put an identifier for address_type into a customer_address table in a relational system, you would want to embed the address_type information in a customer tuple stored in a document or key/value store. Data efficiency is not the domain of the document or key/value store. The point is distribution and pure speed. The sacrifice is footprint.
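The address_type example can be sketched as follows. The table and field names beyond address_type and customer_address (`address_types`, `to_document`, the sample rows) are hypothetical and only there to show the shape of the data; this is not tied to any particular store's API.

```python
# Relational shape: address_type is stored once and referenced by id.
address_types = {1: "billing", 2: "shipping"}
customer_addresses = [
    {"customer_id": 7, "address_type_id": 1, "street": "1 Main St"},
    {"customer_id": 7, "address_type_id": 2, "street": "9 Dock Rd"},
]


def to_document(customer_id, name):
    """Build one self-contained customer document with embedded addresses."""
    return {
        "_id": customer_id,
        "name": name,
        "addresses": [
            # The type label is copied into every address: more bytes,
            # but no join is needed when the document is read.
            {"type": address_types[row["address_type_id"]],
             "street": row["street"]}
            for row in customer_addresses
            if row["customer_id"] == customer_id
        ],
    }


customer_doc = to_document(7, "Ann")
```

The document repeats the label "billing" or "shipping" for every customer that has one, which is the footprint cost mentioned above; in return a single key lookup returns everything about the customer.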
There are other subtypes of the family of stores labeled as "nosql" that I haven't covered here. There are a ton of different projects (122 at last count) focused on non-relational solutions to data problems of various types. Riak is yet another one that I keep hearing about and can't wait to try out.
And here's the trick. The big-dollar relational vendors have been watching and chances are, they're all building or planning to build their own non-relational solutions to tie in with their products. Over the next couple years, if not sooner, we'll see the movement mature, large companies buy up the best of breed and relational vendors start offering integrated solutions, for those that haven't already.
It's an extremely exciting time to work in the field of data management. You should try a few of these out. You can download Couch or Mongo and have them up and running in minutes. HBase is a bit harder.
In any case, I hope I've informed without confusing, that I have enlightened without significant bias or error.
RDBMSes are good at joins, NoSQL engines usually aren't.
NoSQL engines are good at distributed scalability, RDBMSes usually aren't.
RDBMSes are good at data validation constraints, NoSQL engines usually aren't.
NoSQL engines are good at flexible and schema-less approaches, RDBMSes usually aren't.
Both approaches can solve either set of problems; the difference is in efficiency.
The answer to your question is probably that MongoDB can handle any task (and so can SQL). But in some cases it is better to choose MongoDB, and in others a SQL database. You can read about the advantages and disadvantages here.
Also, as @Dmitry said, MongoDB opens the door to easy horizontal and vertical scaling through replication and sharding.
An RDBMS enforces strong consistency, while most NoSQL stores are eventually consistent. So at a given point in time, data read from a NoSQL database might not be the most up-to-date copy of that data.
A common example is a bank transaction: when a user withdraws money, node A is updated with this event; if node B is queried for this user's balance at the same time, it can return an outdated balance. This doesn't happen in a typical single-server RDBMS, where a committed update is visible to every later read.
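The bank scenario can be simulated in a few lines. This is a toy model under stated assumptions, not how any particular NoSQL store replicates data: the `Replica` and `EventuallyConsistentStore` classes, the node names and the explicit `sync()` step are all invented for illustration.

```python
class Replica:
    """One node holding a copy of every balance."""

    def __init__(self, balances):
        self.balances = dict(balances)


class EventuallyConsistentStore:
    """Writes hit one node immediately and reach the others only on sync()."""

    def __init__(self, balances, nodes=("A", "B")):
        self.replicas = {n: Replica(balances) for n in nodes}
        self.log = []  # updates not yet propagated

    def withdraw(self, node, user, amount):
        self.replicas[node].balances[user] -= amount
        self.log.append((node, user, amount))

    def read(self, node, user):
        return self.replicas[node].balances[user]

    def sync(self):
        """Deliver each queued update to every node except its origin."""
        for origin, user, amount in self.log:
            for name, replica in self.replicas.items():
                if name != origin:
                    replica.balances[user] -= amount
        self.log.clear()


db = EventuallyConsistentStore({"alice": 100})
db.withdraw("A", "alice", 40)
stale = db.read("B", "alice")   # node B has not seen the withdrawal yet
db.sync()
fresh = db.read("B", "alice")   # after replication both nodes agree
```

Between the withdrawal and the sync, a second withdrawal routed to node B would be checked against the old balance, which is exactly the kind of race the question asks about.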
RDBMSs are really good at quickly aggregating sums, averages, etc. from tables, e.g.
SELECT SUM(x) FROM y WHERE z
This is surprisingly hard to do in most NoSQL databases if you want an answer at once. Some NoSQL stores provide map/reduce as a way of solving the same thing, but it is not real time in the same way it is in the SQL world.
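For comparison, here is a small sketch of the same aggregate done both ways. The SQL side uses Python's built-in sqlite3 module; the map/reduce side is a generic in-process imitation, not the API of any specific NoSQL store. The table y and column x come from the query above, while the `kind` column (standing in for the condition z) and the sample rows are assumptions.

```python
import sqlite3
from functools import reduce

rows = [("failure", 3), ("ok", 5), ("failure", 4)]

# Relational side: the engine evaluates the aggregate in one statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE y (kind TEXT, x INTEGER)")
conn.executemany("INSERT INTO y VALUES (?, ?)", rows)
sql_total = conn.execute(
    "SELECT SUM(x) FROM y WHERE kind = ?", ("failure",)
).fetchone()[0]
conn.close()

# Map/reduce side: the same sum spelled out as explicit phases.
documents = [{"kind": kind, "x": x} for kind, x in rows]


def map_phase(doc):
    """Emit (key, value) pairs for the documents that match the filter."""
    if doc["kind"] == "failure":
        yield ("failure", doc["x"])


def reduce_phase(values):
    """Combine all values emitted for one key."""
    return reduce(lambda a, b: a + b, values, 0)


emitted = [pair for doc in documents for pair in map_phase(doc)]
mr_total = reduce_phase([value for _key, value in emitted])
```

Both totals agree; the difference the answer points at is that in a real store the map/reduce job is typically scheduled as a batch over all documents rather than answered interactively.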