Examples of tasks a NoSQL database cannot handle (if any)

Posted on 2024-10-26 01:24:21

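I would like to try out the NoSQL world. This is just curiosity, not an absolute need (yet). I have read a few things about the differences between SQL and NoSQL databases. I am convinced of the potential advantages, but I am a little worried about the cases where NoSQL is not applicable. If I understand correctly, NoSQL databases essentially lack ACID properties.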

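Can someone give an example of a real-world operation (for example in an e-commerce site, a scientific application, or ...) that an ACID relational database can handle but where a NoSQL database could fail miserably, whether systematically, through some kind of race condition, or because of a power outage, etc.?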

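The perfect example would be one where no workaround is possible without modifying the database engine. Examples where a NoSQL database merely performs poorly could be a separate question; here I would like to see when, theoretically, we simply cannot use this technology.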

Maybe finding such an example is database specific. If so, let's take MongoDB to represent the NoSQL world.

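Edit: to clarify, I don't want a debate about which kind of database is better for certain cases. I want to know whether this technology can be an absolute dead end in some cases because, no matter how hard we try, some feature that a SQL database provides cannot be implemented on top of NoSQL stores. Since many NoSQL stores are available, I am happy to pick an existing one as a basis, but what interests me most is the minimum subset of features a store must provide in order to implement higher-level features (for example, can transactions be implemented on top of a store that doesn't provide X ...).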

独﹏钓一江月 2024-11-02 01:24:21

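This question is a bit like asking what kind of program cannot be written in an imperative or functional language. Any Turing-complete language can express every program that a Turing machine can solve. The real question is whether you, as a programmer, want to write an accounting system for a Fortune 500 company in non-portable machine instructions.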

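In the end, NoSQL can do anything SQL-based engines can. The difference is that you, as the programmer, may be responsible for logic in something like Redis that MySQL gives you for free. SQL databases take a very conservative view of data integrity. The NoSQL movement relaxes those standards to gain better scalability and to make tasks common to web applications easier.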

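MongoDB (my current preference) makes replication and sharding (horizontal scaling) easy, makes inserts very fast, and drops the requirement for a strict schema. In exchange, users of MongoDB must code around slower queries when an index is not present, implement transactional logic in the application (perhaps with three-phase commits), and take a hit on storage efficiency.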
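To make "transactional logic in the application" concrete, here is a minimal sketch of a multi-step transfer driven by a transaction record. It is a simplified two-phase-style pattern rather than a full three-phase commit, and it uses in-memory dicts as hypothetical stand-ins for two collections. All names (`accounts`, `transactions`, `begin_transfer`, `apply_transfer`, `recover`) are made up for illustration. It assumes the real store can update a single document atomically, which the two statements inside the `if txn_id not in doc["pending"]` branch only imitate.

```python
# Hypothetical in-memory stand-ins for two collections in a document store.
accounts = {
    "alice": {"balance": 100, "pending": []},
    "bob": {"balance": 20, "pending": []},
}
transactions = {}


def begin_transfer(txn_id, source, dest, amount):
    """Record the intent first, so a crash leaves evidence to recover from."""
    transactions[txn_id] = {
        "source": source, "dest": dest, "amount": amount, "state": "initial",
    }


def apply_transfer(txn_id):
    """Move a transfer forward; every step is safe to repeat after a crash."""
    txn = transactions[txn_id]
    if txn["state"] == "initial":
        txn["state"] = "pending"
    if txn["state"] == "pending":
        for name, delta in ((txn["source"], -txn["amount"]),
                            (txn["dest"], txn["amount"])):
            doc = accounts[name]
            # The pending marker records that this document was already
            # updated, so a retry does not apply the delta twice.
            if txn_id not in doc["pending"]:
                doc["balance"] += delta
                doc["pending"].append(txn_id)
        txn["state"] = "applied"
    if txn["state"] == "applied":
        for name in (txn["source"], txn["dest"]):
            if txn_id in accounts[name]["pending"]:
                accounts[name]["pending"].remove(txn_id)
        txn["state"] = "done"


def recover():
    """After a restart, re-run every transfer that did not reach 'done'."""
    for txn_id, txn in transactions.items():
        if txn["state"] != "done":
            apply_transfer(txn_id)


begin_transfer("t1", "alice", "bob", 30)
# Simulate a crash after only the debit was written.
transactions["t1"]["state"] = "pending"
accounts["alice"]["balance"] -= 30
accounts["alice"]["pending"].append("t1")

recover()
print(accounts["alice"]["balance"], accounts["bob"]["balance"])
```

Note what this sketch does not give you: isolation. Between the debit and the credit, another reader can observe money that has left one account and not yet arrived in the other, which is exactly the kind of guarantee an ACID engine provides for free.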

CouchDB has similar trade-offs, but it also sacrifices ad hoc queries for the ability to work with data offline and then sync with a server.

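Redis and other key-value stores require the programmer to write much of the index and join logic that is built into SQL databases. In exchange, an application can use domain knowledge about its data to make indexes and joins more efficient than the general-purpose solution SQL would require. Redis also requires all data to fit in RAM, but in exchange it gives performance on par with Memcache.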
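As a rough illustration of what "writing the index and join logic yourself" looks like, the sketch below uses a plain dict as a hypothetical key-value store. The key layout (`user:<id>`, `order:<id>`), the `orders_by_user` index, and the function names are all assumptions made up for this example, not any real client API.

```python
from collections import defaultdict

kv = {}  # hypothetical key-value store: string keys, dict values
orders_by_user = defaultdict(set)  # hand-maintained secondary index


def put_user(user_id, name):
    kv[f"user:{user_id}"] = {"name": name}


def put_order(order_id, user_id, total):
    kv[f"order:{order_id}"] = {"user_id": user_id, "total": total}
    # The application, not the store, must keep the index in step.
    orders_by_user[user_id].add(order_id)


def orders_with_user_name(user_id):
    """Hand-written equivalent of joining orders to users on user_id."""
    user = kv[f"user:{user_id}"]
    return [
        {"order_id": oid, "name": user["name"],
         "total": kv[f"order:{oid}"]["total"]}
        for oid in sorted(orders_by_user[user_id])
    ]


put_user(1, "Ada")
put_order(10, 1, 25.0)
put_order(11, 1, 5.5)
print(orders_with_user_name(1))
```

The upside is that the index holds exactly what this application needs. The downside is that every write path has to remember to update it, and nothing in the store will notice if one forgets.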

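In the end, you really can do everything MySQL or Postgres do with nothing more than the OS file system commands (after all, that is how the people who wrote those database engines did it). It all comes down to what you want the data store to do for you and what you are willing to give up in return.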
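For a taste of what building on bare file system calls involves, here is a small sketch of one building block: replacing a file so that readers see either the old contents or the new ones, never a half-written mix. The helper names and the JSON format are assumptions for illustration. A real engine would also need logging, locking, and an fsync of the containing directory, all of which are omitted here.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write data to a temp file, flush it to disk, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # the rename is atomic on POSIX systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_json(path):
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "balance.json")
    atomic_write_json(path, {"balance": 100})
    atomic_write_json(path, {"balance": 70})
    result = read_json(path)
print(result)
```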


a√萤火虫的光℡ 2024-11-02 01:24:21

Good question. First a clarification. While the field of relational stores is held together by a rather solid foundation of principles, with each vendor choosing to add value in features or pricing, the non-relational (nosql) field is far more heterogeneous.

There are document stores (MongoDB, CouchDB) which are great for content management and similar situations where you have a flat set of variable attributes that you want to build around a topic. Take site-customization. Using a document store to manage custom attributes that define the way a user wants to see his/her page is well suited to the platform. Despite their marketing hype, these stores don't tend to scale into terabytes that well. It can be done, but it's not ideal. MongoDB has a lot of features found in relational databases, such as dynamic indexes (up to 40 per collection/table). CouchDB is built to be absolutely recoverable in the event of failure.

There are key/value stores (Cassandra, HBase...) that are great for highly distributed storage: Cassandra for low latency, HBase for higher latency. The trick with these is that you have to define your query needs before you start putting data in. They're not efficient for dynamic queries against arbitrary attributes. For instance, if you are building a customer event logging service, you'd want to set your key on the customer's unique attribute. From there, you could push various log structures into your store and retrieve all logs by customer key on demand. It would be far more expensive, however, to go through the logs looking for events where the type was "failure", unless you decided to make that your secondary key. One other thing: the last time I looked at Cassandra, you couldn't run a regexp inside the M/R query. That means that, if you wanted to look for patterns in a field, you'd have to pull all instances of that field and then run them through a regexp to find the tuples you wanted.
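The key-design point can be sketched with plain dicts standing in for the store. Everything here (the `logs_by_customer` and `logs_by_type` maps, the function names, the sample events) is hypothetical and only meant to show the shape of the trade-off, not how Cassandra or HBase actually expose it.

```python
import re
from collections import defaultdict

logs_by_customer = defaultdict(list)  # primary key: customer id
logs_by_type = defaultdict(list)      # optional secondary key: event type


def append_log(customer_id, event_type, message):
    event = {"customer": customer_id, "type": event_type, "message": message}
    logs_by_customer[customer_id].append(event)
    logs_by_type[event_type].append(event)


def failures_by_scan():
    """Without a secondary key: touch every row of every customer."""
    return [e for events in logs_by_customer.values() for e in events
            if e["type"] == "failure"]


def failures_by_key():
    """With the secondary key: one lookup."""
    return list(logs_by_type["failure"])


def match_messages(customer_id, pattern):
    """Client-side regexp: pull the rows first, then filter them locally."""
    rx = re.compile(pattern)
    return [e["message"] for e in logs_by_customer[customer_id]
            if rx.search(e["message"])]


append_log("c1", "login", "ok")
append_log("c1", "failure", "timeout after 30s")
append_log("c2", "failure", "disk full")
append_log("c2", "login", "ok")

print(len(failures_by_scan()), len(failures_by_key()))
print(match_messages("c1", r"timeout after \d+s"))
```

Both failure lookups return the same events; the difference is how much data each one has to read to get there.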

Graph databases are very different from the two above. Relations between items (objects, tuples, elements) are fluid. They don't scale into terabytes, but that's not what they are designed for. They are great for asking questions like "hey, how many of my users like the color green? Of those, how many live in California?" With a relational database, you would have a static structure. With a graph database (I'm oversimplifying, of course), you have attributes and objects. You connect them as makes sense, without schema enforcement.
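A toy version of that "likes green, lives in California" question, using a dict of edge sets as a stand-in for a graph store. The `connect` helper, the `inverse:` prefix for reverse edges, and the sample names are all invented for this sketch.

```python
from collections import defaultdict

edges = defaultdict(set)  # (node, relation) -> set of connected nodes


def connect(source, relation, target):
    """Add an edge and its inverse; no schema says which relations exist."""
    edges[(source, relation)].add(target)
    edges[(target, "inverse:" + relation)].add(source)


connect("ann", "likes", "green")
connect("bob", "likes", "green")
connect("cy", "likes", "red")
connect("ann", "lives_in", "California")
connect("bob", "lives_in", "Oregon")

likes_green = edges[("green", "inverse:likes")]
in_california = edges[("California", "inverse:lives_in")]
print(len(likes_green), sorted(likes_green & in_california))
```

Adding a new kind of relation later is just another `connect` call, which is the "fluid" part.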

I wouldn't put anything critical into a non-relational store. Take commerce, for instance, where you want guarantees that a transaction is complete before delivering the product. You want guaranteed integrity (or at least the best chance of guaranteed integrity). If a user loses his/her site-customization settings, no big deal. If you lose a commerce transaction, big deal. There may be some who disagree.

I also wouldn't put complex structures into any of the above non-relational stores. They don't do joins well at scale. And that's okay, because it's not the way they're supposed to work. Where you might put an identifier for address_type into a customer_address table in a relational system, you would want to embed the address_type information in a customer tuple stored in a document or key/value store. Data efficiency is not the domain of the document or key/value store. The point is distribution and pure speed. The sacrifice is footprint.
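The address_type example can be sketched as follows. The table contents and the `to_document` helper are hypothetical; only the names `address_type` and `customer_address` come from the text above.

```python
import json

# Relational shape: each address row stores only a small type id.
address_types = {1: "billing", 2: "shipping"}
customer_addresses = [
    {"customer_id": 7, "address_type_id": 1, "city": "Austin"},
    {"customer_id": 7, "address_type_id": 2, "city": "Dallas"},
]


def to_document(customer_id):
    """Embed the type label in the customer document; no join at read time."""
    return {
        "customer_id": customer_id,
        "addresses": [
            {"type": address_types[row["address_type_id"]],
             "city": row["city"]}
            for row in customer_addresses
            if row["customer_id"] == customer_id
        ],
    }


doc = to_document(7)
print(json.dumps(doc))
```

The label string is now repeated in every document that uses it, which is the footprint cost; in return, one read by customer key returns everything needed.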

There are other subtypes of the family of stores labeled as "nosql" that I haven't covered here. There are a ton (122 at last count) of different projects focused on non-relational solutions to data problems of various types. Riak is yet another one that I keep hearing about and can't wait to try out.

And here's the trick. The big-dollar relational vendors have been watching and, chances are, they're all building or planning to build their own non-relational solutions to tie in with their products. Over the next couple of years, if not sooner, we'll see the movement mature, large companies buy up the best of breed, and those relational vendors that haven't already done so start offering integrated solutions.

It's an extremely exciting time to work in the field of data management. You should try a few of these out. You can download Couch or Mongo and have them up and running in minutes. HBase is a bit harder.

In any case, I hope I've informed without confusing, that I have enlightened without significant bias or error.
