Which applications don't need ACID?

Posted 2024-11-03 20:16:23

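Sorry for the ignorant question, but what kinds of applications don't require an ACID-compliant database server? I have a SQL Server background, where ACID has always "been there", and researching other DBMSs has got me thinking. Most applications I can think of would want either atomicity or isolation. Thanks!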

所谓喜欢 2024-11-10 20:16:23

What the other answers seem to be missing is that the generally-applicable alternative to ACID isn't "nothing", it's something called eventual consistency (sometimes nicknamed BASE).

When people say they need ACID semantics, often what they really mean, at least from a domain/business requirements point of view, is simply data integrity. They want to make sure that data doesn't get lost or corrupted. Many NoSQL databases still provide this guarantee; they just provide it in a different way and on their own terms.

It's certainly possible to use a NoSQL or BASE database as an unsafe alternative to a SQL or ACID database, if you treat it as simply a "non-ACID database". Making an informed decision means you understand what has to be done at the application level to compensate for the lack of coarse-grained transactions and play to the strengths of eventual consistency (EC). Some common techniques are:

Optimistic concurrency, which is already used to minimize locking in a transactional environment.
Idempotence of operations, such that if a long-running operation fails halfway through, it can simply be retried again and again until it succeeds.
Long-running transaction techniques using compensating transactions, often called sagas in distributed systems, where multiple independent transactions are grouped by some correlation identifier and the state of the entire operation is tracked independently. Often these actually use ACID semantics for the saga state itself, but that is much more lightweight than a two-phase commit.
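
To make the first two techniques concrete, here is a minimal Python sketch. Everything in it (the `Store` class, `write_if_version`, `add_once`, the `op_id` idempotency key) is a hypothetical illustration rather than the API of any particular database, and it assumes the store offers some compare-and-set primitive on a single record.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when a record changed since the caller read it."""


@dataclass
class Store:
    # key -> (version, value); an in-memory stand-in for a key-value store
    rows: dict = field(default_factory=dict)
    # idempotency keys of operations that have already been applied
    applied: set = field(default_factory=set)

    def read(self, key):
        # Missing keys read as version 0 with value 0.
        return self.rows.get(key, (0, 0))

    def write_if_version(self, key, expected_version, value):
        """Optimistic concurrency: write only if nobody else wrote first."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(key)
        self.rows[key] = (current_version + 1, value)

    def add_once(self, op_id, key, amount):
        """Idempotent increment: replaying the same op_id changes nothing."""
        if op_id in self.applied:
            return
        while True:
            version, value = self.read(key)
            try:
                self.write_if_version(key, version, value + amount)
                break
            except VersionConflict:
                continue  # someone else won the race; re-read and retry
        self.applied.add(op_id)


store = Store()
store.add_once("op-1", "counter", 5)
store.add_once("op-1", "counter", 5)  # retry of the same operation: ignored
store.add_once("op-2", "counter", 3)
print(store.read("counter"))
```

In a real store, recording the `op_id` and writing the new value would themselves have to happen in one small atomic step, which is the kind of "isolated pocket" of stronger guarantees this answer comes back to later.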

In point of fact, if you spend much time working on distributed systems - even those with ACID semantics available at each of the individual subsystems - you'll find a lot of these same techniques used to manage cross-system operations, because without them you just obliterate performance (think BizTalk and BPEL).

Once you've had some experience with it, you'll realize that it actually makes a lot of sense and is often easier than trying to apply ACID semantics. Computing processes are just models for real-life processes, and real-life processes can sometimes fail in mid-stream. You booked a flight but suddenly you can't go anymore. What do you do? You cancel. Maybe you get your money back, maybe you don't, or maybe it's something in between - those are your business rules. Or maybe you started your booking but got distracted or sidetracked or your power went out, and now your session's timed out. What do you do? Simple, you start over.
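
The flight example maps fairly directly onto compensating transactions. The following is a rough Python sketch, not a production saga implementation: `run_saga` and the step functions are made-up names, and the saga state lives in a local list, whereas a real system would persist it under a correlation identifier.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo in reverse.

    Returns True if every step succeeded, False if the saga was rolled back.
    """
    completed = []  # compensations for the steps that have succeeded so far
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo the steps that did succeed, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log = []


def reserve_seat():
    log.append("seat reserved")


def release_seat():
    log.append("seat released")


def charge_card():
    raise RuntimeError("card declined")


def refund_card():
    log.append("card refunded")


ok = run_saga([(reserve_seat, release_seat), (charge_card, refund_card)])
print(ok, log)
```

Only the seat reservation is compensated here, because the card charge never completed. What the compensation actually does (full refund, partial refund, nothing) is a business rule, as described above.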

To really address the question head-on, I'd answer thusly:

You need ACID semantics when:

You can reasonably expect to have multiple users or processes working on the same data at the same time.
The order in which transactions appear is extremely important.
You cannot ever tolerate stale data being displayed to the user.
There is a significant and/or direct cost to incomplete transactions (e.g. a financial system where unbalanced totals can have grave consequences).

On the other hand, you don't need ACID semantics if:

Users only tend to perform updates on their own private data, or don't perform updates at all (just append).
There is no implicit (business-defined) ordering of transactions. For example, if two customers are competing for the last item in stock, it really doesn't matter to you who actually gets it.
Users will tend to be on the same screen for seconds or minutes at a time, and are therefore going to be looking at stale data anyway (this actually describes most applications).
You have the ability to simply abandon incomplete transactions; there is no negative impact of having them sitting around in the database temporarily or in some cases permanently.

The bottom line is that very few applications truly require ACID semantics everywhere. However, many applications will require them somewhere - often in isolated pockets like saga state or message queues.

Next time you're designing a new application or feature, try giving some thought to whether or not it might be possible to model an atomic/isolated "transaction" as an asynchronous "chain of events" with a little extra state to tie them all together. In some cases the answer will be no, but you might be surprised at how often the answer is yes.

会傲 2024-11-10 20:16:23

It's a paradox that every RDBMS guy thinks the sky would fall without ACID, but most NoSQL guys happily deploy and support end-user applications without ever thinking "my application would be better with ACID". Contrary to Marc B's answer, NoSQL databases are not databases where updates randomly get lost or data randomly corrupted. The key difference is that in NoSQL databases you get to use limited versions of atomicity & isolation etc., but it takes an exponential amount of effort to implement transactions of arbitrary complexity.

There is no reason why you can't implement a bank system using a non-ACID database. Most NoSQL databases would let you use micro-transactions which deduct money from one account and add it to another, with a 0% chance of the total amount of money in the system changing.
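
As an illustration of what such a micro-transaction could look like, here is a small Python sketch. It is a generic example, not how Mrjb or any specific NoSQL product works: the `Accounts` class is hypothetical and uses a single in-process lock to make the debit and credit one indivisible step.

```python
import threading


class Accounts:
    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        """Debit and credit as one small atomic step; False if funds are short."""
        with self._lock:
            if self._balances[src] < amount:
                return False
            self._balances[src] -= amount
            self._balances[dst] += amount
            return True

    def total(self):
        with self._lock:
            return sum(self._balances.values())


bank = Accounts({"alice": 100, "bob": 50})


def worker():
    # Move money back and forth; some transfers may be refused, none half-applied.
    for _ in range(1000):
        bank.transfer("alice", "bob", 1)
        bank.transfer("bob", "alice", 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(bank.total())
```

However the threads interleave, the total stays at its starting value, because no other operation can observe the state between the debit and the credit.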

In order to discuss this question in the context of real-world examples, I'll describe our application. My company sells software to high schools, primarily for timetabling but also roll-call, managing teacher absences/replacements, excursions and room bookings. Our software is based on an in-house developed non-ACID database engine called Mrjb (only available internally) which has limitations which are typical of NoSQL databases.

An example of the difference between ACID and NoSQL as relevant to the end user is that if 2 users try to mark the same roll at exactly the same time, there is a (very) small chance that the final result will be a combination of data submitted by both users. An ACID database would guarantee that the final result is either one user's data or the other's, or possibly that one user's update will fail and return an error-message to the user.

In this case I don't think our users would care about whether the individual students' "absence" statuses are all consistent with one user's update or a mixture of both, although they would be concerned if we assigned absence statuses which are contrary to both users' inputs. This example should not occur in practice, and if it does then it's a "race condition" where there's essentially no right answer about which user we believe.
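
The roll-marking race can be simulated deterministically. The sketch below uses invented student names and one hand-picked interleaving of writes; it assumes, purely for illustration, that each student's status is written independently, and it compares that with replacing the roll as a unit.

```python
# Two users mark the same roll. Each submission maps student -> status.
user_a = {"ann": "present", "ben": "absent", "cat": "present"}
user_b = {"ann": "present", "ben": "present", "cat": "absent"}


def apply_per_student(writes):
    """Non-isolated: each student's status is written independently."""
    roll = {}
    for student, status in writes:
        roll[student] = status
    return roll


def apply_whole_roll(submissions):
    """Isolated: each submission replaces the roll as a unit; the last one wins."""
    roll = {}
    for submission in submissions:
        roll = dict(submission)
    return roll


# One possible interleaving of the individual writes from both users.
interleaved = [
    ("ann", user_a["ann"]),
    ("ann", user_b["ann"]),
    ("ben", user_b["ben"]),
    ("ben", user_a["ben"]),
    ("cat", user_a["cat"]),
    ("cat", user_b["cat"]),
]

mixed = apply_per_student(interleaved)
isolated = apply_whole_roll([user_a, user_b])

print(mixed)     # a combination of both users' data
print(isolated)  # exactly one user's data
```

The mixed result matches neither submission as a whole, yet every individual status in it was entered by one of the two users, which is the property the paragraph above says users actually care about.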

A question was raised in relation to our Mrjb database about whether we're able to implement constraints such as "must not allow a Student object to exist without a corresponding Family object". (The 'C' in 'ACID' = Consistency). In fact we can and do maintain this constraint - another example of a micro-transaction.

Another example is when uploading a new version of the cyclical school timetable (typically a 2-week cycle) upon which the daily timetable is based. We would be hard pressed to make this update transaction atomic or to allow other transactions to execute in isolation from this update. So we basically have a choice to either "stop the world" while this major transaction occurs, which takes about 2 seconds, or allow the possibility that a student prints off a timetable containing a combination of pre-update and post-update data (there's probably a 100ms window in which this could occur). The "stop the world" option is probably the better option, but in fact we do the latter. You could argue that a mixed timetable is worse than a pre-update timetable, but in both cases we need to rely on the school having a process to notify students that the timetable has changed - a student working off an out-of-date timetable is a big problem even if it's a consistent timetable. Note also that students typically view their timetable online, in which case the problem is greatly reduced.
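
For comparison, here is a sketch of what the "stop the world" option might look like in Python. The `Timetable` class and its methods are hypothetical. To keep the demonstration single-threaded, the reader gives up immediately while an update is running; a real reader would more likely block until the update finishes.

```python
import threading


class Timetable:
    def __init__(self, days):
        self._days = dict(days)
        self._lock = threading.Lock()

    def replace_all(self, new_days, on_step=None):
        """Stop the world: hold the lock for the whole multi-step update."""
        with self._lock:
            for day, lessons in new_days.items():
                self._days[day] = lessons
                if on_step is not None:
                    on_step()  # hook used below to attempt a read mid-update

    def try_snapshot(self):
        """Return a copy of the timetable, or None if an update is in progress."""
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return dict(self._days)
        finally:
            self._lock.release()


old = {"mon": ["maths"], "tue": ["art"]}
new = {"mon": ["physics"], "tue": ["music"]}

timetable = Timetable(old)
seen_during_update = []
timetable.replace_all(
    new, on_step=lambda: seen_during_update.append(timetable.try_snapshot())
)

print(seen_during_update)        # reads attempted mid-update were refused
print(timetable.try_snapshot())  # after the update, only new data is visible
```

Removing the lock from `try_snapshot` gives the behaviour the answer describes choosing instead: a read during the update window can return Monday from the new cycle and Tuesday from the old one.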

I also wrote a "file-system-based Blob database" for http://brainresource.com, to store their brain scans. This is an important database, and one which has no ACID properties, although they do use an RDBMS for other data about their subjects.

For the record, our company is described here: http://edval.com.au and our NoSQL technology is described here (described as a technique): http://www.edval.biz/memory-resident-programming-object-databases. There was a concern that this post was spam, giving a plug to our company, but I would argue that (a) the question being asked cannot be answered on solely theoretical terms - you need some real-world examples, and (b) withholding any identifying information about the product or database technology is not appropriate.
