How far can you really go with "eventual" consistency and no transactions (aka SimpleDB)?
I really want to use SimpleDB, but I worry that without real locking and transactions the entire system is fatally flawed. I understand that for high-read/low-write apps it makes sense, since eventually the system becomes consistent, but what about the time in between? It seems like the right query against an inconsistent database could spread havoc throughout the entire database in a way that's very hard to track down. Hopefully I'm just being a worrywart...
Assuming you're talking about this SimpleDB, you're not being a worrywart; there are real reasons not to use it as a real-world DBMS.
The properties that you get from transaction support in a DBMS can be abbreviated by the acronym "A.C.I.D.": Atomicity, Consistency, Isolation, and Durability. The A and D have mostly to do with system crashes, and the C and I have to do with regular operation. They're all things people totally take for granted when working with commercial databases, so if you work with a database that doesn't have one or more of them, you might be in for any number of nasty surprises.
Atomicity: Any transaction will either complete fully or not at all (i.e. it will either commit or abort cleanly). This applies to single statements (like "UPDATE table ...") as well as longer, more complicated transactions. If you don't have this, then anything that goes wrong (like, the disk getting full, the computer crashing, etc.) might leave something half-done. In other words, you can't ever rely on the DBMS to really do the things you tell it to, because any number of real-world problems can get in the way, and even a simple UPDATE statement might get partially completed.
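To make the all-or-nothing guarantee concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is just a convenient transactional stand-in here, and the `accounts` table and the `make_db`, `transfer` and `balances` helpers are hypothetical. A failure injected between the debit and the credit rolls the whole transaction back, so the money never half-moves.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with two hypothetical accounts.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             fail_midway: bool = False) -> None:
    # Used as a context manager, the connection commits if the block finishes
    # and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            # Stand-in for a crash, full disk, etc. between the two halves.
            raise RuntimeError("simulated failure")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    conn = make_db()
    try:
        transfer(conn, "alice", "bob", 30, fail_midway=True)
    except RuntimeError:
        pass
    print(balances(conn))  # the half-done debit was rolled back
    transfer(conn, "alice", "bob", 30)
    print(balances(conn))
```

Without atomicity, the first call could leave alice debited and bob never credited.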
Consistency: Any rules you've set up about the database will always be enforced. Like, if you have a rule that says A always equals B, then nothing anybody does to the database system can break that rule - it'll fail any operation that tries. This isn't quite as important if all your code is perfect ... but really, when is that ever the case? Plus, if you're missing this safety net, things get really yucky when you lose ...
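Here is a minimal sketch of the "A always equals B" rule, again using Python's `sqlite3` as a stand-in for a DBMS that enforces declared constraints; the `pairs` table and the `try_write` helper are hypothetical. Any statement that would break the `CHECK` constraint is rejected with `sqlite3.IntegrityError` and leaves the data unchanged.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical rule from the text: column a must always equal column b.
    conn.execute(
        "CREATE TABLE pairs ("
        "id INTEGER PRIMARY KEY, "
        "a INTEGER NOT NULL, "
        "b INTEGER NOT NULL, "
        "CHECK (a = b))"
    )
    return conn


def try_write(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    """Run one statement in its own transaction; report whether it was accepted."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        # The engine refused the change because it would break the rule,
        # and the transaction was rolled back.
        return False
```

Inserting `(5, 5)` succeeds, while inserting `(5, 6)` or running `UPDATE pairs SET a = a + 1` on its own fails; updating both columns in the same statement succeeds. The rule holds no matter which piece of application code issues the write.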
Isolation: Any actions taken on the database will execute as if they happened serially (one at a time), even if in reality they're happening concurrently (interleaved with each other). If more than one user is going to hit this database at the same time, and you don't have this, then things you can't even dream up will go wrong; even atomic statements can interact with each other in unforeseen ways and screw things up.
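The classic example of what goes wrong is the "lost update". Below is a small, deterministic Python simulation (not real DBMS code; `increment_txn` and `run` are hypothetical helpers): two read-modify-write transactions each add to a shared counter. Run one after the other they produce 30, but interleaved without isolation one increment silently disappears, even though every individual read and write is atomic.

```python
from typing import Callable, Dict, List, Tuple

# Each step of a hypothetical transaction gets the shared "database" and the
# transaction's own private scratch space (think: its local variables).
Step = Callable[[Dict[str, int], Dict[str, int]], None]


def increment_txn(amount: int) -> List[Step]:
    """A read-modify-write transaction: read the counter, then write counter + amount."""
    def read(db: Dict[str, int], local: Dict[str, int]) -> None:
        local["seen"] = db["counter"]

    def write(db: Dict[str, int], local: Dict[str, int]) -> None:
        db["counter"] = local["seen"] + amount

    return [read, write]


def run(schedule: List[Tuple[int, int]], txns: List[List[Step]]) -> int:
    """Execute (transaction, step) pairs in exactly the given order, with no isolation."""
    db = {"counter": 0}
    scratch: List[Dict[str, int]] = [{} for _ in txns]
    for t, s in schedule:
        txns[t][s](db, scratch[t])
    return db["counter"]


if __name__ == "__main__":
    txns = [increment_txn(10), increment_txn(20)]
    serial = [(0, 0), (0, 1), (1, 0), (1, 1)]
    interleaved = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(run(serial, txns))       # 30
    print(run(interleaved, txns))  # 20: the +10 was overwritten (a lost update)
```

An isolating DBMS would either make the second transaction wait (locking) or abort it when it tries to commit on stale data, so the interleaved schedule ends with the same result as some serial one.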
Durability: If you lose power or the software crashes, what happens to database transactions that were in progress? If you have durability, the answer is "nothing - they're all safe": everything that had committed survives, and anything still in flight is cleanly rolled back rather than left half-applied. Databases do this by using something called "Undo / Redo Logging", where every little thing you do to the database is first logged (typically on a separate disk for safety) in such a way that you can reconstruct the current state after a failure. Without that, the other properties above are sort of useless, because you can never be 100% sure that things will stay consistent after a crash.
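Here is a heavily simplified Python sketch of the logging idea, assuming a hypothetical record format and a log that is the only copy of the data (so recovery starts from an empty state). It shows only the redo half: writes of committed transactions are replayed, and writes of a transaction that never committed are ignored. Real undo/redo logging also undoes uncommitted changes that already reached disk, uses checkpoints, and forces the log to stable storage before acknowledging a commit.

```python
from typing import Dict, List, Tuple

# Hypothetical log record formats for this sketch:
#   ("write", txn_id, key, new_value)  appended BEFORE the change is applied
#   ("commit", txn_id)                 appended when the transaction commits
LogRecord = Tuple


def recover(log: List[LogRecord]) -> Dict[str, int]:
    """Rebuild state after a crash by redoing only committed transactions' writes."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state: Dict[str, int] = {}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, key, value = rec
            state[key] = value
    return state


if __name__ == "__main__":
    log = [
        ("write", 1, "alice", 70),
        ("write", 1, "bob", 30),
        ("commit", 1),
        ("write", 2, "alice", 0),  # the crash hits before txn 2 commits
    ]
    print(recover(log))  # {'alice': 70, 'bob': 30}
```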
Do any of these things matter to you? The answer has everything to do with the types of transactions you're doing, and what guarantees you want in a failure situation. There may well be cases (like a read-only database) where you don't need these, but as soon as you start doing anything non-trivial, and something bad happens, you'll wish you had 'em. Maybe it's OK for you to just revert to a backup anytime something unexpected happens, but my guess is that it isn't.
Also note that dropping all of these protections doesn't make it a given that your database will perform better; in fact, it's probably the opposite. That's because real-world DBMS software also has tons of code to optimize query performance. So, if you write a query that joins 6 tables on SimpleDB, don't assume that it'll figure out the optimal way to run that query - you might end up waiting hours for it to complete, when a commercial DBMS could use an indexed hash join and get it in 0.5 seconds. There are a zillion little tricks that you can do to optimize query performance, and believe me, you'll really miss them when they're gone.
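To give a feel for why the planner's choice matters, here is a toy Python comparison of two join strategies over lists of dicts (illustrative functions, not any DBMS's actual code; this is a plain in-memory hash join, without an index). Both return the same rows, but the nested-loop version does work proportional to the product of the input sizes, while the hash join builds a hash table on one input and probes it once per row of the other.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def nested_loop_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    # Compares every pair: O(len(left) * len(right)) comparisons.
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    # Build phase: bucket the right input by join key.
    buckets: Dict[object, List[Row]] = defaultdict(list)
    for r in right:
        buckets[r[key]].append(r)
    # Probe phase: one hash lookup per left row, roughly O(len(left) + len(right)).
    return [(l, r) for l in left for r in buckets.get(l[key], [])]
```

With six tables, a real optimizer also has to pick a join order and decide where indexes help, using table statistics; a system without that machinery may fall back to something like the nested loop at every step.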
None of this is meant as a knock on SimpleDB; take it from the author of the software: "Although it is a great teaching tool, I can't imagine that anyone would want to use it for anything else."
This is the classic battle between consistency and scalability and, to some extent, availability. Some data doesn't always need to be that consistent. For instance, look at digg.com and the number of diggs against a story. There's a good chance that value is duplicated in the "digg" record rather than forcing the DB to do a join against the "user_digg" table. Does it matter if that number isn't perfectly accurate? Probably not. Then using something like SimpleDB might be a good fit. However, if you are writing a banking system, you should probably value consistency above all else. :)
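Here is a minimal Python sketch of the trade-off in the digg example, assuming a hypothetical in-memory `DiggStore` (not digg.com's or SimpleDB's actual design). The exact count is computed from the `user_digg` pairs, while the cheap, denormalized `digg_count` is updated later from a queue, so reads of it can briefly lag but converge once the queue is drained.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple


class DiggStore:
    """Toy store: normalized (user, story) pairs plus an eventually consistent count."""

    def __init__(self) -> None:
        self.user_digg: Set[Tuple[str, str]] = set()  # (user, story) pairs
        self.digg_count: Dict[str, int] = {}          # cached count per story
        self.pending: Deque[str] = deque()            # count updates not yet applied

    def digg(self, user: str, story: str) -> None:
        if (user, story) not in self.user_digg:
            self.user_digg.add((user, story))
            # Don't touch the cached count synchronously; queue it for later.
            self.pending.append(story)

    def exact_count(self, story: str) -> int:
        # The "join"/aggregate: always correct, but scans every digg.
        return sum(1 for _, s in self.user_digg if s == story)

    def cached_count(self, story: str) -> int:
        # Cheap read; may lag behind exact_count until apply_pending runs.
        return self.digg_count.get(story, 0)

    def apply_pending(self) -> None:
        # Background step that brings the cached counts up to date.
        while self.pending:
            story = self.pending.popleft()
            self.digg_count[story] = self.digg_count.get(story, 0) + 1
```

For a story's digg count, showing a slightly stale `cached_count` is harmless; for an account balance, the same lag is exactly the kind of inconsistency the question worries about.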
Unless you know from day 1 that you have to deal with massive scale, I would stick to simpler, more conventional systems like an RDBMS. If you are working somewhere with a reasonable business model, you will hopefully see a big spike in revenue if there's a big spike in traffic. Then you can use that money to help solve the scaling problems. Scaling is hard, and scaling is hard to predict. Most of the scaling problems that hurt you will be ones that you never expect.
I would much rather get a site off the ground and spend a few weeks fixing scale issues when traffic picks up than spend so much time worrying about scale that we never make it to production because we run out of money. :)