Meaning of Leaky Abstraction?
What does the term "Leaky Abstraction" mean? (Please explain with examples. I often have a hard time grokking a mere theory.)
Here's a meatspace example:
Automobiles have abstractions for drivers. In its purest form, there's a steering wheel, accelerator and brake. This abstraction hides a lot of detail about what's under the hood: engine, cams, timing belt, spark plugs, radiator, etc.
The neat thing about this abstraction is that we can replace parts of the implementation with improved parts without retraining the user. Let's say we replace the distributor cap with electronic ignition, and we replace the fixed cam with a variable cam. These changes improve performance but the user still steers with the wheel and uses the pedals to start and stop.
It's actually quite remarkable... a 16-year-old or an 80-year-old can operate this complicated piece of machinery without really knowing much about how it works inside!
But there are leaks. The transmission is a small leak. In an automatic transmission you can feel the car lose power for a moment as it switches gears, whereas with a CVT you feel smooth torque all the way up.
There are bigger leaks, too. If you rev the engine too fast, you may do damage to it. If the engine block is too cold, the car may not start or it may have poor performance. And if you crank the radio, headlights, and AC all at the same time, you'll see your gas mileage go down.
It simply means that your abstraction exposes some of the implementation details, or that you need to be aware of the implementation details when using the abstraction. The term is attributed to Joel Spolsky, circa 2002. See the Wikipedia article for more information.
A classic example is a network library that allows you to treat remote files as local. The developer using this abstraction must be aware that network problems may cause it to fail in ways that local files do not. You then need to write code that specifically handles errors falling outside the abstraction the network library provides.
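As a rough illustration of the remote-file case, here is a minimal Python sketch. `LocalStore`, `RemoteStore` and `load_config` are hypothetical names invented for this example, and the "network" is simulated with a flag rather than a real library. The point is only that the caller ends up handling an error that a purely local read would not produce.

```python
from pathlib import Path


class LocalStore:
    """Reads files from the local disk."""

    def read(self, path: str) -> str:
        return Path(path).read_text(encoding="utf-8")


class RemoteStore:
    """Hypothetical stand-in for a network file library with the same interface."""

    def __init__(self, reachable: bool = True) -> None:
        self.reachable = reachable  # simulates the state of the network

    def read(self, path: str) -> str:
        if not self.reachable:
            # A failure mode that this sketch's local read never raises.
            raise TimeoutError(f"timed out fetching {path}")
        return "remote contents"


def load_config(store, path: str) -> str:
    """Caller code: the interface says 'just read a file', yet the caller
    still has to know that the read might cross a network."""
    try:
        return store.read(path)
    except TimeoutError:
        return "<fallback config>"
```

Both stores expose the same `read` method, but only one of them forces `load_config` to carry a timeout handler. That handler is the leak.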
Wikipedia has a pretty good definition for this
Or in other words, for software, it's when you can observe implementation details of a feature via limitations or side effects in the program.
A quick example would be C# / VB.Net closures and their inability to capture ref / out parameters. They cannot be captured because of an implementation detail of how the lifting process occurs. This is not to say, though, that there is a better way of doing this.
Here's an example familiar to .NET developers: ASP.NET's `Page` class attempts to hide the details of HTTP operations, particularly the management of form data, so that developers don't have to deal with posted values (because it automatically maps form values to server controls).
But if you wander beyond the most basic usage scenarios, the `Page` abstraction begins to leak and it becomes hard to work with pages unless you understand the class's implementation details.
One common example is dynamically adding controls to a page: the values of dynamically added controls won't be mapped for you unless you add them at just the right time, before the underlying engine maps the incoming form values to the appropriate controls. When you have to learn that, the abstraction has leaked.
Well, in a way it is a purely theoretical thing, though not unimportant.
We use abstractions to make things easier to comprehend. I may operate on a string class in some language to hide the fact that I'm dealing with an ordered set of characters that are individual items. I deal with an ordered set of characters to hide the fact that I'm dealing with numbers. I deal with numbers to hide the fact that I'm dealing with 1s and 0s.
A leaky abstraction is one that doesn't hide the details it's meant to hide. If I call string.Length on a 5-character string in Java or .NET, I could get any answer from 5 to 10, because what those languages call characters are really UTF-16 code units, and a single character may take either one or two of them. The abstraction has leaked. Not leaking it, though, would mean that finding the length either requires more storage space (to store the real length) or changes from O(1) to O(n) (to work out what the real length is). If I care about the real answer (often I don't), I need to work from knowledge of what is really going on.
More debatable cases arise where a method or property lets you get at the inner workings; whether these are abstraction leaks or well-defined ways to move to a lower level of abstraction is sometimes a matter people disagree on.
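The 5-versus-10 gap can be reproduced in a small Python sketch. Python's `len` on a `str` counts code points, while counting 2-byte units of the UTF-16 encoding gives the number that Java's `String.length()` and .NET's `String.Length` report. The helper name `utf16_code_units` is made up for this illustration.

```python
def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units, the quantity Java and .NET report as string length."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids a byte-order mark.
    return len(text.encode("utf-16-le")) // 2


plain = "hello"             # five characters inside the Basic Multilingual Plane
emoji = "\U0001F600" * 5    # five characters outside it (surrogate pairs in UTF-16)

print(len(plain), utf16_code_units(plain))  # 5 5
print(len(emoji), utf16_code_units(emoji))  # 5 10
```

Both strings hold five characters, yet the second one has a UTF-16 length of ten because each character needs a surrogate pair.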
I'll continue in the vein of giving examples by using RPC.
In the ideal world of RPC, a remote procedure call should look like a local procedure call (or so the story goes). It should be completely transparent to the programmer such that when they call `SomeObject.someFunction()` they have no idea if `SomeObject` (or just `someFunction` for that matter) is locally stored and executed or remotely stored and executed. The theory goes that this makes programming simpler.
The reality is different because there's a HUGE difference between making a local function call (even if you're using the world's slowest interpreted language) and:
In time alone that's about three orders of magnitude (or more!) of difference. Those three-plus orders of magnitude are going to make a huge difference in performance, which will make your abstraction of a procedure call leak rather obviously the first time you mistakenly treat an RPC as a real function call. Further, a real function call, barring serious problems in your code, will have very few failure points outside of implementation bugs. An RPC call has all of the following possible problems that will get slathered on as failure cases over and above what you'd expect from a regular local call:
So now your RPC call, which is "just like a local function call", has a whole buttload of extra failure conditions you don't have to contend with when doing local function calls. The abstraction has leaked again, even harder.
In the end, RPC is a bad abstraction because it leaks like a sieve at every level, both when successful and when failing.
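To make the RPC point concrete, here is a small Python sketch under stated assumptions: `Calculator`, `FlakyTransport` and `RemoteProxy` are hypothetical classes, and the network hop is simulated in-process rather than going through a real RPC framework. The call site looks identical for the local object and the proxy, but only the proxy can fail for reasons unrelated to the arguments.

```python
import random


class Calculator:
    """A plain local object."""

    def add(self, a: int, b: int) -> int:
        return a + b


class FlakyTransport:
    """Hypothetical transport that simulates a network hop to `target`."""

    def __init__(self, target, failure_rate: float, seed: int = 0) -> None:
        self.target = target
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)

    def send(self, method: str, args: tuple):
        # random() returns a value in [0, 1), so 0.0 never fails and 1.0 always fails.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError(f"lost connection while calling {method}")
        return getattr(self.target, method)(*args)


class RemoteProxy:
    """Makes remote calls look like local method calls."""

    def __init__(self, transport: FlakyTransport) -> None:
        self._transport = transport

    def __getattr__(self, method: str):
        # Only reached for names not found normally, i.e. the remote methods.
        def call(*args):
            return self._transport.send(method, args)
        return call


local = Calculator()
remote = RemoteProxy(FlakyTransport(Calculator(), failure_rate=1.0))

print(local.add(2, 3))
try:
    remote.add(2, 3)  # same syntax as the local call
except ConnectionError as exc:
    print("remote call failed:", exc)
```

The `try`/`except` around `remote.add` has no counterpart at the local call site, which is where the "just like a local function call" promise breaks down.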
What is abstraction?
Example: Flying a 737/747 is "abstracted" away
...that is in an ideal world. In reality, flying a plane is much more complicated, because many details ARE NOT "abstracted away".
Leaky Abstractions in 737 Example
Pilots in reality have to worry about a LOT of things: wind speed, thrust, angles of attack, fuel, altitude, weather problems, angles of descent. Computers can help the pilot in these tasks, but not everything is automated / simplified... not everything is "abstracted away".
e.g. if the pilot pulls up too hard on the column, the plane will obey, but then the plane might stall, and that's really bad.
In other words, it is not enough for the pilot to simply control the steering wheel without knowing anything else... nooooo... the pilot must know about the underlying risks and limitations of the plane before flying one. The pilot must know how the plane works and how it flies; the pilot must know implementation details: that pulling up too hard will lead to a stall, or that landing too steeply will destroy the plane, etc.
Those things are not abstracted away. A lot of things are abstracted, but not everything. The abstraction is "leaky".
Leaky Abstractions in Code
...it's the same thing in your code. If you don't know the underlying implementation details, then you're gonna have problems.
ORMs abstract a lot of the hassle in dealing with database queries, but if you've ever done something like:
Then you will realise that's a nice way to kill your app. You need to know that calling `User.all` with 25 million users is going to spike your memory usage and cause problems. You need to know some underlying details. The abstraction is leaky.
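The original snippet is not preserved here, so the following is only a stand-in: a Python sketch using the standard-library `sqlite3` module instead of an ORM, with a made-up `users` table. It contrasts loading every row at once, which is roughly what a naive "all users" call does, with reading in fixed-size batches.

```python
import sqlite3


def make_db(rows: int) -> sqlite3.Connection:
    """Build an in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        ((f"user{i}",) for i in range(rows)),
    )
    return conn


def count_eager(conn: sqlite3.Connection) -> int:
    """Materialises every row in one list before doing any work."""
    users = conn.execute("SELECT id, name FROM users").fetchall()
    return len(users)


def count_batched(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Holds at most `batch_size` rows in memory at a time."""
    cursor = conn.execute("SELECT id, name FROM users")
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        total += len(batch)
    return total
```

Both functions return the same count. The difference only shows up in peak memory, which is exactly the detail the abstraction was supposed to hide.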
Here is one from the Django ORM's many-to-many example:
Notice in the Sample API Usage that you need to .save() the base Article object a1 before you can add Publication objects to the many-to-many attribute. And notice that updating the many-to-many attribute saves to the underlying database immediately, whereas updating a singular attribute is not reflected in the db until .save() is called.
The abstraction is that we are working with an object graph, where single-value attributes and multi-value attributes are just attributes. But the implementation as a data store backed by a relational database leaks, as the integrity system of the RDBMS shows through the thin veneer of an object interface.
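The behaviour described above can be imitated without Django. The sketch below is not Django's implementation; it is a hand-written `Article` class over the standard-library `sqlite3` module, with hypothetical table names. It shows why a relational store pushes this ordering onto the caller: the join-table row needs the article's primary key, which exists only after the article row has been inserted.

```python
import sqlite3


def make_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE article (id INTEGER PRIMARY KEY, headline TEXT);
        CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE article_publications (
            article_id INTEGER NOT NULL REFERENCES article(id),
            publication_id INTEGER NOT NULL REFERENCES publication(id)
        );
    """)
    return conn


class Article:
    """Hypothetical mini-model: single-value fields are written on save(),
    while many-to-many links are written immediately."""

    def __init__(self, conn: sqlite3.Connection, headline: str) -> None:
        self.conn = conn
        self.id = None            # no primary key until the row is inserted
        self.headline = headline

    def save(self) -> None:
        if self.id is None:
            cur = self.conn.execute(
                "INSERT INTO article (headline) VALUES (?)", (self.headline,))
            self.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE article SET headline = ? WHERE id = ?",
                (self.headline, self.id))

    def add_publication(self, publication_id: int) -> None:
        if self.id is None:
            # The join row cannot reference an article that has no id yet.
            raise ValueError("save() the article before linking publications")
        self.conn.execute(
            "INSERT INTO article_publications VALUES (?, ?)",
            (self.id, publication_id))
```

Calling `add_publication` on an unsaved `Article` raises, and a changed `headline` stays invisible to the database until `save()` runs, which mirrors the two asymmetries noted in the answer.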
It refers to the fact that at some point, guided by your scale and execution, you will need to become familiar with the implementation details of your abstraction framework in order to understand why it behaves the way it does.
For example, consider this SQL query:
And its alternative:
Now, they do look like logically equivalent solutions, but the first one can perform better because it names the individual columns.
It's a trivial example, but eventually it comes back to Joel Spolsky's quote:
At some point, when you reach a certain scale in your operation, you will want to optimize the way your DB (SQL) works. To do that, you will need to know how relational databases work. That was abstracted away from you in the beginning, but the abstraction is leaky. You need to learn it at some point.
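The two queries are not preserved in this copy, so the sketch below does not reproduce them. It shows one concrete situation, using the standard-library `sqlite3` module and a hypothetical `students` table, where naming columns changes what the engine does: when the requested columns are all in an index, SQLite can answer from the index alone, whereas `SELECT *` forces a lookup in the table as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, notes TEXT)"
)
conn.execute("CREATE INDEX idx_students_name ON students (name)")


def plan(sql: str) -> str:
    """Return SQLite's query plan for `sql` as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


named = plan("SELECT name FROM students WHERE name = 'Ada'")
star = plan("SELECT * FROM students WHERE name = 'Ada'")

print(named)  # mentions a COVERING INDEX
print(star)   # uses the index, then still reads the table rows
```

Nothing in the SQL text hints at this difference. You only see it once you know how indexes are stored, which is the kind of knowledge the answer says you eventually have to pick up.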
Assume we have the following code in a library:
When the consumer calls the API, they get an Object[]. The consumer has to understand that the first field of the object array holds the color value and the second field holds the model value. Here the abstraction has leaked from the library into the consumer code.
One solution is to return an object that encapsulates the Model and Color of the Device. The consumer can then ask that object for the model and color values.
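The library code from the original answer is not preserved here, so this is only a sketch of the idea in Python, with hypothetical function and class names. The first function returns a bare list whose layout the caller must memorise. The second returns a small object with named fields.

```python
from dataclasses import dataclass


def get_device_info_leaky() -> list:
    # The caller has to know: index 0 is the color, index 1 is the model.
    return ["black", "X100"]


@dataclass(frozen=True)
class DeviceInfo:
    """Encapsulates the model and color so callers need no positional knowledge."""
    model: str
    color: str


def get_device_info() -> DeviceInfo:
    return DeviceInfo(model="X100", color="black")


info = get_device_info()
print(info.model, info.color)
```

With the second form the library can reorder or add fields without breaking callers, which the positional version cannot promise.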
Leaky abstraction is all about encapsulating state. A very simple example of a leaky abstraction:
And the right way (not a leaky abstraction):
More description here.
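The answer's own snippets are missing from this copy. As an illustrative guess at the kind of contrast it describes, not a reconstruction of the original code, here is a Python sketch with hypothetical `LeakyCart` and `Cart` classes. The first hands out its internal list; the second keeps the list private and exposes a read-only view.

```python
class LeakyCart:
    """Exposes its internal list, so callers can bypass any rules."""

    def __init__(self) -> None:
        self.items = []  # internal state handed out directly


class Cart:
    """Keeps the list private and validates every change."""

    def __init__(self) -> None:
        self._items = []

    def add(self, name: str) -> None:
        if not name:
            raise ValueError("item name must not be empty")
        self._items.append(name)

    @property
    def items(self) -> tuple:
        # A tuple copy: callers can read the contents but cannot mutate the cart.
        return tuple(self._items)


leaky = LeakyCart()
leaky.items.append("")   # nothing stops an invalid item from getting in

cart = Cart()
cart.add("book")
print(cart.items)
```

In `LeakyCart` the choice of a list as storage has become part of the public contract. In `Cart` the storage can change without callers noticing.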