预计算 SQL 属性指南
我经常处理聚合实体或父实体,这些实体具有从其组成成员或子成员派生的属性。例如:
TcpConnection
对象的byte_count 和
packet_count
计算公式为 它的两个组成TcpStream
对象的属性相同,而这两个对象又是 根据其组成的TcpPacket
对象计算。一个
Invoices
对象可能有一个total
,它基本上是它的SUM()InvoiceLineItems
的组成部分价格,包含少量运费、折扣和税费 抛出的逻辑。
这些派生属性的按需计算(无论是在视图中还是在报告或 Web 界面等表示逻辑中更常见)是通常慢得令人难以接受。
在性能问题迫使您采取行动之前,您如何决定是否将派生属性“提升”到预先计算的字段?
Often I deal with aggregate or parent entities which have attributes derived from their constituent or children members. For example:
The
byte_count
andpacket_count
of aTcpConnection
object is computed from
the same attributes of its two constituentTcpStream
objects, which in turn are
computed from their constituentTcpPacket
objects.An
Invoices
object might have atotal
which is basically the SUM() of its
constituentInvoiceLineItems
' prices, with a little freight, discount and tax
logic thrown in.
When dealing with millions of packets or millions of invoiced line items (I wish!), on-demand computation of these derived attributes -- either in a VIEW or more commonly in presentation logic like reports or web interfaces -- is often unacceptably slow.
How do you decide, before performance concerns force your hand, whether to "promote" derived attributes to precomputed fields?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我个人不会非规范化,除非性能权衡迫使我这么做(因为非规范化的缺点太严重了,恕我直言),但您也可能会考虑:
参考:数据库程序员:非规范化的争论 。请务必阅读他关于保持非规范化值正确的文章 - 他的建议是使用触发器。这让我们明白了非规范化所需要的那种权衡。
I personally wouldn't denormalize until performance trade-offs force my hand (because the downside of denormalizations are too drastic IMHO), but you might also consider:
Ref: The Database Programmer: The Argument for Denormalization. Be sure to read as well his article on Keeping Denormalized Values Correct - his recommendation is to use triggers. That brings home the kind of trade-off denormalization requires.
基本上,你不知道。你留下的性能担忧迫使你出手。
这是最好的答案,因为 99% 的情况下,您不应该像这样进行预优化,最好即时计算。
然而,客户端应用程序开发人员很常见地带着错误的先入之见来到服务器端,例如“按需计算......派生属性... - 经常慢得令人无法接受”,但这不是真的。这里正确的措辞是“很少慢得令人无法接受”。
因此,除非您是这方面的专家(数据库开发架构师等),否则您不应该进行过早的优化。等到它明显需要修复,然后看看预聚合。
Basically, you don't. You left performance concerns force your hand.
That's the best answer because 99% of the time, you should not be pre-optimizing like this, it's better to just calc it on the fly.
However, it is quite common for client-application developers to come to the server-side with mistaken preconceptions like "on-demand computation of ...derived attributes... -- is often unacceptably slow", and this just IS NOT true. The correct wording here would be "is rarely unacceptably slow".
As such, unless you are an expert in this (a DB Development Architect, etc.), you should not be engaging in premature optimization. Wait until it's obvious that is has to be fixed, then look at pre-aggregation.
实际上,数据的最新程度决定了您如何实施它。
我将假设两种简单的状态:当前或非当前。
也就是说,我将针对与产品中相同数量的数据进行开发,因此我有一些对响应时间的信心。您很少会对代码性能感到惊讶......
How current the data must be determines how you implement it, really.
I'll assume 2 simple states: current or not current.
That said, I would develop against the same quantity of data as I have in prod so I have some confidence in response time. You should rarely be surprised by your code performance...