预计算 SQL 属性指南

发布于 2024-08-20 14:43:43 字数 483 浏览 3 评论 0原文

我经常处理聚合实体或父实体,这些实体具有从其组成成员或子成员派生的属性。例如:

  • TcpConnection 对象的

    byte_count 和 packet_count 计算公式为 它的两个组成 TcpStream 对象的属性相同,而这两个对象又是 根据其组成的 TcpPacket 对象计算。

  • 一个Invoices对象可能有一个total,它基本上是它的SUM() InvoiceLineItems 的组成部分价格,包含少量运费、折扣和税费 抛出的逻辑。

这些派生属性的按需计算(无论是在视图中还是在报告或 Web 界面等表示逻辑中更常见)是通常慢得令人难以接受。

在性能问题迫使您采取行动之前,您如何决定是否将派生属性“提升”到预先计算的字段?

Often I deal with aggregate or parent entities which have attributes derived from their constituent or children members. For example:

  • The byte_count and packet_count of a TcpConnection object is computed from
    the same attributes of its two constituent TcpStream objects, which in turn are
    computed from their constituent TcpPacket objects.

  • An Invoices object might have a total which is basically the SUM() of its
    constituent InvoiceLineItems' prices, with a little freight, discount and tax
    logic thrown in.

When dealing with millions of packets or millions of invoiced line items (I wish!), on-demand computation of these derived attributes -- either in a VIEW or more commonly in presentation logic like reports or web interfaces -- is often unacceptably slow.

How do you decide, before performance concerns force your hand, whether to "promote" derived attributes to precomputed fields?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

离线来电— 2024-08-27 14:43:43

我个人不会非规范化,除非性能权衡迫使我这么做(因为非规范化的缺点太严重了,恕我直言),但您也可能会考虑:

  1. 方便:例如,如果两个不同的客户端应用程序想要计算相同的派生属性,它们都必须编写查询来计算它们。非规范化以更简单的方式为两个客户端应用程序提供派生属性。
  2. 随着时间的推移保持稳定性:例如,如果计算派生属性的公式是可变的,非规范化允许您捕获并存储某个时间点的派生值,以便将来的计算永远不会出错
  3. 更简单查询:增加数据库结构的复杂性可能意味着您的 Select 查询在客户端更简单。
  4. 性能:对非规范化数据的选择查询可以更快。

参考:数据库程序员:非规范化的争论 。请务必阅读他关于保持非规范化值正确的文章 - 他的建议是使用触发器。这让我们明白了非规范化所需要的那种权衡。

I personally wouldn't denormalize until performance trade-offs force my hand (because the downside of denormalizations are too drastic IMHO), but you might also consider:

  1. Convenience: e.g. if two different client apps want to calculate the same derived attributes, they both have to code up the queries to calculate them. Denormalization offers both client apps the derived attribute in a simpler way.
  2. Stability over time: e.g. if the formula for calculating a derived attribute is changeable, denormalization allows you to capture and store the derived value at a point in time so future calculations will never get it wrong
  3. Simpler queries: adding complexity to the DB structure can mean your Select query is simpler at the client end.
  4. Performance: Select queries on denormalized data can be quicker.

Ref: The Database Programmer: The Argument for Denormalization. Be sure to read as well his article on Keeping Denormalized Values Correct - his recommendation is to use triggers. That brings home the kind of trade-off denormalization requires.

誰認得朕 2024-08-27 14:43:43

基本上,你不知道。你留下的性能担忧迫使你出手。

这是最好的答案,因为 99% 的情况下,您不应该像这样进行预优化,最好即时计算。

然而,客户端应用程序开发人员很常见地带着错误的先入之见来到服务器端,例如“按需计算......派生属性... - 经常慢得令人无法接受”,但这不是真的。这里正确的措辞是“很少慢得令人无法接受”。

因此,除非您是这方面的专家(数据库开发架构师等),否则您不应该进行过早的优化。等到它明显需要修复,然后看看预聚合。

Basically, you don't. You left performance concerns force your hand.

That's the best answer because 99% of the time, you should not be pre-optimizing like this, it's better to just calc it on the fly.

However, it is quite common for client-application developers to come to the server-side with mistaken preconceptions like "on-demand computation of ...derived attributes... -- is often unacceptably slow", and this just IS NOT true. The correct wording here would be "is rarely unacceptably slow".

As such, unless you are an expert in this (a DB Development Architect, etc.), you should not be engaging in premature optimization. Wait until it's obvious that is has to be fixed, then look at pre-aggregation.

转角预定愛 2024-08-27 14:43:43

实际上,数据的最新程度决定了您如何实施它。

我将假设两种简单的状态:当前或非当前。

  • 当前:索引视图、触发器、用于维护聚合表的存储过程等
  • 非当前:报告服务快照、日志传送/复制、数据仓库等

也就是说,我将针对与产品中相同数量的数据进行开发,因此我有一些对响应时间的信心。您很少会对代码性能感到惊讶......

How current the data must be determines how you implement it, really.

I'll assume 2 simple states: current or not current.

  • Current: indexed views, triggers, stored procs to maintain aggregate tables etc
  • Not current: Reporting Service snapshots, log shipping/replication, data warehouse etc

That said, I would develop against the same quantity of data as I have in prod so I have some confidence in response time. You should rarely be surprised by your code performance...

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文