PostgreSQL application insertion and trigger performance



I'm working on designing an application with a SQL backend (PostgreSQL) and I've got some design questions. In short, the DB will serve to store network events as they occur on the fly, so insertion speed and performance are critical due to 'real-time' actions depending on these events. The data is dumped in a speedy default format across a few tables, and I am currently using PostgreSQL triggers to put this data into some other tables used for reporting.

On a typical event, data is inserted into two different tables, each sharing the same primary key (an event ID). I then need to move and rearrange the data into some different tables that are used by a web-based reporting interface. My primary goal/concern is to keep the load off the initial insertion tables so they can do their thing. Reporting is secondary, but it would still be nice for this to occur on the fly via triggers as opposed to a cron job where I have to query and manage events that have already been processed. Reporting should/will never touch the initial insertion tables. Performance-wise, does this make sense or am I totally off?
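To make the discussion concrete, here is a minimal sketch of that kind of trigger, assuming hypothetical names: events_raw for one of the fast insertion tables, report_events for a reporting table, and placeholder column names.

```sql
-- Minimal sketch: copy each newly inserted event into a reporting table.
-- Table and column names (events_raw, report_events, event_id, ...) are hypothetical.
CREATE OR REPLACE FUNCTION copy_event_to_report() RETURNS trigger AS $$
BEGIN
    INSERT INTO report_events (event_id, occurred_at, source, summary)
    VALUES (NEW.event_id, NEW.occurred_at, NEW.source, NEW.summary);
    RETURN NEW;  -- return value is ignored for AFTER triggers, but returning NEW is conventional
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_raw_to_report
    AFTER INSERT ON events_raw
    FOR EACH ROW EXECUTE PROCEDURE copy_event_to_report();
```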

Once the data is in the appropriate reporting tables, I won't need to hang on to the data in the insertion tables too long, so I'll keep those regularly pruned for insertion performance. In thinking about this scenario, which I'm sure is semi-common, I've come up with three options:

  1. Use triggers to trigger on the initial row insert and populate the reporting tables. This was my original plan.

  2. Use triggers to copy the insertion data to a temporary table (same format), and then trigger or cron to populate the reporting tables. This was just a thought, but I figure that a simple copy operation to a temp table will offload any of the querying done by the triggers in the solution above (see the sketch after this list).

  3. Modify my initial output program to dump all the data to a single table (vs across two) and then trigger on that insert to populate the reporting tables. So where solution 1 is a multi-table to multi-table trigger situation, this would be a single-table source to multi-table trigger.
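For option 2, here is a rough sketch of what I have in mind, assuming the intermediate table is a regular table rather than CREATE TEMPORARY TABLE, and using hypothetical names again:

```sql
-- Rough sketch of option 2: the trigger only does a verbatim copy into a
-- staging table of the same shape, and a periodic job moves rows into the
-- reporting tables. All names are hypothetical.
CREATE OR REPLACE FUNCTION copy_event_to_staging() RETURNS trigger AS $$
BEGIN
    INSERT INTO events_staging SELECT NEW.*;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_raw_to_staging
    AFTER INSERT ON events_raw
    FOR EACH ROW EXECUTE PROCEDURE copy_event_to_staging();

-- Periodic job (e.g. from cron): drain the staging table into reporting.
-- The data-modifying CTE needs PostgreSQL 9.1 or later; it avoids the race
-- between a separate INSERT ... SELECT and a following DELETE.
WITH moved AS (
    DELETE FROM events_staging
    RETURNING event_id, occurred_at, source, summary
)
INSERT INTO report_events (event_id, occurred_at, source, summary)
SELECT event_id, occurred_at, source, summary FROM moved;
```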

Am I overthinking this? I want to get this right. Any input is much appreciated!


2 Answers

很快妥协 2024-09-26 12:45:19


You may see a slight performance overhead since there are more "things" to do (although they should not affect operations in any way). But using triggers/other PL is a good way to reduce that overhead to a minimum, since they are executed faster than code that gets sent from your application to the DB server.

I would go with your first idea, 1), since it seems to me the cleanest and most efficient way.
2) is the most performance-hungry solution, since cron will do more queries than the other solutions that use server-side functions. 3) would be possible, but will result in an "uglier" database layout.

雨的味道风的声音 2024-09-26 12:45:19


This is an old one but adding my answer here.

Reporting is secondary, but it would still be nice for this to occur on the fly via triggers as opposed to a cron job where I have to query and manage events that have already been processed. Reporting should/will never touch the initial insertion tables. Performance-wise, does this make sense or am I totally off?

That may be way off, I'm afraid, but under a few cases it may not be. It depends on the effects of caching on the reports. Keep in mind that disk I/O and memory are your commodities, and that writers and readers rarely block each other on PostgreSQL (unless they explicitly escalate locks; a SELECT ... FOR UPDATE will block writers, for example). Basically, if your tables fit comfortably in RAM you are better off reporting from them, since you are keeping disk I/O free for the WAL segment commit of your event entry. If they don't fit in RAM, then you may have cache-miss issues induced by reporting. Here, materializing your views (i.e. making trigger-maintained tables) may cut down on these, but it comes at a significant complexity cost. This, btw, is your option 1. So I would chalk this one up provisionally as premature optimization. Also keep in mind that you may induce cache misses and lock contention by materializing the views this way, so you might create performance problems for your inserts.
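If it helps, one rough way to see whether reporting is pushing your working set out of cache is the hit-ratio statistics PostgreSQL already keeps. Note this only sees PostgreSQL's own shared buffers, not the OS page cache:

```sql
-- Rough per-table buffer-cache hit ratio from the standard statistics views.
-- Only counts hits in shared_buffers; the OS page cache is invisible here.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;
```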

Keep in mind if you can operate from RAM with the exception of WAL commits, you will have no performance problems.
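A quick way to sanity-check that assumption is to compare the on-disk size of the hot tables against the configured cache. The table names below are placeholders; substitute your own:

```sql
-- Cache-related settings (values shown in the unit reported by pg_settings).
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size');

-- On-disk size of the hot tables, including indexes and TOAST.
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('events_raw', 'events_staging', 'report_events');
```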

For #2. If you mean temporary tables as CREATE TEMPORARY TABLE, that's asking for a mess including performance issues and reports not showing what you want them to show. Don't do it. If you do this, you might:

  1. Force PostgreSQL to replan your trigger on every insert (or at least once per session). Ouch.
  2. Add overhead creating/dropping tables
  3. Possibilities of OID wraparound

etc.

In short I think you are overthinking it. You can get very far by bumping RAM up on your Pg box and making sure you have enough cores to handle the appropriate number of inserting sessions plus the reporting one. If you plan your hardware right, none of this should be a problem.
