Efficient database operations

Posted 2024-10-14 06:11:54

Here is the scenario I am researching a solution for at work. We have a table in Postgres that stores events happening on the network. Currently, rows are inserted as network events arrive, and at the same time older records matching a specific timestamp are deleted, in order to keep the table limited to roughly 10,000 records; basically, the same idea as log rotation. Network events arrive in bursts of thousands at a time, so the transaction rate is very high, which causes performance degradation; after some time the server either crashes or becomes very slow. On top of that, the customer is asking to keep up to a million records in the table, which will accelerate the performance degradation (since we have to keep deleting records matching a specific timestamp) and cause space-management issues. We are using plain JDBC to read from and write to the table. Can the tech community out there suggest a better-performing way to handle the inserts and deletes on this table?

Comments (2)

蘑菇王子 2024-10-21 06:11:54

I think I would use partitioned tables, perhaps 10 x the total desired size, inserting into the newest partition and dropping the oldest.

http://www.postgresql.org/docs/9.0/static/ddl-partitioning.html

This makes the "drop oldest" step much cheaper than a query-and-delete.

Update: I agree with nos's comment, though; the inserts/deletes may not be your bottleneck. Maybe do some investigation first.
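
A minimal sketch of this partition rotation, done over plain JDBC since that is what the question uses. The linked docs describe PostgreSQL 9.0's inheritance-based partitioning; the declarative syntax below assumes PostgreSQL 10 or later, and the table name, columns, and date ranges are illustrative, not from the question:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class PartitionRotation {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/eventsdb", "user", "pass");
                 Statement st = conn.createStatement()) {

                // Parent table, partitioned by event time (PostgreSQL 10+).
                st.execute("CREATE TABLE IF NOT EXISTS network_events (" +
                           "  id BIGSERIAL, event_ts TIMESTAMPTZ NOT NULL, payload TEXT" +
                           ") PARTITION BY RANGE (event_ts)");

                // New inserts land in the newest partition.
                st.execute("CREATE TABLE IF NOT EXISTS network_events_2024_10_14 " +
                           "PARTITION OF network_events " +
                           "FOR VALUES FROM ('2024-10-14') TO ('2024-10-15')");

                // Retiring old rows is a cheap catalog operation,
                // not a row-by-row DELETE.
                st.execute("DROP TABLE IF EXISTS network_events_2024_10_04");
            }
        }
    }

A scheduled job would create the next partition ahead of time and drop the oldest one, so the table never needs a bulk DELETE at all.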

醉梦枕江山 2024-10-21 06:11:54

Some things you could try -

  • Write to a log and have a separate batch process write to the table.
  • Keep the writes as they are, and do the deletes periodically or at times of lower traffic.
  • Do the writes to a buffer/cache and have the actual database writes happen from the buffer (see the sketch after this list).
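
A minimal sketch of the buffer idea, combined with the batch writes mentioned below, assuming a hypothetical network_events table with event_ts and payload columns; all names here are illustrative:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Event handlers enqueue cheaply in memory; one background thread
    // drains the queue and writes in batches, so a burst of thousands of
    // events no longer means thousands of transactions.
    public class BufferedEventWriter implements Runnable {
        private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>(100_000);
        private final Connection conn; // assumed to be an open JDBC connection

        public BufferedEventWriter(Connection conn) { this.conn = conn; }

        // Called from the event-handling path; no database work here.
        public boolean offer(String payload) { return buffer.offer(payload); }

        @Override
        public void run() {
            List<String> drained = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO network_events (event_ts, payload) VALUES (now(), ?)")) {
                while (!Thread.currentThread().isInterrupted()) {
                    drained.clear();
                    drained.add(buffer.take());   // block until something arrives
                    buffer.drainTo(drained, 999); // then grab the rest of the burst
                    for (String payload : drained) {
                        ps.setString(1, payload);
                        ps.addBatch();
                    }
                    ps.executeBatch(); // one round trip for up to 1,000 rows
                }
            } catch (Exception e) {
                Thread.currentThread().interrupt(); // restore interrupt status and exit
            }
        }
    }

A caller would start this on its own thread (new Thread(writer).start()) and call offer(...) as events arrive; if the queue fills during a burst, offer returns false and the caller can decide whether to drop or block.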

A few general suggestions -

  • Since you're deleting based on timestamp, make sure the timestamp is indexed. You could also do this with a counter / auto-incremented row id (e.g. delete where id < currentId - 1000000); see the sketch after this list.
  • Also, JDBC batch writes are much faster than individual row writes (easily an order-of-magnitude speedup). Batch-writing 100 rows at a time will help tremendously if you can buffer the writes, as the buffered-writer sketch above does.
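
A minimal sketch of the index-and-trim suggestion, assuming the same hypothetical network_events table with a monotonically increasing BIGSERIAL id; connection details are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class PeriodicTrim {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/eventsdb", "user", "pass");
                 Statement st = conn.createStatement()) {

                // Index the column the DELETE filters on; without it every
                // delete scans the whole table. (IF NOT EXISTS needs 9.5+.)
                st.execute("CREATE INDEX IF NOT EXISTS network_events_ts_idx " +
                           "ON network_events (event_ts)");

                // Counter-based variant: keep only the newest million rows.
                int deleted = st.executeUpdate(
                        "DELETE FROM network_events " +
                        "WHERE id < (SELECT max(id) FROM network_events) - 1000000");
                System.out.println("trimmed " + deleted + " rows");
            }
        }
    }

Run from a cron job or scheduler during low-traffic hours, this also covers the second bullet in the first list (periodic deletes instead of delete-per-insert).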