Database schema design - tips for improving archiving capability?

Posted 2024-07-13 00:42:00

I am designing a table in the database which will store log entries from the application. There are a few things that are making me think about this design more than usual.

  • These log entries will be used at runtime by the system to make decisions, so they need to be relatively fast to access.
  • There is also the problem that there are going to be a lot of them (12.5 million added per month is my estimate).
  • I don't need more than the last 30 to 45 days at most for the decision processing.
  • I need to keep all of them for much longer than 45 days for support and legal issues, likely at least 2 years.
  • The table design is fairly simple: all simple types (no blobs or anything), defaults supplied by the database engine where possible, and at most one foreign key.
  • If it makes any difference, the database will be Microsoft SQL Server 2005.

What I was thinking is having them written to a live table/database and then using an ETL solution to move "old" entries to an archive table/database, which is large and sits on slower hardware.
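
To make that concrete, here is the rough shape of the move I have in mind (all table and column names below are placeholders, not a finished design): copy a small batch of expired rows to the archive, delete the same batch from the live table, and repeat, so no single transaction gets huge.

```sql
-- Hypothetical schema: dbo.LogLive and ArchiveDb.dbo.LogArchive, both with
-- LogId / LoggedAt / Message columns. Batching keeps the transaction log
-- and lock footprint small on a busy table.
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(day, -45, GETDATE());

DECLARE @batch TABLE (LogId bigint PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @batch;

    -- Pick the next batch of expired rows.
    INSERT INTO @batch (LogId)
    SELECT TOP (10000) LogId
    FROM dbo.LogLive
    WHERE LoggedAt < @cutoff
    ORDER BY LogId;

    IF @@ROWCOUNT = 0 BREAK;

    BEGIN TRAN;

    -- Copy the batch to the archive database ...
    INSERT INTO ArchiveDb.dbo.LogArchive (LogId, LoggedAt, Message)
    SELECT l.LogId, l.LoggedAt, l.Message
    FROM dbo.LogLive AS l
    JOIN @batch AS b ON b.LogId = l.LogId;

    -- ... then remove the same batch from the live table.
    DELETE l
    FROM dbo.LogLive AS l
    JOIN @batch AS b ON b.LogId = l.LogId;

    COMMIT;
END
```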

My question is: do you know of any tips, tricks, or suggestions for the database/table design to make sure this works as well as possible? Also, if you think it's a bad idea, please let me know, and tell me what you think a better idea would be.


Comments (4)

盗琴音 2024-07-20 00:42:00

Some databases offer "partitions" (Oracle, for example). A partition is like a view that collects several tables with an identical definition into one. You can define criteria that sort new data into the different tables (for example, the month, or week-of-year % 6).

From a user's point of view, this is just one table. From the database's point of view, it's several independent tables, so you can run full-table commands (like truncate, drop, delete from table without a condition, load/dump, etc.) against them efficiently.

If you can't have partitions, you can get a similar effect with views. In this case, you collect several tables in a single view and redefine this view, say, once a month to "free" one table holding old data from the rest. Now you can efficiently archive that table, clear it, and attach it again to the view once the heavy work is done. This should help greatly to improve performance.
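
For illustration, a minimal sketch of that view-based approach (every object name here is made up for the example): each monthly table carries a CHECK constraint pinning it to its date range, and the view glues them together with UNION ALL, which is what lets the optimizer skip tables whose range can't match a query.

```sql
-- One table per month, each CHECK-constrained to its own date range.
CREATE TABLE dbo.Log200906 (
    LogId    bigint   NOT NULL PRIMARY KEY,
    LoggedAt datetime NOT NULL
        CHECK (LoggedAt >= '20090601' AND LoggedAt < '20090701'),
    Message  nvarchar(1024) NOT NULL
);

CREATE TABLE dbo.Log200907 (
    LogId    bigint   NOT NULL PRIMARY KEY,
    LoggedAt datetime NOT NULL
        CHECK (LoggedAt >= '20090701' AND LoggedAt < '20090801'),
    Message  nvarchar(1024) NOT NULL
);
GO

-- Queries go through the view; the CHECK constraints make this a
-- "partitioned view" the optimizer can prune by date.
CREATE VIEW dbo.LogRecent AS
    SELECT LogId, LoggedAt, Message FROM dbo.Log200906
    UNION ALL
    SELECT LogId, LoggedAt, Message FROM dbo.Log200907;
GO

-- Month end: ALTER VIEW without the oldest table, then archive and drop it.
```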

[EDIT] SQL Server 2005 and later (Enterprise Edition) support partitioning. Thanks to Mitch Wheat.
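
A minimal sketch of what native partitioning looks like on SQL Server 2005 Enterprise Edition; all object names (pf_LogMonth, ps_LogMonth, LogEntries, LogStage) are invented for the example.

```sql
-- One boundary value per month; RANGE RIGHT puts each boundary date in
-- the partition to its right.
CREATE PARTITION FUNCTION pf_LogMonth (datetime)
AS RANGE RIGHT FOR VALUES ('20090601', '20090701', '20090801');

-- In production you would map each partition to its own filegroup
-- instead of putting everything on [PRIMARY].
CREATE PARTITION SCHEME ps_LogMonth
AS PARTITION pf_LogMonth ALL TO ([PRIMARY]);

CREATE TABLE dbo.LogEntries (
    LogId    bigint IDENTITY(1,1) NOT NULL,
    LoggedAt datetime NOT NULL DEFAULT (GETDATE()),
    Message  nvarchar(1024) NOT NULL
) ON ps_LogMonth (LoggedAt);

-- An aged partition can be switched out almost instantly to a staging
-- table with the same structure on the same filegroup, then archived:
-- ALTER TABLE dbo.LogEntries SWITCH PARTITION 1 TO dbo.LogStage;
```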

若沐 2024-07-20 00:42:00

Big tables slow down quickly, and using ETL to pull data out of a big table by date and then delete the old rows carries a big performance overhead. The answer to this is to use multiple tables - probably one table per month, based on your figures. Of course, you'll need some logic to generate the table names within your queries.

I agree with using triggers to populate the 'CurrentMonthAudit' table; at the end of the month, you can then rename that table to MonthAuditYYYYMM. Moving old tables off your main server using ETL will then be easy, and each of your tables will be manageable. Trust me, this is much better than trying to manage a single table with approximately 250 million rows.
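
A sketch of what that month-end rotation could look like; the answer only names CurrentMonthAudit and MonthAuditYYYYMM, so the column definitions below are assumptions for illustration.

```sql
-- Run in a quiet window (or inside a maintenance job) so no inserts land
-- between the rename and the re-create.
EXEC sp_rename 'dbo.CurrentMonthAudit', 'MonthAudit200906';

-- Re-create an empty current-month table with the same definition
-- (hypothetical columns shown).
CREATE TABLE dbo.CurrentMonthAudit (
    AuditId  bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LoggedAt datetime NOT NULL DEFAULT (GETDATE()),
    Message  nvarchar(1024) NOT NULL
);
```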

美人如玉 2024-07-20 00:42:00

Your first good decision is keeping everything as simple as possible.

I've had good luck with your pattern of a simple write-only transaction log table where the records are just laid down in chronological order. Then you have several options for switching out aged data. Even having disparate monthly tables is manageable query-wise as long as you keep simplicity in mind. If you have any kind of replication in operation, your replicated tables can be rolled out and serve as the archive. Then start with a fresh, empty table at the first of each month.
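
As a sketch of such a write-only chronological table (names and columns are illustrative, not from the question): clustering on the IDENTITY column keeps inserts strictly sequential, and a narrow index on the timestamp serves the 30-45 day decision queries.

```sql
CREATE TABLE dbo.AppLog (
    LogId    bigint IDENTITY(1,1) NOT NULL,
    LoggedAt datetime NOT NULL DEFAULT (GETDATE()),
    Source   varchar(64)    NOT NULL,
    Message  nvarchar(1024) NOT NULL,
    -- Clustered on the ever-increasing key: appends go to the end pages.
    CONSTRAINT PK_AppLog PRIMARY KEY CLUSTERED (LogId)
);

-- Narrow nonclustered index for recent-window queries by date.
CREATE NONCLUSTERED INDEX IX_AppLog_LoggedAt ON dbo.AppLog (LoggedAt);
```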

Normally I shudder at the relational design consequences of doing something like this, but I've found that write-only chronological log tables are an exception to the usual design patterns, for the reasons you are dealing with here.

But stay away from triggers, as far as possible. The simplest solution is a primary table of the type you're talking about here, with a simple, robust, off-the-shelf, time-proven replication mechanism.

(BTW - Large tables don't slow down quickly if they are well designed - they slow down slowly.)

多像笑话 2024-07-20 00:42:00

If you do not need to search the recent log records, there is another option: Don't use a database at all. Instead, write the log info to a file and rotate the filename every night. When a file has been written, you can then start a background job to import the data directly into the archive database.
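
For example, the nightly import could be a plain BULK INSERT; the file path, delimiters, and table name below are assumptions for illustration, not from the answer.

```sql
-- Load the previous night's rotated log file straight into the archive.
BULK INSERT ArchiveDb.dbo.LogArchive
FROM 'D:\logs\app-20090615.log'
WITH (
    FIELDTERMINATOR = '\t',   -- assuming a tab-delimited file
    ROWTERMINATOR   = '\n',
    TABLOCK                   -- allows a minimally logged bulk load
);
```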

Databases are not always the best option, especially for log files :)
