How to store historical data

Published 2024-09-26 04:56:13

11 Answers

优雅的叶子 2024-10-03 04:56:14

Supporting historical data directly within an operational system will make your application much more complex than it would otherwise be. Generally, I would not recommend doing it unless you have a hard requirement to manipulate historical versions of a record within the system.

If you look closely, most requirements for historical data fall into one of two categories:

  • Audit logging: This is better off done with audit tables. It's fairly easy to write a tool that generates scripts to create audit log tables and triggers by reading metadata from the system data dictionary. This type of tool can be used to retrofit audit logging onto most systems. You can also use this subsystem for changed data capture if you want to implement a data warehouse (see below).

  • Historical reporting: Reporting on historical state, 'as-at' positions or analytical reporting over time. It may be possible to fulfill simple historical reporting requirements by querying audit logging tables of the sort described above. If you have more complex requirements then it may be more economical to implement a data mart for the reporting than to try and integrate history directly into the operational system.

    Slowly-changing dimensions are by far the simplest mechanism for tracking and querying historical state and much of the history tracking can be automated. Generic handlers aren't that hard to write. Generally, historical reporting does not have to use up-to-the-minute data, so a batched refresh mechanism is normally fine. This keeps your core and reporting system architecture relatively simple.
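As a concrete sketch of the generated audit-table idea, here is roughly what such a tool might emit for a hypothetical CUSTOMER table, in T-SQL. All table, column and trigger names here are illustrative assumptions, not anything from the original answer:

```sql
-- Hypothetical source table (illustrative only).
CREATE TABLE CUSTOMER (
    CUSTOMER_ID INT PRIMARY KEY,
    NAME        VARCHAR(100),
    STATUS      CHAR(1)
);

-- Generated audit table: same columns plus audit metadata.
CREATE TABLE CUSTOMER_AUDIT (
    AUDIT_ID     INT IDENTITY PRIMARY KEY,
    AUDIT_ACTION CHAR(1)   NOT NULL,  -- 'I'nsert, 'U'pdate, 'D'elete
    AUDIT_TIME   DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    CUSTOMER_ID  INT,
    NAME         VARCHAR(100),
    STATUS       CHAR(1)
);
GO

-- Generated trigger: copies every changed row into the audit table.
CREATE TRIGGER TRG_CUSTOMER_AUDIT
ON CUSTOMER AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Inserts and updates: record the new values.
    INSERT INTO CUSTOMER_AUDIT (AUDIT_ACTION, CUSTOMER_ID, NAME, STATUS)
    SELECT CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END,
           CUSTOMER_ID, NAME, STATUS
    FROM inserted;

    -- Deletes have no rows in "inserted"; record the old values instead.
    INSERT INTO CUSTOMER_AUDIT (AUDIT_ACTION, CUSTOMER_ID, NAME, STATUS)
    SELECT 'D', CUSTOMER_ID, NAME, STATUS
    FROM deleted
    WHERE NOT EXISTS (SELECT 1 FROM inserted);
END;
```

Because the audit table mirrors the source columns, a generator only needs the data dictionary's column list to produce both DDL statements, which is what makes retrofitting this onto an existing system cheap.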

If your requirements fall into one of these two categories, you are probably better off not storing historical data in your operational system. Separating the historical functionality into another subsystem will probably be less effort overall and produce transactional and audit/reporting databases that work much better for their intended purpose.

栩栩如生 2024-10-03 04:56:14

I don't think there is a particular standard way of doing it, but I thought I would throw in a possible method. I work with Oracle and our in-house web application framework, which uses XML to store application data.

We use something called a Master-Detail model that at its simplest consists of:

A Master table, for example called Widgets, often just containing an ID. It will typically hold data that won't change over time / isn't historical.

A Detail/History table, for example called Widget_Details, containing at least:

  • ID - primary key. Detail/historical ID
  • MASTER_ID - for example in this case called 'WIDGET_ID', this is the FK to the Master record
  • START_DATETIME - timestamp indicating the start of that database row
  • END_DATETIME - timestamp indicating the end of that database row
  • STATUS_CONTROL - single-char column indicating the status of the row. 'C' indicates current; NULL or 'A' would be historical/archived. We only use this because we can't index on END_DATETIME being NULL
  • CREATED_BY_WUA_ID - stores the ID of the account that caused the row to be created
  • XMLDATA - stores the actual data

So essentially, an entity starts by having 1 row in the master and 1 row in the detail, the detail having a NULL end date and STATUS_CONTROL of 'C'.
When an update occurs, the current row is updated to have END_DATETIME of the current time and status_control is set to NULL (or 'A' if preferred). A new row is created in the detail table, still linked to the same master, with status_control 'C', the id of the person making the update and the new data stored in the XMLDATA column.

This is the basis of our historical model. The Create/Update logic is handled in an Oracle PL/SQL package, so you simply pass the function the current ID, your user ID and the new XML data, and internally it does all the updating/inserting of rows to represent that in the historical model. The start and end times indicate the period for which that row in the table is active.

Storage is cheap, we don't generally DELETE data and prefer to keep an audit trail. This allows us to see what our data looked like at any given time. By indexing status_control = 'C' or using a View, cluttering isn't exactly a problem. Obviously your queries need to take into account that you should always use the current (NULL end_datetime and status_control = 'C') version of a record.
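A minimal sketch of the schema and update routine this answer describes, in Oracle SQL/PL-SQL. The sequence name and the procedure signature are my own assumptions, not the poster's actual package:

```sql
-- Illustrative DDL for the master-detail layout described above.
CREATE TABLE WIDGETS (
    ID NUMBER PRIMARY KEY
);

CREATE TABLE WIDGET_DETAILS (
    ID                NUMBER PRIMARY KEY,
    WIDGET_ID         NUMBER NOT NULL REFERENCES WIDGETS (ID),
    START_DATETIME    TIMESTAMP NOT NULL,
    END_DATETIME      TIMESTAMP,          -- NULL while the row is current
    STATUS_CONTROL    CHAR(1),            -- 'C' = current, NULL/'A' = archived
    CREATED_BY_WUA_ID NUMBER NOT NULL,
    XMLDATA           XMLTYPE
);

-- The indexable "current row" flag the answer mentions.
CREATE INDEX WIDGET_DETAILS_CUR ON WIDGET_DETAILS (WIDGET_ID, STATUS_CONTROL);

CREATE SEQUENCE WIDGET_DETAILS_SEQ;  -- assumed key generator

-- Sketch of the update routine; the real package surely differs.
CREATE OR REPLACE PROCEDURE UPDATE_WIDGET (
    p_widget_id IN NUMBER,
    p_wua_id    IN NUMBER,
    p_xmldata   IN XMLTYPE
) AS
BEGIN
    -- Close off the current detail row.
    UPDATE WIDGET_DETAILS
       SET END_DATETIME = SYSTIMESTAMP, STATUS_CONTROL = NULL
     WHERE WIDGET_ID = p_widget_id AND STATUS_CONTROL = 'C';

    -- Insert the new current row.
    INSERT INTO WIDGET_DETAILS
        (ID, WIDGET_ID, START_DATETIME, END_DATETIME,
         STATUS_CONTROL, CREATED_BY_WUA_ID, XMLDATA)
    VALUES
        (WIDGET_DETAILS_SEQ.NEXTVAL, p_widget_id, SYSTIMESTAMP, NULL,
         'C', p_wua_id, p_xmldata);
END;
```

Point-in-time queries then become a range test on the detail table, e.g. `WHERE START_DATETIME <= :t AND (END_DATETIME > :t OR END_DATETIME IS NULL)`.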

憧憬巴黎街头的黎明 2024-10-03 04:56:14

I think your approach is correct. The historical table should be a copy of the main table without indexes; make sure you have update timestamps in the table as well.

If you try the other approach, you will soon face problems:

  • maintenance overhead
  • more flags in selects
  • query slowdowns
  • growth of tables and indexes
祁梦 2024-10-03 04:56:14

In SQL Server 2016 and above, there is a new feature called Temporal Tables that aims to solve this challenge with minimal effort from the developer. The concept of a temporal table is similar to Change Data Capture (CDC), with the difference that temporal tables abstract away most of the things you had to do manually when using CDC.
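For reference, a minimal system-versioned temporal table looks like this; the table and column names are illustrative:

```sql
-- System-versioned temporal table (SQL Server 2016+); names are illustrative.
CREATE TABLE dbo.Widget (
    Id        INT PRIMARY KEY,
    Name      NVARCHAR(100) NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WidgetHistory));

-- Point-in-time query: the engine transparently includes the history table.
SELECT * FROM dbo.Widget
FOR SYSTEM_TIME AS OF '2024-01-01T00:00:00';
```

The engine maintains dbo.WidgetHistory automatically on every UPDATE and DELETE, which is exactly the bookkeeping the CDC-style approaches in the other answers implement by hand.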

誰認得朕 2024-10-03 04:56:14

Just wanted to add an option that I started using because I use Azure SQL and the multiple table thing was way too cumbersome for me. I added an insert/update/delete trigger on my table and then converted the before/after change to json using the "FOR JSON AUTO" feature.

SET @beforeJson = (SELECT * FROM DELETED FOR JSON AUTO)
SET @afterJson = (SELECT * FROM INSERTED FOR JSON AUTO)

That returns a JSON representation of the record before/after the change. I then store those values in a history table with a timestamp of when the change occurred (I also store the ID of the record concerned). Using the serialization process, I can control how data is backfilled in the case of changes to schema.
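A sketch of how the trigger and history table might fit together; the dbo.Widget source table, the history table and the trigger names are my assumptions, and note that for a multi-row statement the two JSON strings are arrays covering the whole statement:

```sql
-- Assumed source table (illustrative).
CREATE TABLE dbo.Widget (
    Id   INT PRIMARY KEY,
    Name NVARCHAR(100)
);

-- History table holding the before/after JSON snapshots.
CREATE TABLE dbo.WidgetHistory (
    HistoryId  INT IDENTITY PRIMARY KEY,
    WidgetId   INT,
    ChangedAt  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    BeforeJson NVARCHAR(MAX),   -- NULL on insert
    AfterJson  NVARCHAR(MAX)    -- NULL on delete
);
GO

CREATE TRIGGER dbo.trg_Widget_History
ON dbo.Widget AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @beforeJson NVARCHAR(MAX) = (SELECT * FROM DELETED FOR JSON AUTO);
    DECLARE @afterJson  NVARCHAR(MAX) = (SELECT * FROM INSERTED FOR JSON AUTO);

    -- Take the affected ID from whichever pseudo-table has rows.
    DECLARE @widgetId INT =
        COALESCE((SELECT TOP (1) Id FROM inserted),
                 (SELECT TOP (1) Id FROM deleted));

    INSERT INTO dbo.WidgetHistory (WidgetId, BeforeJson, AfterJson)
    VALUES (@widgetId, @beforeJson, @afterJson);
END;
```

Because the row is stored as JSON rather than as mirrored columns, the history table never needs to change when the source schema does; old snapshots simply lack the new keys.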

I learned about this from this link here

可爱暴击 2024-10-03 04:56:14

I know this is an old post but just wanted to add a few points.
The standard for such problems is whatever works best for the situation. Understanding the need for such storage, and the potential uses of the historical/audit/change-tracking data, is very important.

Audit (security purposes): Use a common table for all your auditable tables. Define a structure to store the column name, before-value and after-value fields.

Archive/Historical: for cases like tracking previous addresses, phone numbers etc., creating a separate table FOO_HIST is better if your active transaction table schema does not change significantly in the future (if your history table has to have the same structure).
If you anticipate table normalization, datatype changes, or addition/removal of columns, store your historical data in XML format. Define a table with the following columns (ID, Date, Schema Version, XMLData). This will easily handle schema changes, but you have to deal with XML, and that could introduce a level of complication for data retrieval.
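The schema-versioned XML table could be sketched like this in T-SQL; FOO stands in for any operational table, and all names are illustrative:

```sql
-- Illustrative operational table.
CREATE TABLE FOO (
    ID   INT PRIMARY KEY,
    NAME NVARCHAR(100)
);

-- History table that survives schema changes in FOO.
CREATE TABLE FOO_HIST (
    ID             INT IDENTITY PRIMARY KEY,
    CHANGE_DATE    DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    SCHEMA_VERSION INT       NOT NULL,  -- bump whenever FOO's schema changes
    XMLDATA        XML       NOT NULL   -- full snapshot of the row
);
GO

-- Snapshot one row into history, tagged with the schema version in force.
DECLARE @changed_id INT = 42;  -- illustrative
INSERT INTO FOO_HIST (SCHEMA_VERSION, XMLDATA)
SELECT 1,
       (SELECT * FROM FOO WHERE ID = @changed_id
        FOR XML AUTO, ELEMENTS, TYPE);
```

Readers of the history then branch on SCHEMA_VERSION to know which elements to expect inside XMLDATA, which is the "level of complication for data retrieval" the answer warns about.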

海之角 2024-10-03 04:56:14

You can use the MSSQL Server Auditing feature. From SQL Server 2012 onwards you will find this feature in all editions:

http://technet.microsoft.com/en-us/library/cc280386.aspx

↘人皮目录ツ 2024-10-03 04:56:14

You could just partition the tables, no?

"Partitioned Table and Index Strategies Using SQL Server 2008
When a database table grows in size to the hundreds of gigabytes or more, it can become more difficult to load new data, remove old data, and maintain indexes. Just the sheer size of the table causes such operations to take much longer. Even the data that must be loaded or removed can be very sizable, making INSERT and DELETE operations on the table impractical. The Microsoft SQL Server 2008 database software provides table partitioning to make such operations more manageable."
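A minimal sketch of date-range partitioning in SQL Server terms; the function, scheme and table names are illustrative assumptions:

```sql
-- Partition by year boundaries (rows with OrderDate >= boundary go right).
CREATE PARTITION FUNCTION pf_OrderYear (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01');

-- Map every partition to the same filegroup for simplicity.
CREATE PARTITION SCHEME ps_OrderYear
AS PARTITION pf_OrderYear ALL TO ([PRIMARY]);

-- The partitioning column must be part of the clustered key.
CREATE TABLE dbo.Orders (
    OrderId   INT           NOT NULL,
    OrderDate DATE          NOT NULL,
    Amount    DECIMAL(10,2),
    CONSTRAINT PK_Orders PRIMARY KEY (OrderId, OrderDate)
) ON ps_OrderYear (OrderDate);
```

The payoff for history is that an old year's partition can be switched out to an archive table with `ALTER TABLE ... SWITCH PARTITION`, a metadata-only operation, instead of running a massive DELETE.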

一曲琵琶半遮面シ 2024-10-03 04:56:14

The real question is: do you need to use historical data and active data together for reporting? If so, keep them in one table, partition it, and create a view of active records for use in active queries. If you only need to look at them occasionally (to research legal issues or some such) then put them in a separate table.

扶醉桌前 2024-10-03 04:56:14

Another option is to archive the operational data on a [daily|hourly|whatever] basis. Most database engines support the extraction of the data into an archive.

Basically, the idea is to create a scheduled Windows or CRON job that

  1. determines the current tables in the operational database
  2. selects all data from every table into a CSV or XML file
  3. compresses the exported data to a ZIP file, preferably with the timestamp of the generation in the file name for easier archiving.

Many SQL database engines come with a tool that can be used for this purpose. For example, when using MySQL on Linux, the following command can be used in a CRON job to schedule the extraction:

mysqldump --all-databases --xml --lock-tables=false -ppassword | gzip -c | cat > /media/bak/servername-$(date +%Y-%m-%d)-mysql.xml.gz
傲鸠 2024-10-03 04:56:14

You can create materialized/indexed views on the table. Based on your requirements you can do a full or partial refresh of the views. Please see this to create an mview and log: How to create materialized views in SQL Server?
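A minimal indexed-view sketch in SQL Server terms; the table, column and view names are my own assumptions:

```sql
-- Assumed base table (illustrative).
CREATE TABLE dbo.Widgets (
    Id        INT PRIMARY KEY,
    Name      NVARCHAR(100) NOT NULL,
    UpdatedAt DATETIME2     NOT NULL,
    IsCurrent BIT           NOT NULL
);
GO

-- Indexed views require SCHEMABINDING and two-part object names.
CREATE VIEW dbo.v_CurrentWidgets
WITH SCHEMABINDING
AS
SELECT Id, Name, UpdatedAt
FROM dbo.Widgets
WHERE IsCurrent = 1;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_v_CurrentWidgets
    ON dbo.v_CurrentWidgets (Id);
```

Unlike Oracle materialized views, SQL Server maintains an indexed view synchronously with the base table, so there is no separate refresh step; that maintenance cost is paid on every write instead.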
