为什么我们需要时态数据库?
我正在阅读有关时态数据库的内容,它们似乎已经建立在时间方面。 我想知道为什么我们需要这样一个模型?
它与普通的 RDBMS 有什么不同? 难道我们不能拥有一个普通的数据库(即 RDBMS)并拥有一个将时间戳与发生的每个事务相关联的触发器吗? 也许性能会受到影响。 但我仍然对时态数据库在市场上有充分的理由持怀疑态度。
目前的数据库是否支持这样的功能?
I was reading about temporal databases and it seems they have built in time aspects. I wonder why would we need such a model?
How different is it from a normal RDBMS? Can't we have a normal database i.e. RDBMS and say have a trigger which associates a time stamp with each transaction that happens? May be there would be a performance hit. But I'm still skeptical on temporal databases having a strong case in the market.
Does any of the present databases support such a feature?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(11)
考虑一下你的约会/日记——从 1 月 1 日到 12 月 31 日。 现在我们可以查询日记中任何一天的约会/日记条目。 这种排序称为有效时间。 然而,约会/条目通常不是按顺序插入的。
假设我想知道 4 月 4 日我的日记中有哪些约会/条目。 也就是4月4日那天我的日记里存在的所有记录。 这是交易时间。
鉴于可以创建和删除约会/条目等。典型的记录具有涵盖条目期间的开始和结束有效时间以及指示条目出现在日记中的期间的开始和结束事务时间。
当日记可能进行历史修改时,这种安排是必要的。 假设在 4 月 5 日,我意识到 2 月 14 日的约会实际上发生在 2 月 12 日,即我发现我的日记中有错误 - 我可以更正错误,以便更正有效的时间图片,但现在,我查询什么是4 月 4 日的日记中的日期将是错误的,除非约会/条目的交易时间也被存储。 在这种情况下,如果我查询截至 4 月 4 日的日记,它将显示 2 月 14 日存在约会,但如果我查询截至 4 月 6 日的日记,它将显示 2 月 12 日有约会。
时态数据库的这种时间旅行特性使得记录有关如何在数据库中纠正错误的信息成为可能。 这对于记录修订时间并允许查询数据如何修订的真实数据审核图片是必要的。
时间。
大多数业务信息应存储在这种双时态方案中,以便提供真实的审计记录并最大化商业智能 - 因此需要关系数据库的支持。 请注意,每个数据项在二维时间模型中占据一个(可能无界)正方形,这就是人们经常使用 GIST 索引来实现双时态索引的原因。 这里的问题是,GIST 索引实际上是为地理数据设计的,而对时间数据的要求有些不同。
PostgreSQL 9.0 排除约束应该提供组织时态数据的新方法,例如,同一元组的事务和有效时间段不应重叠。
Consider your appointment/journal diary - it goes from Jan 1st to Dec 31st. Now we can query the diary for appointments/journal entries on any day. This ordering is called the valid time. However, appointments/entries are not usually inserted in order.
Suppose I would like to know what appointments/entries were in my diary on April 4th. That is, all the records that existed in my diary on April 4th. This is the transaction time.
Given that appointments/entries can be created and deleted etc. A typical record has a beginning and end valid time that covers the period of the entry and a beginning and end transaction time that indicates the period during which the entry appeared in the diary.
This arrangement is necessary when the diary may undergo historical revision. Suppose on April 5th I realise that the appointment I had on Feb 14th actually occurred on February 12th i.e. I discover an error in my diary - I can correct the error so that the valid time picture is corrected, but now, my query of what was in the diary on April 4th would be wrong, UNLESS, the transaction times for appointments/entries are also stored. In that case if I query my diary as of April 4th it will show an appointment existed on February 14th but if I query as of April 6th it would show an appointment on February 12th.
This time travel feature of a temporal database makes it possible to record information about how errors are corrected in a database. This is necessary for a true audit picture of data that records when revisions were made and allows queries relating to how data have been revised over
time.
Most business information should be stored in this bitemporal scheme in order to provide a true audit record and to maximise business intelligence - hence the need for support in a relational database. Notice that each data item occupies a (possibly unbounded) square in the two dimensional time model which is why people often use a GIST index to implement bitemporal indexing. The problem here is that a GIST index is really designed for geographic data and the requirements for temporal data are somewhat different.
PostgreSQL 9.0 exclusion constraints should provide new ways of organising temporal data e.g. transaction and valid time PERIODs should not overlap for the same tuple.
时态数据库通常通过具有一些固定的时间尺度(例如秒甚至毫秒)然后仅存储测量数据的变化来有效地存储数据的时间序列。 RDBMS 中的时间戳是每次测量的离散存储值,效率非常低。 时态数据库通常用于 SCADA 等实时监控应用程序。 OSISoft (http://www.osisoft.com/) 的 PI 数据库是一个完善的系统。
A temporal database efficiently stores a time series of data, typically by having some fixed timescale (such as seconds or even milliseconds) and then storing only changes in the measured data. A timestamp in an RDBMS is a discretely stored value for each measurement, which is very inefficient. A temporal database is often used in real-time monitoring applications like SCADA. A well-established system is the PI database from OSISoft (http://www.osisoft.com/).
据我了解(并且过度简化),时态数据库记录有关数据何时有效以及数据本身的事实,并允许您查询时态方面。 您最终会处理“有效时间”和“交易时间”表,或涉及“有效时间”和“交易时间”方面的“双时表”。 您应该考虑阅读这两本书中的任何一本书:
As I understand it (and over-simplifying enormously), a temporal database records facts about when the data was valid as well as the the data itself, and permits you to query on the temporal aspects. You end up dealing with 'valid time' and 'transaction time' tables, or 'bitemporal tables' involving both 'valid time' and 'transaction time' aspects. You should consider reading either of these two books:
时态数据库经常用于金融服务行业。 原因之一是您很少(如果有的话)被允许删除任何数据,因此记录上的 ValidFrom - ValidTo 类型字段用于提供记录何时正确的指示。
Temporal databases are often used in the financial services industry. One reason is that you are rarely (if ever) allowed to delete any data, so ValidFrom - ValidTo type fields on records are used to provide an indication of when a record was correct.
除了“我可以用它做哪些新事情”之外,考虑“它统一了哪些旧事物?”可能会很有用。 时态数据库代表“普通”SQL 数据库的特定概括。 因此,它可以为您提供一个统一的解决方案来解决以前看似不相关的问题。 例如:
另一方面,时间模型本身已经完成了修订控制的一半,这可以激发进一步的应用。 例如,假设您在 SQL 之上推出自己的临时设施并允许分支,就像在版本控制系统中一样。 即使是有限的分支也可以很容易地提供“沙箱”——随意使用和修改数据库的能力,而不会对其他用户造成任何可见的更改。 这使得在复杂的数据库上提供高度真实的用户培训变得容易。
具有简单合并功能的简单分支还可以简化一些常见的工作流程问题。 例如,非营利组织可能有志愿者或低薪工人进行数据输入。 为每个工作人员提供自己的分支可以让主管在将其合并到“普通”用户可见的主分支之前轻松审查其工作或对其进行增强(例如,去重)。 分支机构还可以简化权限。 如果用户仅被授予使用/查看其独特分支的权限,则您不必担心阻止所有可能的不需要的修改; 您只会合并有意义的更改。
Besides "what new things can I do with it", it might be useful to consider "what old things does it unify?". The temporal database represents a particular generalization of the "normal" SQL database. As such, it may give you a unified solution to problems that previously appeared unrelated. For example:
On the other hand, the temporal model itself is half-way to complete revision control, which could inspire further applications. For example, suppose you roll your own temporal facility on top of SQL and allow branching, as in revision control systems. Even limited branching could make it easy to offer "sandboxing" -- the ability to play with and modify the database with abandon without causing any visible changes to other users. That makes it easy to supply highly realistic user training on a complex database.
Simple branching with a simple merge facility could also simplify some common workflow problems. For example, a non-profit might have volunteers or low-paid workers doing data entry. Giving each worker their own branch could make it easy to allow a supervisor to review their work or enhance it (e.g., de-duplification) before merging it into the main branch where it would become visible to "normal" users. Branches could also simplify permissions. If a user is only granted permission to use/see their unique branch, you don't have to worry about preventing every possible unwanted modification; you'll only merge the changes that make sense anyway.
除了阅读维基百科文章之外? 维护“审核日志”或类似事务日志的数据库将具有一些“临时”属性。 如果您需要有关谁在何时对谁做了什么等问题的答案,那么时态数据库就是您的最佳选择。
Apart from reading the Wikipedia article? A database that maintains an "audit log" or similar transaction log will have some properties of being "temporal". If you need answers to questions about who did what to whom and when then you've got a good candidate for a temporal database.
您可以想象一个简单的时态数据库,它每隔几秒记录一次您的 GPS 位置。 压缩这些数据的机会很大,一个普通的数据库需要为每一行存储一个时间戳。 如果您需要大量的吞吐量,那么知道数据是临时的并且永远不需要更新和删除行可以让程序降低典型 RDBMS 中继承的大量复杂性。
尽管如此,时态数据通常只存储在普通的 RDBMS 中。 例如,PostgreSQL 有一些时间扩展,这使得这变得更容易一些。
You can imagine a simple temporal database that just logs your GPS location every few seconds. The opportunities for compressing this data is great, a normal database you would need to store a timestamp for every row. If you have a great deal of throughput required, knowing the data is temporal and that updates and deletes to a row will never be required permits the program to drop a lot of the complexity inherit in a typical RDBMS.
Despite this, temporal data is usually just stored in a normal RDBMS. PostgreSQL, for example has some temporal extensions, which makes this a little easier.
我想到两个原因:
Two reasons come to mind:
只是一个更新,时态数据库即将出现在 SQL Server 2016 中。
为了消除您的所有疑问,为什么需要时态数据库,而不是使用自定义方法进行配置,以及如何高效和高效地配置时态数据库。 SQL Server 会为您无缝配置它,请在此处查看 Channel9.msdn 上的深入视频和演示:https://channel9.msdn.com/Shows/Data-Expose/Temporal-in-SQL-Server-2016
MSDN 链接:https://msdn.microsoft.com/en-us/library/dn935015(v=sql .130).aspx
目前,您可以在 SQL Server 2016 的 CTP2(测试版 2)版本中使用它。
观看此视频了解如何在 SQL Server 2016 中使用临时表。
Just an update, Temporal database is coming to SQL Server 2016.
To clear all your doubts why one need a Temporal Database, rather than configuring with custom methods, and how efficiently & seamlessly SQL Server configures it for you, check the in-depth video and demo on Channel9.msdn here: https://channel9.msdn.com/Shows/Data-Exposed/Temporal-in-SQL-Server-2016
MSDN link: https://msdn.microsoft.com/en-us/library/dn935015(v=sql.130).aspx
Currently with the CTP2 (beta 2) release of SQL Server 2016 you can play with it.
Check this video on how to use Temporal Tables in SQL Server 2016.
我对时态数据库的理解是,它旨在存储某些类型的时态信息。 您可以使用标准 RDBMS 来模拟这一点,但是通过使用支持它的数据库,您可以拥有许多概念的内置习惯用法,并且查询语言可能会针对此类查询进行优化。
对我来说,这有点像使用特定于 GIS 的数据库而不是 RDBMS。 虽然您可以将坐标推送到普通的 RDBMS 中,但拥有适当的表示形式(例如,通过网格文件)可能会更快,并且拥有用于拓扑之类的 SQL 基元也很有用。
有学术数据库和一些商业数据库。 Timecenter 有一些链接。
My understanding of temporal databases is that are geared towards storing certain types of temporal information. You could simulate that with a standard RDBMS, but by using a database that supports it you have built-in idioms for a lot of concepts and the query language might be optimized for these sort of queries.
To me this is a little like working with a GIS-specific database rather than an RDBMS. While you could shove coordinates in a run-of-the-mill RDBMS, having the appropriate representations (e.g., via grid files) may be faster, and having SQL primitives for things like topology is useful.
There are academic databases and some commercial ones. Timecenter has some links.
时态数据库有用的另一个例子是数据随时间变化的情况。 我在一家电力零售商工作了几年,我们存储了 30 分钟的电表读数。 这些仪表读数可以随时修改,但我们仍然需要能够回顾读数变化的历史。
因此,我们拥有最新读数(我们对 30 分钟消耗的“当前理解”),但可以回顾我们对消耗的历史理解。 当您拥有可以通过这种方式调整的数据时,时态数据库就能很好地工作。
(话虽如此,我们用 SQL 手工雕刻它,但那是很久以前的事了。现在不会做出这个决定。)
Another example of where a temporal database is useful is where data changes over time. I spent a few years working for an electricity retailer where we stored meter readings for 30 minute blocks of time. Those meter readings could be revised at any point but we still needed to be able to look back at the history of changes for the readings.
We therefore had the latest reading (our 'current understanding' of the consumption for the 30 minutes) but could look back at our historic understanding of the consumption. When you've got data that can be adjusted in such a way temporal databases work well.
(Having said that, we hand carved it in SQL, but it was a fair while ago. Wouldn't make that decision these days.)