历史数据建模文献、方法和技术

发布于 2024-08-17 16:20:15 字数 838 浏览 7 评论 0原文

去年,我们推出了 http://tweetMp.org.au - 一个致力于澳大利亚政治和 Twitter 的网站。

去年年底,我们的政治家模式需要进行调整,因为一些政治家退休了,新的政治家加入了。

更改我们的数据库需要手动(SQL)更改,所以我正在考虑为我们的管理员实施一个 CMS,以便将来进行这些更改。

澳大利亚还有许多其他政府/政治网站可以管理自己的政治数据。

我想提出一种集中的方式来做到这一点。

经过思考一段时间后,也许最好的方法是不要对政客数据的当前视图以及它们与政治体系的关系进行建模,而是对交易进行建模。这样当前视图就是过去发生的所有事务/更改的投影。

使用这种方法,其他站点可以“订阅”更改(类似于 pubsubhub)并提交更改,然后将这些更改项集成到其架构中。

如果没有这种方法,大多数站点将不得不拆除整个数据库并重新填充它,因此任何关联的记录都需要重新关联。以这种方式管理数据非常烦人,并且严重阻碍了为了公共利益而对这些数据进行混搭。

我注意到有些事情是这样工作的——源代码版本控制、银行记录、stackoverflow 积分系统和许多其他示例。

当然,这种方法面临的直接挑战和设计问题包括

  • 当前视图是否被缓存和重新保留?多久更新一次?
  • 哪些基本实体必须存在且永不改变?
  • 可能还有更多我现在想不到的......

有没有关于这个主题的任何值得注意的文献可以推荐? 另外,像这样的数据建模的任何模式或实践可能有用吗?

非常感谢任何帮助。

-简历

Last year we launched http://tweetMp.org.au - a site dedicated to Australian politics and twitter.

Late last year our politician schema needed to be adjusted because some politicians retired and new politicians came in.

Changing our db required manual (SQL) change, so I was considering implementing a CMS for our admins to make these changes in the future.

There's also many other sites that government/politics sites out there for Australia that manage their own politician data.

I'd like to come up with a centralized way of doing this.

After thinking about it for a while, maybe the best approach is to not model the current view of the politician data and how they relate to the political system, but model the transactions instead. Such that the current view is the projection of all the transactions/changes that happen in the past.

Using this approach, other sites could "subscribe" to changes (a la` pubsubhub) and submit changes and just integrate these change items into their schemas.

Without this approach, most sites would have to tear down the entire db, and repopulate it, so any associated records would need to be reassociated. Managing data this way is pretty annoying, and severely impedes mashups of this data for the public good.

I've noticed some things work this way - source version control, banking records, stackoverflow points system and many other examples.

Of course, the immediate challenges and design issues with this approach includes

  • is the current view cached and repersisted? how often is it updated?
  • what base entities must exist that never change?
  • probably heaps more i can't think of right now...

Is there any notable literature on this subject that anyone could recommend?
Also, any patterns or practices for data modelling like this that could be useful?

Any help is greatly appreciated.

-CV

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

一百个冬季 2024-08-24 16:20:15

这是数据建模中相当常见的问题。基本上可以归结为:

您对现在的视图、某个时间点的视图或两者都感兴趣吗?

例如,如果您有一项对订阅进行建模的服务您需要知道:

  • 某人在某个时间点使用过哪些服务:需要了解该信息才能计算出收费金额、查看帐户的历史记录等;以及
  • 某人现在拥有哪些服务:他们可以在网站上访问什么?

解决此类问题的起点是拥有一个历史记录表,例如:

  • 服务历史记录:id、userid、serviceid、start_date、end_date

将用户的服务历史记录链接在一起,您就拥有了他们的历史记录。那么你如何为他们现在拥有的东西建模呢?最简单(也是最非规范化的观点)是说最后一条记录或具有 NULL 结束日期或当前或未来结束日期的记录就是他们现在所拥有的。

正如您可以想象的那样,这可能会导致一些粗糙的 SQL,因此有选择地对它进行去命名,以便您拥有一个 Services 表和另一个用于历史记录的表。每次服务更改时,都会创建或更新历史记录。这种方法使历史表更像是一个审计表(您会看到的另一个术语)。

这与您的问题类似。您需要知道:

  • 众议院每个席位的现任议员是谁;
  • 每个席位的现任参议员是谁;
  • 每个部门的现任部长是谁;
  • 谁是首相。

但您还需要知道在某个时间点每个事物是谁,因此您需要所有这些事物的历史记录。

因此,2003 年 8 月 20 日,彼得·科斯特洛 (Peter Costello) 发布了一份新闻稿,您需要知道,此时他是:

  • 希金斯的成员;
  • 财务主管;和
  • 副总理。

因为可以想象,有人可能有兴趣查找彼得·科斯特洛或财务主管的所有新闻稿,这将导致相同的新闻稿,但如果没有历史记录就无法追踪。

此外,您可能需要知道哪些席位位于哪些州,可能还需要了解地理边界等等。

所有这些都不需要更改架构,因为架构应该能够处理它。

This is a fairly common problem in data modelling. Basically it comes down to this:

Are you interesting in the view now, the view at a point in time or both?

For example, if you have a service that models subscriptions you need to know:

  • What services someone had at a point in time: this is needed to work out how much to charge, to see a history of the account and so forth; and
  • What services someone has now: what can they access on the Website?

The starting point for this kind of problem is to have a history table, such as:

  • Service history: id, userid, serviceid, start_date, end_date

Chain together the service histories for a user and you have their history. So how do you model what they have now? The easiest (and most denormalized view) is to say the last record or the record with a NULL end date or a present or future end date is what they have now.

As you can imagine this can lead to some gnarly SQL so this is selectively denomralized so you have a Services table and another table for history. Each time Services is changed a history record is created or updated. This kind of approach makes the history table more of an audit table (another term you'll see bandied about).

This is analagous to your problem. You need to know:

  • Who is the current MP for each seat in the House of Representatives;
  • Who is the current Senator for each seat;
  • Who is the current Minister for each department;
  • Who is the Prime Minister.

But you also need to know who was each of those things at a point in time so you need a history for all those things.

So on the 20th August 2003, Peter Costello made a press release you would need to know that at this time he was:

  • The Member for Higgins;
  • The Treasurer; and
  • The Deputy Prime Minister.

because conceivably someone could be interesting in finding all press releases by Peter Costello or the Treasurer, which will lead to the same press release but will be impossible to trace without the history.

Additionally you might need to know which seats are in which states, possibly the geographical boundaries and so on.

None of this should require a schema change as the schema should be able to handle it.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文