使用自然键,或使用代理键和审计表来审计/更改日志
这是我的第一个问题,所以要友善!
我是一名初级开发人员,没有太多经验,并且遇到了这个问题。
我有一个需要可审计的表。假设此表记录了呼叫中心拨打的电话(事实并非如此,但这只是一个示例)。我将其称为“CallHistory”。
我最初计划保留一个名为“Callees”的单独表,其中包含被叫者的姓名、电话号码等。该表将使用代理主键。
CallHistory 表将有一个指向 Callee 表的外键。
我最初这样做是为了,如果我更改了被叫方的电话号码,它会在整个系统中传播,而我不必更改多个表中的电话号码。
问题是,CallHistory 表的全部目的是记录呼叫的历史记录,包括误拨的呼叫(例如呼叫者拨打了错误的号码)。使用这种代理键方法将会丢失历史记录。
一位高级开发人员建议在 CallHistory 表中保留特定时间呼叫者每次拨打的电话号码副本,以保存历史记录。
我正在考虑出于同样的目的保留审计/更改日志表。
我的方法足以实现这个目的还是我完全偏离了轨道?您更喜欢哪种方法?
干杯, 安德鲁
My first question on here so be nice!
I am a junior developer with not much experience and am having trouble with this problem.
I have a table which needs to be auditable. Let's say this table records phone calls made by a call centre (it's not, but it's just an example). I'll call it "CallHistory".
I had originally planned to keep a separate table called "Callees" which has the callees' name, phone number, etc. This table would be using a surrogate primary key.
The CallHistory table would have a foreign key to the Callee table.
I originally did this so that if I changed a callee's phone number, it would propogate throughout the system and I wouldn't have to change the phone number in multiple tables.
The problem is, the whole point of the CallHistory table is to record the HISTORY of calls, including mis-dialed calls (say a caller dialled the wrong number). The history would be lost using this surrogate key approach.
One of the senior developers at work suggested keeping copies of the phone number for each dialling of a caller at that specific time in the CallHistory table to preserve the history.
I was thinking about keeping an audit/change log table for the same purpose.
Would my approach suffice for this purpsose or am I totally off track? Which approach do you prefer?
Cheers,
Andrew
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我认为您被这里关于正常形式的微妙之处所误导。问题是,与被叫方关联的电话号码与主叫方拨打的号码不是同一条信息。在一般情况下它们可能具有相同的值,但这是另一个问题。
所以在我看来,CallHistory 应该同时包含拨打的号码和对被叫表的引用。
I think you're being misled by a subtlety concerning the normal forms here. The thing is, the phone number associated with the callee is not the same piece of information as the number dialed by the caller. They might have the same value in the general case, but this is another issue.
So in my opinion, CallHistory should have both the numbre dialed and a reference to the callee table.
我同意里克的观点。是的,冗余数据是非常非常糟糕的、邪恶的、有异味的,而且是不受欢迎的。但仅仅因为两个字段被称为“电话号码”并不意味着它们是同一件事。 “客户当前的电话号码”和“我们上次与客户通话时的客户电话号码”不一定是同一件事。
我目前正在使用一个保存销售和商品信息的数据库。商品记录包括描述、库存编号和价格等信息。我们的销售记录还包括描述、库存编号和价格。描述和库存编号是多余的,应该删除。这是糟糕的设计。但两个地方都必须包含价格。当前价格与特定销售时的价格存在很大差异。那次出售可能是几年前的事了。从那时起,价格可能已经改变了十几次。
一般来说,在像您描述的应用程序中,我只需将电话号码放入历史记录表中即可完成。拥有“电话号码历史记录”表并链接到当时适用的电话号码记录几乎没有什么好处。它可能会为每个记录节省几个字节,但会增加很多复杂性。但是,如果有多个相关领域,情况就会发生变化。如果,我只是在这里发明一个例子来说明这一点,您是一家健康保险公司,并且由于不同的州法律、该地区可用的医生等原因,您的承保条款因地点而异,因此,当客户移动时,必须重写他的策略,现在电话号码可能与许多其他数据项相关,因此所有数据都应该放在一个表中,然后链接到相应的记录。否则,您可能有新泽西州的电话号码,但您链接到加利福尼亚州的保单条款等。
I agree with Rik. Yes, redundant data is very, very bad, evil, smelly, and otherwise undesirable. But just because two fields are called "phone number" does not make them the same thing. "Customer's current phone number" and "Customer's phone number at the time we last spoke to him" are not necessarily the same thing at all.
I'm presently working with a database that keeps sale and item information. An item record includes such information as description, stock number, and price. Our sale records also include description, stock number, and price. Description and stock number are redundant and should be eliminated. This was bad design. But price must be included in both places. There's a big difference between current price and the price at the time of a given sale. That sale could have been years ago. The price may have changed a dozen times since then.
In general in an application like you describe I'd just put phone number in the history table and be done with it. There's little to be gained by having a "phone number history" table and linking to the applicable-at-the-time phone number record. It might save a few bytes per record but it would add a bunch of complexity. However, if there are several related fields, the story changes. If, say -- and I'm just inventing an example here to give the idea -- you are a health insurance company and your terms of coverage vary depending on the location because of differing state laws, available doctors in the area, etc, so that when a customer moves his policy must be rewritten, now phone number may be related to many other data items, and so all should go in a single table and you link to the appropriate record. Otherwise you could have a New Jersey phone number but you're linking to California policy terms, etc.
你的问题是非常典型的设计困境。例如,如果您有标准形式的数据库,并且有以下表:销售、经理(销售人员)和区域(经理工作的地方)。您正在构建“按地区分组的年度销售额”之类的报告,在其中将销售与经理以及经理与地区结合起来。但如果其中一位经理在这一年里调到另一个办公室,你的报告似乎会显示不正确的数据,对吧?
3 种解决方案是什么
1) 在某些情况下,开发人员和分析师决定:好吧,我们的数据不是很正确,但目前还可以,我们希望保持正常形式并且不重复数据。该解决方案不太复杂。在这种情况下,您可以以正常形式创建 Callers 和 CallHistory 表,即电话号码将仅出现在 Callers 表中。
2)有要求不丢失任何历史变化。我们希望我们的报告和查询非常快(以数据库大小为代价)。在这种情况下,人们决定复制所有字段。例如,您可以创建包含电话号码、呼叫者姓名、地址等的 CallHistory 表,因为您预计这些字段中的每一个将来都可以更改。当然,您也可以创建 Callee 表(可能您需要它用于其他目的),但它可能会被 CallHistory 引用,也可能不会。假设您认为需要从 Callee 中删除某些记录,但希望它们位于 CallHistory 中。在这种情况下,开发人员经常认为他们可以违反数据的引用完整性,不要从 CallHistory 表中创建任何外键。这是合理的,因为没有外键,插入会更快。
3)我更喜欢的方法,但从实现的角度来看它是最复杂的:CallHistory表应该引用CalleeHistory表。 CalleeHistory 表将包含 Callee 表拥有的所有字段,但它也有一个代理键,例如 CalleeID + DateModified(有时开发人员使用 ModificationVersionNumber 而不是 DateModified)。在 CallHistory 中,我们有一个引用 CalleeID + DateModified 的代理外键。在这种情况下,您拥有规范化的数据(即电话号码在不同的表中不会重复),并且您也没有丢失任何历史更改。
正如我所说,通常需要在实现复杂性、数据库性能、数据库大小、数据完整性和系统功能要求之间进行权衡。如果您是初级开发人员,那么记住所有可能的解决方案是件好事,但您可能应该听取高级开发人员的意见,他们比 Stack Overflow 上的任何人都更了解您的系统和需求。
ps
如果您想了解其他方法,请阅读“缓慢变化的维度”,例如 http://en。 wikipedia.org/wiki/Slowly_chang_dimension
Your question is very typical design dilemma. For example, if you have database in normal form, and you have the following tables: sales, managers (who sells) and regions (where managers are working). You are building reports like "Yearly sales grouped by regions" where you joining sales with managers and managers with regions. But if one of the managers will relocate to another office during the year, seems that your report will show incorrect data, right?
What are 3 solutions
1) In some cases developers and analyst decide: well, our data is not very correct but it is OK for now, we want to stay with normal form and do not duplicate data. This solution is less complex. In this case you can create Callers and CallHistory tables in normal form, i.e. phone number will be in Callers table only.
2) There is a requirement not to lose any historical changes. And we want our reports and queries be very fast (at the cost of database size). In this case people decide to duplicate all fields. For example you can create CallHistory table that has phone number, callers name, address etc., because you anticipate that each of these fields can be changed in the future. Of course you can create Callee table as well (probably you will need it for another purposes) but it may be reefenced by CallHistory and may be not. Suppose that you think that some records need to be deleted from Callee but want them to be in CallHistory. This is the case when developers often think that they can violate referential integrity of the data, do not create any foreign keys from CallHistory table. And this is reasonable, because without foreign keys, inserts will work faster.
3) Approach I like more, but it is most complex from implementation point of view : CallHistory table should reference to CalleeHistory table. CalleeHistory table will have all fileds that Callee table has, but it also has a surrogate key, like CalleeID + DateModified (sometimes instead of DateModified developers use ModificationVersionNumber). In CallHistory we have a surrogate foreign key that reference CalleeID + DateModified. In this case you have normalized data (i.e. Phone number is not dublicated in different tables), and also you didnt lose any historical changes.
As far as I said, there is often a tradeoff between complexity of implementation, database performance, database size, data integrity and functional requirements to the system. If you are a junior developer, it is nice to have in mind all possible solutions, but probably you should listen to a senior developer, who knows about your system and requirements more than anybody on Stack Overflow.
p.s.
If you want to know about other approaches, read about Slowly changing dimensions, for example http://en.wikipedia.org/wiki/Slowly_changing_dimension