Denormalize for sanity or performance?
I've started a new project, and they have a very normalized database. Everything that can be a lookup is stored as a foreign key to a lookup table. This is normalized and fine, but I end up doing 5-table joins for the simplest queries.
from va in VehicleActions
join vat in VehicleActionTypes on va.VehicleActionTypeId equals vat.VehicleActionTypeId
join ai in ActivityInvolvements on va.VehicleActionId equals ai.VehicleActionId
join a in Agencies on va.AgencyId equals a.AgencyId
join vd in VehicleDescriptions on ai.VehicleDescriptionId equals vd.VehicleDescriptionId
join s in States on vd.LicensePlateStateId equals s.StateId
where va.CreatedDate >= DateTime.Today  // rows created since midnight today
select new { va.VehicleActionId, a.AgencyCode, vat.Description, vat.Code,
             vd.LicensePlateNumber, LPNState = s.Code, va.LatestDateTime, va.CreatedDate }
I'd like to recommend that we denormalize some stuff, like the state code. I don't see the state codes changing in my lifetime. It's a similar story with the 3-letter agency code: these are handed out by the agency of agencies and will never change.
When I approached the DBA with the state code issue and the 5-table joins, I got the response that "we are normalized" and that "joins are fast".
Is there a compelling argument to denormalize? I'd do it for sanity if nothing else.
The same query in T-SQL:
SELECT VehicleAction.VehicleActionID
     , Agency.AgencyCode AS ActionAgency
     , VehicleActionType.Description
     , VehicleDescription.LicensePlateNumber
     , State.Code AS LPNState
     , VehicleAction.LatestDateTime AS ActionLatestDateTime
     , VehicleAction.CreatedDate
FROM VehicleAction
INNER JOIN VehicleActionType ON VehicleAction.VehicleActionTypeId = VehicleActionType.VehicleActionTypeId
INNER JOIN ActivityInvolvement ON VehicleAction.VehicleActionId = ActivityInvolvement.VehicleActionId
INNER JOIN Agency ON VehicleAction.AgencyId = Agency.AgencyId
INNER JOIN VehicleDescription ON ActivityInvolvement.VehicleDescriptionId = VehicleDescription.VehicleDescriptionId
INNER JOIN State ON VehicleDescription.LicensePlateStateId = State.StateId
-- rows created since midnight today
WHERE VehicleAction.CreatedDate >= CAST(FLOOR(CAST(GETDATE() AS float)) AS datetime)
Comments (7)
I don't know if I would even call what you want to do denormalization. It looks more like you just want to replace artificial foreign keys (StateId, AgencyId) with natural foreign keys (state abbreviation, agency code). Using varchar fields instead of integer fields will slow down join/query performance, but (a) if you don't even need to join the table most of the time, because the natural FK is what you want anyway, it's not a big deal, and (b) your database would need to be pretty big or heavily loaded for it to be noticeable.
But djna is correct in that you need a complete understanding of current and future needs before making a change like this. Are you SURE the three letter agency codes will never change, even five years from now? Really, really sure?
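To make the natural-key idea concrete, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for SQL Server so that it can be run anywhere; the trimmed-down column list, the LicensePlateState column name, and the sample rows are assumptions for illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE State (
    Code TEXT PRIMARY KEY,   -- natural key: the two-letter abbreviation
    Name TEXT NOT NULL
);
CREATE TABLE VehicleDescription (
    VehicleDescriptionId INTEGER PRIMARY KEY,
    LicensePlateNumber   TEXT NOT NULL,
    LicensePlateState    TEXT NOT NULL REFERENCES State (Code)  -- hypothetical column
);
-- Invented sample rows.
INSERT INTO State VALUES ('TX', 'Texas'), ('OH', 'Ohio');
INSERT INTO VehicleDescription VALUES (1, 'ABC1234', 'TX'), (2, 'XYZ9876', 'OH');
""")

# The state code is already on the row, so no join to State is needed.
rows = conn.execute(
    "SELECT LicensePlateNumber, LicensePlateState "
    "FROM VehicleDescription ORDER BY VehicleDescriptionId"
).fetchall()
print(rows)

# The foreign key still rejects codes that are not in the lookup table.
try:
    conn.execute("INSERT INTO VehicleDescription VALUES (3, 'BAD0000', 'ZZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The lookup table is still there and still constrains the values, so this is arguably a change of key rather than a loss of normalization.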
Some denormalization can sometimes be needed for performance (and sanity) reasons. It's hard to tell without seeing all your tables, needs, etc.
But why not just build a few convenience views (to do a few joins) and then use these to be able to write simpler queries?
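A rough sketch of such a convenience view follows, again using Python's sqlite3 module as a runnable stand-in for SQL Server. The view name VehicleActionSummary, the reduced column lists, and the sample rows are all assumptions; the join conditions are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed-down versions of the six tables from the question.
CREATE TABLE Agency (AgencyId INTEGER PRIMARY KEY, AgencyCode TEXT);
CREATE TABLE VehicleActionType (VehicleActionTypeId INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE State (StateId INTEGER PRIMARY KEY, Code TEXT);
CREATE TABLE VehicleDescription (
    VehicleDescriptionId INTEGER PRIMARY KEY,
    LicensePlateNumber   TEXT,
    LicensePlateStateId  INTEGER);
CREATE TABLE VehicleAction (
    VehicleActionId     INTEGER PRIMARY KEY,
    VehicleActionTypeId INTEGER,
    AgencyId            INTEGER,
    CreatedDate         TEXT);
CREATE TABLE ActivityInvolvement (VehicleActionId INTEGER, VehicleDescriptionId INTEGER);

-- Invented sample rows.
INSERT INTO Agency VALUES (1, 'ABC');
INSERT INTO VehicleActionType VALUES (1, 'Traffic stop');
INSERT INTO State VALUES (1, 'TX'), (2, 'OH');
INSERT INTO VehicleDescription VALUES (1, 'ABC1234', 1), (2, 'XYZ9876', 2);
INSERT INTO VehicleAction VALUES
    (1, 1, 1, '2024-01-01 23:00:00'),
    (2, 1, 1, '2024-01-02 08:30:00');
INSERT INTO ActivityInvolvement VALUES (1, 1), (2, 2);

-- The joins are written once, here (hypothetical view name).
CREATE VIEW VehicleActionSummary AS
SELECT va.VehicleActionId, a.AgencyCode, vat.Description,
       vd.LicensePlateNumber, s.Code AS LPNState, va.CreatedDate
FROM VehicleAction va
JOIN VehicleActionType vat ON va.VehicleActionTypeId = vat.VehicleActionTypeId
JOIN ActivityInvolvement ai ON va.VehicleActionId = ai.VehicleActionId
JOIN Agency a ON va.AgencyId = a.AgencyId
JOIN VehicleDescription vd ON ai.VehicleDescriptionId = vd.VehicleDescriptionId
JOIN State s ON vd.LicensePlateStateId = s.StateId;
""")

# Day-to-day queries select from the view and never repeat the joins.
rows = conn.execute(
    "SELECT AgencyCode, LicensePlateNumber, LPNState "
    "FROM VehicleActionSummary WHERE CreatedDate >= ?",
    ("2024-01-02",),
).fetchall()
print(rows)
```

The underlying tables stay fully normalized; only the place where the joins are spelled out changes.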
Beware of wanting to shape things to your current idioms. Right now the unfamiliar code seems unwieldy and obstructive to your understanding. In time it's possible that you will become acclimatised.
If current (or known future) requirements, such as performance, are not being met, then that's a whole different issue. But remember that anything can be performance tuned; the objective is not to make things as fast as possible, but to make them fast enough.
This previous post dealt with a similar issue to the one you're having. Hopefully it will be helpful to you.
Dealing with "hypernormalized" data
My own personal take on normalization is to normalize as much as possible, and denormalize only for performance. Even denormalization for performance is something to avoid. I'd go the route of profiling, setting correct indexes, etc. before I'd denormalize.
Sanity... That's overrated. Especially in our profession.
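As a small illustration of the "profile and index first" route, the sketch below compares a query plan before and after adding an index on the filtered column. It runs on SQLite through Python's sqlite3 module as a stand-in; SQL Server has its own execution-plan tooling, and the index name IX_VehicleAction_CreatedDate and the reduced table definition are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VehicleAction ("
    "VehicleActionId INTEGER PRIMARY KEY, AgencyId INTEGER, CreatedDate TEXT)"
)

query = "SELECT VehicleActionId FROM VehicleAction WHERE CreatedDate >= ?"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable plan step.
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, ("2024-01-02",))
    return [row[-1] for row in cursor]


before = plan(query)  # no usable index yet, so the table is scanned
conn.execute(
    "CREATE INDEX IX_VehicleAction_CreatedDate ON VehicleAction (CreatedDate)"
)
after = plan(query)   # the planner can now search the new index

print(before)
print(after)
```

The exact wording of the plan steps differs between SQLite versions, so the thing to look for is simply whether the index name appears in the second plan.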
Well, what about the performance? If the performance is okay, just make the five table JOIN into a view and, for sanity, SELECT from the view when you need the data.
State abbreviations are one of the cases in which I think meaningful keys are okay. For very simple lookup tables with a limited number of rows, where I'm in complete control of the data (meaning it's not populated from some outside source), I'll sometimes create meaningful four- or five-character keys so that the key value can stand in for the fully descriptive lookup value in some queries.
Create a view (or an inline table-valued function to get parameterization). In any case, I usually put all my code into stored procedures (some of it generated), whether they use views or not, so you pretty much only ever write the join once.
An argument (for this "normalization") that the three-letter codes might change isn't very compelling without a plan for what you will do if the codes do change, and how your artificial-key scenario will address this eventuality better than using the codes as keys. Unless you've implemented a fully temporal schema (which is horribly difficult to do and not suggested by your example), it's not obvious to me how your normalization benefits you at all. Now if you work with agencies from multiple sources and standards that might have colliding code names, or if "state" might eventually mean a two-letter code for state, province, department, canton, or estado, that's another matter. You then need your own keys or you need a two-column key with more information than that code.
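For the colliding-codes case, a two-column key could look roughly like the sketch below (SQLite via Python's sqlite3 module as a runnable stand-in). The Region table, the PlateCountryCode and PlateRegionCode columns, and the sample plates are hypothetical; the collision itself is real, since "WA" is both Washington in the US and Western Australia.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Region (                      -- hypothetical replacement for State
    CountryCode TEXT NOT NULL,
    RegionCode  TEXT NOT NULL,
    Name        TEXT NOT NULL,
    PRIMARY KEY (CountryCode, RegionCode)  -- the code alone is not unique
);
CREATE TABLE VehicleDescription (
    VehicleDescriptionId INTEGER PRIMARY KEY,
    LicensePlateNumber   TEXT NOT NULL,
    PlateCountryCode     TEXT NOT NULL,
    PlateRegionCode      TEXT NOT NULL,
    FOREIGN KEY (PlateCountryCode, PlateRegionCode)
        REFERENCES Region (CountryCode, RegionCode)
);
-- 'WA' appears twice, once per country.
INSERT INTO Region VALUES
    ('US', 'WA', 'Washington'),
    ('AU', 'WA', 'Western Australia');
-- Invented sample plates.
INSERT INTO VehicleDescription VALUES
    (1, 'ABC1234', 'US', 'WA'),
    (2, '1ABC234', 'AU', 'WA');
""")

rows = conn.execute("""
    SELECT vd.LicensePlateNumber, r.Name
    FROM VehicleDescription vd
    JOIN Region r
      ON r.CountryCode = vd.PlateCountryCode
     AND r.RegionCode  = vd.PlateRegionCode
    ORDER BY vd.VehicleDescriptionId
""").fetchall()
print(rows)
```

Both key columns are still readable on the row itself, so simple queries can skip the join, while the composite key keeps the two "WA" entries apart.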