请帮助解释我是否为了性能而破坏我的数据库架构:(
我在 Sql 2008(之前是 05 年)上使用了一个生产数据库近 3 年。 一直很好,但性能不是很好。 因此,我正在调整架构和查询以帮助加快一些速度。 此外,许多主表每个表包含大约 1-3 个磨机行(以给出 ua 的尺寸估计)。
这是一个示例数据库图(Soz,在 NDA 下,所以我无法显示原始图):-
alt text http://img11.imageshack.us/img11/4608/dbschemaexample.png
需要注意的事项(与我的问题直接相关):-
- 车辆可以有 0 (NULL) 或 1 个无线电。 (左外连接)
- 车辆可以有 0 (NULL) 或 1 个杯架(左外连接)
- 车辆有 1 个轮胎类型(内连接)。
首先,这看起来像一个规范化的数据库架构。 我很烂数据库理论,所以我猜这是 3NF(至少)...著名的最后一句话:)
现在,这正在影响我的数据库性能,因为这两个外连接和内连接被频繁调用,而且还有许多声明中还加入了一些内容。
为了尝试解决这个问题,我想我可以尝试索引视图。 创建视图是小菜一碟。 但对其建立索引,不起作用 -> 无法使用联接或自引用表创建索引视图(也是另一个问题:( )。
所以,我哭了几个小时(并且/wrists, 染过的头发 和 写了一首关于它的情绪歌曲并将其放在 myfailspace 上)并执行了以下操作...
- 添加了一个新的行进入每个“可选”外连接表(在本例中,Radios 和 CupHolders ID = 0,其余数据 =“Unknown Blah”或 0)
- 更新父表,以便任何 NULL 数据现在都有 0
- 。 现在,
我什至可以制作索引视图,现在速度非常快,
所以……这违背了我所学到的一切。 我感觉很脏。 独自的。 已感染。
这是一件坏事吗? 这是为了性能而对数据库进行非规范化的常见场景吗?
我想对此有一些想法,请:)
PS。 这些图像是谷歌随机找到的——所以不是我。
I've got a database in production for nearly 3 years, on Sql 2008 (was '05, before that). Has been fine, but it isn't very performant. So i'm tweaking the schema and queries to help speed some things up. Also, a score of main tables contain around 1-3 mill rows, per table (to give u a estimate on sizes).
Here's a sample database diagram (Soz, under NDA so i can't display the original) :-
alt text http://img11.imageshack.us/img11/4608/dbschemaexample.png
Things to note (which are directly related to my problem) :-
- A vehicle can have 0 (NULL) or 1 Radio. (Left Outer Join)
- A vehicle can have 0 (NULL) or 1 Cupholder (Left Outer Join)
- A vehicle has 1 Tyre Type (Inner Join).
Firstly, this looks like a normalised database schema. I suck and DB theory, so I'm guessing this is 3NF (at least) ... famous last words :)
Now, this is killing my database performance because these two outer joins and inner join are getting called a lot AND there's also a few more joins in many statements.
To try and fix this, I thought I might try and indexed view. Creating the view is a piece of cake. But indexing it, doesn't work -> can't create indexed views with joins OR self referencing tables (also another prob :( ).
So, i've cried for hours (and /wrists, dyed hair and wrote an emo song about it and put it on myfailspace) and did the following...
- Added a new row into each 'optional' outer join tables (in this example, Radios and CupHolders). ID = 0, rest of the data = 'Unknown Blah' or 0's.
- Update Parent tables, so that any NULL data's now have a 0.
- Update relationship from outer joins to inner joins.
Now, this works. I can even make my indexed view, which is very fast now.
So ... i'm in pain. This just goes against everything I've been taught. I feel dirty. Alone. Infected.
Is this a bad thing to do? Is this a common scenario of denormalizing a database for the sake of performance?
I would love some thoughts on this, please :)
PS. Those images a random google finds -- so not me.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
索引中通常不使用空值。 您所做的就是提供一个哨兵值,以便该列始终具有一个允许更有效地使用索引的值。
您也没有更改数据库的结构,因此我不会将此称为非规范化。 我已经用日期值做到了这一点,其中“结束日期”为空,表示尚未结束。 相反,我将其设为未来已知的日期方式,以便进行索引。
我认为这很好。
null values generally are not used in indexs. What you've done is to provide a sentinel value so that the column always has a value which allows your indexes to be used more effectively.
You didn't change the structure of your database either, so I wouldn't call this denormalizing. I've done that with date values where you have an "end date" null denoted not ended yet. Instead I made it a known date way in the future which allowed for indexing.
I think this is fine.
数据库应该始终在 3NF 中设计和最初实现。 但世界是一个现实的地方,而不是理想的地方,出于性能原因恢复到 2NF(甚至 1NF)是可以的。 不要为此自责,现实世界中实用主义总是战胜教条主义。
如果您的解决方案能够提高性能,那么它就是一个好的解决方案。 拥有一台真正的收音机(例如),由无人制造且没有任何功能,这个想法并不是一个坏主意 - 相信我,以前已经做过很多次了 :-) 你会这样做的唯一原因使用该字段为 NULL 是为了查看哪些车辆没有收音机,这些查询之间几乎没有什么区别:
我的第一个想法是简单地将四个表合并为一个表并挂起重复数据问题。 DBMS 的大多数问题源于性能不佳而不是存储空间不足。
如果您当前的非规范化模式也变得很慢,也许可以将其保留为后备位置。
Database should always be designed and initially implemented in 3NF. But the world is a place of reality, not ideals, and it's okay to revert to 2NF (or even 1NF) for performance reasons. Don't beat yourself up about it, pragmatism beats dogmatism in the real world all the time.
Your solution, if it improves performance, is a good one. The idea of having an actual radio (for example), manufactured by nobody and having no features, is not a bad one - it's been done a lot before, believe me :-) The only reason you would use that field as NULL was to see which vehicles have no radio and there's little difference between these queries:
My first thought was to simply combine the four tables into one and hang the duplicate data issue. Most problems with DBMS' stem from poor performance rather than low storage space.
Maybe keep that as your fallback position if your current de-normalized schema becomes slow as well.
“......所以我正在调整架构和查询以帮助加快某些事情......” - 我对此持不同意见。 看来你正在放慢速度。 (开玩笑。)
我喜欢数据库程序员博客。 他有两篇赞成和反对标准化的专栏,您可能会觉得有用:
我不是 DBA,但我认为证据就在你眼前:性能更差。 我不明白将这些 1:1 关系拆分为单独的表有什么好处,但我很乐意接受指导。
在我更改任何内容之前,我会要求 SQL Server 对每个缓慢的查询进行 EXPLAIN PLAN,并使用该信息来准确查看应该更改的内容。 不要猜测,因为标准化专家告诉过你。 获取数据来支持您正在做的事情。 您所做的听起来像是在不进行分析的情况下优化中间层代码。 直觉并不是很准确。
"...So i'm tweaking the schema and queries to help speed some things up..." - I would beg to differ about this. It seems that you're slowing things down. (Just kidding.)
I like the Database Programmer blog. He has two columns for and against normalization that you might find helpful:
I'm not a DBA, but I think the evidence is in front of your eyes: Performance is worse. I don't see what splitting these 1:1 relationships into separate tables is buying you, but I'll be happy to take instruction.
Before I changed anything, I'd ask SQL Server to EXPLAIN PLAN on every query that was slow and use that information to see exactly what should be changed. Don't guess because a normalization guru told you so. Get the data to back up what you're doing. What you're doing sounds like optimizing middle tier code without profiling. Gut feelings aren't very accurate.
我遇到了同样的问题:表现与学术卓越。 我们有一个包含 300 列和 91000 条记录的客户数据库的大视图。 我们使用外连接来创建视图,但性能非常糟糕。 我们考虑更改为内连接,方法是在我们连接的列上放入值为零(而不是 null)的虚拟记录,以在视图上启用唯一索引。
我必须同意,如果性能很重要,有时就必须做一些奇怪的事情才能实现它。 最终那些为我们付账的人并不关心架构是否完美。
im running into the same issue of performance vs academic excellence. we have a large view on a customer database with 300 columns and 91000 records. we use outer joins to create the view and the performance is pretty bad. we have considered changing to inner joins by putting in the dummy records with a value of zero on the columns we join on (instead of null) to enable a unique index on the view.
i have to agree that if performance is important, sometimes strange things have to be done to make it happen. ultimately those who pay our bills don't care if the architecture is perfect.