在数据库中存储多项选择值
假设我让用户检查她所说的语言并将其存储在数据库中。重要的旁注,我不会在数据库中搜索任何这些值,因为我将有一些单独的搜索引擎用于搜索。 现在,存储这些值的明显方法是创建一个类似的表
UserLanguages
(
UserID nvarchar(50),
LookupLanguageID int
)
,但站点负载很高,我们正在尝试尽可能消除任何开销,因此为了避免在 UI 上显示结果时与主成员表连接,我正在考虑在主表中存储用户的语言,将它们用逗号分隔,例如“12,34,65”
同样,我不搜索它们,所以我不担心必须在该列上执行全文索引。
我确实没有看到这个解决方案有任何问题,但我是否忽略了任何事情?
谢谢, 安德烈
Say I offer user to check off languages she speaks and store it in a db. Important side note, I will not search db for any of those values, as I will have some separate search engine for search.
Now, the obvious way of storing these values is to create a table like
UserLanguages
(
UserID nvarchar(50),
LookupLanguageID int
)
but the site will be high load and we are trying to eliminate any overhead where possible, so in order to avoid joins with main member table when showing results on UI, I was thinking of storing languages for a user in the main table, having them comma separated, like "12,34,65"
Again, I don't search for them so I don't worry about having to do fulltext index on that column.
I don't really see any problems with this solution, but am I overlooking anything?
Thanks,
Andrey
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
不。
,不过这是您的系统,我期待稍后回答不可避免的“帮助”问题...
Don't.
It's your system, though, and I look forward to answering the inevitable "help" questions later...
您现在可能不会错过任何东西,但是当您的需求发生变化时,您可能会后悔这个决定。您应该按照您的第一直觉建议将其标准化存储。这才是正确的做法。
您所建议的是典型的过早优化。您还不知道该连接是否会成为瓶颈,因此您也不知道是否真正购买了任何性能改进。等到你可以分析该事物,然后你就会知道该部分是否需要优化。
如果确实如此,我会考虑使用物化视图或其他一些方法,使用标准化数据将答案预先计算到不被视为记录簿的缓存中。
更一般地说,如果有必要,可以进行许多可能的优化,而不会按照您建议的方式影响您的设计。
You might not be missing anything now, but when you're requirements change you might regret that decision. You should store it normalized like your first instinct suggested. That's the correct approach.
What you're suggesting is a classic premature optimization. You don't know yet whether that join will be a bottleneck, and so you don't know whether you're actually buying any performance improvement. Wait until you can profile the thing, and then you'll know whether that piece needs to be optimized.
If it does, I would consider a materialized view, or some other approach that pre-computes the answer using the normalized data to a cache that is not considered the book of record.
More generally, there are a lot of possible optimizations that could be done, if necessary, without compromising your design in the way you suggest.
这种类型的存储几乎总是困扰着我。首先,你甚至还没有处于第一正常形态。另一方面,某个经理或其他人肯定会回来说……“嘿,既然我们存储了这个,你能给我写一份关于……的报告吗?”
我建议采用标准化设计。将其放在单独的表中。
This type of storage has almost ALWAYS come back to haunt me. For one, you are not even in first normal form. For another, some manager or the other will definitely come back and say.. "hey, now that we store this, can you write me a report on... "
I would suggest going with a normalized design. Put it in a separate table.
我通常会远离您所描述的解决方案,当您以这种方式存储关系数据时,您会遇到麻烦。
作为替代解决方案:
您可以存储为一个位掩码整数,例如:
0 - 不选择
1 - 英语
2 - 西班牙语
4 - 德语
8 - 法语
16 - 俄语
--以此类推 2 的幂
因此,如果有人选择英语和俄语,则该值将为 17,并且您可以使用按位运算符轻松查询这些值。
I generally stay away at the solution you described, you asking for troubles when you store relational data in such fashion.
As alternative solution:
You could store as one bitmasked integer, for example:
0 - No selection
1 - English
2 - Spanish
4 - German
8 - French
16 - Russian
--and so on powers of 2
So if someone selected English and Russian the value would be 17, and you could easily query the values with Bitwise operators.
问题:
Problems:
过早的优化是万恶之源。
编辑: 显然,我的观察背景被一些人误解了,因此遭到了否决。所以我会澄清。
对模型进行非规范化以使事情变得更容易和/或“更高效” - 例如创建串联列来表示业务信息(如 OP 情况) - 我称之为“不成熟的做法”优化”。
虽然可能存在一些极端的边缘情况,没有其他方法可以获得特定问题域所需的必要性能,但人们很少应该假设是这种情况。一般来说,这种过早的优化会导致长期的痛苦,因为它们很难撤消 - 在生产中更改数据模型比最初部署时需要付出更多的努力。
在设计数据库时,开发人员(和 DBA)应该应用规范化等标准实践,以确保他们的数据模型表达正在收集和管理的业务信息。我不认为正确使用数据标准化是一种“优化”——这是一种必要的实践。在我看来,数据建模者应该始终关注可以重构为(至少)第三范式(3NF)的模型。
Premature optimization is the root of all evil.
EDIT: Apparently the context of my observation has been misconstrued by some - and hence the downvotes. So I will clarify.
Denormalizing your model to make things easier and/or 'more performant' - such as creating concatenated columns to represent business information (as in the OP case) - is what I refer to as a "premature optimization".
While there may be some extreme edge cases where there is no other way to get the necessary performance necessary for a particular problem domain - one should rarely assume this is the case. In general, such premature optimizations cause long-term grief because they are hard to undo - changing your data model once it is in production takes a lot more effort than when it initially deployed.
When designing a database, developers (and DBAs) should apply standard practices like normalization to ensure that their data model expresses the business information being collected and managed. I don't believe that proper use of data normalization is an "optimization" - it is a necessary practice. In my opinion, data modelers should always be on the lookout for models that could be restructured to (at least) third normal form (3NF).
如果您不查询它们,则将它们存储在类似于初始计划的表单中不会丢失任何内容。
如果是的话,那么以逗号分隔的格式存储它们将会再次困扰你,而且我怀疑任何速度节省都会很重要,特别是当你考虑到将它们翻译回来所需的工作时。
If you're not querying against them, you don't lose anything by storing them in a form like your initial plan.
If you are, then storing them in the comma-delimited format will come back to haunt you, and I doubt that any speed savings would be significant, especially when you factor in the work required to translate them back.
您似乎非常担心添加一些额外的查找表连接。根据我的经验,实际传输 HTML 响应并让浏览器呈现它所需的时间远远超过了一些额外的表连接。特别是如果您为主键和外键使用索引(正如您应该的那样)。这就像您正在计划一次多日越野旅行,并且担心 1 次额外的 10 分钟上厕所时间。
缺乏长期灵活性和数据完整性对于如此小的优化来说是不值得的(这可能没有必要,甚至不明显)。
You seem to be extremely worried about adding in a few extra lookup table joins. In my experience, the time it takes to actually transmit the HTML response and have the browser render it far exceed a few extra table joins. Especially if you are using indexes for your primary and foreign keys (as you should be). It's like you are planning a multi-day cross-country trip and you are worried about 1 extra 10 minute bathroom stop.
The lack of long-term flexibility and data integrity are not worth it for such a small optimization (which may not be necessary or even noticeable).
不啊啊啊啊啊!!!!!!
正如上面几篇文章中说得很好的那样。
如果您想对这场辩论有相反的看法,请查看 wordpress。表格中充满了分隔数据,这是一个很棒、简单的平台。
Nooooooooooooooooo!!!!!!!!
As stated very well in the above few posts.
If you want a contrary view to this debate, look at wordpress. Tables are chocked full of delimited data, and it's a great, simple platform.