在数据库中存储多项选择值

发布于 2024-08-06 05:27:37 字数 375 浏览 13 评论 0原文

假设我让用户检查她所说的语言并将其存储在数据库中。重要的旁注,我不会在数据库中搜索任何这些值,因为我将有一些单独的搜索引擎用于搜索。 现在,存储这些值的明显方法是创建一个类似的表

UserLanguages
(
 UserID nvarchar(50),
 LookupLanguageID int
)

,但站点负载很高,我们正在尝试尽可能消除任何开销,因此为了避免在 UI 上显示结果时与主成员表连接,我正在考虑在主表中存储用户的语言,将它们用逗号分隔,例如“12,34,65”

同样,我不搜索它们,所以我不担心必须在该列上执行全文索引。

我确实没有看到这个解决方案有任何问题,但我是否忽略了任何事情?

谢谢, 安德烈

Say I offer user to check off languages she speaks and store it in a db. Important side note, I will not search db for any of those values, as I will have some separate search engine for search.
Now, the obvious way of storing these values is to create a table like

UserLanguages
(
 UserID nvarchar(50),
 LookupLanguageID int
)

but the site will be high load and we are trying to eliminate any overhead where possible, so in order to avoid joins with main member table when showing results on UI, I was thinking of storing languages for a user in the main table, having them comma separated, like "12,34,65"

Again, I don't search for them so I don't worry about having to do fulltext index on that column.

I don't really see any problems with this solution, but am I overlooking anything?

Thanks,
Andrey

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

月野兔 2024-08-13 05:27:37

不。

  • 现在你不搜索它们
  • 数据对任何事情都是无用的,除了这种情况
  • 没有数据完整性(例如没有FK)
  • 你仍然需要更改为“英语,德语”等以显示
  • “给我全部”说 x" = FAIL 的用户
  • 该列表实际上是一个演示问题

,不过这是您的系统,我期待稍后回答不可避免的“帮助”问题...

Don't.

  • You don't search for them now
  • Data is useless to anything but this one situation
  • No data integrity (eg no FK)
  • You still have to change to "English,German" etc for display
  • "Give me all users who speak x" = FAIL
  • The list is actually a presentation issue

It's your system, though, and I look forward to answering the inevitable "help" questions later...

冰火雁神 2024-08-13 05:27:37

您现在可能不会错过任何东西,但是当您的需求发生变化时,您可能会后悔这个决定。您应该按照您的第一直觉建议将其标准化存储。这才是正确的做法。

您所建议的是典型的过早优化。您还不知道该连接是否会成为瓶颈,因此您也不知道是否真正购买了任何性能改进。等到你可以分析该事物,然后你就会知道该部分是否需要优化。

如果确实如此,我会考虑使用物化视图或其他一些方法,使用标准化数据将答案预先计算到不被视为记录簿的缓存中。

更一般地说,如果有必要,可以进行许多可能的优化,而不会按照您建议的方式影响您的设计。

You might not be missing anything now, but when you're requirements change you might regret that decision. You should store it normalized like your first instinct suggested. That's the correct approach.

What you're suggesting is a classic premature optimization. You don't know yet whether that join will be a bottleneck, and so you don't know whether you're actually buying any performance improvement. Wait until you can profile the thing, and then you'll know whether that piece needs to be optimized.

If it does, I would consider a materialized view, or some other approach that pre-computes the answer using the normalized data to a cache that is not considered the book of record.

More generally, there are a lot of possible optimizations that could be done, if necessary, without compromising your design in the way you suggest.

梦太阳 2024-08-13 05:27:37

这种类型的存储几乎总是困扰着我。首先,你甚至还没有处于第一正常形态。另一方面,某个经理或其他人肯定会回来说……“嘿,既然我们存储了这个,你能给我写一份关于……的报告吗?”

我建议采用标准化设计。将其放在单独的表中。

This type of storage has almost ALWAYS come back to haunt me. For one, you are not even in first normal form. For another, some manager or the other will definitely come back and say.. "hey, now that we store this, can you write me a report on... "

I would suggest going with a normalized design. Put it in a separate table.

鹿港巷口少年归 2024-08-13 05:27:37

我通常会远离您所描述的解决方案,当您以这种方式存储关系数据时,您会遇到麻烦。

作为替代解决方案:
您可以存储为一个位掩码整数,例如:
0 - 不选择
1 - 英语
2 - 西班牙语
4 - 德语
8 - 法语
16 - 俄语
--以此类推 2 的幂

因此,如果有人选择英语和俄语,则该值将为 17,并且您可以使用按位运算符轻松查询这些值。

I generally stay away at the solution you described, you asking for troubles when you store relational data in such fashion.

As alternative solution:
You could store as one bitmasked integer, for example:
0 - No selection
1 - English
2 - Spanish
4 - German
8 - French
16 - Russian
--and so on powers of 2

So if someone selected English and Russian the value would be 17, and you could easily query the values with Bitwise operators.

我三岁 2024-08-13 05:27:37

问题:

  1. 您失去了加入功能(显然)。
  2. 您必须在每个页面加载/回发时重新解析列表。这会导致客户端产生更多代码。
  3. 您将失去所有试图保持数据库完整性的借口。试想一下,如果您决定稍后删除一种语言...修复所有用户配置文件的 sql 是什么?
  4. 假设您的各种配置文件选项存储在数据库的查找表中,您仍然需要在每个配置文件页面运行“30 个查询”。如果不是,那么您必须为每个小更改进行代码部署。糟糕,非常糟糕。
  5. 基于“不会发生”的事情做出设计决策绝对会导致失败。当然,商界人士说过他们永远不会这样做……直到他们想到他们绝对必须这样做的理由。今天。这将在您完成编码后立即完成。
  6. 正如我在评论中所说,对低使用率页面进行 30 个查询根本不算什么。不要担心,并且绝对不要优化,除非你确实知道这是必要的。猜猜它的个人资料页面有多少查询?

Problems:

  1. You lose join capability (obviously).
  2. You have to reparse the list on each page load / post back. Which results in more code client side.
  3. You lose all pretenses of trying to keep database integrity. Just imagine if you decide to REMOVE a language later on... What's the sql going to be to fix all of your user profiles?
  4. Assuming your various profile options are stored in a lookup table in the DB, you still have to run "30 queries" per profile page. If they aren't then you have to code deploy for each little change. bad, very bad.
  5. Basing a design decision on something that "won't happen" is an absolute recipe for failure. Sure, the business people said they won't ever do that... Until they think of a reason they absolutely must do it. Today. Which will be promptly after you finish coding this.
  6. As I stated in a comment, 30 queries for a low use page is nothing. Don't sweat it, and definitely don't optimize unless you know for darn sure it's necessary. Guess how many queries SO does for it's profile page?
捎一片雪花 2024-08-13 05:27:37

过早的优化是万恶之源。

编辑: 显然,我的观察背景被一些人误解了,因此遭到了否决。所以我会澄清。

对模型进行非规范化以使事情变得更容易和/或“更高效” - 例如创建串联列来表示业务信息(如 OP 情况) - 我称之为“不成熟的做法”优化”。

虽然可能存在一些极端的边缘情况,没有其他方法可以获得特定问题域所需的必要性能,但人们很少应该假设是这种情况。一般来说,这种过早的优化会导致长期的痛苦,因为它们很难撤消 - 在生产中更改数据模型比最初部署时需要付出更多的努力。

在设计数据库时,开发人员(和 DBA)应该应用规范化等标准实践,以确保他们的数据模型表达正在收集和管理的业务信息。我不认为正确使用数据标准化是一种“优化”——这是一种必要的实践。在我看来,数据建模者应该始终关注可以重构为(至少)第三范式(3NF)的模型。

Premature optimization is the root of all evil.

EDIT: Apparently the context of my observation has been misconstrued by some - and hence the downvotes. So I will clarify.

Denormalizing your model to make things easier and/or 'more performant' - such as creating concatenated columns to represent business information (as in the OP case) - is what I refer to as a "premature optimization".

While there may be some extreme edge cases where there is no other way to get the necessary performance necessary for a particular problem domain - one should rarely assume this is the case. In general, such premature optimizations cause long-term grief because they are hard to undo - changing your data model once it is in production takes a lot more effort than when it initially deployed.

When designing a database, developers (and DBAs) should apply standard practices like normalization to ensure that their data model expresses the business information being collected and managed. I don't believe that proper use of data normalization is an "optimization" - it is a necessary practice. In my opinion, data modelers should always be on the lookout for models that could be restructured to (at least) third normal form (3NF).

转角预定愛 2024-08-13 05:27:37

如果您不查询它们,则将它们存储在类似于初始计划的表单中不会丢失任何内容。
如果是的话,那么以逗号分隔的格式存储它们将会再次困扰你,而且我怀疑任何速度节省都会很重要,特别是当你考虑到将它们翻译回来所需的工作时。

If you're not querying against them, you don't lose anything by storing them in a form like your initial plan.
If you are, then storing them in the comma-delimited format will come back to haunt you, and I doubt that any speed savings would be significant, especially when you factor in the work required to translate them back.

一袭水袖舞倾城 2024-08-13 05:27:37

您似乎非常担心添加一些额外的查找表连接。根据我的经验,实际传输 HTML 响应并让浏览器呈现它所需的时间远远超过了一些额外的表连接。特别是如果您为主键和外键使用索引(正如您应该的那样)。这就像您正在计划一次多日越野旅行,并且担心 1 次额外的 10 分钟上厕所时间。

缺乏长期灵活性和数据完整性对于如此小的优化来说是不值得的(这可能没有必要,甚至不明显)。

You seem to be extremely worried about adding in a few extra lookup table joins. In my experience, the time it takes to actually transmit the HTML response and have the browser render it far exceed a few extra table joins. Especially if you are using indexes for your primary and foreign keys (as you should be). It's like you are planning a multi-day cross-country trip and you are worried about 1 extra 10 minute bathroom stop.

The lack of long-term flexibility and data integrity are not worth it for such a small optimization (which may not be necessary or even noticeable).

歌枕肩 2024-08-13 05:27:37

不啊啊啊啊啊!!!!!!

正如上面几篇文章中说得很好的那样。

如果您想对这场辩论有相反的看法,请查看 wordpress。表格中充满了分隔数据,这是一个很棒、简单的平台。

Nooooooooooooooooo!!!!!!!!

As stated very well in the above few posts.

If you want a contrary view to this debate, look at wordpress. Tables are chocked full of delimited data, and it's a great, simple platform.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文