用户请求非规范化数据库

发布于 2024-08-11 23:00:42 字数 268 浏览 6 评论 0原文

我正处于开发数据库驱动系统的早期阶段,系统的最大部分围绕继承类型的关系。有一个大约有 10 列的父实体,并且将有大约 10 个从父实体继承的子实体。每个子实体将有大约 10 列。我认为为父实体提供自己的表并为每个子实体提供自己的表 - 每个子类一个表的结构是有意义的。

今天,我的用户要求查看我创建的系统的结构。他们对每个子类一个表的结构的想法犹豫不决。他们更喜欢一张大约 100 列的大表,因为这样他们可以更轻松地执行自己的自定义查询。

为了用户的利益,我应该考虑对数据库进行非规范化吗?

I am in the early stages of developing a database-driven system and the largest part of the system revolves around an inheritance type of relationship. There is a parent entity with about 10 columns and there will be about 10 child entities inheriting from the parent. Each child entity will have about 10 columns. I thought it made sense to give the parent entity its own table and give each of the children their own tables - a table-per-subclass structure.

Today, my users requested to see the structure of the system I created. They balked at the idea of the table-per-subclass structure. They would prefer one big ~100 column table because it would be easier for them to perform their own custom queries.

Should I consider denormalizing the database for the sake of the users?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(13

拥抱没勇气 2024-08-18 23:00:42

绝对不是。您以后随时可以创建视图来向他们展示他们想要查看的内容。

Absolutely not. You can always create a view later to show them what they want to see.

白衬杉格子梦 2024-08-18 23:00:42

他们实际上是在要求一份报告。

您可以授予他们访问包含他们需要的所有字段的视图的权限...这样您就不会弄乱您的数据模型。

They are effectively asking for a report.

You could give them access to a view containing all the fields they require... that way you don't mess up your data model.

看海 2024-08-18 23:00:42

不会。正确构建数据,如果用户需要数据的非规范化视图,请将其创建为数据库中的视图。

或者,考虑一下 RDBMS 可能不是适合该项目的存储工具。

No. Structure the data properly and if the users need the a denormalized view of the data create it as a VIEW in the database.

Alternatively, consider that perhaps an RDBMS is not the appropriate storage tool for this project.

那些过往 2024-08-18 23:00:42

出于某种原因,他们是系统的用户而不是程序员。为他们的查询提供单独的界面。像这样的高级用户可能会很有帮助,但处理起来却很痛苦。只需解释一下您需要以某种方式设计数据库,以便您可以完成您的工作,就这样。一旦完成,您就可以提供其他方法来使查询更容易。

They are the users and not the programmers of the system for a reason. Provide a separate interface for their queries. Power users like this can both be helpful and a pain to deal with. Just explain you need the database designed a certain way so you can do your job, period. Once that is accomplished you and provide other means to make querying easier.

梦回梦里 2024-08-18 23:00:42

他们知道什么!?您可能会说,用户一开始就不应该直接访问数据库。

这样做会让您面临巨大的性能问题,仅仅因为几个用户正在运行荒谬的查询。

What do they know!? You could argue that users shouldn't even be having direct access to a database in the first place.

Doing that leaves you open to massive performance issues, just because a couple of users are running ridiculous queries.

一影成城 2024-08-18 23:00:42

如果您以用户想要的格式创建了 VIEW,同时仍然维护正确规范化的表,会怎么样?

How about if you created a VIEW in the format your users wanted while still maintaining a properly normalized table?

永言不败 2024-08-18 23:00:42

除了支持或反对用户主张的许多技术原因之外,您还需要在沟通各种场景的后果以及(更重要的是)这些后果的成本时保持一致。如果用户是您的客户并且他们付钱让您做某项工作,请解释他们“提出的”想法可能会花费他们更多的钱,包括开发时间、额外的硬件资源等。

希望您能解释一下以这样的方式展示您的专业知识以及为什么您的想法从长远来看对您的用户具有更好的价值。

Aside from a lot of the technical reasons for or against your users' proposition, you need to be on same page in communicating the consequences of various scenarious and (more importantly) the costs of those consequences. If the users are your clients and they are paying you to do a job, explain that their awful "proposed" ideas may cost them more money in development time, additional hardware resources, etc.

Hopefully you can explain it in such a way that shows your expertise and why your idea is a much better value to your users in the long run.

清晰传感 2024-08-18 23:00:42

正如每个人或多或少提到的那样,这种方式就是疯狂,而你总是可以建立一个视图。

如果你无法让他们接受这一点,请考虑向他们展示这条线索,以及那些权衡过用户正在干预他们不完全理解的事情的专业人士的数量,其影响将是基础被破坏。

开发人员技巧的一个重要部分是对长期无法解决的问题的感觉,并且规范化规则在这方面几乎是规范的。在某些情况下,您需要非规范化(数据仓库等),但这听起来不像其中之一!

听起来好像你手头上可能有一个特别令人不安的用户——那些业余开发人员,他们认为只要有时间,他们自己就能更好地完成你的工作。这可能有帮助,也可能没有帮助,但我发现这些类型的人对演讲的反应很好——现在有几次我发现,如果我穿着得体,在我的个性中表现出一点力量,这会让他们感觉像我是专家,可以在很多问题发生之前就预防它们。

As everyone more or less mentioned, that way lies madness, and you can always build a view.

If you just can't get them to come around on this point, consider showing them this thread and the number of pros who weighed in saying that the users are meddling with things that they don't fully understand, and the impact will be an undermined foundation.

A big part of the developer's craft is the feel for what won't work out long term, and the rules of normalization are almost canonical in that respect. There are situations where you need to denormalize (data warehouses, etc) but this doesn't sound like one of them!

It also sounds as though you may have a particularly troubling brand of user on your hand -- the amatuer developer who thinks they could do your job better themselves if only they had the time. This may or may not help, but I've found that those types respond well to presentation -- a few times now I've found that if I dress sharp and show a little bit of force in my personality, it helps them feel like I'm an expert and prevents a bunch of problems before they start.

可是我不能没有你 2024-08-18 23:00:42

我强烈建议提出一个不涉及对您的数据库运行直接报告的答案。当这种情况发生时,您的数据库结构就已经确定,您基本上可以将其视为遗产。

视图是一个好的开始,但稍后您可能希望将其构造为导出,以进一步解耦。当然,然后您会遇到想要“实时”数据的人。正确的业务分析通常表明这是不必要的。实际的实时要求并不能通过报告系统得到最好的处理。

需要明确的是:我个人更喜欢每个子类的表方法,但我认为这实际上并不像直接报告事务表那样是一个大问题。

I would strongly recommend coming up with an answer that doesn't involve someone running direct reports against your database. The moment that happens, your DB structure is set in stone and you can basically consider it legacy.

A view is a good start, but later on you'll probably want to structure this as an export, to decouple further. Of course, then you'll encounter someone who wants "real time" data. Proper business analysis usually reveals this to be unnecessary. Actual real time requirements are not best handled through reporting systems.

Just to be clear: I'd personally favour the table per subclass approach, but I don't think it's actually as big an issue as the direct reporting off transaction tables is going to be.

趁年轻赶紧闹 2024-08-18 23:00:42

我会选择视图(正如其他人所建议的那样)或内联表值函数(这样做的好处是您需要参数 - 例如日期范围或客户帐户 - 这可以帮助阻止用户查询而没有任何限制问题空间)首先。内联 TVF 实际上是一个参数化视图,并且在引擎如何处理它们方面更接近于视图,而不是多语句表值函数或标量函数,后者的性能可能非常差。

但是,在某些情况下,如果视图复杂或密集,这可能会影响生产性能。如果临时用户查询编写得不好,与构建得更好的查询相比,它还可能导致锁定持续时间更长或升级得更远。在存在多对一或多对多关系的情况下,用户也可能会误解 ER 数据模型并产生相乘的数字。下一个选择可能是用索引来具体化这些视图或制作表并保持更新,这让我们更接近我的下一个选择...

所以,考虑到视图选项的这些缺点,并且已经考虑通过开始制作副本来缓解它对于数据,我考虑的下一个选择是拥有一个单独的只读(对于这些用户)数据版本,其结构不同。通常,我会首先查看 Kimball 式星型模式。您不需要拥有成熟的时间一致的数据仓库。当然,这是一种选择,但您可以简单地使报告模型保持最新的数据。星型模式是非规范化的一种特殊形式,特别适合数字报告,并且给定的星型不应该被用户意外滥用。您可以通过多种方式使 star 保持最新状态,包括触发器、计划作业等。它们可以非常快地满足报告需求,并在同一生产安装上运行 - 如果不仅仅是单独的数据库,也可能在单独的实例上运行。

尽管这样的解决方案可能要求您有效地将存储需求增加一倍以上,但与其他实践相比,如果您很好地了解您的数据并且不介意有两个模型(一个用于事务,一个用于分析),那么它可能是一个非常好的选择(请注意,无论如何,通过使用最简单的第一个视图选项,您已经开始进行这种逻辑分离)。

一些架构师通常会将其服务器加倍,并使用具有某种复制的相同模型,以便提供索引更重或不同的报告服务器。这样的第二个服务器不会影响具有报告要求的生产事务,并且可以相当容易地保持最新状态。只会有一种模型,但是当然,这与仅允许用户临时访问底层模型具有相同的可用性问题,而不会影响性能,因为他们有自己的游乐场。

有很多方法可以给这些猫剥皮。祝你好运。

I would opt for a view (as others have suggested) or an inline table-valued function (the benefits of this is you require parameters - like an date range or a customer account - which can help to stop users from querying without any limits on the problem space) first. An inline TVF is really a parametrized view and is far closer to a view in terms of how the engine treats them than it is to a multi-statement table valued function or a scalar function, which can perform incredibly poorly.

However, in some cases, this can impact production performance if the view is complex or intensive. With poorly written ad hoc user queries, it can also cause locks to persist longer or be escalated further than they would on a better built query. It is also possible for users to misinterpret an E-R data model and produce multiplied numbers in cases where there are many-to-one or many-to-many relationships. The next option might be to materialize these views with indexes or make tables and keep them updated, which gets us closer to my next option...

So, given those drawbacks of the view option and already thinking of mitigating it by starting to make copies of data, the next option I would consider is to have a separate read-only (for these users) version of the data which is structured differently. Typically, I would first look at a Kimball-style star schema. You do not need to have a full-fledged time-consistent data warehouse. Of course, that's an option, but you could simply keep a reporting model up to date with data. Star-schemas are a special form of denormalization and are particularly good for numerical reporting, and a given star should not be able to be abused by users accidentally. You can keep the star up to date in a number of ways, including triggers, scheduled jobs, etc. They can be very fast for reporting needs and run on the same production installation - perhaps on a separate instance if not just a separate database.

Although such a solution may require you to effectively more than double your storage requirements, when compared with other practices it might be a really good option if you understand your data well and don't mind having two models - one for transactions and one for analysis (note that you will already start to have this logical separation anyway with the use of a the simplest first option of view).

Some architects will often double their servers and use the SAME model with some kind of replication in order to provide a reporting server which is indexed more heavily or differently. Such a second server doesn't impact production transactions with reporting requirements and can be kept up to date fairly easily. There will only be one model, but of course, this has the same usability problems with allowing users ad hoc access to the underlying model only, without the performance affects, since they get their own playground.

There are a lot of ways to skin these cats. Good luck.

飘然心甜 2024-08-18 23:00:42

顾客永远是对的。但是,当您将客户的需求转换为美元和美分时,客户可能会做出让步。 100 列的表将需要额外的开发时间来编写代码来执行数据库通过正确实现自动执行的操作。此外,他们的支持成本将会更高,因为更多的代码意味着更多的问题和更低的调试难度。

The customer is always right. However, the customer is likely to back down when you convert their requirement into dollars and cents. A 100 column table will require extra dev time to write the code that does what the database would do automatically with the proper implementation. Further, their support costs will be higher since more code means more problems and lower ease of debugging.

陌伤ぢ 2024-08-18 23:00:42

我将在这里唱反调,并说这两种解决方案听起来都像是实际数据的糟糕近似值。面向对象的编程语言不倾向于使用这两种数据模型来实现是有原因的,这并不是因为 Codd 1970 年关于关系的想法是存储和查询面向对象数据结构的理想系统。 :-)

请记住,SQL 最初被设计为一种用户界面语言(这就是为什么它看起来有点像英语,而完全不像那个时代的其他语言:Algol、C、APL、Prolog)。我听说现在不向用户公开 SQL 数据库的唯一原因是安全性(他们可能会关闭服务器!)和可用性(当你可以点击点击时谁愿意编写 SQL?),但是如果这是他们的服务器并且他们想要,那为什么不让他们呢?

考虑到“系统的最大部分围绕继承类型的关系”,那么我会认真考虑一个可以让我本地表示这一点的数据库,Postgres(如果 SQL 很重要)或本机对象数据库(如果您不需要 SQL 兼容性,那么使用起来非常棒)。

最后,请记住,每个工程决策都是一种权衡。通过“坚持你的立场”(正如其他人提议的那样),你隐含地表示用户愿望的价值为零。不要向 SO 询问正确答案,因为我们不知道您的用户想要如何处理您的数据(甚至不知道您的数据是什么,或者您的用户是谁)。告诉他们为什么你想要一个多表解决方案,然后与他们一起制定一个你们都能接受的解决方案。

I'm going to play devil's advocate here and say that both solutions sound like poor approximations of the actual data. There's a reason that object-oriented programming languages don't tend to be implemented with either of these data models, and it's not because Codd's 1970 ideas about relations were the ideal system for storing and querying object-oriented data structures. :-)

Remember that SQL was originally designed as a user interface language (that's why it looks vaguely like English and not at all like other languages of that era: Algol, C, APL, Prolog). The only reasons I've heard for not exposing a SQL database to users today are security (they could take down the server!) and usability (who wants to write SQL when you can clicky clicky?), but if it's their server and they want to, then why not let them?

Given that "the largest part of the system revolves around an inheritance type of relationship", then I'd seriously consider a database that lets me represent that natively, either Postgres (if SQL is important) or a native object database (which are awesome to work with, if you don't need SQL compatibility).

Finally, remember that every engineering decision is a tradeoff. By "sticking to your guns" (as somebody else proposed), you're implicitly saying the value of your users' desires are zero. Don't ask SO for a correct answer to this, because we don't know what your users want to do with your data (or even what your data is, or who your users are). Go tell them why you want a many-tables solution, and then work out a solution with them that's acceptable to both of you.

尽揽少女心 2024-08-18 23:00:42

您已经实现了类表继承,并且他们要求单表继承。这两种设计在某些情况下都是有效的。

您可能需要获取 Martin Fowler 的企业应用程序架构模式的副本,以了解有关其优点的更多信息以及每种设计的缺点。无论如何,这本书都是你书架上的经典参考书。

You've implemented Class Table Inheritance and they're asking for Single Table Inheritance. Both designs are valid in certain situations.

You might want to get a copy of Martin Fowler's Patterns of Enterprise Application Architecture to read more about the advantages and disadvantages of each design. That book is a classic reference to have on your bookshelf, in any case.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文