数据库结构-加入或不加入
我们正在 mySQL Workbench 的帮助下为新应用程序制定数据库结构,随着多对多关系的增加,创建数据列表所需的联接数量急剧增加。
该应用程序的读取量很大,每个表有几十万行。
问题:
在需要的地方合并表从而减少连接真的那么糟糕吗?
我们应该开始考虑水平分区吗? (与合并表结合)
是否有比数据透视表更好的方法来处理多对多关系?
我们讨论了将所有数据存储在序列化文本列中并让应用程序而不是数据库进行排序,但这似乎是一个非常糟糕的主意,即使数据库将被大量缓存。您觉得怎么样?
We're drawing up the database structure with the help of mySQL Workbench for a new app and the number of joins required to make a listing of the data is increasing drastically as the many-to-many relationships increases.
The application will be quite read-heavy and have a couple of hundred thousand rows per table.
The questions:
Is it really that bad to merge tables where needed and thereby reducing joins?
Should we start looking at horizontal partitioning? (in conjunction with merging tables)
Is there a better way then pivot tables to take care of many-to-many relationships?
We discussed about instead storing all data in serialized text columns and having the application make the sorting instead of the database, but this seems like a very bad idea, even though that the database will be heavily cached. What do you think?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
使用数据库的规范化形式。对于大多数任务,您不需要超过 3 或 4 个联接,并且您仍然可以为最常见的联接编写视图。非规范化会让您在更改一个属性时始终考虑更新多个位置/表中的字段,并且肯定会导致更多的问题而不是好处。
如果您担心报告性能,那么您仍然可以将定时批次的数据提取到单独的表中,以获得报告查询所需的性能。如果为了简化查询,您可以使用视图。
Go with the normalized form of the database. For most part of the tasks you won't need more than 3 or 4 Joins and you still can write views for the most common joins. Denormalization will have you to always think of updating fields in multiple places/tables when changing one property and will surely lead to more problems than benefits.
If you worry about reporting performance then you still can extract the data in timed batches into separate tables to get the desired performance for your reporting queries. If it's for query simplicity you can use views.
按相反的顺序:
算了。使用数据库。人们说“在应用程序中实现它”通常是那些对编写数据库的工作量一无所知的人。
取决于具体需要。
取决于具体需要。 OLTP(事务处理)- 寻求第一个范式。 OLAP(分析处理)- 寻找合适的星图并进行反规范化以获得最佳性能。混合 - 忘记它。不适用于较大的安装,因为理论不同...除非您将数据库设置为 OLTP,然后使用特殊的 OLAP 多维数据集数据库(mySQL 没有)。
In inverse order:
Forget it. Use the database. People saynig "make it in the application" are pretty often those ignorant to the amount of work going into writing databases.
Depends on exact need.
Depends on exact need. OLTP (Transaction processing) - go for for firth normal form. OLAP (Analytical processing) - go for a proper star diagram and denormalize to get optimal performance. Mixed - forget it. Does not work for larger installs because the theories are different... except if you make the database OLTP and then use a special OLAP cube database (which mySQL does not have).
数据库旨在处理大量连接。使用此功能,因为它将使数据库中的多种数据操作变得更加容易。否则,为什么不直接使用平面文件呢?
Databases are designed to handle lots of joins. Use this feature as it will make many kinds of data manipulation in the database much easier. Otherwise, why not just use a flat file?
与往常一样,这取决于您的应用程序,但一般来说,过多的非规范化可能会在以后反过来影响您。良好规范化的数据库意味着您应该能够以稍后可能需要的大多数方式查询数据,特别是用于报告(这通常是事后的想法)。
如果您将所有数据粘贴在序列化文本列中,并且您的客户要求提供一份显示具有特定属性的所有行的报告,那么您将必须执行一系列字符串操作才能获取此数据。
如果您担心查询的连接过多,您可以考虑将某些数据集公开为视图......
As always, it depends on your application, but in general, too much denormalisation can come back and bite you later on. A well normalised database means that you should be able to query your data in most ways that you may need later on, particularly for reporting (which often is an afterthought).
If you stick all your data in serialized text columns and your client asks for a report showing all rows that have a particular attribute, then you're going to have to do a bunch of string manipulation to get this data out.
If you're worried about too many joins for your queries, you could consider exposing certain sets of the data as a view...
如果您确保对外键建立索引(您确实设置了外键,不是吗?)并且在查询中具有正确的 where 子句,那么数据库应该可以轻松处理 10-15 个连接。尤其是行数这么少。我对包含数百万行的表进行了如此多的联接查询,并且它们运行良好。
通常,对数据进行分区比非规范化更好。
就非规范化而言,除非您还制定了保持非规范化数据与父表同步的策略,否则不要这样做。
至于您是否真的需要那么多表,或者您的设计是否糟糕,我们对此发表评论的唯一方法就是查看表结构。
If you make sure to index the foreign keys (you did set up foreign keys didn't you?) and have proper where clauses in your queries, 10-15 joins should be easily handled by a database. Especially with so few rows. I have queries with that many joins on tables with millions of rows and they run fine.
Usually it is better to partition data than to denormalize.
As far as denomalizing goes, don't do it unless you also institute a strategy for keeping the denormalized data in synch with the parent table.
As to whether you really need that many tables or if your design is bad, well the only way we could comment on that is if we saw the table structure.
除非您有明确的证据表明性能因连接而受到影响,否则请保持正常化。否则,正如其他人所说,您将不得不担心多次更新。
特别是如果数据库被大量缓存,正如您所说,您会惊讶于 DBMS 执行此类操作的速度有多快 - 毕竟,这就是它的设计目的。
除非它是那种具有大量数据的怪物应用程序,需要特殊的性能优化,否则您会发现减少开发、测试以及随后的维护工作将更加重要。
连接通常很好,也不错。它们允许您将数据保留在应有的位置,从而为您提供最大的灵活性。
正如我们多次说过的,过早的优化通常是不好的,而不是好的。
Unless you have clear evidence that performance is suffering because of the joins, stay normalised. Otherwise, as others have said, you'll have to worry about multiple updates.
Especially if the database is heavily cached, as you say, you'll be surprised how quick the DBMS is at doing this kind of thing - it is what it's designed for, after all.
Unless it's the sort of monster application, with huge amounts of data, that demands special performance optimisations, you'll find that keeping down the development, testing, and later, maintenance effort, will be much more important.
Joins are good, usually, not bad. They allow you to keep the data where it should be, which gives you maximum flexibility.
And as has been said many times, premature optimisation is usually bad, not good.