Database design: is it practical to modify a table schema at runtime?
Suppose I have a products table in which a few fields are reserved and all other attributes are user-generated nullable columns.
  (reserved)
      __
     /  \
    /    \
--------------------------------------
| id | name | color | height | width |
--------------------------------------
Like EAV, it will allow any number of properties, but the properties will be queryable as well. What are the potential downsides of this approach?
Can we rule out security issues if the only thing the user controls in ADD/DROP COLUMN statements is the field name (which would always be validated to prevent dropping reserved fields)?
How expensive can ADD/DROP COLUMN statements become when tables grow really large? Assume we have rate limiting in place to prevent users from abusing the system.
How many (nullable, non-indexed) columns are too many for a single table, from a performance perspective?
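To make the first question concrete, here is a minimal sketch of what "always validated" might look like. Column names cannot be passed as bound parameters, so the name ends up interpolated into the SQL text, and the allowlist pattern is the only thing standing between user input and the statement. The `RESERVED` set, the `safe_column` and `add_column` helpers, the length limit, and the use of SQLite are all assumptions made for illustration, not part of the original setup.

```python
import re
import sqlite3

# Assumption: these are the reserved columns from the diagram.
RESERVED = {"id", "name"}
# Allowlist: lowercase letters, digits and underscores, at most 63 characters.
IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]{0,62}")


def safe_column(name: str) -> str:
    """Return a normalized column name, or raise ValueError if it is unsafe."""
    name = name.strip().lower()
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid column name: {name!r}")
    if name in RESERVED:
        raise ValueError(f"reserved column: {name!r}")
    return name


def add_column(conn: sqlite3.Connection, name: str) -> None:
    col = safe_column(name)
    # Identifiers cannot be bound with "?", so the validated name is quoted
    # and interpolated into the statement text.
    conn.execute(f'ALTER TABLE products ADD COLUMN "{col}" TEXT')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
add_column(conn, "color")
# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(products)")]
```

A strict allowlist like this addresses injection through the field name only. It says nothing about the locking and cost concerns raised in the other two questions.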
Comments (4)
You'd be far better off with a second table with the key/value pairs.
And what makes you think the second table approach wouldn't be queryable?
In many database engines (MySQL and Oracle, for example) DDL statements cannot be rolled back as part of a transaction, because they trigger an implicit commit; PostgreSQL is a notable exception. The details depend on the engine you're using, but I wouldn't be surprised if the DDL had to wait until every other transaction finished, and/or blocked all other transactions while it waited. In other words, the performance would suck.
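As a rough illustration of the claim that a key/value table is still queryable, the sketch below filters products on two user-defined attributes at once. The table and column names (`product_attributes`, `attr_name`, `attr_value`), the sample rows, and the choice of SQLite are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_attributes (
    product_id INTEGER NOT NULL REFERENCES products(id),
    attr_name  TEXT NOT NULL,
    attr_value TEXT,
    PRIMARY KEY (product_id, attr_name)
);
INSERT INTO products VALUES (1, 'chair'), (2, 'table'), (3, 'lamp');
INSERT INTO product_attributes VALUES
    (1, 'color', 'red'),  (1, 'height', '90'),
    (2, 'color', 'red'),  (2, 'height', '75'),
    (3, 'color', 'blue');
""")

# One join per attribute being filtered on; values are stored as text,
# so numeric comparisons need an explicit cast.
rows = conn.execute("""
    SELECT p.name
    FROM products AS p
    JOIN product_attributes AS c
      ON c.product_id = p.id AND c.attr_name = 'color'
    JOIN product_attributes AS h
      ON h.product_id = p.id AND h.attr_name = 'height'
    WHERE c.attr_value = ? AND CAST(h.attr_value AS INTEGER) > ?
    ORDER BY p.name
""", ("red", 80)).fetchall()
```

The cost of this design shows up in the query: each filtered attribute adds a join, and typed comparisons rely on casts because every value shares one text column.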
Dynamically modifying the schema of a relational database at runtime is extremely expensive and usually has some ill effects (not to the point of unleashing things and eating children, but close ;))
So I'd look at two choices.
Leave in generic field names of varying types, with a configuration database that maps what the user chose to use these extra fields for so you have proper column headings, etc. in custom reporting processes (I've seen this used in several ERP packages).
Consider using a non-relational database that allows storing disparate objects (generally in JSON, etc.)
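The first choice above might look roughly like the following sketch: the table ships with a fixed pool of generic columns, and a small mapping table records what each one means for reporting. The column names (`text_1`, `num_1`, and so on), the `custom_field_labels` table, the `report` helper, and the use of SQLite are hypothetical and only meant to show the shape of the idea.

```python
import sqlite3

# Assumption: a fixed pool of generic columns created once, at design time.
GENERIC_COLUMNS = ("text_1", "text_2", "num_1", "num_2")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    text_1 TEXT, text_2 TEXT,
    num_1 REAL,  num_2 REAL
);
-- Configuration: which generic column the user assigned to which label.
CREATE TABLE custom_field_labels (
    column_name TEXT PRIMARY KEY,
    label       TEXT NOT NULL
);
INSERT INTO custom_field_labels VALUES ('text_1', 'color'), ('num_1', 'height');
INSERT INTO products (id, name, text_1, num_1) VALUES (1, 'chair', 'red', 90);
""")


def report(conn: sqlite3.Connection) -> list[dict]:
    """Return products keyed by the user-facing labels instead of column names."""
    labels = dict(conn.execute("SELECT column_name, label FROM custom_field_labels"))
    # Only columns from the fixed pool ever reach the SQL text.
    used = [c for c in GENERIC_COLUMNS if c in labels]
    select_list = ", ".join(["name", *used])
    cursor = conn.execute(f"SELECT {select_list} FROM products ORDER BY id")
    headings = ["name"] + [labels[c] for c in used]
    return [dict(zip(headings, row)) for row in cursor]


result = report(conn)
```

Assigning a new custom field is then an insert into the mapping table rather than an ALTER TABLE, at the price of a hard upper bound on the number of custom fields per type.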
Instead of adding columns to your existing table, how about creating a second table that contains key-value pairs of attributes with a foreign key to the products table? This will allow for any number and variety of attributes for products, and is easily queryable by joining the tables on the primary key-foreign key.
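A minimal sketch of that second table, assuming SQLite and hypothetical names (`product_attributes`, `attr_name`, `attr_value`, `set_attribute`). The point it illustrates is that introducing a brand-new attribute is an ordinary INSERT, with no schema change involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_attributes (
    product_id INTEGER NOT NULL REFERENCES products(id) ON DELETE CASCADE,
    attr_name  TEXT NOT NULL,
    attr_value TEXT,
    PRIMARY KEY (product_id, attr_name)
);
""")


def set_attribute(conn, product_id, attr_name, attr_value):
    """Create or overwrite one attribute of one product."""
    conn.execute(
        "INSERT OR REPLACE INTO product_attributes VALUES (?, ?, ?)",
        (product_id, attr_name, str(attr_value)),
    )


conn.execute("INSERT INTO products VALUES (1, 'chair')")
set_attribute(conn, 1, "color", "red")
set_attribute(conn, 1, "height", 90)
set_attribute(conn, 1, "material", "oak")  # new attribute, no ALTER TABLE

# Join on the primary key / foreign key pair to list every attribute.
rows = conn.execute("""
    SELECT p.name, a.attr_name, a.attr_value
    FROM products AS p
    JOIN product_attributes AS a ON a.product_id = p.id
    ORDER BY p.id, a.attr_name
""").fetchall()
```

Because the attribute name is bound as a normal parameter here, it never has to be interpolated into the SQL text.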
In production, modifying database schemas is a headache. Letting users modify database schemas is the kind of headache you get when someone runs you over in a truck, then backs up and hits you again.
Information added to the database by users during normal business processing (and this includes names of new attributes) is data, not schema information, and should be treated as such.