如何设计表的列不固定的模式
我正在尝试设计一个表的列不固定的模式。例如:我有一个员工表,其中表的列不固定并且变化(员工的属性不固定并且变化)。需要频繁添加新属性/列。
Employee 表本身中的可空列,即没有规范化
不要添加可为空的列,而是将这些列在各自的表中分开,例如:如果 Address 是要添加的列,然后创建表 Address[EmployeeId, AddressValue]。
创建表 ExtensionColumnName [EmployeeId, ColumnName] 和 ExtensionColumnValue [EmployeeId, ColumnValue]。 ExtensionColumnName 将 ColumnName 作为“Address”,ExtensionColumnValue 将 ColumnValue 作为地址值。
员工表
员工 ID
名称ExtensionColumnName 表
列名称 ID
员工 ID
列名扩展列值表
员工 ID
列名称 ID
ColumnValue
前两种方法有一个缺点,因为架构会随着每个新属性的变化而变化。请注意,添加新属性是频繁的并且是一项要求。
我不确定这是好还是坏设计。如果有人做出类似的决定,请提供有关外键/数据完整性、索引、性能、报告等方面的见解。
I am trying to design a schema where the columns of a table are not fixed. Ex: I have an Employee table where the columns of the table are not fixed and vary (attributes of Employee are not fixed and vary). Frequent addition of a new attribute / column is requirement.
Nullable columns in the Employee table itself i.e. no normalization
Instead of adding nullable columns, separate those columns out in their individual tables ex: if Address is a column to be added then create table Address[EmployeeId, AddressValue].
Create tables ExtensionColumnName [EmployeeId, ColumnName] and ExtensionColumnValue [EmployeeId, ColumnValue]. ExtensionColumnName would have ColumnName as "Address" and ExtensionColumnValue would have ColumnValue as address value.
Employee table
EmployeeId
NameExtensionColumnName table
ColumnNameId
EmployeeId
ColumnNameExtensionColumnValue table
EmployeeId
ColumnNameId
ColumnValue
There is a drawback is the first two ways as the schema changes with every new attribute. Note that adding a new attribute is frequent and a requirement.
I am not sure if this is the good or bad design. If someone had a similar decision to make, please give an insight on things like foreign keys / data integrity, indexing, performance, reporting etc.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
查看当前的 NoSQL 数据库可能会很有用,这些数据库允许您为每条记录存储任意的键值对集。
我建议您查看 couchdb、mongodb、lucene 等...
如果 SQL 数据库中的架构经常更改,这最终会导致一场噩梦,尤其是在报告方面。
将所有内容放入 (rowId, key, value) 三元组中很灵活,但由于记录数量庞大,速度较慢。
ERP 供应商的做法只是为他们确定的字段制定模式,并在固定的命名列中添加大量的“弹性字段”(即 20 个数字、20 个字符串等),然后使用查找表来查看哪个 flexcolumn 对应什么。这为未来提供了一定的灵活性,同时本质上具有静态模式。
It might be useful to look at the current crop of NoSQL databases which allow you to store arbitrary sets of key-value pairs per record.
I would recommend you look at couchdb, mongodb, lucene, etc ...
If the schema changes often in an SQL database this ends up in a nightmare, especially with reporting.
Putting everything in (rowId, key, value) triads is flexible, but slower because of the huge number of records.
The way the ERP vendors do it is just make their schema of the fields they're sure of and add a largisch number of "flexfields" (i.e. 20 numbers, 20 strings, etc) in fixed named columns and use a lookup table to see which flexcolumn corresponds to what. This allows some flexibility for the future while essentially having a static schema.
我建议使用数字二和三的组合。在可能的情况下,为标准关联(例如地址)建立模型表。这是最理想的方法...
但是,对于无法汇总为这样的逻辑分组的不断变化的值,除了
EMPLOYEES
表之外,还可以使用两个表:>employee_attribute_type_code
和DESCRIPTION)在
EMPLOYEE_ATTRIBUTES
中,将主键设置为:这将阻止同一员工出现重复属性。
I recommend using a combination of numbers two and three. Where possible, model tables for standard associations like addresses. This is the most ideal approach...
But for constantly changing values that can't be summarized into logical groupings like that, use two tables in addition to the
EMPLOYEES
table:employee_attribute_type_code
and DESCRIPTION)In
EMPLOYEE_ATTRIBUTES
, set the primary key to be made of:This will stop duplicate attributes to the same employee.
如果如您所说,将频繁添加新属性,则 EAV 数据模型可能很适合你。
If, as you say, new attributes will be added frequently, an EAV data model may work well for you.
有一种模式,叫做观察模式。
有关说明,请参阅以下问题/答案:一个,两个,三个。
一般来说,看起来像这样:
例如,主题雇员、公司和动物都可以有观察姓名(特征),主题雇员和动物可以观察体重(测量值),对象啤酒瓶可以观察标签(性状)和体积< /em>(测量)。这一切都适合模型。
There is a pattern, called observation pattern.
For explanation, see these questions/answers: one, two, three.
In general, looks like this:
For example, subjects employee, company and animal can all have observation Name (trait), subjects employee and animal can have observation Weight (measurement) and subject beer bottle can have observations Label (trait) and Volume (measurement). It all fits in the model.
将 ExtensionColumn 表合并为一个表
如果您使用单调序列在所有对象表中分配主键,则单个属性表可以保存所有对象的属性。
Combine your ExtensionColumn tables into one
If you use a monotonic sequence for assigning primary keys in all your object tables then a single property table can hold properties for all objects.
我会使用 1 和 2 的组合。如果您经常添加属性,我认为您无法处理数据要求。
我猜想添加的一些属性属于另一个表。如果你不断添加java认证、asp认证等属性,那么你就需要一个认证表。这可以与列出可用认证的认证代码表相关。
像经理这样的属性可以是属性或关系表。如果员工之间有多种关系,请考虑使用具有关系类型的关系表。具有矩阵管理结构的组织将需要一个关系表。
地址和电话号码通常放在不同的表中。像employee_id、address_type 这样的地址键就比较合适。如果需要历史记录,请在键中添加 start_date 列。
如果您要保留历史记录,我建议在适当的列上使用 start_date 和 end_date 列。我尝试使用一种关系,其中当“start_date <= date-being-considered <”时记录处于活动状态。 end_date' 为真。
体重、眼睛颜色等属性。
I would use a combination of 1 and 2. If you are adding attributes frequently, I don't think you have a handle on the data requirements.
I supect some of the attributes being added belong in a another table. If you keep adding attribututes like java certified, asp certified, ..., then you need a certification table. This can be relationship to a certifications code table listing available certifications.
Attributes like manager may be either an attribute or relationship table. If you have multiple relationships between employees, then consider a relationship table with a releation type. Organizations with a matrix management structure will require a releationship table.
Addresses and phone numbers often go in separate tables. An address key like employee_id, address_type would be appropriate. If history is desired add a start_date column to the key.
If you are keeping history I recommend using start_date and end_date columns on the appropriate columns. I try to use a relationship where the record is active when 'start_date <= date-being-considered < end_date' is true.
Attributes like weight, eye color, etc.