Oracle - XMLType 或名称=值对 (EAV) 的使用
我正在致力于设计日志信息的数据模型。
日志信息可以具有可变元素,并且是非常动态的。
哪种类型的数据模型应该效果更好? 使用 XMLType 列或具有 Name=Value 对的子表?
我想避免创建多个列,因为它们本质上是动态的并且可以非常频繁地更改。
我知道 EAV 模型不太适合查询,但据我所知,Oracle 11g 提供了一个 PIVOT 函数,可以将行转置为列,那么这会对性能有何影响?
加载的数据将用于下游 ETL 系统以及技术分析师的偶尔查询
感谢
Rajesh... 我计划创建 2 个表:
具有公共属性的父表,这些属性是每个日志事件的一部分,而子表将有一个父表 id (logId) 并将有其他事务元素。
目前,除了标准元素之外,还有 200 个元素,任何日志事件最多可以有 10 个不同的元素。此外,这个数字非常不稳定,并且可能会频繁变化。每次有新元素时都不断更改表数据结构是没有意义的。
此外,数据的存储时间不会超过 7 天,并且该表不会(或偶尔)用于查询。将有一个下游 ETL 作业来提取数据并执行平面文件转换。
我不担心数据类型,因为一切都是字符串数据。 您认为这种方法有任何问题吗?
I am working on designing a data model for log information.
The Log Information can have variable elements and that is very dynamic.
What type of data model should work better?
Use of XMLType column or an child table with Name=Value pairs?
I want to avoid creating multiple columns as they columns are dynamic by nature and can change very frequently.
I know EAV model is not very good for querying, but from what I have heard Oracle 11g provides a PIVOT function which can transpose rows to columns and so how would that impact performance?
The data loaded will be used for downstream ETL systems and also occasional querying by technical analysts
Thanks
Rajesh...
I plan to create 2 tables:
Parent table with common attributes which are part of every log event while the child table will have a parent table id (logId) and will have other transactional elements.
Currently there are 200 elements apart from standard elements and any log event can have unto 10 different elements. Also this number is very volatile and can change very frequently. It does not make sense to keep changing the table data structure every time there are new elements.
Also the data will not be stored for more than 7 days and the table is not going to be used (or occasional) for querying. There will be a downstream ETL job which will pull the data and perform flat file transforms.
I am not worried about the datatype because everything is string data.
Do you see any concerns with this approach?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我倾向于使用列来表示已知的、“几乎总是存在”的内容,并使用 XML 来表示对于列来说过于个性化的任何位。只要有某种模式可供使用(可能但不一定由数据库强制执行),XML 可能会更容易被 ETL 处理。
I'd tend towards columns for known, 'pretty much always there' stuff, and XML for any bits that were too individualistic for columns. XML is probably going to be easier for ETL to handle as long as there is some sort of schema to work with (possibly, but not necessarily, enforced by the database).
EAV 模型(很快)变得过于复杂而无法查询,即使使用 Pivot 语句也是如此。子查询、嵌套查询和联接将难以编写和理解,并且性能会降低。
除了性能之外,您无法保证数据正确存储(日期在日期列中,数字在数字列中......等等)。这意味着您将所有内容存储在字符串中并将其转换回正确的数据类型,希望它第一次插入正确。阅读本文档了解各种副作用。
http://antognini.ch/papers/DatatypesWorstPractices_20080509.pdf
即使在日志记录的情况下,也有是您想要跟踪大多数日志的一些常见参数,以及一些特定于某些日志的参数。为这些特殊字段创建一个包含 NULLLable 列的表是更好的方法。您始终可以对最重要的字段建立索引,如果您只想查看/公开某些字段,则可以构建特定于特定类型日志的视图。
EAV models ( very quickly) get too complex to Query, even with Pivot statements. Sub Queries, nested Queries and Joins will be difficult to write and understand and will be less performant.
Apart from Performance, you cannot guarantee that the data is being stored correctly (date is in Date column, number in number column.. and so on). This will mean you store everything in Strings and convert it back to the correct data type hoping it was inserted correctly the first time. Read this document for various side effects.
http://antognini.ch/papers/DatatypesWorstPractices_20080509.pdf
Even in the case of logging, there are a few common parameters that you want to track for most of the logs and only a few that are specific to some logs. Creating a table with NULLLable columns for those special fields is a better approach. You can always index the most important fields and you can build views specific to a particular type of logs if you wish to see/expose only some fields.