假设:
- 我有许多由事实和外键(“维度”和“键值”类型)组成的表。例如,ENCOUNTER:
ID - 主键
维度
键值
- TYPE_ID
- STATUS_ID
- PATIENT_CLASS_ID
- DISPOSITION_ID
- ...
事实
- ADMISSION_DATE
- DISCHARGE_DATE
-
。 ..
- 我没有创建数据仓库的选项
- 我想简化报告的数据结构
我的方法是创建许多伪维度视图(“D_LOCATION”基于 DEPARTMENT 和LOCATION 表)和伪事实视图(基于 ENCOUNTER 表的“F_ENCOUNTER”)。在伪事实视图中,我会将键值表(例如 STATUS、PATIENT_CLASS)连接到事实表以包含名称字段(例如 STATUS.NAME、PATIENT_CLASS.NAME)。
问题:
- 如果查询从 F_ENCOUNTER 选择所有字段的子集(即不是所有 key-value.name 字段),Oracle 10g 优化器是否足够智能以排除某些键值表连接(即未包含在查询中)?
- 我可以做些什么来优化这个架构(除了索引之外)
- 还有其他方法吗?
** 编辑 **
目标(按重要性排序):
- 降低查询复杂度;增加查询一致性;减少报告开发时间
- 优化查询处理
- 最大限度地减少管理员负担
- 减少存储
Assumptions:
- I have a number of tables comprised of facts and foreign keys ('dimensional' and 'key-value' type). For example, ENCOUNTER:
ID - primary key
dimensions
key-value
- TYPE_ID
- STATUS_ID
- PATIENT_CLASS_ID
- DISPOSITION_ID
- ...
facts
- ADMISSION_DATE
- DISCHARGE_DATE
-
...
- I don't have the option to create a data warehouse
- I would like to simplify the data structure for reporting
My approach is to create a number of pseudo-dimensional views ('D_LOCATION' based on the DEPARTMENT and LOCATION tables) and pseudo-fact views ('F_ENCOUNTER' based on ENCOUNTER table). In the pseudo-fact view, I would JOIN the key-value tables (e.g. STATUS, PATIENT_CLASS) to the fact table to include the name fields (e.g. STATUS.NAME, PATIENT_CLASS.NAME).
Questions:
- If a query selects a subset of all of the fields from F_ENCOUNTER (i.e. not all of the key-value.name fields), is the Oracle 10g optimizer smart enough to exclude some of the key-value table joins (i.e. the ones that aren't included in the query)?
- Is there anything that I can do to optimize this architecture (other than indices)
- Is there another approach?
** edit **
Goals (in order of importance):
- reduce query complexity; increase query consistency; decrease report-development time
- optimize query-processing
- minimize administrator burden
- decrease storage
发布评论
评论(2)
一种优化建议是不要使用键值对表。维度表的概念是,每条记录都应包含有关该概念的所有信息,而无需连接到规范化表 - 即,将星型模式转变为雪花型模式。
虽然值可能会在维度表记录中重复,但它的优点是报告查询中的联接较少。以这种方式对表进行非规范化可能看起来违反直觉,但在性能至关重要的情况下,它通常是最佳解决方案。
One optimization suggestion is not to use key-value pair tables. The concept of a Dimension table is that each record should contain all information about that concept without needing to join to normalized tables - i.e. turning a star schema into a snowflake schema.
While values might be repeated across dimension table records, it has the advantage of fewer joins in your reporting queries. Denormalizing tables in this way might seem counter intuitive but where performance is paramount it is usually the best solution.
TYPE_NAME
而不是TYPE_ID
、PATIENT_CLASS_NAME
而不是PATIENT_CLASS_ID
等)。TYPE_NAME
instead ofTYPE_ID
,PATIENT_CLASS_NAME
instead ofPATIENT_CLASS_ID
, etc.).