Data warehouse with arbitrary fields
In our application, we support user-written plugins.
Those plugins generate data of various types (int, float, str, or datetime), and those data are labeled with a bunch of metadata (user, current directory, etc.) as well as three free-text fields (MetricName, Var1, Var2).
Now we have several years of this data, and I'm trying to design a schema which allows very fast access to those metrics in an analytical fashion (charts and stuff). This is easy as long as there are only a few metrics we're interested in, but we have a large number of different metrics at different granularities, and we'd like to store user-added data to allow for later analysis (possibly after a schema change).
Example data: (please keep in mind this is very simplified)
=====================================================================================================
| BaseDir         | User   | TrialNo | Project | ... | MetricValue | MetricName | Var1  | Var2      |
=====================================================================================================
| /path/to/me     | me     | 0       | domino  | ... | 20          | Errors     | core  | dumb      |
| /path/to/me     | me     | 0       | domino  | ... | 98.6        | Tempuratur | body  |           |
| /some/other/pwd | oneguy | 223     | farq    | ... | 443         | ManMonths  | waste | Mythical  |
| /some/other/pwd | oneguy | 224     | farq    | ... | 0           | Albedo     | nose  | PolarBear |
| /path/to/me     | me     | 0       | domino  | ... | 70.2        | Tempuratur | room  |           |
| /path/to/me2    | me     | 2       | domino  | ... | 2020        | Errors     | misc  | filtered  |
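To make the layout concrete, here is a minimal sketch of a staging table for these raw rows (the names and types are illustrative only, and the metadata columns hidden behind "..." are omitted):

    -- Hypothetical staging table mirroring the sample rows above.
    -- The metadata columns hidden behind "..." are not shown.
    CREATE TABLE plugin_metric_staging (
        base_dir     VARCHAR(512) NOT NULL,  -- BaseDir
        username     VARCHAR(64)  NOT NULL,  -- User
        trial_no     INTEGER      NOT NULL,  -- TrialNo
        project      VARCHAR(64)  NOT NULL,  -- Project
        metric_value VARCHAR(64)  NOT NULL,  -- raw text; int/float/str/datetime decided later
        metric_name  VARCHAR(128) NOT NULL,  -- free text
        var1         VARCHAR(128),           -- free text
        var2         VARCHAR(128)            -- free text
    );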
Anyone can add a parser plugin to start measuring an AirSpeed metric, and we'd like our analysis tools to "just work" on that new metric.
Update:
Considering that many of the MetricNames are well-known beforehand, I can satisfy my requirements if I can enable analysis on those metrics and simply store the other user-added metrics. We can accept the fact that new metrics won't be available for heavy-duty analysis without an edit to the schema.
What do you guys think of this solution?
I've divided our metrics into three fact tables, one for facts that don't need a MetricTopic, one for ones that do, and one for all the other metrics, including unexpected ones.
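Roughly, the split looks something like this (a minimal sketch with invented names, not the actual DDL; the *_key columns point at ordinary dimension tables for date, user, project, metric, and topic, which are not shown):

    -- 1. Facts that don't need a MetricTopic.
    CREATE TABLE fact_simple_metric (
        date_key     INTEGER NOT NULL,
        user_key     INTEGER NOT NULL,
        project_key  INTEGER NOT NULL,
        metric_key   INTEGER NOT NULL,
        metric_value DOUBLE PRECISION NOT NULL
    );

    -- 2. Facts that do need a MetricTopic (e.g. Tempuratur of "body" vs "room").
    CREATE TABLE fact_topic_metric (
        date_key     INTEGER NOT NULL,
        user_key     INTEGER NOT NULL,
        project_key  INTEGER NOT NULL,
        metric_key   INTEGER NOT NULL,
        topic_key    INTEGER NOT NULL,
        metric_value DOUBLE PRECISION NOT NULL
    );

    -- 3. Everything else, including metrics nobody anticipated, stored
    --    generically and not expected to support heavy-duty analysis.
    CREATE TABLE fact_other_metric (
        date_key     INTEGER NOT NULL,
        user_key     INTEGER NOT NULL,
        project_key  INTEGER NOT NULL,
        metric_name  VARCHAR(128) NOT NULL,
        var1         VARCHAR(128),
        var2         VARCHAR(128),
        metric_value VARCHAR(64) NOT NULL   -- raw text, typed on read
    );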
For the bounty:
I'll accept any critique which shows how to make this system more functional, or brings it into closer alignment with industry best practices. References to literature give added weight.
Comments (2)
If I understand correctly, you are looking for a schema to support on-the-fly creation of measures in a DW. In a classical data warehouse each measure is a column, so in a Kimball star you would need to add a column for each new measure -- change the schema.
What you have is an EAV model, and analytics on EAV is not easy and not fast -- take a look at this discussion.
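To see why, consider what even a simple comparison chart requires: each metric has to be pivoted from rows back into columns, typically with one conditional aggregate per metric. A sketch against the flat layout from the question (names follow the sample data and the staging-table sketch above):

    -- Pulling two metrics out of the flat/EAV layout for a chart.
    -- Each additional metric needs another conditional aggregate (or a
    -- self-join), which is why analytics over EAV data gets slow and awkward.
    SELECT
        project,
        trial_no,
        MAX(CASE WHEN metric_name = 'Errors'
                 THEN CAST(metric_value AS DOUBLE PRECISION) END) AS errors,
        MAX(CASE WHEN metric_name = 'Tempuratur'
                 THEN CAST(metric_value AS DOUBLE PRECISION) END) AS temperature
    FROM plugin_metric_staging
    GROUP BY project, trial_no;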
I would suggest you look at tools like Splunk, which are suited to this type of problem.
You don't have that many facts. There aren't that many units.
Facts have units. Seconds, pounds, bytes, dollars.
You need to review the "Star Schema" design. You have dimensions (probably a lot) and measurable facts (probably very few).
You join the facts to all of the associated dimensions. You can sum and count the facts, and group by the dimensions.
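For example, a typical query against a single fact table with conformed dimensions is just joins, a filter, and a group-by (a minimal sketch; the table and column names are assumed, not taken from your schema):

    -- Sum one measurable fact, sliced by attributes from two dimensions.
    SELECT
        d.calendar_year,
        p.project_name,
        SUM(f.metric_value) AS total_errors
    FROM fact_metric  f
    JOIN dim_date     d ON d.date_key    = f.date_key
    JOIN dim_project  p ON p.project_key = f.project_key
    JOIN dim_metric   m ON m.metric_key  = f.metric_key
    WHERE m.metric_name = 'Errors'
    GROUP BY d.calendar_year, p.project_name;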
You can't have thousands of independent facts. That's almost impossible. But you can have thousands of combinations of dimensions, that's common.
Separate facts (measurable quantities that add pleasantly) from dimensions (definitional qualities) and you should have a lot of dimensions around a few facts.
Buy a copy of Kimball (The Data Warehouse Toolkit).