Dynamic database design for variable-length spatio-temporal data in Oracle (schema design needed)

Published 2024-12-21 16:41:03


Currently I am working on a research project where I need to store spatio-temporal data and analyze it efficiently. The exact requirements are given below.

The research is on meteorological data, so the data attributes are temperature, humidity, pressure, wind-speed, wind-direction, etc. The number of attributes is not known to us in advance; depending on requirements we may need to add more (the table has a dynamic set of attributes with differing data types). The data is captured from various locations and various heights, over a certain time duration and at a certain time interval.

So, what is the best way to design a schema for these requirements? We must be able to find relations efficiently.

The purpose of the project is not only to store the data, but also to manipulate it.

Sample data in table format -

location | time                | height | pressure | temperature | wind-direction | ...
L1       | 2011-12-18 08:04:02 | 7      | 1009.6   | 28.3        | east           | ...
L1       | 2011-12-18 08:04:02 | 15     | 1008.6   | 27.9        | east           | ...
L1       | 2011-12-18 08:04:02 | 27     | 1007.4   | 27.4        | east           | ...
L1       | 2011-12-18 08:04:04 | 7      | 1010.2   | 28.4        | north-east     | ...
L1       | 2011-12-18 08:04:04 | 15     | 1009.4   | 28.2        | north-east     | ...
L1       | 2011-12-18 08:04:04 | 27     | 1008.9   | 27.6        | north-east     | ...
L2       | 2011-12-18 08:04:02 | ..... so on

Here I need to design a schema for the above sample data, where location is a spatial position that can be implemented using the Oracle MDSYS.SDO_GEOMETRY type.

Constraints are:

  1. The number of attributes (table columns) is unknown during development. At runtime any new attribute (say, humidity, refractive index, etc.) can be added, so we cannot design an attribute-specific table schema.
        1.1) For this constraint I thought of using a schema like:
               tbl_attributes(attr_id_pk, attr_name, attr_type);
               tbl_data(loc, time, attr_id_fk, value);

         In this design the attribute value must be of varchar type, cast as required (not a good idea at all).
         But finding relational data with this schema using SQL queries alone is very difficult. For example, I want to find:
              1.1.1) the average pressure at location L1 when the wind direction is east and the temperature is between 27 and 28;
              1.1.2) the locations where pressure is highest at height 15.
         1.2) I am also thinking of editing the table schema at runtime, which again I think is not a good idea.
  2. We will use a loader application, which will take care of the dynamic insertion depending on the schema (whatever it may be).
  3. Statistical data, such as the examples given above [1.1.*], needs to be retrieved efficiently.
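To make the pain point of 1.1 concrete, here is a minimal sketch of the proposed EAV schema and query 1.1.1 against it, using SQLite as a stand-in for Oracle. The table and column names follow the question's sketch; the sample rows are hypothetical values drawn from the example table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_attributes (attr_id_pk INTEGER PRIMARY KEY, attr_name TEXT, attr_type TEXT);
CREATE TABLE tbl_data (loc TEXT, time TEXT, attr_id_fk INTEGER, value TEXT);
INSERT INTO tbl_attributes VALUES
  (1, 'pressure', 'float'), (2, 'temperature', 'float'), (3, 'wind-direction', 'text');
INSERT INTO tbl_data VALUES
  ('L1', '2011-12-18 08:04:02', 1, '1008.6'),
  ('L1', '2011-12-18 08:04:02', 2, '27.9'),
  ('L1', '2011-12-18 08:04:02', 3, 'east'),
  ('L1', '2011-12-18 08:04:04', 1, '1009.4'),
  ('L1', '2011-12-18 08:04:04', 2, '28.2'),
  ('L1', '2011-12-18 08:04:04', 3, 'north-east');
""")

# Query 1.1.1: every attribute involved needs its own self-join plus a
# cast of the varchar value -- exactly the awkwardness described above.
avg_pressure = conn.execute("""
    SELECT AVG(CAST(p.value AS REAL))
    FROM tbl_data p
    JOIN tbl_data t ON t.loc = p.loc AND t.time = p.time AND t.attr_id_fk = 2
    JOIN tbl_data w ON w.loc = p.loc AND w.time = p.time AND w.attr_id_fk = 3
    WHERE p.attr_id_fk = 1
      AND p.loc = 'L1'
      AND w.value = 'east'
      AND CAST(t.value AS REAL) BETWEEN 27 AND 28
""").fetchone()[0]
```

Only the 08:04:02 event qualifies (the 08:04:04 wind is north-east), so the average is 1008.6, at the cost of one self-join per attribute in the predicate.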


梦里泪两行 2024-12-28 16:41:03


I am not completely sure I understand what you mean when you say that

The no of attributes (table column) is unknown during development. In
runtime any new attribute(let say - humidity, refractive index etc)
can be added.

First of all, I suppose that this is not really happening at random: i.e., when you get a new bunch of data from the field, you know (before importing) that it has an extra dimension or two. Correct?

Also, the fact that in this new data batch you get "refractive index" will not make the older data magically acquire a proper value for this dimension.

Therefore I would go for a classical Object-to-RDBMS mapping where you have:

a header table with things that exist for every measurement: i.e. time and space, possibly the source (i.e. the lab, sensor, or team that provided the data), and an autogenerated key.

one or more detail tables where the values are defined as proper fields.

Example:

Header

location | time                | height | source  |Key          | 
L1       | 2011-12-18 08:04:02 | 7      | team-1  | 002020013   | 
L1       | 2011-12-18 08:04:02 | 15     | team-1  | 002020017   | 
L1       | 2011-12-18 08:04:02 | 27     | Lab-X   | 002020018   | 
L1       | 2011-12-18 08:04:04 | 7      | Lab-Y   | 002020021   | 
L1       | 2011-12-18 08:04:04 | 15     | Lab-X   | 002020112   | 

Atmospheric data (basic)

Key       | pressure  | temp | wind-dir  | 
002020013 | 1009.6    | 28.3 | east      |
002020017 | 1019.3    | 29.2 | east      |
002020018 | 1011.6    | 26.9 | east      |

Light-sensor data

Key       | refractive-ind  | albedo  | Ultraviolet  |
002020017 |         79.6    | .37865  | 7.0E-34      |
002020018 |         67.4    | .85955  | 6.5E-34      |
002020021 |         91.6    | .98494  | 8.1E-34      |

In other words: every different set of data will use one or more subtables (which you can add "dynamically", if needed), and you can still create queries by standard means; you just have to join the subtables via the reference keys to keep them consistent. (Where possible: i.e. if you want to analyze by wind direction AND refractive index, you can, but only for the sets of data that have both values.)

I believe this is more efficient than using text fields with CSV inside, data blobs, or key-value associations.
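A minimal sketch of the header + detail-table layout described above, again with SQLite standing in for Oracle. The keys and values are the hypothetical ones from the example tables; the column `meas_key` is renamed from `Key` to avoid keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (location TEXT, time TEXT, height REAL, source TEXT,
                     meas_key TEXT PRIMARY KEY);
CREATE TABLE atmos  (meas_key TEXT REFERENCES header(meas_key),
                     pressure REAL, temp REAL, wind_dir TEXT);
CREATE TABLE light  (meas_key TEXT REFERENCES header(meas_key),
                     refractive_ind REAL, albedo REAL, ultraviolet REAL);
INSERT INTO header VALUES
  ('L1', '2011-12-18 08:04:02',  7, 'team-1', '002020013'),
  ('L1', '2011-12-18 08:04:02', 15, 'team-1', '002020017'),
  ('L1', '2011-12-18 08:04:02', 27, 'Lab-X',  '002020018');
INSERT INTO atmos VALUES
  ('002020013', 1009.6, 28.3, 'east'),
  ('002020017', 1019.3, 29.2, 'east'),
  ('002020018', 1011.6, 26.9, 'east');
INSERT INTO light VALUES
  ('002020017', 79.6, 0.37865, 7.0e-34),
  ('002020018', 67.4, 0.85955, 6.5e-34);
""")

# Cross-set analysis (wind direction AND refractive index) is a plain join;
# only measurements present in both detail tables participate.
rows = conn.execute("""
    SELECT h.height, a.wind_dir, l.refractive_ind
    FROM header h
    JOIN atmos a ON a.meas_key = h.meas_key
    JOIN light l ON l.meas_key = h.meas_key
    ORDER BY h.height
""").fetchall()
```

Only keys 002020017 and 002020018 appear in both detail tables, so the join returns two rows; every column keeps its proper type, so no casting is needed.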

自在安然 2024-12-28 16:41:03


I would definitely go with 1.2 (edit table schema during runtime), at least to begin with. Any sufficiently advanced configuration is indistinguishable from programming; don't think you can magically avoid making changes to your program.

Don't be scared of alter table. Yes, the upfront costs are higher - you may need a process (not just a program) to ensure your schema stays clean. And there are some potential locking problems (that have solutions). But if you do it right you only have to pay the price once for each change.

With a completely generic solution you will pay a small price with every query. Your queries will be complicated, slower, ugly, and more likely to fail. You can never just write a query like select avg(value) ...; it may or may not work, depending on how the data is accessed. You can use a PL/SQL function to catch exceptions, or use inline views and hints to force a specific access pattern. Either way, your queries are more complicated and slower, and you have to make sure that everybody understands these problems before they use the data.

And with a generic solution the optimizer will suck because it knows nothing about your data. Oracle can't predict how many rows will be returned by where attr_name = 'temperature' and to_number(value) = 28.4. But it can make a very good guess for where temperature = 28.4. You may have significantly more bad plans (i.e. slow queries) with generic columns.
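The select avg(value) hazard is easy to demonstrate. In this sketch (hypothetical rows, SQLite standing in for Oracle), SQLite silently coerces non-numeric text to 0 rather than raising an error the way Oracle's ORA-01722 would, but the underlying point is the same: the result of aggregating a generic value column depends on which rows the plan happens to touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_data (attr_name TEXT, value TEXT)")
conn.executemany("INSERT INTO tbl_data VALUES (?, ?)",
                 [("temperature", "28.4"), ("temperature", "27.6"),
                  ("wind-direction", "east")])

# Naive aggregate over the generic column: 'east' is coerced to 0
# and silently poisons the result.
naive = conn.execute("SELECT AVG(value) FROM tbl_data").fetchone()[0]

# Correct only when the attr_name filter is applied before the cast,
# which is precisely what the optimizer does not promise in general.
filtered = conn.execute(
    "SELECT AVG(CAST(value AS REAL)) FROM tbl_data "
    "WHERE attr_name = 'temperature'"
).fetchone()[0]
```

Here `filtered` is 28.0 while `naive` is pulled down to about 18.7 by the coerced text row; with a typed temperature column neither failure mode exists.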

俯瞰星空 2024-12-28 16:41:03


Thank you for the quick responses and good guidance. I have taken some concepts from both answers and decided to go with a mixed model. I don't know whether I am on the right path or not; I would like comments on the model. Below I describe the complete conceptual model with a MySQL code snippet.

Conceptual model

  1. For dynamicity (the number of columns is not defined in advance) I have created 4 tables as follows -
    • geolocation(locid int, name varchar, geometry spatial_type) - to store information about a particular location, possibly defined with spatial features.
    • met_loc_event(loceventid int, locid* int, record_time timestamp, height float) - to identify a particular measurement event at a place, time, and height.
    • metfeatures(featureid int, name varchar, type varchar) - to store feature (i.e. column) details with a data type; the type field will help to cast the data as required.
    • metstore(loceventid* int, featureid* int, value varchar) - to store an atomic value for a feature at a particular time.
      Up to this point the design stores the table's dynamic nature in a column-oriented way. But as you suggested, this is not a good design for querying the database (some things, like arithmetic functions, will not work). It is also not good from a performance standpoint.

  2. For efficient querying (to avoid too much joining and to avoid casting values during queries) I extend the model with some helper views; I wrote a stored procedure to generate the views from the stored data.
    • First I created a view for each feature (taking the features from the metfeatures table, so initially the number of feature views equals the number of features) with the help of the met_loc_event, metfeatures and metstore tables. These views expose locid, record_time, height, and the value cast according to the feature type.
    • Next, from these views, I created a row-oriented view named metrelview, which consists of all the relational data row-wise, like a normal table. I plan to fire queries at this view, so query performance should improve.
    • This view-generation procedure needs to be executed whenever there is any insert, update, or delete on the metfeatures table.

Below is the MySQL procedure I developed for the view generation

-- Run in the mysql client with a non-default statement delimiter, e.g. DELIMITER //
CREATE PROCEDURE `buildModel`()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE fid INTEGER;
    DECLARE fname VARCHAR(45);
    DECLARE ftype VARCHAR(45);
    DECLARE cur_features CURSOR FOR SELECT `featureid`, `name`, `type` FROM `metfeatures`;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    SET @viewAlias = 'v_';
    SET @metRelView = "metrelview";
    SET @stmtCols = "";
    SET @stmtJoin = "";

    START TRANSACTION;

    OPEN cur_features;
    read_loop: LOOP
        FETCH cur_features INTO fid, fname, ftype;
        IF done THEN
            LEAVE read_loop;
        END IF;
        IF fname IS NOT NULL THEN
            SET @featureView = CONCAT(@viewAlias, LOWER(fname));
            -- Choose a cast expression based on the feature's declared type
            IF ftype = 'float' THEN
                SET @featureCastStr = "`value`+0.0";
            ELSEIF ftype = 'int' THEN
                SET @featureCastStr = "CAST(`value` AS SIGNED)";
            ELSE
                SET @featureCastStr = "`value`";
            END IF;

            -- (Re)create the typed per-feature view
            SET @stmtDeleteView = CONCAT("DROP VIEW IF EXISTS `", @featureView, "`");
            SET @stmtCreateView = CONCAT("CREATE VIEW `", @featureView, "` AS SELECT le.`loceventid` AS loceventid, le.`locid`, le.`record_time`, le.`height`, ", @featureCastStr, " AS value FROM `met_loc_event` le JOIN `metstore` ms ON (le.`loceventid`=ms.`loceventid`) WHERE ms.`featureid`=", fid);
            PREPARE stmt FROM @stmtDeleteView;
            EXECUTE stmt;
            PREPARE stmt FROM @stmtCreateView;
            EXECUTE stmt;

            SET @stmtCols = CONCAT(@stmtCols, ", ", @featureView, ".`value` AS ", @featureView);
            SET @stmtJoin = CONCAT(@stmtJoin, " ", "LEFT JOIN ", @featureView, " ON (le.`loceventid`=", @featureView, ".`loceventid`)");
        END IF;
    END LOOP;

    -- (Re)create the wide row-oriented view over all feature views
    SET @stmtDeleteView = CONCAT("DROP VIEW IF EXISTS `", @metRelView, "`");
    SET @stmtCreateView = CONCAT("CREATE VIEW `", @metRelView, "` AS SELECT le.`loceventid`, le.`locid`, le.`record_time`, le.`height`", @stmtCols, " FROM `met_loc_event` le", @stmtJoin);

    PREPARE stmt FROM @stmtDeleteView;
    EXECUTE stmt;
    PREPARE stmt FROM @stmtCreateView;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    CLOSE cur_features;

    COMMIT;
END;

N.B. - I tried to call the procedure from a trigger on the metfeatures table, so that everything would be automated. But as MySQL does not support dynamic (prepared) statements inside functions or triggers, I can't do it automatically.

I would also like criticism before I finalize this as the accepted model. I am not a DBA, so any help on how to improve the model's performance would be very helpful to me.
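To check that the generated views behave as intended, here is a cut-down Python rendition of what buildModel produces, using SQLite in place of MySQL: one typed view per metfeatures row, then the wide metrelview joining them. Table and feature names follow the model above; the data rows are hypothetical values from the question's sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE met_loc_event (loceventid INTEGER PRIMARY KEY, locid INTEGER,
                            record_time TEXT, height REAL);
CREATE TABLE metfeatures (featureid INTEGER, name TEXT, type TEXT);
CREATE TABLE metstore (loceventid INTEGER, featureid INTEGER, value TEXT);
INSERT INTO met_loc_event VALUES (1, 1, '2011-12-18 08:04:02', 7),
                                 (2, 1, '2011-12-18 08:04:02', 15);
INSERT INTO metfeatures VALUES (1, 'pressure', 'float'), (2, 'temperature', 'float');
INSERT INTO metstore VALUES (1, 1, '1009.6'), (1, 2, '28.3'),
                            (2, 1, '1008.6'), (2, 2, '27.9');
""")

cols, joins = [], []
for fid, name, ftype in conn.execute(
        "SELECT featureid, name, type FROM metfeatures").fetchall():
    view = "v_" + name.lower()
    cast = "CAST(value AS REAL)" if ftype in ("float", "int") else "value"
    # Typed per-feature view, mirroring @stmtCreateView in the procedure
    conn.execute(f"""CREATE VIEW {view} AS
        SELECT le.loceventid, le.locid, le.record_time, le.height, {cast} AS value
        FROM met_loc_event le JOIN metstore ms ON le.loceventid = ms.loceventid
        WHERE ms.featureid = {fid}""")
    cols.append(f"{view}.value AS {name}")
    joins.append(f"LEFT JOIN {view} ON le.loceventid = {view}.loceventid")

# Wide row-oriented view, mirroring metrelview
conn.execute("CREATE VIEW metrelview AS SELECT le.loceventid, le.locid, "
             "le.record_time, le.height, " + ", ".join(cols)
             + " FROM met_loc_event le " + " ".join(joins))

# Queries like 1.1.* now read as if against a normal wide table.
row = conn.execute(
    "SELECT pressure, temperature FROM metrelview WHERE height = 15").fetchone()
```

The height-15 event comes back as (1008.6, 27.9) with numeric types, so aggregates and range predicates work without per-query casting; the cost is that every added feature adds one more LEFT JOIN to metrelview.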

好久不见√ 2024-12-28 16:41:03


This sounds like a homework assignment whose underlying subject is: use-cases for abandoning strict normal-form design principles.

The solution to this conundrum is to develop a three-stage solution. Stage 1 is runtime adaptability using the flexible AttributeType, AttributeValue approach, so that rapidly incoming data can be captured and put somewhere temporarily in a quasi-structured manner. Stage 2 involves the analysis of that runtime data to see where the model must be extended with additional columns and validation tables to accommodate any new attributes. Stage 3 is the importing of the as-yet-unimported data into the revised model, which never relaxes its strict datatyping and declarative referential integrity constraints.

As they say: Life, friends, is a trade-off.
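Stage 2 of this answer can be sketched as a small generator that scans the quasi-structured attribute metadata and emits the DDL needed to extend the strict model. The table name `measurements`, the Oracle-flavored type mapping, and the function shape are all hypothetical illustrations, not part of the answer itself.

```python
# Hypothetical mapping from the flexible AttributeType values
# to strict Oracle column types.
TYPE_MAP = {"float": "NUMBER(10,2)", "int": "NUMBER(10)", "text": "VARCHAR2(64)"}

def stage2_ddl(new_attrs, existing_cols):
    """Return ALTER TABLE statements for attributes not yet in the model."""
    stmts = []
    for name, atype in new_attrs:
        if name not in existing_cols:
            stmts.append(f"ALTER TABLE measurements ADD ({name} {TYPE_MAP[atype]})")
    return stmts

# humidity is new; pressure already exists, so only one statement is emitted.
ddl = stage2_ddl([("humidity", "float"), ("pressure", "float")],
                 existing_cols={"pressure", "temperature"})
```

Stage 3 would then run these statements once (paying the alter-table price a single time, as the second answer argues) before importing the held-back data into properly typed columns.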
