如何正确构建物联网传感器数据库模型?
我正在从事一个车间设备数据收集项目,旨在分析历史生产订单和实时数据(靠近操作员的 HMI)。
数据库实际状态:
数据从不同的设备(不同的协议)中提取出来,放入SQL Server中,结构如下:
PROCESS表:作为主表,每当一个批次(生产单位) )启动后,会创建一个 ProcessID 以及各种信息:
<表类=“s-表”> <标题> 进程ID 房间 设备ID 批次ID 程序 操作员 开始 结束 <正文> 209486 房间1 1010 985985 食谱A 吉姆 2022.04.05 13:58:02 2022.04.05 15:58:02设备系列表:为每个设备系列(搅拌机、烤箱等)创建一个表,其中每 5 秒收集一次其传感器值(湿度、温度、速度等)。以下是上面的 BatchID 示例,其中 ProcessID = 设备系列表上的混合 ID - dbo.Mixer:
<表类=“s-表”> <标题> 混合ID 设备ID 湿度 温度 速度 日期时间 <正文> 209486 1010 2.5 70 250 2022.04.05 13:58:02 209486 1010 2.6 73 215 2022.04.05 13:58:07 .... ... .... .... .... ....
因此,数据库由一个主 PROCESS 表和几个设备系列表构成,这些表分别是在项目开发期间创建(dbo.mixer、dbo.oven 等)。 有以下数据流:SQLServer(源)- RDS 服务器- Power BI。
实际状况及存在的问题随着
项目的发展,出现了 2 个问题:
手动工作:在源中插入新表和列(在现有表中)导致需要在 RDS 服务器和 Power 中进行手动更改商业智能。每次开发新的设备通信并且是新的设备系列时,都会创建一个新表,或者如果我们需要在现有表中引入新传感器,因为传感器是表的标题。
实时数据实际的架构使得实时仪表板的实现变得困难。
带着这两大问题我们目前分析新的系统架构应该是: SQLServer(源)- DataLake - Snowflake(数据仓库)- Power BI(或任何应用程序)。 但是,这并不能解决 1) 中定义的手动工作。对于这个问题,我们希望将源重组为仅 2 个表:进程(相等)和传感器表(新)。这个新表将是一个狭窄的大表,其中包含所有不同设备传感器(超过 60 个设备)的数十亿时间戳,结构如下: 。 dbo.Sensors:
ProcessId | EquipmentID | SensorID | SensorValue | DateTime |
---|---|---|---|---|
209486 | 1010 | 1 | 2.5 | 2022.04.05 13:58:02 |
209486 | 1010 | 2 | 70 | 2022.04.05 13:58:02 |
209486 | 1010 | 3 | 250 | 2022.04.05 13:58:02 |
具有相应的传感器尺寸表(可以在 DataWarehouse 中创建):
SensorID | EquipmentID | SensorName | SensorUnit |
---|---|---|---|
1 | 1010 | 湿度 | % |
2 | 1010 | 温度 | ℃ |
3 | 1010 | 速度 | rpm |
那么,这是重组源和数据的更好方法吗?创建这个巨大的高桌子而不是继续当前的结构?至少它会解决新表或新列输入的问题。 另一方面,由于不断插入越来越多的设备和传感器,这张桌子的尺寸将变得巨大。
希望有人能为我们指出正确的方向。
I'm working on a shop floor Equipment Data Collection project which aims to analyze production orders historically and with real-time data (HMI close to the operator).
Actual database status:
Data is extracted from different equipment (with different protocols) and placed in an SQL server with the following structure:
PROCESS table: As the main table, whenever a batch (production unit) is started, a ProcessID is created as well as varied information:
ProcessID Room EquipmentID BatchID Program Operator Start End 209486 Room1 1010 985985 RecipeA Jim 2022.04.05 13:58:02 2022.04.05 15:58:02 Equipment family table: For each equipment family (mixers, ovens etc.) a table is created in which its sensor values (humidity, temperature, speed etc.) are collected every 5 seconds. Here is an example the BatchID above, where ProcessID = Mix ID on the equipment family table - dbo.Mixer :
MixID EquipmentID Humidity Temperature Speed DateTime 209486 1010 2.5 70 250 2022.04.05 13:58:02 209486 1010 2.6 73 215 2022.04.05 13:58:07 .... .... .... .... .... ....
So, the database is structured with a main PROCESS table and several equipment family tables that are being created during the project development (dbo.mixer, dbo.oven etc).
have the following data flow: SQLServer(source) - RDS Server - Power BI.
Problems of actual status & doubts
With the development of the project, 2 problems arise:
MANUAL WORK: Insertion, in the source, of new tables and columns (in existing tables) causes the need of manual alteration in the RDS server and in Power BI. Every time a new equipment communication is developed and is a new equipment family, a new table is created or if we need to introduce a new sensor in an existing table since the sensors are headers of the table.
Real-time data The actual architecture makes it difficult to implement real time dashboarding.
With these two big problems we are currently analyzing that the new system architecture should be:
SQLServer(source) - DataLake - Snowflake(DataWarehouse) - Power BI (or any application).
However, this won't solve the manual work defined in 1). For this problem we are looking to restructure the source to just 2 tables: Process (equal) and Sensors table(new). This new table would be a narrow big big big table with billions of timestamps of all the different equipment sensors (over 60 equipment), structured as follows:
. dbo.Sensors:
ProcessId | EquipmentID | SensorID | SensorValue | DateTime |
---|---|---|---|---|
209486 | 1010 | 1 | 2.5 | 2022.04.05 13:58:02 |
209486 | 1010 | 2 | 70 | 2022.04.05 13:58:02 |
209486 | 1010 | 3 | 250 | 2022.04.05 13:58:02 |
with a corresponding Sensor Dimension Table (could be created at DataWarehouse) :
SensorID | EquipmentID | SensorName | SensorUnit |
---|---|---|---|
1 | 1010 | Humidity | % |
2 | 1010 | Temperature | ºC |
3 | 1010 | Speed | rpm |
So, would it be a better way to restructure source and create this giant tall table rather than continuing current structure? At least it will solve the problem of new table or new columns input.
On the other hand, the size of this table will be enormous given that more and more equipment and more sensors are continually being inserted.
Hoping someone might point us in the right direction.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论