数据库:何时拆分为单独的表?
假设我有两种不同类型的传感器:一种监视模拟电压(例如温度传感器),另一种测量某些东西是否打开或关闭(开关传感器)。
我无法决定是否拥有一张表:
[Sensor]
Id : PK
UpperLimit : FLOAT
UpperLimitAlertDelay : INT
LowerLimit : FLOAT
LowerLimitAlertDelay : INT
IsAnalog : BOOL
[SensorReading]
Id : PK
SensorId : FK
AnalogValue : FLOAT
IsOn : BOOL
或将其全部分成单独的表:
[AnalogSensor]
Id : PK
UpperLimit : FLOAT
UpperLimitAlertDelay : INT
LowerLimit : FLOAT
LowerLimitAlertDelay : INT
[AnalogSensorReadings]
Id : PK
AnalogSensorId : FK
Value : FLOAT
[SwitchSensor]
Id : PK
OnTooLongAlertDelay : INT
[SwitchSensorReadings]
Id : PK
SwitchSensorId : FK
IsOn : BOOL
目前我将其作为一张表,并且在不将其用作模拟传感器时使用“UpperLimitAlertDelay”作为“OnTooLongAlertDelay”。
在代码中,我通过传感器表上的布尔标志进行区分,并创建适当的对象(即 AnalogSensor 或 SwitchSensor),但我想知道在数据库级别将其分离出来是否会更整洁/更合适。
您会根据什么经验法则做出此类决定?它们在一个层面上是不同的实体,但在另一个层面上你可以说它们都只是传感器。
这常常是我在创建数据库时永远无法决定采取什么方向的地方。也许每当我使用 bool 来区分哪些字段意味着/应该使用时,它真的应该是一个单独的表吗?
关于这个主题或这个具体问题的一般想法值得赞赏。
谢谢!
编辑:一些进一步的信息。
开关传感器监控诸如门是否打开、冰箱压缩机是否运行、电器是否打开等信息。
可以在任何传感器上生成图表和报告,以便它们的使用方式相同;只是数据将根据类型打开/关闭或模拟值。
所以基本上他们的待遇都是一样的。
在读数表中,一个传感器的一个读数始终为一行。
到目前为止,这些观点似乎都很主观——我想这两种方式都各有利弊。
上述信息是否改变了任何人的观点?
谢谢! 标记。
Say if I have two different types of sensors: one monitors analog voltage (such as on a temperature sensor) and one measures whether something is on or off (switch sensor).
I can't decide whether to have one table:
[Sensor]
Id : PK
UpperLimit : FLOAT
UpperLimitAlertDelay : INT
LowerLimit : FLOAT
LowerLimitAlertDelay : INT
IsAnalog : BOOL
[SensorReading]
Id : PK
SensorId : FK
AnalogValue : FLOAT
IsOn : BOOL
OR separate it all out into separate tables:
[AnalogSensor]
Id : PK
UpperLimit : FLOAT
UpperLimitAlertDelay : INT
LowerLimit : FLOAT
LowerLimitAlertDelay : INT
[AnalogSensorReadings]
Id : PK
AnalogSensorId : FK
Value : FLOAT
[SwitchSensor]
Id : PK
OnTooLongAlertDelay : INT
[SwitchSensorReadings]
Id : PK
SwitchSensorId : FK
IsOn : BOOL
At the moment I have it as one table and I use the "UpperLimitAlertDelay" as the "OnTooLongAlertDelay" when not using it as the analog sensor.
In the code I differentiate by the boolean flag on the Sensor table and create the appropriate object (i.e. AnalogSensor or SwitchSensor) but I'm wondering if it'd be neater / more proper at the database level to separate it out.
What rule of thumb would you use for this kind of decision? They are different entities on one level, but on another level you could say they are both just sensors.
This is often where I can never decide what direction to take when creating a database. Maybe whenever I use a bool to differentiate what fields mean / should be used, it should really be a separate table?
General thoughts on this topic or this sepcific problem appreciated.
Thanks!
EDIT: Some further information.
The switch sensors monitor things like whether a door is open, a fridge compressor is running, whether an appliance is turned on, etc.
Graphs and reports can be generated on any sensor so they are used in the same way; it's just the data will either be on/off or an analog value depending on the type.
So basically they are generally treated the same.
In the readings table, it is always one row for ONE reading off of ONE sensor.
So far the opinions seem to be quiet subjective - I guess there are just pros and cons to both ways.
Does the information above change anybody's opinion?
Thanks!
Mark.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
这是与您的其他问题?
在这种情况下,答案已在 中提供数据模型。
如果不是同一个应用程序/数据库,或者这个问题没有得到充分回答,请发帖或评论。例如。根据之前的信息,我对其进行了建模,以便
SensorType
表区分Sensor
(模拟值或布尔值)...但我们可以:在传感器级别区分它,
或将
Reading
分为子类型:ReadingAnalog
和ReadingSwitch
。这可以使生成图形等的程序变得更容易。Is this the same application/database as your other question ?
In which case, the answer has been supplied in that Data Model.
If it isn't the same app/db, or if this question has not been adequately answered, pls post or comment. Eg. Based on previous info, I modelled it so that the
SensorType
table differentiatesSensor
(analogue or boolean) ... but we could:differentiate it at the Sensor level,
or make
Reading
into subtypes:ReadingAnalog
andReadingSwitch
. That could make it a little easier for the programs that produce graphs, etc.表通常被分成逻辑上不同的“事物”,因此您不会两次拥有相同的“事物”。例如,您不希望:
因为您将传感器和它的读数混合到一行中。传感器与其读数是不同的东西:
我不喜欢的一件事是将“传感器”分成两个表。传感器就是传感器,就是这样。就像顾客就是顾客,歌曲就是歌曲一样。您将有一张歌曲表,而不是一张古典歌曲表,还有一张表存放其余所有歌曲。如果您确实将传感器分成两个表,则可以想象有两个具有相同
ID
的传感器。传感器是唯一的实体,它们应该位于同一个表中,并且都具有唯一的 ID。传感器是模拟的还是数字的这一事实是传感器的属性。您的问题很独特 - 您的传感器可以具有不同逻辑格式的读数;有些是模拟浮点值,其他是数字布尔值。当并非所有传感器读数都适合相同的逻辑列数据类型(即 float 与 bool)时,您正在努力解决如何存储传感器的“读数”。归根结底是实用性,以及什么对系统最有利。
您可以将所有读数存储在浮点数列中:
但现在您必须知道将
0
、1
的浮点值解释为逻辑,
关闭
。这很难做到吗?我个人不这么认为。确实,它没有充分利用数据库引擎中可用的数据类型,但我并不关心。您将把SensorReadings
与Sensors
连接起来,这样您就可以使用IsAnalog
列来帮助您解释。换句话说:将为您提供一个非常容易解析的结果集:
您甚至可以创建一个辅助视图(或只是一个查询),将读数解码为
AnalogReading
和DigitalReading
:这会给你:
这取决于谁必须处理结果。我可以很容易地想象代码首先检查“IsAnalog”列,然后根据需要读出
AnalogReading
或DigitalReading
。你可以按照你最初的建议去做;将它们分成多个表。但现在问题变成了:如何访问数据?在我看来,如果我有这个传感器读数系统,在某些时候我将不得不对它们做一些事情 - 将它们展示给用户。现在我必须跳过障碍才能重新加入数据:
给你
除了现在“Id”也很难解码,因为两个不同的读数可以具有相同的“ID”。读物就是读物,应该是独一无二的。
您可能正在寻找的折衷方案正是您最初拥有的。
是的,这会给您带来很多
(null)
值 - 但是将表重新连接在一起的费用是一个实际问题,必须在您的设计决策中考虑。我认为它就像 Windows 中的注册表。
键
包含一个值
。您并不真正关心该值如何存储,只要您可以读取它即可,因为类型在逻辑上是这样的。为了在数据库中实现这一点,我将使用多个数据类型列,并根据需要读取它们。Tables are usually split up to into logically distinct "things", so you don't have the same "things" twice. For example, you wouldn't want:
Because you're mixing the sensor and it's readings into one row. A sensor is a different thing than its readings:
One thing i wouldn't so is split up "sensors" into two tables. A sensor is a sensor, that is the thing that it is. Like a customer is a customer, or a song is a song. You would have a table of songs, not a table of classical songs and another table for all the rest. If you do split up sensors into two tables, you can conceivable have two sensors with the same
ID
. Sensors are unique entities, they should be in the same table, and all have a unique ID.The fact that a sensor is analog or digital is the property of a sensor.Your question is unique - your sensors can have Readings in different logical formats; some are analog floating point values, other are digital boolean values. You're struggling with how to store the "readings" of a sensor when not all sensor readings fit in the same logical column data type (i.e. float vs bool). It comes down to practicality, and what is best for the system.
You could store all the readings in a floating point number column:
But now you have to know to interpret a floating point value of
0
,1
as a logicalon
,off
. Is that difficult to do? i don't personally think so. True, it's not making full use of the data types available in the database engine, but i don't really care. You're going to joinSensorReadings
withSensors
, so you'll have theIsAnalog
column available to help you interpret. In other words:Will give you a pretty easy to parse results set:
You could even create a helper view (or just a query), that decodes the reading as
AnalogReading
andDigitalReading
:This would give you:
It depends who has to deal with the results. i can easily imagine code first checking the "IsAnalog" column, then reading out either the
AnalogReading
orDigitalReading
as appropriate.You could do what you originally suggested; split them up into multiple tables. But now the problem becomes: How do you access the data? It seems to me, if i were having this system of sensor readings, at some point i'm going to have to do something with them - show them to a user. Now i'd have to jump through hoops to rejoin the data:
giving you
Except now the "Id" is also hard to decode, because two different readings can have the same "ID". A reading is a reading, and should be unique.
A compromise you are probably looking for, is what you originally had.
Yes this leaves you with a lot of
(null)
values - but the expense of joining tables back together is a practical problem that has to factor into your design decision.i think of it like the registry in Windows. A
key
contains avalue
. You don't really care how that value is stored, as long as you can read it, as the type is logically is. To accomplish that in a database i would use multiple data-type columns, and read them as appropriate.问题是:从你的系统的角度来看,它们是同一件事吗?如果是这样,它们属于一张表。如果不是,则它们属于两个表。
通常这是理所当然的。 “雇员”和“保险计划”是两个不同的东西。 “名为 Bob 的员工”和“名为 Sally 的员工”是同一事物的两个实例。
有时更棘手。 “卡车”和“船”是两个不同的东西,还是它们都只是“车辆”的子类型?这取决于您的系统的观点。如果你出售它们,它们可能是同一件事。你可能不关心一个浮动而另一个不浮动,你只关心它们的成本是多少以及你有多少库存等等。也就是说,您保留有关它们的相同数据,并在相同的查询中使用它们。但是,如果您的系统管理一个捕鱼船队,对于船来说,您关心的是船员是谁、他们的工资是多少以及他们今天捕到了多少鱼,而对于卡车来说,您关心的是诸如它何时会出现在码头上之类的事情。码头接今天的渔获和你必须向货运公司支付每磅多少钱,它们可能是两件截然不同的事情。
确定它们是相同的迹象是:
如果这些不是真的,那么它们可能不是同样的事情。
也就是说,如果你发现对于类型 1,字段 A 将有值,字段 B 始终为空,而对于类型 2,字段 A 将为空,字段 B 将有值,那么他们就无法通过数据测试。
如果您发现将它们放在一张表中,则通常必须对所有查询添加类型检查,以便只能得到正确的查询,然后它们将无法通过数据测试。如果您得出结论,如果将它们放在两个单独的表中,则必须不断编写对两个表进行联接或并集的查询来获取这两个表,那么它们就通过了数据测试。
在您的示例中,您没有告诉我们您打算如何处理数据,因此我无法讨论查询测试。显然,两种类型的传感器有不同的数据——数字与开/关——所以马上就对两种不同的东西进行投票。但随后我们会回到这对您的系统有何实际影响。如果对于温度探头,您将生成温度随时间变化的图表或监视它们是否在特定范围内,而对于开/关开关,您将在它们打开时触发一个过程,并在它们关闭时停止它,它们可能是两个不同的事物。如果在这两种情况下,您都会在任何给定时间生成值报告(无论是数字还是开/关),那么它们可能是同一件事。
我倾向于认为它们可能有所不同,但在不了解更多信息的情况下,我真的不能说。
The question is: From the point of view of your system, are they the same thing? If so, they belong in one table. If not, they belong in two tables.
Usually this is a no-brainer. "Employee" and "Insurance Plan" are two different things. "Employee named Bob" and "Employee named Sally" are two instances of the same thing.
Sometimes it's trickier. Are "Truck" and "Boat" two different things, or are they both just subtypes of "Vehicle"? It depends on the point of view of your system. If you're selling them, they're probably the same thing. You probably don't care that one floats and the other doesn't, you just care how much they cost and how many you have in stock and the like. That is, you keep the same data about them and you use them in the same queries. But if your system manages a fishing fleet and for Boat you care about things like who the crew members are and how much they are paid and how many fish they caught today, and for Truck you care about things like when it will show up at the dock to pick up today's catch and how much you have to pay the trucking company per pound, they are likely two very different things.
Sure signs that they ARE the same thing are:
If these are not true, they are probably not the same thing.
That is, if you find that for type 1, field A will have a value and field B will always be null, while for type 2, field A will be null and field B will have a value, then they fail the data test.
If you find that if you put them in one table, you will routinely have to add a type check to all queries so you only get the right one, then they fail the data test. If you conclude that if you put them in two separate tables you would have to constantly write queries that do a join or a union on the two tables to pick up both, then they pass the data test.
In your example, you haven't told us what you intend to do with the data, so I can't discuss the query test. Apparently you have different data for the two types of sensors -- a number versus an on/off -- so right away that's a vote for two different things. But then we get back to how this actually matters to your system. If for temperature probes you will produce graphs of temperature over time or monitor whether they are within certain ranges, and for on/off switches you will trigger a process when they go on and stop it when they go off, they would probably be two different things. If in both cases you will produce reports of the value -- whether a number or an on/off -- at any given time, then they could be the same thing.
I'm inclined to think they're probably different, but without knowing more, I can't really say.
通常,您希望数据库设计中的冗余尽可能少。去查找普通形式,任何低于 BCNF 的东西通常都很难维护。有些应用程序使用更多的冗余来获得更高的读取性能,但却牺牲了清晰度和写入性能,例如数据仓库。
连接可能会很慢,但是当相同的信息存储两次时,它们比不一致的数据要好。
因此我建议使用较低的那个。假设您的传感器不再与完美的时间戳链接:突然,第一个布局建议严重失败。
Generally you want as little redundancy in database design as possible. Go look up Normal Forms, anything below BCNF is usually hard to maintain. Some applications use more redundancy to achieve more performance for reads, but sacrifice clarity and write performance for it, such as data warehouses.
Joins might be slow, but they are better than inconsistent data when the same information is stored twice.
I would therefore advise to use the lower one. Assume your sensors are not linked anymore with perfect time-stamps: Suddenly, the first layout suggestion fails badly.
我建议按照规范化规则进行操作。
根据您的需要,您可以选择 no-sql 数据库。
I suggest to act according to the normalisation rules.
Depending on your needs, you can choose a no-sql database.
这是执行对象关系映射时需要做出的相当标准的设计决策。
您提供的第一个选项称为“每个层次结构表”,第二个选项是“每个具体类表”。还有其他变体,例如将抽象类映射到它们自己的表。一些 OR 框架(例如 hibernate)提供现成的代码来实现其中一些模式。
在某些搜索中使用其中一些关键字应该可以为您提供更多信息。有很多权衡需要考虑。我想首先要考虑的是您可能拥有多少种不同类型的传感器。
另一件需要考虑的事情是报告。如果您要编写大量查询所有类型传感器的报告,那么每个类一个表将需要联合,这可能是不需要的。
This is a fairly standard design decision that needs to be made when peforming object-relational mapping.
The first option you present is known as table-per-hierarchy, and the second is table-per-concrete-class. There are other variations, such as mapping abstract classes to their own tables. Some O-R frameworks (e.g. hibernate) provide ready made code to implement some of these patterns.
Using some of these keywords in some searches should give you further information. There are lots of trade-offs to consider. I guess one of the first to think about is how many different types of sensor are you likely to have.
Another thing to consider is reporting. If you are going to writing a lot of reports that query all types of sensors, then a table-per-class would require unions, which may not be desired.
我怀疑这将取决于与未显示的其他实体的关系。如果有很多实体与一种类型的传感器相关,但与另一种类型的传感器无关,那么将它们分开可能是有意义的 - 否则,我会倾向于使用更简单的设计(即两个表方法,而不是比四表方法)。
我建议进行一些更改:
I suspect it will depend on relationships to other entities that are not shown. If there are a lot of entities that are related to one type of sensor but not the other, then it might make sense to split them out - otherwise, I would be inclined to use the simpler design (ie. the two table approach, rather than the four table approach).
A couple of changes I would suggest: