Representing units of measurement in a standardized way
Suppose you want to write into a database that something is 30 meters long, or 50 feet, or that the temperature was 50 kelvin, or the speed was 50 kilometers per hour. How would you represent the units?
To clarify, two points:
- I mean any kind of unit, not a predefined, well-defined subset of them.
- My question is really about whether an ontology of units exists. I took the database example because it was the first that crossed my mind, but scenarios like representing the unit in XML or JSON are equally likely.
Comments (4)
One of the fundamental concepts of relational database design is that all values in a given column should represent some logically compatible type of data. Formally, a column has exactly one type, and any two values of that type can be compared to each other in an equality predicate. This is a crucial part of type theory.
So if the measurements are not comparable, i.e. length vs. temperature, you shouldn't store them in the same column.
You might want to look at ISO 2955, "Information processing - Representation of SI and other units in systems with limited character sets."
Also see "Joe Celko's SQL Programming Style," chapter 4, Scales and Measurements.
Relational theory has it that each relvar ("table") has an associated predicate that defines the meaning of the tuples therein. That predicate ought to be part of the formal documentation of the database, such that no one who actually consults the documentation can have any excuse for "having misunderstood something" (unless the documentation is incomplete of course).
Including the definition of units in that predicate (e.g. "The length of person ... is FEET.", "The measured temperature was ... KELVIN", ...) achieves that completeness and avoids having to resort to those rather ugly attribute ("column") names.
I don't understand why "just storing the numbers" (in a standard unit that is agreed upon by all users) would be "not easy".
If foobaricity exists as a unit, and someone comes up with a new unit fluffyperception, then that someone will first have to formally establish the correspondence between quantities of foobaricity and quantities of fluffyperception anyway, or nothing they state can be understood by anyone.
EDIT
I saw this added:
"I need to preserve the information about the original unit."
Nothing stops you from doing that. Add two extra columns (original quantity and original unit name) alongside the "canonicalized" value. You can constrain "original unit name" as strictly or as loosely as you want.
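To make the three-column idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `measurement`, the column names, and the choice of a CHECK constraint limited to 'm' and 'ft' are all assumptions made for illustration, not something prescribed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one canonical column plus the original quantity and unit.
conn.execute("""
    CREATE TABLE measurement (
        id                INTEGER PRIMARY KEY,
        length_m          REAL NOT NULL,   -- canonical value, always meters
        original_quantity REAL NOT NULL,   -- the number as it was entered
        original_unit     TEXT NOT NULL
            CHECK (original_unit IN ('m', 'ft'))  -- a "strict" constraint
    )
""")

insert = ("INSERT INTO measurement (length_m, original_quantity, original_unit) "
          "VALUES (?, ?, ?)")
conn.execute(insert, (50 * 0.3048, 50, "ft"))  # 1 ft = 0.3048 m exactly
conn.execute(insert, (30.0, 30, "m"))

# Aggregates only ever touch the canonical column.
total_m = conn.execute("SELECT SUM(length_m) FROM measurement").fetchone()[0]

# A unit spelling outside the allowed set is refused by the CHECK constraint.
try:
    conn.execute(insert, (1.0, 1, "feet"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Dropping the CHECK constraint gives the "lax" variant, at the cost of letting spellings like "feet" and "Feet" coexist.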
Do you have a specific reason to store quantities in different types of units, instead of converting into some "canonical" units (e.g., the metric system)? When inserting data, you'd convert the input quantity into the canonical unit. And when reading data, you'd convert into whatever output unit you need.
This approach is simpler in many ways than storing data in different units, but you lose the information about the original unit in which the data was specified.
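As a rough sketch of that convert-on-write, convert-on-read flow, assuming meters as the canonical length unit: the names `TO_METERS`, `to_canonical`, and `from_canonical` are hypothetical, and the table only covers units that differ by a pure scale factor (something like Celsius would also need an offset).

```python
# Hypothetical conversion table: multiply by the factor to get meters.
TO_METERS = {
    "m": 1.0,
    "km": 1000.0,
    "ft": 0.3048,  # exact by definition of the international foot
}

def to_canonical(value: float, unit: str) -> float:
    """Convert a length given in `unit` to meters before storing it."""
    return value * TO_METERS[unit]

def from_canonical(meters: float, unit: str) -> float:
    """Convert a stored length in meters to the requested output unit."""
    return meters / TO_METERS[unit]

stored = to_canonical(50, "ft")       # value that would be written
shown = from_canonical(stored, "km")  # value presented to a reader
```

Note that `stored` carries no trace of the fact that the input was in feet, which is exactly the information loss described above.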
I would include the units in the column name (e.g. LengthInMeters, WeightInKilograms, AnnoyingnessInFishSlapsPerSecond etc.), and then just store the numbers in the column.
Ideally, it would be nice to be able to define the unit as a (proper) property of the column, but I don't know of any database that allows this. With the unit included in the column name, it's difficult for future developers to become confused about this.
I've run into DB solutions that include the unit in a second column, but since there's no standardized way of representing units, this ends up being either a text field with values like "ft.", "feet", "Feet", etc., or else an FK to a table that stores possible units (also text). Either way, running SUM or AVG queries (or any calculation) becomes a nightmare, especially if you allow values with different units to be stored in the same column.
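A small sqlite3 sketch of why aggregates get awkward with a unit column. The tables `unit` and `length_reading` are hypothetical, and the `to_meters` factor column is an extra assumption of mine; a unit table that stores only text names would not even have that to fall back on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup table; to_meters is an assumed extra column.
    CREATE TABLE unit (
        name      TEXT PRIMARY KEY,
        to_meters REAL NOT NULL
    );
    CREATE TABLE length_reading (
        quantity REAL NOT NULL,
        unit     TEXT NOT NULL REFERENCES unit(name)
    );
    INSERT INTO unit VALUES ('m', 1.0), ('ft', 0.3048);
    INSERT INTO length_reading VALUES (30, 'm'), (50, 'ft');
""")

# Naive aggregate: adds 30 m and 50 ft as if they were the same thing.
naive = conn.execute(
    "SELECT SUM(quantity) FROM length_reading"
).fetchone()[0]

# Every calculation has to join and convert row by row instead.
in_meters = conn.execute("""
    SELECT SUM(r.quantity * u.to_meters)
    FROM length_reading AS r
    JOIN unit AS u ON u.name = r.unit
""").fetchone()[0]
```

Here `naive` comes out as a meaningless 80.0, while the joined query gives roughly 45.24 meters. With a column like LengthInMeters the plain SUM is already correct.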