表示关系数据库中的数字范围 (MySQL)
我试图了解是否有任何标准的最佳实践方法可以在关系数据库(在本例中为 MySQL)中对数字范围进行建模,以及这实际上是否是明智的做法。
我将解释引发该问题的任务的背景。
我目前正在设计一个数据库,该数据库将模拟向客户分配标识符池的情况。
潜在标识符池的范围从 0 到大约 2^30
可以为给定客户分配任意数量的标识符,从单个标识符到多个连续块中的数百万个标识符。
给定的标识符只能分配给单个客户(即,它是一对多关系)
显然,将有一个客户表和一个包含客户键的标识符表。
复杂性在于如何对标识符进行建模:
选项一是用一行代表单个标识符。这将导致表中可能出现大量行,但会使得搜索谁拥有哪个标识符以及给定标识符是否正在使用变得微不足道。
第二个(我认为更有希望)选择是让一行代表一系列具有最小值和最大值的值。这将使查询变得更加复杂(我假设用于检查标识符是否正在使用的查询将查询“最小值低于 X”和“最大值高于 X”的范围),但会导致远行数更少,可能更容易管理和更新。
我欢迎任何关于这是否是一个好方法的意见,如果不是,是否有我缺少的明显更好的方法。
I'm trying to understand if there are any standard best practice approaches for modelling number ranges in a Relational Database (In this case MySQL) and if this is in fact a sensible thing to do.
I shall explain the task that prompted the question for context.
I'm currently in the process of designing a database which will model the allocation of a pool of Identifiers to Customers.
The pool of potential Identifiers has a range from 0 to about 2^30
A given customer could be allocated any number of Identifiers from a single Identifier to millions in multiple contiguous blocks.
A given Identifier may only be allocated to a single customer (i.e. it is a one to many relationship)
Clearly there will be a Customer table and and an Identifier table containing the Customer key.
The complexity comes with how to model the Identifiers:
Option one would be to have a row represent single identifier. This will result in a potentially huge number of rows in the table, but would make searching for who owns which identifier and if a given identifier is in use trivial.
The second (and I think more promising) option would be to have a row represent a range of values with a minimum and maximum value. This would make queries a bit more complex (I'm assuming the query for checking if an identifier was in use would be to query for ranges with "Minimum lower than X" and a "Maximum higher than X") but would result in far fewer rows and would likely be easier to manage and update.
I would welcome any views on if this is a good approach and if not if there is an obvious better approach that I am missing.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
如果范围不相交,则您可以将它们存储为
INT
值对:要查询号码所属的客户,请使用以下命令:
If the ranges do not intersect, then you may store them as pairs of
INT
values:To query the customer a number belongs to, use this:
如果我理解正确,您需要使用多个范围,这可能会变得棘手。您可能想查看 PostgreSQL 9.2 范围类型。它们看起来与您想要做的事情相关。
在现实世界中,范围可以重叠、相互包含或不重叠,并且它们可以是开放的或封闭的,使得范围检查查询可能复杂且容易出错。范围类型消除了大部分复杂性,并且它们由索引本身支持。
https://wiki.postgresql.org/images/7 /73/Range-types-pgopen-2012.pdf
最良好的祝愿,
尼克
If I understand you correctly you're needing to work with multiple ranges, which could get tricky. You might want to look at PostgreSQL 9.2 range types. They look relevant to what you're trying to do.
In the real world ranges can overlap, contain each other or not overlap, and they can be open or closed, making range-checking queries potentially complex and error-prone. Range types remove most of this complexity and they're supported natively by indexing.
https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf
Best wishes,
Nick
通常,我不会仅仅为了减少行数而尝试减少行数 - 原则上,只要您的查询命中,具有 10 亿行的索引良好的表应该与具有 100 行的表一样快索引。
我会更多地处理您可能想要运行的实际查询,并在此基础上设计解决方案。例如,您想要列出属于单个客户的所有 ID 吗?您想检查哪个客户拥有多个 ID 吗?您想知道客户拥有多少个 ID 吗?
如果您有“范围”表,则后者有点棘手 - 您必须计算每个范围中的 IP 数量,而不是执行
“从 customer = 1 的范围中选择 count(*)”
为客户,并将它们相加。不是火箭科学,但在现实世界中可能会慢一些......Normally, I wouldn't try to reduce the number of rows just for the sake of it - in principle, a well-indexed table with a billion rows should be just as quick as a table with 100 rows, as long as your queries hit the index.
I'd work some more on the actual queries you are likely to want to run, and design the solution on that basis. For instance, would you want to list all the IDs that belong to a single customer? Would you want to check which customer owns several IDs? Would you want to find how many IDs a customer owns?
The latter is a little tricky if you have "range" tables - instead of doing
"select count(*) from ranges where customer = 1"
, you'd have to calculate the number IPs in each range for the customer, and add them up. Not rocket science, but might be slower in the real world...如果您创建一个像这样的表
table ids
来简单地检查可用密钥
现在您可以通过执行To check a free range do
If you make a table like so
table ids
Now you can simply check for a free key by doing
To check a free range do
一种可能性是使用正则表达式来表示标识符池,根据需要在字符串和数字之间进行转换。这里的问题是找到给定标识符列表的正则表达式。这可以使用 Aho-Corasick 算法自动化。仅当这些 ID 池看起来大部分相同时,这才实用。显然,如果它们是随机分配的,那么将很难找到比一长串 ORd 文字更好的正则表达式。
One possibility is to use a regular expression to represent the pool of identifiers, casting between strings and numbers as needed. The problem here is to find a regular expression for a given list of identifiers. This might be automated using the Aho–Corasick algorithm. This is only practical if these pools of IDs mostly look the same. Obviously if they are randomly assigned, then its going to be hard to find a regular expression much better than a long list of ORd literals.