我们如何找到 MySQL 中顺序编号的间隙?
我们有一个数据库,其中有一个表,其值是从另一个系统导入的。有一个自增列,并且没有任何重复值,但有缺失值。例如,运行此查询:
select count(id) from arrc_vouchers where id between 1 and 100
应返回 100,但它返回 87。我可以运行一个查询来返回缺失数字的值吗?例如,可能存在 id 1-70 和 83-100 的记录,但没有任何 id 为 71-82 的记录。我想返回71、72、73等,
可以吗?
We have a database with a table whose values were imported from another system. There is an auto-increment column, and there aren’t any duplicate values, but there are missing values. For example, running this query:
select count(id) from arrc_vouchers where id between 1 and 100
should return 100, but it returns 87 instead. Is there a query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there aren’t any records with id's of 71-82. I want to return 71, 72, 73, etc.
Is this possible?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(16)
更好的答案
JustPlainMJS 提供了一个更好的答案表现。
(不是尽可能快)答案
这是一个适用于任何大小的表(不仅仅是 100 行)的版本:
gap_starts_at
- 当前间隙中的第一个 idgap_ends_at
- 当前间隙中的最后一个 IDA better answer
JustPlainMJS provided a much better answer in terms of performance.
The (not as fast as possible) answer
Here's a version that works on a table of any size (not just on 100 rows):
gap_starts_at
- first id in current gapgap_ends_at
- last id in current gap这正好帮助我找到超过 80k 行的表中的间隙:
结果:
请注意,
expected
和got
列的顺序至关重要。如果您知道
YourCol
不是从 1 开始,这并不重要,您可以替换为
新结果:
如果您需要对丢失的 ID 执行某种 shell 脚本任务,您可以还可以使用此变体直接生成一个可以在 Bash 中迭代的表达式。
这会产生如下输出,然后
您可以将其复制并粘贴到 bash 终端中的 for 循环中,以便为每个 ID 执行命令
。这与上面的内容相同,只是它既可读又可执行。通过更改上面的“CONCAT”命令,可以生成其他编程语言的语法。或者甚至是 SQL。
This just worked for me to find the gaps in a table with more than 80k rows:
Result:
Note that the order of columns
expected
andgot
is critical.If you know that
YourCol
doesn't start at 1 and that doesn't matter, you can replacewith
New result:
If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in Bash.
This produces an output like so
You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID
It's the same thing as above, only that it's both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.
如果您使用 MariaDB 数据库,则可以使用 < 更快 (800%) 的选项a href="https://mariadb.com/kb/en/mariadb/sequence/" rel="nofollow noreferrer">序列存储引擎:
If you are using a MariaDB database, you have a faster (800%) option using the sequence storage engine:
一个快速而肮脏的查询应该可以解决这个问题:
这将为您提供一个表格,显示上面缺少 ids 的 id 和 next_id em> 存在,以及之间缺少多少个...例如,
A quick-and-dirty query that should do the trick:
This will give you a table showing the id that has ids missing above it, and next_id that exists, and how many are missing between... E.g.,
创建一个包含 100 行和一列(包含值 1-100)的临时表。
外部 将此表连接到您的 arrc_vouchers 表,并选择 arrc_vouchers id 为 null 的单列值。
这应该有效:
Create a temporary table with 100 rows and a single column containing the values 1-100.
Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.
This should work:
如果存在一个序列,两个数字之间的间隙最大为 1(例如
1,3,5,6) 那么可以使用的查询是:
source1
id
If there is a sequence having gap of maximum one between two numbers (like
1,3,5,6) then the query that can be used is:
source1
id
另一种需要查询 + 一些代码执行某些处理的解决方案是:
请注意,查询不包含任何我们知道 MySQL 规划器无法高效处理的子选择。
这将返回每个centralValue (cValue) 的一个条目,该条目没有较小的值(lValue) 或较大的值(rValue),即:
无需进一步详细说明(我们将在下一段中看到它们),此输出意味着:
所以基本的想法是对同一个表进行 RIGHT 和 LEFT 连接,看看我们是否可以每个值都有相邻值(即,如果中心值为“3”,则我们检查左侧是否有 3-1=2,右侧是否有 3+1),并且当 ROW 在右侧或左侧有 NULL 值时,我们就知道那里没有相邻值。
我的表的完整原始输出是:
一些注释:
An alternative solution that requires a query + some code doing some processing would be:
Note that the query does not contain any subselect that we know it's not handled performantly by MySQL's planner.
That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), i.e.:
Without going into further details (we'll see them in next paragraphs) this output means that:
So the basic idea is to do a RIGHT and LEFT joins with the same table seeing if we have adjacents values per value (i.e., if central value is '3' then we check for 3-1=2 at left and 3+1 at right), and when a ROW has a NULL value at RIGHT or LEFT then we know there is no adjacent value.
The complete raw output of my table is:
Some notes:
我尝试了以不同的方式,我发现的最佳性能就是这么简单query:
... 一个左连接检查下一个id是否存在,只有如果没有找到下一个if,则子查询查找存在的下一个id以找到间隙的末尾。我这样做是因为使用等于 (=) 的查询比大于 (>) 的查询性能更好操作员。
使用 sqlfiddle 与其他查询相比,它没有显示出不同的性能,但是在真实的数据库中,上面的查询结果比其他查询快 3 倍。
架构:
遵循下面我为比较性能而进行的所有查询:
您可以使用此 SQL Fiddle:
http://sqlfiddle.com/#!9/ 6bdca7/1
I tried it in a different manner, and the best performance that I found was this simple query:
... one left join to check if the next id exists, only if next if is not found, then the subquery finds the next id that exists to find the end of gap. I did it because the query with equal (=) is better performance than the greater than (>) operator.
Using the sqlfiddle it does not show so a different performance compared to the other queries, but in a real database this query above results in 3 times faster than the others.
The schema:
Follow below all the queries that I made to compare the performance:
You can see and test my query using this SQL Fiddle:
http://sqlfiddle.com/#!9/6bdca7/1
它可能不相关,但我一直在寻找类似的东西来列出数字序列中的间隙,并发现这篇文章有多种不同的解决方案,具体取决于您正在寻找的内容。我正在寻找序列中的第一个可用间隙(即下一个可用数字),这似乎工作正常。
从 2005 年开始,那里讨论了其他几个场景和解决方案!
如何查找使用 SQL 的序列中缺少值
It is probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e., next available number), and this seems to work fine.
Several other scenarios and solutions discussed there, from 2005!
How to Find Missing Values in a Sequence With SQL
一个简单但有效的解决方案来查找丢失的自动增量值:
A simple, yet effective, solution to find the missing auto-increment values:
虽然这些看起来都有效,但是当有 50,000 条记录时,结果集返回的时间非常长。
我使用了这个,它找到了间隙或下一个可用的(上次使用的 + 1),并且从查询中返回的速度要快得多。
Although these all seem to work, the result set returns in a very lengthy time when there are 50,000 records.
I used this, and it find the gap or the next available (last used + 1) with a much faster return from the query.
基于马特给出的答案,此存储过程允许您指定要测试以查找非连续记录的表和列名称 - 从而回答原始问题并演示如何使用 @var 来表示表和/或列存储过程。
Based on the answer given by matt, this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use @var to represent tables &/or columns in a stored procedure.
另一个简单的答案可以找出差距。我们执行仅选择奇数的查询,然后将其右连接到包含所有偶数的查询。只要你没有丢失 id 1;这应该会给你一个完整的列表,说明差距从哪里开始。
您仍然需要查看数据库中的该位置来找出差距有多少个数字。我发现这种方法比建议的解决方案更容易,并且更容易根据独特的情况进行定制。
Another simple answer that identifies the gaps. We do a query selecting just the odd numbers and we right join it to a query with all the even numbers. As long as you're not missing id 1; this should give you a comprehensive list of where the gaps start.
You'll still have to take a look at that place in the database to figure out how many numbers the gap is. I found this way easier than the solution proposed and much easier to customize to unique situations.
这对我有用:
This works for me:
从用户933161发布的评论,
更好的是它不会对记录列表的末尾产生误报。 (我不确定为什么这么多人使用左外连接。)
另外,
如果 # 是间隙的起始值,则将填充该起始值。 (如果某些字段不能为空,则必须添加具有虚拟值的字段。)
您可以交替查询起始值和填充每个起始值,直到起始值查询返回空集。
当然,只有当您使用足够小的数据集并且手动迭代是合理的时,这种方法才有用。我对 phpMyAdmin 这样的事情了解不够,无法想出自动化更大的方法设置有更多更大的间隙。
Starting from the comment posted by user933161,
is better in that it will not produce a false positive for the end of the list of records. (I'm not sure why so many are using left outer joins.)
Also,
where # is the start value for a gap will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.)
You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.
Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.