根据位掩码查找mysql中的重复记录
我有一个 mysql 表,用于存储传感器的维护日志。我想设计一个查询,查找出于相同原因修复/维护给定传感器的实例。 (重复出现的问题查找器。)
我的表(简化)如下所示:
id name mask
== ==== ====
11 alpha 0011
12 alpha 0010
13 alpha 0100
14 beta 0001
掩码字段是一个位掩码,其中每个位置代表特定类型的修复。我能够成功地弄清楚如何比较位掩码(每个这个问题),但尝试将其合并到查询中被证明比我想象的更困难。
对于上述示例记录,只有 id 11 和 12 适用,因为它们在第三个掩码位置都有 1
。
这是我尝试过的方法以及为什么它不起作用:
1. Never finishes...
This query seems to run forever, I don't think it is working the way I want.
SELECT t1.id, t1.name
FROM data t1
LEFT OUTER JOIN data t2
ON (CONV(t1.mask,2,10) & CONV(t2.mask,2,10) > 0)
GROUP BY t1.name
HAVING COUNT(*) >1;
2.查询不完整...
我正在考虑创建一个视图,只解析表中实际有多个条目的传感器。我不知道接下来该去哪里。
SELECT COUNT(t1.name) AS times, t1.name, t1.id, t1.mask
FROM data AS t1
GROUP BY t1.name ASC
HAVING times > 1;
对此有什么建议吗?
I have a mysql table which stores maintenance logs for sensors. I'd like to design a query that finds instances where a given sensor was repaired/maintained for the same reason. (Recurring problem finder.)
My table (simplified) looks like this:
id name mask
== ==== ====
11 alpha 0011
12 alpha 0010
13 alpha 0100
14 beta 0001
The mask field is a bitmask where each position represents a particular type of repair. I was able to successfully figure out how to compare the bitmask (per this question) but trying to incorporate it into a query is proving more difficult than I thought.
Given the above sample records, only id's 11 and 12 apply, since they both have a 1
in the third mask position.
Here's what I've tried and why it didn't work:
1. Never finishes...
This query seems to run forever, I don't think it is working the way I want.
SELECT t1.id, t1.name
FROM data t1
LEFT OUTER JOIN data t2
ON (CONV(t1.mask,2,10) & CONV(t2.mask,2,10) > 0)
GROUP BY t1.name
HAVING COUNT(*) >1;
2. Incomplete query...
I was thinking of creating a view, to only parse the sensors that actually have more than one entry in the table. I wasn't sure where to go from here.
SELECT COUNT(t1.name) AS times, t1.name, t1.id, t1.mask
FROM data AS t1
GROUP BY t1.name ASC
HAVING times > 1;
Any suggestions on this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
由于数据库结构的设计没有考虑到 RDBM 的实际情况(可能不是你的做法,我只是必须说明这一点......),尽管可以编写一个能够完成的查询,但性能总是很差。
Jim 是正确的,查询结果是笛卡尔积。如果该查询以未分组和未过滤的方式返回,您可能会得到
(SELECT POW(COUNT(*), 2) FROM data)
结果。此外,任何形式的外连接都是不必要的,因此标准的内连接就是您想要的(并不是说它应该在性能方面产生影响,只是更合适)。另外,连接的另一个条件
t1.id != t2.id
也是必要的,以免每个记录与其自身匹配。您的不完整查询:
我无法告诉您其中哪一个表现最好。如果
data.name
已建立索引(我希望如此),则任一查询的成本都应该相当低。前者将缓存子选择的副本,而后者将针对索引执行多个查询。一种非常基本的优化(同时保持整个表结构不变)是将掩码字段转换为无符号整数数据类型,从而节省对 CONV() 的多次调用。
当然,进一步分解
数据确实更有意义。 现在,在
成为
name
和value
上战略性放置的索引使查询 一个蛋糕我希望这有帮助。
Since the database structure was not designed with the realities of RDBMs in mind (probably not your doing, I just have to make the point anyway…), the performance will always be poor, though it is possible to write a query that will finish.
Jim is correct in that the query results in a cartesian product. If that query were to be returned ungrouped and unfiltered, you could expect
(SELECT POW(COUNT(*), 2) FROM data)
results. Also, any form of outer join is unnecessary, so a standard inner join is what you want here (not that it ought to make a difference in terms of performance, it's just more appropriate).Also another condition of the join,
t1.id != t2.id
is necessary, lest each record match itself.Your incomplete query:
Off the top of my head I can't tell you which of those would perform best. If
data.name
is indexed (and I would hope it is), the cost for either query ought to be rather low. The former will cache a copy of the subselect, whereas the latter will perform multiple queries against the index.One very basic optimization (while leaving the table structure as a whole untouched) would be to convert the mask field to an unsigned integer data type, thereby saving many calls to
CONV()
.becomes
Of course, breaking the data down further does make more sense. Instead of storing a bitmask in one record, break out all the ones bits into separate records
would become
Now, a strategically placed index on
name
andvalue
makes the query a piece of cakeI hope that this helps.
将掩码位分解为实际列。 RDMB 不喜欢位字段。
Break the mask bits out in real columns. RDMBs don't like bit fields.
您的连接会产生表与其自身的笛卡尔积。将“t1.name=t2.name”添加到连接中,给出一堆(小得多)的笛卡尔积,每个唯一的名称一个,这将大大加快速度。
Your join results in a cartesian product of the table with itself. Add `t1.name=t2.name' to the join, giving a bunch of (much) smaller cartesian products, one per unique name, which will speed things up considerably.