根据位掩码查找mysql中的重复记录

发布于 2024-09-12 11:52:13 字数 1113 浏览 8 评论 0原文

我有一个 mysql 表，用于存储传感器的维护日志。我想设计一个查询，查找出于相同原因修复/维护给定传感器的实例。（重复出现的问题查找器。）

我的表（简化）如下所示：

id    name     mask
==    ====     ====
11    alpha    0011
12    alpha    0010
13    alpha    0100
14    beta     0001

掩码字段是一个位掩码，其中每个位置代表特定类型的修复。我能够成功地弄清楚如何比较位掩码（每个这个问题），但尝试将其合并到查询中被证明比我想象的更困难。

对于上述示例记录，只有 id 11 和 12 适用，因为它们在第三个掩码位置都有 1。

这是我尝试过的方法以及为什么它不起作用：

1. Never finishes...
This query seems to run forever, I don't think it is working the way I want.

SELECT t1.id, t1.name
FROM data t1
LEFT OUTER JOIN data t2
ON (CONV(t1.mask,2,10) & CONV(t2.mask,2,10) > 0)
GROUP BY t1.name
HAVING COUNT(*) >1;

2.查询不完整...
我正在考虑创建一个视图，只解析表中实际有多个条目的传感器。我不知道接下来该去哪里。

SELECT COUNT(t1.name) AS times, t1.name, t1.id, t1.mask
FROM data AS t1
GROUP BY t1.name ASC
HAVING times > 1;

对此有什么建议吗？

原文

I have a mysql table which stores maintenance logs for sensors. I'd like to design a query that finds instances where a given sensor was repaired/maintained for the same reason. (Recurring problem finder.)

My table (simplified) looks like this:

id    name     mask
==    ====     ====
11    alpha    0011
12    alpha    0010
13    alpha    0100
14    beta     0001

The mask field is a bitmask where each position represents a particular type of repair. I was able to successfully figure out how to compare the bitmask (per this question) but trying to incorporate it into a query is proving more difficult than I thought.

Given the above sample records, only id's 11 and 12 apply, since they both have a 1 in the third mask position.

Here's what I've tried and why it didn't work:

1. Never finishes...
This query seems to run forever, I don't think it is working the way I want.

SELECT t1.id, t1.name
FROM data t1
LEFT OUTER JOIN data t2
ON (CONV(t1.mask,2,10) & CONV(t2.mask,2,10) > 0)
GROUP BY t1.name
HAVING COUNT(*) >1;

2. Incomplete query...
I was thinking of creating a view, to only parse the sensors that actually have more than one entry in the table. I wasn't sure where to go from here.

SELECT COUNT(t1.name) AS times, t1.name, t1.id, t1.mask
FROM data AS t1
GROUP BY t1.name ASC
HAVING times > 1;

Any suggestions on this?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

ぽ尐不点ル 2024-09-19 11:52:13

由于数据库结构的设计没有考虑到 RDBM 的实际情况（可能不是你的做法，我只是必须说明这一点......），尽管可以编写一个能够完成的查询，但性能总是很差。

Jim 是正确的，查询结果是笛卡尔积。如果该查询以未分组和未过滤的方式返回，您可能会得到 (SELECT POW(COUNT(*), 2) FROM data) 结果。此外，任何形式的外连接都是不必要的，因此标准的内连接就是您想要的（并不是说它应该在性能方面产生影响，只是更合适）。

另外，连接的另一个条件 t1.id != t2.id 也是必要的，以免每个记录与其自身匹配。

SELECT t1.id, t1.name
FROM data t1
JOIN data t2
     ON t1.name = t2.name
     AND t1.id != t2.id     //
WHERE CONV(t1.mask, 2, 10) & CONV(t2.mask, 2, 10) > 0
GROUP BY t1.name
HAVING COUNT(*) > 1;

您的不完整查询：

SELECT t1.id, t1.name, t1.mask
FROM data t1
WHERE t1.name IN (SELECT t2.name FROM data t2 GROUP BY t2.name HAVING COUNT(*) > 1);

SELECT t1.id, t1.name, t1.mask
FROM data t1
WHERE EXISTS (SELECT 1 FROM data t2 WHERE t2.name = t1.name GROUP BY t2.name HAVING COUNT(*) > 1);

我无法告诉您其中哪一个表现最好。如果 data.name 已建立索引（我希望如此），则任一查询的成本都应该相当低。前者将缓存子选择的副本，而后者将针对索引执行多个查询。

一种非常基本的优化（同时保持整个表结构不变）是将掩码字段转换为无符号整数数据类型，从而节省对 CONV() 的多次调用。

WHERE CONV(t1.mask, 2, 10) & CONV(t2.mask, 2, 10) > 0

当然，进一步分解

WHERE t1.mask & t2.mask > 0

数据确实更有意义。现在，在

id    name     mask
==    ====     ====
11    alpha    1101

成为

id    name    value
==    ====    =====
11    alpha       1
12    alpha       4
13    alpha       8

name 和 value 上战略性放置的索引使查询一个蛋糕

SELECT name, value
FROM data
GROUP BY name, value
HAVING COUNT(*) > 1;

我希望这有帮助。

Since the database structure was not designed with the realities of RDBMs in mind (probably not your doing, I just have to make the point anyway…), the performance will always be poor, though it is possible to write a query that will finish.

Jim is correct in that the query results in a cartesian product. If that query were to be returned ungrouped and unfiltered, you could expect (SELECT POW(COUNT(*), 2) FROM data) results. Also, any form of outer join is unnecessary, so a standard inner join is what you want here (not that it ought to make a difference in terms of performance, it's just more appropriate).

Also another condition of the join, t1.id != t2.id is necessary, lest each record match itself.

SELECT t1.id, t1.name
FROM data t1
JOIN data t2
     ON t1.name = t2.name
     AND t1.id != t2.id     //
WHERE CONV(t1.mask, 2, 10) & CONV(t2.mask, 2, 10) > 0
GROUP BY t1.name
HAVING COUNT(*) > 1;

Your incomplete query:

SELECT t1.id, t1.name, t1.mask
FROM data t1
WHERE t1.name IN (SELECT t2.name FROM data t2 GROUP BY t2.name HAVING COUNT(*) > 1);

SELECT t1.id, t1.name, t1.mask
FROM data t1
WHERE EXISTS (SELECT 1 FROM data t2 WHERE t2.name = t1.name GROUP BY t2.name HAVING COUNT(*) > 1);

Off the top of my head I can't tell you which of those would perform best. If data.name is indexed (and I would hope it is), the cost for either query ought to be rather low. The former will cache a copy of the subselect, whereas the latter will perform multiple queries against the index.

One very basic optimization (while leaving the table structure as a whole untouched) would be to convert the mask field to an unsigned integer data type, thereby saving many calls to CONV().

WHERE CONV(t1.mask, 2, 10) & CONV(t2.mask, 2, 10) > 0

becomes

WHERE t1.mask & t2.mask > 0

Of course, breaking the data down further does make more sense. Instead of storing a bitmask in one record, break out all the ones bits into separate records

id    name     mask
==    ====     ====
11    alpha    1101

would become

id    name    value
==    ====    =====
11    alpha       1
12    alpha       4
13    alpha       8

Now, a strategically placed index on name and value makes the query a piece of cake

SELECT name, value
FROM data
GROUP BY name, value
HAVING COUNT(*) > 1;

I hope that this helps.

回复收藏 0 原文

南渊 2024-09-19 11:52:13

将掩码位分解为实际列。 RDMB 不喜欢位字段。

回复收藏 0 原文

掐死时间 2024-09-19 11:52:13

您的连接会产生表与其自身的笛卡尔积。将“t1.name=t2.name”添加到连接中，给出一堆（小得多）的笛卡尔积，每个唯一的名称一个，这将大大加快速度。

回复收藏 0 原文

~没有更多了~

关于作者

羁拥

暂无简介

文章

669 人气

关注发私信

友情链接

文江博客

根据位掩码查找mysql中的重复记录

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

饮湿

明月

02

hs1283

风向决定发型

落花浅忆

友情链接

根据位掩码查找mysql中的重复记录

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

饮湿

明月

02

hs1283

风向决定发型

落花浅忆

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。