如何使用 MySQL 和 PHP 比较一组数字并获得最相关的结果?
考虑一下:
set A: 1 2 3 4 set B: 3 4 5 6 set C: 4 5 6 7 set D: 1
我想将 D 与其余的进行比较,并得到一组最相关的数字。 结果应按以下顺序排列:4(因为 D 与 A 有一个公共数,并且 4 在 A 中,并且也在 B 和 C 中)、3(因为 D 与 A 有公共数,并且 3 在 A 和 B 中), 2(因为 D 与 A 具有共同的数字,并且 2 也在 A 中),然后是 5、6、7。
在 PHP/MySQL 中是否有某种算法可以有效地执行此操作?我不想重新发明轮子,而且数据库最终会有大量的集合。
Consider this:
set A: 1 2 3 4 set B: 3 4 5 6 set C: 4 5 6 7 set D: 1
I want to compare D with the rest and get as a result a set of numbers as most relevant.
The result should be in this order: 4 (as D has a common number with A and 4 is in A and also in B and C), 3 (as D has a common number with A and 3 is in A and B), 2 (as D has a common number with A and 2 is also in A), then 5, 6, 7.
Is there some algorithm to do this in an efficient way in PHP/MySQL? I don't want to reinvent the wheel, and also the database would eventually have a huge number of sets..
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
一个例子并不能构成完整的规范。 您的答案会有什么不同
例如,如果集合的集合也包含在内,这将使 3 成为与
D
具有非空交集的集合中最常出现的值,那么 ?所以这是我的假设:给定一个目标集(原始示例中的
D
):在原始示例中,
A
与D
重叠,因此全域 {1, 2, 3, 4, 5, 6, 7} 被划分为重叠的 {1, 2 , 3, 4} 和不重叠的 {5, 6, 7}。值频率为 {1:2、2:1、3:2、4:3、5:2、6:2、7:1}。结合这些事实给出重叠频率 {1:2, 2:1, 3:2, 4:3} 和非重叠频率 {5:2, 6:2, 7:1},从而产生顺序 4, 3, 1、2,然后是 5、6、7。(我注意到您没有为 1 分配相关性。如果有意的话,这可能是从最终排序中删除目标集的值的最后一步。)在我的调整后例如,频率变为{1:4, 2:3, 3:4, 4:3, 5:2, 6:2, 7:1}。这给出了重叠频率 {1:4, 2:3, 3:4, 4:3} 和非重叠频率 {5:2, 6:2, 7:1},从而产生顺序 1, 3, 2, 4 后跟 5、6、7。
该算法的伪代码为:
初始化
重叠
和universe
为空集和频率
为空哈希。对于集合集合中的每个集合
s
(目标集合t
除外):2.1。将
universe
设置为s
和universe
的并集
2.2。如果
s
与t
相交至少有一个元素:<前><代码>2.2.1。将“overlapping”设置为“overlapping”和“s”的并集
2.3。对于
s
中的每个元素e
:<前><代码>2.3.1。如果“e”是“频率”中的键
2.3.1.1.然后将“频率”中“e”的值(计数)增加 1
2.3.1.2。否则将“频率”中“e”的值(计数)初始化为 1
将
nonOverlapping
设置为universe
和overlapping的差异
对
universe< 的元素进行排序/code> 按
频率
中的值作为结果的第一部分。将
nonOverlapping
的元素附加到结果中,也按其在频率
中的值排序。(如果您确实打算消除
t
的元素,我会将其作为 4 中的后处理步骤来执行。)One example doesn't make a complete specification. For example, how would your answer be different if the collection of sets also included
which would make 3 the most frequently-occurring value among sets that have non-empty intersection with
D
? So here are my assumptions:Given a target set (
D
in your original example):In your original example,
A
overlaps withD
, so the universe {1, 2, 3, 4, 5, 6, 7} is partitioned into overlapping {1, 2, 3, 4} and non-overlapping {5, 6, 7}. The value frequencies are {1:2, 2:1, 3:2, 4:3, 5:2, 6:2, 7:1}. Combining these facts gives overlapping frequencies {1:2, 2:1, 3:2, 4:3} and non-overlapping frequencies {5:2, 6:2, 7:1}, which produces the order 4, 3, 1, 2 followed by 5, 6, 7. (I notice that you didn't assign a relevance to 1. If deliberate, that can be a final step of removing values of the target set from the final ordering.)In my adjusted example, the frequencies become {1:4, 2:3, 3:4, 4:3, 5:2, 6:2, 7:1}. That gives overlapping frequencies {1:4, 2:3, 3:4, 4:3} and non-overlapping frequencies {5:2, 6:2, 7:1}, which produces the order 1, 3, 2, 4 followed by 5, 6, 7.
Pseudo-code for this algorithm is:
Initialize
overlapping
anduniverse
to be empty sets andfrequency
to be an empty hash.For each set
s
in the collection of sets (other thant
, the target set):2.1. Set
universe
to the union ofs
anduniverse
2.2. If
s
intersected witht
has at least one element:2.3. For each element
e
ins
:Set
nonOverlapping
to the difference ofuniverse
andoverlapping
Sort the elements of
universe
by their values infrequency
as the first part of the result.Append to the result the elements of
nonOverlapping
, also sorted by their values infrequency
.(If you did intend for elements of
t
to be eliminated, I'd do that as a post-processing step in 4.)在 SQL 中,我假设您有一个名为“集合”的表,其中有 2 列,e 表示元素,s 表示集合名称。
解释:
选择组D的元素。
选择与先前选择的组有共同成员的所有组。
然后从这些集合中选择所有元素,并按出现次数对它们进行排序(如 joel 建议的那样)
In SQL, I'll assume you have a table called sets, with 2 columns, e for the elements and s for the set name.
explanation:
selects the elements of group D.
selects all the groups that have common members with the previously selected group.
and then you select all elements from these sets, and order them by the number of appearances (as joel suggested)