如何使用 MySQL 和 PHP 比较一组数字并获得最相关的结果?

发布于 2024-08-14 01:00:56 字数 320 浏览 5 评论 0原文

考虑一下:

set A: 1 2 3 4
set B:     3 4 5 6
set C:       4 5 6 7
set D: 1

我想将 D 与其余的进行比较,并得到一组最相关的数字。 结果应按以下顺序排列:4(因为 D 与 A 有一个公共数,并且 4 在 A 中,并且也在 B 和 C 中)、3(因为 D 与 A 有公共数,并且 3 在 A 和 B 中), 2(因为 D 与 A 具有共同的数字,并且 2 也在 A 中),然后是 5、6、7。

在 PHP/MySQL 中是否有某种算法可以有效地执行此操作?我不想重新发明轮子,而且数据库最终会有大量的集合。

Consider this:

set A: 1 2 3 4
set B:     3 4 5 6
set C:       4 5 6 7
set D: 1

I want to compare D with the rest and get as a result a set of numbers as most relevant.
The result should be in this order: 4 (as D has a common number with A and 4 is in A and also in B and C), 3 (as D has a common number with A and 3 is in A and B), 2 (as D has a common number with A and 2 is also in A), then 5, 6, 7.

Is there some algorithm to do this in an efficient way in PHP/MySQL? I don't want to reinvent the wheel, and also the database would eventually have a huge number of sets..

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

如梦亦如幻 2024-08-21 01:00:56

一个例子并不能构成完整的规范。 您的答案会有什么不同

set E: 1 2 3
set F: 1   3

例如,如果集合的集合也包含在内,这将使 3 成为与 D 具有非空交集的集合中最常出现的值,那么 ?所以这是我的假设:

给定一个目标集(原始示例中的D):

  1. “重叠集”(与目标集有非空交集的集)中的值比不相关的值更相关在那些重叠的集合中。
  2. 在陈述1的约束下,相关性由出现频率决定。

在原始示例中,AD 重叠,因此全域 {1, 2, 3, 4, 5, 6, 7} 被划分为重叠的 {1, 2 , 3, 4} 和不重叠的 {5, 6, 7}。值频率为 {1:2、2:1、3:2、4:3、5:2、6:2、7:1}。结合这些事实给出重叠频率 {1:2, 2:1, 3:2, 4:3} 和非重叠频率 {5:2, 6:2, 7:1},从而产生顺序 4, 3, 1、2,然后是 5、6、7。(我注意到您没有为 1 分配相关性。如果有意的话,这可能是从最终排序中删除目标集的值的最后一步。)

在我的调整后例如,频率变为{1:4, 2:3, 3:4, 4:3, 5:2, 6:2, 7:1}。这给出了重叠频率 {1:4, 2:3, 3:4, 4:3} 和非重叠频率 {5:2, 6:2, 7:1},从而产生顺序 1, 3, 2, 4 后跟 5、6、7。

该算法的伪代码为:

  1. 初始化重叠universe为空集和频率 为空哈希。

  2. 对于集合集合中的每个集合s(目标集合t除外):

    2.1。将 universe 设置为 suniverse

    的并集

    2.2。如果st相交至少有一个元素:

    <前><代码>2.2.1。将“overlapping”设置为“overlapping”和“s”的并集

    2.3。对于 s 中的每个元素 e

    <前><代码>2.3.1。如果“e”是“频率”中的键

    2.3.1.1.然后将“频率”中“e”的值(计数)增加 1
    2.3.1.2。否则将“频率”中“e”的值(计数)初始化为 1

  3. nonOverlapping设置为universeoverlapping的差异

  4. universe< 的元素进行排序/code> 按 频率 中的值作为结果的第一部分。

  5. nonOverlapping 的元素附加到结果中,也按其在频率中的值排序。

(如果您确实打算消除 t 的元素,我会将其作为 4 中的后处理步骤来执行。)

One example doesn't make a complete specification. For example, how would your answer be different if the collection of sets also included

set E: 1 2 3
set F: 1   3

which would make 3 the most frequently-occurring value among sets that have non-empty intersection with D? So here are my assumptions:

Given a target set (D in your original example):

  1. Values in "overlapping sets" (sets that have non-empty intersection with the target set) are more relevant that values not in those overlapping sets.
  2. Under the constraint of statement 1, relevance is determined by frequency of occurrence.

In your original example, A overlaps with D, so the universe {1, 2, 3, 4, 5, 6, 7} is partitioned into overlapping {1, 2, 3, 4} and non-overlapping {5, 6, 7}. The value frequencies are {1:2, 2:1, 3:2, 4:3, 5:2, 6:2, 7:1}. Combining these facts gives overlapping frequencies {1:2, 2:1, 3:2, 4:3} and non-overlapping frequencies {5:2, 6:2, 7:1}, which produces the order 4, 3, 1, 2 followed by 5, 6, 7. (I notice that you didn't assign a relevance to 1. If deliberate, that can be a final step of removing values of the target set from the final ordering.)

In my adjusted example, the frequencies become {1:4, 2:3, 3:4, 4:3, 5:2, 6:2, 7:1}. That gives overlapping frequencies {1:4, 2:3, 3:4, 4:3} and non-overlapping frequencies {5:2, 6:2, 7:1}, which produces the order 1, 3, 2, 4 followed by 5, 6, 7.

Pseudo-code for this algorithm is:

  1. Initialize overlapping and universe to be empty sets and frequency to be an empty hash.

  2. For each set s in the collection of sets (other than t, the target set):

    2.1. Set universe to the union of s and universe

    2.2. If s intersected with t has at least one element:

    2.2.1. Set `overlapping` to the union of `overlapping` and `s`
    

    2.3. For each element e in s:

    2.3.1. If 'e' is a key in `frequency`
    
        2.3.1.1. Then increase the value (count) for `e` in `frequency` by 1
        2.3.1.2. Else initialize the value (count) for `e` in `frequency` to 1
    
  3. Set nonOverlapping to the difference of universe and overlapping

  4. Sort the elements of universe by their values in frequency as the first part of the result.

  5. Append to the result the elements of nonOverlapping, also sorted by their values in frequency.

(If you did intend for elements of t to be eliminated, I'd do that as a post-processing step in 4.)

温柔嚣张 2024-08-21 01:00:56

在 SQL 中,我假设您有一个名为“集合”的表,其中有 2 列,e 表示元素,s 表示集合名称。

select e,count(*) as c from sets where s in
(select s from sets where e in (select e from sets where s='D') group by s)
group by e order by c desc

解释:

(select e from sets where s='D')

选择组D的元素。

(select s from sets where e in (select e from sets where s='D') group by s)

选择与先前选择的组有共同成员的所有组。

然后从这些集合中选择所有元素,并按出现次数对它们进行排序(如 joel 建议的那样)

In SQL, I'll assume you have a table called sets, with 2 columns, e for the elements and s for the set name.

select e,count(*) as c from sets where s in
(select s from sets where e in (select e from sets where s='D') group by s)
group by e order by c desc

explanation:

(select e from sets where s='D')

selects the elements of group D.

(select s from sets where e in (select e from sets where s='D') group by s)

selects all the groups that have common members with the previously selected group.

and then you select all elements from these sets, and order them by the number of appearances (as joel suggested)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文