PHP/MySQL - Analyzing common sets across multiple sets

Posted on 2024-12-22 17:01:26

Let's say I have two tables, people and families.

families has two fields - id and name. The name field contains the family surname.

people has three fields - id, family_id and name. The family_id is the id of the family that the person belongs to. The name field is that person's first name.

It's basically a one to many relationship with one family having many people.

I want to get a list of name sets, ordered by the highest occurrence of the largest set of names across families.

That probably doesn't make much sense...

To explain what I want further, we can score each set of names. The 'score' is the size of the set * the number of families the whole set occurs in.

For example, let's say two names, 'John' and 'Jane', both exist in three families. That set's 'score' would be 2 * 3 = 6.
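
To make the scoring rule concrete, here is a minimal sketch in PHP. The function name scoreNameSet and the in-memory $families array are made up for illustration; they are not part of the schema described in the question.

```php
<?php
// Hypothetical helper: score = set size * number of families that contain
// every name in the set.
function scoreNameSet(array $nameSet, array $families): array
{
    $occurrences = 0;
    foreach ($families as $members) {
        // array_diff() is empty only when every name of the set is in this family
        if (count(array_diff($nameSet, $members)) === 0) {
            $occurrences++;
        }
    }
    return [
        'occurrences' => $occurrences,
        'score'       => count($nameSet) * $occurrences,
    ];
}

// Made-up data: family id => first names of its members
$families = [
    1 => ['John', 'Jane', 'Ben'],
    2 => ['John', 'Jane'],
    3 => ['Jane', 'John', 'Lucy'],
    4 => ['Ben', 'Lucy'],
];

$result = scoreNameSet(['John', 'Jane'], $families);
echo $result['occurrences'] . ' ' . $result['score'] . "\n"; // prints "3 6"
```

'John' and 'Jane' appear together in families 1, 2 and 3, so the set occurs 3 times and scores 2 * 3 = 6.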

How could I get an array of sets of names, and the set's 'score', ordered by each set's score?

Sample Result Set (I've put it in a table layout, but this could be a multi-dimensional array in PHP) - Note this is just randomly thought up and doesn't reflect any statistical name data.

names              | occurrences | score
Ben, Lucy          | 4           | 8
Jane, John         | 3           | 6
James, Rosie, Jack | 2           | 6
Charlie, Jane      | 2           | 4

Just to clarify, I'm not interested in sets where:

The number of occurrences is 1 (obviously, just one family).
The set size is 1 (just a common name).
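
As a sketch of the two exclusions and the ordering, assuming the candidate sets and their occurrence counts are already known: the first four rows below come from the sample table, the last two are made up to show what gets dropped, and the array layout is only an assumption about how the result might be held in PHP.

```php
<?php
// Candidate sets with their occurrence counts (hypothetical layout).
$candidates = [
    ['names' => ['Charlie', 'Jane'],        'occurrences' => 2],
    ['names' => ['Ben', 'Lucy'],            'occurrences' => 4],
    ['names' => ['James', 'Rosie', 'Jack'], 'occurrences' => 2],
    ['names' => ['Jane', 'John'],           'occurrences' => 3],
    ['names' => ['Ben'],                    'occurrences' => 5], // set size 1
    ['names' => ['Anna', 'Tom'],            'occurrences' => 1], // one family only
];

// Keep only sets with at least two names that occur in at least two families.
$kept = array_values(array_filter($candidates, function (array $c): bool {
    return count($c['names']) > 1 && $c['occurrences'] > 1;
}));

// Attach the score: set size * occurrences.
foreach ($kept as &$c) {
    $c['score'] = count($c['names']) * $c['occurrences'];
}
unset($c);

// Highest score first; ties are broken by the higher occurrence count.
usort($kept, function (array $a, array $b): int {
    return [$b['score'], $b['occurrences']] <=> [$a['score'], $a['occurrences']];
});

foreach ($kept as $c) {
    printf("%-18s | %d | %d\n", implode(', ', $c['names']), $c['occurrences'], $c['score']);
}
```

With this input the four remaining rows come out in the same order as the sample table.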

I hope I have explained my somewhat complex problem - if anyone needs clarification please say.

黯然 2024-12-29 17:01:26

OK, got it:

<?php
require_once('query.lib.php');

$db=new database(DB_TYPE,DB_HOST,DB_USER,DB_PASS,DB_MISC);
$qry=new query('set names utf8',$db);

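//Base query, this filters out names that are in just one family
//Uses the family_id column from the question; distinct guards against the same first name occurring twice in one family
$sql='select name, cast(group_concat(distinct family_id order by family_id) as char) as famlist, count(distinct family_id) as num from people group by name having num>1 order by num desc';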
$qry=new query($sql,$db);

//$qry->result is something like 
/*
Array
(
    [name] => Array
        (
            [0] => cathy
            [1] => george
            [2] => jack
            [3] => john
            [4] => jane
            [5] => winston
            [6] => peter
        )

    [famlist] => Array
        (
            [0] => 2,4,5,6,8
            [1] => 2,3,4,5,8
            [2] => 1,3,5,7,8
            [3] => 1,2,3,6,7
            [4] => 2,4,7,8
            [5] => 1,2,6,8
            [6] => 1,3,6
        )

    [num] => Array
        (
            [0] => 5
            [1] => 5
            [2] => 5
            [3] => 5
            [4] => 4
            [5] => 4
            [6] => 3
        )

)

$qry->rows=7
*/

//Initialize
$names=$qry->result['name'];
$rows=$qry->rows;
$lists=array();
for ($i=0;$i<$rows;$i++) $lists[$i]=explode(',',$qry->result['famlist'][$i]);

//Walk the list and populate pairs - this filters out pairs that are specific to only one family
$tuples=array();
for ($i=0;$i<$rows;$i++) {
  for ($j=$i+1;$j<$rows;$j++) {
    $isec=array_intersect($lists[$i],$lists[$j]);
    if (sizeof($isec)>1) {
      //Every tuple consists of the name-list, the family list, the length and the latest used name 
      $tuples[]=array($names[$i].'/'.$names[$j],$isec,2,$j);
    }
  }
}

//Now walk the tuples again rolling forward, until there is nothing left to do
//We do not use a for loop just for style
$i=0;
while ($i<sizeof($tuples)) {
  $tuple=$tuples[$i];
  //Try to combine this tuple with all later names
  for ($j=$tuple[3]+1;$j<$rows;$j++) {
    $isec=array_intersect($tuple[1],$lists[$j]);
    //As with the pairs, keep only sets shared by more than one family
    if (sizeof($isec)>1) $tuples[]=array($tuple[0].'/'.$names[$j],$isec,$tuple[2]+1,$j);
  }
  $i++;
}

//We have all the tuples, now we just need to extract the info and prepare to sort
$final=array();
foreach ($tuples as $tuple) {
  //name list is in $tuple[0], count is sizeof($tuple[1]), length is in $tuple[2]
  $count=sizeof($tuple[1]);
  $final[]=array('score'=>$tuple[2]*$count,'occurrences'=>$count,'names'=>$tuple[0]);
}

//Sort numerically by score, highest first
//(sorting "score\tcount\tnames" strings would compare as text and put 8 before 12)
usort($final,function($a,$b){return $b['score']-$a['score'];});
print_r($final);
?>

I am sorry, I just realized that I use a query library I can't include here, but from the comments you should easily be able to create the arrays as in the "Initialize" section.
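
For readers without that library, here is one possible way to build the two arrays from plain rows. This is only a sketch: buildNameLists is a made-up helper, and it assumes the rows look like what PDOStatement::fetchAll(PDO::FETCH_ASSOC) would return for "select name, family_id from people". Unlike the SQL above it does not order names by frequency, which the set-building step does not rely on.

```php
<?php
// Hypothetical helper: turn rows of ['name' => ..., 'family_id' => ...]
// into parallel arrays of names and sorted family-id lists.
function buildNameLists(array $rows): array
{
    $byName = [];
    foreach ($rows as $row) {
        // Keyed by family id, so a name repeated within one family counts once
        $byName[$row['name']][$row['family_id']] = true;
    }

    $names = [];
    $lists = [];
    foreach ($byName as $name => $fams) {
        if (count($fams) < 2) {
            continue; // name occurs in a single family only
        }
        $ids = array_keys($fams);
        sort($ids);
        $names[] = (string) $name;
        $lists[] = $ids;
    }
    return [$names, $lists];
}

// Made-up rows for illustration
$rows = [
    ['name' => 'john',  'family_id' => 1],
    ['name' => 'jane',  'family_id' => 1],
    ['name' => 'john',  'family_id' => 2],
    ['name' => 'jane',  'family_id' => 2],
    ['name' => 'peter', 'family_id' => 2],
];

[$names, $lists] = buildNameLists($rows);
print_r($names); // john, jane
print_r($lists); // [1, 2] for each of them
```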

Basically, starting with the pairs, I keep for each name list an array of the families that all of its names belong to, then intersect it with every name that has not been tried yet.
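
The same idea can be sketched without the query library, working on in-memory arrays. findNameSets and the array keys are made-up names, and the data is invented. Note that this enumerates every qualifying subset, so the number of sets can grow quickly when many names share many families.

```php
<?php
// Hypothetical helper: grow name sets by intersecting family lists.
// $names[$i] is a first name, $lists[$i] the ids of the families it occurs in.
function findNameSets(array $names, array $lists): array
{
    $n = count($names);
    $queue = [];
    // Seed with single names; they get extended below but are never reported.
    for ($i = 0; $i < $n; $i++) {
        $queue[] = ['names' => [$names[$i]], 'families' => $lists[$i], 'last' => $i];
    }

    $found = [];
    // The queue grows while we walk it, hence the count() in the loop condition.
    for ($q = 0; $q < count($queue); $q++) {
        $cur = $queue[$q];
        // Only combine with later names, so each set is built exactly once.
        for ($j = $cur['last'] + 1; $j < $n; $j++) {
            $common = array_values(array_intersect($cur['families'], $lists[$j]));
            if (count($common) < 2) {
                continue; // shared by fewer than two families: no superset can qualify
            }
            $setNames = array_merge($cur['names'], [$names[$j]]);
            $queue[] = ['names' => $setNames, 'families' => $common, 'last' => $j];
            $found[] = [
                'names'       => $setNames,
                'occurrences' => count($common),
                'score'       => count($setNames) * count($common),
            ];
        }
    }

    // Highest score first; ties are broken by the higher occurrence count.
    usort($found, function (array $a, array $b): int {
        return [$b['score'], $b['occurrences']] <=> [$a['score'], $a['occurrences']];
    });
    return $found;
}

// Made-up data for illustration
$names = ['john', 'jane', 'ben'];
$lists = [[1, 2, 3], [1, 2, 3], [1, 2, 4]];

foreach (findNameSets($names, $lists) as $set) {
    echo implode('/', $set['names']) . "\t{$set['occurrences']}\t{$set['score']}\n";
}
```

Here john/jane (3 families, score 6) comes first, followed by john/jane/ben (2 families, score 6) and the two remaining pairs with score 4.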

守不住的情 2024-12-29 17:01:26

Will this work?

SELECT
    f.name AS 'surname',
    GROUP_CONCAT(DISTINCT p.name ORDER BY p.name) AS 'names',
    COUNT(DISTINCT p.name) AS 'distinct_names',
    COUNT(p.id) AS 'occurrences',
    COUNT(DISTINCT p.name) * COUNT(p.id) AS 'score'
FROM
    families f
    LEFT JOIN people p ON ( f.id = p.family_id )
GROUP BY
    f.id
ORDER BY
    f.name
