PostgreSQL 中多个表的多个总和/计数

发布于 2024-11-26 19:06:10 字数 2085 浏览 3 评论 0原文

我已经在这个网站上搜索了一些建议，但还没有完全得到我想要的东西。我怀疑我只是缺少一个语法/标点符号问题。

我使用 phpPgAdmin 开发一个数据库，该数据库跟踪与正在研究的狒狒种群相关的大量信息。我正在尝试进行查询，以确定对于每只狒狒，我们为它们收集了多少不同类型的组织样本，以及我们为每只狒狒收集了多少不同类型的 DNA 样本。有三个表与我的问题：

表：“传记”包含有关该组中所有动物的基本信息，尽管名称是我在这里关心的全部。

name | birth
-----+-----------
A21  | 1968-07-01
AAR  | 2002-03-30
ABB  | 1998-09-10
ABD  | 2005-03-15
ABE  | 1986-01-01

表：“babtissue”跟踪多年来收集的不同组织的信息，包括以下三列。该表中的一些行代表我们不再拥有的组织样本，但仍然在数据库的其他地方被引用，因此“可用”列帮助我们筛选我们仍然拥有的样本。

name | sample_type | avail
-----+-------------+------
A21  | BLOOD       | Y
A21  | BLOOD       | Y
A21  | TISSUE      | N
ABB  | BLOOD       | Y
ABB  | TISSUE      | Y

表：“dna”与 babtissue 类似。

name | sample_type | avail
-----+-------------+------
ABB  | GDNA        | N
ABB  | WGA         | Y
ACC  | WGA         | N
ALE  | GDNA        | Y
ALE  | GDNA        | Y

总而言之，我正在尝试编写一个查询，该查询将返回传记中的每个名字，并在一列中告诉我每个人有多少个“BLOOD”、“TISSUE”、“GDNA”和“WGA”样本。就像...

name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21  | 2          | 0           | 0     | 0    | ?
AAR  | 0          | 0           | 0     | 0    | ?
ABB  | 1          | 1           | 0     | 1    | ?
ACC  | 0          | 0           | 0     | 0    | ?
ALE  | 0          | 0           | 2     | 0    | ?

（对上面奇怪的格式表示歉意，我不太熟悉这种写法）

我尝试过的查询的最新版本：

select b.name,  
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,   
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,   
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,  
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas  
from biograph b  
left join babtissue t on b.name=t.name  
left join dna d on b.name=d.name  
where b.name is not NULL  
group by b.name  
order by b.name

这样做时我没有收到任何错误，但是我知道它给我的数字是错误的——太高了。我认为这与我使用多个联接有关，并且我的联接语法需要更改一些内容。

有什么想法吗？

原文

I've searched through several suggestions on this site and haven't quite been able to get what I'm after. I suspect there's just a syntax/punctuation issue that I'm just missing.

I work on a database using phpPgAdmin that tracks lots of information related to a population of baboons being studied. I'm trying to make a query to identify, for each individual baboon, how many tissue samples of different types we have collected for them and how many DNA samples we have of different types for each of them There are three tables that are pertinent to my problem:

Table: "biograph" has basic info about all the animals in the group, though the name is all I care about here.

name | birth
-----+-----------
A21  | 1968-07-01
AAR  | 2002-03-30
ABB  | 1998-09-10
ABD  | 2005-03-15
ABE  | 1986-01-01

Table: "babtissue" tracks information, including the below three columns, about different tissues that have been collected over the years. Some lines in this table represent tissue samples that we no longer have, but are still referred to elsewhere in the database, so the "avail" column helps us screen for samples that we still have around.

name | sample_type | avail
-----+-------------+------
A21  | BLOOD       | Y
A21  | BLOOD       | Y
A21  | TISSUE      | N
ABB  | BLOOD       | Y
ABB  | TISSUE      | Y

Table: "dna" is similar to babtissue.

name | sample_type | avail
-----+-------------+------
ABB  | GDNA        | N
ABB  | WGA         | Y
ACC  | WGA         | N
ALE  | GDNA        | Y
ALE  | GDNA        | Y

Altogether, I'm trying to write a query that will return every name from biograph and tells me in one column how many 'BLOOD', 'TISSUE', 'GDNA', and 'WGA' samples I have for each individual. Something like...

name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21  | 2          | 0           | 0     | 0    | ?
AAR  | 0          | 0           | 0     | 0    | ?
ABB  | 1          | 1           | 0     | 1    | ?
ACC  | 0          | 0           | 0     | 0    | ?
ALE  | 0          | 0           | 2     | 0    | ?

(Apologies for the weird formatting above, I'm not very familiar with writing this way)

The latest version of the query that I've tried:

select b.name,  
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,   
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,   
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,  
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas  
from biograph b  
left join babtissue t on b.name=t.name  
left join dna d on b.name=d.name  
where b.name is not NULL  
group by b.name  
order by b.name

I don't receive any errors when doing it this way, but I know the numbers it gives me are wrong--too high. I figure this has something to do with my use of more than one join, and that something about my join syntax needs to change.

Any ideas?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

那支青花 2024-12-03 19:06:10

数字太高，因为您要加入 babtissue，然后还要加入 dna，这会导致重复。

你可以尝试打破它。我不知道这个语法是否适用于您的数据库，但我相信它遵循 ANSI 标准，所以试一试......

SELECT
    SQ.name,
    SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
    SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
    SQ.gdnas,
    SQ.wgas
FROM
    (
    SELECT
        B.name,
        SUM(CASE WHEN D.sample_type = 'GDNA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
        SUM(CASE WHEN D.sample_type = 'WGA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
    FROM
        biograph B
    LEFT JOIN dna D ON D.name = B.name
    GROUP BY
        B.name
    ) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name

名称真的可以为 NULL 吗？

The numbers are too high because you're joining to babtissue and then also to dna, which is going to cause duplicates.

You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...

SELECT
    SQ.name,
    SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
    SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
    SQ.gdnas,
    SQ.wgas
FROM
    (
    SELECT
        B.name,
        SUM(CASE WHEN D.sample_type = 'GDNA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
        SUM(CASE WHEN D.sample_type = 'WGA' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
    FROM
        biograph B
    LEFT JOIN dna D ON D.name = B.name
    GROUP BY
        B.name
    ) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name

Can the name really be NULL?

回复收藏 0 原文

你与昨日 2024-12-03 19:06:10

我不知道“avail”列，但这应该为您提供您正在寻找的其他列：

SELECT  b.name,
        COALESCE (t.bloodsamps,  0) AS bloodsamps,
        COALESCE (t.tissuesamps, 0) AS tissuesamps
        COALESCE (d.gdnas, 0) AS gdnas 
        COALESCE (d.wgas,  0) AS wgas
    FROM biograph b
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'BLOOD'  THEN 1 ELSE 0 END) AS bloodsamps,
                SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
            FROM babtissue
            WHERE avail = 'Y'
            GROUP BY name
        ) t
        ON (t.name = b.name)
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
                SUM(CASE WHEN sample_type = 'WGA'  THEN 1 ELSE 0 END) AS wgas
            FROM dna
            WHERE avail = 'Y'
            GROUP BY name
        ) d
        ON (d.name = b.name)
;

I don't know about the "avail" column, but this should give you the other columns you're looking for:

SELECT  b.name,
        COALESCE (t.bloodsamps,  0) AS bloodsamps,
        COALESCE (t.tissuesamps, 0) AS tissuesamps
        COALESCE (d.gdnas, 0) AS gdnas 
        COALESCE (d.wgas,  0) AS wgas
    FROM biograph b
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'BLOOD'  THEN 1 ELSE 0 END) AS bloodsamps,
                SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
            FROM babtissue
            WHERE avail = 'Y'
            GROUP BY name
        ) t
        ON (t.name = b.name)
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
                SUM(CASE WHEN sample_type = 'WGA'  THEN 1 ELSE 0 END) AS wgas
            FROM dna
            WHERE avail = 'Y'
            GROUP BY name
        ) d
        ON (d.name = b.name)
;

回复收藏 0 原文

~没有更多了~