PostgreSQL 中多个表的多个总和/计数
我已经在这个网站上搜索了一些建议,但还没有完全得到我想要的东西。我怀疑我只是缺少一个语法/标点符号问题。
我使用 phpPgAdmin 开发一个数据库,该数据库跟踪与正在研究的狒狒种群相关的大量信息。我正在尝试进行查询,以确定对于每只狒狒,我们为它们收集了多少不同类型的组织样本,以及我们为每只狒狒收集了多少不同类型的 DNA 样本。有三个表与我的问题:
表:“传记”包含有关该组中所有动物的基本信息,尽管名称是我在这里关心的全部。
name | birth
-----+-----------
A21 | 1968-07-01
AAR | 2002-03-30
ABB | 1998-09-10
ABD | 2005-03-15
ABE | 1986-01-01
表:“babtissue”跟踪多年来收集的不同组织的信息,包括以下三列。该表中的一些行代表我们不再拥有的组织样本,但仍然在数据库的其他地方被引用,因此“可用”列帮助我们筛选我们仍然拥有的样本。
name | sample_type | avail
-----+-------------+------
A21 | BLOOD | Y
A21 | BLOOD | Y
A21 | TISSUE | N
ABB | BLOOD | Y
ABB | TISSUE | Y
表:“dna”与 babtissue 类似。
name | sample_type | avail
-----+-------------+------
ABB | GDNA | N
ABB | WGA | Y
ACC | WGA | N
ALE | GDNA | Y
ALE | GDNA | Y
总而言之,我正在尝试编写一个查询,该查询将返回传记中的每个名字,并在一列中告诉我每个人有多少个“BLOOD”、“TISSUE”、“GDNA”和“WGA”样本。就像...
name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21 | 2 | 0 | 0 | 0 | ?
AAR | 0 | 0 | 0 | 0 | ?
ABB | 1 | 1 | 0 | 1 | ?
ACC | 0 | 0 | 0 | 0 | ?
ALE | 0 | 0 | 2 | 0 | ?
(对上面奇怪的格式表示歉意,我不太熟悉这种写法)
我尝试过的查询的最新版本:
select b.name,
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas
from biograph b
left join babtissue t on b.name=t.name
left join dna d on b.name=d.name
where b.name is not NULL
group by b.name
order by b.name
这样做时我没有收到任何错误,但是我知道它给我的数字是错误的——太高了。我认为这与我使用多个联接有关,并且我的联接语法需要更改一些内容。
有什么想法吗?
I've searched through several suggestions on this site and haven't quite been able to get what I'm after. I suspect there's just a syntax/punctuation issue that I'm just missing.
I work on a database using phpPgAdmin that tracks lots of information related to a population of baboons being studied. I'm trying to make a query to identify, for each individual baboon, how many tissue samples of different types we have collected for them and how many DNA samples we have of different types for each of them There are three tables that are pertinent to my problem:
Table: "biograph" has basic info about all the animals in the group, though the name is all I care about here.
name | birth
-----+-----------
A21 | 1968-07-01
AAR | 2002-03-30
ABB | 1998-09-10
ABD | 2005-03-15
ABE | 1986-01-01
Table: "babtissue" tracks information, including the below three columns, about different tissues that have been collected over the years. Some lines in this table represent tissue samples that we no longer have, but are still referred to elsewhere in the database, so the "avail" column helps us screen for samples that we still have around.
name | sample_type | avail
-----+-------------+------
A21 | BLOOD | Y
A21 | BLOOD | Y
A21 | TISSUE | N
ABB | BLOOD | Y
ABB | TISSUE | Y
Table: "dna" is similar to babtissue.
name | sample_type | avail
-----+-------------+------
ABB | GDNA | N
ABB | WGA | Y
ACC | WGA | N
ALE | GDNA | Y
ALE | GDNA | Y
Altogether, I'm trying to write a query that will return every name from biograph and tells me in one column how many 'BLOOD', 'TISSUE', 'GDNA', and 'WGA' samples I have for each individual. Something like...
name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21 | 2 | 0 | 0 | 0 | ?
AAR | 0 | 0 | 0 | 0 | ?
ABB | 1 | 1 | 0 | 1 | ?
ACC | 0 | 0 | 0 | 0 | ?
ALE | 0 | 0 | 2 | 0 | ?
(Apologies for the weird formatting above, I'm not very familiar with writing this way)
The latest version of the query that I've tried:
select b.name,
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas
from biograph b
left join babtissue t on b.name=t.name
left join dna d on b.name=d.name
where b.name is not NULL
group by b.name
order by b.name
I don't receive any errors when doing it this way, but I know the numbers it gives me are wrong--too high. I figure this has something to do with my use of more than one join, and that something about my join syntax needs to change.
Any ideas?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
数字太高,因为您要加入
babtissue
,然后还要加入dna
,这会导致重复。你可以尝试打破它。我不知道这个语法是否适用于您的数据库,但我相信它遵循 ANSI 标准,所以试一试......
名称真的可以为 NULL 吗?
The numbers are too high because you're joining to
babtissue
and then also todna
, which is going to cause duplicates.You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...
Can the name really be NULL?
我不知道“avail”列,但这应该为您提供您正在寻找的其他列:
I don't know about the "avail" column, but this should give you the other columns you're looking for: