SQL 加速我的更新 - PostgresQL 中的 Left Join
我正在尝试使用两个数据帧之间共享的两个 id(patent_id、encounter_id)将一个数据帧连接到另一个数据帧。两个数据帧都在这些 id 上建立索引。
这是左侧:
tnx_prophy=# \d diagnosis
Table "public.diagnosis"
Column | Type | Collation | Nullable | Default
-------------------------------+------+-----------+----------+---------
patient_id | text | | |
encounter_id | text | | |
code_system | text | | |
code | text | | |
principal_diagnosis_indicator | text | | |
date | text | | |
Indexes:
"idx_pt_enc_dx" btree (patient_id, encounter_id)
这是右侧:
tnx_prophy=# \d encounter
Table "public.encounter"
Column | Type | Collation | Nullable | Default
--------------+------+-----------+----------+---------
encounter_id | text | | |
patient_id | text | | |
type | text | | |
enc_type | text | | |
Indexes:
"idx_pt_enc_enc" btree (patient_id, encounter_id)
数据集很大(~500m 行?),但我的 UPDATE 和 JOIN 函数似乎花费的时间比我想要的要长得多。是的,我想更新(不仅仅是生成临时表)
tnx_prophy=# ALTER TABLE diagnosis ADD COLUMN enc_type text;
ALTER TABLE
tnx_prophy=# UPDATE diagnosis
tnx_prophy-# SET enc_type = encounter.enc_type
tnx_prophy-# FROM encounter
tnx_prophy-# WHERE (diagnosis.patient_id, diagnosis.encounter_id) = (encounter.patient_id, encounter.encounter_id);
关于如何更快地完成此操作有什么建议吗?或者我是否明显搞乱了这里的语法?如果有人可以提供帮助,非常感谢!
I am trying to join one dataframe to another using two id's (patient_id, encounter_id) that are shared between them. Both dataframes are indexed on these ids.
Here is the LHS:
tnx_prophy=# \d diagnosis
Table "public.diagnosis"
Column | Type | Collation | Nullable | Default
-------------------------------+------+-----------+----------+---------
patient_id | text | | |
encounter_id | text | | |
code_system | text | | |
code | text | | |
principal_diagnosis_indicator | text | | |
date | text | | |
Indexes:
"idx_pt_enc_dx" btree (patient_id, encounter_id)
Here is the RHS:
tnx_prophy=# \d encounter
Table "public.encounter"
Column | Type | Collation | Nullable | Default
--------------+------+-----------+----------+---------
encounter_id | text | | |
patient_id | text | | |
type | text | | |
enc_type | text | | |
Indexes:
"idx_pt_enc_enc" btree (patient_id, encounter_id)
The datasets are large (~500m rows?), but my UPDATE and JOIN function seems to be taking much longer than I would like. And yes, I would like to update (not just generate a temporary table)
tnx_prophy=# ALTER TABLE diagnosis ADD COLUMN enc_type text;
ALTER TABLE
tnx_prophy=# UPDATE diagnosis
tnx_prophy-# SET enc_type = encounter.enc_type
tnx_prophy-# FROM encounter
tnx_prophy-# WHERE (diagnosis.patient_id, diagnosis.encounter_id) = (encounter.patient_id, encounter.encounter_id);
Any advice on how to do this faster? Or am I messing up the syntax here somehow obvious? Thanks a ton if anyone can help!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
结果:
仔细观察
Merge Join (cost=0.30..47.59 rows=7 width=140) (actual time=0.139..0.232 rows=8loops=1)
行:8 行已更新,但只有4条记录!!发生这种情况是因为您的查找表中的搜索键不是
唯一
。每条记录都会更新两次,(并且顺序未定义......)!Result:
Take a good look at the
Merge Join (cost=0.30..47.59 rows=7 width=140) (actual time=0.139..0.232 rows=8 loops=1)
line: 8 rows are updated, but there are only 4 records!! This happens because the search-key into your look up table is not
unique
. Every record is updated twice, (and the order is undefined ...)!