How do I join two tables on three conditions?

Posted 2024-11-25 07:46:21

I hope someone can figure out what I'm doing wrong here. The task seems simple enough, but it is apparently beyond me.

I have two tables, and I am trying to append precipitation data from one to the other. Three columns must match between the two tables to get the correct precipitation value for each field site: year, latitude, and longitude. I tried the following queries (among other failed attempts I can no longer remember):

SELECT f.*, g.* 
  FROM fieldSites f  LEFT OUTER JOIN gpcp_precipitation2 g
    ON f.date = g.year 
   AND f.d_lat = g.lat
 WHERE f.d_lon = g.lon; 

This one timed out.

And this one:

SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON (fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year);

This one also timed out.

Eventually I'd like to dump the result to a .csv file, but for now I just want a query that completes successfully.
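A minimal sketch of the dump-to-.csv step, once a query does complete. It assumes Python's standard `sqlite3` and `csv` modules with an in-memory SQLite database as a stand-in, since the post does not name the database or client; the toy rows, the helper name `dump_query_to_csv`, and the file name in the comment are hypothetical. The join uses `year`, the column name shown in the posted fieldSites table.

```python
import csv
import io
import sqlite3


def dump_query_to_csv(conn, sql, out):
    """Run `sql` on `conn`; write a header row and then every result row to `out`."""
    cur = conn.execute(sql)
    writer = csv.writer(out)
    # cursor.description has one entry per result column; item 0 is the column name
    writer.writerow([col[0] for col in cur.description])
    # Iterate the cursor directly so rows are written as they are fetched
    writer.writerows(cur)


# Toy stand-in tables (hypothetical data, trimmed to the join columns)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fieldSites (siteId INTEGER, d_lat REAL, d_lon REAL, year INTEGER);
    CREATE TABLE gpcp_precipitation2 (lat REAL, lon REAL, year INTEGER, precipitation REAL);
    INSERT INTO fieldSites VALUES (1, -13.75, 18.75, 2009);
    INSERT INTO gpcp_precipitation2 VALUES (-13.75, 18.75, 2009, 144.286);
""")

# In-memory buffer for the demo; for a real file use
# open("field_sites_precip.csv", "w", newline="") instead (hypothetical name).
buf = io.StringIO()
dump_query_to_csv(conn, """
    SELECT f.*, g.precipitation
    FROM fieldSites f
    LEFT OUTER JOIN gpcp_precipitation2 g
      ON  f.year  = g.year
      AND f.d_lat = g.lat
      AND f.d_lon = g.lon
""", buf)
print(buf.getvalue())
```

Writing rows straight from the cursor avoids building the whole result in memory first. If the server turns out to be MySQL, `SELECT ... INTO OUTFILE` is a server-side alternative, subject to the server's file privileges.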

Here are my tables:

left table: fieldSites

siteId  d_lat   d_lon   year  data1  data2  country
1       -13.75  18.75   2009  0.598  0.351  Angola
1       -13.75  18.75   2008  0.654  0.330  Angola
1       -13.75  18.75   2007  0.489  0.381  Angola
1       -13.75  18.75   2006  0.554  0.389  Angola
1       -13.75  18.75   2005  0.321  0.321  Angola
1       -13.75  18.75   2004  0.598  0.351  Angola
1       -13.75  18.75   2003  0.654  0.330  Angola
1       -13.75  18.75   2002  0.489  0.381  Angola
1       -13.75  18.75   2001  0.554  0.389  Angola
2       -78.75  163.75  2009  0.285  0.155  Antarctica
2       -78.75  163.75  2008  0.285  0.155  Antarctica
2       -78.75  163.75  2007  0.285  0.155  Antarctica
2       -78.75  163.75  2006  0.285  0.155  Antarctica
2       -78.75  163.75  2005  0.285  0.155  Antarctica
... 1052 sites, 11 years, 11496 rows

right table: gpcp_precipitation2

siteId   lat    lon   year  precipitation
1        81.5   1.25  2009  93.36571912   
1        81.5   1.25  2008  93.36571912   
1        81.5   1.25  2007  93.36571912   
1        81.5   1.25  2006  93.36571912   
1        81.5   1.25  2005  93.36571912   
1        81.5   1.25  2004  93.36571912   
1        81.5   1.25  2003  93.36571912   
1        81.5   1.25  2002  93.36571912   
1        81.5   1.25  2001  93.36571912   
1        81.5   1.25  2000  93.36571912   
1        81.5   3.75  2009  93.36571912 
1        81.5   3.75  2008  93.36571912   
1        81.5   3.75  2007  93.36571912

... 92300 rows  

What I want is this:

siteId  d_lat   d_lon   year  data1  data2  country     precipitation
1       -13.75  18.75   2009  0.598  0.351  Angola      144.286
1       -13.75  18.75   2008  0.654  0.330  Angola      114.970
1       -13.75  18.75   2007  0.489  0.381  Angola      70.000
1       -13.75  18.75   2006  0.554  0.389  Angola      174.179
1       -13.75  18.75   2005  0.321  0.321  Angola      174.743
1       -13.75  18.75   2004  0.598  0.351  Angola      70.506
1       -13.75  18.75   2003  0.654  0.330  Angola      173.716
1       -13.75  18.75   2002  0.489  0.381  Angola      74.162
1       -13.75  18.75   2001  0.554  0.389  Angola      139.445
2       -78.75  163.75  2009  0.285  0.155  Antarctica  0
2       -78.75  163.75  2008  0.285  0.155  Antarctica  0
2       -78.75  163.75  2007  0.285  0.155  Antarctica  0
2       -78.75  163.75  2006  0.285  0.155  Antarctica  0
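The desired output keeps every fieldSites row and adds one precipitation column, which is what a LEFT OUTER JOIN gives when all three conditions sit in the ON clause. The sketch below uses in-memory SQLite as a stand-in with hypothetical toy rows (the two Angola values are taken from the desired output; there is deliberately no grid row for the Antarctica site) and compares that with the shape of the first query, where the longitude test sits in WHERE. Unmatched rows come back with NULLs, the WHERE test then discards them, and the outer join effectively becomes an inner join.

```python
import sqlite3

# Toy stand-in data (hypothetical): two Angola site-years whose precipitation values
# come from the desired output, one Antarctica site-year with no matching grid row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fieldSites (siteId INTEGER, d_lat REAL, d_lon REAL, year INTEGER, country TEXT);
    CREATE TABLE gpcp_precipitation2 (lat REAL, lon REAL, year INTEGER, precipitation REAL);
    INSERT INTO fieldSites VALUES
        (1, -13.75, 18.75, 2009, 'Angola'),
        (1, -13.75, 18.75, 2008, 'Angola'),
        (2, -78.75, 163.75, 2009, 'Antarctica');
    INSERT INTO gpcp_precipitation2 VALUES
        (-13.75, 18.75, 2009, 144.286),
        (-13.75, 18.75, 2008, 114.970),
        (81.5, 1.25, 2009, 93.36571912);
""")

# All three match conditions in ON: every fieldSites row is kept,
# and rows without a matching grid cell get NULL precipitation.
ALL_IN_ON = """
    SELECT f.siteId, f.year, f.country, g.precipitation
    FROM fieldSites f
    LEFT OUTER JOIN gpcp_precipitation2 g
      ON  f.year  = g.year
      AND f.d_lat = g.lat
      AND f.d_lon = g.lon
    ORDER BY f.siteId, f.year DESC
"""

# Same join with the longitude test in WHERE, as in the first query in the question.
# For unmatched rows g.lon is NULL, so "f.d_lon = g.lon" is not true and the row is dropped.
LON_IN_WHERE = """
    SELECT f.siteId, f.year, f.country, g.precipitation
    FROM fieldSites f
    LEFT OUTER JOIN gpcp_precipitation2 g
      ON  f.year  = g.year
      AND f.d_lat = g.lat
    WHERE f.d_lon = g.lon
    ORDER BY f.siteId, f.year DESC
"""

kept = conn.execute(ALL_IN_ON).fetchall()
filtered = conn.execute(LON_IN_WHERE).fetchall()
print(len(kept), kept[-1])  # 3 (2, 2009, 'Antarctica', None)
print(len(filtered))        # 2
```

If unmatched sites should show 0 instead of NULL, as the Antarctica rows in the desired output do, selecting `COALESCE(g.precipitation, 0)` would produce that, assuming 0 is an acceptable stand-in for "no data" here.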

Am I doing something completely stupid? I am stumped.
Thanks so much for any advice.

Comments (3)

音盲 2024-12-02 07:46:21
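Select fieldSites.*, precipitation.*
From fieldSites
-- Assumes siteId identifies the same location in both tables; in the posted sample,
-- siteId 1 is in Angola in fieldSites but at lat 81.5 in gpcp_precipitation2.
Inner Join gpcp_precipitation2 As precipitation On precipitation.siteId = fieldSites.siteId
Where
    fieldSites.year = precipitation.year And  -- the posted fieldSites table names this column year
    fieldSites.d_lat = precipitation.lat And
    fieldSites.d_lon = precipitation.lon;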

If that query times out, you have an indexing problem, not necessarily a query problem. This gives you several predicates in the WHERE clause to filter on, so it should cut the join down considerably, but you may need an index that covers siteId, year, lat, and lon on both tables.
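For an INNER JOIN, moving conditions between ON and WHERE does not change which rows come back, and optimizers typically plan the two forms the same way, so if the query still times out, the indexes are the place to look, as this answer says. A small check with in-memory SQLite as a stand-in and hypothetical toy rows (using `year`, the fieldSites column name from the posted table, and leaving out the siteId condition):

```python
import sqlite3

# Hypothetical toy rows; SQLite stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fieldSites (siteId INTEGER, d_lat REAL, d_lon REAL, year INTEGER);
    CREATE TABLE gpcp_precipitation2 (lat REAL, lon REAL, year INTEGER, precipitation REAL);
    INSERT INTO fieldSites VALUES (1, -13.75, 18.75, 2009), (1, -13.75, 18.75, 2008),
                                  (2, -78.75, 163.75, 2009);
    INSERT INTO gpcp_precipitation2 VALUES (-13.75, 18.75, 2009, 144.286),
                                           (-13.75, 18.75, 2008, 114.970),
                                           (81.5, 1.25, 2009, 93.36571912);
""")

# All three conditions in ON
IN_ON = """
    SELECT f.siteId, f.year, g.precipitation
    FROM fieldSites f
    INNER JOIN gpcp_precipitation2 g
      ON f.year = g.year AND f.d_lat = g.lat AND f.d_lon = g.lon
    ORDER BY f.siteId, f.year
"""

# Only year in ON, the coordinates in WHERE
IN_WHERE = """
    SELECT f.siteId, f.year, g.precipitation
    FROM fieldSites f
    INNER JOIN gpcp_precipitation2 g
      ON f.year = g.year
    WHERE f.d_lat = g.lat AND f.d_lon = g.lon
    ORDER BY f.siteId, f.year
"""

rows_on = conn.execute(IN_ON).fetchall()
rows_where = conn.execute(IN_WHERE).fetchall()
# For an inner join, predicate placement does not change the result set
print(rows_on == rows_where, len(rows_on))  # True 2
```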

-小熊_ 2024-12-02 07:46:21

Add an index on (date, d_lat, d_lon) to the first table and a (year, lat, lon) index to the second table. Then, try the joins.

From your comments, I suggest you use the second query:

SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON  fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year ;

Now that you have added some indexes, can you also post the query plan for the above query? (Use EXPLAIN SELECT ... )
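A sketch of the suggested composite indexes plus a plan check, using in-memory SQLite as a stand-in: SQLite spells the command `EXPLAIN QUERY PLAN`, and its output format differs from MySQL's `EXPLAIN`. The index names are hypothetical, and the fieldSites index uses `year` rather than `date` because that is the column name in the posted table. With the indexes in place, the plan should name one of them for the inner side of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fieldSites (siteId INTEGER, d_lat REAL, d_lon REAL, year INTEGER);
    CREATE TABLE gpcp_precipitation2 (lat REAL, lon REAL, year INTEGER, precipitation REAL);
""")

JOIN_SQL = """
    SELECT *
    FROM fieldSites
    INNER JOIN gpcp_precipitation2
      ON  fieldSites.d_lon = gpcp_precipitation2.lon
      AND fieldSites.d_lat = gpcp_precipitation2.lat
      AND fieldSites.year  = gpcp_precipitation2.year
"""


def plan(sql):
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail text
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(JOIN_SQL)

# Composite indexes on the join columns (hypothetical names)
conn.executescript("""
    CREATE INDEX idx_fs_year_lat_lon   ON fieldSites (year, d_lat, d_lon);
    CREATE INDEX idx_gpcp_year_lat_lon ON gpcp_precipitation2 (year, lat, lon);
""")
after = plan(JOIN_SQL)

print("before:", before)
print("after: ", after)
```

Putting `year` first is one reasonable choice; since all three columns are compared with equality, any column order in the index supports the lookup.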

我们只是彼此的过ke 2024-12-02 07:46:21
SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON (fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year);

The last line:

AND fieldSites.date = gpcp_precipitation2.year);

According to your tables, this should be fieldSites.year. Is that a typo, or is that the error?
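One quick way to settle the typo question is to list the table's real columns. Against the schema as posted, a reference to `fieldSites.date` is rejected before anything runs, because SQL engines report a missing column as an error rather than a timeout. If the real query does run long, the actual table may well have a `date` column after all. A sketch with in-memory SQLite as a stand-in, using the column list from the posted tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema as posted in the question: the year column is called "year", not "date"
conn.execute("CREATE TABLE fieldSites (siteId INTEGER, d_lat REAL, d_lon REAL, year INTEGER, "
             "data1 REAL, data2 REAL, country TEXT)")
conn.execute("CREATE TABLE gpcp_precipitation2 (siteId INTEGER, lat REAL, lon REAL, "
             "year INTEGER, precipitation REAL)")

# PRAGMA table_info returns one row per column; field 1 is the column name
columns = [row[1] for row in conn.execute("PRAGMA table_info(fieldSites)")]
print(columns)

error = None
try:
    conn.execute("""
        SELECT * FROM fieldSites
        INNER JOIN gpcp_precipitation2
          ON  fieldSites.d_lon = gpcp_precipitation2.lon
          AND fieldSites.d_lat = gpcp_precipitation2.lat
          AND fieldSites.date  = gpcp_precipitation2.year
    """)
except sqlite3.OperationalError as exc:
    # sqlite3 reports a "no such column" error here instead of running the join
    error = str(exc)
print(error)
```

In MySQL, `DESCRIBE fieldSites;` (or `SHOW COLUMNS FROM fieldSites;`) lists the same information.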
