大表上的 NATURAL JOIN

发布于 2024-12-09 15:34:48 字数 899 浏览 0 评论 0原文

我正在两个大表上执行简单的自然连接。

  • Polygons 包含 68,000 行 (45 MB)
  • RoadsHydro 包含约 200 万行 (210 MB) 。

这是否意味着数据库引擎在内部执行自然连接时会生成 68,000*200 万行的数据集?如果是这样,那么所需的内存量必须是 45*210 MB,这比我的系统只有 1.5 GB 的内存量要大得多。

当我执行此查询时,5 分钟后我的系统崩溃(突然关闭)。它不能处理数据库中 250 MB 的数据吗?那么数据库有什么用呢?

"I am modifying the above Question to clear the doubts of readers. 29-02-2012 today."

似乎我的许多朋友都感到困惑,因为我在上面的问题中提到了“自然连接”一词。我使用的真实空间查询是:

select p.OID , r.OID
    from poygons as p , roadshydro as r
                Where st_intersects(p.the_geom , r.the_geom) ;

其中多边形&每个roadsHydro 表都有两个字段:OID、the_geom。显然,它是两个表的叉积,而不是某个公共键上的自然连接。

当我执行上述查询时,我监视主内存消耗。什么也没有发生。没有丝毫的内存消耗,也没有得到任何输出,但 CPU 使用率几乎是 100%。看来数据库根本没有进行任何计算。但是,如果我从查询中删除 where 子句,主内存消耗会逐渐变得过高(5-6 分钟后),导致系统崩溃和机器突然关闭。这就是我正在经历的。删除 where 子句有什么特别之处?为什么 postgres 无法执行查询!对这种行为感到惊讶。

I am performing a simple natural join on two big tables.

  • polygons contains 68,000 rows (45 MB)
  • roadshydro contains about 2 million rows(210 MB) .

Does that mean that the database engine makes a data set of 68,000*2 million rows while performing natural join internally? If so, then the amount of memory required must be 45*210 MB which is much larger than what my system has, which is only 1.5 GB.

When I executed this Query, after 5 minutes my system crashes (abrupty shuts down). Can't it handle 250 MB of data on the database? What good for are databases then?

"I am modifying the above Question to clear the doubts of readers. 29-02-2012 today."

It seems many of my friends got confused because i mention 'natural join' word in the Question above. The real spatial Query i was using is :

select p.OID , r.OID
    from poygons as p , roadshydro as r
                Where st_intersects(p.the_geom , r.the_geom) ;

where polygons & roadshydro tables each has two fields : OID , the_geom . Clearly , it is a cross product of two tables and not Natural Join on some common key.

I monitor the main memory consumption When i execute the above query. It happens nothing. There is not a slightest amount of memory consumption , neither i get any output ever but CPU usage is almost 100%. It seems database isnt doing any computation at all. However , if i remove the where clause from the query , the Main memory consumption gradually goes too high (after 5-6 minutes ) resulting into system crash and machine abruptly shut down. This is what i am experiencing. What so special about removing the where clause? why postgres is failing to execute the query !! Surprised at this behaviour.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

夏夜暖风 2024-12-16 15:34:48

使用 NATURAL JOIN 构造没有什么意义。话虽如此,如果联接与两个表中的每个记录都匹配,您描述的查询只会生成两个表的乘积。

仅当两个表中的每个记录都有一个具有相同名称和相同值的字段时才会发生这种情况 - 这极不可能,但逻辑上并非不可能,或者如果两个表中没有与名称匹配的字段。

如果我是你,我会放弃NATURAL JOIN,转而使用简单的JOIN,指定您想要匹配的字段。

如果这能解决崩溃问题,那么一切都很好,但如果确实如此,我会感到惊讶。

There is very little point in using the NATURAL JOIN construct. That having been said, the query you describe would only produce the product of the two tables if the join matched every record in both tables.

That would only happen if there was a field in both tables with the same name and the same value for every record - this is extrememly unlikely, but not logically impossible OR if there are no fields in the 2 tables that match on name.

If I were you I would discard the NATURAL JOIN in favour of a plain JOIN, specifying the fields you want to match.

If that solves the crashing then all well and good, but it would be a surprise to me if it did.

泛滥成性 2024-12-16 15:34:48

它实际上取决于许多不同的因素,但最重要的是您正在使用的 DBMS 及其配置。

但为了消除最大的误解:DBMS 不必将所有行保存在内存中:它可以写入临时表(在硬盘上)并为您提供结果......慢慢地......所以如果它崩溃了,这不正常。

话又说回来,你为什么要求 68k*2M 行?那是 136,000,000,000 行!您确定不想直接连接某个键吗?

It really depends on many different factors, but most of all on the DBMS you are using and its configuration.

But to clear out the biggest misunderstanding: the DBMS does not have to hold all the rows in memory: it can write to a temporary table (on the harddisk) and serve you the result... slowly... so if it's crashing, that is not normal.

Then again, why are you asking 68k*2M rows? That is 136,000,000,000 rows! You sure you don't want a straight join on some key instead?

千柳 2024-12-16 15:34:48

由于我对这篇文章的评论受到了批评,我准备了一个例子来说明我对此主题的看法。

以下 Oracle 脚本说明了我认为使用 NATURAL JOIN 构造所固有的危险。我承认这是一个人为的例子,但为了防御性发展,我相信我的立场是正确的。

DROP TABLE TABLE1;
DROP TABLE TABLE2;

CREATE TABLE TABLE1 (
FIELD1   VARCHAR2(10),
FIELD2   VARCHAR2(10),
DESCR_T1 VARCHAR2(20)
);

CREATE TABLE TABLE2 (
FIELD1   VARCHAR2(10),
FIELD2   VARCHAR2(10),
DESCR_T2 VARCHAR2(20)
);

INSERT INTO TABLE1 VALUES('AAA','AAA',    'AAA_AAA_T1'   );
INSERT INTO TABLE1 VALUES('BBB','BBB',    'BBB_BBB_T1'   );
INSERT INTO TABLE1 VALUES('CCC','T1_CCC', 'CCC_T1_CCC_T1');
INSERT INTO TABLE1 VALUES('DDD','T1_DDD', 'DDD_T1_DDD_T1');
INSERT INTO TABLE1 VALUES('EEE',NULL    , 'EEE_NULL_T1'  );

INSERT INTO TABLE2 VALUES('AAA','AAA',    'AAA_AAA_T2'   );
INSERT INTO TABLE2 VALUES('BBB','BBB',    'BBB_BBB_T2'   );
INSERT INTO TABLE2 VALUES('CCC','T2_CCC', 'CCC_T1_CCC_T2');
INSERT INTO TABLE2 VALUES('DDD','T2_DDD', 'DDD_T1_DDD_T2');
INSERT INTO TABLE2 VALUES('EEE',NULL    , 'EEE_NULL_T2'  );

COMMIT;

-- try the following queries and review the results

SELECT 
  FIELD1, DESCR_T1, DESCR_T2 
FROM 
  TABLE1 NATURAL JOIN TABLE2;

SELECT 
  * 
FROM 
  TABLE1 NATURAL JOIN TABLE2;

SELECT 
  TABLE1.FIELD1, TABLE1.DESCR_T1, TABLE2.DESCR_T2 
FROM 
  TABLE1 JOIN 
    TABLE2 ON 
      TABLE2.FIELD1 = TABLE1.FIELD1 AND 
      TABLE2.FIELD2 = TABLE1.FIELD2;

SELECT * FROM 
  TABLE1 NATURAL JOIN TABLE2;

-- Issue the following statement then retry the previous 3 statements.
-- The 'NJs' silently change behaviour and produce radically different results
-- whereas the third requires hands-on attention.  I believe this third behaviour
-- is desirable.  (You could equally drop the column TABLE2.FIELD2 as dportas 
-- has suggested

-- ALTER TABLE TABLE2 RENAME COLUMN FIELD2 TO T2_FIELD2;

As I have been criticised for my comments on this post, I have prepared an example to illustrate my opinion on the subject.

The following Oracle script is an illustration of what I think is the danger inherent in the use of the NATURAL JOIN construct. I accept it is a contrived example but in the interests of defensive development I believe my position holds true.

DROP TABLE TABLE1;
DROP TABLE TABLE2;

CREATE TABLE TABLE1 (
FIELD1   VARCHAR2(10),
FIELD2   VARCHAR2(10),
DESCR_T1 VARCHAR2(20)
);

CREATE TABLE TABLE2 (
FIELD1   VARCHAR2(10),
FIELD2   VARCHAR2(10),
DESCR_T2 VARCHAR2(20)
);

INSERT INTO TABLE1 VALUES('AAA','AAA',    'AAA_AAA_T1'   );
INSERT INTO TABLE1 VALUES('BBB','BBB',    'BBB_BBB_T1'   );
INSERT INTO TABLE1 VALUES('CCC','T1_CCC', 'CCC_T1_CCC_T1');
INSERT INTO TABLE1 VALUES('DDD','T1_DDD', 'DDD_T1_DDD_T1');
INSERT INTO TABLE1 VALUES('EEE',NULL    , 'EEE_NULL_T1'  );

INSERT INTO TABLE2 VALUES('AAA','AAA',    'AAA_AAA_T2'   );
INSERT INTO TABLE2 VALUES('BBB','BBB',    'BBB_BBB_T2'   );
INSERT INTO TABLE2 VALUES('CCC','T2_CCC', 'CCC_T1_CCC_T2');
INSERT INTO TABLE2 VALUES('DDD','T2_DDD', 'DDD_T1_DDD_T2');
INSERT INTO TABLE2 VALUES('EEE',NULL    , 'EEE_NULL_T2'  );

COMMIT;

-- try the following queries and review the results

SELECT 
  FIELD1, DESCR_T1, DESCR_T2 
FROM 
  TABLE1 NATURAL JOIN TABLE2;

SELECT 
  * 
FROM 
  TABLE1 NATURAL JOIN TABLE2;

SELECT 
  TABLE1.FIELD1, TABLE1.DESCR_T1, TABLE2.DESCR_T2 
FROM 
  TABLE1 JOIN 
    TABLE2 ON 
      TABLE2.FIELD1 = TABLE1.FIELD1 AND 
      TABLE2.FIELD2 = TABLE1.FIELD2;

SELECT * FROM 
  TABLE1 NATURAL JOIN TABLE2;

-- Issue the following statement then retry the previous 3 statements.
-- The 'NJs' silently change behaviour and produce radically different results
-- whereas the third requires hands-on attention.  I believe this third behaviour
-- is desirable.  (You could equally drop the column TABLE2.FIELD2 as dportas 
-- has suggested

-- ALTER TABLE TABLE2 RENAME COLUMN FIELD2 TO T2_FIELD2;
荆棘i 2024-12-16 15:34:48

扩展 Hugh 的示例数据,这里是两个 NATURAL JOIN 查询的示例。希望可以看出,这些对于 Hugh 描述的问题是“安全的”,并且 NJ 版本实际上比 INNER JOIN 版本更简洁(并且在我看来更具可读性)。

SELECT *
FROM 
(SELECT FIELD1, DESCR_T1 FROM TABLE1) T1
NATURAL JOIN
(SELECT FIELD1, DESCR_T2 FROM TABLE2) T2;

SELECT * 
FROM 
(SELECT FIELD1, FIELD2, DESCR_T1 FROM TABLE1) T1
NATURAL JOIN
(SELECT FIELD1, FIELD2, DESCR_T2 FROM TABLE2) T2;

除非你编写马虎的代码,否则休所说的问题并不存在。如果你确实编写了马虎的代码,那么 INNER JOIN 也是“不安全的”。这种交流也许确实说明了自然连接并不总是能被很好地理解。这可能是一些人对他们产生无理怀疑的原因。

Extending Hugh's example data, here is an example of two NATURAL JOIN queries. Hopefully it can be seen that these are "safe" from the problem that Hugh described and that the NJ version is actually less verbose (and in my opinion more readable) than the INNER JOIN version.

SELECT *
FROM 
(SELECT FIELD1, DESCR_T1 FROM TABLE1) T1
NATURAL JOIN
(SELECT FIELD1, DESCR_T2 FROM TABLE2) T2;

SELECT * 
FROM 
(SELECT FIELD1, FIELD2, DESCR_T1 FROM TABLE1) T1
NATURAL JOIN
(SELECT FIELD1, FIELD2, DESCR_T2 FROM TABLE2) T2;

The problem Hugh is talking about does not exist unless you write sloppy code. If you do write sloppy code then INNER JOIN is "unsafe" too. What this exchange maybe does illustrate is that natural joins are not always well understood. That might be a reason why some people are unreasonably suspicious of them.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文