1 前言
今天在生产中碰到了一个让我十分费解的 SQL,十分有趣。
2 现象
SQL 很好复现,就是逻辑看起来有点唬人
postgres=# create table test(id1 int,id2 int);CREATE TABLEpostgres=# insert into test values(1,3),(2,1),(3,1),(3,3);INSERT 0 4postgres=# select * from test;id1 | id2-----+-----1 | 32 | 13 | 13 | 3(4 rows)postgres=# create table test(id1 int,id2 int); CREATE TABLE postgres=# insert into test values(1,3),(2,1),(3,1),(3,3); INSERT 0 4 postgres=# select * from test; id1 | id2 -----+----- 1 | 3 2 | 1 3 | 1 3 | 3 (4 rows)postgres=# create table test(id1 int,id2 int); CREATE TABLE postgres=# insert into test values(1,3),(2,1),(3,1),(3,3); INSERT 0 4 postgres=# select * from test; id1 | id2 -----+----- 1 | 3 2 | 1 3 | 1 3 | 3 (4 rows)
业务 SQL 如下?? 此处用 test 表替代,真实情况表中字段存在一个父子关系,根据 parent_id 查找子 id
postgres=# select (exists (select 1 as one from test a where (test.id1 = a.id2))) as b from test;b---tftt(4 rows)postgres=# explain select (exists (select 1 as one from test a where (test.id1 = a.id2))) as b from test;QUERY PLAN--------------------------------------------------------------Seq Scan on test (cost=0.00..3.14 rows=4 width=1)SubPlan 2-> Seq Scan on test a (cost=0.00..1.04 rows=4 width=4)(3 rows)postgres=# select (exists (select 1 as one from test a where (test.id1 = a.id2))) as b from test; b --- t f t t (4 rows) postgres=# explain select (exists (select 1 as one from test a where (test.id1 = a.id2))) as b from test; QUERY PLAN -------------------------------------------------------------- Seq Scan on test (cost=0.00..3.14 rows=4 width=1) SubPlan 2 -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=4) (3 rows)postgres=# select (exists (select 1 as one from test a where (test.id1 = a.id2))) as b from test; b --- t f t t (4 rows) postgres=# explain select (exists (select 1 as one from test a where (test.id1 = a.id2))) as b from test; QUERY PLAN -------------------------------------------------------------- Seq Scan on test (cost=0.00..3.14 rows=4 width=1) SubPlan 2 -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=4) (3 rows)
SQL 是 self-join ,a 是 test 表的一个别名。
让我们把子查询单独摘出来执行一下
postgres=# select 1 as one from test a where (test.id1 = a.id2);ERROR: invalid reference to FROM-clause entry for table "test"LINE 1: select 1 as one from test a where (test.id1 = a.id2);^HINT: Perhaps you meant to reference the table alias "a".postgres=# select 1 as one from test a where (test.id1 = a.id2); ERROR: invalid reference to FROM-clause entry for table "test" LINE 1: select 1 as one from test a where (test.id1 = a.id2); ^ HINT: Perhaps you meant to reference the table alias "a".postgres=# select 1 as one from test a where (test.id1 = a.id2); ERROR: invalid reference to FROM-clause entry for table "test" LINE 1: select 1 as one from test a where (test.id1 = a.id2); ^ HINT: Perhaps you meant to reference the table alias "a".
可以看到报错了,说明此处的 test 是取自外层的 test(即 from test),根据 test.id1 去判断 a.id2,于是返回如下结果
postgres=# select * from test;id1 | id2-----+-----1 | 3 ---true (id1=1,id2里面有,遍历)2 | 1 ---false(id1=2,id2里面没有,遍历)3 | 1 ---true (id1=3,id2里面有,遍历)3 | 3 ---true (id1=3,id2里面有,遍历)(4 rows)postgres=# select * from test; id1 | id2 -----+----- 1 | 3 ---true (id1=1,id2里面有,遍历) 2 | 1 ---false(id1=2,id2里面没有,遍历) 3 | 1 ---true (id1=3,id2里面有,遍历) 3 | 3 ---true (id1=3,id2里面有,遍历) (4 rows)postgres=# select * from test; id1 | id2 -----+----- 1 | 3 ---true (id1=1,id2里面有,遍历) 2 | 1 ---false(id1=2,id2里面没有,遍历) 3 | 1 ---true (id1=3,id2里面有,遍历) 3 | 3 ---true (id1=3,id2里面有,遍历) (4 rows)
现在让我们改写一下 SQL,修改一下别名
postgres=# select (exists (select 1 as one from test a where (a.id1 = test.id2))) as b from test;b---tttt(4 rows)postgres=# explain select (exists (select 1 as one from test a where (a.id1 = test.id2))) as b from test;QUERY PLAN--------------------------------------------------------------Seq Scan on test (cost=0.00..5.24 rows=4 width=1)SubPlan 2-> Seq Scan on test a (cost=0.00..1.04 rows=4 width=4)(3 rows)postgres=# select (exists (select 1 as one from test a where (a.id1 = test.id2))) as b from test; b --- t t t t (4 rows) postgres=# explain select (exists (select 1 as one from test a where (a.id1 = test.id2))) as b from test; QUERY PLAN -------------------------------------------------------------- Seq Scan on test (cost=0.00..5.24 rows=4 width=1) SubPlan 2 -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=4) (3 rows)postgres=# select (exists (select 1 as one from test a where (a.id1 = test.id2))) as b from test; b --- t t t t (4 rows) postgres=# explain select (exists (select 1 as one from test a where (a.id1 = test.id2))) as b from test; QUERY PLAN -------------------------------------------------------------- Seq Scan on test (cost=0.00..5.24 rows=4 width=1) SubPlan 2 -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=4) (3 rows)
这次可以看到,结果全部是真。老样子,也是相同的原理
postgres=# select 1 as one from test a where (a.id1 = test.id2);ERROR: invalid reference to FROM-clause entry for table "test"LINE 1: select 1 as one from test a where (a.id1 = test.id2);^HINT: Perhaps you meant to reference the table alias "a".postgres=# select 1 as one from test a where (a.id1 = test.id2); ERROR: invalid reference to FROM-clause entry for table "test" LINE 1: select 1 as one from test a where (a.id1 = test.id2); ^ HINT: Perhaps you meant to reference the table alias "a".postgres=# select 1 as one from test a where (a.id1 = test.id2); ERROR: invalid reference to FROM-clause entry for table "test" LINE 1: select 1 as one from test a where (a.id1 = test.id2); ^ HINT: Perhaps you meant to reference the table alias "a".
于是根据 test.id2 去探测 a.id1,于是返回如下结果
postgres=# select * from test;id1 | id2-----+-----1 | 3 ---true (id2=3,id1里面有,遍历)2 | 1 ---true (id2=1,id1里面有,遍历)3 | 1 ---true (id2=1,id1里面有,遍历)3 | 3 ---true (id2=3,id1里面有,遍历)(4 rows)postgres=# select * from test; id1 | id2 -----+----- 1 | 3 ---true (id2=3,id1里面有,遍历) 2 | 1 ---true (id2=1,id1里面有,遍历) 3 | 1 ---true (id2=1,id1里面有,遍历) 3 | 3 ---true (id2=3,id1里面有,遍历) (4 rows)postgres=# select * from test; id1 | id2 -----+----- 1 | 3 ---true (id2=3,id1里面有,遍历) 2 | 1 ---true (id2=1,id1里面有,遍历) 3 | 1 ---true (id2=1,id1里面有,遍历) 3 | 3 ---true (id2=3,id1里面有,遍历) (4 rows)
让我们再改写一下 SQL
postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;b---tttt(4 rows)postgres=# explain select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;QUERY PLAN--------------------------------------------------------------Seq Scan on test (cost=1.05..2.09 rows=4 width=1)InitPlan 1 (returns $0)-> Seq Scan on test a (cost=0.00..1.05 rows=1 width=0)Filter: (id1 = id2)(4 rows)postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; b --- t t t t (4 rows) postgres=# explain select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; QUERY PLAN -------------------------------------------------------------- Seq Scan on test (cost=1.05..2.09 rows=4 width=1) InitPlan 1 (returns $0) -> Seq Scan on test a (cost=0.00..1.05 rows=1 width=0) Filter: (id1 = id2) (4 rows)postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; b --- t t t t (4 rows) postgres=# explain select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; QUERY PLAN -------------------------------------------------------------- Seq Scan on test (cost=1.05..2.09 rows=4 width=1) InitPlan 1 (returns $0) -> Seq Scan on test a (cost=0.00..1.05 rows=1 width=0) Filter: (id1 = id2) (4 rows)
这次执行计划变了,变成了 InitPlan,执行计划和结构都有所差异。那么 InitPlan 是什么意思?
This plan happens whenever there is a part of your query that can (or have to) be calculated before anything else, and it doesn’t depend on anything in the rest of your query.
只要查询的一部分可以(或必须)在其他任何内容之前计算,并且它不依赖于查询的其余部分中的任何内容,就会发生此计划。
A special case of SubPlan that only needs to run once.
SubPlan 的一种特殊情况,只需要运行一次。
这就有点像相关子连接和非相关子连接的说法,相关子连接在子查询语句中引用了外层表的列属性,这就导致外层表每获得一个元组,子查询就需要重新执行一次;而非相关子连接是指在子查询语句是独立的,和外层的表没有直接的关联,子查询可以单独执行一次,外层表可以重复利用子查询的执行结果。
因此上述执行计划就变成了 a 表先进行一次独立的子查询
postgres=# select * from test where id1 = id2;id1 | id2-----+-----3 | 3(1 row)postgres=# select exists (select 3,3) as b from test;b---tttt(4 rows)postgres=# delete from test;DELETE 4postgres=# insert into test values(5,4);INSERT 0 1postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;b---f(1 row)postgres=# insert into test values(3,4);INSERT 0 1postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;b---ff(2 rows)postgres=# insert into test values(4,4);INSERT 0 1postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test;b---ttt(3 rows)postgres=# select * from test where id1 = id2; id1 | id2 -----+----- 3 | 3 (1 row) postgres=# select exists (select 3,3) as b from test; b --- t t t t (4 rows) postgres=# delete from test; DELETE 4 postgres=# insert into test values(5,4); INSERT 0 1 postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; b --- f (1 row) postgres=# insert into test values(3,4); INSERT 0 1 postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; b --- f f (2 rows) postgres=# insert into test values(4,4); INSERT 0 1 postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; b --- t t t (3 rows)postgres=# select * from test where id1 = id2; id1 | id2 -----+----- 3 | 3 (1 row) postgres=# select exists (select 3,3) as b from test; b --- t t t t (4 rows) postgres=# delete from test; DELETE 4 postgres=# insert into test values(5,4); INSERT 0 1 postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; b --- f (1 row) postgres=# insert into test values(3,4); INSERT 0 1 postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; b --- f f (2 rows) postgres=# insert into test values(4,4); INSERT 0 1 postgres=# select (exists (select 1 as one from test a where (a.id1 = a.id2))) as b from test; b --- t t t (3 rows)
可以看到,只要结果中有相等的 id1 和 id2,结果就会全部返回真。
那让我们又双叒叕改写下 SQL
postgres=# truncate table test;TRUNCATE TABLEpostgres=# insert into test values(1,3),(2,1),(3,1),(3,3);INSERT 0 4postgres=# explain select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test;QUERY PLAN--------------------------------------------------------------------Seq Scan on test (cost=0.00..2.09 rows=4 width=1)SubPlan 1-> Result (cost=0.00..1.04 rows=4 width=0)One-Time Filter: (test.id1 = test.id2)-> Seq Scan on test a (cost=0.00..1.04 rows=4 width=0)(5 rows)postgres=# select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test;b---ffft(4 rows)postgres=# truncate table test; TRUNCATE TABLE postgres=# insert into test values(1,3),(2,1),(3,1),(3,3); INSERT 0 4 postgres=# explain select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test; QUERY PLAN -------------------------------------------------------------------- Seq Scan on test (cost=0.00..2.09 rows=4 width=1) SubPlan 1 -> Result (cost=0.00..1.04 rows=4 width=0) One-Time Filter: (test.id1 = test.id2) -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=0) (5 rows) postgres=# select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test; b --- f f f t (4 rows)postgres=# truncate table test; TRUNCATE TABLE postgres=# insert into test values(1,3),(2,1),(3,1),(3,3); INSERT 0 4 postgres=# explain select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test; QUERY PLAN -------------------------------------------------------------------- Seq Scan on test (cost=0.00..2.09 rows=4 width=1) SubPlan 1 -> Result (cost=0.00..1.04 rows=4 width=0) One-Time Filter: (test.id1 = test.id2) -> Seq Scan on test a (cost=0.00..1.04 rows=4 width=0) (5 rows) postgres=# select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test; b --- f f f t (4 rows)
这次多了一个 One-Time Filter,那么这个又是什么玩意?
A qualification used by a Result operation. If it is false, an empty result set can be returned without further work.
如果为 false,则可以返回空结果集,无需进一步工作。
让我们瞅瞅代码,在代码中有这么一段注释
* Result nodes are also used to optimise queries with constant* qualifications (ie, quals that do not depend on the scanned data),* such as:** select * from emp where 2 > 1** In this case, the plan generated is** Result (with 2 > 1 qual)* /* SeqScan (emp.*)** At runtime, the Result node evaluates the constant qual once,* which is shown by EXPLAIN as a One-Time Filter. If it's* false, we can return an empty result set without running the* controlled plan at all. If it's true, we run the controlled* plan normally and pass back the results.* Result nodes are also used to optimise queries with constant * qualifications (ie, quals that do not depend on the scanned data), * such as: * * select * from emp where 2 > 1 * * In this case, the plan generated is * * Result (with 2 > 1 qual) * / * SeqScan (emp.*) * * At runtime, the Result node evaluates the constant qual once, * which is shown by EXPLAIN as a One-Time Filter. If it's * false, we can return an empty result set without running the * controlled plan at all. If it's true, we run the controlled * plan normally and pass back the results.* Result nodes are also used to optimise queries with constant * qualifications (ie, quals that do not depend on the scanned data), * such as: * * select * from emp where 2 > 1 * * In this case, the plan generated is * * Result (with 2 > 1 qual) * / * SeqScan (emp.*) * * At runtime, the Result node evaluates the constant qual once, * which is shown by EXPLAIN as a One-Time Filter. If it's * false, we can return an empty result set without running the * controlled plan at all. If it's true, we run the controlled * plan normally and pass back the results.
逻辑很清晰,因此上述逻辑就好比这么一串 SQL
postgres=# select * from test where 2 > 1;id1 | id2-----+-----1 | 32 | 13 | 13 | 3(4 rows)postgres=# select * from test where 1 > 1;id1 | id2-----+-----(0 rows)postgres=# select exists(select 1 from test where 1 > 1)as b;b---f(1 row)postgres=# select exists(select 1 from test where 1 > 1)as b from test;b---ffff(4 rows)postgres=# select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test;b---ffft(4 rows)postgres=# select * from test where 2 > 1; id1 | id2 -----+----- 1 | 3 2 | 1 3 | 1 3 | 3 (4 rows) postgres=# select * from test where 1 > 1; id1 | id2 -----+----- (0 rows) postgres=# select exists(select 1 from test where 1 > 1)as b; b --- f (1 row) postgres=# select exists(select 1 from test where 1 > 1)as b from test; b --- f f f f (4 rows) postgres=# select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test; b --- f f f t (4 rows)postgres=# select * from test where 2 > 1; id1 | id2 -----+----- 1 | 3 2 | 1 3 | 1 3 | 3 (4 rows) postgres=# select * from test where 1 > 1; id1 | id2 -----+----- (0 rows) postgres=# select exists(select 1 from test where 1 > 1)as b; b --- f (1 row) postgres=# select exists(select 1 from test where 1 > 1)as b from test; b --- f f f f (4 rows) postgres=# select (exists (select 1 as one from test a where (test.id1 = test.id2))) as b from test; b --- f f f t (4 rows)
因此此时的 SQL 逻辑就变成了这样:遍历 test 表,判断 id1 = id2 的行,所以结果是 false、false、false、true
3 小结
真是一段烧死脑细胞的神奇 SQL。不知道其他数据库中这个 SQL 是否是类似结果?感兴趣的读者可以私信我。当然文章中可能也有错误,欢迎指正 ~