How to speed up PostgreSQL 10

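(This article uses examples and explanations from Nouveautés de PostgreSQL 10.)

Of course, we are already looking forward to PostgreSQL 11. But version 10 had already brought quite radical performance improvements, and it makes sense to look at them first.

Performance in version 10 improved in several directions at once. This article focuses on the speedups that come from:

parallel scans of tables and indexes;
more efficient aggregation;
transition tables;
faster queries thanks to multi-column statistics.

Let's start with parallelism.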

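Parallelism in PostgreSQL 10

In version 9.6, parallel sequential table scans, joins and aggregation already worked, but only for read queries, not for write queries. Parallelism was not supported for INSERT/UPDATE/DELETE, for CTE queries (common table expressions), or for maintenance operations (CREATE INDEX, VACUUM, ANALYZE).

Version 10 makes the following parallel as well:

index scans (Index Scan and Index Only Scan)
Merge Join
gathering results while preserving their sort order (Gather Merge)
execution of prepared statements
execution of uncorrelated subqueries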

With Merge Join, the left and right inputs are first sorted on the join key and then compared side by side.
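To make the merge step concrete, here is a minimal Python sketch of a sort-merge inner join on a single key. It illustrates the general technique and is not PostgreSQL's executor code; the function name merge_join and the (key, payload) row layout are assumptions made for this example.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, payload) tuples with a sort-merge join.

    Both inputs are sorted on the key first (the Sort step), then walked
    side by side, advancing whichever side has the smaller key.
    """
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of rows on the right side that share this key ...
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # ... and pair it with every left row carrying the same key.
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    result.append((lkey, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return result
```

Because both inputs are sorted, each side is read once, apart from re-reading runs of duplicate keys on the right side.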

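The Gather plan node, introduced in version 9.6, collects the results of all background workers in arbitrary order. Gather Merge is used when every background worker returns sorted results: the node merges them and preserves the order.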
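The difference between the two nodes can be modelled in a few lines of Python. This is a simplified sketch with hypothetical function names: each worker's output is a list, Gather is approximated by concatenating the worker streams (one possible arrival order), and Gather Merge by a k-way merge with heapq.merge.

```python
import heapq
from itertools import chain


def gather(worker_outputs):
    """Gather: pass rows on as they arrive. Concatenating the streams models
    one possible arrival order; no global order is guaranteed."""
    return list(chain.from_iterable(worker_outputs))


def gather_merge(worker_outputs):
    """Gather Merge: each worker's stream is already sorted, so a k-way merge
    produces a globally sorted result without sorting everything again."""
    return list(heapq.merge(*worker_outputs))
```

With three workers returning [1, 4, 7], [2, 5, 8] and [3, 6, 9], gather yields an unordered mix, while gather_merge yields 1 through 9 in order.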

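For more on parallelism, see Robert Haas's Parallel Query v2.

Parameters

Accordingly, new parameters appeared in postgresql.conf:
min_parallel_table_scan_size defines the minimum amount of table data for which a parallel scan is considered (default 8MB).

min_parallel_index_scan_size defines the minimum amount of index data above which a parallel scan can be considered (default 512kB).

max_parallel_workers defines the maximum number of background workers that the DBMS can allocate to parallel queries. The default is 8.

When raising or lowering this parameter, remember to take max_parallel_workers_per_gather into account.

max_parallel_workers_per_gather defines the maximum number of parallel workers that a single Gather node can use. The default is 2; a value of 0 disables parallel query.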
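The size thresholds and the per-Gather cap interact roughly as in the following Python sketch. It is a simplified model of the heuristic in the PostgreSQL 10 planner (function compute_parallel_worker), assuming 8 kB pages and the default thresholds. It ignores the per-table parallel_workers storage parameter and details such as how many heap pages an index scan is expected to fetch, and at run time fewer workers may be launched if max_parallel_workers (or max_worker_processes) is exhausted. The Python names are hypothetical.

```python
# Assumption: 8 kB pages, PostgreSQL 10 default thresholds converted to pages.
PAGE_SIZE = 8192
MIN_PARALLEL_TABLE_SCAN_PAGES = 8 * 1024 * 1024 // PAGE_SIZE   # 8MB   -> 1024 pages
MIN_PARALLEL_INDEX_SCAN_PAGES = 512 * 1024 // PAGE_SIZE        # 512kB -> 64 pages


def workers_for_size(pages, threshold):
    """0 workers below the threshold, 1 at the threshold, then one more
    worker each time the size is three times larger again."""
    if pages < threshold:
        return 0
    workers = 1
    threshold = max(threshold, 1)
    while pages >= threshold * 3:
        workers += 1
        threshold *= 3
    return workers


def planned_workers(heap_pages, index_pages=None,
                    max_parallel_workers_per_gather=2):
    """Workers planned for one scan (pages to be scanned), capped per Gather."""
    workers = workers_for_size(heap_pages, MIN_PARALLEL_TABLE_SCAN_PAGES)
    if index_pages is not None:
        # For index scans, the smaller of the table- and index-based counts wins.
        workers = min(workers,
                      workers_for_size(index_pages, MIN_PARALLEL_INDEX_SCAN_PAGES))
    return min(workers, max_parallel_workers_per_gather)
```

With the setting used in this article (max_parallel_workers_per_gather = 3), any scan of at least 9216 pages (72 MB) would get three planned workers under this model.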

Preparation

Create table t1 in PostgreSQL 10:

 habr_10=# CREATE TABLE t1 AS SELECT row_number() OVER() AS id, generate_series%100 AS c_100, generate_series%500 AS c_500 FROM generate_series(1,20000000);
 SELECT 20000000
 habr_10=# ALTER TABLE t1 ADD CONSTRAINT pk_t1 PRIMARY KEY (id);
 ALTER TABLE
 habr_10=# CREATE INDEX idx_t1 ON t1 (c_100);
 CREATE INDEX

Change the max_parallel_workers_per_gather parameter:

 postgres=# ALTER SYSTEM SET max_parallel_workers_per_gather TO 3;
 ALTER SYSTEM
 postgres=# SELECT pg_reload_conf();
  pg_reload_conf
 ----------------
  t
 (1 row)

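Repeat the same steps in PostgreSQL 9.6 (database habr_9_6).

Parallel Bitmap Heap Scan

In PostgreSQL 9.6, the only parallel read path was a sequential table scan (parallel sequential scan); index access could not be parallelized. The planner had to choose between parallelism and using an index.

PostgreSQL 10 adds parallel bitmap heap scan: one process scans the index and builds an in-memory data structure (a bitmap) recording which data pages need to be read, and the background workers then read those pages in parallel, each taking its share. Compare the plans below for the same query in 9.6 and 10: execution time falls from about 12.7 s to about 9.5 s.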

 habr_9_6=# EXPLAIN ANALYSE VERBOSE SELECT count(*), c_100 FROM t1 WHERE c_100 <10 GROUP BY c_100;
 QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------------
  HashAggregate (cost=180449.79..180450.79 rows=100 width=12) (actual time=12663.666..12663.667 rows=10 loops=1)
    Output: count(*), c_100
    Group Key: t1.c_100
    -> Bitmap Heap Scan on public.t1 (cost=37387.68..170463.19 rows=1997321 width=4) (actual time=231.350..12097.624 rows=2000000 loops=1)
          Output: id, c_100, c_500
          Recheck Cond: (t1.c_100 < 10)
          Rows Removed by Index Recheck: 13162468
          Heap Blocks: exact=29054 lossy=79055
          -> Bitmap Index Scan on idx_t1 (cost=0.00..36888.35 rows=1997321 width=0) (actual time=226.889..226.889 rows=2000000 loops=1)
                Index Cond: (t1.c_100 < 10)
  Planning time: 0.093 ms
  Execution time: 12663.698 ms
 (12 rows)

 habr_10=# EXPLAIN ANALYSE VERBOSE SELECT count(*), c_100 FROM t1 WHERE c_100 <10 GROUP BY c_100;
 QUERY PLAN
 -------------------------------------------------------------------------------------------------------
  Finalize GroupAggregate (cost=158320.22..158323.47 rows=100 width=12) (actual time=9450.053..9450.060 rows=10 loops=1)
    Output: count(*), c_100
    Group Key: t1.c_100
    -> Sort (cost=158320.22..158320.97 rows=300 width=12) (actual time=9450.050..9450.052 rows=40 loops=1)
          Output: c_100, (PARTIAL count(*))
          Sort Key: t1.c_100
          Sort Method: quicksort Memory: 26kB
          -> Gather (cost=158276.87..158307.87 rows=300 width=12) (actual time=9449.733..9450.036 rows=40 loops=1)
                Output: c_100, (PARTIAL count(*))
                Workers Planned: 3
                Workers Launched: 3
                -> Partial HashAggregate (cost=157276.87..157277.87 rows=100 width=12) (actual time=9380.225..9380.227 rows=10 loops=4)
                      Output: c_100, PARTIAL count(*)
                      Group Key: t1.c_100
                      Worker 0: actual time=9357.189..9357.191 rows=10 loops=1
                      Worker 1: actual time=9357.320..9357.322 rows=10 loops=1
                      Worker 2: actual time=9356.856..9356.858 rows=10 loops=1
                      -> Parallel Bitmap Heap Scan on public.t1 (cost=37775.94..154022.03 rows=650968 width=4) (actual time=181.108..9084.536 rows=500000 loops=4)
                            Output: c_100
                            Recheck Cond: (t1.c_100 < 10)
                            Rows Removed by Index Recheck: 2743963
                            Heap Blocks: exact=10792 lossy=16877
                            Worker 0: actual time=155.190..9113.397 rows=494347 loops=1
                            Worker 1: actual time=154.130..9053.253 rows=499488 loops=1
                            Worker 2: actual time=154.988..9021.038 rows=494091 loops=1
                            -> Bitmap Index Scan on idx_t1 (cost=0.00..37271.44 rows=2018000 width=0) (actual time=239.332..239.332 rows=2000000 loops=1)
                                  Index Cond: (t1.c_100 < 10)
  Planning time: 0.129 ms
  Execution time: 9455.530 ms
 (29 rows)
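In the version 10 plan, the Bitmap Index Scan runs once (loops=1), while the Parallel Bitmap Heap Scan runs in four processes (loops=4: the leader plus three workers). The division of labour can be sketched in Python as follows. This is a simplified model under stated assumptions: tuple identifiers are (page, offset) pairs, the bitmap is a sorted list of distinct page numbers, and workers pull pages from a lock-protected shared iterator; exact versus lossy pages and rechecks are ignored. All names are hypothetical.

```python
import threading


def build_page_bitmap(matching_tids):
    """Index step, done once: reduce (page, offset) ids from the index
    to a sorted list of distinct heap pages that must be visited."""
    return sorted({page for page, _offset in matching_tids})


class SharedPageIterator:
    """Hands out pages one at a time to whichever process asks next."""

    def __init__(self, pages):
        self._pages = pages
        self._next = 0
        self._lock = threading.Lock()

    def next_page(self):
        with self._lock:
            if self._next >= len(self._pages):
                return None
            page = self._pages[self._next]
            self._next += 1
            return page


def parallel_bitmap_heap_scan(matching_tids, n_workers=3):
    pages = build_page_bitmap(matching_tids)
    shared = SharedPageIterator(pages)
    visited = [[] for _ in range(n_workers)]

    def worker(idx):
        while True:
            page = shared.next_page()
            if page is None:
                break
            visited[idx].append(page)  # stands in for reading the heap page

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pages, visited
```

Every page in the bitmap is read exactly once, but by whichever process happens to claim it, which is why the per-worker row counts in the plan differ slightly.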

Parallel Index Only Scan and Parallel Index Scan

Parallel Index Only Scan

Index scans can now run in parallel as well. Look at the execution plans for the following query and note the Gather node in the version 10 plan:

 habr_9_6=# EXPLAIN ANALYSE SELECT count(*) FROM t1 WHERE id > 10 AND id < 5000000;
 QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------
  Aggregate (cost=193908.66..193908.67 rows=1 width=8) (actual time=1726.007..1726.008 rows=1 loops=1)
    -> Index Only Scan using pk_t1 on t1 (cost=0.44..181438.64 rows=4988010 width=0) (actual time=0.017..1323.316 rows=4999989 loops=1)
          Index Cond: ((id > 10) AND (id < 5000000))
          Heap Fetches: 4999989
  Planning time: 0.904 ms
  Execution time: 1726.031 ms
 (6 rows)

 habr_10=# EXPLAIN ANALYSE SELECT count(*) FROM t1 WHERE id > 10 AND id < 5000000;
 QUERY PLAN
 ------------------------------------------------------------------------------------------
  Finalize Aggregate (cost=153294.45..153294.46 rows=1 width=8) (actual time=1618.757..1618.757 rows=1 loops=1)
    -> Gather (cost=153294.13..153294.44 rows=3 width=8) (actual time=1618.596..1618.751 rows=4 loops=1)
          Workers Planned: 3
          Workers Launched: 3
          -> Partial Aggregate (cost=152294.13..152294.14 rows=1 width=8) (actual time=1610.488..1610.488 rows=1 loops=4)
                -> Parallel Index Only Scan using pk_t1 on t1 (cost=0.44..148255.01 rows=1615648 width=0) (actual time=1.779..1274.247 rows=1249997 loops=4)
                      Index Cond: ((id > 10) AND (id < 5000000))
                      Heap Fetches: 1258298
  Planning time: 0.931 ms
  Execution time: 1619.854 ms
 (10 rows)
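Both version 10 plans split aggregation into two phases: each of the four processes (loops=4) computes a Partial Aggregate over the rows it scanned, and the leader combines the partial results in Finalize Aggregate. Here is a minimal Python sketch of that idea, with hypothetical function names and a round-robin split standing in for the real division of index pages between processes:

```python
def split_round_robin(rows, n_processes):
    """Stand-in for how scanned rows end up spread over the processes."""
    return [rows[i::n_processes] for i in range(n_processes)]


def partial_count(rows, predicate):
    """Partial Aggregate for count(*): each process counts its own rows."""
    return sum(1 for row in rows if predicate(row))


def finalize_count(partials):
    """Finalize Aggregate for count(*): add up the partial counts."""
    return sum(partials)


def partial_avg(values):
    """Partial state for avg(): a (sum, count) pair, not an average."""
    return (sum(values), len(values))


def finalize_avg(states):
    """Combine (sum, count) states; None when no rows were seen."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count if count else None
```

For count(*) combining the partial states is just a sum. For avg() each process has to return a (sum, count) pair, because averaging per-process averages would be wrong whenever the processes see different numbers of rows.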

Parallel Index Scan
Next, look at the execution plans for this query:

 habr_9_6=# EXPLAIN ANALYSE SELECT count(c_100) FROM t1 WHERE id < 5000000;
 QUERY PLAN
 ----------------------------------------------------------------------------------------------------------
  Aggregate (cost=181438.82..181438.83 rows=1 width=8) (actual time=1655.367..1655.368 rows=1 loops=1)
    -> Index Scan using pk_t1 on t1 (cost=0.44..168968.77 rows=4988019 width=4) (actual time=0.760..1137.062 rows=4999999 loops=1)
          Index Cond: (id < 5000000)
  Planning time: 0.055 ms
  Execution time: 1655.391 ms
 (5 rows)

 habr_10=# EXPLAIN ANALYSE SELECT count(c_100) FROM t1 WHERE id < 5000000;
 QUERY PLAN
 ----------------------------------------------------------------------------------------------------------
  Finalize Aggregate (cost=140773.27..140773.28 rows=1 width=8) (actual time=1675.122..1675.122 rows=1 loops=1)
    -> Gather (cost=140772.95..140773.26 rows=3 width=8) (actual time=1675.111..1675.119 rows=4 loops=1)
          Workers Planned: 3
          Workers Launched: 3
          -> Partial Aggregate (cost=139772.95..139772.96 rows=1 width=8) (actual time=1662.439..1662.439 rows=1 loops=4)
                -> Parallel Index Scan using pk_t1 on t1 (cost=0.44..135733.82 rows=1615651 width=4) (actual time=1.020..1335.593 rows=1250000 loops=4)
                      Index Cond: (id < 5000000)
  Planning time: 0.060 ms
  Execution time: 1676.201 ms
 (9 rows)

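Monitoring background workers

This section does not directly concern speeding PostgreSQL up, but it belongs here because the new parallel features came with new ways to monitor parallel processes.

In version 9.6, if you run a query in one session and query pg_stat_activity from another, the background workers serving that query are already listed, but their query field is empty (PIDs 12821, 12822 and 12823 below):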

 habr_9_6=# SELECT pid,application_name,backend_start, query FROM pg_stat_activity;
 -[ RECORD 1 ]----+------------------------------------------------------------------------
 pid              | 12789
 application_name | psql
 backend_start    | 2018-03-30 12:51:10.997649+03
 query            | SELECT pid,application_name,backend_start, query FROM pg_stat_activity;
 -[ RECORD 2 ]----+------------------------------------------------------------------------
 pid              | 12801
 application_name | psql
 backend_start    | 2018-03-30 12:52:57.486572+03
 query            | EXPLAIN (ANALYZE,BUFFERS,VERBOSE) SELECT COUNT(id) FROM t1;
 -[ RECORD 3 ]----+------------------------------------------------------------------------
 pid              | 12823
 application_name | psql
 backend_start    | 2018-03-30 12:54:32.775267+03
 query            |
 -[ RECORD 4 ]----+------------------------------------------------------------------------
 pid              | 12822
 application_name | psql
 backend_start    | 2018-03-30 12:54:32.778756+03
 query            |
 -[ RECORD 5 ]----+------------------------------------------------------------------------
 pid              | 12821
 application_name | psql
 backend_start    | 2018-03-30 12:54:32.782583+03
 query            |

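In version 10, pg_stat_activity also shows the process type (backend_type), so background workers can be told apart from client backends, and the workers now show the text of the query they are executing. In addition, the state field lets you keep only active processes with WHERE state='active':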

 habr_10=# SELECT pid,application_name,backend_start,backend_type,query FROM pg_stat_activity WHERE state='active';
 -[ RECORD 1 ]----+-----------------------------------------------------------------------------------------------------------
 pid              | 2225
 application_name | psql
 backend_start    | 2018-03-29 17:08:23.43802+03
 backend_type     | background worker
 query            | EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT count(id) FROM t1;
 -[ RECORD 2 ]----+-----------------------------------------------------------------------------------------------------------
 pid              | 462
 application_name | psql
 backend_start    | 2018-03-29 14:08:19.939538+03
 backend_type     | client backend
 query            | SELECT pid,application_name,backend_start, backend_type, query FROM pg_stat_activity WHERE state='active';
 -[ RECORD 3 ]----+-----------------------------------------------------------------------------------------------------------
 pid              | 2224
 application_name | psql
 backend_start    | 2018-03-29 17:08:23.44016+03
 backend_type     | background worker
 query            | EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT count(id) FROM t1;
 -[ RECORD 4 ]----+-----------------------------------------------------------------------------------------------------------
 pid              | 2223
 application_name | psql
 backend_start    | 2018-03-29 17:08:23.442845+03
 backend_type     | background worker
 query            | EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT count(id) FROM t1;
 -[ RECORD 5 ]----+-----------------------------------------------------------------------------------------------------------
 pid              | 2090
 application_name | psql
 backend_start    | 2018-03-29 17:03:03.776892+03
 backend_type     | client backend
 query            | EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT count(id) FROM t1;

Without WHERE state='active', service processes such as walwriter and checkpointer are listed as well; they are not active while the query runs, so their state field is empty:

 -[ RECORD 1 ]----+---------------------------------------------------------------------------------------------
 pid              | 2825
 application_name |
 backend_start    | 2017-10-25 17:22:29.188114+03
 backend_type     | background worker
 state            |
 query            |
 -[ RECORD 2 ]----+---------------------------------------------------------------------------------------------
 pid              | 2823
 application_name |
 backend_start    | 2017-10-25 17:22:29.187815+03
 backend_type     | autovacuum launcher
 state            |
 query            |
 -[ RECORD 3 ]----+---------------------------------------------------------------------------------------------
 pid              | 2855
 application_name | psql
 backend_start    | 2018-03-29 18:18:09.743613+03
 backend_type     | client backend
 state            | active
 query            | SELECT pid,application_name,backend_start, backend_type, state, query FROM pg_stat_activity;
 -[ RECORD 4 ]----+---------------------------------------------------------------------------------------------
 pid              | 2821
 application_name |
 backend_start    | 2017-10-25 17:22:29.18081+03
 backend_type     | background writer
 state            |
 query            |
 -[ RECORD 5 ]----+---------------------------------------------------------------------------------------------
 pid              | 2820
 application_name |
 backend_start    | 2017-10-25 17:22:29.181031+03
 backend_type     | checkpointer
 state            |
 query            |
 -[ RECORD 6 ]----+---------------------------------------------------------------------------------------------
 pid              | 2822
 application_name |
 backend_start    | 2017-10-25 17:22:29.180576+03
 backend_type     | walwriter
 state            |
 query            |

Aggregation

To save space, I won't include the code that creates the Orders database with its several tables. Below is an example of a query that uses GROUP BY with several grouping sets (CUBE):

 EXPLAIN (ANALYZE, BUFFERS, COSTS off)
 SELECT GROUPING(client_type, country_code)::bit(2),
        GROUPING(client_type)::boolean g_type_cli,
        GROUPING(country_code)::boolean g_code_pays,
        cl.client_type,
        co.country_code,
        SUM(l.price*l.quantity) AS topay
 FROM orders c
 JOIN order_lines l ON (c.order_number = l.order_number)
 JOIN clients cl ON (c.client_id = cl.client_id)
 JOIN contacts co ON (cl.contact_id = co.contact_id)
 WHERE c.order_date BETWEEN '2014-01-01' AND '2014-12-31'
 GROUP BY CUBE (cl.client_type, co.country_code);

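Versions 9.6 and 10 process this query differently. In 9.6, the plan uses a GroupAggregate node, which needs its input sorted first; here the Sort spills to disk (external merge, 34664kB):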

 QUERY PLAN
 --------------------------------------------------------------------------------
  GroupAggregate (actual time=2720.032..4971.515 rows=40 loops=1)
    Group Key: cl.type_client, co.code_pays
    Group Key: cl.type_client
    Group Key: ()
    Sort Key: co.code_pays
      Group Key: co.code_pays
    Buffers: shared hit=8551 read=47879, temp read=32236 written=32218
    -> Sort (actual time=2718.534..3167.936 rows=1226456 loops=1)
          Sort Key: cl.type_client, co.code_pays
          Sort Method: external merge Disk: 34664kB
          Buffers: shared hit=8551 read=47879, temp read=25050 written=25032
          -> Hash Join (actual time=525.656..1862.380 rows=1226456 loops=1)
                Hash Cond: (l.numero_commande = c.numero_commande)
                Buffers: shared hit=8551 read=47879, temp read=17777 written=17759
                -> Seq Scan on lignes_commandes l (actual time=0.091..438.819 rows=3141967 loops=1)
                      Buffers: shared hit=2241 read=39961
                -> Hash (actual time=523.476..523.476 rows=390331 loops=1)
                      Buckets: 131072 Batches: 8 Memory Usage: 3162kB
                      Buffers: shared hit=6310 read=7918, temp read=1611 written=2979
                      -> Hash Join (actual time=152.778..457.347 rows=390331 loops=1)
                            Hash Cond: (c.client_id = cl.client_id)
                            Buffers: shared hit=6310 read=7918, temp read=1611 written=1607
                            -> Seq Scan on commandes c (actual time=10.810..132.984 rows=390331 loops=1)
                                  Filter: ((date_commande >= '2014-01-01'::date) AND (date_commande <= '2014-12-31'::date))
                                  Rows Removed by Filter: 609669
                                  Buffers: shared hit=2241 read=7918
                            -> Hash (actual time=139.381..139.381 rows=100000 loops=1)
                                  Buckets: 131072 Batches: 2 Memory Usage: 3522kB
                                  Buffers: shared hit=4069, temp read=515 written=750
                                  -> Hash Join (actual time=61.976..119.724 rows=100000 loops=1)
                                        Hash Cond: (co.contact_id = cl.contact_id)
                                        Buffers: shared hit=4069, temp read=515 written=513
                                        -> Seq Scan on contacts co (actual time=0.051..18.025 rows=110005 loops=1)
                                              Buffers: shared hit=3043
                                        -> Hash (actual time=57.926..57.926 rows=100000 loops=1)
                                              Buckets: 65536 Batches: 2 Memory Usage: 3242kB
                                              Buffers: shared hit=1026, temp written=269
                                              -> Seq Scan on clients cl (actual time=0.060..21.896 rows=100000 loops=1)
                                                    Buffers: shared hit=1026
  Planning time: 1.739 ms
  Execution time: 4985.385 ms
 (41 rows)

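In PostgreSQL 10, as you can see, the plan uses a MixedAggregate node instead: it can compute GROUPING SETS using both hashing and sorting. Here the three non-empty grouping sets are computed with hash tables (Hash Key) and the empty set with Group Key: (), so no Sort node is needed. With MixedAggregate, execution time drops from about 4985 ms to about 2642 ms, roughly by half.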
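What MixedAggregate does with hashing can be sketched in Python: CUBE (a, b) expands into the grouping sets (a, b), (a), (b) and (), and a single pass over the input updates one hash table per grouping set, so no sort is needed. The sketch also shows how the GROUPING() bitmask in the query is formed: a bit is 1 when the column is not part of the current grouping set, with the first argument as the most significant bit. It is a simplified model with hypothetical names; unlike the real plan, it also hashes the empty grouping set, which PostgreSQL handles as Group Key: ().

```python
from collections import defaultdict
from itertools import combinations


def cube(columns):
    """Expand CUBE(columns) into its grouping sets: every subset of columns."""
    return [combo for r in range(len(columns), -1, -1)
            for combo in combinations(columns, r)]


def grouping_bits(columns, grouping_set):
    """GROUPING(c1, ..., cn): bit is 1 when the column is NOT in the current
    grouping set; the first argument is the most significant bit."""
    value = 0
    for col in columns:
        value = (value << 1) | (0 if col in grouping_set else 1)
    return value


def hash_grouping_sets(rows, columns, measure):
    """One pass over the input, one hash table per grouping set."""
    sets = cube(columns)
    tables = {gs: defaultdict(int) for gs in sets}
    for row in rows:
        for gs in sets:
            key = tuple(row[c] for c in gs)
            tables[gs][key] += measure(row)
    return {gs: dict(t) for gs, t in tables.items()}
```

For GROUPING(client_type, country_code) this gives 0 for the full set, 1 when only client_type is grouped, 2 when only country_code is grouped and 3 for the grand total.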

 QUERY PLAN
 --------------------------------------------------------------------------------
  MixedAggregate (actual time=2640.531..2640.561 rows=40 loops=1)
    Hash Key: cl.type_client, co.code_pays
    Hash Key: cl.type_client
    Hash Key: co.code_pays
    Group Key: ()
    Buffers: shared hit=8418 read=48015, temp read=17777 written=17759
    -> Hash Join (actual time=494.339..1813.743 rows=1226456 loops=1)
          Hash Cond: (l.numero_commande = c.numero_commande)
          Buffers: shared hit=8418 read=48015, temp read=17777 written=17759
          -> Seq Scan on lignes_commandes l (actual time=0.019..417.992 rows=3141967 loops=1)
                Buffers: shared hit=2137 read=40065
          -> Hash (actual time=493.558..493.558 rows=390331 loops=1)
                Buckets: 131072 Batches: 8 Memory Usage: 3162kB
                Buffers: shared hit=6278 read=7950, temp read=1611 written=2979
                -> Hash Join (actual time=159.207..429.528 rows=390331 loops=1)
                      Hash Cond: (c.client_id = cl.client_id)
                      Buffers: shared hit=6278 read=7950, temp read=1611 written=1607
                      -> Seq Scan on commandes c (actual time=2.562..103.812 rows=390331 loops=1)
                            Filter: ((date_commande >= '2014-01-01'::date) AND (date_commande <= '2014-12-31'::date))
                            Rows Removed by Filter: 609669
                            Buffers: shared hit=2209 read=7950
                      -> Hash (actual time=155.728..155.728 rows=100000 loops=1)
                            Buckets: 131072 Batches: 2 Memory Usage: 3522kB
                            Buffers: shared hit=4069, temp read=515 written=750
                            -> Hash Join (actual time=73.906..135.779 rows=100000 loops=1)
                                  Hash Cond: (co.contact_id = cl.contact_id)
                                  Buffers: shared hit=4069, temp read=515 written=513
                                  -> Seq Scan on contacts co (actual time=0.011..18.347 rows=110005 loops=1)
                                        Buffers: shared hit=3043
                                  -> Hash (actual time=70.006..70.006 rows=100000 loops=1)
                                        Buckets: 65536 Batches: 2 Memory Usage: 3242kB
                                        Buffers: shared hit=1026, temp written=269
                                        -> Seq Scan on clients cl (actual time=0.014..26.689 rows=100000 loops=1)
                                              Buffers: shared hit=1026
  Planning time: 1.910 ms
  Execution time: 2642.349 ms
 (36 rows)

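Transition tables

When a trigger fires at the statement level, OLD and NEW cannot be used, because they refer to a single row. For this case, the SQL standard provides transition tables.

Version 10 solves this problem in line with the SQL standard.

Here is an example.

Create a main table, which will get the trigger, and an archive table that stores the rows deleted from main: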

 habr_10=# CREATE TABLE main (c1 integer, c2 text);
 CREATE TABLE
 habr_10=# CREATE TABLE archive (id integer GENERATED ALWAYS AS IDENTITY, dlog timestamp DEFAULT now(), main_c1 integer, main_c2 text);
 CREATE TABLE

Next, write the trigger function (stored procedure):

 habr_10=# CREATE OR REPLACE FUNCTION log_delete() RETURNS trigger LANGUAGE plpgsql AS $$
 BEGIN
   INSERT INTO archive (main_c1, main_c2) SELECT c1, c2 FROM oldtable;
   RETURN null;
 END
 $$;
 CREATE FUNCTION

Attach the trigger to the main table:

 habr_10=# CREATE TRIGGER tr1 AFTER DELETE ON main REFERENCING OLD TABLE AS oldtable FOR EACH STATEMENT EXECUTE PROCEDURE log_delete();
 CREATE TRIGGER

Insert a million rows and delete them. EXPLAIN ANALYZE shows both the time spent deleting the rows and the time spent in the trigger:

 habr_10=# INSERT INTO main SELECT i, 'a_string'||i FROM generate_series(1, 1000000) i;
 INSERT 0 1000000
 habr_10=# EXPLAIN (ANALYZE) DELETE FROM main;
 QUERY PLAN
 ------------------------------------------------------------------------------------------
  Delete on main (cost=0.00..17642.13 rows=1127313 width=6) (actual time=1578.771..1578.771 rows=0 loops=1)
    -> Seq Scan on main (cost=0.00..17642.13 rows=1127313 width=6) (actual time=0.018..106.833 rows=1000000 loops=1)
  Planning time: 0.026 ms
  Trigger tr1: time=2494.337 calls=1
  Execution time: 4075.228 ms
 (5 rows)

Deleting the rows takes about 1.5 s, while the trigger takes about 2.5 s.

For comparison, here is how this had to be done before (with a row-level trigger):

 habr_9_6=# CREATE TABLE main (c1 integer, c2 text);
 CREATE TABLE
 habr_9_6=# CREATE TABLE archive (id integer, dlog timestamp DEFAULT now(), main_c1 integer, main_c2 text);
 CREATE TABLE
 habr_9_6=# CREATE OR REPLACE FUNCTION log_delete() RETURNS trigger LANGUAGE plpgsql AS $$
 BEGIN
   INSERT INTO archive (main_c1, main_c2) VALUES (old.c1, old.c2);
   RETURN null;
 END
 $$;
 CREATE FUNCTION
 habr_9_6=# CREATE TRIGGER tr1 AFTER DELETE ON main FOR EACH ROW EXECUTE PROCEDURE log_delete();
 CREATE TRIGGER
 habr_9_6=# INSERT INTO main SELECT i, 'a_string'||i FROM generate_series(1, 1000000) i;
 INSERT 0 1000000
 habr_9_6=# EXPLAIN ANALYZE DELETE FROM main;
 QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------
  Delete on main (cost=0.00..16369.00 rows=1000000 width=6) (actual time=2009.263..2009.263 rows=0 loops=1)
    -> Seq Scan on main (cost=0.00..16369.00 rows=1000000 width=6) (actual time=0.028..108.559 rows=1000000 loops=1)
  Planning time: 0.131 ms
  Trigger tr1: time=8572.522 calls=1000000
  Execution time: 10649.182 ms
 (5 rows)

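With the row-level trigger, deleting a million rows takes about 10.6 s, of which 8.6 s is spent in the trigger. With the statement-level trigger, the total is about 4.1 s, of which about 2.5 s is spent in the trigger. In other words, transition tables can improve performance considerably.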
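The difference in the Trigger tr1 lines (calls=1000000 versus calls=1) can be modelled in Python as follows. This is a toy model, not PostgreSQL code: a table is a list of (c1, c2) tuples, the row-level variant runs the trigger body once per deleted row, and the statement-level variant passes all deleted rows at once, as a transition table, to a single call. Function names are hypothetical.

```python
def delete_with_row_trigger(table, archive):
    """AFTER DELETE ... FOR EACH ROW: the trigger body runs once per row,
    seeing only that row as OLD. Returns the number of trigger calls."""
    deleted = list(table)
    table.clear()
    calls = 0
    for old in deleted:
        archive.append(old)        # INSERT INTO archive ... VALUES (old.c1, old.c2)
        calls += 1
    return calls


def delete_with_statement_trigger(table, archive):
    """AFTER DELETE ... REFERENCING OLD TABLE AS oldtable FOR EACH STATEMENT:
    the trigger body runs once and sees all deleted rows in oldtable."""
    oldtable = list(table)
    table.clear()
    archive.extend(oldtable)       # INSERT INTO archive ... SELECT c1, c2 FROM oldtable
    return 1
```

The model only counts calls; the point is that a million separate trigger invocations, each running its own INSERT, are replaced by one invocation running a single set-based INSERT ... SELECT, which is consistent with the timings above.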

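This is why transition tables attract so much interest.

For more on this topic, see:

Multi-column statistics

You can now create statistics on several columns of the same table. When the columns are strongly correlated, this improves the estimates used to build execution plans.

Example: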

 habr_10=# CREATE TABLE multi (a INT, b INT);
 CREATE TABLE
 habr_10=# INSERT INTO multi SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);
 INSERT 0 10000
 habr_10=# ANALYZE multi;
 ANALYZE

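The data distribution is very simple: there are only 100 distinct values, spread evenly across the table, and in every row a and b are equal.

Column a: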

 habr_10=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM multi WHERE a = 1;
 QUERY PLAN
 -----------------------------------------------------------------------------------
  Seq Scan on multi (cost=0.00..170.00 rows=100 width=8) (actual rows=100 loops=1)
    Filter: (a = 1)
    Rows Removed by Filter: 9900
  Planning time: 0.063 ms
  Execution time: 0.496 ms
 (5 rows)

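The optimizer checks the condition and concludes that its selectivity is 1% (rows=100 out of the 10,000 inserted rows).

Column b gets the same estimate.

Now apply the same condition to both columns, combined with AND: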

 habr_10=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM multi WHERE a = 1 AND b = 1;
 QUERY PLAN
 ---------------------------------------------------------------------------------
  Seq Scan on multi (cost=0.00..195.00 rows=1 width=8) (actual rows=100 loops=1)
    Filter: ((a = 1) AND (b = 1))
    Rows Removed by Filter: 9900
  Planning time: 0.116 ms
  Execution time: 2.154 ms
 (5 rows)

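The optimizer estimates the selectivity of each condition separately and gets the same 1% for both. Assuming the conditions are independent, it multiplies them: 1% × 1% = 0.01%, i.e. rows=1 out of 10,000. This is a severe underestimate: the estimated rows=1 is far from the actual rows=100.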
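The arithmetic behind the bad estimate, as a small Python sketch: the planner multiplies per-column selectivities as if the columns were independent, while in this table the two conditions always hold together. The function names are hypothetical, and rows are modelled as (a, b) tuples; the real planner also rounds the result and clamps it to at least one row.

```python
def selectivity(rows, column, value):
    """Fraction of rows whose column equals value (what ANALYZE approximates)."""
    return sum(1 for row in rows if row[column] == value) / len(rows)


def estimate_rows_independent(n_rows, *selectivities):
    """Estimate assuming the conditions are independent: multiply them."""
    combined = 1.0
    for s in selectivities:
        combined *= s
    return n_rows * combined


def actual_rows(rows, a_value, b_value):
    """What the executor really finds for WHERE a = a_value AND b = b_value."""
    return sum(1 for a, b in rows if a == a_value and b == b_value)
```

For the multi table (a = b = i % 100 for i from 1 to 10000), each selectivity is 0.01, the independent estimate is about 1 row, and the actual count is 100.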

To improve the estimate, we can now create multi-column statistics:

 habr_10=# CREATE STATISTICS s1 (dependencies) ON a, b FROM multi;
 CREATE STATISTICS
 habr_10=# ANALYZE multi;
 ANALYZE

Check again:

 habr_10=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM multi WHERE a = 1 AND b = 1;
 QUERY PLAN
 -----------------------------------------------------------------------------------
  Seq Scan on multi (cost=0.00..195.00 rows=100 width=8) (actual rows=100 loops=1)
    Filter: ((a = 1) AND (b = 1))
    Rows Removed by Filter: 9900
  Planning time: 0.086 ms
  Execution time: 0.525 ms
 (5 rows)

Now the estimate is accurate: rows=100, matching the actual number of rows.
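Why the estimate becomes rows=100: a functional-dependency statistic stores a degree d between 0 and 1 that says how reliably a determines b, and the planner then estimates P(a, b) = P(a) · (d + (1 - d) · P(b)) instead of P(a) · P(b). The Python sketch below is a simplified model of that calculation. It computes d as the share of rows that fall into groups of a with a single value of b; PostgreSQL computes it from the ANALYZE sample, and the names here are hypothetical.

```python
from collections import defaultdict


def dependency_degree(rows):
    """Degree of the dependency a => b: the share of rows that fall into
    groups (by a) in which b takes a single value."""
    values_of_b = defaultdict(set)
    group_size = defaultdict(int)
    for a, b in rows:
        values_of_b[a].add(b)
        group_size[a] += 1
    supporting = sum(group_size[a] for a, bs in values_of_b.items() if len(bs) == 1)
    return supporting / len(rows)


def estimate_with_dependency(n_rows, sel_a, sel_b, degree):
    """P(a, b) = P(a) * (d + (1 - d) * P(b)), scaled to a row count."""
    return n_rows * sel_a * (degree + (1 - degree) * sel_b)
```

In the multi table every value of a comes with exactly one value of b, so d = 1 and the estimate collapses to P(a) · 10000 = 100 rows; with d = 0 the formula falls back to the independent estimate of about 1 row.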

For details, see the page on the implementation of multivariate n-distinct coefficients.

To be continued
