Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
CREATE TABLE test_tablesample (id int, name text) WITH (fillfactor=10);
|
|
|
|
|
-- use fillfactor so we don't have to load too much data to get multiple pages
|
2015-05-15 14:37:10 -04:00
|
|
|
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
INSERT INTO test_tablesample
|
|
|
|
|
SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i);
|
2015-05-15 14:37:10 -04:00
|
|
|
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
SELECT t.id FROM test_tablesample AS t TABLESAMPLE SYSTEM (50) REPEATABLE (0);
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (100.0/11) REPEATABLE (0);
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (0);
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (50) REPEATABLE (0);
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (5.5) REPEATABLE (0);
|
|
|
|
|
|
|
|
|
|
-- 100% should give repeatable count results (ie, all rows) in any case
|
2015-05-15 14:37:10 -04:00
|
|
|
SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100);
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100) REPEATABLE (1+2);
|
|
|
|
|
SELECT count(*) FROM test_tablesample TABLESAMPLE SYSTEM (100) REPEATABLE (0.4);
|
2015-05-15 14:37:10 -04:00
|
|
|
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
CREATE VIEW test_tablesample_v1 AS
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (10*2) REPEATABLE (2);
|
|
|
|
|
CREATE VIEW test_tablesample_v2 AS
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (99);
|
|
|
|
|
\d+ test_tablesample_v1
|
|
|
|
|
\d+ test_tablesample_v2
|
2015-05-15 14:37:10 -04:00
|
|
|
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
-- check a sampled query doesn't affect cursor in progress
|
2015-05-15 14:37:10 -04:00
|
|
|
BEGIN;
|
Fix some anomalies with NO SCROLL cursors.
We have long forbidden fetching backwards from a NO SCROLL cursor,
but the prohibition didn't extend to cases in which we rewind the
query altogether and then re-fetch forwards. I think the reason is
that this logic was mainly meant to protect plan nodes that can't
be run in the reverse direction. However, re-reading the query output
is problematic if the query is volatile (which includes SELECT FOR
UPDATE, not just queries with volatile functions): the re-read can
produce different results, which confuses the cursor navigation logic
completely. Another reason for disliking this approach is that some
code paths will either fetch backwards or rewind-and-fetch-forwards
depending on the distance to the target row; so that seemingly
identical use-cases may or may not draw the "cursor can only scan
forward" error. Hence, let's clean things up by disallowing rewind
as well as fetch-backwards in a NO SCROLL cursor.
Ordinarily we'd only make such a definitional change in HEAD, but
there is a third reason to consider this change now. Commit ba2c6d6ce
created some new user-visible anomalies for non-scrollable cursors
WITH HOLD, in that navigation in the cursor result got confused if the
cursor had been partially read before committing. The only good way
to resolve those anomalies is to forbid rewinding such a cursor, which
allows removal of the incorrect cursor state manipulations that
ba2c6d6ce added to PersistHoldablePortal.
To minimize the behavioral change in the back branches (including
v14), refuse to rewind a NO SCROLL cursor only when it has a holdStore,
ie has been held over from a previous transaction due to WITH HOLD.
This should avoid breaking most applications that have been sloppy
about whether to declare cursors as scrollable. We'll enforce the
prohibition across-the-board beginning in v15.
Back-patch to v11, as ba2c6d6ce was.
Discussion: https://postgr.es/m/3712911.1631207435@sss.pgh.pa.us
2021-09-10 13:18:32 -04:00
|
|
|
DECLARE tablesample_cur SCROLL CURSOR FOR
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (0);
|
|
|
|
|
|
2015-05-15 14:37:10 -04:00
|
|
|
FETCH FIRST FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (0);
|
2015-05-15 14:37:10 -04:00
|
|
|
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
|
|
|
|
|
FETCH FIRST FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
FETCH NEXT FROM tablesample_cur;
|
|
|
|
|
|
|
|
|
|
CLOSE tablesample_cur;
|
|
|
|
|
END;
|
|
|
|
|
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
EXPLAIN (COSTS OFF)
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (2);
|
|
|
|
|
EXPLAIN (COSTS OFF)
|
|
|
|
|
SELECT * FROM test_tablesample_v1;
|
|
|
|
|
|
|
|
|
|
-- check inheritance behavior
|
|
|
|
|
explain (costs off)
|
|
|
|
|
select count(*) from person tablesample bernoulli (100);
|
|
|
|
|
select count(*) from person tablesample bernoulli (100);
|
|
|
|
|
select count(*) from person;
|
|
|
|
|
|
|
|
|
|
-- check that collations get assigned within the tablesample arguments
|
|
|
|
|
SELECT count(*) FROM test_tablesample TABLESAMPLE bernoulli (('1'::text < '0'::text)::int);
|
|
|
|
|
|
|
|
|
|
-- check behavior during rescans, as well as correct handling of min/max pct
|
|
|
|
|
select * from
|
|
|
|
|
(values (0),(100)) v(pct),
|
|
|
|
|
lateral (select count(*) from tenk1 tablesample bernoulli (pct)) ss;
|
|
|
|
|
select * from
|
|
|
|
|
(values (0),(100)) v(pct),
|
|
|
|
|
lateral (select count(*) from tenk1 tablesample system (pct)) ss;
|
|
|
|
|
explain (costs off)
|
|
|
|
|
select pct, count(unique1) from
|
|
|
|
|
(values (0),(100)) v(pct),
|
|
|
|
|
lateral (select * from tenk1 tablesample bernoulli (pct)) ss
|
|
|
|
|
group by pct;
|
|
|
|
|
select pct, count(unique1) from
|
|
|
|
|
(values (0),(100)) v(pct),
|
|
|
|
|
lateral (select * from tenk1 tablesample bernoulli (pct)) ss
|
|
|
|
|
group by pct;
|
|
|
|
|
select pct, count(unique1) from
|
|
|
|
|
(values (0),(100)) v(pct),
|
|
|
|
|
lateral (select * from tenk1 tablesample system (pct)) ss
|
|
|
|
|
group by pct;
|
2015-05-15 14:37:10 -04:00
|
|
|
|
|
|
|
|
-- errors
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE FOOBAR (1);
|
|
|
|
|
|
Redesign tablesample method API, and do extensive code review.
The original implementation of TABLESAMPLE modeled the tablesample method
API on index access methods, which wasn't a good choice because, without
specialized DDL commands, there's no way to build an extension that can
implement a TSM. (Raw inserts into system catalogs are not an acceptable
thing to do, because we can't undo them during DROP EXTENSION, nor will
pg_upgrade behave sanely.) Instead adopt an API more like procedural
language handlers or foreign data wrappers, wherein the only SQL-level
support object needed is a single handler function identified by having
a special return type. This lets us get rid of the supporting catalog
altogether, so that no custom DDL support is needed for the feature.
Adjust the API so that it can support non-constant tablesample arguments
(the original coding assumed we could evaluate the argument expressions at
ExecInitSampleScan time, which is undesirable even if it weren't outright
unsafe), and discourage sampling methods from looking at invisible tuples.
Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable
within and across queries, as required by the SQL standard, and deal more
honestly with methods that can't support that requirement.
Make a full code-review pass over the tablesample additions, and fix
assorted bugs, omissions, infelicities, and cosmetic issues (such as
failure to put the added code stanzas in a consistent ordering).
Improve EXPLAIN's output of tablesample plans, too.
Back-patch to 9.5 so that we don't have to support the original API
in production.
2015-07-25 14:39:00 -04:00
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (NULL);
|
2015-05-15 14:37:10 -04:00
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (NULL);
|
|
|
|
|
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (-1);
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (200);
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (-1);
|
|
|
|
|
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (200);
|
|
|
|
|
|
|
|
|
|
SELECT id FROM test_tablesample_v1 TABLESAMPLE BERNOULLI (1);
|
|
|
|
|
INSERT INTO test_tablesample_v1 VALUES(1);
|
|
|
|
|
|
|
|
|
|
WITH query_select AS (SELECT * FROM test_tablesample)
|
|
|
|
|
SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
|
|
|
|
|
|
|
|
|
|
SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
|
2017-02-24 01:51:46 -05:00
|
|
|
|
|
|
|
|
-- check partitioned tables support tablesample
|
|
|
|
|
create table parted_sample (a int) partition by list (a);
|
|
|
|
|
create table parted_sample_1 partition of parted_sample for values in (1);
|
|
|
|
|
create table parted_sample_2 partition of parted_sample for values in (2);
|
|
|
|
|
explain (costs off)
|
|
|
|
|
select * from parted_sample tablesample bernoulli (100);
|
|
|
|
|
drop table parted_sample, parted_sample_1, parted_sample_2;
|