as both a GROUP BY item and an output expression, the top-level Group
node should just copy up the evaluated expression value from its input,
rather than re-evaluating the expression. Aside from any performance
benefit this might offer, this avoids a crash when there is a sub-SELECT
in said expression.
comparison does not consider paths different when they differ only in
uninteresting aspects of sort order. (We had a special case of this
consideration for indexscans already, but generalize it to apply to
ordered join paths too.) Be stricter about what is a canonical pathkey
to allow faster pathkey comparison. Cache canonical pathkeys and
dispersion stats for left and right sides of a RestrictInfo's clause,
to avoid repeated computation. Total speedup will depend on number of
tables in a query, but I see about 4x speedup of planning phase for
a sample seven-table query.
maintained for each cache entry. A cache entry will not be freed until
the matching ReleaseSysCache call has been executed. This eliminates
worries about cache entries getting dropped while still in use. See
my posting to pg-hackers of even date for more info.
joins, and clean things up a good deal at the same time. Append plan node
no longer hacks on rangetable at runtime --- instead, all child tables are
given their own RT entries during planning. Concept of multiple target
tables pushed up into execMain, replacing bug-prone implementation within
nodeAppend. Planner now supports generating Append plans for inheritance
sets either at the top of the plan (the old way) or at the bottom. Expanding
at the bottom is appropriate for tables used as sources, since they may
appear inside an outer join; but we must still expand at the top when the
target of an UPDATE or DELETE is an inheritance set, because we actually need
a different targetlist and junkfilter for each target table in that case.
Fortunately a target table can't be inside an outer join... Bizarre mutual
recursion between union_planner and prepunion.c is gone --- in fact,
union_planner doesn't really have much to do with union queries anymore,
so I renamed it grouping_planner.
SQL92 semantics, including support for ALL option. All three can be used
in subqueries and views. DISTINCT and ORDER BY work now in views, too.
This rewrite fixes many problems with cross-datatype UNIONs and INSERT/SELECT
where the SELECT yields different datatypes than the INSERT needs. I did
that by making UNION subqueries and SELECT in INSERT be treated like
subselects-in-FROM, thereby allowing an extra level of targetlist where the
datatype conversions can be inserted safely.
INITDB NEEDED!
(Don't forget that an alias is required.) Views reimplemented as expanding
to subselect-in-FROM. Grouping, aggregates, DISTINCT in views actually
work now (he says optimistically). No UNION support in subselects/views
yet, but I have some ideas about that. Rule-related permissions checking
moved out of rewriter and into executor.
INITDB REQUIRED!
complaints about ungrouped variables. This is for consistency with
behavior elsewhere, notably the fact that the relname is reported as
an alias in these same complaints. Also, it'll work with subselect-
in-FROM where old code didn't.
for example, an SQL function can be used in a functional index. (I make
no promises about speed, but it'll work ;-).) Clean up and simplify
handling of functions returning sets.
macros where appropriate (the code used to have several different ways
of doing that, including Int32, Int8, UInt8, ...). Remove last few
references to float32 and float64 typedefs --- it's all float4/float8
now. The typedefs themselves should probably stay in c.h for a release
or two, though, to avoid breaking user-written C functions.
right thing with variable-free clauses that contain noncachable functions,
such as 'WHERE random() < 0.5' --- these are evaluated once per
potential output tuple. Expressions that contain only Params are
now candidates to be indexscan quals --- for example, 'var = ($1 + 1)'
can now be indexed. Cope with RelabelType nodes atop potential indexscan
variables --- this oversight prevents 7.0.* from recognizing some
potentially indexscanable situations.
from Param nodes, per discussion a few days ago on pghackers. Add new
expression node type FieldSelect that implements the functionality where
it's actually needed. Clean up some other unused fields in Func nodes
as well.
NOTE: initdb forced due to change in stored expression trees for rules.
memory contexts. Currently, only leaks in expressions executed as
quals or projections are handled. Clean up some old dead cruft in
executor while at it --- unused fields in state nodes, that sort of thing.
materialized tupleset is small enough) instead of a temporary relation.
This was something I was thinking of doing anyway for performance, and Jan
says he needs it for TOAST because he doesn't want to cope with toasting
noname relations. With this change, the 'noname table' support in heap.c
is dead code, and I have accordingly removed it. Also clean up 'noname'
plan handling in planner --- nonames are either sort or materialize plans,
and it seems less confusing to handle them separately under those names.
discussion of 5/19/00). pg_index is now searched for indexes of a
relation using an indexscan. Moreover, this is done once and cached
in the relcache entry for the relation, in the form of a list of OIDs
for the indexes. This list is used by the parser and executor to drive
lookups in the pg_index syscache when they want to know the properties
of the indexes. Net result: index information will be fully cached
for repetitive operations such as inserts.
key call sites are changed, but most called functions are still oldstyle.
An exception is that the PL managers are updated (so, for example, NULL
handling now behaves as expected in plperl and plpgsql functions).
NOTE initdb is forced due to added column in pg_proc.
WHERE in a place where it can be part of a nestloop inner indexqual.
As the code stood, it put the same physical sub-Plan node into both
indxqual and indxqualorig of the IndexScan plan node. That confused
later processing in the optimizer (which expected that tracing the
subPlan list would visit each subplan node exactly once), and would
probably have blown up in the executor if the planner hadn't choked first.
Fix by making the 'fixed' indexqual be a complete deep copy of the
original indexqual, rather than trying to share nodes below the topmost
operator node. This had further ramifications though, because we were
making the aforesaid list of sub-Plan nodes during SS_process_sublinks
which is run before construction of the 'fixed' indexqual, meaning that
the copy of the sub-Plan didn't show up in that list. Fix by rearranging
logic so that the sub-Plan list is built by the final set_plan_references
pass, not in SS_process_sublinks. This may sound like a mess, but it's
actually a good deal cleaner now than it was before, because we are no
longer dependent on the assumption that planning will never make a copy
of a sub-Plan node.
costs using the inner path's parent->rows count as the number of tuples
processed per inner scan iteration. This is wrong when we are using an
inner indexscan with indexquals based on join clauses, because the rows
count in a Relation node reflects the selectivity of the restriction
clauses for that rel only. Upshot was that if join clause was very
selective, we'd drastically overestimate the true cost of the join.
Fix is to calculate correct output-rows estimate for an inner indexscan
when the IndexPath node is created and save it in the path node.
Change of path node doesn't require initdb, since path nodes don't
appear in saved rules.
to simplify constant expressions and expand SubLink nodes into SubPlans
is done in a separate routine subquery_planner() that calls union_planner().
We formerly did most of this work in query_planner(), but that's the
wrong place because it may never see the real targetlist. Splitting
union_planner into two routines also allows us to avoid redundant work
when union_planner is invoked recursively for UNION and inheritance
cases. Upshot is that it is now possible to do something like
select float8(count(*)) / (select count(*) from int4_tbl) from int4_tbl
group by f1;
which has never worked before.
that the inputs to a given operator can be recursively simplified to
constants, it was evaluating the operator using the op's *original*
(unsimplified) arg list, so that any subexpressions had to be evaluated
again. A constant subexpression at depth N got evaluated N times.
Probably not very important in practical situations, but it made us look
real slow in MySQL's 'crashme' test...
represent the result of a binary-compatible type coercion. At runtime
it just evaluates its argument --- but during type resolution, exprType
will pick up the output type of the RelabelType node instead of the type
of the argument. This solves some longstanding problems with dropped
type coercions, an example being 'select now()::abstime::int4' which
used to produce date-formatted output, not an integer, because the
coercion to int4 was dropped on the floor.
selectivity functions and make the r-tree operators use them. The
estimation functions themselves are just stubs, unfortunately, but
perhaps someday someone will make them compute realistic estimates.
Change pg_am so that the optimizer can reliably tell the difference
between ordered and unordered indexes --- before it would think that
an r-tree index can be scanned in '<<' order, which is not right AFAIK.
Repair broken negator links for network_sup and related ops.
Initdb forced. This might be my last initdb force for 7.0 ... hope so
anyway ...
accesses versus sequential accesses, a (very crude) estimate of the
effects of caching on random page accesses, and cost to evaluate WHERE-
clause expressions. Export critical parameters for this model as SET
variables. Also, create SET variables for the planner's enable flags
(enable_seqscan, enable_indexscan, etc) so that these can be controlled
more conveniently than via PGOPTIONS.
Planner now estimates both startup cost (cost before retrieving
first tuple) and total cost of each path, so it can optimize queries
with LIMIT on a reasonable basis by interpolating between these costs.
Same facility is a win for EXISTS(...) subqueries and some other cases.
Redesign pathkey representation to achieve a major speedup in planning
(I saw as much as 5X on a 10-way join); also minor changes in planner
to reduce memory consumption by recycling discarded Path nodes and
not constructing unnecessary lists.
Minor cleanups to display more-plausible costs in some cases in
EXPLAIN output.
Initdb forced by change in interface to index cost estimation
functions.
SELECT a FROM t1 tx (a);
Allow join syntax, including queries like
SELECT * FROM t1 NATURAL JOIN t2;
Update RTE structure to hold column aliases in an Attr structure.
fields in JoinPaths --- turns out that we do need that after all :-(.
Also, rearrange planner so that only one RelOptInfo is created for a
particular set of joined base relations, no matter how many different
subsets of relations it can be created from. This saves memory and
processing time compared to the old method of making a bunch of RelOptInfos
and then removing the duplicates. Clean up the jointree iteration logic;
not sure if it's better, but I sure find it more readable and plausible
now, particularly for the case of 'bushy plans'.
nonoverlap_sets() and is_subset() to list.c, where they should have lived
to begin with, and rename to nonoverlap_setsi and is_subseti since they
only work on integer lists.
SELECT DISTINCT ON (expr [, expr ...]) targetlist ...
and there is a check to make sure that the user didn't specify an ORDER BY
that's incompatible with the DISTINCT operation.
Reimplement nodeUnique and nodeGroup to use the proper datatype-specific
equality function for each column being compared --- they used to do
bitwise comparisons or convert the data to text strings and strcmp().
(To add insult to injury, they'd look up the conversion functions once
for each tuple...) Parse/plan representation of DISTINCT is now a list
of SortClause nodes.
initdb forced by querytree change...
pghackers discussion of 5-Jan-2000. The amopselect and amopnpages
estimators are gone, and in their place is a per-AM amcostestimate
procedure (linked to from pg_am, not pg_amop).