Commit graph

1715 commits

Author SHA1 Message Date
Robert Haas
abd94bcac4 Use abbreviated keys for faster sorting of numeric datums.
Andrew Gierth, reviewed by Peter Geoghegan, with further tweaks by me.
2015-04-02 14:04:26 -04:00
Heikki Linnakangas
f770870d9e Move inet/cidr GiST opclass functions to correct place in header file.
They were accidentally placed under the GIN heading.

Andreas Karlsson
2015-04-01 19:20:45 +03:00
Alvaro Herrera
97690ea6e8 Change array_offset to return subscripts, not offsets
... and rename it and its sibling array_offsets to array_position and
array_positions, to account for the changed behavior.

Having the functions return subscripts better matches existing practice,
and is better suited to using the result value as a subscript into the
array directly.  For one-based arrays, the new definition is identical
to what was originally committed.

(We use the term "subscript" in the documentation, which is what we use
whenever we talk about arrays; but the functions themselves are named
using the word "position" to match the standard-defined POSITION()
functions.)

Author: Pavel Stěhule
Behavioral problem noted by Dean Rasheed.
2015-03-30 16:13:21 -03:00
Heikki Linnakangas
0633a60f4d Add index-only scan support to range type GiST opclass.
Andreas Karlsson
2015-03-30 13:22:38 +03:00
Heikki Linnakangas
3a20b0e7b6 Add index-only scan support to inet GiST opclass.
Andreas Karlsson
2015-03-28 15:11:53 +02:00
Heikki Linnakangas
d04c8ed904 Add support for index-only scans in GiST.
This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.

This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).

Anastasia Lubennikova, with additional fixes and modifications by me.
2015-03-26 19:12:00 +02:00
Alvaro Herrera
bdc3d7fa23 Return ObjectAddress in many ALTER TABLE sub-routines
Since commit a2e35b53c3, most CREATE and ALTER commands return the
ObjectAddress of the affected object.  This is useful for event triggers
to try to figure out exactly what happened.  This patch extends this
idea a bit further to cover ALTER TABLE as well: an auxiliary
ObjectAddress is returned for each of several subcommands of ALTER
TABLE.  This makes it possible to decode with precision what happened
during execution of any ALTER TABLE command; for instance, which
constraint was added by ALTER TABLE ADD CONSTRAINT, or which parent got
dropped from the parents list by ALTER TABLE NO INHERIT.

As with the previous patch, there is no immediate user-visible change
here.

This is all really just continuing what c504513f83 started.

Reviewed by Stephen Frost.
2015-03-25 17:17:56 -03:00
Bruce Momjian
1c7087af42 Add TOAST table to pg_shseclabel for long label use
Report by Andres Freund
2015-03-21 22:14:49 -04:00
Andres Freund
959277a4f5 Use 128-bit math to accelerate some aggregation functions.
On platforms where we support 128bit integers, use them to implement
faster transition functions for sum(int8), avg(int8),
var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use
numeric as a transition type.

In some synthetic benchmarks this has been shown to provide significant
speedups.

Bumps catversion.

Discussion: 544BB5F1.50709@proxel.se
Author: Andreas Karlsson
Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund,
    Oskari Saarenmaa, David Rowley
2015-03-20 10:29:32 +01:00
Alvaro Herrera
13dbc7a824 array_offset() and array_offsets()
These functions return the offset position or positions of a value in an
array.

Author: Pavel Stěhule
Reviewed by: Jim Nasby
2015-03-18 16:01:34 -03:00
Tom Lane
7b8b8a4331 Improve representation of PlanRowMark.
This patch fixes two inadequacies of the PlanRowMark representation.

First, that the original LockingClauseStrength isn't stored (and cannot be
inferred for foreign tables, which always get ROW_MARK_COPY).  Since some
PlanRowMarks are created out of whole cloth and don't actually have an
ancestral RowMarkClause, this requires adding a dummy LCS_NONE value to
enum LockingClauseStrength, which is fairly annoying but the alternatives
seem worse.  This fix allows getting rid of the use of get_parse_rowmark()
in FDWs (as per the discussion around commits 462bd95705 and
8ec8760fc8), and it simplifies some things elsewhere.

Second, that the representation assumed that all child tables in an
inheritance hierarchy would use the same RowMarkType.  That's true today
but will soon not be true.  We add an "allMarkTypes" field that identifies
the union of mark types used in all a parent table's children, and use
that where appropriate (currently, only in preprocess_targetlist()).

In passing fix a couple of minor infelicities left over from the SKIP
LOCKED patch, notably that _outPlanRowMark still thought waitPolicy
is a bool.

Catversion bump is required because the numeric values of enum
LockingClauseStrength can appear in on-disk rules.

Extracted from a much larger patch to support foreign table inheritance;
it seemed worth breaking this out, since it's a separable concern.

Shigeru Hanada and Etsuro Fujita, somewhat modified by me
2015-03-15 18:41:47 -04:00
Peter Eisentraut
bb8582abf3 Remove rolcatupdate
This role attribute is an ancient PostgreSQL feature, but could only be
set by directly updating the system catalogs, and it doesn't have any
clearly defined use.

Author: Adam Brightwell <adam.brightwell@crunchydatasolutions.com>
2015-03-06 23:42:38 -05:00
Alvaro Herrera
a2e35b53c3 Change many routines to return ObjectAddress rather than OID
The changed routines are mostly those that can be directly called by
ProcessUtilitySlow; the intention is to make the affected object
information more precise, in support for future event trigger changes.
Originally it was envisioned that the OID of the affected object would
be enough, and in most cases that is correct, but upon actually
implementing the event trigger changes it turned out that ObjectAddress
is more widely useful.

Additionally, some command execution routines grew an output argument
that's an object address which provides further info about the executed
command.  To wit:

* for ALTER DOMAIN / ADD CONSTRAINT, it corresponds to the address of
  the new constraint

* for ALTER OBJECT / SET SCHEMA, it corresponds to the address of the
  schema that originally contained the object.

* for ALTER EXTENSION {ADD, DROP} OBJECT, it corresponds to the address
  of the object added to or dropped from the extension.

There's no user-visible change in this commit, and no functional change
either.

Discussion: 20150218213255.GC6717@tamriel.snowman.net
Reviewed-By: Stephen Frost, Andres Freund
2015-03-03 14:10:50 -03:00
Tom Lane
b67f1ce181 Reduce json <=> jsonb casts from explicit-only to assignment level.
There's no reason to make users write an explicit cast to store a
json value in a jsonb column or vice versa.

We could probably even make these implicit, but that might open us up
to problems with ambiguous function calls, so for now just do this.
2015-03-03 11:26:04 -05:00
Noah Misch
b8a18ad485 Add transform functions for AT TIME ZONE.
This makes "ALTER TABLE tabname ALTER tscol TYPE ... USING tscol AT TIME
ZONE 'UTC'" skip rewriting the table when altering from "timestamp" to
"timestamptz" or vice versa.  While it would be nicer still to optimize
this in the absence of the USING clause given timezone==UTC, transform
functions must consult IMMUTABLE facts only.
2015-03-01 13:22:34 -05:00
Tom Lane
c063da1769 Add parse location fields to NullTest and BooleanTest structs.
We did not need a location tag on NullTest or BooleanTest before, because
no error messages referred directly to their locations.  That's planned
to change though, so add these fields in a separate housekeeping commit.

Catversion bump because stored rules may change.
2015-02-22 14:40:27 -05:00
Andres Freund
82a532b34d Force some system catalog table columns to be marked NOT NULL.
In a manual pass over the catalog declaration I found a number of
columns which the boostrap automatism didn't mark NOT NULL even though
they actually were. Add BKI_FORCE_NOT_NULL markings to them.

It's usually not critical if a system table column is falsely determined
to be nullable as the code should always catch relevant cases. But it's
good to have a extra layer in place.

Discussion: 20150215170014.GE15326@awork2.anarazel.de
2015-02-21 22:37:05 +01:00
Andres Freund
eb68379c38 Allow forcing nullness of columns during bootstrap.
Bootstrap determines whether a column is null based on simple builtin
rules. Those work surprisingly well, but nonetheless a few existing
columns aren't set correctly. Additionally there is at least one patch
sent to hackers where forcing the nullness of a column would be helpful.

The boostrap format has gained FORCE [NOT] NULL for this, which will be
emitted by genbki.pl when BKI_FORCE_(NOT_)?NULL is specified for a
column in a catalog header.

This patch doesn't change the marking of any existing columns.

Discussion: 20150215170014.GE15326@awork2.anarazel.de
2015-02-21 22:31:54 +01:00
Tom Lane
692bd09ad1 Use "#ifdef CATALOG_VARLEN" to protect nullable fields of pg_authid.
This gives a stronger guarantee than a mere comment against accessing these
fields as simple struct members.  Since rolpassword is in fact varlena,
it's not clear why these didn't get marked from the beginning, but let's
do it now.

Michael Paquier
2015-02-20 00:23:48 -05:00
Tom Lane
09d8d110a6 Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).

Michael Paquier, review and additional fixes by me.
2015-02-20 00:11:42 -05:00
Tom Lane
2fb7a75f37 Add pg_stat_get_snapshot_timestamp() to show statistics snapshot timestamp.
Per discussion, this could be useful for purposes such as programmatically
detecting a nonresponding stats collector.  We already have the timestamp
anyway, it's just a matter of providing a SQL-accessible function to fetch
it.

Matt Kelly, reviewed by Jim Nasby
2015-02-19 21:36:50 -05:00
Tom Lane
56a79a869b Split array_push into separate array_append and array_prepend functions.
There wasn't any good reason for a single C function to implement both
these SQL functions: it saved very little code overall, and it required
significant pushups to re-determine at runtime which case applied.  Redoing
it as two functions ends up with just slightly more lines of code, but it's
simpler to understand, and faster too because we need not repeat syscache
lookups on every call.

An important side benefit is that this eliminates the only case in which
different aliases of the same C function had both anyarray and anyelement
arguments at the same position, which would almost always be a mistake.
The opr_sanity regression test will now notice such mistakes since there's
no longer a valid case where it happens.
2015-02-18 20:53:33 -05:00
Heikki Linnakangas
c619c2351f Move pg_crc.c to src/common, and remove pg_crc_tables.h
To get CRC functionality in a client program, you now need to link with
libpgcommon instead of libpgport. The CRC code has nothing to do with
portability, so libpgcommon is a better home. (libpgcommon didn't exist
when pg_crc.c was originally moved to src/port.)

Remove the possibility to get CRC functionality by just #including
pg_crc_tables.h. I'm not aware of any extensions that actually did that and
couldn't simply link with libpgcommon.

This also moves the pg_crc.h header file from src/include/utils to
src/include/common, which will require changes to any external programs
that currently does #include "utils/pg_crc.h". That seems acceptable, as
include/common is clearly the right home for it now, and the change needed
to any such programs is trivial.
2015-02-09 11:17:56 +02:00
Tom Lane
3d660d33aa Fix assorted oversights in range selectivity estimation.
calc_rangesel() failed outright when comparing range variables to empty
constant ranges with < or >=, as a result of missing cases in a switch.
It also produced a bogus estimate for > comparison to an empty range.

On top of that, the >= and > cases were mislabeled throughout.  For
nonempty constant ranges, they managed to produce the right answers
anyway as a result of counterbalancing typos.

Also, default_range_selectivity() omitted cases for elem <@ range,
range &< range, and range &> range, so that rather dubious defaults
were applied for these operators.

In passing, rearrange the code in rangesel() so that the elem <@ range
case is handled in a less opaque fashion.

Report and patch by Emre Hasegeli, some additional work by me
2015-01-30 12:30:59 -05:00
Stephen Frost
c7cf9a2433 Add usebypassrls to pg_user and pg_shadow
The row level security patches didn't add the 'usebypassrls' columns to
the pg_user and pg_shadow views on the belief that they were deprecated,
but we havn't actually said they are and therefore we should include it.

This patch corrects that, adds missing documentation for rolbypassrls
into the system catalog page for pg_authid, along with the entries for
pg_user and pg_shadow, and cleans up a few other uses of 'row-level'
cases to be 'row level' in the docs.

Pointed out by Amit Kapila.

Catalog version bump due to system view changes.
2015-01-28 21:47:15 -05:00
Tom Lane
fd496129d1 Clean up some mess in row-security patches.
Fix unsafe coding around PG_TRY in RelationBuildRowSecurity: can't change
a variable inside PG_TRY and then use it in PG_CATCH without marking it
"volatile".  In this case though it seems saner to avoid that by doing
a single assignment before entering the TRY block.

I started out just intending to fix that, but the more I looked at the
row-security code the more distressed I got.  This patch also fixes
incorrect construction of the RowSecurityPolicy cache entries (there was
not sufficient care taken to copy pass-by-ref data into the cache memory
context) and a whole bunch of sloppiness around the definition and use of
pg_policy.polcmd.  You can't use nulls in that column because initdb will
mark it NOT NULL --- and I see no particular reason why a null entry would
be a good idea anyway, so changing initdb's behavior is not the right
answer.  The internal value of '\0' wouldn't be suitable in a "char" column
either, so after a bit of thought I settled on using '*' to represent ALL.
Chasing those changes down also revealed that somebody wasn't paying
attention to what the underlying values of ACL_UPDATE_CHR etc really were,
and there was a great deal of lackadaiscalness in the catalogs.sgml
documentation for pg_policy and pg_policies too.

This doesn't pretend to be a complete code review for the row-security
stuff, it just fixes the things that were in my face while dealing with
the bugs in RelationBuildRowSecurity.
2015-01-24 16:16:22 -05:00
Alvaro Herrera
972bf7d6f1 Tweak BRIN minmax operator class
In the union support proc, we were not checking the hasnulls flag of
value A early enough, so it could be skipped if the "allnulls" flag in
value B is set.  Also, a check on the allnulls flag of value "B" was
redundant, so remove it.

Also change inet_minmax_ops to not be the default opclass for type inet,
as a future inclusion operator class would be more useful and it's
pretty difficult to change default opclass for a datatype later on.
(There is no catversion bump for this catalog change; this shouldn't be
a problem.)

Extracted from a larger patch to add an "inclusion" operator class.

Author: Emre Hasegeli
2015-01-22 17:01:09 -03:00
Bruce Momjian
4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
Alvaro Herrera
72dd233d3e pg_event_trigger_dropped_objects: Add name/args output columns
These columns can be passed to pg_get_object_address() and used to
reconstruct the dropped objects identities in a remote server containing
similar objects, so that the drop can be replicated.

Reviewed by Stephen Frost, Heikki Linnakangas, Abhijit Menon-Sen, Andres
Freund.
2014-12-30 17:41:46 -03:00
Alvaro Herrera
a676201490 Add pg_identify_object_as_address
This function returns object type and objname/objargs arrays, which can
be passed to pg_get_object_address.  This is especially useful because
the textual representation can be copied to a remote server in order to
obtain the corresponding OID-based address.  In essence, this function
is the inverse of recently added pg_get_object_address().

Catalog version bumped due to the addition of the new function.

Also add docs to pg_get_object_address.
2014-12-30 15:41:50 -03:00
Alvaro Herrera
a609d96778 Revert "Use a bitmask to represent role attributes"
This reverts commit 1826987a46.

The overall design was deemed unacceptable, in discussion following the
previous commit message; we might find some parts of it still
salvageable, but I don't want to be on the hook for fixing it, so let's
wait until we have a new patch.
2014-12-23 15:35:49 -03:00
Alvaro Herrera
d7ee82e50f Add SQL-callable pg_get_object_address
This allows access to get_object_address from SQL, which is useful to
obtain OID addressing information from data equivalent to that emitted
by the parser.  This is necessary infrastructure of a project to let
replication systems propagate object dropping events to remote servers,
where the schema might be different than the server originating the
DROP.

This patch also adds support for OBJECT_DEFAULT to get_object_address;
that is, it is now possible to refer to a column's default value.

Catalog version bumped due to the new function.

Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres
Freund, Abhijit Menon-Sen, Adam Brightwell.
2014-12-23 15:31:29 -03:00
Alvaro Herrera
1826987a46 Use a bitmask to represent role attributes
The previous representation using a boolean column for each attribute
would not scale as well as we want to add further attributes.

Extra auxilliary functions are added to go along with this change, to
make up for the lost convenience of access of the old representation.

Catalog version bumped due to change in catalogs and the new functions.

Author: Adam Brightwell, minor tweaks by Álvaro
Reviewed by: Stephen Frost, Andres Freund, Álvaro Herrera
2014-12-23 10:22:09 -03:00
Alvaro Herrera
0ee98d1cbf pg_event_trigger_dropped_objects: add behavior flags
Add "normal" and "original" flags as output columns to the
pg_event_trigger_dropped_objects() function.  With this it's possible to
distinguish which objects, among those listed, need to be explicitely
referenced when trying to replicate a deletion.

This is necessary so that the list of objects can be pruned to the
minimum necessary to replicate the DROP command in a remote server that
might have slightly different schema (for instance, TOAST tables and
constraints with different names and such.)

Catalog version bumped due to change of function definition.

Reviewed by: Abhijit Menon-Sen, Stephen Frost, Heikki Linnakangas,
Robert Haas.
2014-12-19 15:00:45 -03:00
Heikki Linnakangas
4520ba6769 Add point <-> polygon distance operator.
Alexander Korotkov, reviewed by Emre Hasegeli.
2014-12-15 17:06:21 +02:00
Andrew Dunstan
7e354ab9fe Add several generator functions for jsonb that exist for json.
The functions are:
    to_jsonb()
    jsonb_object()
    jsonb_build_object()
    jsonb_build_array()
    jsonb_agg()
    jsonb_object_agg()

Also along the way some better logic is implemented in
json_categorize_type() to match that in the newly implemented
jsonb_categorize_type().

Andrew Dunstan, reviewed by Pavel Stehule and Alvaro Herrera.
2014-12-12 15:31:14 -05:00
Andrew Dunstan
237a882443 Add json_strip_nulls and jsonb_strip_nulls functions.
The functions remove object fields, including in nested objects, that
have null as a value. In certain cases this can lead to considerably
smaller datums, with no loss of semantic information.

Andrew Dunstan, reviewed by Pavel Stehule.
2014-12-12 09:00:43 -05:00
Simon Riggs
618c9430a8 Event Trigger for table_rewrite
Generate a table_rewrite event when ALTER TABLE
attempts to rewrite a table. Provide helper
functions to identify table and reason.

Intended use case is to help assess or to react
to schema changes that might hold exclusive locks
for long periods.

Dimitri Fontaine, triggering an edit by Simon Riggs

Reviewed in detail by Michael Paquier
2014-12-08 00:55:28 +09:00
Alvaro Herrera
73c986adde Keep track of transaction commit timestamps
Transactions can now set their commit timestamp directly as they commit,
or an external transaction commit timestamp can be fed from an outside
system using the new function TransactionTreeSetCommitTsData().  This
data is crash-safe, and truncated at Xid freeze point, same as pg_clog.

This module is disabled by default because it causes a performance hit,
but can be enabled in postgresql.conf requiring only a server restart.

A new test in src/test/modules is included.

Catalog version bumped due to the new subdirectory within PGDATA and a
couple of new SQL functions.

Authors: Álvaro Herrera and Petr Jelínek

Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert
Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven
Singer, Peter Eisentraut
2014-12-03 11:53:02 -03:00
Tom Lane
1511521a36 Minor cleanup of function declarations for BRIN.
Get rid of PG_FUNCTION_INFO_V1() macros, which are quite inappropriate
for built-in functions (possibly leftovers from testing as a loadable
module?).  Also, fix gratuitous inconsistency between SQL-level and
C-level names of the minmax support functions.
2014-12-02 14:07:54 -05:00
Tom Lane
866737c923 Add a #define for the inet overlaps operator.
Extracted from pending inet selectivity patch.  The rest of it isn't
quite ready to commit, but we might as well push this part so the patch
doesn't have to track the moving target of pg_operator.h.
2014-11-30 19:43:43 -05:00
Alvaro Herrera
816e10d800 Fix BRIN operator family definitions
The original definitions were leaving no room for cross-type operators,
so queries that compared a column of one type against something of a
different type were not taking advantage of the index.  Fix by making
the opfamilies more like the ones for Btree, and include a few
cross-type operator classes.

Catalog version bumped.

Per complaints from Hubert Lubaczewski, Mark Wong, Heikki Linnakangas.
2014-11-28 18:09:19 -03:00
Stephen Frost
143b39c185 Rename pg_rowsecurity -> pg_policy and other fixes
As pointed out by Robert, we should really have named pg_rowsecurity
pg_policy, as the objects stored in that catalog are policies.  This
patch fixes that and updates the column names to start with 'pol' to
match the new catalog name.

The security consideration for COPY with row level security, also
pointed out by Robert, has also been addressed by remembering and
re-checking the OID of the relation initially referenced during COPY
processing, to make sure it hasn't changed under us by the time we
finish planning out the query which has been built.

Robert and Alvaro also commented on missing OCLASS and OBJECT entries
for POLICY (formerly ROWSECURITY or POLICY, depending) in various
places.  This patch fixes that too, which also happens to add the
ability to COMMENT on policies.

In passing, attempt to improve the consistency of messages, comments,
and documentation as well.  This removes various incarnations of
'row-security', 'row-level security', 'Row-security', etc, in favor
of 'policy', 'row level security' or 'row_security' as appropriate.

Happy Thanksgiving!
2014-11-27 01:15:57 -05:00
Tom Lane
bac27394a1 Support arrays as input to array_agg() and ARRAY(SELECT ...).
These cases formerly failed with errors about "could not find array type
for data type".  Now they yield arrays of the same element type and one
higher dimension.

The implementation involves creating functions with API similar to the
existing accumArrayResult() family.  I (tgl) also extended the base family
by adding an initArrayResult() function, which allows callers to avoid
special-casing the zero-inputs case if they just want an empty array as
result.  (Not all do, so the previous calling convention remains valid.)
This allowed simplifying some existing code in xml.c and plperl.c.

Ali Akbar, reviewed by Pavel Stehule, significantly modified by me
2014-11-25 12:21:28 -05:00
Heikki Linnakangas
0bd624d63b Distinguish XLOG_FPI records generated for hint-bit updates.
Add a new XLOG_FPI_FOR_HINT record type, and use that for full-page images
generated for hint bit updates, when checksums are enabled. The new record
type is replayed exactly the same as XLOG_FPI, but allows them to be tallied
separately e.g. in pg_xlogdump.
2014-11-24 11:09:08 +02:00
Tom Lane
c5111ea9ca Remove no-longer-needed phony typedefs in genbki.h.
Now that we have a policy of hiding varlena catalog fields behind
"#ifdef CATALOG_VARLEN", there is no need for their type names to be
acceptable to the C compiler.  And experimentation shows that it does
not matter to pgindent either.  (If it did, we'd have problems anyway,
since these typedefs are unreferenced so far as the C compiler is
concerned, and find_typedef fails to identify such typedefs.)

Hence, remove the phony typedefs that genbki.h provided to make
some varlena field definitions compilable.

In passing, rearrange #define's into what seemed a more logical order.
2014-11-20 13:16:14 -05:00
Heikki Linnakangas
2c03216d83 Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.

There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.

This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.

For the convenience of redo routines, XLogReader now disects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-disected format, instead of the plain
XLogRecord.

The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compansates for the fact that the new format would otherwise
be more bulky than the old format.

Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 18:46:41 +02:00
Alvaro Herrera
85b506bbfc Get rid of SET LOGGED indexes persistence kludge
This removes ATChangeIndexesPersistence() introduced by f41872d0c1
which was too ugly to live for long.  Instead, the correct persistence
marking is passed all the way down to reindex_index, so that the
transient relation built to contain the index relfilenode can
get marked correctly right from the start.

Author: Fabrízio de Royes Mello
Review and editorialization by Michael Paquier
                                     and Álvaro Herrera
2014-11-15 01:19:49 -03:00
Fujii Masao
1871c89202 Add generate_series(numeric, numeric).
Платон Малюгин
Reviewed by Michael Paquier, Ali Akbar and Marti Raudsepp
2014-11-11 21:44:46 +09:00
Alvaro Herrera
7516f52594 BRIN: Block Range Indexes
BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is
updated (if the block range in which the tuple is added is already
summarized) or not; in the latter case, a subsequent pass of VACUUM or
the brin_summarize_new_values() function will create the summary
information.

For data types with natural 1-D sort orders, the summary info consists
of the maximum and the minimum values of each indexed column within each
page range.  This type of operator class we call "Minmax", and we
supply a bunch of them for most data types with B-tree opclasses.
Since the BRIN code is generalized, other approaches are possible for
things such as arrays, geometric types, ranges, etc; even for things
such as enum types we could do something different than minmax with
better results.  In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera,
with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS:
  The research leading to these results has received funding from the
  European Union's Seventh Framework Programme (FP7/2007-2013) under
  grant agreement n° 318633.
2014-11-07 16:38:14 -03:00