postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-02 05:13:42 -05:00

Author	SHA1	Message	Date
Heikki Linnakangas	55566c9a74	Fix dangling smgr_owner pointer when a fake relcache entry is freed. A fake relcache entry can "own" a SmgrRelation object, like a regular relcache entry. But when it was free'd, the owner field in SmgrRelation was not cleared, so it was left pointing to free'd memory. Amazingly this apparently hasn't caused crashes in practice, or we would've heard about it earlier. Andres found this with Valgrind. Report and fix by Andres Freund, with minor modifications by me. Backpatch to all supported versions.	2014-03-07 13:28:52 +02:00
Tom Lane	7c31874945	Avoid getting more than AccessShareLock when deparsing a query. In make_ruledef and get_query_def, we have long used AcquireRewriteLocks to ensure that the querytree we are about to deparse is up-to-date and the schemas of the underlying relations aren't changing. Howwever, that function thinks the query is about to be executed, so it acquires locks that are stronger than necessary for the purpose of deparsing. Thus for example, if pg_dump asks to deparse a rule that includes "INSERT INTO t", we'd acquire RowExclusiveLock on t. That results in interference with concurrent transactions that might for example ask for ShareLock on t. Since pg_dump is documented as being purely read-only, this is unexpected. (Worse, it used to actually be read-only; this behavior dates back only to 8.1, cf commit ba4200246.) Fix this by adding a parameter to AcquireRewriteLocks to tell it whether we want the "real" execution locks or only AccessShareLock. Report, diagnosis, and patch by Dean Rasheed. Back-patch to all supported branches.	2014-03-06 19:31:05 -05:00
Bruce Momjian	0024a3a3b6	C comment update: relfilenode is only unique with a tablespace Report from Antonin Houska	2014-03-05 20:52:34 -05:00
Tom Lane	8cf0ad1ea3	Add comment that ec_relids excludes "child" EquivalenceClass members. This was already documented a few lines further down, but the comment just beside the field declaration could be misleading. Per gripe from Kyotaro Horiguchi.	2014-03-05 16:00:33 -05:00
Alvaro Herrera	84df54b22e	Constructors for interval, timestamp, timestamptz Author: Pavel Stěhule, editorialized somewhat by Álvaro Herrera Reviewed-by: Tomáš Vondra, Marko Tiikkaja With input from Fabrízio de Royes Mello, Jim Nasby	2014-03-04 15:09:43 -03:00
Robert Haas	7e8db2dc42	Minor corrections to logical decoding patch.	2014-03-04 11:07:54 -05:00
Robert Haas	b89e151054	Introduce logical decoding. This feature, building on previous commits, allows the write-ahead log stream to be decoded into a series of logical changes; that is, inserts, updates, and deletes and the transactions which contain them. It is capable of handling decoding even across changes to the schema of the effected tables. The output format is controlled by a so-called "output plugin"; an example is included. To make use of this in a real replication system, the output plugin will need to be modified to produce output in the format appropriate to that system, and to perform filtering. Currently, information can be extracted from the logical decoding system only via SQL; future commits will add the ability to stream changes via walsender. Andres Freund, with review and other contributions from many other people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Gheogegan, Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Abhijit Menon-Sen, Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.	2014-03-03 16:32:18 -05:00
Heikki Linnakangas	f8ce16d0d2	Rename huge_tlb_pages to huge_pages, and improve docs. Christian Kruse	2014-03-03 20:52:48 +02:00
Robert Haas	a8e9b86b5e	Bump catversion. The previous patch should have entailed a catversion bump, but I forgot.	2014-03-03 07:22:20 -05:00
Robert Haas	d83ee62231	Corrections to replication slots code and documentation. Andres Freund, per a report from Vik Faering	2014-03-03 07:16:54 -05:00
Robert Haas	ae95f5f74a	Define LSNOID in pg_type.h. Most other built-in types have a similarly-named constant, so this type should probably have one, too. Michael Paquier	2014-03-03 07:03:41 -05:00
Tom Lane	9662143f0c	Allow regex operations to be terminated early by query cancel requests. The regex code didn't have any provision for query cancel; which is unsurprising given its non-Postgres origin, but still problematic since some operations can take a long time. Introduce a callback function to check for a pending query cancel or session termination request, and call it in a couple of strategic spots where we can make the regex code exit with an error indicator. If we ever actually split out the regex code as a standalone library, some additional work will be needed to let the cancel callback function be specified externally to the library. But that's straightforward (certainly so by comparison to putting the locale-dependent character classification logic on a similar arms-length basis), and there seems no need to do it right now. A bigger issue is that there may be more places than these two where we need to check for cancels. We can always add more checks later, now that the infrastructure is in place. Since there are known examples of not-terribly-long regexes that can lock up a backend for a long time, back-patch to all supported branches. I have hopes of fixing the known performance problems later, but adding query cancel ability seems like a good idea even if they were all fixed.	2014-03-01 15:20:56 -05:00
Alvaro Herrera	ef5856fd9b	Allow BASE_BACKUP to be throttled A new MAX_RATE option allows imposing a limit to the network transfer rate from the server side. This is useful to limit the stress that taking a base backup has on the server. pg_basebackup is now able to specify a value to the server, too. Author: Antonin Houska Patch reviewed by Stefan Radomski, Andres Freund, Zoltán Böszörményi, Fujii Masao, and Álvaro Herrera.	2014-02-27 18:55:57 -03:00
Robert Haas	dd1a3bccca	Show xid and xmin in pg_stat_activity and pg_stat_replication. Christian Kruse, reviewed by Andres Freund and myself, with further minor adjustments by me.	2014-02-25 12:34:04 -05:00
Robert Haas	6615e77439	Use pg_lsn data type in pg_stat_replication, too. Michael Paquier, per a suggestion from Andres Freund	2014-02-24 10:38:45 -05:00
Robert Haas	6f289c2b7d	Switch various builtin functions to use pg_lsn instead of text. The functions in slotfuncs.c don't exist in any released version, but the changes to xlogfuncs.c represent backward-incompatibilities. Per discussion, we're hoping that the queries using these functions are few enough and simple enough that this won't cause too much breakage for users. Michael Paquier, reviewed by Andres Freund and further modified by me.	2014-02-19 11:37:43 -05:00
Robert Haas	694e3d139a	Further code review for pg_lsn data type. Change input function error messages to be more consistent with what is done elsewhere. Remove a bunch of redundant type casts, so that the compiler will warn us if we screw up. Don't pass LSNs by value on platforms where a Datum is only 32 bytes, per buildfarm. Move macros for packing and unpacking LSNs to pg_lsn.h so that we can include access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.	2014-02-19 10:06:59 -05:00
Robert Haas	844a28a9dd	pg_lsn macro naming and type behavior revisions. Change pg_lsn_mi so that it can return negative values when subtracting LSNs, and clean up some perhaps ill-considered macro names.	2014-02-19 09:34:15 -05:00
Robert Haas	7d03a83f4d	Add a pg_lsn data type, to represent an LSN. Robert Haas and Michael Paquier	2014-02-19 08:35:23 -05:00
Noah Misch	31400a6733	Predict integer overflow to avoid buffer overruns. Several functions, mostly type input functions, calculated an allocation size such that the calculation wrapped to a small positive value when arguments implied a sufficiently-large requirement. Writes past the end of the inadvertent small allocation followed shortly thereafter. Coverity identified the path_in() vulnerability; code inspection led to the rest. In passing, add check_stack_depth() to prevent stack overflow in related functions. Back-patch to 8.4 (all supported versions). The non-comment hstore changes touch code that did not exist in 8.4, so that part stops at 9.0. Noah Misch and Heikki Linnakangas, reviewed by Tom Lane. Security: CVE-2014-0064	2014-02-17 09:33:31 -05:00
Noah Misch	4318daecc9	Fix handling of wide datetime input/output. Many server functions use the MAXDATELEN constant to size a buffer for parsing or displaying a datetime value. It was much too small for the longest possible interval output and slightly too small for certain valid timestamp input, particularly input with a long timezone name. The long input was rejected needlessly; the long output caused interval_out() to overrun its buffer. ECPG's pgtypes library has a copy of the vulnerable functions, which bore the same vulnerabilities along with some of its own. In contrast to the server, certain long inputs caused stack overflow rather than failing cleanly. Back-patch to 8.4 (all supported versions). Reported by Daniel Schüssler, reviewed by Tom Lane. Security: CVE-2014-0063	2014-02-17 09:33:31 -05:00
Robert Haas	5f173040e3	Avoid repeated name lookups during table and index DDL. If the name lookups come to different conclusions due to concurrent activity, we might perform some parts of the DDL on a different table than other parts. At least in the case of CREATE INDEX, this can be used to cause the permissions checks to be performed against a different table than the index creation, allowing for a privilege escalation attack. This changes the calling convention for DefineIndex, CreateTrigger, transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible (in 9.2 and newer), and AlterTable (in 9.1 and older). In addition, CheckRelationOwnership is removed in 9.2 and newer and the calling convention is changed in older branches. A field has also been added to the Constraint node (FkConstraint in 8.4). Third-party code calling these functions or using the Constraint node will require updating. Report by Andres Freund. Patch by Robert Haas and Andres Freund, reviewed by Tom Lane. Security: CVE-2014-0062	2014-02-17 09:33:31 -05:00
Noah Misch	537cbd35c8	Prevent privilege escalation in explicit calls to PL validators. The primary role of PL validators is to be called implicitly during CREATE FUNCTION, but they are also normal functions that a user can call explicitly. Add a permissions check to each validator to ensure that a user cannot use explicit validator calls to achieve things he could not otherwise achieve. Back-patch to 8.4 (all supported versions). Non-core procedural language extensions ought to make the same two-line change to their own validators. Andres Freund, reviewed by Tom Lane and Noah Misch. Security: CVE-2014-0061	2014-02-17 09:33:31 -05:00
Tom Lane	fa1f0d7859	PGDLLIMPORT-ify MainLWLockArray, ProcDiePending, proc_exit_inprogress. These are needed in HEAD to make assorted contrib modules build on Windows. Now that all the MSVC and Mingw buildfarm members seem to be on the same page about the need for them, we can have some confidence that future problems of this ilk will be detected promptly; there seems nothing more to be learned by delaying this fix further. I chose to mark QueryCancelPending as well, since it's easy to imagine code that wants to touch ProcDiePending also caring about QueryCancelPending.	2014-02-16 20:12:43 -05:00
Tom Lane	a5cf60682e	PGDLLIMPORT'ify DateStyle and IntervalStyle. This is needed on Windows to support contrib/postgres_fdw. Although it's been broken since last March, we didn't notice until recently because there were no active buildfarm members that complained about missing PGDLLIMPORT marking. Efforts are underway to improve that situation, in support of which we're delaying fixing some other cases of global variables that should be marked PGDLLIMPORT. However, this case affects 9.3, so we can't wait any longer to fix it. I chose to mark DateOrder as well, though it's not strictly necessary for postgres_fdw.	2014-02-16 12:37:07 -05:00
Tom Lane	60ff2fdd99	Centralize getopt-related declarations in a new header file pg_getopt.h. We used to have externs for getopt() and its API variables scattered all over the place. Now that we find we're going to need to tweak the variable declarations for Cygwin, it seems like a good idea to have just one place to tweak. In this commit, the variables are declared "#ifndef HAVE_GETOPT_H". That may or may not work everywhere, but we'll soon find out. Andres Freund	2014-02-15 14:31:30 -05:00
Peter Eisentraut	0f2ca0075c	Fix typo Stefan Kaltenbrunner	2014-02-13 21:50:43 -05:00
Alvaro Herrera	801c2dc72c	Separate multixact freezing parameters from xid's Previously we were piggybacking on transaction ID parameters to freeze multixacts; but since there isn't necessarily any relationship between rates of Xid and multixact consumption, this turns out not to be a good idea. Therefore, we now have multixact-specific freezing parameters: vacuum_multixact_freeze_min_age: when to remove multis as we come across them in vacuum (default to 5 million, i.e. early in comparison to Xid's default of 50 million) vacuum_multixact_freeze_table_age: when to force whole-table scans instead of scanning only the pages marked as not all visible in visibility map (default to 150 million, same as for Xids). Whichever of both which reaches the 150 million mark earlier will cause a whole-table scan. autovacuum_multixact_freeze_max_age: when for cause emergency, uninterruptible whole-table scans (default to 400 million, double as that for Xids). This means there shouldn't be more frequent emergency vacuuming than previously, unless multixacts are being used very rapidly. Backpatch to 9.3 where multixacts were made to persist enough to require freezing. To avoid an ABI break in 9.3, VacuumStmt has a couple of fields in an unnatural place, and StdRdOptions is split in two so that the newly added fields can go at the end. Patch by me, reviewed by Robert Haas, with additional input from Andres Freund and Tom Lane.	2014-02-13 19:36:31 -03:00
Peter Eisentraut	66c04c981d	Mark some more variables as static or include the appropriate header Detected by clang's -Wmissing-variable-declarations. From: Andres Freund <andres@anarazel.de>	2014-02-08 21:21:46 -05:00
Heikki Linnakangas	dbc649fd77	Speed up "rare & frequent" type GIN queries. If you have a GIN query like "rare & frequent", we currently fetch all the items that match either rare or frequent, call the consistent function for each item, and let the consistent function filter out items that only match one of the terms. However, if we can deduce that "rare" must be present for the overall qual to be true, we can scan all the rare items, and for each rare item, skip over to the next frequent item with the same or greater TID. That greatly speeds up "rare & frequent" type queries. To implement that, introduce the concept of a tri-state consistent function, where the 3rd value is MAYBE, indicating that we don't know if that term is present. Operator classes only provide a boolean consistent function, so we simulate the tri-state consistent function by calling the boolean function several times, with the MAYBE arguments set to all combinations of TRUE and FALSE. Testing all combinations is only feasible for a small number of MAYBE arguments, but it is envisioned that we'll provide a way for operator classes to provide a native tri-state consistent function, which can be much more efficient. But that is not included in this patch. We were already using that trick to for lossy pages, calling the consistent function with the lossy entry set to TRUE and FALSE. Now that we have the tri-state consistent function, use it for lossy pages too. Alexander Korotkov, with fair amount of refactoring by me.	2014-02-07 15:22:48 +02:00
Robert Haas	80353f3528	Adjust pg_sleep_for/pg_sleep_until to use clock_timestamp. Otherwise, pg_sleep_until does the wrong thing in a multi-statement transaction. Julien Rouhaud	2014-02-03 14:33:43 -05:00
Fujii Masao	3e8554a54a	Make pg_basebackup skip temporary statistics files. The temporary statistics files don't need to be included in the backup because they are always reset at the beginning of the archive recovery. This patch changes pg_basebackup so that it skips all files located in $PGDATA/pg_stat_tmp or the directory specified by stats_temp_directory parameter.	2014-02-03 23:19:49 +09:00
Robert Haas	858ec11858	Introduce replication slots. Replication slots are a crash-safe data structure which can be created on either a master or a standby to prevent premature removal of write-ahead log segments needed by a standby, as well as (with hot_standby_feedback=on) pruning of tuples whose removal would cause replication conflicts. Slots have some advantages over existing techniques, as explained in the documentation. In a few places, we refer to the type of replication slots introduced by this patch as "physical" slots, because forthcoming patches for logical decoding will also have slots, but with somewhat different properties. Andres Freund and Robert Haas	2014-01-31 22:45:36 -05:00
Bruce Momjian	fc4ffba968	system catalogs: reorder pg_amproc entries into proper sections Report form Antonin Houska	2014-01-31 16:04:18 -05:00
Robert Haas	760c770ff6	Add convenience functions pg_sleep_for and pg_sleep_until. Vik Fearing, reviewed by Pavel Stehule and myself	2014-01-30 15:47:56 -05:00
Andrew Dunstan	5e52e9d6d4	Forgot to bump catalog version for json_array_elements_text.	2014-01-29 16:38:31 -05:00
Robert Haas	9347baa5bb	Include planning time in EXPLAIN ANALYZE output. This doesn't work for prepared queries, but it's not too easy to get the information in that case and there's some debate as to exactly what the right thing to measure is, so just do this for now. Andreas Karlsson, with slight doc changes by me.	2014-01-29 16:09:15 -05:00
Andrew Dunstan	5264d91541	Add json_array_elements_text function. This was a notable omission from the json functions added in 9.3 and there have been numerous complaints about its absence. Laurence Rowe.	2014-01-29 15:39:01 -05:00
Heikki Linnakangas	626a120656	Further optimize GIN multi-key searches. When skipping over some items in a posting tree, re-find the new location by descending the tree from root, rather than walking the right links. This can save a lot of I/O. Heavily modified from Alexander Korotkov's fast scan patch.	2014-01-29 21:24:38 +02:00
Heikki Linnakangas	25b1dafab6	Further optimize multi-key GIN searches. If we're skipping past a certain TID, avoid decoding posting list segments that only contain smaller TIDs. Extracted from Alexander Korotkov's fast scan patch, heavily modified.	2014-01-29 18:26:40 +02:00
Heikki Linnakangas	1a3458b6d8	Allow using huge TLB pages on Linux (MAP_HUGETLB) This patch adds an option, huge_tlb_pages, which allows requesting the shared memory segment to be allocated using huge pages, by using the MAP_HUGETLB flag in mmap(). This can improve performance. The default is 'try', which means that we will attempt using huge pages, and fall back to non-huge pages if it doesn't work. Currently, only Linux has MAP_HUGETLB. On other platforms, the default 'try' behaves the same as 'off'. In the passing, don't try to round the mmap() size to a multiple of pagesize. mmap() doesn't require that, and there's no particular reason for PostgreSQL to do that either. When using MAP_HUGETLB, however, round the request size up to nearest 2MB boundary. This is to work around a bug in some Linux kernel versions, but also to avoid wasting memory, because the kernel will round the size up anyway. Many people were involved in writing this patch, including Christian Kruse, Richard Poole, Abhijit Menon-Sen, reviewed by Peter Geoghegan, Andres Freund and me.	2014-01-29 14:08:30 +02:00
Andrew Dunstan	105639900b	New json functions. json_build_array() and json_build_object allow for the construction of arbitrarily complex json trees. json_object() turns a one or two dimensional array, or two separate arrays, into a json_object of name/value pairs, similarly to the hstore() function. json_object_agg() aggregates its two arguments into a single json object as name value pairs. Catalog version bumped. Andrew Dunstan, reviewed by Marko Tiikkaja.	2014-01-28 17:48:21 -05:00
Fujii Masao	9132b189bf	Add pg_stat_archiver statistics view. This view shows the statistics about the WAL archiver process's activity. Gabriele Bartolini, reviewed by Michael Paquier, refactored a bit by me.	2014-01-29 02:58:22 +09:00
Bruce Momjian	051b3341c1	Remove orphaned prototype Rajeev rastogi	2014-01-28 11:29:39 -05:00
Tom Lane	64e43c59b8	Log a detail message for auth failures due to missing or expired password. It's worth distinguishing these cases from run-of-the-mill wrong-password problems, since users have been known to waste lots of time pursuing the wrong theory about what's failing. Now, our longstanding policy about how to report authentication failures is that we don't really want to tell the client such things, since that might be giving information to a bad guy. But there's nothing wrong with reporting the details to the postmaster log, and indeed the comments in this area of the code contemplate that interesting details should be so reported. We just weren't handling these particular interesting cases usefully. To fix, add infrastructure allowing subroutines of ClientAuthentication() to return a string to be added to the errdetail_log field of the main authentication-failed error report. We might later want to use this to report other subcases of authentication failure the same way, but for the moment I just dealt with password cases. Per discussion of a patch from Josh Drake, though this is not what he proposed.	2014-01-27 21:04:09 -05:00
Robert Haas	ea9df812d8	Relax the requirement that all lwlocks be stored in a single array. This makes it possible to store lwlocks as part of some other data structure in the main shared memory segment, or in a dynamic shared memory segment. There is still a main LWLock array and this patch does not move anything out of it, but it provides necessary infrastructure for doing that in the future. This change is likely to increase the size of LWLockPadded on some platforms, especially 32-bit platforms where it was previously only 16 bytes. Patch by me. Review by Andres Freund and KaiGai Kohei.	2014-01-27 11:07:44 -05:00
Tom Lane	2850896961	Code review for auto-tuned effective_cache_size. Fix integer overflow issue noted by Magnus Hagander, as well as a bunch of other infelicities in commit `ee1e5662d8` and its unreasonably large number of followups.	2014-01-27 00:05:56 -05:00
Fujii Masao	7c619be623	Fix typos in comments for ALTER SYSTEM. Michael Paquier	2014-01-27 12:23:20 +09:00
Andrew Dunstan	cec8394b5c	Enable building with Visual Studion 2013. Backpatch to 9.3. Brar Piening.	2014-01-26 09:49:10 -05:00
Heikki Linnakangas	71c6a8e375	Add recovery_target='immediate' option. This allows ending recovery as a consistent state has been reached. Without this, there was no easy way to e.g restore an online backup, without replaying any extra WAL after the backup ended. MauMau and me.	2014-01-25 17:34:04 +02:00
Stephen Frost	fbe19ee3b8	ALTER TABLESPACE ... MOVE ... OWNED BY Add the ability to specify the objects to move by who those objects are owned by (as relowner) and change ALL to mean ALL objects. This makes the command always operate against a well-defined set of objects and not have the objects-to-be-moved based on the role of the user running the command. Per discussion with Simon and Tom.	2014-01-23 23:52:40 -05:00
Tom Lane	ac4ef637ad	Allow use of "z" flag in our printf calls, and use it where appropriate. Since C99, it's been standard for printf and friends to accept a "z" size modifier, meaning "whatever size size_t has". Up to now we've generally dealt with printing size_t values by explicitly casting them to unsigned long and using the "l" modifier; but this is really the wrong thing on platforms where pointers are wider than longs (such as Win64). So let's start using "z" instead. To ensure we can do that on all platforms, teach src/port/snprintf.c to understand "z", and add a configure test to force use of that implementation when the platform's version doesn't handle "z". Having done that, modify a bunch of places that were using the unsigned-long hack to use "z" instead. This patch doesn't pretend to have gotten everyplace that could benefit, but it catches many of them. I made an effort in particular to ensure that all uses of the same error message text were updated together, so as not to increase the number of translatable strings. It's possible that this change will result in format-string warnings from pre-C99 compilers. We might have to reconsider if there are any popular compilers that will warn about this; but let's start by seeing what the buildfarm thinks. Andres Freund, with a little additional work by me	2014-01-23 17:18:33 -05:00
Alvaro Herrera	b152c6cd0d	Make DROP IF EXISTS more consistently not fail Some cases were still reporting errors and aborting, instead of a NOTICE that the object was being skipped. This makes it more difficult to cleanly handle pg_dump --clean, so change that to instead skip missing objects properly. Per bug #7873 reported by Dave Rolsky; apparently this affects a large number of users. Authors: Pavel Stehule and Dean Rasheed. Some tweaks by Álvaro Herrera	2014-01-23 14:40:29 -03:00
Heikki Linnakangas	36a35c550a	Compress GIN posting lists, for smaller index size. GIN posting lists are now encoded using varbyte-encoding, which allows them to fit in much smaller space than the straight ItemPointer array format used before. The new encoding is used for both the lists stored in-line in entry tree items, and in posting tree leaf pages. To maintain backwards-compatibility and keep pg_upgrade working, the code can still read old-style pages and tuples. Posting tree leaf pages in the new format are flagged with GIN_COMPRESSED flag, to distinguish old and new format pages. Likewise, entry tree tuples in the new format have a GIN_ITUP_COMPRESSED flag set in a bit that was previously unused. This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with version 9.4 will therefore have version number 2 in the metapage, while old pg_upgraded indexes will have version 1. The code treats them the same, but it might be come handy in the future, if we want to drop support for the uncompressed format. Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.	2014-01-22 19:20:58 +02:00
Robert Haas	01f7808b3e	Add a cardinality function for arrays. Unlike our other array functions, this considers the total number of elements across all dimensions, and returns 0 rather than NULL when the array has no elements. But it seems that both of those behaviors are almost universally disliked, so hopefully that's OK. Marko Tiikkaja, reviewed by Dean Rasheed and Pavel Stehule	2014-01-21 12:38:53 -05:00
Alvaro Herrera	d2458e3b20	Expose a routine to print triggers during EXPLAIN ANALYZE This is so that auto_explain can use it. Kyotaro HORIGUCHI	2014-01-20 17:13:47 -03:00
Simon Riggs	4d1e2aeb1a	Speed up COPY into tables with DEFAULT nextval() Previously the presence of a nextval() prevented the use of batch-mode COPY. This patch introduces a special case just for nextval() functions. In future we will introduce a general case solution for labelling volatile functions as safe for use.	2014-01-20 17:22:38 +00:00
Magnus Hagander	98de86e422	Remove support for native krb5 authentication krb5 has been deprecated since 8.3, and the recommended way to do Kerberos authentication is using the GSSAPI authentication method (which is still fully supported). libpq retains the ability to identify krb5 authentication, but only gives an error message about it being unsupported. Since all authentication is initiated from the backend, there is no need to keep it at all in the backend.	2014-01-19 17:05:01 +01:00
Stephen Frost	5254958e92	Add CREATE TABLESPACE ... WITH ... Options Tablespaces have a few options which can be set on them to give PG hints as to how the tablespace behaves (perhaps it's faster for sequential scans, or better able to handle random access, etc). These options were only available through the ALTER TABLESPACE command. This adds the ability to set these options at CREATE TABLESPACE time, removing the need to do both a CREATE TABLESPACE and ALTER TABLESPACE to get the correct options set on the tablespace. Vik Fearing, reviewed by Michael Paquier.	2014-01-18 20:59:31 -05:00
Tom Lane	115f414124	Fix VACUUM's reporting of dead-tuple counts to the stats collector. Historically, VACUUM has just reported its new_rel_tuples estimate (the same thing it puts into pg_class.reltuples) to the stats collector. That number counts both live and dead-but-not-yet-reclaimable tuples. This behavior may once have been right, but modern versions of the pgstats code track live and dead tuple counts separately, so putting the total into n_live_tuples and zero into n_dead_tuples is surely pretty bogus. Fix it to report live and dead tuple counts separately. This doesn't really do much for situations where updating transactions commit concurrently with a VACUUM scan (possibly causing double-counting or omission of the tuples they add or delete); but it's clearly an improvement over what we were doing before. Hari Babu, reviewed by Amit Kapila	2014-01-18 19:24:33 -05:00
Stephen Frost	76e91b38ba	Add ALTER TABLESPACE ... MOVE command This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of objects from one tablespace to another. This can be extremely handy and avoids a lot of error-prone scripting. ALTER TABLESPACE ... MOVE will only move objects the user owns, will notify the user if no objects were found, and can be used to move ALL objects or specific types of objects (TABLES, INDEXES, or MATERIALIZED VIEWS).	2014-01-18 18:56:40 -05:00
Tom Lane	0d79c0a8cc	Make various variables const (read-only). These changes should generally improve correctness/maintainability. A nice side benefit is that several kilobytes move from initialized data to text segment, allowing them to be shared across processes and probably reducing copy-on-write overhead while forking a new backend. Unfortunately this doesn't seem to help libpq in the same way (at least not when it's compiled with -fpic on x86_64), but we can hope the linker at least collects all nominally-const data together even if it's not actually part of the text segment. Also, make pg_encname_tbl[] static in encnames.c, since there seems no very good reason for any other code to use it; per a suggestion from Wim Lewis, who independently submitted a patch that was mostly a subset of this one. Oskari Saarenmaa, with some editorialization by me	2014-01-18 16:04:32 -05:00
Andrew Dunstan	7d7eee8bb7	Export a few more symbols required for test_shm_mq module. Patch from Amit Kapila.	2014-01-18 15:29:45 -05:00
Andrew Dunstan	708c529c7f	Export set_latch_on_sigusr1 symbol for Windows. Per buildfarm currawong and grip from David Rowley.	2014-01-17 12:48:23 -05:00
Andrew Dunstan	b64d956d58	Prevent double macro definition of WIN32. David Rowley.	2014-01-17 11:49:44 -05:00
Magnus Hagander	9c14dd22e1	Define WIN32 when _WIN32 is set _WIN32 is set by the compiler, whereas our code uses WIN32 that is normally set through our build system. To make it possible to build extensions out of tree we cannot rely on that, so set the WIN32 symbol explicitly whenever the compiler has set _WIN32. Not setting this symbol causes double inclusion of pg_config_os.h, and possibly other errors as well. Craig Ringer	2014-01-17 12:41:32 +01:00
Robert Haas	ed46758381	Logging running transactions every 15 seconds. Previously, we did this just once per checkpoint, but that could make Hot Standby take a long time to initialize. To avoid busying an otherwise-idle system, we don't do this if no WAL has been written since we did it last. Andres Freund	2014-01-15 12:41:20 -05:00
Tom Lane	061b079f89	Fix multiple bugs in index page locking during hot-standby WAL replay. In ordinary operation, VACUUM must be careful to take a cleanup lock on each leaf page of a btree index; this ensures that no indexscans could still be "in flight" to heap tuples due to be deleted. (Because of possible index-tuple motion due to concurrent page splits, it's not enough to lock only the pages we're deleting index tuples from.) In Hot Standby, the WAL replay process must likewise lock every leaf page. There were several bugs in the code for that: * The replay scan might come across unused, all-zero pages in the index. While btree_xlog_vacuum itself did the right thing (ie, nothing) with such pages, xlogutils.c supposed that such pages must be corrupt and would throw an error. This accounts for various reports of replication failures with "PANIC: WAL contains references to invalid pages". To fix, add a ReadBufferMode value that instructs XLogReadBufferExtended not to complain when we're doing this. * btree_xlog_vacuum performed the extra locking if standbyState == STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up for hot standby queries until the database has reached consistency, and we don't want to do the extra locking till then either, for fear of reading corrupted pages (which bufmgr.c would complain about). Fix by exporting a new function from xlog.c that will report whether we're actually in hot standby replay mode. * To ensure full coverage of the index in the replay scan, btvacuumscan would emit a dummy WAL record for the last page of the index, if no vacuuming work had been done on that page. However, if the last page of the index is all-zero, that would result in corruption of said page, since the functions called on it weren't prepared to handle that case. There's no need to lock any such pages, so change the logic to target the last normal leaf page instead. The first two of these bugs were diagnosed by Andres Freund, the other one by me. Fixes based on ideas from Heikki Linnakangas and myself. This has been wrong since Hot Standby was introduced, so back-patch to 9.0.	2014-01-14 17:35:21 -05:00
Robert Haas	246a9a8d0c	Fix typo in comment. Etsuro Fujita	2014-01-14 14:34:57 -05:00
Robert Haas	ec9037df26	Single-reader, single-writer, lightweight shared message queue. This code provides infrastructure for user backends to communicate relatively easily with background workers. The message queue is structured as a ring buffer and allows messages of arbitary length to be sent and received. Patch by me. Review by KaiGai Kohei and Andres Freund.	2014-01-14 12:23:22 -05:00
Robert Haas	6ddd5137b2	Simple table of contents for a shared memory segment. This interface is intended to make it simple to divide a dynamic shared memory segment into different regions with distinct purposes. It therefore serves much the same purpose that ShmemIndex accomplishes for the main shared memory segment, but it is intended to be more lightweight. Patch by me. Review by Andres Freund.	2014-01-14 12:18:58 -05:00
Robert Haas	2bb1f14b89	Make bitmap heap scans show exact/lossy block info in EXPLAIN ANALYZE. Etsuro Fujita	2014-01-13 14:42:16 -05:00
Tom Lane	158b7fa6a3	Disallow LATERAL references to the target table of an UPDATE/DELETE. On second thought, commit `0c051c9008` was over-hasty: rather than allowing this case, we ought to reject it for now. That leaves the field clear for a future feature that allows the target table to be re-specified in the FROM (or USING) clause, which will enable left-joining the target table to something else. We can then also allow LATERAL references to such an explicitly re-specified target table. But allowing them right now will create ambiguities or worse for such a feature, and it isn't something we documented 9.3 as supporting. While at it, add a convenience subroutine to avoid having several copies of the ereport for disalllowed-LATERAL-reference cases.	2014-01-11 19:03:12 -05:00
Andrew Dunstan	11829ff8b2	Remove DESCR entries for json operator functions. Per -hackers discussion.	2014-01-10 22:25:04 -05:00
Bruce Momjian	111022eac6	Move username lookup functions from /port to /common Per suggestion from Peter E and Alvaro	2014-01-10 18:03:28 -05:00
Tom Lane	220b34331f	We don't need to include pg_sema.h in s_lock.h anymore. Minor improvement to commit `daa7527afc`: s_lock.h no longer has any need to mention PGSemaphoreData, so we can rip out the #include that supplies that. In a non-HAVE_SPINLOCKS build, this doesn't really buy much since we still need the #include in spin.h --- but everywhere else, this reduces #include footprint by some trifle, and helps keep the different locking facilities separate.	2014-01-08 20:58:22 -05:00
Robert Haas	daa7527afc	Reduce the number of semaphores used under --disable-spinlocks. Instead of allocating a semaphore from the operating system for every spinlock, allocate a fixed number of semaphores (by default, 1024) from the operating system and multiplex all the spinlocks that get created onto them. This could self-deadlock if a process attempted to acquire more than one spinlock at a time, but since processes aren't supposed to execute anything other than short stretches of straight-line code while holding a spinlock, that shouldn't happen. One motivation for this change is that, with the introduction of dynamic shared memory, it may be desirable to create spinlocks that last for less than the lifetime of the server. Without this change, attempting to use such facilities under --disable-spinlocks would quickly exhaust any supply of available semaphores. Quite apart from that, it's desirable to contain the quantity of semaphores needed to run the server simply on convenience grounds, since using too many may make it harder to get PostgreSQL running on a new platform, which is mostly the point of --disable-spinlocks in the first place. Patch by me; review by Tom Lane.	2014-01-08 18:58:00 -05:00
Bruce Momjian	7e04792a1c	Update copyright for 2014 Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Tom Lane	8b49a6044d	Cache catalog lookup data across groups in ordered-set aggregates. The initial commit of ordered-set aggregates just did all the setup work afresh each time the aggregate function is started up. But in a GROUP BY query, the catalog lookups need not be repeated for each group, since the column datatypes and sort information won't change. When there are many small groups, this makes for a useful, though not huge, performance improvement. Per suggestion from Andrew Gierth. Profiling of these cases suggests that it might be profitable to avoid duplicate lookups within tuplesort startup as well; but changing the tuplesort APIs would have much broader impact, so I left that for another day.	2014-01-05 12:28:39 -05:00
Tom Lane	a7ef273e1c	Fix calculation of maximum statistics-message size. The PGSTAT_NUM_TABENTRIES macro should have been updated when new fields were added to struct PgStat_MsgTabstat in commit `644828908`, but it wasn't. Fix that. Also, add a static assertion that we didn't overrun the intended size limit on stats messages. This will not necessarily catch every mistake in computing the maximum array size for stats messages, but it will catch ones that have practical consequences. (The assertion in fact doesn't complain about the aforementioned error in PGSTAT_NUM_TABENTRIES, because that was not big enough to cause the array length to increase.) No back-patch, as there's no actual bug in existing releases; this is just in the nature of future-proofing. Mark Dilger and Tom Lane	2014-01-02 21:45:51 -05:00
Alvaro Herrera	722acf51a0	Handle wraparound during truncation in multixact/members In pg_multixact/members, relying on modulo-2^32 arithmetic for wraparound handling doesn't work all that well. Because we don't explicitely track wraparound of the allocation counter for members, it is possible that the "live" area exceeds 2^31 entries; trying to remove SLRU segments that are "old" according to the original logic might lead to removal of segments still in use. To fix, have the truncation routine use a tailored SlruScanDirectory callback that keeps track of the live area in actual use; that way, when the live range exceeds 2^31 entries, the oldest segments still live will not get removed untimely. This new SlruScanDir callback needs to take care not to remove segments that are "in the future": if new SLRU segments appear while the truncation is ongoing, make sure we don't remove them. This requires examination of shared memory state to recheck for false positives, but testing suggests that this doesn't cause a problem. The original coding didn't suffer from this pitfall because segments created when truncation is running are never considered to be removable. Per Andres Freund's investigation of bug #8673 reported by Serge Negodyuck.	2014-01-02 18:16:54 -03:00
Robert Haas	3cff1879f8	Aggressively freeze tables when CLUSTER or VACUUM FULL rewrites them. We haven't wanted to do this in the past on the grounds that in rare cases the original xmin value will be needed for forensic purposes, but commit `37484ad2aa` removes that objection, so now we can. Per extensive discussion, among many people, on pgsql-hackers.	2014-01-02 15:15:51 -05:00
Robert Haas	4b351841fa	Rename walLogHints to wal_log_hints for easier grepping. Michael Paquier	2014-01-01 20:17:00 -05:00
Tom Lane	f7fbf4b0be	Remove dead code now that orindxpath.c is history. We don't need make_restrictinfo_from_bitmapqual() anymore at all. generate_bitmap_or_paths() doesn't need to be exported, and we can drop its rather klugy restriction_only flag.	2013-12-30 12:50:31 -05:00
Tom Lane	f343a880d5	Extract restriction OR clauses whether or not they are indexable. It's possible to extract a restriction OR clause from a join clause that has the form of an OR-of-ANDs, if each sub-AND includes a clause that mentions only one specific relation. While PG has been aware of that idea for many years, the code previously only did it if it could extract an indexable OR clause. On reflection, though, that seems a silly limitation: adding a restriction clause can be a win by reducing the number of rows that have to be filtered at the join step, even if we have to test the clause as a plain filter clause during the scan. This should be especially useful for foreign tables, where the change can cut the number of rows that have to be retrieved from the foreign server; but testing shows it can win even on local tables. Per a suggestion from Robert Haas. As a heuristic, I made the code accept an extracted restriction clause if its estimated selectivity is less than 0.9, which will probably result in accepting extracted clauses just about always. We might need to tweak that later based on experience. Since the code no longer has even a weak connection to Path creation, remove orindxpath.c and create a new file optimizer/util/orclauses.c. There's some additional janitorial cleanup of now-dead code that needs to happen, but it seems like that's a fit subject for a separate commit.	2013-12-30 12:24:37 -05:00
Tom Lane	ed011d9754	Undo autoconf 2.69's attempt to #define _DARWIN_USE_64_BIT_INODE. Defining this symbol causes OS X 10.5 to use a buggy version of readdir(), which can sometimes fail with EINVAL if the previously-fetched directory entry has been deleted or renamed. In later OS X versions that bug has been repaired, but we still don't need the #define because it's on by default. So this is just an all-around bad idea, and we can do without it.	2013-12-29 12:57:56 -05:00
Peter Eisentraut	a09e3fd776	Fix whitespace	2013-12-26 23:51:56 -05:00
Tom Lane	8d65da1f01	Support ordered-set (WITHIN GROUP) aggregates. This patch introduces generic support for ordered-set and hypothetical-set aggregate functions, as well as implementations of the instances defined in SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(), percent_rank(), cume_dist()). We also added mode() though it is not in the spec, as well as versions of percentile_cont() and percentile_disc() that can compute multiple percentile values in one pass over the data. Unlike the original submission, this patch puts full control of the sorting process in the hands of the aggregate's support functions. To allow the support functions to find out how they're supposed to sort, a new API function AggGetAggref() is added to nodeAgg.c. This allows retrieval of the aggregate call's Aggref node, which may have other uses beyond the immediate need. There is also support for ordered-set aggregates to install cleanup callback functions, so that they can be sure that infrastructure such as tuplesort objects gets cleaned up. In passing, make some fixes in the recently-added support for variadic aggregates, and make some editorial adjustments in the recent FILTER additions for aggregates. Also, simplify use of IsBinaryCoercible() by allowing it to succeed whenever the target type is ANY or ANYELEMENT. It was inconsistent that it dealt with other polymorphic target types but not these. Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing, and rather heavily editorialized upon by Tom Lane	2013-12-23 16:11:35 -05:00
Robert Haas	37484ad2aa	Change the way we mark tuples as frozen. Instead of changing the tuple xmin to FrozenTransactionId, the combination of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous proposals to freeze tuples opportunistically before vacuum_freeze_min_age is reached have foundered on the objection that replacing xmin by FrozenTransactionId might hinder debugging efforts when things in this area go awry; this patch is intended to solve that problem by keeping the XID around (but largely ignoring the value to which it is set). Third-party code that checks for HEAP_XMIN_INVALID on tuples where HEAP_XMIN_COMMITTED might be set will be broken by this change. To fix, use the new accessor macros in htup_details.h rather than consulting the bits directly. HeapTupleHeaderGetXmin has been modified to return FrozenTransactionId when the infomask bits indicate that the tuple is frozen; use HeapTupleHeaderGetRawXmin when you already know that the tuple isn't marked commited or frozen, or want the raw value anyway. We currently do this in routines that display the xmin for user consumption, in tqual.c where it's known to be safe and important for the avoidance of extra cycles, and in the function-caching code for various procedural languages, which shouldn't invalidate the cache just because the tuple gets frozen. Robert Haas and Andres Freund	2013-12-22 15:49:09 -05:00
Fujii Masao	961bf59fb7	Rename wal_log_hintbits to wal_log_hints, per discussion on pgsql-hackers. Sawada Masahiko	2013-12-21 03:33:16 +09:00
Bruce Momjian	527fdd9df1	Move pg_upgrade_support global variables to their own include file Previously their declarations were spread around to avoid accidental access.	2013-12-19 16:10:07 -05:00
Peter Eisentraut	94b899b829	Upgrade to Autoconf 2.69	2013-12-18 20:53:23 -05:00
Robert Haas	001a573a20	Allow on-detach callbacks for dynamic shared memory segments. Just as backends must clean up their shared memory state (releasing lwlocks, buffer pins, etc.) before exiting, they must also perform any similar cleanups related to dynamic shared memory segments they have mapped before unmapping those segments. So add a mechanism to ensure that. Existing on_shmem_exit hooks include both "user level" cleanup such as transaction abort and removal of leftover temporary relations and also "low level" cleanup that forcibly released leftover shared memory resources. On-detach callbacks should run after the first group but before the second group, so create a new before_shmem_exit function for registering the early callbacks and keep on_shmem_exit for the regular callbacks. (An earlier draft of this patch added an additional argument to on_shmem_exit, but that had a much larger footprint and probably a substantially higher risk of breaking third party code for no real gain.) Patch by me, reviewed by KaiGai Kohei and Andres Freund.	2013-12-18 13:09:09 -05:00
Bruce Momjian	613c6d26bd	Fix incorrect error message reported for non-existent users Previously, lookups of non-existent user names could return "Success"; it will now return "User does not exist" by resetting errno. This also centralizes the user name lookup code in libpgport. Report and analysis by Nicolas Marchildon; patch by me	2013-12-18 12:16:21 -05:00
Alvaro Herrera	11ac4c73cb	Don't ignore tuple locks propagated by our updates If a tuple was locked by transaction A, and transaction B updated it, the new version of the tuple created by B would be locked by A, yet visible only to B; due to an oversight in HeapTupleSatisfiesUpdate, the lock held by A wouldn't get checked if transaction B later deleted (or key-updated) the new version of the tuple. This might cause referential integrity checks to give false positives (that is, allow deletes that should have been rejected). This is an easy oversight to have made, because prior to improved tuple locks in commit `0ac5ad5134` it wasn't possible to have tuples created by our own transaction that were also locked by remote transactions, and so locks weren't even considered in that code path. It is recommended that foreign keys be rechecked manually in bulk after installing this update, in case some referenced rows are missing with some referencing row remaining. Per bug reported by Daniel Wood in CAPweHKe5QQ1747X2c0tA=5zf4YnS2xcvGf13Opd-1Mq24rF1cQ@mail.gmail.com	2013-12-18 13:45:51 -03:00
Tatsuo Ishii	65d6e4cb5c	Add ALTER SYSTEM command to edit the server configuration file. Patch contributed by Amit Kapila. Reviewed by Hari Babu, Masao Fujii, Boszormenyi Zoltan, Andres Freund, Greg Smith and others.	2013-12-18 23:42:44 +09:00
Alvaro Herrera	3b97e6823b	Rework tuple freezing protocol Tuple freezing was broken in connection to MultiXactIds; commit `8e53ae025d` tried to fix it, but didn't go far enough. As noted by Noah Misch, freezing a tuple whose Xmax is a multi containing an aborted update might cause locks in the multi to go ignored by later transactions. This is because the code depended on a multixact above their cutoff point not having any lock-only member older than the cutoff point for Xids, which is easily defeated in READ COMMITTED transactions. The fix for this involves creating a new MultiXactId when necessary. But this cannot be done during WAL replay, and moreover multixact examination requires using CLOG access routines which are not supposed to be used during WAL replay either; so tuple freezing cannot be done with the old freeze WAL record. Therefore, separate the freezing computation from its execution, and change the WAL record to carry all necessary information. At WAL replay time, it's easy to re-execute freezing because we don't need to re-compute the new infomask/Xmax values but just take them from the WAL record. While at it, restructure the coding to ensure all page changes occur in a single critical section without much room for failures. The previous coding wasn't using a critical section, without any explanation as to why this was acceptable. In replication scenarios using the 9.3 branch, standby servers must be upgraded before their master, so that they are prepared to deal with the new WAL record once the master is upgraded; failure to do so will cause WAL replay to die with a PANIC message. Later upgrade of the standby will allow the process to continue where it left off, so there's no disruption of the data in the standby in any case. Standbys know how to deal with the old WAL record, so it's okay to keep the master running the old code for a while. In master, the old freeze WAL record is gone, for cleanliness' sake; there's no compatibility concern there. Backpatch to 9.3, where the original bug was introduced and where the previous fix was backpatched. Álvaro Herrera and Andres Freund	2013-12-16 11:29:50 -03:00
Heikki Linnakangas	50e547096c	Add GUC to enable WAL-logging of hint bits, even with checksums disabled. WAL records of hint bit updates is useful to tools that want to examine which pages have been modified. In particular, this is required to make the pg_rewind tool safe (without checksums). This can also be used to test how much extra WAL-logging would occur if you enabled checksums, without actually enabling them (which you can't currently do without re-initdb'ing). Sawada Masahiko, docs by Samrat Revagade. Reviewed by Dilip Kumar, with further changes by me.	2013-12-13 16:26:14 +02:00
Simon Riggs	8693559cac	New autovacuum_work_mem parameter If autovacuum_work_mem is set, autovacuum workers now use this parameter in preference to maintenance_work_mem. Peter Geoghegan	2013-12-12 11:42:39 +00:00
Robert Haas	66abc2608c	Add a new reloption, user_catalog_table. When this reloption is set and wal_level=logical is configured, we'll record the CIDs stamped by inserts, updates, and deletes to the table just as we would for an actual catalog table. This will allow logical decoding to use historical MVCC snapshots to access such tables just as they access ordinary catalog tables. Replication solutions built around the logical decoding machinery will likely need to set this operation for their configuration tables; it might also be needed by extensions which perform table access in their output functions. Andres Freund, reviewed by myself and others.	2013-12-10 19:17:34 -05:00
Robert Haas	e55704d8b2	Add new wal_level, logical, sufficient for logical decoding. When wal_level=logical, we'll log columns from the old tuple as configured by the REPLICA IDENTITY facility added in commit `07cacba983`. This makes it possible a properly-configured logical replication solution to correctly follow table updates even if they change the chosen key columns, or, with REPLICA IDENTITY FULL, even if the table has no key at all. Note that updates which do not modify the replica identity column won't log anything extra, making the choice of a good key (i.e. one that will rarely be changed) important to performance when wal_level=logical is configured. Each insert, update, or delete to a catalog table will also log the CMIN and/or CMAX values of stamped by the current transaction. This is necessary because logical decoding will require access to historical snapshots of the catalog in order to decode some data types, and the CMIN/CMAX values that we may need in order to judge row visibility may have been overwritten by the time we need them. Andres Freund, reviewed in various versions by myself, Heikki Linnakangas, KONDO Mitsumasa, and many others.	2013-12-10 19:01:40 -05:00
Noah Misch	53685d7981	Rename TABLE() to ROWS FROM(). SQL-standard TABLE() is a subset of UNNEST(); they deal with arrays and other collection types. This feature, however, deals with set-returning functions. Use a different syntax for this feature to keep open the possibility of implementing the standard TABLE().	2013-12-10 09:34:37 -05:00
Heikki Linnakangas	9e857436ef	Don't include unused space in LOG_NEWPAGE records. This is the same trick we use when taking a full page image of a buffer passed to XLogInsert.	2013-12-04 00:10:47 +02:00
Peter Eisentraut	34fa72ec9c	Remove use of obsolescent Autoconf macros Remove the use of the following macros, which are obsolescent according to the Autoconf documentation: - AC_C_CONST - AC_C_STRINGIZE - AC_C_VOLATILE - AC_FUNC_MEMCMP	2013-11-30 09:17:08 -05:00
Alvaro Herrera	1df0122daa	Truncate pg_multixact/'s contents during crash recovery Commit `9dc842f08` of 8.2 era prevented MultiXact truncation during crash recovery, because there was no guarantee that enough state had been setup, and because it wasn't deemed to be a good idea to remove data during crash recovery anyway. Since then, due to Hot-Standby, streaming replication and PITR, the amount of time a cluster can spend doing crash recovery has increased significantly, to the point that a cluster may even never come out of it. This has made not truncating the content of pg_multixact/ not defensible anymore. To fix, take care to setup enough state for multixact truncation before crash recovery starts (easy since checkpoints contain the required information), and move the current end-of-recovery actions to a new TrimMultiXact() function, analogous to TrimCLOG(). At some later point, this should probably done similarly to the way clog.c is doing it, which is to just WAL log truncations, but we can't do that for the back branches. Back-patch to 9.0. 8.4 also has the problem, but since there's no hot standby there, it's much less pressing. In 9.2 and earlier, this patch is simpler than in newer branches, because multixact access during recovery isn't required. Add appropriate checks to make sure that's not happening. Andres Freund	2013-11-29 21:47:15 -03:00
Alvaro Herrera	f54106f77e	Fix full-table-vacuum request mechanism for MultiXactIds While autovacuum dutifully launched anti-multixact-wraparound vacuums when the multixact "age" was reached, the vacuum code was not aware that it needed to make them be full table vacuums. As the resulting partial-table vacuums aren't capable of actually increasing relminmxid, autovacuum continued to launch anti-wraparound vacuums that didn't have the intended effect, until age of relfrozenxid caused the vacuum to finally be a full table one via vacuum_freeze_table_age. To fix, introduce logic for multixacts similar to that for plain TransactionIds, using the same GUCs. Backpatch to 9.3, where permanent MultiXactIds were introduced. Andres Freund, some cleanup by Álvaro	2013-11-29 21:47:13 -03:00
Tom Lane	16e1b7a1b7	Fix assorted race conditions in the new timeout infrastructure. Prevent handle_sig_alarm from losing control partway through due to a query cancel (either an asynchronous SIGINT, or a cancel triggered by one of the timeout handler functions). That would at least result in failure to schedule any required future interrupt, and might result in actual corruption of timeout.c's data structures, if the interrupt happened while we were updating those. We could still lose control if an asynchronous SIGINT arrives just as the function is entered. This wouldn't break any data structures, but it would have the same effect as if the SIGALRM interrupt had been silently lost: we'd not fire any currently-due handlers, nor schedule any new interrupt. To forestall that scenario, forcibly reschedule any pending timer interrupt during AbortTransaction and AbortSubTransaction. We can avoid any extra kernel call in most cases by not doing that until we've allowed LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events. Another hazard is that some platforms (at least Linux and *BSD) block a signal before calling its handler and then unblock it on return. When we longjmp out of the handler, the unblock doesn't happen, and the signal is left blocked indefinitely. Again, we can fix that by forcibly unblocking signals during AbortTransaction and AbortSubTransaction. These latter two problems do not manifest when the longjmp reaches postgres.c, because the error recovery code there kills all pending timeout events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal mask is restored. So errors thrown outside any transaction should be OK already, and cleaning up in AbortTransaction and AbortSubTransaction should be enough to fix these issues. (We're assuming that any code that catches a query cancel error and doesn't re-throw it will do at least a subtransaction abort to clean up; but that was pretty much required already by other subsystems.) Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when disabling that event: if a lock timeout interrupt happened after the lock was granted, the ensuing query cancel is still going to happen at the next CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user cancel. Per reports from Dan Wood. Back-patch to 9.3 where the new timeout handling infrastructure was introduced. We may at some point decide to back-patch the signal unblocking changes further, but I'll desist from that until we hear actual field complaints about it.	2013-11-29 16:41:00 -05:00
Robert Haas	8e18d04d4d	Refine our definition of what constitutes a system relation. Although user-defined relations can't be directly created in pg_catalog, it's possible for them to end up there, because you can create them in some other schema and then use ALTER TABLE .. SET SCHEMA to move them there. Previously, such relations couldn't afterwards be manipulated, because IsSystemRelation()/IsSystemClass() rejected all attempts to modify objects in the pg_catalog schema, regardless of their origin. With this patch, they now reject only those objects in pg_catalog which were created at initdb-time, allowing most operations on user-created tables in pg_catalog to proceed normally. This patch also adds new functions IsCatalogRelation() and IsCatalogClass(), which is similar to IsSystemRelation() and IsSystemClass() but with a slightly narrower definition: only TOAST tables of system catalogs are included, rather than all TOAST tables. This is currently used only for making decisions about when invalidation messages need to be sent, but upcoming logical decoding patches will find other uses for this information. Andres Freund, with some modifications by me.	2013-11-28 20:57:20 -05:00
Tom Lane	7db285afc9	Fix stale-pointer problem in fast-path locking logic. When acquiring a lock in fast-path mode, we must reset the locallock object's lock and proclock fields to NULL. They are not necessarily that way to start with, because the locallock could be left over from a failed lock acquisition attempt earlier in the transaction. Failure to do this led to all sorts of interesting misbehaviors when LockRelease tried to clean up no-longer-related lock and proclock objects in shared memory. Per report from Dan Wood. In passing, modify LockRelease to elog not just Assert if it doesn't find lock and proclock objects for a formerly fast-path lock, matching the code in FastPathGetRelationLockEntry and LockRefindAndRelease. This isn't a bug but it will help in diagnosing any future bugs in this area. Also, modify FastPathTransferRelationLocks and FastPathGetRelationLockEntry to break out of their loops over the fastpath array once they've found the sole matching entry. This was inconsistently done in some search loops and not others. Improve assorted related comments, too. Back-patch to 9.2 where the fast-path mechanism was introduced.	2013-11-27 18:10:00 -05:00
Heikki Linnakangas	631118fe1e	Get rid of the post-recovery cleanup step of GIN page splits. Replace it with an approach similar to what GiST uses: when a page is split, the left sibling is marked with a flag indicating that the parent hasn't been updated yet. When the parent is updated, the flag is cleared. If an insertion steps on a page with the flag set, it will finish split before proceeding with the insertion. The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. This also divides the responsibility of WAL-logging more clearly between the generic ginbtree.c code, and the parts specific to entry and posting trees. There is now a common WAL record format for insertions and deletions, which is written by ginbtree.c, followed by tree-specific payload, which is returned by the placetopage- and split- callbacks.	2013-11-27 19:21:23 +02:00
Heikki Linnakangas	ce5326eed3	More GIN refactoring. Separate the insertion payload from the more static portions of GinBtree. GinBtree now only contains information related to searching the tree, and the information of what to insert is passed separately. Add root block number to GinBtree, instead of passing it around all the functions as argument. Split off ginFinishSplit() from ginInsertValue(). ginFinishSplit is responsible for finding the parent and inserting the downlink to it.	2013-11-27 15:43:05 +02:00
Peter Eisentraut	85ed91ee7d	Implement information_schema.parameters.parameter_default column Reviewed-by: Ali Dar <ali.munir.dar@gmail.com> Reviewed-by: Amit Khandekar <amit.khandekar@enterprisedb.com> Reviewed-by: Rodolfo Campero <rodolfo.campero@anachronics.com>	2013-11-26 23:21:35 -05:00
Bruce Momjian	a6542a4b68	Change SET LOCAL/CONSTRAINTS/TRANSACTION and ABORT behavior Change SET LOCAL/CONSTRAINTS/TRANSACTION behavior outside of a transaction block from error (post-9.3) to warning. (Was nothing in <= 9.3.) Also change ABORT outside of a transaction block from notice to warning.	2013-11-25 19:19:40 -05:00
Tom Lane	45e02e3232	Fix array slicing of int2vector and oidvector values. The previous coding labeled expressions such as pg_index.indkey[1:3] as being of int2vector type; which is not right because the subscript bounds of such a result don't, in general, satisfy the restrictions of int2vector. To fix, implicitly promote the result of slicing int2vector to int2[], or oidvector to oid[]. This is similar to what we've done with domains over arrays, which is a good analogy because these types are very much like restricted domains of the corresponding regular-array types. A side-effect is that we now also forbid array-element updates on such columns, eg while "update pg_index set indkey[4] = 42" would have worked before if you were superuser (and corrupted your catalogs irretrievably, no doubt) it's now disallowed. This seems like a good thing since, again, some choices of subscripting would've led to results not satisfying the restrictions of int2vector. The case of an array-slice update was rejected before, though with a different error message than you get now. We could make these cases work in future if we added a cast from int2[] to int2vector (with a cast function checking the subscript restrictions) but it seems unlikely that there's any value in that. Per report from Ronan Dunklau. Back-patch to all supported branches because of the crash risks involved.	2013-11-23 20:03:56 -05:00
Tom Lane	784e762e88	Support multi-argument UNNEST(), and TABLE() syntax for multiple functions. This patch adds the ability to write TABLE( function1(), function2(), ...) as a single FROM-clause entry. The result is the concatenation of the first row from each function, followed by the second row from each function, etc; with NULLs inserted if any function produces fewer rows than others. This is believed to be a much more useful behavior than what Postgres currently does with multiple SRFs in a SELECT list. This syntax also provides a reasonable way to combine use of column definition lists with WITH ORDINALITY: put the column definition list inside TABLE(), where it's clear that it doesn't control the ordinality column as well. Also implement SQL-compliant multiple-argument UNNEST(), by turning UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)). The SQL standard specifies TABLE() with only a single function, not multiple functions, and it seems to require an implicit UNNEST() which is not what this patch does. There may be something wrong with that reading of the spec, though, because if it's right then the spec's TABLE() is just a pointless alternative spelling of UNNEST(). After further review of that, we might choose to adopt a different syntax for what this patch does, but in any case this functionality seems clearly worthwhile. Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and significantly revised by me	2013-11-21 19:37:20 -05:00
Heikki Linnakangas	501012631e	Refactor the internal GIN B-tree interface for forming a downlink. This creates a new gin-btree callback function for creating a downlink for a page. Previously, ginxlog.c duplicated the logic used during normal operation.	2013-11-20 16:57:41 +02:00
Heikki Linnakangas	04965ad40e	Further GIN refactoring. Merge some functions that were always called together. Makes the code little bit more readable.	2013-11-20 16:09:14 +02:00
Tom Lane	f901bb50e3	Add make_date() and make_time() functions. Pavel Stehule, reviewed by Jeevan Chalke and Atri Sharma	2013-11-17 15:06:50 -05:00
Tom Lane	69c8fbac20	Improve performance of numeric sum(), avg(), stddev(), variance(), etc. This patch improves performance of most built-in aggregates that formerly used a NUMERIC or NUMERIC array as their transition type; this includes not only aggregates on numeric inputs, but some aggregates on integer inputs where overflow of an int8 value is a possibility. The code now uses a special-purpose data structure to avoid array construction and deconstruction overhead, as well as packing and unpacking overhead for numeric values. These aggregates' transition type is now declared as INTERNAL, since it doesn't correspond to any SQL data type. To keep the planner from thinking that that means a lot of storage will be used, we make use of the just-added pg_aggregate.aggtransspace feature. The space estimate is set to 128 bytes, which is at least in the right ballpark. Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra	2013-11-16 18:46:34 -05:00
Tom Lane	6cb86143e8	Allow aggregates to provide estimates of their transition state data size. Formerly the planner had a hard-wired rule of thumb for guessing the amount of space consumed by an aggregate function's transition state data. This estimate is critical to deciding whether it's OK to use hash aggregation, and in many situations the built-in estimate isn't very good. This patch adds a column to pg_aggregate wherein a per-aggregate estimate can be provided, overriding the planner's default, and infrastructure for setting the column via CREATE AGGREGATE. It may be that additional smarts will be required in future, perhaps even a per-aggregate estimation function. But this is already a step forward. This is extracted from a larger patch to improve the performance of numeric and int8 aggregates. I (tgl) thought it was worth reviewing and committing this infrastructure separately. In this commit, all built-in aggregates are given aggtransspace = 0, so no behavior should change. Hadi Moshayedi, reviewed by Pavel Stehule and Tomas Vondra	2013-11-16 16:03:40 -05:00
Tom Lane	f3b3b8d5be	Compute correct em_nullable_relids in get_eclass_for_sort_expr(). Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr must be able to compute valid em_nullable_relids for any new equivalence class members it creates. I'd worried about this in the commit message for `db9f0e1d9a`, but claimed that it wasn't a problem because multi-member ECs should already exist when it runs. That is transparently wrong, though, because this function is also called by initialize_mergeclause_eclasses, which runs during deconstruct_jointree. The example given in the bug report (which the new regression test item is based upon) fails because the COALESCE() expression is first seen by initialize_mergeclause_eclasses rather than process_equivalence. Fixing this requires passing the appropriate nullable_relids set to get_eclass_for_sort_expr, and it requires new code to compute that set for top-level expressions such as ORDER BY, GROUP BY, etc. We store the top-level nullable_relids in a new field in PlannerInfo to avoid computing it many times. In the back branches, I've added the new field at the end of the struct to minimize ABI breakage for planner plugins. There doesn't seem to be a good alternative to changing get_eclass_for_sort_expr's API signature, though. There probably aren't any third-party extensions calling that function directly; moreover, if there are, they probably need to think about what to pass for nullable_relids anyway. Back-patch to 9.2, like the previous patch in this area.	2013-11-15 16:46:18 -05:00
Peter Eisentraut	001e114b8d	Fix whitespace issues found by git diff --check, add gitattributes Set per file type attributes in .gitattributes to fine-tune whitespace checks. With the associated cleanups, the tree is now clean for git	2013-11-10 14:48:29 -05:00
Heikki Linnakangas	ac4ab97ec0	Fix race condition in GIN posting tree page deletion. If a page is deleted, and reused for something else, just as a search is following a rightlink to it from its left sibling, the search would continue scanning whatever the new contents of the page are. That could lead to incorrect query results, or even something more curious if the page is reused for a different kind of a page. To fix, modify the search algorithm to lock the next page before releasing the previous one, and refrain from deleting pages from the leftmost branch of the tree. Add a new Concurrency section to the README, explaining why this works. There is a lot more one could say about concurrency in GIN, but that's for another patch. Backpatch to all supported versions.	2013-11-08 22:21:42 +02:00
Robert Haas	07cacba983	Add the notion of REPLICA IDENTITY for a table. Pending patches for logical replication will use this to determine which columns of a tuple ought to be considered as its candidate key. Andres Freund, with minor, mostly cosmetic adjustments by me	2013-11-08 12:30:43 -05:00
Heikki Linnakangas	0ea53256a8	Fix missing argument and function prototypes. Not sure how I missed these in previous commit.	2013-11-06 11:22:58 +02:00
Heikki Linnakangas	ecaa4708e5	Misc GIN refactoring. Merge the isEnoughSpace and placeToPage functions in the b-tree interface into one function that tries to put a tuple on page, and returns false if it doesn't fit. Move createPostingTree function to gindatapage.c, and change its contract so that it can be passed more items than fit on the root page. It's in a better position than the callers to know how many items fit. Move ginMergeItemPointers out of gindatapage.c, into a separate file. These changes make no difference now, but reduce the footprint of Alexander Korotkov's upcoming patch to pack item pointers more tightly.	2013-11-06 10:32:09 +02:00
Tom Lane	45f64f1bbf	Remove CTimeZone/HasCTZSet, root and branch. These variables no longer have any useful purpose, since there's no reason to special-case brute force timezones now that we have a valid session_timezone setting for them. Remove the variables, and remove the SET/SHOW TIME ZONE code that deals with them. The user-visible impact of this is that SHOW TIME ZONE will now show a POSIX-style zone specification, in the form "<+-offset>-+offset", rather than an interval value when a brute-force zone has been set. While perhaps less intuitive, this is a better definition than before because it's actually possible to give that string back to SET TIME ZONE and get the same behavior, unlike what used to happen. We did not previously mention the angle-bracket syntax when describing POSIX timezone specifications; add some documentation so that people can figure out what these strings do. (There's still quite a lot of undocumented functionality there, but anybody who really cares can go read the POSIX spec to find out about it. In practice most people seem to prefer Olsen-style city names anyway.)	2013-11-01 13:57:31 -04:00
Tom Lane	631dc390f4	Fix some odd behaviors when using a SQL-style simple GMT offset timezone. Formerly, when using a SQL-spec timezone setting with a fixed GMT offset (called a "brute force" timezone in the code), the session_timezone variable was not updated to match the nominal timezone; rather, all code was expected to ignore session_timezone if HasCTZSet was true. This is of course obviously fragile, though a search of the code finds only timeofday() failing to honor the rule. A bigger problem was that DetermineTimeZoneOffset() supposed that if its pg_tz parameter was pointer-equal to session_timezone, then HasCTZSet should override the parameter. This would cause datetime input containing an explicit zone name to be treated as referencing the brute-force zone instead, if the zone name happened to match the session timezone that had prevailed before installing the brute-force zone setting (as reported in bug #8572). The same malady could affect AT TIME ZONE operators. To fix, set up session_timezone so that it matches the brute-force zone specification, which we can do using the POSIX timezone definition syntax "<abbrev>offset", and get rid of the bogus lookaside check in DetermineTimeZoneOffset(). Aside from fixing the erroneous behavior in datetime parsing and AT TIME ZONE, this will cause the timeofday() function to print its result in the user-requested time zone rather than some previously-set zone. It might also affect results in third-party extensions, if there are any that make use of session_timezone without considering HasCTZSet, but in all cases the new behavior should be saner than before. Back-patch to all supported branches.	2013-11-01 12:13:18 -04:00
Tom Lane	6756c8ad30	Fix old typo in comment. NFAs have children, but their individual states don't.	2013-10-29 15:34:18 -04:00
Robert Haas	d2aecaea15	Modify dynamic shared memory code to use Size rather than uint64. This is more consistent with what we do elsewhere.	2013-10-28 12:12:06 -04:00
Noah Misch	c50b7c09d8	Add large object functions catering to SQL callers. With these, one need no longer manipulate large object descriptors and extract numeric constants from header files in order to read and write large object contents from SQL. Pavel Stehule, reviewed by Rushabh Lathia.	2013-10-27 22:56:54 -04:00
Tom Lane	3147acd63e	Use improved vsnprintf calling logic in more places. When we are using a C99-compliant vsnprintf implementation (which should be most places, these days) it is worth the trouble to make use of its report of how large the buffer needs to be to succeed. This patch adjusts stringinfo.c and some miscellaneous usages in pg_dump to do that, relying on the logic recently added in libpgcommon's psprintf.c. Since these places want to know the number of bytes written once we succeed, modify the API of pvsnprintf() to report that. There remains near-duplicate logic in pqexpbuffer.c, but since that code is in libpq, psprintf.c's approach of exit()-on-error isn't appropriate for use there. Also note that I didn't bother touching the multitude of places that call (v)snprintf without any attempt to provide a resizable buffer. Release-note-worthy incompatibility: the API of appendStringInfoVA() changed. If there's any third-party code that's calling that directly, it will need tweaking along the same lines as in this patch. David Rowley and Tom Lane	2013-10-24 21:43:57 -04:00
Tom Lane	2c66f9924c	Replace pg_asprintf() with psprintf(). This eliminates an awkward coding pattern that's also unnecessarily inconsistent with backend coding. psprintf() is now the thing to use everywhere.	2013-10-22 19:40:26 -04:00
Tom Lane	09a89cb5fc	Get rid of use of asprintf() in favor of a more portable implementation. asprintf(), aside from not being particularly portable, has a fundamentally badly-designed API; the psprintf() function that was added in passing in the previous patch has a much better API choice. Moreover, the NetBSD implementation that was borrowed for the previous patch doesn't work with non-C99-compliant vsnprintf, which is something we still have to cope with on some platforms; and it depends on va_copy which isn't all that portable either. Get rid of that code in favor of an implementation similar to what we've used for many years in stringinfo.c. Also, move it into libpgcommon since it's not really libpgport material. I think this patch will be enough to turn the buildfarm green again, but there's still cosmetic work left to do, namely get rid of pg_asprintf() in favor of using psprintf(). That will come in a followon patch.	2013-10-22 18:42:13 -04:00
Noah Misch	709170b790	Consistently use unsigned arithmetic for alignment calculations. This avoids an assumption about the signed number representation. It is anticipated to have no functional changes on supported configurations; many two's complement assumptions remain elsewhere. Per a suggestion from Andres Freund.	2013-10-20 21:04:52 -04:00
Robert Haas	cab5dc5daf	Allow only some columns of a view to be auto-updateable. Previously, unless all columns were auto-updateable, we wouldn't inserts, updates, or deletes, or at least not without a rule or trigger; now, we'll allow inserts and updates that target only the auto-updateable columns, and deletes even if there are no auto-updateable columns at all provided the view definition is otherwise suitable. Dean Rasheed, reviewed by Marko Tiikkaja	2013-10-18 10:35:36 -04:00
Robert Haas	523beaa11b	Provide a reliable mechanism for terminating a background worker. Although previously-introduced APIs allow the process that registers a background worker to obtain the worker's PID, there's no way to prevent a worker that is not currently running from being restarted. This patch introduces a new API TerminateBackgroundWorker() that prevents the background worker from being restarted, terminates it if it is currently running, and causes it to be unregistered if or when it is not running. Patch by me. Review by Michael Paquier and KaiGai Kohei.	2013-10-18 10:23:11 -04:00
Peter Eisentraut	c2316dcda1	Fix for lack of va_copy() on certain Windows versions Based-on-patch-by: David Rowley <dgrowleyml@gmail.com>	2013-10-18 09:54:50 -04:00
Robert Haas	ea91a6be89	Remove IRIX port. Development of IRIX has been discontinued, and support is scheduled to end in December of 2013. Therefore, there will be no supported versions of this operating system by the time PostgreSQL 9.4 is released. Furthermore, we have no maintainer for this platform.	2013-10-18 08:14:21 -04:00
Robert Haas	81051a86bc	Remove spinlock support for SINIX, Sun3, and NS32K. All of these platforms are very much obsolete. As far as I can determine, the last version of SINIX, later renamed Reliant, occurred some time between 2002 and 2005. The last release of SunOS that would run on a sun3 was released in November of 1991; the last release of OpenBSD which supported that platform was in 2001. The highest clock speed of any processor in the family was 25MHz. The NS32K (national semiconductor 320xx) architecture was retired in 1990. Support can be re-added if a maintainer emerges for any of these platforms, but it seems unlikely. Reviewed by Andres Freund.	2013-10-17 12:02:05 -04:00
Peter Eisentraut	5b6d08cd29	Add use of asprintf() Add asprintf(), pg_asprintf(), and psprintf() to simplify string allocation and composition. Replacement implementations taken from NetBSD. Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Asif Naeem <anaeem.it@gmail.com>	2013-10-13 00:09:18 -04:00
Alvaro Herrera	31cf1a1a43	Rework SSL renegotiation code The existing renegotiation code was home for several bugs: it might erroneously report that renegotiation had failed; it might try to execute another renegotiation while the previous one was pending; it failed to terminate the connection if the renegotiation never actually took place; if a renegotiation was started, the byte count was reset, even if the renegotiation wasn't completed (this isn't good from a security perspective because it means continuing to use a session that should be considered compromised due to volume of data transferred.) The new code is structured to avoid these pitfalls: renegotiation is started a little earlier than the limit has expired; the handshake sequence is retried until it has actually returned successfully, and no more than that, but if it fails too many times, the connection is closed. The byte count is reset only when the renegotiation has succeeded, and if the renegotiation byte count limit expires, the connection is terminated. This commit only touches the master branch, because some of the changes are controversial. If everything goes well, a back-patch might be considered. Per discussion started by message 20130710212017.GB4941@eldon.alvh.no-ip.org	2013-10-10 23:45:20 -03:00
Peter Eisentraut	5dd41f3574	Remove maintainer-check target, fold into normal build make maintainer-check was obscure and rarely called in practice, and many breakages were missed. Fold everything that make maintainer-check used to do into the normal build. Specifically: - Call duplicate_oids when genbki.pl is called. - Check for tabs in SGML files when the documentation is built. - Run msgfmt with the -c option during the regular build. Add an additional configure check to see whether we are using the GNU version. (make maintainer-check probably used to fail with non-GNU msgfmt.) Keep maintainer-check as around as phony target for the time being in case anyone is calling it. But it won't do anything anymore.	2013-10-10 20:11:56 -04:00
Peter Eisentraut	3dc543b3d8	Replace duplicate_oids with Perl implementation It is more portable, more robust, and more readable. From: Andrew Dunstan <andrew@dunslane.net>	2013-10-10 20:09:42 -04:00
Andrew Dunstan	4d212bac17	json_typeof function. Andrew Tipton.	2013-10-10 12:21:59 -04:00
Peter Eisentraut	261c7d4b65	Revive line type Change the input/output format to {A,B,C}, to match the internal representation. Complete the implementations of line_in, line_out, line_recv, line_send. Remove comments and error messages about the line type not being implemented. Add regression tests for existing line operators and functions. Reviewed-by: rui hua <365507506hua@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>	2013-10-09 22:34:38 -04:00
Robert Haas	0ac5e5a7e1	Allow dynamic allocation of shared memory segments. Patch by myself and Amit Kapila. Design help from Noah Misch. Review by Andres Freund.	2013-10-09 21:05:02 -04:00
Kevin Grittner	f566515192	Add record_image_ops opclass for matview concurrent refresh. REFRESH MATERIALIZED VIEW CONCURRENTLY was broken for any matview containing a column of a type without a default btree operator class. It also did not produce results consistent with a non- concurrent REFRESH or a normal view if any column was of a type which allowed user-visible differences between values which compared as equal according to the type's default btree opclass. Concurrent matview refresh was modified to use the new operators to solve these problems. Documentation was added for record comparison, both for the default btree operator class for record, and the newly added operators. Regression tests now check for proper behavior both for a matview with a box column and a matview containing a citext column. Reviewed by Steve Singer, who suggested some of the doc language.	2013-10-09 14:26:09 -05:00
Bruce Momjian	ee1e5662d8	Auto-tune effective_cache size to be 4x shared buffers	2013-10-08 12:12:24 -04:00
Heikki Linnakangas	5962519b36	TYPEALIGN doesn't work on int64 on 32-bit platforms. The TYPEALIGN macro, and the related ones like MAXALIGN, don't work with values larger than intptr_t, because TYPEALIGN casts the argument to intptr_t to do the arithmetic. That's not a problem when dealing with pointers or lengths or offsets related to pointers, but the XLogInsert scaling patch added a call to MAXALIGN with an XLogRecPtr argument. To fix, add wider variants of the macros, called TYPEALIGN64 and MAXALIGN64, which are just like the existing variants but work with uint64 instead of intptr_t. Report and patch by David Rowley, analysis by Andres Freund.	2013-10-08 01:59:57 +03:00
Kevin Grittner	c01262a824	Eliminate xmin from hash tag for predicate locks on heap tuples. If a tuple was frozen while its predicate locks mattered, read-write dependencies could be missed, resulting in failure to detect conflicts which could lead to anomalies in committed serializable transactions. This field was added to the tag when we still thought that it was necessary to carry locks forward to a new version of an updated row. That was later proven to be unnecessary, which allowed simplification of the code, but elimination of xmin from the tag was missed at the time. Per report and analysis by Heikki Linnakangas. Backpatch to 9.1.	2013-10-07 14:16:54 -05:00
Bruce Momjian	a54141aebc	Issue error on SET outside transaction block in some cases Issue error for SET LOCAL/CONSTRAINTS/TRANSACTION outside a transaction block, as they have no effect. Per suggestion from Morten Hustveit	2013-10-04 13:50:28 -04:00
Robert Haas	d90ced8bb2	Add DISCARD SEQUENCES command. DISCARD ALL will now discard cached sequence information, as well. Fabrízio de Royes Mello, reviewed by Zoltán Böszörményi, with some further tweaks by me.	2013-10-03 16:23:31 -04:00
Heikki Linnakangas	c2b175b948	Minor GIN code refactoring. It makes for cleaner code to have separate Get/Add functions for PostingItems and ItemPointers. A few callsites that have to deal with both types need to be duplicated because of this, but all the callers have to know which one they're dealing with anyway. Overall, this reduces the amount of casting required. Extracted from Alexander Korotkov's larger patch to change the data page format.	2013-10-03 11:51:31 +03:00
Alvaro Herrera	15732b34e8	Add WaitForLockers in lmgr, refactoring index.c code This is in support of a future REINDEX CONCURRENTLY feature. Michael Paquier	2013-10-01 17:57:01 -03:00
Alvaro Herrera	1247ea28cb	Remove `proc` argument from LockCheckConflicts This has been unused since commit `8563ccae2c`. Noted by Antonin Houska	2013-09-16 22:14:14 -03:00
Alvaro Herrera	dd778e9d88	Rename various "freeze multixact" variables It seems to make more sense to use "cutoff multixact" terminology throughout the backend code; "freeze" is associated with replacing of an Xid with FrozenTransactionId, which is not what we do for MultiXactIds. Andres Freund Some adjustments by Álvaro Herrera	2013-09-16 15:47:31 -03:00
Bruce Momjian	0f59f4a645	Add comment for VARSIZE_ANY_EXHDR macro Gurjeet Singh	2013-09-10 20:18:53 -04:00
Fujii Masao	71129b6fc5	Remove leftover function prototype. The prototype for inval_twophase_postcommit wasn't removed when it's definition was removed in `efc16ea520` / the initial HS commit. Andres Freund	2013-09-11 01:32:24 +09:00
Robert Haas	71901ab6da	Introduce InvalidCommandId. This allows a 32-bit field to represent an optional command ID without a separate flag bit. Andres Freund	2013-09-09 16:25:29 -04:00
Kevin Grittner	277607d600	Eliminate pg_rewrite.ev_attr column and related dead code. Commit `95ef6a3448` removed the ability to create rules on an individual column as of 7.3, but left some residual code which has since been useless. This cleans up that dead code without any change in behavior other than dropping the useless column from the catalog.	2013-09-05 14:03:43 -05:00
Heikki Linnakangas	20cb18db46	Make catalog cache hash tables resizeable. If the hash table backing a catalog cache becomes too full (fillfactor > 2), enlarge it. A new buckets array, double the size of the old, is allocated, and all entries in the old hash are moved to the right bucket in the new hash. This has two benefits. First, cache lookups don't get so expensive when there are lots of entries in a cache, like if you access hundreds of thousands of tables. Second, we can make the (initial) sizes of the caches much smaller, which saves memory. This patch dials down the initial sizes of the catcaches. The new sizes are chosen so that a backend that only runs a few basic queries still won't need to enlarge any of them.	2013-09-05 20:20:03 +03:00
Jeff Davis	b1892aaeaa	Revert WAL posix_fallocate() patches. This reverts commit `269e780822` and commit `5b571bb8c8`. Unfortunately, the initial patch had insufficient performance testing, and resulted in a regression. Per report by Thom Brown.	2013-09-04 23:43:41 -07:00
Heikki Linnakangas	375d8526f2	Keep heavily-contended fields in XLogCtlInsert on different cache lines. Performance testing shows that if the insertpos_lck spinlock and the fields that it protects are on the same cache line with other variables that are frequently accessed, the false sharing can hurt performance a lot. Keep them apart by adding some padding.	2013-09-04 23:14:33 +03:00
Robert Haas	cc52d5b33f	Expose fsync_fname as a public API. Andres Freund	2013-09-04 11:15:00 -04:00
Tom Lane	0c66a22377	Update comments concerning PGC_S_TEST. This GUC context value was once only used by ALTER DATABASE SET and ALTER USER SET. That's not true anymore, though, so rewrite the comments to be a bit more general. Patch in HEAD only, since this is just an internal documentation issue.	2013-09-03 18:56:22 -04:00
Tom Lane	0d3f4406df	Allow aggregate functions to be VARIADIC. There's no inherent reason why an aggregate function can't be variadic (even VARIADIC ANY) if its transition function can handle the case. Indeed, this patch to add the feature touches none of the planner or executor, and little of the parser; the main missing stuff was DDL and pg_dump support. It is true that variadic aggregates can create the same sort of ambiguity about parameters versus ORDER BY keys that was complained of when we (briefly) had both one- and two-argument forms of string_agg(). However, the policy formed in response to that discussion only said that we'd not create any built-in aggregates with varying numbers of arguments, not that we shouldn't allow users to do it. So the logical extension of that is we can allow users to make variadic aggregates as long as we're wary about shipping any such in core. In passing, this patch allows aggregate function arguments to be named, to the extent of remembering the names in pg_proc and dumping them in pg_dump. You can't yet call an aggregate using named-parameter notation. That seems like a likely future extension, but it'll take some work, and it's not what this patch is really about. Likewise, there's still some work needed to make window functions handle VARIADIC fully, but I left that for another day. initdb forced because of new aggvariadic field in Aggref parse nodes.	2013-09-03 17:08:46 -04:00
Alvaro Herrera	8b290f3115	Update obsolete comment	2013-09-03 16:53:16 -04:00
Tom Lane	8e2b71d2d0	Reset the binary heap in MergeAppend rescans. Failing to do so can cause queries to return wrong data, error out or crash. This requires adding a new binaryheap_reset() method to binaryheap.c, but that probably should have been there anyway. Per bug #8410 from Terje Elde. Diagnosis and patch by Andres Freund.	2013-08-30 19:15:21 -04:00
Heikki Linnakangas	b03d196be0	Use a non-locking initial test in TAS_SPIN on x86_64. Testing done in 2011 by Tom Lane concluded that this is a win on Intel Xeons and AMD Opterons, but it was not changed back then, because of an old comment in tas() that suggested that it's a huge loss on older Opterons. However, didn't have separate TAS() and TAS_SPIN() macros back then, so the comment referred to doing a non-locked initial test even on the first access, in uncontended case. I don't have access to older Opterons, but I'm pretty sure that doing an initial unlocked test is unlikely to be a loss while spinning, even though it might be for the first access. We probably should do the same on 32-bit x86, but I'm afraid of changing it without any testing. Hence just add a note to the x86 implementation suggesting that we probably should do the same there.	2013-08-29 14:04:37 +03:00
Robert Haas	090d0f2050	Allow discovery of whether a dynamic background worker is running. Using the infrastructure provided by this patch, it's possible either to wait for the startup of a dynamically-registered background worker, or to poll the status of such a worker without waiting. In either case, the current PID of the worker process can also be obtained. As usual, worker_spi is updated to demonstrate the new functionality. Patch by me. Review by Andres Freund.	2013-08-28 14:08:13 -04:00
Tom Lane	fcf9ecad57	In locate_grouping_columns(), don't expect an exact match of Var typmods. It's possible that inlining of SQL functions (or perhaps other changes?) has exposed typmod information not known at parse time. In such cases, Vars generated by query_planner might have valid typmod values while the original grouping columns only have typmod -1. This isn't a semantic problem since the behavior of grouping only depends on type not typmod, but it breaks locate_grouping_columns' use of tlist_member to locate the matching entry in query_planner's result tlist. We can fix this without an excessive amount of new code or complexity by relying on the fact that locate_grouping_columns only gets called when make_subplanTargetList has set need_tlist_eval == false, and that can only happen if all the grouping columns are simple Vars. Therefore we only need to search the sub_tlist for a matching Var, and we can reasonably define a "match" as being a match of the Var identity fields varno/varattno/varlevelsup. The code still Asserts that vartype matches, but ignores vartypmod. Per bug #8393 from Evan Martin. The added regression test case is basically the same as his example. This has been broken for a very long time, so back-patch to all supported branches.	2013-08-23 17:30:53 -04:00
Andrew Dunstan	73838b5251	Unconditionally use the WSA equivalents of Socket error constants. This change will only apply to mingw compilers, and has been found necessary by late versions of the mingw-w64 compiler. It's the same as what is done elsewhere for the Microsoft compilers. If this doesn't upset older compilers in the buildfarm, it will be backpatched to 9.1. Problem reported by Michael Cronenworth, although not his patch.	2013-08-20 14:11:36 -04:00
Alvaro Herrera	78e1220104	Fix pg_upgrade failure from servers older than 9.3 When upgrading from servers of versions 9.2 and older, and MultiXactIds have been used in the old server beyond the first page (that is, 2048 multis or more in the default 8kB-page build), pg_upgrade would set the next multixact offset to use beyond what has been allocated in the new cluster. This would cause a failure the first time the new cluster needs to use this value, because the pg_multixact/offsets/ file wouldn't exist or wouldn't be large enough. To fix, ensure that the transient server instances launched by pg_upgrade extend the file as necessary. Per report from Jesse Denardo in CANiVXAj4c88YqipsyFQPboqMudnjcNTdB3pqe8ReXqAFQ=HXyA@mail.gmail.com	2013-08-19 12:56:18 -04:00
Tom Lane	9e7e29c75a	Fix planner problems with LATERAL references in PlaceHolderVars. The planner largely failed to consider the possibility that a PlaceHolderVar's expression might contain a lateral reference to a Var coming from somewhere outside the PHV's syntactic scope. We had a previous report of a problem in this area, which I tried to fix in a quick-hack way in commit `4da6439bd8`, but Antonin Houska pointed out that there were still some problems, and investigation turned up other issues. This patch largely reverts that commit in favor of a more thoroughly thought-through solution. The new theory is that a PHV's ph_eval_at level cannot be higher than its original syntactic level. If it contains lateral references, those don't change the ph_eval_at level, but rather they create a lateral-reference requirement for the ph_eval_at join relation. The code in joinpath.c needs to handle that. Another issue is that createplan.c wasn't handling nested PlaceHolderVars properly. In passing, push knowledge of lateral-reference checks for join clauses into join_clause_is_movable_to. This is mainly so that FDWs don't need to deal with it. This patch doesn't fix the original join-qual-placement problem reported by Jeremy Evans (and indeed, one of the new regression test cases shows the wrong answer because of that). But the PlaceHolderVar problems need to be fixed before that issue can be addressed, so committing this separately seems reasonable.	2013-08-17 20:22:37 -04:00
Robert Haas	2dee7998f9	Move more bgworker code to bgworker.c; also, some renaming. Per discussion on pgsql-hackers. Michael Paquier, slightly modified by me. Original suggestion from Amit Kapila.	2013-08-16 15:31:28 -04:00
Tom Lane	1b1d3d92c3	Remove ph_may_need from PlaceHolderInfo, with attendant simplifications. The planner logic that attempted to make a preliminary estimate of the ph_needed levels for PlaceHolderVars seems to be completely broken by lateral references. Fortunately, the potential join order optimization that this code supported seems to be of relatively little value in practice; so let's just get rid of it rather than trying to fix it. Getting rid of this allows fairly substantial simplifications in placeholder.c, too, so planning in such cases should be a bit faster. Issue noted while pursuing bugs reported by Jeremy Evans and Antonin Houska, though this doesn't in itself fix either of their reported cases. What this does do is prevent an Assert crash in the kind of query illustrated by the added regression test. (I'm not sure that the plan for that query is stable enough across platforms to be usable as a regression test output ... but we'll soon find out from the buildfarm.) Back-patch to 9.3. The problem case can't arise without LATERAL, so no need to touch older branches.	2013-08-14 18:38:47 -04:00
Tom Lane	3d5282c6f0	Emit a log message if output is about to be redirected away from stderr. We've seen multiple cases of people looking at the postmaster's original stderr output to try to diagnose problems, not realizing/remembering that their logging configuration is set up to send log messages somewhere else. This seems particularly likely to happen in prepackaged distributions, since many packagers patch the code to change the factory-standard logging configuration to something more in line with their platform conventions. In hopes of reducing confusion, emit a LOG message about this at the point in startup where we are about to switch log output away from the original stderr, providing a pointer to where to look instead. This message will appear as the last thing in the original stderr output. (We might later also try to emit such link messages when logging parameters are changed on-the-fly; but that case seems to be both noticeably harder to do nicely, and much less frequently a problem in practice.) Per discussion, back-patch to 9.3 but not further.	2013-08-13 15:24:52 -04:00
Tom Lane	3ced8837db	Simplify query_planner's API by having it return the top-level RelOptInfo. Formerly, query_planner returned one or possibly two Paths for the topmost join relation, so that grouping_planner didn't see the join RelOptInfo (at least not directly; it didn't have any hesitation about examining cheapest_path->parent, though). However, correct selection of the Paths involved a significant amount of coupling between query_planner and grouping_planner, a problem which has gotten worse over time. It seems best to give up on this API choice and instead return the topmost RelOptInfo explicitly. Then grouping_planner can pull out the Paths it wants from the rel's path list. In this way we can remove all knowledge of grouping behaviors from query_planner. The only real benefit of the old way is that in the case of an empty FROM clause, we never made any RelOptInfos at all, just a Path. Now we have to gin up a dummy RelOptInfo to represent the empty FROM clause. That's not a very big deal though. While at it, simplify query_planner's API a bit more by having the caller set up root->tuple_fraction and root->limit_tuples, rather than passing those values as separate parameters. Since query_planner no longer does anything with either value, requiring it to fill the PlannerInfo fields seemed pretty arbitrary. This patch just rearranges code; it doesn't (intentionally) change any behaviors. Followup patches will do more interesting things.	2013-08-05 15:01:09 -04:00
Alvaro Herrera	88c556680c	Fix crash in error report of invalid tuple lock My tweak of these error messages in commit `c359a1b082` contained the thinko that a query would always have rowMarks set for a query containing a locking clause. Not so: when declaring a cursor, for instance, rowMarks isn't set at the point we're checking, so we'd be dereferencing a NULL pointer. The fix is to pass the lock strength to the function raising the error, instead of trying to reverse-engineer it. The result not only is more robust, but it also seems cleaner overall. Per report from Robert Haas.	2013-08-02 13:18:37 -04:00
Robert Haas	149e38e5ee	Assorted bgworker-related comment fixes. Per gripes by Amit Kapila.	2013-08-01 12:20:31 -04:00
Robert Haas	813fb03155	Remove SnapshotNow and HeapTupleSatisfiesNow. We now use MVCC catalog scans, and, per discussion, have eliminated all other remaining uses of SnapshotNow, so that we can now get rid of it. This will break third-party code which is still using it, which is intentional, as we want such code to be updated to do things the new way.	2013-08-01 10:46:19 -04:00
Stephen Frost	ddef1a39c6	Allow a context to be passed in for error handling As pointed out by Tom Lane, we can allow other users of the error handler callbacks to provide their own memory context by adding the context to use to ErrorData and using that instead of explicitly using ErrorContext. This then allows GetErrorContextStack() to be called from inside exception handlers, so modify plpgsql to take advantage of that and add an associated regression test for it.	2013-08-01 01:07:20 -04:00
Alvaro Herrera	3142cf6dd5	Fix a couple of inconsequential typos in new header	2013-07-31 17:57:00 -04:00
Greg Stark	c62736cc37	Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF) Author: Andrew Gierth, David Fetter Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost	2013-07-29 16:38:01 +01:00
Tom Lane	3d13623d75	Prevent leakage of SPI tuple tables during subtransaction abort. plpgsql often just remembers SPI-result tuple tables in local variables, and has no mechanism for freeing them if an ereport(ERROR) causes an escape out of the execution function whose local variable it is. In the original coding, that wasn't a problem because the tuple table would be cleaned up when the function's SPI context went away during transaction abort. However, once plpgsql grew the ability to trap exceptions, repeated trapping of errors within a function could result in significant intra-function-call memory leakage, as illustrated in bug #8279 from Chad Wagner. We could fix this locally in plpgsql with a bunch of PG_TRY/PG_CATCH coding, but that would be tedious, probably slow, and prone to bugs of omission; moreover it would do nothing for similar risks elsewhere. What seems like a better plan is to make SPI itself responsible for freeing tuple tables at subtransaction abort. This patch attacks the problem that way, keeping a list of live tuple tables within each SPI function context. Currently, such freeing is automatic for tuple tables made within the failed subtransaction. We might later add a SPI call to mark a tuple table as not to be freed this way, allowing callers to opt out; but until someone exhibits a clear use-case for such behavior, it doesn't seem worth bothering. A very useful side-effect of this change is that SPI_freetuptable() can now defend itself against bad calls, such as duplicate free requests; this should make things more robust in many places. (In particular, this reduces the risks involved if a third-party extension contains now-redundant SPI_freetuptable() calls in error cleanup code.) Even though the leakage problem is of long standing, it seems imprudent to back-patch this into stable branches, since it does represent an API semantics change for SPI users. We'll patch this in 9.3, but live with the leakage in older branches.	2013-07-25 16:46:14 -04:00
Stephen Frost	8312832567	Add GET DIAGNOSTICS ... PG_CONTEXT in PL/PgSQL This adds the ability to get the call stack as a string from within a PL/PgSQL function, which can be handy for logging to a table, or to include in a useful message to an end-user. Pavel Stehule, reviewed by Rushabh Lathia and rather heavily whacked around by Stephen Frost.	2013-07-24 18:53:27 -04:00
Tom Lane	fa2fad3c06	Improve ilist.h's support for deletion of slist elements during iteration. Previously one had to use slist_delete(), implying an additional scan of the list, making this infrastructure considerably less efficient than traditional Lists when deletion of element(s) in a long list is needed. Modify the slist_foreach_modify() macro to support deleting the current element in O(1) time, by keeping a "prev" pointer in addition to "cur" and "next". Although this makes iteration with this macro a bit slower, no real harm is done, since in any scenario where you're not going to delete the current list element you might as well just use slist_foreach instead. Improve the comments about when to use each macro. Back-patch to 9.3 so that we'll have consistent semantics in all branches that provide ilist.h. Note this is an ABI break for callers of slist_foreach_modify(). Andres Freund and Tom Lane	2013-07-24 17:42:34 -04:00
Tom Lane	10a509d829	Move strip_implicit_coercions() from optimizer to nodeFuncs.c. Use of this function has spread into the parser and rewriter, so it seems like time to pull it out of the optimizer and put it into the more central nodeFuncs module. This eliminates the need to #include optimizer/clauses.h in most of the calling files, demonstrating that this function was indeed a bit outside the normal code reference patterns.	2013-07-23 18:21:19 -04:00
Tom Lane	a7cd853b75	Change post-rewriter representation of dropped columns in joinaliasvars. It's possible to drop a column from an input table of a JOIN clause in a view, if that column is nowhere actually referenced in the view. But it will still be there in the JOIN clause's joinaliasvars list. We used to replace such entries with NULL Const nodes, which is handy for generation of RowExpr expansion of a whole-row reference to the view. The trouble with that is that it can't be distinguished from the situation after subquery pull-up of a constant subquery output expression below the JOIN. Instead, replace such joinaliasvars with null pointers (empty expression trees), which can't be confused with pulled-up expressions. expandRTE() still emits the old convention, though, for convenience of RowExpr generation and to reduce the risk of breaking extension code. In HEAD and 9.3, this patch also fixes a problem with some new code in ruleutils.c that was failing to cope with implicitly-casted joinaliasvars entries, as per recent report from Feike Steenbergen. That oversight was because of an inadequate description of the data structure in parsenodes.h, which I've now corrected. There were some pre-existing oversights of the same ilk elsewhere, which I believe are now all fixed.	2013-07-23 16:23:45 -04:00
Alvaro Herrera	c359a1b082	Tweak FOR UPDATE/SHARE error message wording (again) In commit `0ac5ad5134` I changed some error messages from "FOR UPDATE/SHARE" to a rather long gobbledygook which nobody liked. Then, in commit `cb9b66d31` I changed them again, but the alternative chosen there was deemed suboptimal by Peter Eisentraut, who in message 1373937980.20441.8.camel@vanquo.pezone.net proposed an alternative involving a dynamically-constructed string based on the actual locking strength specified in the SQL command. This patch implements that suggestion.	2013-07-23 14:03:09 -04:00
Robert Haas	f40a318eea	Remove bgw_sighup and bgw_sigterm. Per discussion on pgsql-hackers, these aren't really needed. Interim versions of the background worker patch had the worker starting with signals already unblocked, which would have made this necessary. But the final version does not, so we don't really need it; and it doesn't work well with the new facility for starting dynamic background workers, so just rip it out. Also per discussion on pgsql-hackers, back-patch this change to 9.3. It's best to get the API break out of the way before we do an official release of this facility, to avoid more pain for extension authors later.	2013-07-22 14:13:00 -04:00
Robert Haas	0518eceec3	Adjust HeapTupleSatisfies* routines to take a HeapTuple. Previously, these functions took a HeapTupleHeader, but upcoming patches for logical replication will introduce new a new snapshot type under which the tuple's TID will be used to lookup (CMIN, CMAX) for visibility determination purposes. This makes that information available. Code churn is minimal since HeapTupleSatisfiesVisibility took the HeapTuple anyway, and deferenced it before calling the satisfies function. Independently of logical replication, this allows t_tableOid and t_self to be cross-checked via assertions in tqual.c. This seems like a useful way to make sure that all callers are setting these values properly, which has been previously put forward as desirable. Andres Freund, reviewed by Álvaro Herrera	2013-07-22 13:38:44 -04:00
Robert Haas	f01d1ae3a1	Add infrastructure for mapping relfilenodes to relation OIDs. Future patches are expected to introduce logical replication that works by decoding WAL. WAL contains relfilenodes rather than relation OIDs, so this infrastructure will be needed to find the relation OID based on WAL contents. If logical replication does not make it into this release, we probably should consider reverting this, since it will add some overhead to DDL operations that create new relations. One additional index insert per pg_class row is not a large overhead, but it's more than zero. Another way of meeting the needs of logical replication would be to the relation OID to WAL, but that would burden DML operations, not only DDL. Andres Freund, with some changes by me. Design review, in earlier versions, by Álvaro Herrera.	2013-07-22 11:09:10 -04:00
Peter Eisentraut	ff41a5de09	Clean up new JSON API typedefs The new JSON API uses a bit of an unusual typedef scheme, where for example OkeysState is a pointer to okeysState. And that's not applied consistently either. Change that to the more usual PostgreSQL style where struct typedefs are upper case, and use pointers explicitly.	2013-07-20 06:38:31 -04:00
Stephen Frost	4cbe3ac3e8	WITH CHECK OPTION support for auto-updatable VIEWs For simple views which are automatically updatable, this patch allows the user to specify what level of checking should be done on records being inserted or updated. For 'LOCAL CHECK', new tuples are validated against the conditionals of the view they are being inserted into, while for 'CASCADED CHECK' the new tuples are validated against the conditionals for all views involved (from the top down). This option is part of the SQL specification. Dean Rasheed, reviewed by Pavel Stehule	2013-07-18 17:10:16 -04:00
Andrew Dunstan	d26888bc4d	Move checking an explicit VARIADIC "any" argument into the parser. This is more efficient and simpler . It does mean that an untyped NULL can no longer be used in such cases, which should be mentioned in Release Notes, but doesn't seem a terrible loss. The workaround is to cast the NULL to some array type. Pavel Stehule, reviewed by Jeevan Chalke.	2013-07-18 11:52:12 -04:00
Tom Lane	89779bf2c8	Fix a few problems in barrier.h. On HPPA, implement pg_memory_barrier() as pg_compiler_barrier(), which should be correct since this arch doesn't do memory access reordering, and is anyway better than the completely-nonfunctional-on-this-arch dummy_spinlock code. (But note this patch only fixes things for gcc, not for builds with HP's compiler.) Also, fix incorrect default definition of pg_memory_barrier as a macro requiring an argument. Also, fix incorrect spelling of "#elif" as "#else if" in icc code path (spotted by pgindent). This doesn't come close to fixing all of the functional and stylistic deficiencies in barrier.h, but at least it un-breaks my personal build. Now that we're actually using barriers in the code, this file is going to need some serious attention.	2013-07-17 18:38:20 -04:00
Noah Misch	b560ec1b0d	Implement the FILTER clause for aggregate function calls. This is SQL-standard with a few extensions, namely support for subqueries and outer references in clause expressions. catversion bump due to change in Aggref and WindowFunc. David Fetter, reviewed by Dean Rasheed.	2013-07-16 20:15:36 -04:00
Kevin Grittner	cc1965a99b	Add support for REFRESH MATERIALIZED VIEW CONCURRENTLY. This allows reads to continue without any blocking while a REFRESH runs. The new data appears atomically as part of transaction commit. Review questioned the Assert that a matview was not a system relation. This will be addressed separately. Reviewed by Hitoshi Harada, Robert Haas, Andres Freund. Merged after review with security patch `f3ab5d4`.	2013-07-16 12:55:44 -05:00
Robert Haas	7f7485a0cd	Allow background workers to be started dynamically. There is a new API, RegisterDynamicBackgroundWorker, which allows an ordinary user backend to register a new background writer during normal running. This means that it's no longer necessary for all background workers to be registered during processing of shared_preload_libraries, although the option of registering workers at that time remains available. When a background worker exits and will not be restarted, the slot previously used by that background worker is automatically released and becomes available for reuse. Slots used by background workers that are configured for automatic restart can't (yet) be released without shutting down the system. This commit adds a new source file, bgworker.c, and moves some of the existing control logic for background workers there. Previously, there was little enough logic that it made sense to keep everything in postmaster.c, but not any more. This commit also makes the worker_spi contrib module into an extension and adds a new function, worker_spi_launch, which can be used to demonstrate the new facility.	2013-07-16 13:02:15 -04:00
Peter Eisentraut	070518ddab	Add session_preload_libraries configuration parameter This is like shared_preload_libraries except that it takes effect at backend start and can be changed without a full postmaster restart. It is like local_preload_libraries except that it is still only settable by a superuser. This can be a better way to load modules such as auto_explain. Since there are now three preload parameters, regroup the documentation a bit. Put all parameters into one section, explain common functionality only once, update the descriptions to reflect current and future realities. Reviewed-by: Dimitri Fontaine <dimitri@2ndQuadrant.fr>	2013-07-12 21:23:50 -04:00
Heikki Linnakangas	e5592c61ad	Fix memory barrier support on icc on ia64, 2nd attempt. Itanium doesn't have the mfence instruction - that's a 386 thing. Use the "mf" instruction instead. This reverts the previous commit to add "#include <emmintrinsic.h>"; the problem was not with a missing #include.	2013-07-09 11:34:18 +03:00
Heikki Linnakangas	6052bceba5	Add #include needed for _mm_mfence() intrinsic on ia64. Hopefully this fixes the build failure on buildfarm member dugong.	2013-07-09 10:29:43 +03:00
Heikki Linnakangas	9a20a9b21b	Improve scalability of WAL insertions. This patch replaces WALInsertLock with a number of WAL insertion slots, allowing multiple backends to insert WAL records to the WAL buffers concurrently. This is particularly useful for parallel loading large amounts of data on a system with many CPUs. This has one user-visible change: switching to a new WAL segment with pg_switch_xlog() now fills the remaining unused portion of the segment with zeros. This potentially adds some overhead, but it has been a very common practice by DBA's to clear the "tail" of the segment with an external pg_clearxlogtail utility anyway, to make the WAL files compress better. With this patch, it's no longer necessary to do that. This patch adds a new GUC, xloginsert_slots, to tune the number of WAL insertion slots. Performance testing suggests that the default, 8, works pretty well for all kinds of worklods, but I left the GUC in place to allow others with different hardware to test that easily. We might want to remove that before release. Reviewed by Andres Freund.	2013-07-08 11:23:56 +03:00
Magnus Hagander	5a348fe077	Fix include-guard Looks like a cut/paste error in the original addition of the file. Andres Freund	2013-07-07 13:36:20 +02:00
Jeff Davis	269e780822	Use posix_fallocate() for new WAL files, where available. This function is more efficient than actually writing out zeroes to the new file, per microbenchmarks by Jon Nelson. Also, it may reduce the likelihood of WAL file fragmentation. Jon Nelson, with review by Andres Freund, Greg Smith and me.	2013-07-05 12:30:29 -07:00
Magnus Hagander	c87ff71f37	Expose the estimation of number of changed tuples since last analyze This value, now pg_stat_all_tables.n_mod_since_analyze, was already tracked and used by autovacuum, but not exposed to the user. Mark Kirkwood, review by Laurenz Albe	2013-07-05 15:10:15 +02:00
Noah Misch	79e0f87a15	Use type "int64" for memory accounting in tuplesort.c/tuplestore.c. Commit `263865a489` switched tuplesort.c and tuplestore.c variables representing memory usage from type "long" to type "Size". This was unnecessary; I thought doing so avoided overflow scenarios on 64-bit Windows, but guc.c already limited work_mem so as to prevent the overflow. It was also incomplete, not touching the logic that assumed a signed data type. Change the affected variables to "int64". This is perfect for 64-bit platforms, and it reduces the need to contemplate platform-specific overflow scenarios. It also puts us close to being able to support work_mem over 2 GiB on 64-bit Windows. Per report from Andres Freund.	2013-07-04 23:13:54 -04:00
Robert Haas	6bc8ef0b7f	Add new GUC, max_worker_processes, limiting number of bgworkers. In 9.3, there's no particular limit on the number of bgworkers; instead, we just count up the number that are actually registered, and use that to set MaxBackends. However, that approach causes problems for Hot Standby, which needs both MaxBackends and the size of the lock table to be the same on the standby as on the master, yet it may not be desirable to run the same bgworkers in both places. 9.3 handles that by failing to notice the problem, which will probably work fine in nearly all cases anyway, but is not theoretically sound. A further problem with simply counting the number of registered workers is that new workers can't be registered without a postmaster restart. This is inconvenient for administrators, since bouncing the postmaster causes an interruption of service. Moreover, there are a number of applications for background processes where, by necessity, the background process must be started on the fly (e.g. parallel query). While this patch doesn't actually make it possible to register new background workers after startup time, it's a necessary prerequisite. Patch by me. Review by Michael Paquier.	2013-07-04 11:24:24 -04:00
Fujii Masao	2ef085d0e6	Get rid of pg_class.reltoastidxid. Treat TOAST index just the same as normal one and get the OID of TOAST index from pg_index but not pg_class.reltoastidxid. This change allows us to handle multiple TOAST indexes, and which is required infrastructure for upcoming REINDEX CONCURRENTLY feature. Patch by Michael Paquier, reviewed by Andres Freund and me.	2013-07-04 03:24:09 +09:00
Peter Eisentraut	d864852685	Add #include to make header file independent	2013-07-02 20:19:52 -04:00
Robert Haas	3682025015	Add support for multiple kinds of external toast datums. To that end, support tags rather than lengths for external datums. As an example of how this can be used, add support or "indirect" tuples which point to some externally allocated memory containing a toast tuple. Similar infrastructure could be used for other purposes, including, perhaps, support for alternative compression algorithms. Andres Freund, reviewed by Hitoshi Harada and myself	2013-07-02 13:38:55 -04:00
Robert Haas	568d4138c6	Use an MVCC snapshot, rather than SnapshotNow, for catalog scans. SnapshotNow scans have the undesirable property that, in the face of concurrent updates, the scan can fail to see either the old or the new versions of the row. In many cases, we work around this by requiring DDL operations to hold AccessExclusiveLock on the object being modified; in some cases, the existing locking is inadequate and random failures occur as a result. This commit doesn't change anything related to locking, but will hopefully pave the way to allowing lock strength reductions in the future. The major issue has held us back from making this change in the past is that taking an MVCC snapshot is significantly more expensive than using a static special snapshot such as SnapshotNow. However, testing of various worst-case scenarios reveals that this problem is not severe except under fairly extreme workloads. To mitigate those problems, we avoid retaking the MVCC snapshot for each new scan; instead, we take a new snapshot only when invalidation messages have been processed. The catcache machinery already requires that invalidation messages be sent before releasing the related heavyweight lock; else other backends might rely on locally-cached data rather than scanning the catalog at all. Thus, making snapshot reuse dependent on the same guarantees shouldn't break anything that wasn't already subtly broken. Patch by me. Review by Michael Paquier and Andres Freund.	2013-07-02 09:47:01 -04:00
Robert Haas	0d22987ae9	Add a convenience routine makeFuncCall to reduce duplication. David Fetter and Andrew Gierth, reviewed by Jeevan Chalke	2013-07-01 14:46:54 -04:00
Peter Eisentraut	129759d6a5	Fix cpluspluscheck in checksum code C++ is more picky about comparing signed and unsigned integers.	2013-06-30 10:25:43 -04:00
Heikki Linnakangas	ee6556555b	Inline ginCompareItemPointers function for speed. ginCompareItemPointers function is called heavily in gin index scans - inlining it speeds up some kind of queries a lot.	2013-06-29 12:55:34 +03:00
Simon Riggs	f177cbfe67	ALTER TABLE ... ALTER CONSTRAINT for FKs Allow constraint attributes to be altered, so the default setting of NOT DEFERRABLE can be altered to DEFERRABLE and back. Review by Abhijit Menon-Sen	2013-06-29 00:27:30 +01:00
Robert Haas	5893ffa79c	Make the OVER keyword unreserved. This results in a slightly less specific error message when OVER is used in a context where we don't accept window functions, but per discussion, it's worth it to get the benefit of not needing to reserve this keyword any more. This same refactoring will also let us avoid reserving some other keywords that we expect to add in upcoming patches (specifically, IGNORE, RESPECT, and FILTER). Troels Nielsen, with minor changes by me	2013-06-28 11:11:00 -04:00
Robert Haas	5ee73525d5	Define Trap and TrapMacro even in non-cassert builds. In some cases, the use of these macros may be preferable to Assert() or AssertMacro(), since this way the caller can set the trap message. Andres Freund and Robert Haas	2013-06-28 09:33:34 -04:00
Noah Misch	263865a489	Permit super-MaxAllocSize allocations with MemoryContextAllocHuge(). The MaxAllocSize guard is convenient for most callers, because it reduces the need for careful attention to overflow, data type selection, and the SET_VARSIZE() limit. A handful of callers are happy to navigate those hazards in exchange for the ability to allocate a larger chunk. Introduce MemoryContextAllocHuge() and repalloc_huge(). Use this in tuplesort.c and tuplestore.c, enabling internal sorts of up to INT_MAX tuples, a factor-of-48 increase. In particular, B-tree index builds can now benefit from much-larger maintenance_work_mem settings. Reviewed by Stephen Frost, Simon Riggs and Jeff Janes.	2013-06-27 14:53:57 -04:00
Noah Misch	19085116ee	Cooperate with the Valgrind instrumentation framework. Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its Memcheck tool about the PostgreSQL allocator. This makes Valgrind roughly as sensitive to memory errors involving palloc chunks as it is to memory errors involving malloc chunks. Further client requests in PageAddItem() and printtup() verify that all bits being added to a buffer page or furnished to an output function are predictably-defined. Those tests catch failures of C-language functions to fully initialize the bits of a Datum, which in turn stymie optimizations that rely on _equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to enable these additions. An included "suppression file" silences nominal errors we don't plan to fix. Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.	2013-06-26 20:22:25 -04:00
Noah Misch	5f538ad004	Renovate display of non-ASCII messages on Windows. GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it. Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters. elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8. Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles. Alexander Law and Noah Misch	2013-06-26 11:17:33 -04:00
Simon Riggs	4f14c86d74	Reverting previous commit, pending investigation of sporadic seg faults from various build farm members.	2013-06-24 21:21:18 +01:00
Simon Riggs	b577a57d41	ALTER TABLE ... ALTER CONSTRAINT for FKs Allow constraint attributes to be altered, so the default setting of NOT DEFERRABLE can be altered to DEFERRABLE and back. Review by Abhijit Menon-Sen	2013-06-24 20:07:41 +01:00
Simon Riggs	1f09121b4e	Ensure no xid gaps during Hot Standby startup In some cases with higher numbers of subtransactions it was possible for us to incorrectly initialize subtrans leading to complaints of missing pages. Bug report by Sergey Konoplev Analysis and fix by Andres Freund	2013-06-23 11:05:02 +01:00
Jeff Davis	b8fd1a09f3	Add buffer_std flag to MarkBufferDirtyHint(). MarkBufferDirtyHint() writes WAL, and should know if it's got a standard buffer or not. Currently, the only callers where buffer_std is false are related to the FSM. In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive. Back-patch to 9.3.	2013-06-17 08:02:12 -07:00
Tom Lane	5242fefb47	Be consistent about #define'ing configure symbols as "1" not empty. This is just neatnik-ism, since all the tests in the code are #ifdefs, but we shouldn't specify symbols as "Define to 1 ..." and then not actually define them that way.	2013-06-15 14:11:43 -04:00
Tom Lane	58ae1f4577	Stamp HEAD as 9.4devel. Let the hacking begin ...	2013-06-14 14:41:28 -04:00
Tom Lane	e472b92140	Avoid deadlocks during insertion into SP-GiST indexes. SP-GiST's original scheme for avoiding deadlocks during concurrent index insertions doesn't work, as per report from Hailong Li, and there isn't any evident way to make it work completely. We could possibly lock individual inner tuples instead of their whole pages, but preliminary experimentation suggests that the performance penalty would be huge. Instead, if we fail to get a buffer lock while descending the tree, just restart the tree descent altogether. We keep the old tuple positioning rules, though, in hopes of reducing the number of cases where this can happen. Teodor Sigaev, somewhat edited by Tom Lane	2013-06-14 14:26:43 -04:00
Tom Lane	f04216341d	Refactor checksumming code to make it easier to use externally. pg_filedump and other external utility programs are likely to want to be able to check Postgres page checksums. To avoid messy duplication of code, move the checksumming functionality into an exported header file, much as we did awhile back for the CRC code. In passing, get rid of an unportable assumption that a static char[] array will be word-aligned, and do some other minor code beautification.	2013-06-13 22:35:56 -04:00
Noah Misch	813895e4ac	Don't pass oidvector by value. Since the structure ends with a flexible array, doing so truncates any vector having more than one element. New in 9.3, so no back-patch.	2013-06-12 19:50:37 -04:00
Tom Lane	dc3eb56383	Improve updatability checking for views and foreign tables. Extend the FDW API (which we already changed for 9.3) so that an FDW can report whether specific foreign tables are insertable/updatable/deletable. The default assumption continues to be that they're updatable if the relevant executor callback function is supplied by the FDW, but finer granularity is now possible. As a test case, add an "updatable" option to contrib/postgres_fdw. This patch also fixes the information_schema views, which previously did not think that foreign tables were ever updatable, and fixes view_is_auto_updatable() so that a view on a foreign table can be auto-updatable. initdb forced due to changes in information_schema views and the functions they rely on. This is a bit unfortunate to do post-beta1, but if we don't change this now then we'll have another API break for FDWs when we do change it. Dean Rasheed, somewhat editorialized on by Tom Lane	2013-06-12 17:53:33 -04:00
Tom Lane	5c7603c318	Add ARM64 (aarch64) support to s_lock.h. Use the same gcc atomic functions as we do on newer ARM chips. (Basically this is a copy and paste of the __arm__ code block, but omitting the SWPB option since that definitely won't work.) Back-patch to 9.2. The patch would work further back, but we'd also need to update config.guess/config.sub in older branches to make them build out-of-the-box, and there hasn't been demand for it. Mark Salter	2013-06-04 15:42:02 -04:00
Heikki Linnakangas	15386281a6	Put back allow_system_table_mods check in heap_create(). This reverts commit `a475c60367`. Erik Rijkers reported back in January 2013 that after the patch, if you do "pg_dump -t myschema.mytable" to dump a single table, and restore that in a database where myschema does not exist, the table is silently created in pg_catalog instead. That is because pg_dump uses "SET search_path=myschema, pg_catalog" to set schema the table is created in. While allow_system_table_mods is not a very elegant solution to this, we can't leave it as it is, so for now, revert it back to the way it was previously.	2013-06-03 17:22:31 +03:00
Bruce Momjian	9af4159fce	pgindent run for release 9.3 This is the first run of the Perl-based pgindent script. Also update pgindent instructions.	2013-05-29 16:58:43 -04:00
Robert Haas	6eb971bd64	Fix typo in comment. Pavan Deolasee	2013-05-23 11:34:30 -04:00
Heikki Linnakangas	cb953d8b1b	Use the term "radix tree" instead of "suffix tree" for SP-GiST text opclass. What we have implemented is a radix tree (or a radix trie or a patricia trie), but the docs and code comments incorrectly called it a "suffix tree". Alexander Korotkov	2013-05-08 14:34:26 +03:00
Tom Lane	817a89423f	Stamp 9.3beta1.	2013-05-06 16:57:06 -04:00
Tom Lane	1d6c72a55b	Move materialized views' is-populated status into their pg_class entries. Previously this state was represented by whether the view's disk file had zero or nonzero size, which is problematic for numerous reasons, since it's breaking a fundamental assumption about heap storage. This was done to allow unlogged matviews to revert to unpopulated status after a crash despite our lack of any ability to update catalog entries post-crash. However, this poses enough risk of future problems that it seems better to not support unlogged matviews until we can find another way. Accordingly, revert that choice as well as a number of existing kluges forced by it in favor of creating a pg_class.relispopulated flag column.	2013-05-06 13:27:22 -04:00
Simon Riggs	ceabfb20f9	Bump PG_CONTROL_VERSION to 937	2013-04-30 13:27:47 +01:00
Simon Riggs	443951748c	Record data_checksum_version in control file. The value is not used anywhere in code, but will allow future changes to the checksum version should that become necessary in the future.	2013-04-30 12:27:12 +01:00
Tom Lane	db9f0e1d9a	Postpone creation of pathkeys lists to fix bug #8049 . This patch gets rid of the concept of, and infrastructure for, non-canonical PathKeys; we now only ever create canonical pathkey lists. The need for non-canonical pathkeys came from the desire to have grouping_planner initialize query_pathkeys and related pathkey lists before calling query_planner. However, since query_planner didn't actually do anything with those lists before they'd been made canonical, we can get rid of the whole mess by just not creating the lists at all until the point where we formerly canonicalized them. There are several ways in which we could implement that without making query_planner itself deal with grouping/sorting features (which are supposed to be the province of grouping_planner). I chose to add a callback function to query_planner's API; other alternatives would have required adding more fields to PlannerInfo, which while not bad in itself would create an ABI break for planner-related plugins in the 9.2 release series. This still breaks ABI for anything that calls query_planner directly, but it seems somewhat unlikely that there are any such plugins. I had originally conceived of this change as merely a step on the way to fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes that bug all by itself, as per the added regression test. The reason is that now get_eclass_for_sort_expr is adding the ORDER BY expression at the end of EquivalenceClass creation not the start, and so anything that is in a multi-member EquivalenceClass has already been created with correct em_nullable_relids. I am suspicious that there are related scenarios in which we still need to teach get_eclass_for_sort_expr to compute correct nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3 to fix bugs that are only hypothetical. So for the moment, do this and stop here. Back-patch to 9.2 but not to earlier branches, since they don't exhibit this bug for lack of join-clause-movement logic that depends on em_nullable_relids being correct. (We might have to revisit that choice if any related bugs turn up.) In 9.2, don't change the signature of make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as not to risk more plugin breakage than we have to.	2013-04-29 14:50:03 -04:00
Simon Riggs	43e7a66849	Introduce new page checksum algorithm and module. Isolate checksum calculation to its own module, so that bufpage knows little if anything about the details of the calculation. This implementation is a modified FNV-1a hash checksum, details of which are given in the new checksum.c header comments. Basic implementation only, so we fix the output value. Later related commits will add version numbers to pg_control, compiler optimization flags and memory barriers. Ants Aasma, reviewed by Jeff Davis and Simon Riggs	2013-04-29 09:05:27 +01:00
Tom Lane	f8db76e875	Editorialize a bit on new ProcessUtility() API. Choose a saner ordering of parameters (adding a new input param after the output params seemed a bit random), update the function's header comment to match reality (cmon folks, is this really that hard?), get rid of useless and sloppily-defined distinction between PROCESS_UTILITY_SUBCOMMAND and PROCESS_UTILITY_GENERATED.	2013-04-28 00:18:45 -04:00
Tom Lane	5194024d72	Incidental cleanup of matviews code. Move checking for unscannable matviews into ExecOpenScanRelation, which is a better place for it first because the open relation is already available (saving a relcache lookup cycle), and second because this eliminates the problem of telling the difference between rangetable entries that will or will not be scanned by the query. In particular we can get rid of the not-terribly-well-thought-out-or-implemented isResultRel field that the initial matviews patch added to RangeTblEntry. Also get rid of entirely unnecessary scannability check in the rewriter, and a bogus decision about whether RefreshMatViewStmt requires a parse-time snapshot. catversion bump due to removal of a RangeTblEntry field, which changes stored rules.	2013-04-27 17:48:57 -04:00
Peter Eisentraut	cc26ea9fe2	Clean up references to SQL92 In most cases, these were just references to the SQL standard in general. In a few cases, a contrast was made between SQL92 and later standards -- those have been kept unchanged.	2013-04-20 11:04:41 -04:00
Heikki Linnakangas	87ae9e7265	Remove some unused and seldom used fields from RelationAmInfo. This saves some memory from each index relcache entry. At least on a 64-bit machine, it saves just enough to shrink a typical relcache entry's memory usage from 2k to 1k. That's nice if you have a lot of backends and a lot of indexes.	2013-04-16 15:07:58 +03:00
Andrew Dunstan	d788121aba	Mark json IO and extraction functions immutable. Per complaint from Hubert Depesz Lubaczewski. Catalog version bumped.	2013-04-15 21:46:25 -04:00
Tom Lane	0b33790421	Clean up the mess around EXPLAIN and materialized views. Revert the matview-related changes in explain.c's API, as per recent complaint from Robert Haas. The reason for these appears to have been principally some ill-considered choices around having intorel_startup do what ought to be parse-time checking, plus a poor arrangement for passing it the view parsetree it needs to store into pg_rewrite when creating a materialized view. Do the latter by having parse analysis stick a copy into the IntoClause, instead of doing it at runtime. (On the whole, I seriously question the choice to represent CREATE MATERIALIZED VIEW as a variant of SELECT INTO/CREATE TABLE AS, because that means injecting even more complexity into what was already a horrid legacy kluge. However, I didn't go so far as to rethink that choice ... yet.) I also moved several error checks into matview parse analysis, and made the check for external Params in a matview more accurate. In passing, clean things up a bit more around interpretOidsOption(), and fix things so that we can use that to force no-oids for views, sequences, etc, thereby eliminating the need to cons up "oids = false" options when creating them. catversion bump due to change in IntoClause. (I wonder though if we really need readfuncs/outfuncs support for IntoClause anymore.)	2013-04-12 19:25:31 -04:00
Robert Haas	f8a54e936b	sepgsql: Enforce db_procedure:{execute} permission. To do this, we add an additional object access hook type, OAT_FUNCTION_EXECUTE. KaiGai Kohei	2013-04-12 08:58:01 -04:00
Robert Haas	d017bf41a3	Minor wording corrections for object-access hook stuff. KaiGai Kohei	2013-04-12 08:40:02 -04:00
Alvaro Herrera	6a76edb188	Fix confusion between ObjectType and ObjectClass Per report by Will Leinweber and Peter Eisentraut	2013-04-11 11:59:47 -03:00
Kevin Grittner	52e6e33ab4	Create a distinction between a populated matview and a scannable one. The intent was that being populated would, long term, be just one of the conditions which could affect whether a matview was scannable; being populated should be necessary but not always sufficient to scan the relation. Since only CREATE and REFRESH currently determine the scannability, names and comments accidentally conflated these concepts, leading to confusion. Also add missing locking for the SQL function which allows a test for scannability, and fix a modularity violatiion. Per complaints from Tom Lane, although its not clear that these will satisfy his concerns. Hopefully this will at least better frame the discussion.	2013-04-09 13:02:49 -05:00
Robert Haas	0bf42a5f3b	Adjust ExplainOneQuery_hook_type to take a DestReceiver argument. The materialized views patch adjusted ExplainOneQuery to take an additional DestReceiver argument, but failed to add a matching argument to the definition of ExplainOneQuery_hook. This is a problem for users of the hook that want to call ExplainOnePlan. Fix by adding the missing argument.	2013-04-09 10:25:08 -04:00
Tom Lane	3ccae48f44	Support indexing of regular-expression searches in contrib/pg_trgm. This works by extracting trigrams from the given regular expression, in generally the same spirit as the previously-existing support for LIKE searches, though of course the details are far more complicated. Currently, only GIN indexes are supported. We might be able to make it work with GiST indexes later. The implementation includes adding API functions to backend/regex/ to provide a view of the search NFA created from a regular expression. These functions are meant to be generic enough to be supportable in a standalone version of the regex library, should that ever happen. Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane	2013-04-09 01:06:54 -04:00
Simon Riggs	47c4333189	Avoid tricky race condition recording XLOG_HINT We copy the buffer before inserting an XLOG_HINT to avoid WAL CRC errors caused by concurrent hint writes to buffer while share locked. To make this work we refactor RestoreBackupBlock() to allow an XLOG_HINT to avoid the normal path for backup blocks, which assumes the underlying buffer is exclusive locked. Resulting code completely changes layout of XLOG_HINT WAL records, but this isn't even beta code, so this is a low impact change. In passing, avoid taking WALInsertLock for full page writes on checksummed hints, remove related cruft from XLogInsert() and improve xlog_desc record for XLOG_HINT. Andres Freund Bug report by Fujii Masao, testing by Jeff Janes and Jaime Casanova, review by Jeff Davis and Simon Riggs. Applied with changes from review and some comment editing.	2013-04-08 08:52:39 +01:00
Robert Haas	e965e6344c	sepgsql: Enforce db_schema:search permission. KaiGai Kohei, with comment and doc wordsmithing by me	2013-04-05 08:51:31 -04:00
Heikki Linnakangas	bf2b0a1478	Fix crash on compiling a regular expression with more than 32k colors. Throw an error instead. Backpatch to all supported branches.	2013-04-04 19:48:11 +03:00
Tom Lane	17fe2793ea	Fix insecure parsing of server command-line switches. An oversight in commit `e710b65c1c` allowed database names beginning with "-" to be treated as though they were secure command-line switches; and this switch processing occurs before client authentication, so that even an unprivileged remote attacker could exploit the bug, needing only connectivity to the postmaster's port. Assorted exploits for this are possible, some requiring a valid database login, some not. The worst known problem is that the "-r" switch can be invoked to redirect the process's stderr output, so that subsequent error messages will be appended to any file the server can write. This can for example be used to corrupt the server's configuration files, so that it will fail when next restarted. Complete destruction of database tables is also possible. Fix by keeping the database name extracted from a startup packet fully separate from command-line switches, as had already been done with the user name field. The Postgres project thanks Mitsumasa Kondo for discovering this bug, Kyotaro Horiguchi for drafting the fix, and Noah Misch for recognizing the full extent of the danger. Security: CVE-2013-1899	2013-04-01 14:00:51 -04:00
Tom Lane	ce9ab88981	Make REPLICATION privilege checks test current user not authenticated user. The pg_start_backup() and pg_stop_backup() functions checked the privileges of the initially-authenticated user rather than the current user, which is wrong. For example, a user-defined index function could successfully call these functions when executed by ANALYZE within autovacuum. This could allow an attacker with valid but low-privilege database access to interfere with creation of routine backups. Reported and fixed by Noah Misch. Security: CVE-2013-1901	2013-04-01 13:09:24 -04:00
Andrew Dunstan	a570c98d7f	Add new JSON processing functions and parser API. The JSON parser is converted into a recursive descent parser, and exposed for use by other modules such as extensions. The API provides hooks for all the significant parser event such as the beginning and end of objects and arrays, and providing functions to handle these hooks allows for fairly simple construction of a wide variety of JSON processing functions. A set of new basic processing functions and operators is also added, which use this API, including operations to extract array elements, object fields, get the length of arrays and the set of keys of a field, deconstruct an object into a set of key/value pairs, and create records from JSON objects and arrays of objects. Catalog version bumped. Andrew Dunstan, with some documentation assistance from Merlin Moncure.	2013-03-29 14:12:13 -04:00
Alvaro Herrera	473ab40c8b	Add sql_drop event for event triggers This event takes place just before ddl_command_end, and is fired if and only if at least one object has been dropped by the command. (For instance, DROP TABLE IF EXISTS of a table that does not in fact exist will not lead to such a trigger firing). Commands that drop multiple objects (such as DROP SCHEMA or DROP OWNED BY) will cause a single event to fire. Some firings might be surprising, such as ALTER TABLE DROP COLUMN. The trigger is fired after the drop has taken place, because that has been deemed the safest design, to avoid exposing possibly-inconsistent internal state (system catalogs as well as current transaction) to the user function code. This means that careful tracking of object identification is required during the object removal phase. Like other currently existing events, there is support for tag filtering. To support the new event, add a new pg_event_trigger_dropped_objects() set-returning function, which returns a set of rows comprising the objects affected by the command. This is to be used within the user function code, and is mostly modelled after the recently introduced pg_identify_object() function. Catalog version bumped due to the new function. Dimitri Fontaine and Álvaro Herrera Review by Robert Haas, Tom Lane	2013-03-28 13:05:48 -03:00
Simon Riggs	593c39d156	Revoke `bc5334d867`	2013-03-28 09:18:02 +00:00
Simon Riggs	bc5334d867	Allow external recovery_config_directory If required, recovery.conf can now be located outside of the data directory. Server needs read/write permissions on this directory.	2013-03-27 11:45:42 +00:00
Kevin Grittner	549dae0352	Fix problems with incomplete attempt to prohibit OIDS with MVs. Problem with assertion failure in restoring from pg_dump output reported by Joachim Wieland. Review and suggestions by Tom Lane and Robert Haas.	2013-03-22 13:27:34 -05:00
Simon Riggs	96ef3b8ff1	Allow I/O reliability checks using 16-bit checksums Checksums are set immediately prior to flush out of shared buffers and checked when pages are read in again. Hint bit setting will require full page write when block is dirtied, which causes various infrastructure changes. Extensive comments, docs and README. WARNING message thrown if checksum fails on non-all zeroes page; ERROR thrown but can be disabled with ignore_checksum_failure = on. Feature enabled by an initdb option, since transition from option off to option on is long and complex and has not yet been implemented. Default is not to use checksums. Checksum used is WAL CRC-32 truncated to 16-bits. Simon Riggs, Jeff Davis, Greg Smith Wide input and assistance from many community members. Thank you.	2013-03-22 13:54:07 +00:00
Tom Lane	9cbc4b80dd	Redo postgres_fdw's planner code so it can handle parameterized paths. I wasn't going to ship this without having at least some example of how to do that. This version isn't terribly bright; in particular it won't consider any combinations of multiple join clauses. Given the cost of executing a remote EXPLAIN, I'm not sure we want to be very aggressive about doing that, anyway. In support of this, refactor generate_implied_equalities_for_indexcol so that it can be used to extract equivalence clauses that aren't necessarily tied to an index.	2013-03-21 19:44:32 -04:00
Alvaro Herrera	f8348ea32e	Allow extracting machine-readable object identity Introduce pg_identify_object(oid,oid,int4), which is similar in spirit to pg_describe_object but instead produces a row of machine-readable information to uniquely identify the given object, without resorting to OIDs or other internal representation. This is intended to be used in the event trigger implementation, to report objects being operated on; but it has usefulness of its own. Catalog version bumped because of the new function.	2013-03-20 18:19:19 -03:00
Simon Riggs	bb7cc2623f	Remove PageSetTLI and rename pd_tli to pd_checksum Remove use of PageSetTLI() from all page manipulation functions and adjust README to indicate change in the way we make changes to pages. Repurpose those bytes into the pd_checksum field and explain how that works in comments about page header. Refactoring ahead of actual feature patch which would make use of the checksum field, arriving later. Jeff Davis, with comments and doc changes by Simon Riggs Direction suggested by Robert Haas; many others providing review comments.	2013-03-18 13:46:42 +00:00
Robert Haas	05f3f9c7b2	Extend object-access hook machinery to support post-alter events. This also slightly widens the scope of what we support in terms of post-create events. KaiGai Kohei, with a few changes, mostly to the comments, by me	2013-03-17 22:57:26 -04:00
Tom Lane	e2a203a190	initdb needs pqsignal() even on Windows. I had thought we weren't using this version of pqsignal() at all on Windows, but that's wrong --- initdb is using it (and coping with the POSIX-ish semantics of bare signal() :-(). So allow the file to be built in WIN32+FRONTEND case, and add it to the MSVC build logic.	2013-03-17 15:19:47 -04:00
Tom Lane	da5aeccf64	Move pqsignal() to libpgport. We had two copies of this function in the backend and libpq, which was already pretty bogus, but it turns out that we need it in some other programs that don't use libpq (such as pg_test_fsync). So put it where it probably should have been all along. The signal-mask-initialization support in src/backend/libpq/pqsignal.c stays where it is, though, since we only need that in the backend.	2013-03-17 12:06:42 -04:00
Tom Lane	d43837d030	Add lock_timeout configuration parameter. This GUC allows limiting the time spent waiting to acquire any one heavyweight lock. In support of this, improve the recently-added timeout infrastructure to permit efficiently enabling or disabling multiple timeouts at once. That reduces the performance hit from turning on lock_timeout, though it's still not zero. Zoltán Böszörményi, reviewed by Tom Lane, Stephen Frost, and Hari Babu	2013-03-16 23:22:57 -04:00
Tom Lane	4387cf956b	Avoid inserting Result nodes that only compute identity projections. The planner sometimes inserts Result nodes to perform column projections (ie, arbitrary scalar calculations) above plan nodes that lack projection logic of their own. However, we did that even if the lower plan node was in fact producing the required column set already; which is a pretty common case given the popularity of "SELECT * FROM ...". Measurements show that the useless plan node adds non-negligible overhead, especially when there are many columns in the result. So add a check to avoid inserting a Result node unless there's something useful for it to do. There are a couple of remaining places where unnecessary Result nodes could get inserted, but they are (a) much less performance-critical, and (b) coded in such a way that it's hard to avoid inserting a Result, because the desired tlist is changed on-the-fly in subsequent logic. We'll leave those alone for now. Kyotaro Horiguchi; reviewed and further hacked on by Amit Kapila and Tom Lane.	2013-03-14 13:43:18 -04:00
Heikki Linnakangas	59d0bf9dca	Add cost estimation of range @> and <@ operators. The estimates are based on the existing lower bound histogram, and a new histogram of range lengths. Bump catversion, because the range length histogram now needs to be present in statistic slot kind 6, or you get an error on @> and <@ queries. (A re-ANALYZE would be enough to fix that, though) Alexander Korotkov, with some refactoring by me.	2013-03-14 15:36:56 +02:00
Andrew Dunstan	38fb4d978c	JSON generation improvements. This adds the following: json_agg(anyrecord) -> json to_json(any) -> json hstore_to_json(hstore) -> json (also used as a cast) hstore_to_json_loose(hstore) -> json The last provides heuristic treatment of numbers and booleans. Also, in json generation, if any non-builtin type has a cast to json, that function is used instead of the type's output function. Andrew Dunstan, reviewed by Steve Singer. Catalog version bumped.	2013-03-10 17:35:36 -04:00
Tom Lane	21734d2fb8	Support writable foreign tables. This patch adds the core-system infrastructure needed to support updates on foreign tables, and extends contrib/postgres_fdw to allow updates against remote Postgres servers. There's still a great deal of room for improvement in optimization of remote updates, but at least there's basic functionality there now. KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather heavily revised by Tom Lane.	2013-03-10 14:16:02 -04:00
Magnus Hagander	7f49a67f95	Report pg_hba line number and contents when users fail to log in Instead of just reporting which user failed to log in, log both the line number in the active pg_hba.conf file (which may not match reality in case the file has been edited and not reloaded) and the contents of the matching line (which will always be correct), to make it easier to debug incorrect pg_hba.conf files. The message to the client remains unchanged and does not include this information, to prevent leaking security sensitive information. Reviewed by Tom Lane and Dean Rasheed	2013-03-10 15:54:37 +01:00
Heikki Linnakangas	96443d1420	Forgot catversion bump in the SP-GiST adjacent support patch.	2013-03-08 17:12:38 +02:00
Heikki Linnakangas	23f10b6473	SP-GiST support of the range adjacent operator -\|- Alexander Korotkov, reviewed by Jeff Davis.	2013-03-08 15:03:19 +02:00
Heikki Linnakangas	7ccefe8610	Fix tli history file fetching, broken by the archive after crash recevery patch. If we were about to enter archive recovery after crash recovery, we scanned the archive for the latest tli history file, and set the recovery target timeline to that. However, when we actually tried to read the history file, we would not fetch the file from the archive, because we were not in archive recovery yet. To fix, make readTimeLineHistory and existsTimeLineHistory to always fetch the file from archive if archive recovery is requested, even if we're not in archive recovery yet. Backpatch to 9.2. Mitsumasa KONDO	2013-03-07 12:33:24 +02:00
Tom Lane	1908abc4a3	Arrange to cache FdwRoutine structs in foreign tables' relcache entries. This saves several catalog lookups per reference. It's not all that exciting right now, because we'd managed to minimize the number of places that need to fetch the data; but the upcoming writable-foreign-tables patch needs this info in a lot more places.	2013-03-06 23:48:09 -05:00
Robert Haas	f90cc26982	Code beautification for object-access hook machinery. KaiGai Kohei	2013-03-06 20:53:25 -05:00
Tom Lane	e11cb8ba2c	Fix missing #include in commands/matview.h. It needs parsenodes.h to be compilable regardless of previous headers.	2013-03-06 18:21:05 -05:00
Tom Lane	80b011ef0a	Fix to_char() to use ASCII-only case-folding rules where appropriate. formatting.c used locale-dependent case folding rules in some code paths where the result isn't supposed to be locale-dependent, for example to_char(timestamp, 'DAY'). Since the source data is always just ASCII in these cases, that usually didn't matter ... but it does matter in Turkish locales, which have unusual treatment of "i" and "I". To confuse matters even more, the misbehavior was only visible in UTF8 encoding, because in single-byte encodings we used pg_toupper/pg_tolower which don't have locale-specific behavior for ASCII characters. Fix by providing intentionally ASCII-only case-folding functions and using these where appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active branches, since it's been like this for a long time.	2013-03-05 13:02:30 -05:00
Kevin Grittner	c8056592bc	Bump catversion because of new function in the materialized view patch.	2013-03-05 05:32:03 -06:00
Kevin Grittner	3bf3ab8c56	Add a materialized view relations. A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table, references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.	2013-03-03 18:23:31 -06:00
Tom Lane	b15a6da292	Get rid of any toast table when converting a table to a view. Also make sure other fields of the view's pg_class entry are appropriate for a view; it shouldn't have relfrozenxid set for instance. This ancient omission isn't believed to have any serious consequences in versions 8.4-9.2, so no backpatch. But let's fix it before it does bite us in some serious way. It's just luck that the case doesn't cause problems for autovacuum. (It did cause problems in 8.3, but that's out of support.) Andres Freund	2013-03-03 19:05:47 -05:00
Tom Lane	2b78d101d1	Fix SQL function execution to be safe with long-lived FmgrInfos. fmgr_sql had been designed on the assumption that the FmgrInfo it's called with has only query lifespan. This is demonstrably unsafe in connection with range types, as shown in bug #7881 from Andrew Gierth. Fix things so that we re-generate the function's cache data if the (sub)transaction it was made in is no longer active. Back-patch to 9.2. This might be needed further back, but it's not clear whether the case can realistically arise without range types, so for now I'll desist from back-patching further.	2013-03-03 17:39:58 -05:00
Heikki Linnakangas	3d009e45bd	Add support for piping COPY to/from an external program. This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding psql \copy syntax. Like with reading/writing files, the backend version is superuser-only, and in the psql version, the program is run in the client. In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you the stdin/stdout is quoted, it's now interpreted as a filename. For example, "\copy foo from 'stdin'" now reads from a file called 'stdin', not from standard input. Before this, there was no way to specify a filename called stdin, stdout, pstdin or pstdout. This creates a new function in pgport, wait_result_to_str(), which can be used to convert the exit status of a process, as returned by wait(3), to a human-readable string. Etsuro Fujita, reviewed by Amit Kapila.	2013-02-27 18:22:31 +02:00
Tom Lane	c153530dc1	Install headers from the new src/include/common subdirectory. This got missed in commit `8396447cdb`. Andres Freund	2013-02-26 15:27:30 -05:00
Alvaro Herrera	639ed4e84b	Add pg_xlogdump contrib program This program relies on rm_desc backend routines and the xlogreader infrastructure to emit human-readable rendering of WAL records. Author: Andres Freund, with many reworks by Álvaro Reviewed (in a much earlier version) by Peter Eisentraut	2013-02-22 16:56:55 -03:00
Alvaro Herrera	a730183926	Move relpath() to libpgcommon This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.	2013-02-21 22:46:17 -03:00
Tom Lane	54a2786835	Need to decorate XactIsoLevel as PGDLLIMPORT for postgres_fdw. Per buildfarm.	2013-02-21 09:28:42 -05:00
Alvaro Herrera	a40d09e27f	Move ExceptionalCondition back to postgres.h It needs to be defined in the backend even when assertions are not enabled. It's cleaner to put it back, than create a separate #ifdef section in c.h. Per trouble report from Jeff Janes	2013-02-18 18:53:32 -03:00
Alvaro Herrera	187492b6c2	Split pgstat file in smaller pieces We now write one file per database and one global file, instead of having the whole thing in a single huge file. This reduces the I/O that must be done when partial data is required -- which is all the time, because each process only needs information on its own database anyway. Also, the autovacuum launcher does not need data about tables and functions in each database; having the global stats for all DBs is enough. Catalog version bumped because we have a new subdir under PGDATA. Author: Tomas Vondra. Some rework by Álvaro Testing by Jeff Janes Other discussion by Heikki Linnakangas, Tom Lane.	2013-02-18 18:12:52 -03:00
Peter Eisentraut	9475db3a4e	Add ALTER ROLE ALL SET command This generalizes the existing ALTER ROLE ... SET and ALTER DATABASE ... SET functionality to allow creating settings that apply to all users in all databases. reviewed by Pavel Stehule	2013-02-17 23:45:36 -05:00
Simon Riggs	c2f79ba269	Force archive_status of .done for xlogs created by dearchival/replication. This is a forward-patch of commit `6f4b8a4f4f`, applied to 9.2 back in August. The plan was to do something else in master, but it looks like it's not going to happen, so let's just apply the 9.2 solution to master as well. Fujii Masao	2013-02-15 19:28:06 +02:00
Tom Lane	fdaf44862b	Invent pre-commit/pre-prepare/pre-subcommit events for xact callbacks. Currently it's only possible for loadable modules to get control during post-commit cleanup of a transaction. That doesn't work too well if they want to do something that could throw an error; for example, an FDW might need to issue a remote commit, which could well fail. To improve matters, extend the existing APIs for XactCallback and SubXactCallback functions to provide new pre-commit events for this purpose. The release notes will need to mention that existing callback functions should be checked to make sure they don't do something unwanted when one of the new event types occurs. In the examples within our source tree, contrib/sepgsql was fine but plpgsql had been a bit too cute.	2013-02-14 20:35:08 -05:00
Tom Lane	71627f3d19	Fix CVE-2013-0255 properly. Revert commit `ab0f7b6089` (in HEAD only) in favor of the proper solution, which is to declare enum_recv() correctly in the system catalogs. It should be declared to take type "internal" not "cstring". Also improve the type_sanity regression test, which should have caught this typo, so that it actually would. Most of the relevant checks on the signature of type I/O functions should not have been restricted to basetypes/pseudotypes, as they should apply to any type's I/O functions.	2013-02-13 16:20:01 -05:00
Alvaro Herrera	0e81ddde2c	Rename "string" pstrdup argument to "in" The former name collides with a symbol also used in the isolation test's parser, causing assorted failures in certain platforms.	2013-02-12 12:43:09 -03:00
Alvaro Herrera	8396447cdb	Create libpgcommon, and move pg_malloc et al to it libpgcommon is a new static library to allow sharing code among the various frontend programs and backend; this lets us eliminate duplicate implementations of common routines. We avoid libpgport, because that's intended as a place for porting issues; per discussion, it seems better to keep them separate. The first use case, and the only implemented by this patch, is pg_malloc and friends, which many frontend programs were already using. At the same time, we can use this to provide palloc emulation functions for the frontend; this way, some palloc-using files in the backend can also be used by the frontend cleanly. To do this, we change palloc() in the backend to be a function instead of a macro on top of MemoryContextAlloc(). This was previously believed to cause loss of performance, but this implementation has been tweaked by Tom and Andres so that on modern compilers it provides a slight improvement over the previous one. This lets us clean up some places that were already with localized hacks. Most of the pg_malloc/palloc changes in this patch were authored by Andres Freund. Zoltán Böszörményi also independently provided a form of that. libpgcommon infrastructure was authored by Álvaro.	2013-02-12 11:21:05 -03:00
Peter Eisentraut	0cb1fac3b1	Add noreturn attributes to some error reporting functions	2013-02-12 07:13:22 -05:00
Heikki Linnakangas	62401db45c	Support unlogged GiST index. The reason this wasn't supported before was that GiST indexes need an increasing sequence to detect concurrent page-splits. In a regular WAL- logged GiST index, the LSN of the page-split record is used for that purpose, and in a temporary index, we can get away with a backend-local counter. Neither of those methods works for an unlogged relation. To provide such an increasing sequence of numbers, create a "fake LSN" counter that is saved and restored across shutdowns. On recovery, unlogged relations are blown away, so the counter doesn't need to survive that either. Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.	2013-02-11 23:07:09 +02:00
Heikki Linnakangas	7803e9327d	Include previous TLI in end-of-recovery and shutdown checkpoint records. This isn't used for anything but a sanity check at the moment, but it could be highly valuable for debugging purposes. It could also be used to recreate timeline history by traversing WAL, which seems useful.	2013-02-11 18:16:25 +02:00
Tom Lane	0fd0f3688b	Document and clean up gistsplit.c. Improve comments, rename some variables and functions, slightly simplify a couple of APIs, in an attempt to make this code readable by people other than its original author. Even though this is essentially just cosmetic, back-patch to all active branches, because otherwise it's going to make back-patching future fixes in this file very painful.	2013-02-10 11:58:15 -05:00
Tom Lane	c61e26ee3e	Add support for ALTER RULE ... RENAME TO. Ali Dar, reviewed by Dean Rasheed.	2013-02-08 23:58:40 -05:00
Alvaro Herrera	381d4b70a9	Clean up c.h / postgres.h after Assert() move Per Tom	2013-02-08 12:50:58 -03:00
Tom Lane	166d534fcd	Repair bugs in GiST page splitting code for multi-column indexes. When considering a non-last column in a multi-column GiST index, gistsplit.c tries to improve on the split chosen by the opclass-specific pickSplit function by considering penalties for the next column. However, there were two bugs in this code: it failed to recompute the union keys for the leftmost index columns, even though these might well change after reassigning tuples; and it included the old union keys in the recomputation for the columns it did recompute, so that those keys couldn't get smaller even if they should. The first problem could result in an invalid index in which searches wouldn't find index entries that are in fact present; the second would make the index less efficient to search. Both of these errors were caused by misuse of gistMakeUnionItVec, whose API was designed in a way that just begged such errors to be made. There is no situation in which it's safe or useful to compute the union keys for a subset of the index columns, and there is no caller that wants any previous union keys to be included in the computation; so the undocumented choice to treat the union keys as in/out rather than pure output parameters is a waste of code as well as being dangerous. Hence, rather than just making a minimal patch, I've changed the API of gistMakeUnionItVec to remove the "startkey" parameter (it now always processes all index columns) and treat the attr/isnull arrays as purely output parameters. In passing, also get rid of a couple of unnecessary and dangerous uses of static variables in gistutil.c. It's remarkable that the one in gistMakeUnionKey hasn't given us portability troubles before now, because in addition to posing a re-entrancy hazard, it was unsafely assuming that a static char[] array would have at least Datum alignment. Per investigation of a trouble report from Tomas Vondra. (There are also some bugs in contrib/btree_gist to be fixed, but that seems like material for a separate patch.) Back-patch to all supported branches.	2013-02-07 17:44:02 -05:00
Alvaro Herrera	5a1cd89f8f	Split out list of XLog resource managers The new rmgrlist.h header, containing all necessary data about built-in resource managers, allows other pieces of code to access them. In particular, this allows a future pg_xlogdump program to extract rm_desc function pointers, without having to keep a duplicate list of them.	2013-02-06 08:47:28 -03:00
Alvaro Herrera	e1d25de35a	Move Assert() definitions to c.h This way, they can be used by frontend and backend code. We already supported that, but doing it this way allows us to mix true frontend files with backend files compiled in frontend environment. Author: Andres Freund	2013-02-01 17:50:04 -03:00
Tom Lane	0900ac2d0d	Fix plpgsql's reporting of plan-time errors in possibly-simple expressions. exec_simple_check_plan and exec_eval_simple_expr attempted to call GetCachedPlan directly. This meant that if an error was thrown during planning, the resulting context traceback would not include the line normally contributed by _SPI_error_callback. This is already inconsistent, but just to be really odd, a re-execution of the very same expression would show the additional context line, because we'd already have cached the plan and marked the expression as non-simple. The problem is easy to demonstrate in 9.2 and HEAD because planning of a cached plan doesn't occur at all until GetCachedPlan is done. In earlier versions, it could only be an issue if initial planning had succeeded, then a replan was forced (already somewhat improbable for a simple expression), and the replan attempt failed. Since the issue is mainly cosmetic in older branches anyway, it doesn't seem worth the risk of trying to fix it there. It is worth fixing in 9.2 since the instability of the context printout can affect the results of GET STACKED DIAGNOSTICS, as per a recent discussion on pgsql-novice. To fix, introduce a SPI function that wraps GetCachedPlan while installing the correct callback function. Use this instead of calling GetCachedPlan directly from plpgsql. Also introduce a wrapper function for extracting a SPI plan's CachedPlanSource list. This lets us stop including spi_priv.h in pl_exec.c, which was never a very good idea from a modularity standpoint. In passing, fix a similar inconsistency that could occur in SPI_cursor_open, which was also calling GetCachedPlan without setting up a context callback.	2013-01-30 20:02:23 -05:00
Tom Lane	991f3e5ab3	Provide database object names as separate fields in error messages. This patch addresses the problem that applications currently have to extract object names from possibly-localized textual error messages, if they want to know for example which index caused a UNIQUE_VIOLATION failure. It adds new error message fields to the wire protocol, which can carry the name of a table, table column, data type, or constraint associated with the error. (Since the protocol spec has always instructed clients to ignore unrecognized field types, this should not create any compatibility problem.) Support for providing these new fields has been added to just a limited set of error reports (mainly, those in the "integrity constraint violation" SQLSTATE class), but we will doubtless add them to more calls in future. Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with additional hacking by Tom Lane.	2013-01-29 17:08:26 -05:00
Simon Riggs	fd4ced5230	Fast promote mode skips checkpoint at end of recovery. pg_ctl promote -m fast will skip the checkpoint at end of recovery so that we can achieve very fast failover when the apply delay is low. Write new WAL record XLOG_END_OF_RECOVERY to allow us to switch timeline correctly for downstream log readers. If we skip synchronous end of recovery checkpoint we request a normal spread checkpoint so that the window of re-recovery is low. Simon Riggs and Kyotaro Horiguchi, with input from Fujii Masao. Review by Heikki Linnakangas	2013-01-29 00:06:15 +00:00
Bruce Momjian	7e2322dff3	Allow CREATE TABLE IF EXIST so succeed if the schema is nonexistent Previously, CREATE TABLE IF EXIST threw an error if the schema was nonexistent. This was done by passing 'missing_ok' to the function that looks up the schema oid.	2013-01-26 13:24:50 -05:00
Tom Lane	0d5fbdc157	Change plan caching to honor, not resist, changes in search_path. In the initial implementation of plan caching, we saved the active search_path when a plan was first cached, then reinstalled that path anytime we needed to reparse or replan. The idea of that was to try to reselect the same referenced objects, in somewhat the same way that views continue to refer to the same objects in the face of schema or name changes. Of course, that analogy doesn't bear close inspection, since holding the search_path fixed doesn't cope with object drops or renames. Moreover sticking with the old path seems to create more surprises than it avoids. So instead of doing that, consider that the cached plan depends on search_path, and force reparse/replan if the active search_path is different than it was when we last saved the plan. This gets us fairly close to having "transparency" of plan caching, in the sense that the cached statement acts the same as if you'd just resubmitted the original query text for another execution. There are still some corner cases where this fails though: a new object added in the search path schema(s) might capture a reference in the query text, but we'd not realize that and force a reparse. We might try to fix that in the future, but for the moment it looks too expensive and complicated.	2013-01-25 14:14:41 -05:00
Andrew Dunstan	1068771abf	Use correct output device for Windows prompts. This ensures that mapping of non-ascii prompts to the correct code page occurs. Bug report and original patch from Alexander Law, reviewed and reworked by Noah Misch. Backpatch to all live branches.	2013-01-24 16:01:31 -05:00
Alvaro Herrera	74ebba84ae	Redefine HEAP_XMAX_IS_LOCKED_ONLY Tuples marked SELECT FOR UPDATE in a cluster that's later processed by pg_upgrade would have a different infomask bit pattern than those produced by 9.3dev; that bit pattern was being seen as "dead" by HEAD (because they would fail the "is this tuple locked" test, and so the visibility rules would thing they're updated, even though there's no HEAP_UPDATED version of them). In other words, some rows could silently disappear after pg_upgrade. With this new definition, those tuples become visible again. This is breakage resulting from my commit `0ac5ad5134`.	2013-01-24 16:10:02 -03:00
Alvaro Herrera	0ac5ad5134	Improve concurrency of foreign key locking This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov	2013-01-23 12:04:59 -03:00
Heikki Linnakangas	52906f175a	Implement pg_unreachable() on MSVC.	2013-01-23 12:53:55 +02:00
Heikki Linnakangas	990fe3c4ed	Fix more issues with cascading replication and timeline switches. When a standby server follows the master using WAL archive, and it chooses a new timeline (recovery_target_timeline='latest'), it only fetches the timeline history file for the chosen target timeline, not any other history files that might be missing from pg_xlog. For example, if the current timeline is 2, and we choose 4 as the new recovery target timeline, the history file for timeline 3 is not fetched, even if it's part of this server's history. That's enough for the standby itself - the history file for timeline 4 includes timeline 3 as well - but if a cascading standby server wants to recover to timeline 3, it needs the history file. To fix, when a new recovery target timeline is chosen, try to copy any missing history files from the archive to pg_xlog between the old and new target timeline. A second similar issue was with the WAL files. When a standby recovers from archive, and it reaches a segment that contains a switch to a new timeline, recovery fetches only the WAL file labelled with the new timeline's ID. The file from the new timeline contains a copy of the WAL from the old timeline up to the point where the switch happened, and recovery recovers it from the new file. But in streaming replication, walsender only tries to read it from the old timeline's file. To fix, change walsender to read it from the new file, so that it behaves the same as recovery in that sense, and doesn't try to open the possibly nonexistent file with the old timeline's ID.	2013-01-23 10:19:20 +02:00
Tom Lane	75b39e7909	Add infrastructure for storing a VARIADIC ANY function's VARIADIC flag. Originally we didn't bother to mark FuncExprs with any indication whether VARIADIC had been given in the source text, because there didn't seem to be any need for it at runtime. However, because we cannot fold a VARIADIC ANY function's arguments into an array (since they're not necessarily all the same type), we do actually need that information at runtime if VARIADIC ANY functions are to respond unsurprisingly to use of the VARIADIC keyword. Add the missing field, and also fix ruleutils.c so that VARIADIC ANY function calls are dumped properly. Extracted from a larger patch that also fixes concat() and format() (the only two extant VARIADIC ANY functions) to behave properly when VARIADIC is specified. This portion seems appropriate to review and commit separately. Pavel Stehule	2013-01-21 20:26:15 -05:00
Robert Haas	841a5150c5	Add ddl_command_end support for event triggers. Dimitri Fontaine, with slight changes by me	2013-01-21 18:00:24 -05:00
Alvaro Herrera	765cbfdc92	Refactor ALTER some-obj RENAME implementation Remove duplicate implementations of catalog munging and miscellaneous privilege checks. Instead rely on already existing data in objectaddress.c to do the work. Author: KaiGai Kohei, changes by me Reviewed by: Robert Haas, Álvaro Herrera, Dimitri Fontaine	2013-01-21 12:06:41 -03:00
Heikki Linnakangas	6f7cddc7ae	Now that START_REPLICATION returns the next timeline's ID after reaching end of timeline, take advantage of that in walreceiver. Startup process is still in control of choosign the target timeline, by scanning the timeline history files present in pg_xlog, but walreceiver now uses the next timeline's ID to fetch its history file immediately after it has finished streaming the old timeline. Before, the standby would first try to restart streaming on the old timeline, which fetches the missing timeline history file as a side-effect, and only then restart from the new timeline. This patch eliminates the extra iteration, which speeds up the timeline switch and reduces the noise in the log caused by the extra restart on the old timeline.	2013-01-18 11:59:34 +02:00
Heikki Linnakangas	2ff6555313	Use the right timeline when beginning to stream from master. The xlogreader refactoring broke the logic to decide which timeline to start streaming from. XLogPageRead() uses the timeline history to check which timeline the requested WAL position falls into. However, after the refactoring, XLogPageRead() is always first called with the first page in the segment, to verify the segment header, and only then with the actual WAL position we're interested in. That first read of the segment's header made XLogPageRead() to always start streaming from the old timeline containing the segment header, not the timeline containing the actual record, if there was a timeline switch within the segment. I thought I fixed this yesterday, but that fix was too narrow and only fixed this for the corner-case that the timeline switch happened in the first page of the segment. To fix this more robustly, pass explicitly the position of the record we're actually interested in to XLogPageRead, and use that to decide which timeline to read from, rather than deduce it from the page and offset. Per report from Fujii Masao.	2013-01-18 11:46:49 +02:00
Alvaro Herrera	279628a0a7	Accelerate end-of-transaction dropping of relations When relations are dropped, at end of transaction we need to remove the files and clean the buffer pool of buffers containing pages of those relations. Previously we would scan the buffer pool once per relation to clean up buffers. When there are many relations to drop, the repeated scans make this process slow; so we now instead pass a list of relations to drop and scan the pool once, checking each buffer against the passed list. When the number of relations is larger than a threshold (which as of this patch is being set to 20 relations) we sort the array before starting, and bsearch the array; when it's smaller, we simply scan the array linearly each time, because that's faster. The exact optimal threshold value depends on many factors, but the difference is not likely to be significant enough to justify making it user-settable. This has been measured to be a significant win (a 15x win when dropping 100,000 relations; an extreme case, but reportedly a real one). Author: Tomas Vondra, some tweaks by me Reviewed by: Robert Haas, Shigeru Hanada, Andres Freund, Álvaro Herrera	2013-01-17 16:13:17 -03:00
Heikki Linnakangas	0b6329130e	Make pg_receivexlog and pg_basebackup -X stream work across timeline switches. This mirrors the changes done earlier to the server in standby mode. When receivelog reaches the end of a timeline, as reported by the server, it fetches the timeline history file of the next timeline, and restarts streaming from the new timeline by issuing a new START_STREAMING command. When pg_receivexlog crosses a timeline, it leaves the .partial suffix on the last segment on the old timeline. This helps you to tell apart a partial segment left in the directory because of a timeline switch, and a completed segment. If you just follow a single server, it won't make a difference, but it can be significant in more complicated scenarios where new WAL is still generated on the old timeline. This includes two small changes to the streaming replication protocol: First, when you reach the end of timeline while streaming, the server now sends the TLI of the next timeline in the server's history to the client. pg_receivexlog uses that as the next timeline, so that it doesn't need to parse the timeline history file like a standby server does. Second, when BASE_BACKUP command sends the begin and end WAL positions, it now also sends the timeline IDs corresponding the positions.	2013-01-17 20:23:00 +02:00
Heikki Linnakangas	9ee4d06f3f	Make GiST indexes on-disk compatible with 9.2 again. The patch that turned XLogRecPtr into a uint64 inadvertently changed the on-disk format of GiST indexes, because the NSN field in the GiST page opaque is an XLogRecPtr. That breaks pg_upgrade. Revert the format of that field back to the two-field struct that XLogRecPtr was before. This is the same we did to LSNs in the page header to avoid changing on-disk format. Bump catversion, as this invalidates any existing GiST indexes built on 9.3devel.	2013-01-17 16:46:16 +02:00
Alvaro Herrera	7fcbf6a405	Split out XLog reading as an independent facility This new facility can not only be used by xlog.c to carry out crash recovery, but also by external programs. By supplying a function to read XLog pages from somewhere, all the WAL reading can be used for completely different purposes. For the standard backend use, the behavior should be pretty much the same as previously. As for non-backend programs, an hypothetical pg_xlogdump program is now closer to reality, but some more backend support is still necessary. This patch was originally submitted by Andres Freund in a different form, but Heikki Linnakangas opted for and authored another design of the concept. Andres has advanced the patch since Heikki's initial version. Review and some (mostly cosmetics) changes by me.	2013-01-16 16:12:53 -03:00
Alvaro Herrera	7ac5760fa2	Rework order of checks in ALTER / SET SCHEMA When attempting to move an object into the schema in which it already was, for most objects classes we were correctly complaining about exactly that ("object is already in schema"); but for some other object classes, such as functions, we were instead complaining of a name collision ("object already exists in schema"). The latter is wrong and misleading, per complaint from Robert Haas in CA+TgmoZ0+gNf7RDKRc3u5rHXffP=QjqPZKGxb4BsPz65k7qnHQ@mail.gmail.com To fix, refactor the way these checks are done. As a bonus, the resulting code is smaller and can also share some code with Rename cases. While at it, remove use of getObjectDescriptionOids() in error messages. These are normally disallowed because of translatability considerations, but this one had slipped through since 9.1. (Not sure that this is worth backpatching, though, as it would create some untranslated messages in back branches.) This is loosely based on a patch by KaiGai Kohei, heavily reworked by me.	2013-01-15 13:23:43 -03:00
Tom Lane	2065dd2834	Prevent very-low-probability PANIC during PREPARE TRANSACTION. The code in PostPrepare_Locks supposed that it could reassign locks to the prepared transaction's dummy PGPROC by deleting the PROCLOCK table entries and immediately creating new ones. This was safe when that code was written, but since we invented partitioning of the shared lock table, it's not safe --- another process could steal away the PROCLOCK entry in the short interval when it's on the freelist. Then, if we were otherwise out of shared memory, PostPrepare_Locks would have to PANIC, since it's too late to back out of the PREPARE at that point. Fix by inventing a dynahash.c function to atomically update a hashtable entry's key. (This might possibly have other uses in future.) This is an ancient bug that in principle we ought to back-patch, but the odds of someone hitting it in the field seem really tiny, because (a) the risk window is small, and (b) nobody runs servers with maxed-out lock tables for long, because they'll be getting non-PANIC out-of-memory errors anyway. So fixing it in HEAD seems sufficient, at least until the new code has gotten some testing.	2013-01-13 22:20:22 -05:00
Tom Lane	b853eb9718	Improve handling of ereport(ERROR) and elog(ERROR). In commit `71450d7fd6`, we added code to inform suitably-intelligent compilers that ereport() doesn't return if the elevel is ERROR or higher. This patch extends that to elog(), and also fixes a double-evaluation hazard that the previous commit created in ereport(), as well as reducing the emitted code size. The elog() improvement requires the compiler to support __VA_ARGS__, which should be available in just about anything nowadays since it's required by C99. But our minimum language baseline is still C89, so add a configure test for that. The previous commit assumed that ereport's elevel could be evaluated twice, which isn't terribly safe --- there are already counterexamples in xlog.c. On compilers that have __builtin_constant_p, we can use that to protect the second test, since there's no possible optimization gain if the compiler doesn't know the value of elevel. Otherwise, use a local variable inside the macros to prevent double evaluation. The local-variable solution is inferior because (a) it leads to useless code being emitted when elevel isn't constant, and (b) it increases the optimization level needed for the compiler to recognize that subsequent code is unreachable. But it seems better than not teaching non-gcc compilers about unreachability at all. Lastly, if the compiler has __builtin_unreachable(), we can use that instead of abort(), resulting in a noticeable code savings since no function call is actually emitted. However, it seems wise to do this only in non-assert builds. In an assert build, continue to use abort(), so that the behavior will be predictable and debuggable if the "impossible" happens. These changes involve making the ereport and elog macros emit do-while statement blocks not just expressions, which forces small changes in a few call sites. Andres Freund, Tom Lane, Heikki Linnakangas	2013-01-13 18:40:09 -05:00
Tom Lane	31f38f28b0	Redesign the planner's handling of index-descent cost estimation. Historically we've used a couple of very ad-hoc fudge factors to try to get the right results when indexes of different sizes would satisfy a query with the same number of index leaf tuples being visited. In commit `21a39de580` I tweaked one of these fudge factors, with results that proved disastrous for larger indexes. Commit `bf01e34b55` fudged it some more, but still with not a lot of principle behind it. What seems like a better way to address these issues is to explicitly model index-descent costs, since that's what's really at stake when considering diferent indexes with similar leaf-page-level costs. We tried that once long ago, and found that charging random_page_cost per page descended through was way too much, because upper btree levels tend to stay in cache in real-world workloads. However, there's still CPU costs to think about, and the previous fudge factors can be seen as a crude attempt to account for those costs. So this patch replaces those fudge factors with explicit charges for the number of tuple comparisons needed to descend the index tree, plus a small charge per page touched in the descent. The cost multipliers are chosen so that the resulting charges are in the vicinity of the historical (pre-9.2) fudge factors for indexes of up to about a million tuples, while not ballooning unreasonably beyond that, as the old fudge factor did (even more so in 9.2). To make this work accurately for btree indexes, add some code that allows extraction of the known root-page height from a btree. There's no equivalent number readily available for other index types, but we can use the log of the number of index pages as an approximate substitute. This seems like too much of a behavioral change to risk back-patching, but it should improve matters going forward. In 9.2 I'll just revert the fudge-factor change.	2013-01-11 12:56:58 -05:00
Magnus Hagander	940d136661	Centralize single quote escaping in src/port/quotes.c For code-reuse in upcoming functionality in pg_basebackup. Zoltan Boszormenyi	2013-01-05 15:40:19 +01:00
Tom Lane	94afbd5831	Invent a "one-shot" variant of CachedPlans for better performance. SPI_execute() and related functions create a CachedPlan, execute it once, and immediately discard it, so that the functionality offered by plancache.c is of no value in this code path. And performance measurements show that the extra data copying and invalidation checking done by plancache.c slows down simple queries by 10% or more compared to 9.1. However, enough of the SPI code is shared with functions that do need plan caching that it seems impractical to bypass plancache.c altogether. Instead, let's invent a variant version of cached plans that preserves 99% of the API but doesn't offer any of the actual functionality, nor the overhead. This puts SPI_execute() performance back on par, or maybe even slightly better, than it was before. This change should resolve recent complaints of performance degradation from Dong Ye, Pavel Stehule, and others. By avoiding data copying, this change also reduces the amount of memory needed to execute many-statement SPI_execute() strings, as for instance in a recent complaint from Tomas Vondra. An additional benefit of this change is that multi-statement SPI_execute() query strings are now processed fully serially, that is we complete execution of earlier statements before running parse analysis and planning on following ones. This eliminates a long-standing POLA violation, in that DDL that affects the behavior of a later statement will now behave as expected. Back-patch to 9.2, since this was a performance regression compared to 9.1. (In 9.2, place the added struct fields so as to avoid changing the offsets of existing fields.) Heikki Linnakangas and Tom Lane	2013-01-04 17:42:19 -05:00
Heikki Linnakangas	b0daba57bb	Tolerate timeline switches while "pg_basebackup -X fetch" is running. If you take a base backup from a standby server with "pg_basebackup -X fetch", and the timeline switches while the backup is being taken, the backup used to fail with an error "requested WAL segment %s has already been removed". This is because the server-side code that sends over the required WAL files would not construct the WAL filename with the correct timeline after a switch. Fix that by using readdir() to scan pg_xlog for all the WAL segments in the range, regardless of timeline. Also, include all timeline history files in the backup, if taken with "-X fetch". That fixes another related bug: If a timeline switch happened just before the backup was initiated in a standby, the WAL segment containing the initial checkpoint record contains WAL from the older timeline too. Recovery will not accept that without a timeline history file that lists the older timeline. Backpatch to 9.2. Versions prior to that were not affected as you could not take a base backup from a standby before 9.2.	2013-01-03 19:51:00 +02:00
Magnus Hagander	794397ae1d	Move tar function headers to pgtar.h This makes it possible to include them only where they are used, so we can avoid the conflict of the uid_t and gid_t datatypes that happened in plperl (since plperl doesn't need the tar functions)	2013-01-02 20:34:08 +01:00
Alvaro Herrera	dfbba2c86c	Make sure MaxBackends is always set Auxiliary and bootstrap processes weren't getting it, causing initdb to fail completely.	2013-01-02 14:39:11 -03:00
Alvaro Herrera	cdbc0ca48c	Fix background workers for EXEC_BACKEND Commit `da07a1e8` was broken for EXEC_BACKEND because I failed to realize that the MaxBackends recomputation needed to be duplicated by subprocesses in SubPostmasterMain. However, instead of having the value be recomputed at all, it's better to assign the correct value at postmaster initialization time, and have it be propagated to exec'ed backends via BackendParameters. MaxBackends stays as zero until after modules in shared_preload_libraries have had a chance to register bgworkers, since the value is going to be untrustworthy till that's finished. Heikki Linnakangas and Álvaro Herrera	2013-01-02 12:01:14 -03:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013 Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Magnus Hagander	f5d4bdd3a5	Unify some tar functionality across different parts Move some of the tar functionality that existed mostly duplicated in both pg_dump and the walsender basebackup functionality into port/tar.c instead, so it can be used from both. It will also be used by pg_basebackup in the future, which would've caused a third copy of it around. Zoltan Boszormenyi and Magnus Hagander	2013-01-01 18:15:57 +01:00
Heikki Linnakangas	60df192aea	Keep timeline history files restored from archive in pg_xlog. The cascading standby patch in 9.2 changed the way WAL files are treated when restored from the archive. Before, they were restored under a temporary filename, and not kept in pg_xlog, but after the patch, they were copied under pg_xlog. This is necessary for a cascading standby to find them, but it also means that if the archive goes offline and a standby is restarted, it can recover back to where it was using the files in pg_xlog. It also means that if you take an offline backup from a standby server, it includes all the required WAL files in pg_xlog. However, the same change was not made to timeline history files, so if the WAL segment containing the checkpoint record contains a timeline switch, you will still get an error if you try to restart recovery without the archive, or recover from an offline backup taken from the standby. With this patch, timeline history files restored from archive are copied into pg_xlog like WAL files are, so that pg_xlog contains all the files required to recover. This is a corner-case pre-existing issue in 9.2, but even more important in master where it's possible for a standby to follow a timeline switch through streaming replication. To make that possible, the timeline history files must be present in pg_xlog.	2012-12-30 14:29:45 +02:00
Robert Haas	82b1b213ca	Adjust more backend functions to return OID rather than void. This is again intended to support extensions to the event trigger functionality. This may go a bit further than we need for that purpose, but there's some value in being consistent, and the OID may be useful for other purposes also. Dimitri Fontaine	2012-12-29 07:55:37 -05:00
Alvaro Herrera	5ab3af46dd	Remove obsolete XLogRecPtr macros This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera	2012-12-28 13:06:15 -03:00
Alvaro Herrera	eaa1f7220a	Remove unused NextLogPage macro Commit `061e7efb1b` did away with its last caller, but neglected to remove the actual definition. Author: Andres Freund	2012-12-27 18:23:23 -03:00
Simon Riggs	c2b3218064	Update comments on rd_newRelfilenodeSubid. Ensure comments accurately reflect state of code given new understanding, and recent changes. Include example code from Noah Misch to illustrate how rd_newRelfilenodeSubid can be reset deterministically. No code changes.	2012-12-24 17:07:06 +00:00
Simon Riggs	42fa810c14	Fix more weird compiler messages caused by unmatched function prototypes. Andres Freund	2012-12-24 16:25:26 +00:00
Simon Riggs	2dcb2ebee2	Add function prototype from previous commit.	2012-12-24 09:18:42 +00:00
Robert Haas	c504513f83	Adjust many backend functions to return OID rather than void. Extracted from a larger patch by Dimitri Fontaine. It is hoped that this will provide infrastructure for enriching the new event trigger functionality, but it seems possibly useful for other purposes as well.	2012-12-23 18:37:58 -05:00
Tom Lane	31bc839724	Prevent failure when RowExpr or XmlExpr is parse-analyzed twice. transformExpr() is required to cope with already-transformed expression trees, for various ugly-but-not-quite-worth-cleaning-up reasons. However, some of its newer subroutines hadn't gotten the memo. This accounts for bug #7763 from Norbert Buchmuller: transformRowExpr() was overwriting the previously determined type of a RowExpr during CREATE TABLE LIKE INCLUDING INDEXES. Additional investigation showed that transformXmlExpr had the same kind of problem, but all the other cases seem to be safe. Andres Freund and Tom Lane	2012-12-23 14:07:24 -05:00
Heikki Linnakangas	d57a97343e	Forgot to remove extern declaration of GetRecoveryTargetTLI() Fujii Masao	2012-12-21 09:29:03 +02:00
Heikki Linnakangas	af275a12df	Follow TLI of last replayed record, not recovery target TLI, in walsenders. Most of the time, the last replayed record comes from the recovery target timeline, but there is a corner case where it makes a difference. When the startup process scans for a new timeline, and decides to change recovery target timeline, there is a window where the recovery target TLI has already been bumped, but there are no WAL segments from the new timeline in pg_xlog yet. For example, if we have just replayed up to point 0/30002D8, on timeline 1, there is a WAL file called 000000010000000000000003 in pg_xlog that contains the WAL up to that point. When recovery switches recovery target timeline to 2, a walsender can immediately try to read WAL from 0/30002D8, from timeline 2, so it will try to open WAL file 000000020000000000000003. However, that doesn't exist yet - the startup process hasn't copied that file from the archive yet nor has the walreceiver streamed it yet, so walsender fails with error "requested WAL segment 000000020000000000000003 has already been removed". That's harmless, in that the standby will try to reconnect later and by that time the segment is already created, but error messages that should be ignored are not good. To fix that, have walsender track the TLI of the last replayed record, instead of the recovery target timeline. That way walsender will not try to read anything from timeline 2, until the WAL segment has been created and at least one record has been replayed from it. The recovery target timeline is now xlog.c's internal affair, it doesn't need to be exposed in shared memory anymore. This fixes the error reported by Thom Brown. depesz the same error message, but I'm not sure if this fixes his scenario.	2012-12-20 14:39:04 +02:00
Tom Lane	6919b7e329	Fix failure to ignore leftover temp tables after a server crash. During crash recovery, we remove disk files belonging to temporary tables, but the system catalog entries for such tables are intentionally not cleaned up right away. Instead, the first backend that uses a temp schema is expected to clean out any leftover objects therein. This approach requires that we be careful to ignore leftover temp tables (since any actual access attempt would fail), even if their BackendId matches our session, if we have not yet established use of the session's corresponding temp schema. That worked fine in the past, but was broken by commit `debcec7dc3` which incorrectly removed the rd_islocaltemp relcache flag. Put it back, and undo various changes that substituted tests like "rel->rd_backend == MyBackendId" for use of a state-aware flag. Per trouble report from Heikki Linnakangas. Back-patch to 9.1 where the erroneous change was made. In the back branches, be careful to add rd_islocaltemp in a spot in the struct that was alignment padding before, so as not to break existing add-on code.	2012-12-17 20:15:32 -05:00
Andrew Dunstan	1c382655ad	Provide Assert() for frontend code. Per discussion on-hackers. psql is converted to use the new code. Follows a suggestion from Heikki Linnakangas.	2012-12-14 18:03:07 -05:00
Heikki Linnakangas	abfd192b1b	Allow a streaming replication standby to follow a timeline switch. Before this patch, streaming replication would refuse to start replicating if the timeline in the primary doesn't exactly match the standby. The situation where it doesn't match is when you have a master, and two standbys, and you promote one of the standbys to become new master. Promoting bumps up the timeline ID, and after that bump, the other standby would refuse to continue. There's significantly more timeline related logic in streaming replication now. First of all, when a standby connects to primary, it will ask the primary for any timeline history files that are missing from the standby. The missing files are sent using a new replication command TIMELINE_HISTORY, and stored in standby's pg_xlog directory. Using the timeline history files, the standby can follow the latest timeline present in the primary (recovery_target_timeline='latest'), just as it can follow new timelines appearing in an archive directory. START_REPLICATION now takes a TIMELINE parameter, to specify exactly which timeline to stream WAL from. This allows the standby to request the primary to send over WAL that precedes the promotion. The replication protocol is changed slightly (in a backwards-compatible way although there's little hope of streaming replication working across major versions anyway), to allow replication to stop when the end of timeline reached, putting the walsender back into accepting a replication command. Many thanks to Amit Kapila for testing and reviewing various versions of this patch.	2012-12-13 19:17:32 +02:00
Heikki Linnakangas	527668717a	Make xlog_internal.h includable in frontend context. This makes unnecessary the ugly hack used to #include postgres.h in pg_basebackup. Based on Alvaro Herrera's patch	2012-12-13 14:59:13 +02:00
Kevin Grittner	b19e4250b4	Fix performance problems with autovacuum truncation in busy workloads. In situations where there are over 8MB of empty pages at the end of a table, the truncation work for trailing empty pages takes longer than deadlock_timeout, and there is frequent access to the table by processes other than autovacuum, there was a problem with the autovacuum worker process being canceled by the deadlock checking code. The truncation work done by autovacuum up that point was lost, and the attempt tried again by a later autovacuum worker. The attempts could continue indefinitely without making progress, consuming resources and blocking other processes for up to deadlock_timeout each time. This patch has the autovacuum worker checking whether it is blocking any other thread at 20ms intervals. If such a condition develops, the autovacuum worker will persist the work it has done so far, release its lock on the table, and sleep in 50ms intervals for up to 5 seconds, hoping to be able to re-acquire the lock and try again. If it is unable to get the lock in that time, it moves on and a worker will try to continue later from the point this one left off. While this patch doesn't change the rules about when and what to truncate, it does cause the truncation to occur sooner, with less blocking, and with the consumption of fewer resources when there is contention for the table's lock. The only user-visible change other than improved performance is that the table size during truncation may change incrementally instead of just once. This problem exists in all supported versions but is infrequently reported, although some reports of performance problems when autovacuum runs might be caused by this. Initial commit is just the master branch, but this should probably be backpatched once the build farm and general developer usage confirm that there are no surprising effects. Jan Wieck	2012-12-11 14:33:08 -06:00
Tom Lane	a99c42f291	Support automatically-updatable views. This patch makes "simple" views automatically updatable, without the need to create either INSTEAD OF triggers or INSTEAD rules. "Simple" views are those classified as updatable according to SQL-92 rules. The rewriter transforms INSERT/UPDATE/DELETE commands on such views directly into an equivalent command on the underlying table, which will generally have noticeably better performance than is possible with either triggers or user-written rules. A view that has INSTEAD OF triggers or INSTEAD rules continues to operate the same as before. For the moment, security_barrier views are not considered simple. Also, we do not support WITH CHECK OPTION. These features may be added in future. Dean Rasheed, reviewed by Amit Kapila	2012-12-08 18:26:21 -05:00
Alvaro Herrera	da07a1e856	Background worker processes Background workers are postmaster subprocesses that run arbitrary user-specified code. They can request shared memory access as well as backend database connections; or they can just use plain libpq frontend database connections. Modules listed in shared_preload_libraries can register background workers in their _PG_init() function; this is early enough that it's not necessary to provide an extra GUC option, because the necessary extra resources can be allocated early on. Modules can install more than one bgworker, if necessary. Care is taken that these extra processes do not interfere with other postmaster tasks: only one such process is started on each ServerLoop iteration. This means a large number of them could be waiting to be started up and postmaster is still able to quickly service external connection requests. Also, shutdown sequence should not be impacted by a worker process that's reasonably well behaved (i.e. promptly responds to termination signals.) The current implementation lets worker processes specify their start time, i.e. at what point in the server startup process they are to be started: right after postmaster start (in which case they mustn't ask for shared memory access), when consistent state has been reached (useful during recovery in a HOT standby server), or when recovery has terminated (i.e. when normal backends are allowed). In case of a bgworker crash, actions to take depend on registration data: if shared memory was requested, then all other connections are taken down (as well as other bgworkers), just like it were a regular backend crashing. The bgworker itself is restarted, too, within a configurable timeframe (which can be configured to be never). More features to add to this framework can be imagined without much effort, and have been discussed, but this seems good enough as a useful unit already. An elementary sample module is supplied. Author: Álvaro Herrera This patch is loosely based on prior patches submitted by KaiGai Kohei, and unsubmitted code by Simon Riggs. Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund, Heikki Linnakangas, Simon Riggs, Amit Kapila	2012-12-06 17:47:30 -03:00
Heikki Linnakangas	32f4de0adf	Write exact xlog position of timeline switch in the timeline history file. This allows us to do some more rigorous sanity checking for various incorrect point-in-time recovery scenarios, and provides more information for debugging purposes. It will also come handy in the upcoming patch to allow timeline switches to be replicated by streaming replication.	2012-12-04 17:29:07 +02:00
Heikki Linnakangas	5ce108bf32	Track the timeline associated with minRecoveryPoint, for more sanity checks. This allows recovery to notice certain incorrect recovery scenarios. If a server has recovered to point X on timeline 5, and you restart recovery, it better be on timeline 5 when it reaches point X again, not on some timeline with a higher ID. This can happen e.g if you a standby server is shut down, a new timeline appears in the WAL archive, and the standby server is restarted. It will try to follow the new timeline, which is wrong because some WAL on the old timeline was already replayed before shutdown. Requires an initdb (or at least pg_resetxlog), because this adds a field to the control file.	2012-12-04 11:31:00 +02:00
Peter Eisentraut	aa2fec0a18	Add support for LDAP URLs Allow specifying LDAP authentication parameters as RFC 4516 LDAP URLs.	2012-12-03 23:31:02 -05:00
Simon Riggs	f21bb9cfb5	Refactor inCommit flag into generic delayChkpt flag. Rename PGXACT->inCommit flag into delayChkpt flag, and generalise comments to allow use in other situations, such as the forthcoming potential use in checksum patch. Replace wait loop to look for VXIDs with delayChkpt set. No user visible changes, not behaviour changes at present. Simon Riggs, reviewed and rebased by Jeff Davis	2012-12-03 13:13:53 +00:00
Simon Riggs	5457a130d3	Reduce scope of changes for COPY FREEZE. Allow support only for freezing tuples by explicit command. Previous coding mistakenly extended slightly beyond what was agreed as correct on -hackers. So essentially a partial revoke of earlier work, leaving just the COPY FREEZE command.	2012-12-02 20:52:52 +00:00
Tom Lane	3114cb60a1	Don't advance checkPoint.nextXid near the end of a checkpoint sequence. This reverts commit `c11130690d` in favor of actually fixing the problem: namely, that we should never have been modifying the checkpoint record's nextXid at this point to begin with. The nextXid should match the state as of the checkpoint's logical WAL position (ie the redo point), not the state as of its physical position. It's especially bogus to advance it in some wal_levels and not others. In any case there is no need for the checkpoint record to carry the same nextXid shown in the XLOG_RUNNING_XACTS record just emitted by LogStandbySnapshot, as any replay operation will already have adopted that value as current. This fixes bug #7710 from Tarvi Pillessaar, and probably also explains bug #6291 from Daniel Farina, in that if a checkpoint were in progress at the instant of XID wraparound, the epoch bump would be lost as reported. (And, of course, these days there's at least a 50-50 chance of a checkpoint being in progress at any given instant.) Diagnosed by me and independently by Andres Freund. Back-patch to all branches supporting hot standby.	2012-12-02 15:20:41 -05:00
Simon Riggs	5c11725867	Rearrange storage of data in xl_running_xacts. Previously we stored all xids mixed together. Now we store top-level xids first, followed by all subxids. Also skip logging any subxids if the snapshot is suboverflowed, since there are potentially large numbers of them and they are not useful in that case anyway. Has value in the envisaged design for decoding of WAL. No planned effect on Hot Standby. Andres Freund, reviewed by me	2012-12-02 19:39:37 +00:00
Tom Lane	7b90469b71	Allow adding values to an enum type created in the current transaction. Normally it is unsafe to allow ALTER TYPE ADD VALUE in a transaction block, because instances of the value could be added to indexes later in the same transaction, and then they would still be accessible even if the transaction rolls back. However, we can allow this if the enum type itself was created in the current transaction, because then any such indexes would have to go away entirely on rollback. The reason for allowing this is to support pg_upgrade's new usage of pg_restore --single-transaction: in --binary-upgrade mode, pg_dump emits enum types as a succession of ALTER TYPE ADD VALUE commands so that it can preserve the values' OIDs. The support is a bit limited, so we'll leave it undocumented. Andres Freund	2012-12-01 14:27:30 -05:00
Simon Riggs	8de72b66a2	COPY FREEZE and mark committed on fresh tables. When a relfilenode is created in this subtransaction or a committed child transaction and it cannot otherwise be seen by our own process, mark tuples committed ahead of transaction commit for all COPY commands in same transaction. If FREEZE specified on COPY and pre-conditions met then rows will also be frozen. Both options designed to avoid revisiting rows after commit, increasing performance of subsequent commands after data load and upgrade. pg_restore changes later. Simon Riggs, review comments from Heikki Linnakangas, Noah Misch and design input from Tom Lane, Robert Haas and Kevin Grittner	2012-12-01 12:54:20 +00:00
Tom Lane	4af446e7cd	Produce a more useful error message for over-length Unix socket paths. The length of a socket path name is constrained by the size of struct sockaddr_un, and there's not a lot we can do about it since that is a kernel API. However, it would be a good thing if we produced an intelligible error message when the user specifies a socket path that's too long --- and getaddrinfo's standard API is too impoverished to do this in the natural way. So insert explicit tests at the places where we construct a socket path name. Now you'll get an error that makes sense and even tells you what the limit is, rather than something generic like "Non-recoverable failure in name resolution". Per trouble report from Jeremy Drake and a fix idea from Andrew Dunstan.	2012-11-29 19:57:01 -05:00
Simon Riggs	f1e57a4ec9	Cleanup VirtualXact at end of Hot Standby.	2012-11-29 21:59:11 +00:00
Robert Haas	7a2fe9bd03	Basic binary heap implementation. There are probably other places where this can be used, but for now, this just makes MergeAppend use it, so that this code will have test coverage. There is other work in the queue that will use this, as well. Abhijit Menon-Sen, reviewed by Andres Freund, Robert Haas, Álvaro Herrera, Tom Lane, and others.	2012-11-29 11:16:59 -05:00
Tom Lane	3c84046490	Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit `8cb53654db`, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.	2012-11-28 21:26:01 -05:00
Alvaro Herrera	1577b46b7c	Split out rmgr rm_desc functions into their own files This is necessary (but not sufficient) to have them compilable outside of a backend environment.	2012-11-28 13:01:15 -03:00
Tom Lane	e78d288c89	Add explicit casts in ilist.h's inline functions. Needed to silence C++ errors, per report from Peter Eisentraut. Andres Freund	2012-11-27 10:58:37 -05:00
Heikki Linnakangas	1f67078ea3	Add OpenTransientFile, with automatic cleanup at end-of-xact. Files opened with BasicOpenFile or PathNameOpenFile are not automatically cleaned up on error. That puts unnecessary burden on callers that only want to keep the file open for a short time. There is AllocateFile, but that returns a buffered FILE * stream, which in many cases is not the nicest API to work with. So add function called OpenTransientFile, which returns a unbuffered fd that's cleaned up like the FILE* returned by AllocateFile(). This plugs a few rare fd leaks in error cases: 1. copy_file() - fixed by by using OpenTransientFile instead of BasicOpenFile 2. XLogFileInit() - fixed by adding close() calls to the error cases. Can't use OpenTransientFile here because the fd is supposed to persist over transaction boundaries. 3. lo_import/lo_export - fixed by using OpenTransientFile instead of PathNameOpenFile. In addition to plugging those leaks, this replaces many BasicOpenFile() calls with OpenTransientFile() that were not leaking, because the code meticulously closed the file on error. That wasn't strictly necessary, but IMHO it's good for robustness. The same leaks exist in older versions, but given the rarity of the issues, I'm not backpatching this. Not yet, anyway - it might be good to backpatch later, after this mechanism has had some more testing in master branch.	2012-11-27 10:25:50 +02:00
Tom Lane	532994299e	Revert patch for taking fewer snapshots. This reverts commit `d573e239f0`, "Take fewer snapshots". While that seemed like a good idea at the time, it caused execution to use a snapshot that had been acquired before locking any of the tables mentioned in the query. This created user-visible anomalies that were not present in any prior release of Postgres, as reported by Tomas Vondra. While this whole area could do with a redesign (since there are related cases that have anomalies anyway), it doesn't seem likely that any future patch would be reasonably back-patchable; and we don't want 9.2 to exhibit a behavior that's subtly unlike either past or future releases. Hence, revert to prior code while we rethink the problem.	2012-11-26 15:55:43 -05:00
Tom Lane	d3237e04ca	Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees. In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is pretty useless (there being only one output row), but nonetheless it shouldn't fail. But it could fail if "tab" is an inheritance parent, because planagg.c's code for fixing up equivalence classes after making the index-optimized MIN/MAX transformation wasn't prepared to find child-table versions of the aggregate expression. The least ugly fix seems to be to add an option to mutate_eclass_expressions() to skip child-table equivalence class members, which aren't used anymore at this stage of planning so it's not really necessary to fix them. Since child members are ignored in many cases already, it seems plausible for mutate_eclass_expressions() to have an option to ignore them too. Per bug #7703 from Maxim Boguk. Back-patch to 9.1. Although the same code exists before that, it cannot encounter child-table aggregates AFAICS, because the index optimization transformation cannot succeed on inheritance trees before 9.1 (for lack of MergeAppend).	2012-11-26 12:57:58 -05:00
Heikki Linnakangas	644a0a6379	Fix archive_cleanup_command. When I moved ExecuteRecoveryCommand() from xlog.c to xlogarchive.c, I didn't realize that it's called from the checkpoint process, not the startup process. I tried to use InRedo variable to decide whether or not to attempt cleaning up the archive (must not do so before we have read the initial checkpoint record), but that variable is only valid within the startup process. Instead, let ExecuteRecoveryCommand() always clean up the archive, and add an explicit argument to RestoreArchivedFile() to say whether that's allowed or not. The caller knows better. Reported by Erik Rijkers, diagnosis by Fujii Masao. Only 9.3devel is affected.	2012-11-19 10:14:20 +02:00
Tom Lane	3bbf668de9	Fix multiple problems in WAL replay. Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.	2012-11-12 22:05:53 -05:00
Heikki Linnakangas	dbdf9679d7	Use correct text domain for translating errcontext() messages. errcontext() is typically used in an error context callback function, not within an ereport() invocation like e.g errmsg and errdetail are. That means that the message domain that the TEXTDOMAIN magic in ereport() determines is not the right one for the errcontext() calls. The message domain needs to be determined by the C file containing the errcontext() call, not the file containing the ereport() call. Fix by turning errcontext() into a macro that passes the TEXTDOMAIN to use for the errcontext message. "errcontext" was used in a few places as a variable or struct field name, I had to rename those out of the way, now that errcontext is a macro. We've had this problem all along, but this isn't doesn't seem worth backporting. It's a fairly minor issue, and turning errcontext from a function to a macro requires at least a recompile of any external code that calls errcontext().	2012-11-12 17:07:29 +02:00
Heikki Linnakangas	c9d44a75d4	Silence "expression result unused" warnings in AssertVariableIsOfTypeMacro At least clang 3.1 generates those warnings. Prepend (void) to avoid them, like we have in AssertMacro.	2012-11-12 15:02:40 +02:00
Tom Lane	3e7fdcffd6	Fix WaitLatch() to return promptly when the requested timeout expires. If the sleep is interrupted by a signal, we must recompute the remaining time to wait; otherwise, a steady stream of non-wait-terminating interrupts could delay return from WaitLatch indefinitely. This has been shown to be a problem for the autovacuum launcher, and there may well be other places now or in the future with similar issues. So we'd better make the function robust, even though this'll add at least one gettimeofday call per wait. Back-patch to 9.2. We might eventually need to fix 9.1 as well, but the code is quite different there, and the usage of WaitLatch in 9.1 is so limited that it's not clearly important to do so. Reported and diagnosed by Jeff Janes, though I rewrote his patch rather heavily.	2012-11-08 20:04:48 -05:00
Tom Lane	dcc55dd21a	Rename ResolveNew() to ReplaceVarsFromTargetList(), and tweak its API. This function currently lacks the option to throw error if the provided targetlist doesn't have any matching entry for a Var to be replaced. Two of the four existing call sites would be better off with an error, as would the usage in the pending auto-updatable-views patch, so it seems past time to extend the API to support that. To do so, replace the "event" parameter (historically of type CmdType, though it was declared plain int) with a special-purpose enum type. It's unclear whether this function might be called by third-party code. Since many C compilers wouldn't warn about a call site continuing to use the old calling convention, rename the function to forcibly break any such code that hasn't been updated. The old name was none too well chosen anyhow.	2012-11-08 16:52:49 -05:00
Bruce Momjian	aa69670e42	Add URLs to document why DLLIMPORT is needed on Windows. Per email from Craig Ringer	2012-11-07 15:01:25 -05:00
Heikki Linnakangas	add6c3179a	Make the streaming replication protocol messages architecture-independent. We used to send structs wrapped in CopyData messages, which works as long as the client and server agree on things like endianess, timestamp format and alignment. That's good enough for running a standby server, which has to run on the same platform anyway, but it's useful for tools like pg_receivexlog to work across platforms. This breaks protocol compatibility of streaming replication, but we never promised that to be compatible across versions, anyway.	2012-11-07 19:09:13 +02:00
Tom Lane	5ed6546cf7	Fix handling of inherited check constraints in ALTER COLUMN TYPE. This case got broken in 8.4 by the addition of an error check that complains if ALTER TABLE ONLY is used on a table that has children. We do use ONLY for this situation, but it's okay because the necessary recursion occurs at a higher level. So we need to have a separate flag to suppress recursion without making the error check. Reported and patched by Pavan Deolasee, with some editorial adjustments by me. Back-patch to 8.4, since this is a regression of functionality that worked in earlier branches.	2012-11-05 13:36:16 -05:00
Alvaro Herrera	04f28bdb84	Fix ALTER EXTENSION / SET SCHEMA In its original conception, it was leaving some objects into the old schema, but without their proper pg_depend entries; this meant that the old schema could be dropped, causing future pg_dump calls to fail on the affected database. This was originally reported by Jeff Frost as #6704; there have been other complaints elsewhere that can probably be traced to this bug. To fix, be more consistent about altering a table's subsidiary objects along the table itself; this requires some restructuring in how tables are relocated when altering an extension -- hence the new AlterTableNamespaceInternal routine which encapsulates it for both the ALTER TABLE and the ALTER EXTENSION cases. There was another bug lurking here, which was unmasked after fixing the previous one: certain objects would be reached twice via the dependency graph, and the second attempt to move them would cause the entire operation to fail. Per discussion, it seems the best fix for this is to do more careful tracking of objects already moved: we now maintain a list of moved objects, to avoid attempting to do it twice for the same object. Authors: Alvaro Herrera, Dimitri Fontaine Reviewed by Tom Lane	2012-10-31 10:52:55 -03:00
Kevin Grittner	6868ed7491	Throw error if expiring tuple is again updated or deleted. This prevents surprising behavior when a FOR EACH ROW trigger BEFORE UPDATE or BEFORE DELETE directly or indirectly updates or deletes the the old row. Prior to this patch the requested action on the row could be silently ignored while all triggered actions based on the occurence of the requested action could be committed. One example of how this could happen is if the BEFORE DELETE trigger for a "parent" row deleted "children" which had trigger functions to update summary or status data on the parent. This also prevents similar surprising problems if the query has a volatile function which updates a target row while it is already being updated. There are related issues present in FOR UPDATE cursors and READ COMMITTED queries which are not handled by this patch. These issues need further evalution to determine what change, if any, is needed. Where the new error messages are generated, in most cases the best fix will be to move code from the BEFORE trigger to an AFTER trigger. Where this is not feasible, the trigger can avoid the error by re-issuing the triggering statement and returning NULL. Documentation changes will be submitted in a separate patch. Kevin Grittner and Tom Lane with input from Florian Pflug and Robert Haas, based on problems encountered during conversion of Wisconsin Circuit Court trigger logic to plpgsql triggers.	2012-10-26 14:55:36 -05:00
Tom Lane	a4e8680a6c	When converting a table to a view, remove its system columns. Views should not have any pg_attribute entries for system columns. However, we forgot to remove such entries when converting a table to a view. This could lead to crashes later on, if someone attempted to reference such a column, as reported by Kohei KaiGai. Patch in HEAD only. This bug has been there forever, but in the back branches we will have to defend against existing mis-converted views, so it doesn't seem worthwhile to change the conversion code too.	2012-10-24 13:39:37 -04:00
Alvaro Herrera	f4c4335a4a	Add context info to OAT_POST_CREATE security hook ... and have sepgsql use it to determine whether to check permissions during certain operations. Indexes that are being created as a result of REINDEX, for instance, do not need to have their permissions checked; they were already checked when the index was created. Author: KaiGai Kohei, slightly revised by me	2012-10-23 18:24:24 -03:00
Tom Lane	dc5aeca168	Remove unnecessary "head" arguments from some dlist/slist functions. dlist_delete, dlist_insert_after, dlist_insert_before, slist_insert_after do not need access to the list header, and indeed insisting on that negates one of the main advantages of a doubly-linked list. In consequence, revert addition of "cache_bucket" field to CatCTup.	2012-10-18 19:04:20 -04:00
Tom Lane	8f8d746478	Code review for inline-list patch. Make foreach macros less syntactically dangerous, and fix some typos in evidently-never-tested ones. Add missing slist_next_node and slist_head_node functions. Fix broken dlist_check code. Assorted comment improvements.	2012-10-18 16:47:07 -04:00
Tom Lane	72a4231f0c	Fix planning of non-strict equivalence clauses above outer joins. If a potential equivalence clause references a variable from the nullable side of an outer join, the planner needs to take care that derived clauses are not pushed to below the outer join; else they may use the wrong value for the variable. (The problem arises only with non-strict clauses, since if an upper clause can be proven strict then the outer join will get simplified to a plain join.) The planner attempted to prevent this type of error by checking that potential equivalence clauses aren't outerjoin-delayed as a whole, but actually we have to check each side separately, since the two sides of the clause will get moved around separately if it's treated as an equivalence. Bugs of this type can be demonstrated as far back as 7.4, even though releases before 8.3 had only a very ad-hoc notion of equivalence clauses. In addition, we neglected to account for the possibility that such clauses might have nonempty nullable_relids even when not outerjoin-delayed; so the equivalence-class machinery lacked logic to compute correct nullable_relids values for clauses it constructs. This oversight was harmless before 9.2 because we were only using RestrictInfo.nullable_relids for OR clauses; but as of 9.2 it could result in pushing constructed equivalence clauses to incorrect places. (This accounts for bug #7604 from Bill MacArthur.) Fix the first problem by adding a new test check_equivalence_delay() in distribute_qual_to_rels, and fix the second one by adding code in equivclass.c and called functions to set correct nullable_relids for generated clauses. Although I believe the second part of this is not currently necessary before 9.2, I chose to back-patch it anyway, partly to keep the logic similar across branches and partly because it seems possible we might find other reasons why we need valid values of nullable_relids in the older branches. Add regression tests illustrating these problems. In 9.0 and up, also add test cases checking that we can push constants through outer joins, since we've broken that optimization before and I nearly broke it again with an overly simplistic patch for this problem.	2012-10-18 12:30:10 -04:00
Tom Lane	ff3f9c8de5	Close un-owned SMgrRelations at transaction end. If an SMgrRelation is not "owned" by a relcache entry, don't allow it to live past transaction end. This design allows the same SMgrRelation to be used for blind writes of multiple blocks during a transaction, but ensures that we don't hold onto such an SMgrRelation indefinitely. Because an SMgrRelation typically corresponds to open file descriptors at the fd.c level, leaving it open when there's no corresponding relcache entry can mean that we prevent the kernel from reclaiming deleted disk space. (While CacheInvalidateSmgr messages usually fix that, there are cases where they're not issued, such as DROP DATABASE. We might want to add some more sinval messaging for that, but I'd be inclined to keep this type of logic anyway, since allowing VFDs to accumulate indefinitely for blind-written relations doesn't seem like a good idea.) This code replaces a previous attempt towards the same goal that proved to be unreliable. Back-patch to 9.1 where the previous patch was added.	2012-10-17 12:38:21 -04:00
Tom Lane	9bacf0e373	Revert "Use "transient" files for blind writes, take 2". This reverts commit `fba105b109`. That approach had problems with the smgr-level state not tracking what we really want to happen, and with the VFD-level state not tracking the smgr-level state very well either. In consequence, it was still possible to hold kernel file descriptors open for long-gone tables (as in recent report from Tore Halset), and yet there were also cases of FDs being closed undesirably soon. A replacement implementation will follow.	2012-10-17 12:37:08 -04:00
Alvaro Herrera	a66ee69add	Embedded list interface Provide a common implementation of embedded singly-linked and doubly-linked lists. "Embedded" in the sense that the nodes' next/previous pointers exist within some larger struct; this design choice reduces memory allocation overhead. Most of the implementation uses inlineable functions (where supported), for performance. Some existing uses of both types of lists have been converted to the new code, for demonstration purposes. Other uses can (and probably will) be converted in the future. Since dllist.c is unused after this conversion, it has been removed. Author: Andres Freund Some tweaks by me Reviewed by Tom Lane, Peter Geoghegan	2012-10-17 11:31:20 -03:00
Heikki Linnakangas	ff6c78c480	Remove comment that is no longer true. AddToDataDirLockFile() supports out-of-order updates of the lockfile nowadays.	2012-10-15 11:03:39 +03:00
Tom Lane	e81e8f9342	Split up process latch initialization for more-fail-soft behavior. In the previous coding, new backend processes would attempt to create their self-pipe during the OwnLatch call in InitProcess. However, pipe creation could fail if the kernel is short of resources; and the system does not recover gracefully from a FATAL error right there, since we have armed the dead-man switch for this process and not yet set up the on_shmem_exit callback that would disarm it. The postmaster then forces an unnecessary database-wide crash and restart, as reported by Sean Chittenden. There are various ways we could rearrange the code to fix this, but the simplest and sanest seems to be to split out creation of the self-pipe into a new function InitializeLatchSupport, which must be called from a place where failure is allowed. For most processes that gets called in InitProcess or InitAuxiliaryProcess, but processes that don't call either but still use latches need their own calls. Back-patch to 9.1, which has only a part of the latch logic that 9.2 and HEAD have, but nonetheless includes this bug.	2012-10-14 22:59:56 -04:00
Tom Lane	a29f7ed554	Get rid of COERCE_DONTCARE. We don't need this hack any more.	2012-10-12 13:35:00 -04:00
Tom Lane	71e58dcfb9	Make equal() ignore CoercionForm fields for better planning with casts. This change ensures that the planner will see implicit and explicit casts as equivalent for all purposes, except in the minority of cases where there's actually a semantic difference (as reflected by having a 3-argument cast function). In particular, this fixes cases where the EquivalenceClass machinery failed to consider two references to a varchar column as equivalent if one was implicitly cast to text but the other was explicitly cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar bugs before in other parts of the planner, so I think it's time to fix this problem at the core instead of continuing to band-aid around it. Remove set_coercionform_dontcare(), which represents the band-aid previously in use for allowing matching of index and constraint expressions with inconsistent cast labeling. (We can probably get rid of COERCE_DONTCARE altogether, but I don't think removing that enum value in back branches would be wise; it's possible there's third party code referring to it.) Back-patch to 9.2. We could go back further, and might want to once this has been tested more; but for the moment I won't risk destabilizing plan choices in long-since-stable branches.	2012-10-12 12:11:22 -04:00
Heikki Linnakangas	6f60fdd701	Improve replication connection timeouts. Rename replication_timeout to wal_sender_timeout, and add a new setting called wal_receiver_timeout that does the same at the walreceiver side. There was previously no timeout in walreceiver, so if the network went down, for example, the walreceiver could take a long time to notice that the connection was lost. Now with the two settings, both sides of a replication connection will detect a broken connection similarly. It is no longer necessary to manually set wal_receiver_status_interval to a value smaller than the timeout. Both wal sender and receiver now automatically send a "ping" message if more than 1/2 of the configured timeout has elapsed, and it hasn't received any messages from the other end. Amit Kapila, heavily edited by me.	2012-10-11 17:48:08 +03:00
Tom Lane	a80889a735	Set procost to 10 for each of the pg_foo_is_visible() functions. The idea here is to make sure the planner will evaluate these functions last not first among the filter conditions in psql pattern search and tab-completion queries. We've discussed this several times, and there was consensus to do it back in August, but we didn't want to do it just before a release. Now seems like a safer time. No catversion bump, since this catalog change doesn't create a backend incompatibility nor any regression test result changes.	2012-10-10 12:19:25 -04:00
Tom Lane	7e0cce0265	Remove unnecessary overhead in backend's large-object operations. Do read/write permissions checks at most once per large object descriptor, not once per lo_read or lo_write call as before. The repeated tests were quite useless in the read case since the snapshot-based tests were guaranteed to produce the same answer every time. In the write case, the extra tests could in principle detect revocation of write privileges after a series of writes has started --- but there's a race condition there anyway, since we'd check privileges before performing and certainly before committing the write. So there's no real advantage to checking every single time, and we might as well redefine it as "only check the first time". On the same reasoning, remove the LargeObjectExists checks in inv_write and inv_truncate. We already checked existence when the descriptor was opened, and checking again doesn't provide any real increment of safety that would justify the cost.	2012-10-09 16:38:00 -04:00
Alvaro Herrera	f46baf601d	Rename USE_INLINE to PG_USE_INLINE The former name was too likely to conflict with symbols from external headers; and, as seen in recent buildfarm failures in member spoonbill, it has now happened at least in plpython.	2012-10-09 11:17:33 -03:00
Tom Lane	26fe56481c	Code review for 64-bit-large-object patch. Fix broken-on-bigendian-machines byte-swapping functions, add missed update of alternate regression expected file, improve error reporting, remove some unnecessary code, sync testlo64.c with current testlo.c (it seems to have been cloned from a very old copy of that), assorted cosmetic improvements.	2012-10-08 18:24:32 -04:00
Alvaro Herrera	976fa10d20	Add support for easily declaring static inline functions We already had those, but they forced modules to spell out the function bodies twice. Eliminate some duplicates we had already grown. Extracted from a somewhat larger patch from Andres Freund.	2012-10-08 16:28:01 -03:00
Robert Haas	08c8058ce9	Add #define for UUIDOID. Phil Sorber and Thom Brown. Reviewed by Albe Laurenz.	2012-10-08 10:15:15 -04:00
Tom Lane	95d035e66d	Autoconfiscate selection of 64-bit int type for 64-bit large object API. Get rid of the fundamentally indefensible assumption that "long long int" exists and is exactly 64 bits wide on every platform Postgres runs on. Instead let the configure script select the type to use for "pg_int64". This is a bit of a pain in the rear since we do not want to pollute client namespace with all the random symbols that pg_config.h defines; instead we have to create a separate generated header file, "pg_config_ext.h". But now that the infrastructure is there, we might have the ability to add some other stuff that's long been wanting in this area.	2012-10-07 21:52:43 -04:00
Tatsuo Ishii	7e2f8ed2b0	Fix compiling errors on Windows platform. Fix wrong usage of INT64CONST macro. Fix lo_hton64 and lo_ntoh64 not to use int32_t and uint32_t.	2012-10-07 23:30:31 +09:00
Tatsuo Ishii	b51a65f5bf	Bump up catalog vesion due to 64-bit large object API functions addition.	2012-10-07 09:36:20 +09:00
Tatsuo Ishii	461ef73f09	Add API for 64-bit large object access. Now users can access up to 4TB large objects (standard 8KB BLCKSZ case). For this purpose new libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added. Also corresponding new backend functions lo_lseek64, lo_tell64 and lo_truncate64 are added. inv_api.c is changed to handle 64-bit offsets. Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata (frontend side, docs, regression tests and example program). Reviewed by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.	2012-10-07 08:36:48 +09:00
Heikki Linnakangas	fd5942c18f	Use the regular main processing loop also in walsenders. The regular backend's main loop handles signal handling and error recovery better than the current WAL sender command loop does. For example, if the client hangs and a SIGTERM is received before starting streaming, the walsender will now terminate immediately, rather than hang until the connection times out.	2012-10-05 17:21:12 +03:00
Tom Lane	fb34e94d21	Support CREATE SCHEMA IF NOT EXISTS. Per discussion, schema-element subcommands are not allowed together with this option, since it's not very obvious what should happen to the element objects. Fabrízio de Royes Mello	2012-10-03 19:47:11 -04:00
Alvaro Herrera	994c36e01d	refactor ALTER some-obj SET OWNER implementation Remove duplicate implementation of catalog munging and miscellaneous privilege and consistency checks. Instead rely on already existing data in objectaddress.c to do the work. Author: KaiGai Kohei Tweaked by me Reviewed by Robert Haas	2012-10-03 18:07:46 -03:00
Alvaro Herrera	2164f9a125	Refactor "ALTER some-obj SET SCHEMA" implementation Instead of having each object type implement the catalog munging independently, centralize knowledge about how to do it and expand the existing table in objectaddress.c with enough data about each object type to support this operation. Author: KaiGai Kohei Tweaks by me Reviewed by Robert Haas	2012-10-02 18:13:54 -03:00
Heikki Linnakangas	d5497b95f3	Split off functions related to timeline history files and XLOG archiving. This is just refactoring, to make the functions accessible outside xlog.c. A followup patch will make use of that, to allow fetching timeline history files over streaming replication.	2012-10-02 13:37:19 +03:00
Tom Lane	0d0aa5d291	Provide some static-assertion functionality on all compilers. On reflection (especially after noticing how many buildfarm critters have __builtin_types_compatible_p but not _Static_assert), it seems like we ought to try a bit harder to make these macros do something everywhere. The initial cut at it would have been no help to code that is compiled only on platforms without _Static_assert, for instance; and in any case not all our contributors do their initial coding on the latest gcc version. Some googling about static assertions turns up quite a bit of prior art for making it work in compilers that lack _Static_assert. The method that seems closest to our needs involves defining a struct with a bit-field that has negative width if the assertion condition fails. There seems no reliable way to get the error message string to be output, but throwing a compile error with a confusing message is better than missing the problem altogether. In the same spirit, if we don't have __builtin_types_compatible_p we can at least insist that the variable have the same width as the type. This won't catch errors such as "wrong pointer type", but it's far better than nothing. In addition to changing the macro definitions, adjust a compile-time-constant Assert in contrib/hstore to use StaticAssertStmt, so we can get some buildfarm coverage on whether that macro behaves sanely or not. There's surely more places that could be converted, but this is the first one I came across.	2012-09-30 22:46:29 -04:00
Tom Lane	ea473fb2de	Add infrastructure for compile-time assertions about variable types. Currently, the macros only work with fairly recent gcc versions, but there is room to expand them to other compilers that have comparable features. Heavily revised and autoconfiscated version of a patch by Andres Freund.	2012-09-30 14:38:31 -04:00
Andrew Dunstan	6e9876dc32	Remove checks for now long outdated compilers.	2012-09-28 19:43:50 -04:00
Tom Lane	70bc583319	Fix btmarkpos/btrestrpos to handle array keys. This fixes another error in commit `9e8da0f757`. I neglected to make the mark/restore functionality save and restore the current set of array key values, which led to strange behavior if an IndexScan with ScalarArrayOpExpr quals was used as the inner side of a mergejoin. Per bug #7570 from Melese Tesfaye.	2012-09-27 17:01:02 -04:00
Heikki Linnakangas	2a0c81a12c	Add support for include_dir in config file. This allows easily splitting configuration into many files, deployed in a directory. Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.	2012-09-24 18:07:53 +03:00
Tom Lane	31510194cc	Minor corrections for ALTER TYPE ADD VALUE IF NOT EXISTS patch. Produce a NOTICE when the label already exists, for consistency with other CREATE IF NOT EXISTS commands. Also, fix the code so it produces something more user-friendly than an index violation when the label already exists. This not incidentally enables making a regression test that the previous patch didn't make for fear of exposing an unpredictable OID in the results. Also some wordsmithing on the documentation.	2012-09-22 18:35:22 -04:00
Andrew Dunstan	6d12b68cd7	Allow IF NOT EXISTS when add a new enum label. If the label is already in the enum the statement becomes a no-op. This will reduce the pain that comes from our not allowing this operation inside a transaction block. Andrew Dunstan, reviewed by Tom Lane and Magnus Hagander.	2012-09-22 12:53:31 -04:00
Tom Lane	11e131854f	Improve ruleutils.c's heuristics for dealing with rangetable aliases. The previous scheme had bugs in some corner cases involving tables that had been renamed since a view was made. This could result in dumped views that failed to reload or reloaded incorrectly, as seen in bug #7553 from Lloyd Albin, as well as in some pgsql-hackers discussion back in January. Also, its behavior for printing EXPLAIN plans was sometimes confusing because of willingness to use the same alias for multiple RTEs (it was Ashutosh Bapat's complaint about that aspect that started the January thread). To fix, ensure that each RTE in the query has a unique unqualified alias, by modifying the alias if necessary (we add "_" and digits as needed to create a non-conflicting name). Then we can just print its variables with that alias, avoiding the confusing and bug-prone scheme of sometimes schema-qualifying variable names. In EXPLAIN, it proves to be expedient to take the further step of only assigning such aliases to RTEs that are actually referenced in the query, since the planner has a habit of generating extra RTEs with the same alias in situations such as inheritance-tree expansion. Although this fixes a bug of very long standing, I'm hesitant to back-patch such a noticeable behavioral change. My experiments while creating a regression test convinced me that actually incorrect output (as opposed to confusing output) occurs only in very narrow cases, which is backed up by the lack of previous complaints from the field. So we may be better off living with it in released branches; and in any case it'd be smart to let this ripen awhile in HEAD before we consider back-patching it.	2012-09-21 19:03:10 -04:00
Heikki Linnakangas	7c45e3a3c6	Parse pg_ident.conf when it's loaded, keeping it in memory in parsed format. Similar changes were done to pg_hba.conf earlier already, this commit makes pg_ident.conf to behave the same as pg_hba.conf. This has two user-visible effects. First, if pg_ident.conf contains multiple errors, the whole file is parsed at postmaster startup time and all the errors are immediately reported. Before this patch, the file was parsed and the errors were reported only when someone tries to connect using an authentication method that uses the file, and the parsing stopped on first error. Second, if you SIGHUP to reload the config files, and the new pg_ident.conf file contains an error, the error is logged but the old file stays in effect. Also, regular expressions in pg_ident.conf are now compiled only once when the file is loaded, rather than every time the a user is authenticated. That should speed up authentication if you have a lot of regexps in the file. Amit Kapila	2012-09-21 17:54:39 +03:00
Alvaro Herrera	22c734fcdb	Remove execdesc.h inclusion from tcopprot.h	2012-09-20 11:07:59 -03:00
Tom Lane	9a93e71008	Fix a couple other leftover uses of 'conisonly' terminology.	2012-09-12 15:12:24 -04:00
Tom Lane	46c508fbcf	Fix PARAM_EXEC assignment mechanism to be safe in the presence of WITH. The planner previously assumed that parameter Vars having the same absolute query level, varno, and varattno could safely be assigned the same runtime PARAM_EXEC slot, even though they might be different Vars appearing in different subqueries. This was (probably) safe before the introduction of CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can be executed during execution of some other subquery, causing the lifespan of Params at the same syntactic nesting level as the CTE to overlap with use of the same slots inside the CTE. In 9.1 we created additional hazards by using the same parameter-assignment technology for nestloop inner scan parameters, but it was broken before that, as illustrated by the added regression test. To fix, restructure the planner's management of PlannerParamItems so that items having different semantic lifespans are kept rigorously separated. This will probably result in complex queries using more runtime PARAM_EXEC slots than before, but the slots are cheap enough that this hardly matters. Also, stop generating PlannerParamItems containing Params for subquery outputs: all we really need to do is reserve the PARAM_EXEC slot number, and that now only takes incrementing a counter. The planning code is simpler and probably faster than before, as well as being more correct. Per report from Vik Reykja. These changes will mostly also need to be made in the back branches, but I'm going to hold off on that until after 9.2.0 wraps.	2012-09-05 12:55:01 -04:00
Alvaro Herrera	e20a90e188	Trim spgist_private.h inclusion It doesn't really need rel.h; relcache.h is enough.	2012-09-05 11:06:51 -03:00
Heikki Linnakangas	c4c227477b	Fix bugs in cascading replication with recovery_target_timeline='latest' The cascading replication code assumed that the current RecoveryTargetTLI never changes, but that's not true with recovery_target_timeline='latest'. The obvious upshot of that is that RecoveryTargetTLI in shared memory needs to be protected by a lock. A less obvious consequence is that when a cascading standby is connected, and the standby switches to a new target timeline after scanning the archive, it will continue to stream WAL to the cascading standby, but from a wrong file, ie. the file of the previous timeline. For example, if the standby is currently streaming from the middle of file 000000010000000000000005, and the timeline changes, the standby will continue to stream from that file. However, the WAL on the new timeline is in file 000000020000000000000005, so the standby sends garbage from 000000010000000000000005 to the cascading standby, instead of the correct WAL from file 000000020000000000000005. This also fixes a related bug where a partial WAL segment is restored from the archive and streamed to a cascading standby. The code assumed that when a WAL segment is copied from the archive, it can immediately be fully streamed to a cascading standby. However, if the segment is only partially filled, ie. has the right size, but only N first bytes contain valid WAL, that's not safe. That can happen if a partial WAL segment is manually copied to the archive, or if a partial WAL segment is archived because a server is started up on a new timeline within that segment. The cascading standby will get confused if the WAL it received is not valid, and will get stuck until it's restarted. This patch fixes that problem by not allowing WAL restored from the archive to be streamed to a cascading standby until it's been replayed, and thus validated.	2012-09-04 19:33:21 -07:00
Bruce Momjian	015722fb36	Fix to_date() and to_timestamp() to allow specification of the day of the week via ISO or Gregorian designations. The fix is to store the day-of-week consistently as 1-7, Sunday = 1. Fixes bug reported by Marc Munro	2012-09-03 22:52:44 -04:00
Tom Lane	6d2c8c0e2a	Drop cheap-startup-cost paths during add_path() if we don't need them. We can detect whether the planner top level is going to care at all about cheap startup cost (it will only do so if query_planner's tuple_fraction argument is greater than zero). If it isn't, we might as well discard paths immediately whose only advantage over others is cheap startup cost. This turns out to get rid of quite a lot of paths in complex queries --- I saw planner runtime reduction of more than a third on one large query. Since add_path isn't currently passed the PlannerInfo "root", the easiest way to tell it whether to do this was to add a bool flag to RelOptInfo. That's a bit redundant, since all relations in a given query level will have the same setting. But in the future it's possible that we'd refine the control decision to work on a per-relation basis, so this seems like a good arrangement anyway. Per my suggestion of a few months ago.	2012-09-01 18:16:24 -04:00
Tom Lane	58a031f920	Make configure probe for mbstowcs_l as well as wcstombs_l. We previously supposed that any given platform would supply both or neither of these functions, so that one configure test would be sufficient. It now appears that at least on AIX this is not the case ... which is likely an AIX bug, but nonetheless we need to cope with it. So use separate tests. Per bug #6758; thanks to Andrew Hastie for doing the followup testing needed to confirm what was happening. Backpatch to 9.1, where we began using these functions.	2012-08-31 14:17:56 -04:00
Alvaro Herrera	c219d9b0a5	Split tuple struct defs from htup.h to htup_details.h This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.	2012-08-30 16:52:35 -04:00
Tom Lane	77387f0ac8	Suppress creation of backwardly-indexed paths for LATERAL join clauses. Given a query such as SELECT * FROM foo JOIN LATERAL (SELECT foo.var1) ss(x) ON ss.x = foo.var2 the existence of the join clause "ss.x = foo.var2" encourages indxpath.c to build a parameterized path for foo using any index available for foo.var2. This is completely useless activity, though, since foo has got to be on the outside not the inside of any nestloop join with ss. It's reasonably inexpensive to add tests that prevent creation of such paths, so let's do that.	2012-08-30 14:33:00 -04:00
Tom Lane	e83bb10d6d	Adjust definition of cheapest_total_path to work better with LATERAL. In the initial cut at LATERAL, I kept the rule that cheapest_total_path was always unparameterized, which meant it had to be NULL if the relation has no unparameterized paths. It turns out to work much more nicely if we always have some path nominated as cheapest-total for each relation. In particular, let's still say it's the cheapest unparameterized path if there is one; if not, take the cheapest-total-cost path among those of the minimum available parameterization. (The first rule is actually a special case of the second.) This allows reversion of some temporary lobotomizations I'd put in place. In particular, the planner can now consider hash and merge joins for joins below a parameter-supplying nestloop, even if there aren't any unparameterized paths available. This should bring planning of LATERAL-containing queries to the same level as queries not using that feature. Along the way, simplify management of parameterized paths in add_path() and friends. In the original coding for parameterized paths in 9.2, I tried to minimize the logic changes in add_path(), so it just treated parameterization as yet another dimension of comparison for paths. We later made it ignore pathkeys (sort ordering) of parameterized paths, on the grounds that ordering isn't a useful property for the path on the inside of a nestloop, so we might as well get rid of useless parameterized paths as quickly as possible. But we didn't take that reasoning as far as we should have. Startup cost isn't a useful property inside a nestloop either, so add_path() ought to discount startup cost of parameterized paths as well. Having done that, the secondary sorting I'd implemented (in add_parameterized_path) is no longer needed --- any parameterized path that survives add_path() at all is worth considering at higher levels. So this should be a bit faster as well as simpler.	2012-08-29 22:06:07 -04:00
Alvaro Herrera	21c09e99dc	Split heapam_xlog.h from heapam.h The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.	2012-08-28 19:02:00 -04:00
Alvaro Herrera	fda0594fc2	remove catcache.h from syscache.h Instead, place a forward struct declaration for struct catclist in syscache.h. This reduces header proliferation somewhat.	2012-08-28 18:36:39 -04:00
Alvaro Herrera	45326c5a11	Split resowner.h This lets files that are mere users of ResourceOwner not automatically include the headers for stuff that is managed by the resowner mechanism.	2012-08-28 18:02:07 -04:00
Alvaro Herrera	095e6c5a7d	syncrep.h must include xlogdefs.h	2012-08-28 09:46:08 -04:00
Heikki Linnakangas	918eee0c49	Collect and use histograms of lower and upper bounds for range types. This enables selectivity estimation of the <<, >>, &<, &> and && operators, as well as the normal inequality operators: <, <=, >=, >. "range @> element" is also supported, but the range-variant @> and <@ operators are not, because they cannot be sensibly estimated with lower and upper bound histograms alone. We would need to make some assumption about the lengths of the ranges for that. Alexander's patch included a separate histogram of lengths for that, but I left that out of the patch for simplicity. Hopefully that will be added as a followup patch. The fraction of empty ranges is also calculated and used in estimation. Alexander Korotkov, heavily modified by me.	2012-08-27 15:58:46 +03:00
Tom Lane	9ff79b9d4e	Fix up planner infrastructure to support LATERAL properly. This patch takes care of a number of problems having to do with failure to choose valid join orders and incorrect handling of lateral references pulled up from subqueries. Notable changes: * Add a LateralJoinInfo data structure similar to SpecialJoinInfo, to represent join ordering constraints created by lateral references. (I first considered extending the SpecialJoinInfo structure, but the semantics are different enough that a separate data structure seems better.) Extend join_is_legal() and related functions to prevent trying to form unworkable joins, and to ensure that we will consider joins that satisfy lateral references even if the joins would be clauseless. * Fill in the infrastructure needed for the last few types of relation scan paths to support parameterization. We'd have wanted this eventually anyway, but it is necessary now because a relation that gets pulled up out of a UNION ALL subquery may acquire a reltargetlist containing lateral references, meaning that its paths have to be parameterized whether or not we have any code that can push join quals down into the scan. * Compute data about lateral references early in query_planner(), and save in RelOptInfo nodes, to avoid repetitive calculations later. * Assorted corner-case bug fixes. There's probably still some bugs left, but this is a lot closer to being real than it was before.	2012-08-26 22:50:23 -04:00
Peter Eisentraut	5c45d2f878	Mark DateTimeParseError() noreturn This avoids a warning from clang 3.2 about an uninitialized variable 'dtype' in date_in().	2012-08-21 23:32:58 -04:00
Peter Eisentraut	71450d7fd6	Teach compiler that ereport(>=ERROR) does not return When elevel >= ERROR, we add an abort() call to the ereport() macro to give the compiler a hint that the ereport() expansion will not return, but the abort() isn't actually reached because the longjmp happens in errfinish(). Because the effect of ereport() varies with the elevel, we cannot use standard compiler attributes such as noreturn for this.	2012-08-21 00:03:32 -04:00
Tom Lane	092d7ded29	Allow OLD and NEW in multi-row VALUES within rules. Now that we have LATERAL, it's fairly painless to allow this case, which was left as a TODO in the original multi-row VALUES implementation.	2012-08-19 14:12:16 -04:00
Tom Lane	470d0b9789	Check LIBXML_VERSION instead of testing in configure script. We had put a test for libxml2's xmlStructuredErrorContext variable in configure, but of course that doesn't work on Windows builds. The next best alternative seems to be to test the LIBXML_VERSION symbol provided by xmlversion.h. Per report from Talha Bin Rizwan, though this fixes it in a different way than his proposed patch.	2012-08-17 00:05:26 -04:00
Heikki Linnakangas	317dd55a9c	Add SP-GiST support for range types. The implementation is a quad-tree, largely copied from the quad-tree implementation for points. The lower and upper bound of ranges are the 2d coordinates, with some extra code to handle empty ranges. I left out the support for adjacent operator, -\|-, from the original patch. Not because there was necessarily anything wrong with it, but it was more complicated than the other operators, and I only have limited time for reviewing. That will follow as a separate patch. Alexander Korotkov, reviewed by Jeff Davis and me.	2012-08-16 14:30:45 +03:00
Heikki Linnakangas	89911b3ab8	Fix GiST buffering build bug, which caused "failed to re-find parent" errors. We use a hash table to track the parents of inner pages, but when inserting to a leaf page, the caller of gistbufferinginserttuples() must pass a correct block number of the leaf's parent page. Before gistProcessItup() descends to a child page, it checks if the downlink needs to be adjusted to accommodate the new tuple, and updates the downlink if necessary. However, updating the downlink might require splitting the page, which might move the downlink to a page to the right. gistProcessItup() doesn't realize that, so when it descends to the leaf page, it might pass an out-of-date parent block number as a result. Fix that by returning the block a tuple was inserted to from gistbufferinginserttuples(). This fixes the bug reported by Zdeněk Jílovec.	2012-08-16 12:56:24 +03:00
Tom Lane	c1774d2c81	More fixes for planner's handling of LATERAL. Re-allow subquery pullup for LATERAL subqueries, except when the subquery is below an outer join and contains lateral references to relations outside that outer join. If we pull up in such a case, we risk introducing lateral cross-references into outer joins' ON quals, which is something the code is entirely unprepared to cope with right now; and I'm not sure it'll ever be worth coping with. Support lateral refs in VALUES (this seems to be the only additional path type that needs such support as a consequence of re-allowing subquery pullup). Put in a slightly hacky fix for joinpath.c's refusal to consider parameterized join paths even when there cannot be any unparameterized ones. This was causing "could not devise a query plan for the given query" failures in queries involving more than two FROM items. Put in an even more hacky fix for distribute_qual_to_rels() being unhappy with join quals that contain references to rels outside their syntactic scope; which is to say, disable that test altogether. Need to think about how to preserve some sort of debugging cross-check here, while not expending more cycles than befits a debugging cross-check.	2012-08-12 16:01:26 -04:00
Tom Lane	b53800355f	Fix dependencies generated during ALTER TABLE ADD CONSTRAINT USING INDEX. This command generated new pg_depend entries linking the index to the constraint and the constraint to the table, which match the entries made when a unique or primary key constraint is built de novo. However, it did not bother to get rid of the entries linking the index directly to the table. We had considered the issue when the ADD CONSTRAINT USING INDEX patch was written, and concluded that we didn't need to get rid of the extra entries. But this is wrong: ALTER COLUMN TYPE wasn't expecting such redundant dependencies to exist, as reported by Hubert Depesz Lubaczewski. On reflection it seems rather likely to break other things as well, since there are many bits of code that crawl pg_depend for one purpose or another, and most of them are pretty naive about what relationships they're expecting to find. Fortunately it's not that hard to get rid of the extra dependency entries, so let's do that. Back-patch to 9.1, where ALTER TABLE ADD CONSTRAINT USING INDEX was added.	2012-08-11 12:51:24 -04:00
Tom Lane	c9b0cbe98b	Support having multiple Unix-domain sockets per postmaster. Replace unix_socket_directory with unix_socket_directories, which is a list of socket directories, and adjust postmaster's code to allow zero or more Unix-domain sockets to be created. This is mostly a straightforward change, but since the Unix sockets ought to be created after the TCP/IP sockets for safety reasons (better chance of detecting a port number conflict), AddToDataDirLockFile needs to be fixed to support out-of-order updates of data directory lockfile lines. That's a change that had been foreseen to be necessary someday anyway. Honza Horak, reviewed and revised by Tom Lane	2012-08-10 17:27:15 -04:00
Tom Lane	eaccfded98	Centralize the logic for detecting misplaced aggregates, window funcs, etc. Formerly we relied on checking after-the-fact to see if an expression contained aggregates, window functions, or sub-selects when it shouldn't. This is grotty, easily forgotten (indeed, we had forgotten to teach DefineIndex about rejecting window functions), and none too efficient since it requires extra traversals of the parse tree. To improve matters, define an enum type that classifies all SQL sub-expressions, store it in ParseState to show what kind of expression we are currently parsing, and make transformAggregateCall, transformWindowFuncCall, and transformSubLink check the expression type and throw error if the type indicates the construct is disallowed. This allows removal of a large number of ad-hoc checks scattered around the code base. The enum type is sufficiently fine-grained that we can still produce error messages of at least the same specificity as before. Bringing these error checks together revealed that we'd been none too consistent about phrasing of the error messages, so standardize the wording a bit. Also, rewrite checking of aggregate arguments so that it requires only one traversal of the arguments, rather than up to three as before. In passing, clean up some more comments left over from add_missing_from support, and annotate some tests that I think are dead code now that that's gone. (I didn't risk actually removing said dead code, though.)	2012-08-10 11:36:15 -04:00
Simon Riggs	da4efa13d8	Turn off WalSender keepalives by default, users can enable if desired	2012-08-09 17:07:03 +01:00
Simon Riggs	87d8bd7c9f	Ensure all replication message info is available and correct via WalRcv	2012-08-09 17:03:59 +01:00
Tom Lane	f630157496	Merge parser's p_relnamespace and p_varnamespace lists into a single list. Now that we are storing structs in these lists, the distinction between the two lists can be represented with a couple of extra flags while using only a single list. This simplifies the code and should save a little bit of palloc traffic, since the majority of RTEs are represented in both lists anyway.	2012-08-08 16:41:31 -04:00
Tom Lane	5ebaaa4944	Implement SQL-standard LATERAL subqueries. This patch implements the standard syntax of LATERAL attached to a sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM, since set-returning function calls are expected to be one of the principal use-cases. The main change here is a rewrite of the mechanism for keeping track of which relations are visible for column references while the FROM clause is being scanned. The parser "namespace" lists are no longer lists of bare RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE pointer as well as some visibility-controlling flags. Aside from supporting LATERAL correctly, this lets us get rid of the ancient hacks that required rechecking subqueries and JOIN/ON and function-in-FROM expressions for invalid references after they were initially parsed. Invalid column references are now always correctly detected on sight. In passing, remove assorted parser error checks that are now dead code by virtue of our having gotten rid of add_missing_from, as well as some comments that are obsolete for the same reason. (It was mainly add_missing_from that caused so much fudging here in the first place.) The planner support for this feature is very minimal, and will be improved in future patches. It works well enough for testing purposes, though. catversion bump forced due to new field in RangeTblEntry.	2012-08-07 19:02:54 -04:00
Tom Lane	962e0cc71e	Fix race conditions associated with SPGiST redirection tuples. The correct test for whether a redirection tuple is removable is whether tuple's xid < RecentGlobalXmin, not OldestXmin; the previous coding failed to protect index searches being done in concurrent transactions that have no XID. This mirrors the recent fix in btree's page recycling logic made in commit `d3abbbebe5`. Also, WAL-log the newest XID of any removed redirection tuple on an index page, and apply ResolveRecoveryConflictWithSnapshot during InHotStandby WAL replay. This protects against concurrent Hot Standby transactions possibly needing to see the redirection tuple(s). Per my query of 2012-03-12 and subsequent discussion.	2012-08-02 15:34:14 -04:00
Tom Lane	f6ce81f55a	Fix WITH attached to a nested set operation (UNION/INTERSECT/EXCEPT). Parse analysis neglected to cover the case of a WITH clause attached to an intermediate-level set operation; it only handled WITH at the top level or WITH attached to a leaf-level SELECT. Per report from Adam Mackler. In HEAD, I rearranged the order of SelectStmt's fields to put withClause with the other fields that can appear on non-leaf SelectStmts. In back branches, leave it alone to avoid a possible ABI break for third-party code. Back-patch to 8.4 where WITH support was added.	2012-07-31 17:56:21 -04:00
Tom Lane	31c7c642b6	Account for SRFs in targetlists in planner rowcount estimates. We made use of the ROWS estimate for set-returning functions used in FROM, but not for those used in SELECT targetlists; which is a bit of an oversight considering there are common usages that require the latter approach. Improve that. (I had initially thought it might be worth folding this into cost_qual_eval, but after investigation concluded that that wouldn't be very helpful, so just do it separately.) Per complaint from David Johnston. Back-patch to 9.2, but not further, for fear of destabilizing plan choices in existing releases.	2012-07-21 17:45:07 -04:00
Alvaro Herrera	f5bcd398ad	connoinherit may be true only for CHECK constraints The code was setting it true for other constraints, which is bogus. Doing so caused bogus catalog entries for such constraints, and in particular caused an error to be raised when trying to drop a constraint of types other than CHECK from a table that has children, such as reported in bug #6712. In 9.2, additionally ignore connoinherit=true for other constraint types, to avoid having to force initdb; existing databases might already contain bogus catalog entries. Includes a catversion bump (in HEAD only). Bug report from Miroslav Šulc Analysis from Amit Kapila and Noah Misch; Amit also contributed the patch.	2012-07-20 14:08:07 -04:00
Tom Lane	8e617e29aa	Fix whole-row Var evaluation to cope with resjunk columns (again). When a whole-row Var is reading the result of a subquery, we need it to ignore any "resjunk" columns that the subquery might have evaluated for GROUP BY or ORDER BY purposes. We've hacked this area before, in commit `68e40998d0`, but that fix only covered whole-row Vars of named composite types, not those of RECORD type; and it was mighty klugy anyway, since it just assumed without checking that any extra columns in the result must be resjunk. A proper fix requires getting hold of the subquery's targetlist so we can actually see which columns are resjunk (whereupon we can use a JunkFilter to get rid of them). So bite the bullet and add some infrastructure to make that possible. Per report from Andrew Dunstan and additional testing by Merlin Moncure. Back-patch to all supported branches. In 8.3, also back-patch commit `292176a118`, which for some reason I had not done at the time, but it's a prerequisite for this change.	2012-07-20 13:10:58 -04:00
Robert Haas	3a0e4d36eb	Make new event trigger facility actually do something. Commit `3855968f32` added syntax, pg_dump, psql support, and documentation, but the triggers didn't actually fire. With this commit, they now do. This is still a pretty basic facility overall because event triggers do not get a whole lot of information about what the user is trying to do unless you write them in C; and there's still no option to fire them anywhere except at the very beginning of the execution sequence, but it's better than nothing, and a good building block for future work. Along the way, add a regression test for ALTER LARGE OBJECT, since testing of event triggers reveals that we haven't got one. Dimitri Fontaine and Robert Haas	2012-07-20 11:39:01 -04:00
Heikki Linnakangas	a7a4add6c4	Refactor the way code is shared between some range type functions. Functions like range_eq, range_before etc. are exposed at the SQL-level, but they're also used internally by the GiST consistent support function. The code sharing was done by a hack, TrickFunctionCall2, which relied on the knowledge that all the functions used fn_extra the same way. This commit splits the functions into internal versions that take a TypeCacheEntry as argument, and thin wrappers to expose the functions at the SQL-level. The internal versions can then be called directly and in a less hacky way from the GiST consistent function. This is just cosmetic, but backpatch to 9.2 anyway, to avoid having a different version of this code in the 9.2 branch. That would make backpatching fixes in this area more difficult. Alexander Korotkov	2012-07-18 23:14:56 +03:00
Tom Lane	4a9c30a8a1	Fix management of pendingOpsTable in auxiliary processes. mdinit() was misusing IsBootstrapProcessingMode() to decide whether to create an fsync pending-operations table in the current process. This led to creating a table not only in the startup and checkpointer processes as intended, but also in the bgwriter process, not to mention other auxiliary processes such as walwriter and walreceiver. Creation of the table in the bgwriter is fatal, because it absorbs fsync requests that should have gone to the checkpointer; instead they just sit in bgwriter local memory and are never acted on. So writes performed by the bgwriter were not being fsync'd which could result in data loss after an OS crash. I think there is no live bug with respect to walwriter and walreceiver because those never perform any writes of shared buffers; but the potential is there for future breakage in those processes too. To fix, make AuxiliaryProcessMain() export the current process's AuxProcType as a global variable, and then make mdinit() test directly for the types of aux process that should have a pendingOpsTable. Having done that, we might as well also get rid of the random bool flags such as am_walreceiver that some of the aux processes had grown. (Note that we could not have fixed the bug by examining those variables in mdinit(), because it's called from BaseInit() which is run by AuxiliaryProcessMain() before entering any of the process-type-specific code.) Back-patch to 9.2, where the problem was introduced by the split-up of bgwriter and checkpointer processes. The bogus pendingOpsTable exists in walwriter and walreceiver processes in earlier branches, but absent any evidence that it causes actual problems there, I'll leave the older branches alone.	2012-07-18 15:28:10 -04:00
Robert Haas	3855968f32	Syntax support and documentation for event triggers. They don't actually do anything yet; that will get fixed in a follow-on commit. But this gets the basic infrastructure in place, including CREATE/ALTER/DROP EVENT TRIGGER; support for COMMENT, SECURITY LABEL, and ALTER EXTENSION .. ADD/DROP EVENT TRIGGER; pg_dump and psql support; and documentation for the anticipated initial feature set. Dimitri Fontaine, with review and a bunch of additional hacking by me. Thom Brown extensively reviewed earlier versions of this patch set, but there's not a whole lot of that code left in this commit, as it turns out.	2012-07-18 10:16:16 -04:00
Tom Lane	73b796a52c	Improve coding around the fsync request queue. In all branches back to 8.3, this patch fixes a questionable assumption in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are no uninitialized pad bytes in the request queue structs. This would only cause trouble if (a) there were such pad bytes, which could happen in 8.4 and up if the compiler makes enum ForkNumber narrower than 32 bits, but otherwise would require not-currently-planned changes in the widths of other typedefs; and (b) the kernel has not uniformly initialized the contents of shared memory to zeroes. Still, it seems a tad risky, and we can easily remove any risk by pre-zeroing the request array for ourselves. In addition to that, we need to establish a coding rule that struct RelFileNode can't contain any padding bytes, since such structs are copied into the request array verbatim. (There are other places that are assuming this anyway, it turns out.) In 9.1 and up, the risk was a bit larger because we were also effectively assuming that struct RelFileNodeBackend contained no pad bytes, and with fields of different types in there, that would be much easier to break. However, there is no good reason to ever transmit fsync or delete requests for temp files to the bgwriter/checkpointer, so we can revert the request structs to plain RelFileNode, getting rid of the padding risk and saving some marginal number of bytes and cycles in fsync queue manipulation while we are at it. The savings might be more than marginal during deletion of a temp relation, because the old code transmitted an entirely useless but nonetheless expensive-to-process ForgetRelationFsync request to the background process, and also had the background process perform the file deletion even though that can safely be done immediately. In addition, make some cleanup of nearby comments and small improvements to the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.	2012-07-17 16:56:54 -04:00
Alvaro Herrera	f34c68f096	Introduce timeout handling framework Management of timeouts was getting a little cumbersome; what we originally had was more than enough back when we were only concerned about deadlocks and query cancel; however, when we added timeouts for standby processes, the code got considerably messier. Since there are plans to add more complex timeouts, this seems a good time to introduce a central timeout handling module. External modules register their timeout handlers during process initialization, and later enable and disable them as they see fit using a simple API; timeout.c is in charge of keeping track of which timeouts are in effect at any time, installing a common SIGALRM signal handler, and calling setitimer() as appropriate to ensure timely firing of external handlers. timeout.c additionally supports pluggable modules to add their own timeouts, though this capability isn't exercised anywhere yet. Additionally, as of this commit, walsender processes are aware of timeouts; we had a preexisting bug there that made those ignore SIGALRM, thus being subject to unhandled deadlocks, particularly during the authentication phase. This has already been fixed in back branches in commit `0bf8eb2a`, which see for more details. Main author: Zoltán Böszörményi Some review and cleanup by Álvaro Herrera Extensive reworking by Tom Lane	2012-07-16 22:55:33 -04:00
Tom Lane	c92be3c059	Avoid pre-determining index names during CREATE TABLE LIKE parsing. Formerly, when trying to copy both indexes and comments, CREATE TABLE LIKE had to pre-assign names to indexes that had comments, because it made up an explicit CommentStmt command to apply the comment and so it had to know the name for the index. This creates bad interactions with other indexes, as shown in bug #6734 from Daniele Varrazzo: the preassignment logic couldn't take any other indexes into account so it could choose a conflicting name. To fix, add a field to IndexStmt that allows it to carry a comment to be assigned to the new index. (This isn't a user-exposed feature of CREATE INDEX, only an internal option.) Now we don't need preassignment of index names in any situation. I also took the opportunity to refactor DefineIndex to accept the IndexStmt as such, rather than passing all its fields individually in a mile-long parameter list. Back-patch to 9.2, but no further, because it seems too dangerous to change IndexStmt or DefineIndex's API in released branches. The bug exists back to 9.0 where CREATE TABLE LIKE grew the ability to copy comments, but given the lack of prior complaints we'll just let it go unfixed before 9.2.	2012-07-16 13:25:18 -04:00
Tom Lane	b966dd6c42	Add fsync capability to initdb, and use sync_file_range() if available. Historically we have not worried about fsync'ing anything during initdb (in fact, initdb intentionally passes -F to each backend launch to prevent it from fsync'ing). But with filesystems getting more aggressive about caching data, that's not such a good plan anymore. Make initdb do a pass over the finished data directory tree to fsync everything. For testing purposes, the -N/--nosync flag can be used to restore the old behavior. Also, testing shows that on Linux, sync_file_range() is much faster than posix_fadvise() for hinting to the kernel that an fsync is coming, apparently because the latter blocks on a rather small request queue while the former doesn't. So use this function if available in initdb, and also in the backend's pg_flush_data() (where it currently will affect only the speed of CREATE DATABASE's cloning step). We will later make pg_regress invoke initdb with the --nosync flag to avoid slowing down cases such as "make check" in contrib. But let's not do so until we've shaken out any portability issues in this patch. Jeff Davis, reviewed by Andres Freund	2012-07-13 17:16:58 -04:00
Tom Lane	84a42560c8	Add array_remove() and array_replace() functions. These functions support removing or replacing array element value(s) matching a given search value. Although intended mainly to support a future array-foreign-key feature, they seem useful in their own right. Marco Nenciarini and Gabriele Bartolini, reviewed by Alex Hunsaker	2012-07-11 13:59:35 -04:00
Tom Lane	01215d61a7	Fix bogus macro definition. Per buildfarm complaints.	2012-07-10 22:36:11 -04:00
Tatsuo Ishii	1c7a7faa5b	Add comments about additional mule-internal charsets from emacs's source code(lisp/international/mule-conf.el). These charsets have not been supported up to now anyway, so this is just for adding commentary. Also add mention that we follow emacs's implementation, not xemacs's.	2012-07-11 08:10:50 +09:00
Tom Lane	628cbb50ba	Re-implement extraction of fixed prefixes from regular expressions. To generate btree-indexable conditions from regex WHERE conditions (such as WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed prefix that a regex might have; that is, find any string that must be a prefix of all strings satisfying the regex. We used to do that with entirely ad-hoc code that looked at the source text of the regex. It didn't know very much about regex syntax, which mostly meant that it would fail to identify some optimizable cases; but Viktor Rosenfeld reported that it would produce actively wrong answers for quantified parenthesized subexpressions, such as '^(foo)?bar'. Rather than trying to extend the ad-hoc code to cover this, let's get rid of it altogether in favor of identifying prefixes by examining the compiled form of a regex. To do this, I've added a new entry point "pg_regprefix" to the regex library; hopefully it is defined in a sufficiently general fashion that it can remain in the library when/if that code gets split out as a standalone project. Since this bug has been there for a very long time, this fix needs to get back-patched. However it depends on some other recent commits (particularly the addition of wchar-to-database-encoding conversion), so I'll commit this separately and then go to work on back-porting the necessary fixes.	2012-07-10 14:54:37 -04:00
Tom Lane	00dac6000d	Refactor pattern_fixed_prefix() to avoid dealing in incomplete patterns. Previously, pattern_fixed_prefix() was defined to return whatever fixed prefix it could extract from the pattern, plus the "rest" of the pattern. That definition was sensible for LIKE patterns, but not so much for regexes, where reconstituting a valid pattern minus the prefix could be quite tricky (certainly the existing code wasn't doing that correctly). Since the only thing that callers ever did with the "rest" of the pattern was to pass it to like_selectivity() or regex_selectivity(), let's cut out the middle-man and just have pattern_fixed_prefix's subroutines do this directly. Then pattern_fixed_prefix can return a simple selectivity number, and the question of how to cope with partial patterns is removed from its API specification. While at it, adjust the API spec so that callers who don't actually care about the pattern's selectivity (which is a lot of them) can pass NULL for the selectivity pointer to skip doing the work of computing a selectivity estimate. This patch is only an API refactoring that doesn't actually change any processing, other than allowing a little bit of useless work to be skipped. However, it's necessary infrastructure for my upcoming fix to regex prefix extraction, because after that change there won't be any simple way to identify the "rest" of the regex, not even to the low level of fidelity needed by regex_selectivity. We can cope with that if regex_fixed_prefix and regex_selectivity communicate directly, but not if we have to work within the old API. Hence, back-patch to all active branches.	2012-07-09 23:22:55 -04:00
Tom Lane	e7ef6d7e24	Fix planner to pass correct collation to operator selectivity estimators. We can do this without creating an API break for estimation functions by passing the collation using the existing fmgr functionality for passing an input collation as a hidden parameter. The need for this was foreseen at the outset, but we didn't get around to making it happen in 9.1 because of the decision to sort all pg_statistic histograms according to the database's default collation. That meant that selectivity estimators generally need to use the default collation too, even if they're estimating for an operator that will do something different. The reason it's suddenly become more interesting is that regexp interpretation also uses a collation (for its LC_TYPE not LC_COLLATE property), and we no longer want to use the wrong collation when examining regexps during planning. It's not that the selectivity estimate is likely to change much from this; rather that we are thinking of caching compiled regexps during planner estimation, and we won't get the intended benefit if we cache them with a different collation than the executor will use. Back-patch to 9.1, both because the regexp change is likely to get back-patched and because we might as well get this right in all collation-supporting branches, in case any third-party code wants to rely on getting the collation. The patch turns out to be minuscule now that I've done it ...	2012-07-08 23:51:08 -04:00
Tom Lane	c6aae3042b	Simplify and document regex library's compact-NFA representation. The previous coding abused the first element of a cNFA state's arcs list to hold a per-state flag bit, which was confusing, undocumented, and not even particularly efficient. Get rid of that in favor of a separate "stflags" vector. Since there's only one bit in use, I chose to allocate a char per state; we could possibly replace this with a bitmap at some point, but that would make accesses a little slower. It's already about 8X smaller than before, so let's not get overly tense. Also document the representation better than it was before, which is to say not at all. This patch is a byproduct of investigations towards extracting a "fixed prefix" string from the compact-NFA representation of regex patterns. Might need to back-patch it if we decide to back-patch that fix, but for now it's just code cleanup so I'll just put it in HEAD.	2012-07-07 17:39:50 -04:00
Tom Lane	fc548b2296	Remove support for using wait3() in place of waitpid(). All Unix-oid platforms that we currently support should have waitpid(), since it's in V2 of the Single Unix Spec. Our git history shows that the wait3 code was added to support NextStep, which we officially dropped support for as of 9.2. So get rid of the configure test, and simplify the macro spaghetti in reaper(). Per suggestion from Fujii Masao.	2012-07-05 14:00:40 -04:00
Robert Haas	72dd6291f2	Add wchar -> mb conversion routines. This is infrastructure for Alexander Korotkov's work on indexing regular expression searches. Alexander Korotkov, with a bit of further hackery on the MULE conversion by me	2012-07-04 17:10:10 -04:00
Tom Lane	09022de1f5	Improve documentation about MULE encoding. This commit improves the comments in pg_wchar.h and creates #define symbols for some formerly hard-coded values. No substantive code changes. Tatsuo Ishii and Tom Lane	2012-07-04 00:29:57 -04:00
Alvaro Herrera	0c7b9dc7d0	Have REASSIGN OWNED work on extensions, too Per bug #6593, REASSIGN OWNED fails when the affected role has created an extension. Even though the user related to the extension is not nominally the owner, its OID appears on pg_shdepend and thus causes problems when the user is to be dropped. This commit adds code to change the "ownership" of the extension itself, not of the contained objects. This is fine because it's currently only called from REASSIGN OWNED, which would also modify the ownership of the contained objects. However, this is not sufficient for a working ALTER OWNER implementation extension. Back-patch to 9.1, where extensions were introduced. Bug #6593 reported by Emiliano Leporati.	2012-07-03 15:09:59 -04:00
Robert Haas	f83b59997d	Make walsender more responsive. Per testing by Andres Freund, this improves replication performance and reduces replication latency and latency jitter. I was a bit concerned about moving more work into XLogInsert, but testing seems to show that it's not a problem in practice. Along the way, improve comments for WaitLatchOrSocket. Andres Freund. Review and stylistic cleanup by me.	2012-07-02 09:41:01 -04:00
Tom Lane	541ffa65c3	Prevent CREATE TABLE LIKE/INHERITS from (mis) copying whole-row Vars. If a CHECK constraint or index definition contained a whole-row Var (that is, "table.*"), an attempt to copy that definition via CREATE TABLE LIKE or table inheritance produced incorrect results: the copied Var still claimed to have the rowtype of the source table, rather than the created table. For the LIKE case, it seems reasonable to just throw error for this situation, since the point of LIKE is that the new table is not permanently coupled to the old, so there's no reason to assume its rowtype will stay compatible. In the inheritance case, we should ideally allow such constraints, but doing so will require nontrivial refactoring of CREATE TABLE processing (because we'd need to know the OID of the new table's rowtype before we adjust inherited CHECK constraints). In view of the lack of previous complaints, that doesn't seem worth the risk in a back-patched bug fix, so just make it throw error for the inheritance case as well. Along the way, replace change_varattnos_of_a_node() with a more robust function map_variable_attnos(), which is capable of being extended to handle insertion of ConvertRowtypeExpr whenever we get around to fixing the inheritance case nicely, and in the meantime it returns a failure indication to the caller so that a helpful message with some context can be thrown. Also, this code will do the right thing with subselects (if we ever allow them in CHECK or indexes), and it range-checks varattnos before using them to index into the map array. Per report from Sergey Konoplev. Back-patch to all supported branches.	2012-06-30 16:45:14 -04:00
Robert Haas	b79ab00144	When LWLOCK_STATS is defined, count spindelays. When LWLOCK_STATS is not defined, the only change is that SpinLockAcquire now returns the number of delays. Patch by me, review by Jeff Janes.	2012-06-26 16:06:07 -04:00
Alvaro Herrera	77ed0c6950	Tighten up includes in sinvaladt.h, twophase.h, proc.h Remove proc.h from sinvaladt.h and twophase.h; also replace xlog.h in proc.h with xlogdefs.h.	2012-06-25 18:40:40 -04:00
Peter Eisentraut	eeece9e609	Unify calling conventions for postgres/postmaster sub-main functions There was a wild mix of calling conventions: Some were declared to return void and didn't return, some returned an int exit code, some claimed to return an exit code, which the callers checked, but actually never returned, and so on. Now all of these functions are declared to return void and decorated with attribute noreturn and don't return. That's easiest, and most code already worked that way.	2012-06-25 21:30:12 +03:00
Robert Haas	2dfa87bcb6	Remove sanity test in XRecOffIsValid. Commit `061e7efb1b` changed the rules for splitting xlog records across pages, but neglected to update this test. It's possible that there's some better action here than just removing the test completely, but this at least appears to get some of the things that are currently broken (like initdb on MacOS X) working again.	2012-06-25 12:14:43 -04:00
Peter Eisentraut	b8b2e3b2de	Replace int2/int4 in C code with int16/int32 The latter was already the dominant use, and it's preferable because in C the convention is that intXX means XX bits. Therefore, allowing mixed use of int2, int4, int8, int16, int32 is obviously confusing. Remove the typedefs for int2 and int4 for now. They don't seem to be widely used outside of the PostgreSQL source tree, and the few uses can probably be cleaned up by the time this ships.	2012-06-25 01:51:46 +03:00
Heikki Linnakangas	0687a26002	Use UINT64CONST for 64-bit integer constants. Peter Eisentraut advised me that UINT64CONST is the proper way to do that, not LL suffix.	2012-06-24 21:56:45 +03:00
Heikki Linnakangas	96ff85e2dd	Use LL suffix for 64-bit constants. Per warning from buildfarm member 'locust'. At least I think this what's making it upset.	2012-06-24 20:01:55 +03:00
Heikki Linnakangas	0ab9d1c4b3	Replace XLogRecPtr struct with a 64-bit integer. This simplifies code that needs to do arithmetic on XLogRecPtrs. To avoid changing on-disk format of data pages, the LSN on data pages is still stored in the old format. That should keep pg_upgrade happy. However, we have XLogRecPtrs embedded in the control file, and in the structs that are sent over the replication protocol, so this changes breaks compatibility of pg_basebackup and server. I didn't do anything about this in this patch, per discussion on -hackers, the right thing to do would to be to change the replication protocol to be architecture-independent, so that you could use a newer version of pg_receivexlog, for example, against an older server version.	2012-06-24 19:19:45 +03:00
Heikki Linnakangas	061e7efb1b	Allow WAL record header to be split across pages. This saves a few bytes of WAL space, but the real motivation is to make it predictable how much WAL space a record requires, as it no longer depends on whether we need to waste the last few bytes at end of WAL page because the header doesn't fit. The total length field of WAL record, xl_tot_len, is moved to the beginning of the WAL record header, so that it is still always found on the first page where a WAL record begins. Bump WAL version number again as this is an incompatible change.	2012-06-24 18:35:56 +03:00
Heikki Linnakangas	20ba5ca64c	Move WAL continuation record information to WAL page header. The continuation record only contained one field, xl_rem_len, so it makes things simpler to just include it in the WAL page header. This wastes four bytes on pages that don't begin with a continuation from previos page, plus four bytes on every page, because of padding. The motivation of this is to make it easier to calculate how much space a WAL record needs. Before this patch, it depended on how many page boundaries the record crosses. The motivation of that, in turn, is to separate the allocation of space in the WAL from the copying of the record data to the allocated space. Keeping the calculation of space required simple helps to keep the critical section of allocating the space from WAL short. But that's not included in this patch yet. Bump WAL version number again, as this is an incompatible change.	2012-06-24 18:35:30 +03:00
Heikki Linnakangas	dfda6ebaec	Don't waste the last segment of each 4GB logical log file. The comments claimed that wasting the last segment made it easier to do calculations with XLogRecPtrs, because you don't have problems representing last-byte-position-plus-1 that way. In my experience, however, it only made things more complicated, because the there was two ways to represent the boundary at the beginning of a logical log file: logid = n+1 and xrecoff = 0, or as xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were picky about which representation was used. Also, use a 64-bit segment number instead of the log/seg combination, to point to a certain WAL segment. We assume that all platforms have a working 64-bit integer type nowadays. This is an incompatible change in WAL format, so bumping WAL version number.	2012-06-24 18:35:29 +03:00
Tom Lane	d14241c2cf	Fix memory leak in ARRAY(SELECT ...) subqueries. Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which I think can only happen if the sub-select is embedded in a larger, correlated subquery) would leak memory for the duration of the query, due to not reclaiming the array generated in the previous execution. Per bug #6698 from Armando Miraglia. Diagnosis and fix idea by Heikki, patch itself by me. This has been like this all along, so back-patch to all supported versions.	2012-06-21 17:27:19 -04:00
Heikki Linnakangas	eeb6f37d89	Add a small cache of locks owned by a resource owner in ResourceOwner. This speeds up reassigning locks to the parent owner, when the transaction holds a lot of locks, but only a few of them belong to the current resource owner. This is particularly helps pg_dump when dumping a large number of objects. The cache can hold up to 15 locks in each resource owner. After that, the cache is marked as overflowed, and we fall back to the old method of scanning the whole local lock table. The tradeoff here is that the cache has to be scanned whenever a lock is released, so if the cache is too large, lock release becomes more expensive. 15 seems enough to cover pg_dump, and doesn't have much impact on lock release. Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.	2012-06-21 15:30:26 +03:00
Tom Lane	cfa0f4255b	Improve tests for whether we can skip queueing RI enforcement triggers. During an update of a PK row, we can skip firing the RI trigger if any old key value is NULL, because then the row could not have had any matching rows in the FK table. Conversely, during an update of an FK row, the outcome is determined if any new key value is NULL. In either case it becomes unnecessary to compare individual key values. This patch was inspired by discussion of Vik Reykja's patch to use IS NOT DISTINCT semantics for the key comparisons. In the event there is no need for that and so this patch looks nothing like his, but he should still get credit for having re-opened consideration of the trigger skip logic.	2012-06-19 20:07:33 -04:00
Tom Lane	f5297bdfe4	Refer to the default foreign key match style as MATCH SIMPLE internally. Previously we followed the SQL92 wording, "MATCH <unspecified>", but since SQL99 there's been a less awkward way to refer to the default style. In addition to the code changes, pg_constraint.confmatchtype now stores this match style as 's' (SIMPLE) rather than 'u' (UNSPECIFIED). This doesn't affect pg_dump or psql because they use pg_get_constraintdef() to reconstruct foreign key definitions. But other client-side code might examine that column directly, so this change will have to be marked as an incompatibility in the 9.3 release notes.	2012-06-17 20:16:44 -04:00
Tom Lane	9e18eacbdf	Fix stats collector to recover nicely when system clock goes backwards. Formerly, if the system clock went backwards, the stats collector would fail to update the stats file any more until the clock reading again exceeds whatever timestamp was last written into the stats file. Such glitches in the clock's behavior are not terribly unlikely on machines not using NTP. Such a scenario has been observed to cause regression test failures in the buildfarm, and it could have bad effects on the behavior of autovacuum, so it seems prudent to install some defenses. We could directly detect the clock going backwards by adding GetCurrentTimestamp calls in the stats collector's main loop, but that would hurt performance on platforms where GetCurrentTimestamp is expensive. To minimize the performance hit in normal cases, adopt a more complicated scheme wherein backends check for clock skew when reading the stats file, and if they see it, signal the stats collector by sending an extra stats inquiry message. The stats collector does an extra GetCurrentTimestamp only when it receives an inquiry with an apparently out-of-order timestamp. To avoid unnecessary GetCurrentTimestamp calls, expand the inquiry messages to carry the backend's current clock reading as well as its stats cutoff time. The latter, being intentionally slightly in-the-past, would trigger more clock rechecks than we need if it were used for this purpose. We might want to backpatch this change at some point, but let's let it shake out in the buildfarm for awhile first.	2012-06-17 17:11:49 -04:00
Peter Eisentraut	15b1918e7d	Improve reporting of permission errors for array types Because permissions are assigned to element types, not array types, complaining about permission denied on an array type would be misleading to users. So adjust the reporting to refer to the element type instead. In order not to duplicate the required logic in two dozen places, refactor the permission denied reporting for types a bit. pointed out by Yeb Havinga during the review of the type privilege feature	2012-06-15 22:55:03 +03:00
Robert Haas	68de499bda	New SQL functons pg_backup_in_progress() and pg_backup_start_time() Darold Gilles, reviewed by Gabriele Bartolini and others, rebased by Marco Nenciarini. Stylistic cleanup and OID fixes by me.	2012-06-14 13:25:43 -04:00
Robert Haas	6cd015bea3	Add new function log_newpage_buffer. When I implemented the ginbuildempty() function as part of implementing unlogged tables, I falsified the note in the header comment for log_newpage. Although we could fix that up by changing the comment, it seems cleaner to add a new function which is specifically intended to handle this case. So do that.	2012-06-14 10:11:16 -04:00
Robert Haas	a475c60367	Remove misplaced sanity check from heap_create(). Even when allow_system_table_mods is not set, we allow creation of any type of SQL object in pg_catalog, except for relations. And you can get relations into pg_catalog, too, by initially creating them in some other schema and then moving them with ALTER .. SET SCHEMA. So this restriction, which prevents relations (only) from being created in pg_catalog directly, is fairly pointless. If we need a safety mechanism for this, it should be placed further upstream, so that it affects all SQL objects uniformly, and picks up both CREATE and SET SCHEMA. For now, just rip it out, per discussion with Tom Lane.	2012-06-14 09:58:53 -04:00
Robert Haas	d2c86a1ccd	Remove RELKIND_UNCATALOGED. This may have been important at some point in the past, but it no longer does anything useful. Review by Tom Lane.	2012-06-14 09:47:30 -04:00
Tom Lane	bed88fceac	Stamp HEAD as 9.3devel. Let the hacking begin ...	2012-06-13 20:03:02 -04:00
Bruce Momjian	927d61eeff	Run pgindent on 9.2 source tree in preparation for first 9.3 commit-fest.	2012-06-10 15:20:04 -04:00
Peter Eisentraut	8570114dc1	Make include files work without having to include other ones first	2012-06-10 12:46:14 +03:00
Tom Lane	ece01aae47	Scan the buffer pool just once, not once per fork, during relation drop. This provides a speedup of about 4X when NBuffers is large enough. There is also a useful reduction in sinval traffic, since we only do CacheInvalidateSmgr() once not once per fork. Simon Riggs, reviewed and somewhat revised by Tom Lane	2012-06-07 17:43:11 -04:00
Simon Riggs	055c352abb	After any checkpoint, close all smgr files handles in bgwriter	2012-06-01 09:24:53 +01:00
Tom Lane	4bec93ac0f	Stamp 9.2beta2.	2012-05-31 19:16:55 -04:00
Tom Lane	ad0009e7be	Force PL and range-type support functions to be owned by a superuser. We allow non-superusers to create procedural languages (with restrictions) and range datatypes. Previously, the automatically-created support functions for these objects ended up owned by the creating user. This represents a rather considerable security hazard, because the owning user might be able to alter a support function's definition in such a way as to crash the server, inject trojan-horse SQL code, or even execute arbitrary C code directly. It appears that right now the only actually exploitable problem is the infinite-recursion bug fixed in the previous patch for CVE-2012-2655. However, it's not hard to imagine that future additions of more ALTER FUNCTION capability might unintentionally open up new hazards. To forestall future problems, cause these support functions to be owned by the bootstrap superuser, not the user creating the parent object.	2012-05-30 23:47:57 -04:00
Tom Lane	cd0ff9c0f4	Expand the allowed range of timezone offsets to +/-15:59:59 from Greenwich. We used to only allow offsets less than +/-13 hours, then it was +/14, then it was +/-15. That's still not good enough though, as per today's bug report from Patric Bechtel. This time I actually looked through the Olson timezone database to find the largest offsets used anywhere. The winners are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at +15:13:42 until 1867. So we'd better allow offsets less than +/-16 hours. Given the history, we are way overdue to have some greppable #define symbols controlling this, so make some ... and also remove an obsolete comment that didn't get fixed the last time. Back-patch to all supported branches.	2012-05-30 19:58:35 -04:00
Heikki Linnakangas	d1996ed5e8	Change the way parent pages are tracked during buffered GiST build. We used to mimic the way a stack is constructed when descending the tree during normal GiST inserts, but that was quite complicated during a buffered build. It was also wrong: in GiST, the left-to-right relationships on different levels might not match each other, so that when you know the parent of a child page, you won't necessarily find the parent of the page to the right of the child page by following the rightlinks at the parent level. This sometimes led to "could not re-find parent" errors while building a GiST index. We now use a simple hash table to track the parent of every internal page. Whenever a page is split, and downlinks are moved from one page to another, we update the hash table accordingly. This is also better for performance than the old method, as we never need to move right to re-find the parent page, which could take a significant amount of time for buffers that were created much earlier in the index build.	2012-05-30 12:05:57 +03:00
Heikki Linnakangas	1d27dcf578	Fix bug in gistRelocateBuildBuffersOnSplit(). When we create a temporary copy of the old node buffer, in stack, we mustn't leak that into any of the long-lived data structures. Before this patch, when we called gistPopItupFromNodeBuffer(), it got added to the array of "loaded buffers". After gistRelocateBuildBuffersOnSplit() exits, the pointer added to the loaded buffers array points to garbage. Often that goes unnotied, because when we go through the array of loaded buffers to unload them, buffers with a NULL pageBuffer are ignored, which can often happen by accident even if the pointer points to garbage. This patch fixes that by marking the temporary copy in stack explicitly as temporary, and refrain from adding buffers marked as temporary to the array of loaded buffers. While we're at it, initialize nodeBuffer->pageBlocknum to InvalidBlockNumber and improve comments a bit. This isn't strictly necessary, but makes debugging easier.	2012-05-18 19:38:32 +03:00
Peter Eisentraut	be6d1c88a4	Change COLLATION keyword category It was changed from unreserved to reserved as part of the COLLATION FOR syntax, but it turns out that type_func_name_keyword is sufficient.	2012-05-16 20:19:44 +03:00
Tom Lane	f667747b6d	Put back AC_REQUIRE([AC_STRUCT_TM]). The BSD-ish members of the buildfarm all seem to think removing this was a bad idea. It looks to me like it resulted in omitting the system header inclusion necessary to detect the fields of struct tm correctly.	2012-05-14 23:06:48 -04:00
Peter Eisentraut	ff4628f37a	Remove unused AC_DEFINE symbols ENABLE_DTRACE unused as of `a7b7b07af3` HAVE_ERR_SET_MARK unused as of `4ed4b6c54e` HAVE_FCVT unused as of `4553e1d80f` HAVE_STRUCT_SOCKADDR_UN unused as of `b4cea00a1f` HAVE_SYSCONF unused as of `f83356c7f5` TM_IN_SYS_TIME never used, obsolescent per Autoconf documentation	2012-05-14 22:51:21 +03:00
Heikki Linnakangas	9e4637bf89	Update comments that became out-of-date with the PGXACT struct. When the "hot" members of PGPROC were split off to separate PGXACT structs, many PGPROC fields referred to in comments were moved to PGXACT, but the comments were neglected in the commit. Mostly this is just a search/replace of PGPROC with PGXACT, but the way the dummy PGPROC entries are created for prepared transactions changed more, making some of the comments totally bogus. Noah Misch	2012-05-14 10:28:55 +03:00
Peter Eisentraut	64f09ca386	Remove leftovers of BeOS port These should have been removed when the BeOS port was removed in `44f9021223`.	2012-05-14 04:50:39 +03:00
Robert Haas	1331cc6c1a	Prevent loss of init fork when truncating an unlogged table. Fixes bug #6635, reported by Akira Kurosawa.	2012-05-11 09:48:56 -04:00
Simon Riggs	b06679e012	Ensure age() returns a stable value rather than the latest value	2012-05-11 14:36:24 +01:00
Bruce Momjian	ee24de4001	Revert catalog bump; was post-beta1, and unnecessary.	2012-05-10 18:44:47 -04:00
Bruce Momjian	d2fe836cd2	Update comment for 'name' data type to say 63 "bytes". Catalog version bump so everyone has the same comment for beta1.	2012-05-10 18:40:40 -04:00
Tom Lane	f70fa835e0	Stamp 9.2beta1.	2012-05-10 18:35:09 -04:00
Heikki Linnakangas	60a3dffb72	Fix outdated comment. Multi-insert records observe XLOG_HEAP_INIT_PAGE flag too, as Andres Freund pointed out.	2012-05-10 09:55:48 +03:00
Tom Lane	6308ba05a7	Improve control logic for bgwriter hibernation mode. Commit `6d90eaaa89` added a hibernation mode to the bgwriter to reduce the server's idle-power consumption. However, its interaction with the detailed behavior of BgBufferSync's feedback control loop wasn't very well thought out. That control loop depends primarily on the rate of buffer allocation, not the rate of buffer dirtying, so the hibernation mode has to be designed to operate only when no new buffer allocations are happening. Also, the check for whether the system is effectively idle was not quite right and would fail to detect a constant low level of activity, thus allowing the bgwriter to go into hibernation mode in a way that would let the cycle time vary quite a bit, possibly further confusing the feedback loop. To fix, move the wakeup support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode unless no buffer allocations have happened recently. In addition, fix the delaying logic to remove the problem of possibly not responding to signals promptly, which was basically caused by trying to use the process latch's is_set flag for multiple purposes. I can't prove it but I'm suspicious that that hack was responsible for the intermittent "postmaster does not shut down" failures we've been seeing in the buildfarm lately. In any case it did nothing to improve the readability or robustness of the code. In passing, express the hibernation sleep time as a multiplier on BgWriterDelay, not a constant. I'm not sure whether there's any value in exposing the longer sleep time as an independently configurable setting, but we can at least make it act like this for little extra code.	2012-05-09 23:37:10 -04:00
Simon Riggs	8f28789bff	Rename BgWriterShmem/Request to CheckpointerShmem/Request	2012-05-09 14:23:45 +01:00
Simon Riggs	bbd3ec9dce	Rename BgWriterCommLock to CheckpointerCommLock	2012-05-09 14:11:48 +01:00
Tom Lane	acd4c7d58b	Fix an issue in recent walwriter hibernation patch. Users of asynchronous-commit mode expect there to be a guaranteed maximum delay before an async commit's WAL records get flushed to disk. The original version of the walwriter hibernation patch broke that. Add an extra shared-memory flag to allow async commits to kick the walwriter out of hibernation mode, without adding any noticeable overhead in cases where no action is needed.	2012-05-08 23:06:40 -04:00
Tom Lane	5461564a9d	Reduce idle power consumption of walwriter and checkpointer processes. This patch modifies the walwriter process so that, when it has not found anything useful to do for many consecutive wakeup cycles, it extends its sleep time to reduce the server's idle power consumption. It reverts to normal as soon as it's done any successful flushes. It's still true that during any async commit, backends check for completed, unflushed pages of WAL and signal the walwriter if there are any; so that in practice the walwriter can get awakened and returned to normal operation sooner than the sleep time might suggest. Also, improve the checkpointer so that it uses a latch and a computed delay time to not wake up at all except when it has something to do, replacing a previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for reducing the server's power consumption when idle. In passing, get rid of the dedicated latch for signaling the walwriter in favor of using its procLatch, since that comports better with possible generic signal handlers using that latch. Also, fix a pre-existing bug with failure to save/restore errno in walwriter's signal handlers. Peter Geoghegan, somewhat simplified by Tom	2012-05-08 20:03:26 -04:00
Peter Eisentraut	3284e03d5d	Remove strdup, strtol, strtoul from libpgport These should not be needed anymore, at least after the recent port removals. So let's see whether we can do without them.	2012-05-07 23:10:28 +03:00
Tom Lane	71b9549d05	Overdue code review for transaction-level advisory locks patch. Commit `62c7bd31c8` had assorted problems, most visibly that it broke PREPARE TRANSACTION in the presence of session-level advisory locks (which should be ignored by PREPARE), as per a recent complaint from Stephen Rees. More abstractly, the patch made the LockMethodData.transactional flag not merely useless but outright dangerous, because in point of fact that flag no longer tells you anything at all about whether a lock is held transactionally. This fix therefore removes that flag altogether. We now rely entirely on the convention already in use in lock.c that transactional lock holds must be owned by some ResourceOwner, while session holds are never so owned. Setting the locallock struct's owner link to NULL thus denotes a session hold, and there is no redundant marker for that. PREPARE TRANSACTION now works again when there are session-level advisory locks, and it is also able to transfer transactional advisory locks to the prepared transaction, but for implementation reasons it throws an error if we hold both types of lock on a single lockable object. Perhaps it will be worth improving that someday. Assorted other minor cleanup and documentation editing, as well. Back-patch to 9.1, except that in the 9.1 branch I did not remove the LockMethodData.transactional flag for fear of causing an ABI break for any external code that might be examining those structs.	2012-05-04 17:44:31 -04:00
Bruce Momjian	ebcaa5fcde	Remove BSD/OS (BSDi) port. There are no known users upgrading to Postgres 9.2, and perhaps no existing users either.	2012-05-03 10:58:44 -04:00
Robert Haas	8e0c5195df	Add missing parenthesis in comment.	2012-05-02 14:30:58 -04:00
Robert Haas	0038110421	Avoid repeated CLOG access from heap_hot_search_buffer. At the time we check whether the tuple is dead to all running transactions, we've already verified that it isn't visible to our scan, setting hint bits if appropriate. So there's no need to recheck CLOG for the all-dead test we do just a moment later. So, add HeapTupleIsSurelyDead() to test the appropriate condition under the assumption that all relevant hit bits are already set. Review by Tom Lane.	2012-05-02 12:40:07 -04:00
Robert Haas	e01e66f808	More duplicate word removal.	2012-05-02 09:28:16 -04:00
Heikki Linnakangas	f291ccd43e	Remove duplicate words in comments. Found these with grep -r "for for ".	2012-05-02 10:20:27 +03:00
Peter Eisentraut	f2f9439fbf	Remove dead ports Remove the following ports: - dgux - nextstep - sunos4 - svr4 - ultrix4 - univel These are obsolete and not worth rescuing. In most cases, there is circumstantial evidence that they wouldn't work anymore anyway.	2012-05-01 22:11:12 +03:00
Tom Lane	809e7e21af	Converge all SQL-level statistics timing values to float8 milliseconds. This patch adjusts the core statistics views to match the decision already taken for pg_stat_statements, that values representing elapsed time should be represented as float8 and measured in milliseconds. By using float8, we are no longer tied to a specific maximum precision of timing data. (Internally, it's still microseconds, but we could now change that without needing changes at the SQL level.) The columns affected are pg_stat_bgwriter.checkpoint_write_time pg_stat_bgwriter.checkpoint_sync_time pg_stat_database.blk_read_time pg_stat_database.blk_write_time pg_stat_user_functions.total_time pg_stat_user_functions.self_time pg_stat_xact_user_functions.total_time pg_stat_xact_user_functions.self_time The first four of these are new in 9.2, so there is no compatibility issue from changing them. The others require a release note comment that they are now double precision (and can show a fractional part) rather than bigint as before; also their underlying statistics functions now match the column definitions, instead of returning bigint microseconds.	2012-04-30 14:03:33 -04:00
Peter Eisentraut	26471a51fc	Mark ReThrowError() with attribute noreturn All related functions were already so marked.	2012-04-30 20:23:41 +03:00
Tom Lane	1dd89eadcd	Rename I/O timing statistics columns to blk_read_time and blk_write_time. This seems more consistent with the pre-existing choices for names of other statistics columns. Rename assorted internal identifiers to match.	2012-04-29 18:13:33 -04:00
Tom Lane	309c64745e	Rename track_iotiming GUC to track_io_timing. This spelling seems significantly more readable to me.	2012-04-29 16:23:54 -04:00
Peter Eisentraut	81107282a5	Change return type of ExceptionalCondition to void and mark it noreturn In ancient times, it was thought that this wouldn't work because of TrapMacro/AssertMacro, but changing those to use a comma operator appears to work without compiler warnings.	2012-04-29 21:20:14 +03:00
Robert Haas	3424bff90f	Prevent index-only scans from returning wrong answers under Hot Standby. The alternative of disallowing index-only scans in HS operation was discussed, but the consensus was that it was better to treat marking a page all-visible as a recovery conflict for snapshots that could still fail to see XIDs on that page. We may in the future try to soften this, so that we simply force index scans to do heap fetches in cases where this may be an issue, rather than throwing a hard conflict.	2012-04-26 20:00:21 -04:00
Tom Lane	9fa82c9809	Fix planner's handling of RETURNING lists in writable CTEs. setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists, which meant they were left with the wrong varnos when the RETURNING list was in a subquery. That was never possible before writable CTEs, of course, but now it's broken. The executor fails to notice any problem because ExecEvalVar just references the ecxt_scantuple for any normal varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a recent complaint from Bartosz Dmytrak. Since the eventual rtoffset of the subquery is not known at the time we are preparing its plan node, the previous scheme of executing set_returning_clause_references() at that time cannot handle this adjustment. Fortunately, it turns out that we don't really need to do it that way, because all the needed information is available during normal setrefs.c execution; we just have to dig it out of the ModifyTable node. So, do that, and get rid of the kluge of early setrefs processing of RETURNING lists. (This is a little bit of a cheat in the case of inherited UPDATE/DELETE, because we are not passing a "root" struct that corresponds exactly to what the subplan was built with. But that doesn't matter, and anyway this is less ugly than early setrefs processing was.) Back-patch to 9.1, where the problem became possible to hit.	2012-04-25 20:20:33 -04:00
Robert Haas	ca1e1a8da1	Remove prototype for nonexistent function.	2012-04-25 15:32:15 -04:00
Robert Haas	5d4b60f2f2	Lots of doc corrections. Josh Kupershmidt	2012-04-23 22:43:09 -04:00
Alvaro Herrera	09ff76fcdb	Recast "ONLY" column CHECK constraints as NO INHERIT The original syntax wasn't universally loved, and it didn't allow its usage in CREATE TABLE, only ALTER TABLE. It now works everywhere, and it also allows using ALTER TABLE ONLY to add an uninherited CHECK constraint, per discussion. The pg_constraint column has accordingly been renamed connoinherit. This commit partly reverts some of the changes in `61d81bd28d`, particularly some pg_dump and psql bits, because now pg_get_constraintdef includes the necessary NO INHERIT within the constraint definition. Author: Nikhil Sontakke Some tweaks by me	2012-04-20 23:56:57 -03:00
Tom Lane	5b7b5518d0	Revise parameterized-path mechanism to fix assorted issues. This patch adjusts the treatment of parameterized paths so that all paths with the same parameterization (same set of required outer rels) for the same relation will have the same rowcount estimate. We cache the rowcount estimates to ensure that property, and hopefully save a few cycles too. Doing this makes it practical for add_path_precheck to operate without a rowcount estimate: it need only assume that paths with different parameterizations never dominate each other, which is close enough to true anyway for coarse filtering, because normally a more-parameterized path should yield fewer rows thanks to having more join clauses to apply. In add_path, we do the full nine yards of comparing rowcount estimates along with everything else, so that we can discard parameterized paths that don't actually have an advantage. This fixes some issues I'd found with add_path rejecting parameterized paths on the grounds that they were more expensive than not-parameterized ones, even though they yielded many fewer rows and hence would be cheaper once subsequent joining was considered. To make the same-rowcounts assumption valid, we have to require that any parameterized path enforce all join clauses that could be obtained from the particular set of outer rels, even if not all of them are useful for indexing. This is required at both base scans and joins. It's a good thing anyway since the net impact is that join quals are checked at the lowest practical level in the join tree. Hence, discard the original rather ad-hoc mechanism for choosing parameterization joinquals, and build a better one that has a more principled rule for when clauses can be moved. The original rule was actually buggy anyway for lack of knowledge about which relations are part of an outer join's outer side; getting this right requires adding an outer_relids field to RestrictInfo.	2012-04-19 15:53:47 -04:00
Robert Haas	4a6fab03f2	Finish rename of FastPathStrongLocks to FastPathStrongRelationLocks. Commit `8e5ac74c12` tried to do this renaming, but I relied on gcc to tell me where I needed to make changes, instead of grep. Noted by Jeff Davis.	2012-04-18 11:29:34 -04:00
Robert Haas	53c5b869b4	Tighten up error recovery for fast-path locking. The previous code could cause a backend crash after BEGIN; SAVEPOINT a; LOCK TABLE foo (interrupted by ^C or statement timeout); ROLLBACK TO SAVEPOINT a; LOCK TABLE foo, and might have leaked strong-lock counts in other situations. Report by Zoltán Böszörményi; patch review by Jeff Davis.	2012-04-18 11:17:30 -04:00
Robert Haas	4a2d7ad76f	pg_size_pretty(numeric) The output of the new pg_xlog_location_diff function is of type numeric, since it could theoretically overflow an int8 due to signedness; this provides a convenient way to format such values. Fujii Masao, with some beautification by me.	2012-04-14 08:07:25 -04:00
Peter Eisentraut	c0cc526e8b	Rename bytea_agg to string_agg and add delimiter argument Per mailing list discussion, we would like to keep the bytea functions parallel to the text functions, so rename bytea_agg to string_agg, which already exists for text. Also, to satisfy the rule that we don't want aggregate functions of the same name with a different number of arguments, add a delimiter argument, just like string_agg for text already has.	2012-04-13 21:36:59 +03:00
Heikki Linnakangas	ef3883d130	Do stack-depth checking in all postmaster children. We used to only initialize the stack base pointer when starting up a regular backend, not in other processes. In particular, autovacuum workers can run arbitrary user code, and without stack-depth checking, infinite recursion in e.g an index expression will bring down the whole cluster. The comment about PL/Java using set_stack_base() is not yet true. As the code stands, PL/java still modifies the stack_base_ptr variable directly. However, it's been discussed in the PL/Java mailing list that it should be changed to use the function, because PL/Java is currently oblivious to the register stack used on Itanium. There's another issues with PL/Java, namely that the stack base pointer it sets is not really the base of the stack, it could be something close to the bottom of the stack. That's a separate issue that might need some further changes to this code, but that's a different story. Backpatch to all supported releases.	2012-04-08 19:07:55 +03:00
Tom Lane	cea49fe82f	Dept of second thoughts: improve the API for AnalyzeForeignTable. If we make the initially-called function return the table physical-size estimate, acquire_inherited_sample_rows will be able to use that to allocate numbers of samples among child tables, when the day comes that we want to support foreign tables in inheritance trees.	2012-04-06 16:04:10 -04:00
Tom Lane	263d9de66b	Allow statistics to be collected for foreign tables. ANALYZE now accepts foreign tables and allows the table's FDW to control how the sample rows are collected. (But only manual ANALYZEs will touch foreign tables, for the moment, since among other things it's not very clear how to handle remote permissions checks in an auto-analyze.) contrib/file_fdw is extended to support this. Etsuro Fujita, reviewed by Shigeru Hanada, some further tweaking by me.	2012-04-06 15:02:35 -04:00
Simon Riggs	8cb53654db	Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLock	2012-04-06 10:21:40 +01:00
Robert Haas	21cc529698	checkopint -> checkpoint Report by Guillaume Lelarge.	2012-04-05 21:37:33 -04:00
Robert Haas	b736aef2ec	Publish checkpoint timing information to pg_stat_bgwriter. Greg Smith, Peter Geoghegan, and Robert Haas	2012-04-05 14:04:37 -04:00
Robert Haas	644828908f	Expose track_iotiming data via the statistics collector. Ants Aasma's original patch to add timing information for buffer I/O requests exposed this data at the relation level, which was judged too costly. I've here exposed it at the database level instead.	2012-04-05 11:40:24 -04:00
Peter Eisentraut	38b9693fd9	Add support for renaming domain constraints	2012-04-03 08:11:51 +03:00
Tom Lane	5e83854d71	Add PGDLLIMPORT to ScanKeywords and NumScanKeywords. Per buildfarm, this is now needed by contrib/pg_stat_statements.	2012-03-31 10:56:21 -04:00
Heikki Linnakangas	5762a4d909	Inherit max_safe_fds to child processes in EXEC_BACKEND mode. Postmaster sets max_safe_fds by testing how many open file descriptors it can open, and that is normally inherited by all child processes at fork(). Not so on EXEC_BACKEND, ie. Windows, however. Because of that, we effectively ignored max_files_per_process on Windows, and always assumed a conservative default of 32 simultaneous open files. That could have an impact on performance, if you need to access a lot of different files in a query. After this patch, the value is passed to child processes by save/restore_backend_variables() among many other global variables. It has been like this forever, but given the lack of complaints about it, I'm not backpatching this.	2012-03-29 08:19:11 +03:00
Andrew Dunstan	d2c1740dc2	Remove now redundant pgpipe code.	2012-03-28 23:24:07 -04:00
Tom Lane	a40fa613b5	Add some infrastructure for contrib/pg_stat_statements. Add a queryId field to Query and PlannedStmt. This is not used by the core backend, except for being copied around at appropriate times. It's meant to allow plug-ins to track a particular query forward from parse analysis to execution. The queryId is intentionally not dumped into stored rules (and hence this commit doesn't bump catversion). You could argue that choice either way, but it seems better that stored rule strings not have any dependency on plug-ins that might or might not be present. Also, add a post_parse_analyze_hook that gets invoked at the end of parse analysis (but only for top-level analysis of complete queries, not cases such as analyzing a domain's default-value expression). This is mainly meant to be used to compute and assign a queryId, but it could have other applications. Peter Geoghegan	2012-03-27 15:17:40 -04:00
Robert Haas	40b9b95769	New GUC, track_iotiming, to track I/O timings. Currently, the only way to see the numbers this gathers is via EXPLAIN (ANALYZE, BUFFERS), but the plan is to add visibility through the stats collector and pg_stat_statements in subsequent patches. Ants Aasma, reviewed by Greg Smith, with some further changes by me.	2012-03-27 14:55:02 -04:00
Robert Haas	7386089d23	Code cleanup for heap_freeze_tuple. It used to be case that lazy vacuum could call this function with only a shared lock on the buffer, but neither lazy vacuum nor any other code path does that any more. Simplify the code accordingly and clean up some related, obsolete comments.	2012-03-26 11:03:06 -04:00
Tom Lane	c7cea267de	Replace empty locale name with implied value in CREATE DATABASE and initdb. setlocale() accepts locale name "" as meaning "the locale specified by the process's environment variables". Historically we've accepted that for Postgres' locale settings, too. However, it's fairly unsafe to store an empty string in a new database's pg_database.datcollate or datctype fields, because then the interpretation could vary across postmaster restarts, possibly resulting in index corruption and other unpleasantness. Instead, we should expand "" to whatever it means at the moment of calling CREATE DATABASE, which we can do by saving the value returned by setlocale(). For consistency, make initdb set up the initial lc_xxx parameter values the same way. initdb was already doing the right thing for empty locale names, but it did not replace non-empty names with setlocale results. On a platform where setlocale chooses to canonicalize the spellings of locale names, this would result in annoying inconsistency. (It seems that popular implementations of setlocale don't do such canonicalization, which is a pity, but the POSIX spec certainly allows it to be done.) The same risk of inconsistency leads me to not venture back-patching this, although it could certainly be seen as a longstanding bug. Per report from Jeff Davis, though this is not his proposed patch.	2012-03-25 21:47:22 -04:00
Tom Lane	8279eb4191	Fix planner's handling of outer PlaceHolderVars within subqueries. For some reason, in the original coding of the PlaceHolderVar mechanism I had supposed that PlaceHolderVars couldn't propagate into subqueries. That is of course entirely possible. When it happens, we need to treat an outer-level PlaceHolderVar much like an outer Var or Aggref, that is SS_replace_correlation_vars() needs to replace the PlaceHolderVar with a Param, and then when building the finished SubPlan we have to provide the PlaceHolderVar expression as an actual parameter for the SubPlan. The handling of the contained expression is a bit delicate but it can be treated exactly like an Aggref's expression. In addition to the missing logic in subselect.c, prepjointree.c was failing to search subqueries for PlaceHolderVars that need their relids adjusted during subquery pullup. It looks like everyplace else that touches PlaceHolderVars got it right, though. Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this oversight would fail with "ERROR: Upper-level PlaceHolderVar found where not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong answers, since the value transmitted into the subquery wouldn't go to null when it should.	2012-03-24 16:21:39 -04:00
Tom Lane	0339047bc9	Code review for protransform patches. Fix loss of previous expression-simplification work when a transform function fires: we must not simply revert to untransformed input tree. Instead build a dummy FuncExpr node to pass to the transform function. This has the additional advantage of providing a simpler, more uniform API for transform functions. Move documentation to a somewhat less buried spot, relocate some poorly-placed code, be more wary of null constants and invalid typmod values, add an opr_sanity check on protransform function signatures, and some other minor cosmetic adjustments. Note: although this patch touches pg_proc.h, no need for catversion bump, because the changes are cosmetic and don't actually change the intended catalog contents.	2012-03-23 17:29:57 -04:00
Peter Eisentraut	0e85abd658	Clean up compiler warnings from unused variables with asserts disabled For those variables only used when asserts are enabled, use a new macro PG_USED_FOR_ASSERTS_ONLY, which expands to __attribute__((unused)) when asserts are not enabled.	2012-03-21 23:33:10 +02:00
Tom Lane	9dbf2b7d75	Restructure SELECT INTO's parsetree representation into CreateTableAsStmt. Making this operation look like a utility statement seems generally a good idea, and particularly so in light of the desire to provide command triggers for utility statements. The original choice of representing it as SELECT with an IntoClause appendage had metastasized into rather a lot of places, unfortunately, so that this patch is a great deal more complicated than one might at first expect. In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS subcommands required restructuring some EXPLAIN-related APIs. Add-on code that calls ExplainOnePlan or ExplainOneUtility, or uses ExplainOneQuery_hook, will need adjustment. Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO, which formerly were accepted though undocumented, are no longer accepted. The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE. The CREATE RULE case doesn't seem to have much real-world use (since the rule would work only once before failing with "table already exists"), so we'll not bother with that one. Both SELECT INTO and CREATE TABLE AS still return a command tag of "SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn", but for the moment backwards compatibility wins the day. Andres Freund and Tom Lane	2012-03-19 21:38:12 -04:00
Tom Lane	dd4134ea56	Revisit handling of UNION ALL subqueries with non-Var output columns. In commit `57664ed25e` I tried to fix a bug reported by Teodor Sigaev by making non-simple-Var output columns distinct (by wrapping their expressions with dummy PlaceHolderVar nodes). This did not work too well. Commit `b28ffd0fcc` fixed some ensuing problems with matching to child indexes, but per a recent report from Claus Stadler, constraint exclusion of UNION ALL subqueries was still broken, because constant-simplification didn't handle the injected PlaceHolderVars well either. On reflection, the original patch was quite misguided: there is no reason to expect that EquivalenceClass child members will be distinct. So instead of trying to make them so, we should ensure that we can cope with the situation when they're not. Accordingly, this patch reverts the code changes in the above-mentioned commits (though the regression test cases they added stay). Instead, I've added assorted defenses to make sure that duplicate EC child members don't cause any problems. Teodor's original problem ("MergeAppend child's targetlist doesn't match MergeAppend") is addressed more directly by revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort list guide creation of each child's sort list. In passing, get rid of add_sort_column; as far as I can tell, testing for duplicate sort keys at this stage is dead code. Certainly it doesn't trigger often enough to be worth expending cycles on in ordinary queries. And keeping the test would've greatly complicated the new logic in prepare_sort_from_pathkeys, because comparing pathkey list entries against a previous output array requires that we not skip any entries in the list. Back-patch to 9.1, like the previous patches. The only known issue in this area that wasn't caused by the ill-advised previous patches was the MergeAppend planning failure, which of course is not relevant before 9.1. It's possible that we need some of the new defenses against duplicate child EC entries in older branches, but until there's some clear evidence of that I'm going to refrain from back-patching further.	2012-03-16 13:11:55 -04:00
Heikki Linnakangas	aef5fe7efe	Add comments explaining why our Itanium spinlock implementation is safe.	2012-03-16 10:14:45 +02:00
Peter Eisentraut	eb990a2b9e	Add const qualifier to tzn returned by timestamp2tm() The tzn value might come from tm->tm_zone, which libc declares as const, so it's prudent that the upper layers know about this as well.	2012-03-15 21:17:19 +02:00
Peter Eisentraut	ad4fb0d0d2	Improve EncodeDateTime and EncodeTimeOnly APIs Use an explicit argument to tell whether to include the time zone in the output, rather than using some undocumented pointer magic.	2012-03-14 23:03:34 +02:00
Tom Lane	c6a11b89e4	Teach SPGiST to store nulls and do whole-index scans. This patch fixes the other major compatibility-breaking limitation of SPGiST, that it didn't store anything for null values of the indexed column, and so could not support whole-index scans or "x IS NULL" tests. The approach is to create a wholly separate search tree for the null entries, and use fixed "allTheSame" insertion and search rules when processing this tree, instead of calling the index opclass methods. This way the opclass methods do not need to worry about dealing with nulls. Catversion bump is for pg_am updates as well as the change in on-disk format of SPGiST indexes; there are some tweaks in SPGiST WAL records as well. Heavily rewritten version of a patch by Oleg Bartunov and Teodor Sigaev. (The original also stored nulls separately, but it reused GIN code to do so; which required undesirable compromises in the on-disk format, and would likely lead to bugs due to the GIN code being required to work in two very different contexts.)	2012-03-11 16:29:59 -04:00
Tom Lane	03e56f798e	Restructure SPGiST opclass interface API to support whole-index scans. The original API definition was incapable of supporting whole-index scans because there was no way to invoke leaf-value reconstruction without checking any qual conditions. Also, it was inefficient for multiple-qual-condition scans because value reconstruction got done over again for each qual condition, and because other internal work in the consistent functions likewise had to be done for each qual. To fix these issues, pass the whole scankey array to the opclass consistent functions, instead of only letting them see one item at a time. (Essentially, the loop over scankey entries is now inside the consistent functions not outside them. This makes the consistent functions a bit more complicated, but not unreasonably so.) In itself this commit does nothing except save a few cycles in multiple-qual-condition index scans, since we can't support whole-index scans on SPGiST indexes until nulls are included in the index. However, I consider this a must-fix for 9.2 because once we release it will get very much harder to change the opclass API definition.	2012-03-10 18:36:49 -05:00
Peter Eisentraut	39d74e346c	Add support for renaming constraints reviewed by Josh Berkus and Dimitri Fontaine	2012-03-10 20:19:13 +02:00
Robert Haas	07d1edb954	Extend object access hook framework to support arguments, and DROP. This allows loadable modules to get control at drop time, perhaps for the purpose of performing additional security checks or to log the event. The initial purpose of this code is to support sepgsql, but other applications should be possible as well. KaiGai Kohei, reviewed by me.	2012-03-09 14:34:56 -05:00
Tom Lane	b14953932d	Revise FDW planning API, again. Further reflection shows that a single callback isn't very workable if we desire to let FDWs generate multiple Paths, because that forces the FDW to do all work necessary to generate a valid Plan node for each Path. Instead split the former PlanForeignScan API into three steps: GetForeignRelSize, GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain, and it's substantially more flexible for complex FDWs. Add an fdw_private field to RelOptInfo so that the new functions can save state there rather than possibly having to recalculate information two or three times. In addition, we'd not thought through what would be needed to allow an FDW to set up subexpressions of its choice for runtime execution. We could treat ForeignScan.fdw_private as an executable expression but that seems likely to break existing FDWs unnecessarily (in particular, it would restrict the set of node types allowable in fdw_private to those supported by expression_tree_walker). Instead, invent a separate field fdw_exprs which will receive the postprocessing appropriate for expression trees. (One field is enough since it can be a list of expressions; also, we assume the corresponding expression state tree(s) will be held within fdw_state, so we don't need to add anything to ForeignScanState.) Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this further as we continue to work on that patch, but to me it feels a lot closer to being right now.	2012-03-09 12:49:25 -05:00
Tom Lane	08dd23cec7	Fix some issues with temp/transient tables in extension scripts. Phil Sorber reported that a rewriting ALTER TABLE within an extension update script failed, because it creates and then drops a placeholder table; the drop was being disallowed because the table was marked as an extension member. We could hack that specific case but it seems likely that there might be related cases now or in the future, so the most practical solution seems to be to create an exception to the general rule that extension member objects can only be dropped by dropping the owning extension. To wit: if the DROP is issued within the extension's own creation or update scripts, we'll allow it, implicitly performing an "ALTER EXTENSION DROP object" first. This will simplify cases such as extension downgrade scripts anyway. No docs change since we don't seem to have documented the idea that you would need ALTER EXTENSION DROP for such an action to begin with. Also, arrange for explicitly temporary tables to not get linked as extension members in the first place, and the same for the magic pg_temp_nnn schemas that are created to hold them. This prevents assorted unpleasant results if an extension script creates a temp table: the forced drop at session end would either fail or remove the entire extension, and neither of those outcomes is desirable. Note that this doesn't fix the ALTER TABLE scenario, since the placeholder table is not temp (unless the table being rewritten is). Back-patch to 9.1.	2012-03-08 15:53:09 -05:00
Tom Lane	9088d1b965	Add GetForeignColumnOptions() to foreign.c, and add some documentation. GetForeignColumnOptions provides some abstraction for accessing column-specific FDW options, on a par with the access functions that were already provided here for other FDW-related information. Adjust file_fdw.c to use GetForeignColumnOptions instead of equivalent hand-rolled code. In addition, add some SGML documentation for the functions exported by foreign.c that are meant for use by FDW authors. (This is the fdw_helper portion of the proposed pgsql_fdw patch.) Hanada Shigeru, reviewed by KaiGai Kohei	2012-03-07 18:20:58 -05:00
Tom Lane	d4bf3c9c94	Expose an API for calculating catcache hash values. Now that cache invalidation callbacks get only a hash value, and not a tuple TID (per commits `632ae6829f` and `b5282aa893`), the only way they can restrict what they invalidate is to know what the hash values mean. setrefs.c was doing this via a hard-wired assumption but that seems pretty grotty, and it'll only get worse as more cases come up. So let's expose a calculation function that takes the same parameters as SearchSysCache. Per complaint from Marko Kreen.	2012-03-07 14:51:13 -05:00
Tom Lane	19dbc34631	Add a hook for processing messages due to be sent to the server log. Use-cases for this include custom log filtering rules and custom log message transmission mechanisms (for instance, lossy log message collection, which has been discussed several times recently). As is our common practice for hooks, there's no regression test nor user-facing documentation for this, though the author did exhibit a sample module using the hook. Martin Pihlak, reviewed by Marti Raudsepp	2012-03-06 15:35:41 -05:00
Tom Lane	6b289942bf	Redesign PlanForeignScan API to allow multiple paths for a foreign table. The original API specification only allowed an FDW to create a single access path, which doesn't seem like a terribly good idea in hindsight. Instead, move the responsibility for building the Path node and calling add_path() into the FDW's PlanForeignScan function. Now, it can do that more than once if appropriate. There is no longer any need for the transient FdwPlan struct, so get rid of that. Etsuro Fujita, Shigeru Hanada, Tom Lane	2012-03-05 16:15:59 -05:00
Magnus Hagander	bc5ac36865	Add function pg_xlog_location_diff to help comparisons Comparing two xlog locations are useful for example when calculating replication lag. Euler Taveira de Oliveira, reviewed by Fujii Masao, and some cleanups from me	2012-03-04 12:22:38 +01:00
Tom Lane	0e5e167aae	Collect and use element-frequency statistics for arrays. This patch improves selectivity estimation for the array <@, &&, and @> (containment and overlaps) operators. It enables collection of statistics about individual array element values by ANALYZE, and introduces operator-specific estimators that use these stats. In addition, ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)" and "const <> ANY/ALL (array_column)" are estimated by treating them as variants of the containment operators. Since we still collect scalar-style stats about the array values as a whole, the pg_stats view is expanded to show both these stats and the array-style stats in separate columns. This creates an incompatible change in how stats for tsvector columns are displayed in pg_stats: the stats about lexemes are now displayed in the array-related columns instead of the original scalar-related columns. There are a few loose ends here, notably that it'd be nice to be able to suppress either the scalar-style stats or the array-element stats for columns for which they're not useful. But the patch is in good enough shape to commit for wider testing. Alexander Korotkov, reviewed by Noah Misch and Nathan Boley	2012-03-03 20:20:57 -05:00
Peter Eisentraut	6688d2878e	Add COLLATION FOR expression reviewed by Jaime Casanova	2012-03-02 21:12:16 +02:00
Alvaro Herrera	3433c6ba00	Remove TOAST table from pg_database The only toastable column now is datacl, but we don't really support long ACLs anyway. The TOAST table should have been removed when the pg_db_role_setting catalog was introduced in commit `2eda8dfb52`, but I forgot to do that. Per -hackers discussion on March 2011.	2012-03-01 12:50:52 -03:00
Alvaro Herrera	58e9f974dc	Fix typo in comment Haifeng Liu	2012-02-28 23:52:52 -03:00
Tom Lane	5c02a00d44	Move CRC tables to libpgport, and provide them in a separate include file. This makes it much more convenient to build tools for Postgres that are separately compiled and require a matching CRC implementation. To prevent multiple copies of the CRC polynomial tables being introduced into the postgres binaries, they are now included in the static library libpgport that is mainly meant for replacement system functions. That seems like a bit of a kludge, but there's no better place. This cleans up building of the tools pg_controldata and pg_resetxlog, which previously had to build their own copies of pg_crc.o. In the future, external programs that need access to the CRC tables can include the tables directly from the new header file pg_crc_tables.h. Daniel Farina, reviewed by Abhijit Menon-Sen and Tom Lane	2012-02-28 19:53:39 -05:00
Peter Eisentraut	973e9fb294	Add const qualifiers where they are accidentally cast away This only produces warnings under -Wcast-qual, but it's more correct and consistent in any case.	2012-02-28 12:42:08 +02:00
Alvaro Herrera	cb3a7c2b95	ALTER TABLE: skip FK validation when it's safe to do so We already skip rewriting the table in these cases, but we still force a whole table scan to validate the data. This can be skipped, and thus we can make the whole ALTER TABLE operation just do some catalog touches instead of scanning the table, when these two conditions hold: (a) Old and new pg_constraint.conpfeqop match exactly. This is actually stronger than needed; we could loosen things by way of operator families, but it'd require a lot more effort. (b) The functions, if any, implementing a cast from the foreign type to the primary opcintype are the same. For this purpose, we can consider a binary coercion equivalent to an exact type match. When the opcintype is polymorphic, require that the old and new foreign types match exactly. (Since ri_triggers.c does use the executor, the stronger check for polymorphic types is no mere future-proofing. However, no core type exercises its necessity.) Author: Noah Misch Committer's note: catalog version bumped due to change of the Constraint node. I can't actually find any way to have such a node in a stored rule, but given that we have "out" support for them, better be safe.	2012-02-27 19:10:24 -03:00
Peter Eisentraut	66f0cf7da8	Remove useless const qualifier Claiming that the typevar argument to DefineCompositeType() is const was a plain lie. A similar case in DefineVirtualRelation() was already changed in passing in commit `1575fbcb`. Also clean up the now unnecessary casts that used to cast away the const.	2012-02-26 15:22:27 +02:00
Tom Lane	587359479a	Avoid repeated creation/freeing of per-subre DFAs during regex search. In nested sub-regex trees, lower-level nodes created DFAs and then destroyed them again before exiting, which is a bit dumb considering that the recursive search is likely to call those nodes again later. Instead cache each created DFA until the end of pg_regexec(). This is basically a space for time tradeoff, in that it might increase the maximum memory usage. However, in most regex patterns there are not all that many subre nodes, so not that many DFAs --- and in any case, the peak usage occurs when reaching the bottom recursion level, and except for alternation cases that's going to be the same anyway.	2012-02-24 18:40:30 -05:00
Tom Lane	3cbfe485e4	Remove useless "retry memory" logic within regex engine. Apparently some primordial version of Spencer's engine needed cdissect() and child functions to be able to continue matching from a previous position when re-called. That is dead code, though, since trivial inspection shows that cdissect can never be entered without having previously done zapmem which resets the relevant retry counter. I have also verified experimentally that no case in the Tcl regression tests reaches cdissect with a nonzero retry value. Accordingly, remove that logic. This doesn't really save any noticeable number of cycles in itself, but it is one step towards making dissect() and cdissect() equivalent, which will allow removing hundreds of lines of near-duplicated code. Since struct subre's "retry" field is no longer particularly related to any kind of retry, rename it to "id". As of this commit it's only used for identifying a subre node in debug printouts, so you might think we should get rid of the field entirely; but I have a plan for another use.	2012-02-24 18:40:28 -05:00
Tom Lane	173e29aa5d	Fix the general case of quantified regex back-references. Cases where a back-reference is part of a larger subexpression that is quantified have never worked in Spencer's regex engine, because he used a compile-time transformation that neglected the need to check the back-reference match in iterations before the last one. (That was okay for capturing parens, and we still do it if the regex has only capturing parens ... but it's not okay for backrefs.) To make this work properly, we have to add an "iteration" node type to the regex engine's vocabulary of sub-regex nodes. Since this is a moderately large change with a fair risk of introducing new bugs of its own, apply to HEAD only, even though it's a fix for a longstanding bug.	2012-02-24 01:41:03 -05:00
Tom Lane	077711c2e3	Remove arbitrary limitation on length of common name in SSL certificates. Both libpq and the backend would truncate a common name extracted from a certificate at 32 bytes. Replace that fixed-size buffer with dynamically allocated string so that there is no hard limit. While at it, remove the code for extracting peer_dn, which we weren't using for anything; and don't bother to store peer_cn longer than we need it in libpq. This limit was not so terribly unreasonable when the code was written, because we weren't using the result for anything critical, just logging it. But now that there are options for checking the common name against the server host name (in libpq) or using it as the user's name (in the server), this could result in undesirable failures. In the worst case it even seems possible to spoof a server name or user name, if the correct name is exactly 32 bytes and the attacker can persuade a trusted CA to issue a certificate in which that string is a prefix of the certificate's common name. (To exploit this for a server name, he'd also have to send the connection astray via phony DNS data or some such.) The case that this is a realistic security threat is a bit thin, but nonetheless we'll treat it as one. Back-patch to 8.4. Older releases contain the faulty code, but it's not a security problem because the common name wasn't used for anything interesting. Reported and patched by Heikki Linnakangas Security: CVE-2012-0867	2012-02-23 15:48:04 -05:00
Tom Lane	74e29162a4	Allow MinGW builds to use standardly-named OpenSSL libraries. In the Fedora variant of MinGW, the openssl libraries have their normal names, not libeay32 and libssleay32. Adjust configure probes to allow that, per bug #6486. Tomasz Ostrowski	2012-02-23 15:05:08 -05:00
Robert Haas	2254367435	Make EXPLAIN (BUFFERS) track blocks dirtied, as well as those written. Also expose the new counters through pg_stat_statements. Patch by me. Review by Fujii Masao and Greg Smith.	2012-02-22 20:33:05 -05:00
Peter Eisentraut	a445cb92ef	Add parameters for controlling locations of server-side SSL files This allows changing the location of the files that were previously hard-coded to server.crt, server.key, root.crt, root.crl. server.crt and server.key continue to be the default settings and are thus required to be present by default if SSL is enabled. But the settings for the server-side CA and CRL are now empty by default, and if they are set, the files are required to be present. This replaces the previous behavior of ignoring the functionality if the files were not found.	2012-02-22 23:40:46 +02:00
Alvaro Herrera	a417f85e1d	REASSIGN OWNED: Support foreign data wrappers and servers This was overlooked when implementing those kinds of objects, in commit `cae565e503`. Per report from Pawel Casperek.	2012-02-22 17:33:12 -03:00
Tom Lane	9789c99d01	Cosmetic cleanup for commit `a760893dbd`. Mostly, fixing overlooked comments.	2012-02-21 14:14:16 -05:00
Tom Lane	27af91438b	Create the beginnings of internals documentation for the regex code. Create src/backend/regex/README to hold an implementation overview of the regex package, and fill it in with some preliminary notes about the code's DFA/NFA processing and colormap management. Much more to do there of course. Also, improve some code comments around the colormap and cvec code. No functional changes except to add one missing assert.	2012-02-19 18:58:23 -05:00
Andrew Dunstan	2f582f76b1	Improve pretty printing of viewdefs. Some line feeds are added to target lists and from lists to make them more readable. By default they wrap at 80 columns if possible, but the wrap column is also selectable - if 0 it wraps after every item. Andrew Dunstan, reviewed by Hitoshi Harada.	2012-02-19 11:43:46 -05:00
Tom Lane	4767bc8ff2	Improve statistics estimation to make some use of DISTINCT in sub-queries. Formerly, we just punted when trying to estimate stats for variables coming out of sub-queries using DISTINCT, on the grounds that whatever stats we might have for underlying table columns would be inapplicable. But if the sub-query has only one DISTINCT column, we can consider its output variable as being unique, which is useful information all by itself. The scope of this improvement is pretty narrow, but it costs nearly nothing, so we might as well do it. Per discussion with Andres Freund. This patch differs from the draft I submitted yesterday in updating various comments about vardata.isunique (to reflect its extended meaning) and in tweaking the interaction with security_barrier views. There does not seem to be a reason why we can't use this sort of knowledge even when the sub-query is such a view.	2012-02-16 17:34:00 -05:00
Tom Lane	4bfe68dfab	Run a portal's cleanup hook immediately when pushing it to FAILED state. This extends the changes of commit `6252c4f9e2` so that we run the cleanup hook earlier for failure cases as well as success cases. As before, the point is to avoid an assertion failure from an Assert I added in commit `a874fe7b4c`, which was meant to check that no user-written code can be called during portal cleanup. This fixes a case reported by Pavan Deolasee in which the Assert could be triggered during backend exit (see the new regression test case), and also prevents the possibility that the cleanup hook is run after portions of the portal's state have already been recycled. That doesn't really matter in current usage, but it foreseeably could matter in the future. Back-patch to 9.1 where the Assert in question was added.	2012-02-15 16:19:01 -05:00
Tom Lane	398f70ec07	Preserve column names in the execution-time tupledesc for a RowExpr. The hstore and json datatypes both have record-conversion functions that pay attention to column names in the composite values they're handed. We used to not worry about inserting correct field names into tuple descriptors generated at runtime, but given these examples it seems useful to do so. Observe the nicer-looking results in the regression tests whose results changed. catversion bump because there is a subtle change in requirements for stored rule parsetrees: RowExprs from ROW() constructs now have to include field names. Andrew Dunstan and Tom Lane	2012-02-14 17:34:56 -05:00
Robert Haas	cd30728fb2	Allow LEAKPROOF functions for better performance of security views. We don't normally allow quals to be pushed down into a view created with the security_barrier option, but functions without side effects are an exception: they're OK. This allows much better performance in common cases, such as when using an equality operator (that might even be indexable). There is an outstanding issue here with the CREATE FUNCTION / ALTER FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the leakproof flag. But I'm committing this as-is so that it doesn't have to be rebased again; we can fix up the grammar in a future commit. KaiGai Kohei, with some wordsmithing by me.	2012-02-13 22:21:14 -05:00
Tom Lane	cbba55d6d7	Support min/max index optimizations on boolean columns. Since bool_and() is equivalent to min(), and bool_or() to max(), we might as well let them be index-optimized in the same way. The practical value of this is debatable at best, but it seems nearly cost-free to enable it. Code-wise, we need only adjust the entries in pg_aggregate. There is a measurable planning speed penalty for a query involving one of these aggregates, but it is only a few percent in simple cases, so that seems acceptable. Marti Raudsepp, reviewed by Abhijit Menon-Sen	2012-02-08 12:41:48 -05:00
Tom Lane	3db6524fe6	Mark some more I/O-conversion-invoking functions as stable not volatile. When written, textanycat, anytextcat, quote_literal, and quote_nullable were marked volatile, because they could invoke arbitrary type-specific output functions as part of casting their anyelement arguments to text. Since then, we have defined a project policy that I/O functions must not be volatile, as per commit `aab353a60b`. So these functions can safely be downgraded to stable. Most of the time this makes no difference since they'll get inlined anyway, but as noted by Andrew Dunstan, there are cases where the volatile marking prevents optimizations that the planner does before function inlining. (I think I might have overlooked these functions in the earlier commit on the grounds that inlining would make it moot, but not so --- tgl) This change results in a change in the expected output of the json regression tests, because the planner can now flatten a sub-select that it failed to before. The old output is preferable, but getting that back will require some as-yet-unfinished work on RowExpr handling. Marti Raudsepp	2012-02-08 11:29:29 -05:00
Robert Haas	c13897983a	Add transform functions for various temporal typmod coercisions. This enables ALTER TABLE to skip table and index rebuilds in some cases. Noah Misch, with trivial changes by me.	2012-02-08 09:33:37 -05:00
Heikki Linnakangas	1a01560cbb	Rename LWLockWaitUntilFree to LWLockAcquireOrWait. LWLockAcquireOrWait makes it more clear that the lock is acquired if it's free.	2012-02-08 09:17:13 +02:00
Robert Haas	4f658dc851	Support fls(). The immediate impetus for this is that Noah Misch's patch to elide unnecessary table and index rebuilds when changing typmod for temporal types uses it; and this is extracted from that patch, with some further commentary by me. But it seems logically separate from the remainder of the patch, so I'm committing it separately; this is not the first time someone has wanted fls() in the backend and probably won't be the last. If we end up using this in more performance-critical spots it may be worthwhile to add some architecture-specific optimizations to our src/port version of fls() - e.g. any x86 platform can implement this using the assembly instruction BSRL. But performance won't matter a bit for assessing typmod changes, so I'm not worried about that right now.	2012-02-07 13:45:46 -05:00
Robert Haas	f7d7dade8a	Add a transform function for varbit typmod coercisions. This enables ALTER TABLE to skip table and index rebuilds when the new type is unconstraint varbit, or when the allowable number of bits is not decreasing. Noah Misch, with review and a fix for an OID collision by me.	2012-02-07 12:42:50 -05:00
Robert Haas	3cc0800829	Add a transform function for numeric typmod coercisions. This enables ALTER TABLE to skip table and index rebuilds when a column is changed to an unconstrained numeric, or when the scale is unchanged and the precision does not decrease. Noah Misch, with a few stylistic changes and a fix for an OID collision by me.	2012-02-07 12:08:26 -05:00
Robert Haas	af7914c662	Add TIMING option to EXPLAIN, to allow eliminating of timing overhead. Sometimes it may be useful to get actual row counts out of EXPLAIN (ANALYZE) without paying the cost of timing every node entry/exit. With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that. Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.	2012-02-07 11:23:04 -05:00
Andrew Dunstan	39909d1d39	Add array_to_json and row_to_json functions. Also move the escape_json function from explain.c to json.c where it seems to belong. Andrew Dunstan, Reviewd by Abhijit Menon-Sen.	2012-02-03 12:11:16 -05:00
Robert Haas	0ed7445d73	Allow spgist's text_ops to handle pattern-matching operators. This was presumably intended to work this way all along, but a few key bits of indxpath.c didn't get the memo. Robert Haas and Tom Lane	2012-02-02 13:10:56 -05:00
Robert Haas	c327108140	Catversion bump for JSON patch. Sigh.	2012-01-31 11:51:51 -05:00
Robert Haas	5384a73f98	Built-in JSON data type. Like the XML data type, we simply store JSON data as text, after checking that it is valid. More complex operations such as canonicalization and comparison may come later, but this is enough for not. There are a few open issues here, such as whether we should attempt to detect UTF-8 surrogate pairs represented as \uXXXX\uYYYY, but this gets the basic framework in place.	2012-01-31 11:48:23 -05:00
Heikki Linnakangas	9b38d46d9f	Make group commit more effective. When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and the can all concurrently observe that they're done, rather than waking them up one by one in a cascading fashion. This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying. Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that.	2012-01-30 16:53:48 +02:00
Simon Riggs	73f617f13f	Various minor comments changes from bgwriter to checkpointer.	2012-01-30 14:34:25 +00:00
Tom Lane	ad10853b30	Assorted comment fixes, mostly just typos, but some obsolete statements. YAMAMOTO Takashi	2012-01-29 19:23:56 -05:00
Tom Lane	dd243b3e40	Fix typo in comment. Peter Geoghegan	2012-01-29 18:56:35 -05:00
Tom Lane	e2fa76d80b	Use parameterized paths to generate inner indexscans more flexibly. This patch fixes the planner so that it can generate nestloop-with- inner-indexscan plans even with one or more levels of joining between the indexscan and the nestloop join that is supplying the parameter. The executor was fixed to handle such cases some time ago, but the planner was not ready. This should improve our plans in many situations where join ordering restrictions formerly forced complete table scans. There is probably a fair amount of tuning work yet to be done, because of various heuristics that have been added to limit the number of parameterized paths considered. However, we are not going to find out what needs to be adjusted until the code gets some real-world use, so it's time to get it in there where it can be tested easily. Note API change for index AM amcostestimate functions. I'm not aware of any non-core index AMs, but if there are any, they will need minor adjustments.	2012-01-27 19:26:38 -05:00
Peter Eisentraut	b376ec6fa5	Show default privileges in information schema Hitherto, the information schema only showed explicitly granted privileges that were visible in the *acl catalog columns. If no privileges had been granted, the implicit privileges were not shown. To fix that, add an SQL-accessible version of the acldefault() function, and use that inside the aclexplode() calls to substitute the catalog-specific default privilege set for null values. reviewed by Abhijit Menon-Sen	2012-01-27 21:58:51 +02:00
Peter Eisentraut	2787458362	Disallow ALTER DOMAIN on non-domain type everywhere This has been the behavior already in most cases, but through omission, ALTER DOMAIN / OWNER TO and ALTER DOMAIN / SET SCHEMA would silently work on non-domain types as well.	2012-01-27 21:20:34 +02:00
Peter Eisentraut	8137f2c323	Hide most variable-length fields from Form_pg_* structs Those fields only appear in the structs so that genbki.pl can create the BKI bootstrap files for the catalogs. But they are not actually usable from C. So hiding them can prevent coding mistakes, saves stack space, and can help the compiler. In certain catalogs, the first variable-length field has been kept visible after manual inspection. These exceptions are noted in C comments. reviewed by Tom Lane	2012-01-27 20:16:17 +02:00
Heikki Linnakangas	6d90eaaa89	Make bgwriter sleep longer when it has no work to do, to save electricity. To make it wake up promptly when activity starts again, backends nudge it by setting a latch in MarkBufferDirty(). The latch is kept set while bgwriter is active, so there is very little overhead from that when the system is busy. It is only armed before going into longer sleep. Peter Geoghegan, with some changes by me.	2012-01-26 18:39:13 +02:00
Magnus Hagander	61cb8c5abb	Add deadlock counter to pg_stat_database Adds a counter that tracks number of deadlocks that occurred in each database to pg_stat_database. Magnus Hagander, reviewed by Jaime Casanova	2012-01-26 15:58:19 +01:00
Robert Haas	0e549697d1	Classify DROP operations by whether or not they are user-initiated. This doesn't do anything useful just yet, but is intended as supporting infrastructure for allowing sepgsql to sensibly check DROP permissions. KaiGai Kohei and Robert Haas	2012-01-26 09:30:27 -05:00
Magnus Hagander	bc3347484a	Track temporary file count and size in pg_stat_database Add counters for number and size of temporary files used for spill-to-disk queries for each database to the pg_stat_database view. Tomas Vondra, review by Magnus Hagander	2012-01-26 14:41:19 +01:00
Robert Haas	9f9135d129	Instrument index-only scans to count heap fetches performed. Patch by me; review by Tom Lane, Jeff Davis, and Peter Geoghegan.	2012-01-25 20:41:52 -05:00
Simon Riggs	8366c7803e	Allow pg_basebackup from standby node with safety checking. Base backup follows recommended procedure, plus goes to great lengths to ensure that partial page writes are avoided. Jun Ishizuka and Fujii Masao, with minor modifications	2012-01-25 18:02:04 +00:00
Alvaro Herrera	74ab96a45e	Add pg_trigger_depth() function This reports the depth level of triggers currently in execution, or zero if not called from inside a trigger. No catversion bump in this patch, but you have to initdb if you want access to the new function. Author: Kevin Grittner	2012-01-25 13:22:54 -03:00
Simon Riggs	443b4821f1	Add new replication mode synchronous_commit = 'write'. Replication occurs only to memory on standby, not to disk, so provides additional performance if user wishes to reduce durability level slightly. Adds concept of multiple independent sync rep queues. Fujii Masao and Simon Riggs	2012-01-24 20:22:37 +00:00
Simon Riggs	c172b7b02e	Resolve timing issue with logging locks for Hot Standby. We log AccessExclusiveLocks for replay onto standby nodes, but because of timing issues on ProcArray it is possible to log a lock that is still held by a just committed transaction that is very soon to be removed. To avoid any timing issue we avoid applying locks made by transactions with InvalidXid. Simon Riggs, bug report Tom Lane, diagnosis Pavan Deolasee	2012-01-23 23:37:32 +00:00
Simon Riggs	b8a91d9d1c	ALTER <thing> [IF EXISTS] ... allows silent DDL if required, e.g. ALTER FOREIGN TABLE IF EXISTS foo RENAME TO bar Pavel Stehule	2012-01-23 23:25:04 +00:00
Robert Haas	cc53a1e7cc	Add bitwise AND, OR, and NOT operators for macaddr data type. Brendan Jurd, reviewed by Fujii Masao	2012-01-19 15:25:14 -05:00
Magnus Hagander	4f42b546fd	Separate state from query string in pg_stat_activity This separates the state (running/idle/idleintransaction etc) into it's own field ("state"), and leaves the query field containing just query text. The query text will now mean "current query" when a query is running and "last query" in other states. Accordingly,the field has been renamed from current_query to query. Since backwards compatibility was broken anyway to make that, the procpid field has also been renamed to pid - along with the same field in pg_stat_replication for consistency. Scott Mead and Magnus Hagander, review work from Greg Smith	2012-01-19 14:19:20 +01:00
Robert Haas	1575fbcb79	Prevent adding relations to a concurrently dropped schema. In the previous coding, it was possible for a relation to be created via CREATE TABLE, CREATE VIEW, CREATE SEQUENCE, CREATE FOREIGN TABLE, etc. in a schema while that schema was meanwhile being concurrently dropped. This led to a pg_class entry with an invalid relnamespace value. The same problem could occur if a relation was moved using ALTER .. SET SCHEMA while the target schema was being concurrently dropped. This patch prevents both of those scenarios by locking the schema to which the relation is being added using AccessShareLock, which conflicts with the AccessExclusiveLock taken by DROP. As a desirable side effect, this also prevents the use of CREATE OR REPLACE VIEW to queue for an AccessExclusiveLock on a relation on which you have no rights: that will now fail immediately with a permissions error, before trying to obtain a lock. We need similar protection for all other object types, but as everything other than relations uses a slightly different set of code paths, I'm leaving that for a separate commit. Original complaint (as far as I could find) about CREATE by Nikhil Sontakke; risk for ALTER .. SET SCHEMA pointed out by Tom Lane; further details by Dan Farina; patch by me; review by Hitoshi Harada.	2012-01-16 09:49:34 -05:00
Tom Lane	21b446dd09	Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows. In commit `7b0d0e9356`, I made CLUSTER and VACUUM FULL try to preserve toast value OIDs from the original toast table to the new one. However, if we have to copy both live and recently-dead versions of a row that has a toasted column, those versions may well reference the same toast value with the same OID. The patch then led to duplicate-key failures as we tried to insert the toast value twice with the same OID. (The previous behavior was not very desirable either, since it would have silently inserted the same value twice with different OIDs. That wastes space, but what's worse is that the toast values inserted for already-dead heap rows would not be reclaimed by subsequent ordinary VACUUMs, since they go into the new toast table marked live not deleted.) To fix, check if the copied OID already exists in the new toast table, and if so, assume that it stores the desired value. This is reasonably safe since the only case where we will copy an OID from a previous toast pointer is when toast_insert_or_update was given that toast pointer and so we just pulled the data from the old table; if we got two different values that way then we have big problems anyway. We do have to assume that no other backend is inserting items into the new toast table concurrently, but that's surely safe for CLUSTER and VACUUM FULL. Per bug #6393 from Maxim Boguk. Back-patch to 9.0, same as the previous patch.	2012-01-12 16:40:14 -05:00
Heikki Linnakangas	1b9dea04b5	Remove useless 'needlock' argument from GetXLogInsertRecPtr. It was always passed as 'true'.	2012-01-11 11:01:47 +02:00
Peter Eisentraut	db49517c62	Rename the internal structures of the CREATE TABLE (LIKE ...) facility The original implementation of this interpreted it as a kind of "inheritance" facility and named all the internal structures accordingly. This turned out to be very confusing, because it has nothing to do with the INHERITS feature. So rename all the internal parser infrastructure, update the comments, adjust the error messages, and split up the regression tests.	2012-01-07 23:02:33 +02:00
Tom Lane	0a41e86584	Use __sync_lock_test_and_set() for spinlocks on ARM, if available. Historically we've used the SWPB instruction for TAS() on ARM, but this is deprecated and not available on ARMv6 and later. Instead, make use of a GCC builtin if available. We'll still fall back to SWPB if not, so as not to break existing ports using older GCC versions. Eventually we might want to try using __sync_lock_test_and_set() on some other architectures too, but for now that seems to present only risk and not reward. Back-patch to all supported versions, since people might want to use any of them on more recent ARM chips. Martin Pitt	2012-01-07 15:38:52 -05:00
Robert Haas	1fc3d18faa	Slightly reorganize struct SnapshotData. This squeezes out a bunch of alignment padding, reducing the size from 72 to 56 bytes on my machine. At least in my testing, this didn't produce any measurable performance improvement, but the space savings seem like enough justification. Andres Freund	2012-01-06 22:56:00 -05:00
Robert Haas	1489e2f26a	Improve behavior of concurrent ALTER TABLE, and do some refactoring. ALTER TABLE (and ALTER VIEW, ALTER SEQUENCE, etc.) now use a RangeVarGetRelid callback to check permissions before acquiring a table lock. We also now use the same callback for all forms of ALTER TABLE, rather than having separate, almost-identical callbacks for ALTER TABLE .. SET SCHEMA and ALTER TABLE .. RENAME, and no callback at all for everything else. I went ahead and changed the code so that no form of ALTER TABLE works on foreign tables; you must use ALTER FOREIGN TABLE instead. In 9.1, it was possible to use ALTER TABLE .. SET SCHEMA or ALTER TABLE .. RENAME on a foreign table, but not any other form of ALTER TABLE, which did not seem terribly useful or consistent. Patch by me; review by Noah Misch.	2012-01-06 22:42:26 -05:00
Robert Haas	33aaa139e6	Make the number of CLOG buffers adaptive, based on shared_buffers. Previously, this was hardcoded: we always had 8. Performance testing shows that isn't enough, especially on big SMP systems, so we allow it to scale up as high as 32 when there's adequate memory. On the flip side, when shared_buffers is very small, drop the number of CLOG buffers down to as little as 4, so that we can start the postmaster even when very little shared memory is available. Per extensive discussion with Simon Riggs, Tom Lane, and others on pgsql-hackers.	2012-01-06 14:32:18 -05:00
Peter Eisentraut	104e7dac28	Improve ALTER DOMAIN / DROP CONSTRAINT with nonexistent constraint ALTER DOMAIN / DROP CONSTRAINT on a nonexistent constraint name did not report any error. Now it reports an error. The IF EXISTS option was added to get the usual behavior of ignoring nonexistent objects to drop.	2012-01-05 19:48:55 +02:00
Tom Lane	bc2a050d40	Use a non-locking initial test in TAS_SPIN on PPC. Further testing convinces me that this is helpful at sufficiently high contention levels, though it's still worrisome that it loses slightly at lower contention levels. Per Manabu Ori.	2012-01-03 16:00:06 -05:00
Andrew Dunstan	63876d3bac	Support for building with MS Visual Studio 2010. Brar Piening, reviewed by Craig Ringer.	2012-01-03 08:44:26 -05:00
Tom Lane	631beeac35	Use LWSYNC in place of SYNC/ISYNC in PPC spinlocks, where possible. This is allegedly a win, at least on some PPC implementations, according to the PPC ISA documents. However, as with LWARX hints, some PPC platforms give an illegal-instruction failure. Use the same trick as before of assuming that PPC64 platforms will accept it; we might need to refine that based on experience, but there are other projects doing likewise according to google. I did not add an assembler compatibility test because LWSYNC has been around much longer than hint bits, and it seems unlikely that any toolchains currently in use don't recognize it.	2012-01-02 00:02:02 -05:00
Tom Lane	8496c6cd77	Use 4-byte slock_t on both PPC and PPC64. Previously we defined slock_t as 8 bytes on PPC64, but the TAS assembly code uses word-wide operations regardless, so that the second word was just wasted space. There doesn't appear to be any performance benefit in adding the second word, so get rid of it to simplify the code.	2012-01-02 00:02:01 -05:00
Tom Lane	5cfa8dd300	Use mutex hint bit in PPC LWARX instructions, where possible. The hint bit makes for a small but measurable performance improvement in access to contended spinlocks. On the other hand, some PPC chips give an illegal-instruction failure. There doesn't seem to be a completely bulletproof way to tell whether the hint bit will cause an illegal-instruction failure other than by trying it; but most if not all 64-bit PPC machines should accept it, so follow the Linux kernel's lead and assume it's okay to use it in 64-bit builds. Of course we must also check whether the assembler accepts the command, since even with a recent CPU the toolchain could be old. Patch by Manabu Ori, significantly modified by me.	2012-01-02 00:02:00 -05:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Simon Riggs	64233902d2	Send new protocol keepalive messages to standby servers. Allows streaming replication users to calculate transfer latency and apply delay via internal functions. No external functions yet.	2011-12-31 13:30:26 +00:00
Peter Eisentraut	d383c23f6f	Remove support for on_exit() All supported platforms support the C89 standard function atexit() (SunOS 4 probably being the last one not to), and supporting both makes the code clumsy.	2011-12-27 20:57:59 +02:00
Tom Lane	472d3935a2	Rethink representation of index clauses' mapping to index columns. In commit `e2c2c2e8b1` I made use of nested list structures to show which clauses went with which index columns, but on reflection that's a data structure that only an old-line Lisp hacker could love. Worse, it adds unnecessary complication to the many places that don't much care which clauses go with which index columns. Revert to the previous arrangement of flat lists of clauses, and instead add a parallel integer list of column numbers. The places that care about the pairing can chase both lists with forboth(), while the places that don't care just examine one list the same as before. The only real downside to this is that there are now two more lists that need to be passed to amcostestimate functions in case they care about column matching (which btcostestimate does, so not passing the info is not an option). Rather than deal with 11-argument amcostestimate functions, pass just the IndexPath and expect the functions to extract fields from it. That gets us down to 7 arguments which is better than 11, and it seems more future-proof against likely additions to the information we keep about an index path.	2011-12-24 19:03:21 -05:00
Tom Lane	e2c2c2e8b1	Improve planner's handling of duplicated index column expressions. It's potentially useful for an index to repeat the same indexable column or expression in multiple index columns, if the columns have different opclasses. (If they share opclasses too, the duplicate column is pretty useless, but nonetheless we've allowed such cases since 9.0.) However, the planner failed to cope with this, because createplan.c was relying on simple equal() matching to figure out which index column each index qual is intended for. We do have that information available upstream in indxpath.c, though, so the fix is to not flatten the multi-level indexquals list when putting it into an IndexPath. Then we can rely on the sublist structure to identify target index columns in createplan.c. There's a similar issue for index ORDER BYs (the KNNGIST feature), so introduce a multi-level-list representation for that too. This adds a bit more representational overhead, but we might more or less buy that back by not having to search for matching index columns anymore in createplan.c; likewise btcostestimate saves some cycles. Per bug #6351 from Christian Rudolph. Likely symptoms include the "btree index keys must be ordered by attribute" failure shown there, as well as "operator MMMM is not a member of opfamily NNNN". Although this is a pre-existing problem that can be demonstrated in 9.0 and 9.1, I'm not going to back-patch it, because the API changes in the planner seem likely to break things such as index plugins. The corner cases where this matters seem too narrow to justify possibly breaking things in a minor release.	2011-12-23 18:45:14 -05:00
Robert Haas	d5448c7d31	Add bytea_agg, parallel to string_agg. Pavel Stehule	2011-12-23 08:40:25 -05:00
Robert Haas	99b60fc04e	Catversion bump for commit `0e4611c023`. It changed the format of stored rules.	2011-12-22 17:25:35 -05:00
Robert Haas	0e4611c023	Add a security_barrier option for views. When a view is marked as a security barrier, it will not be pulled up into the containing query, and no quals will be pushed down into it, so that no function or operator chosen by the user can be applied to rows not exposed by the view. Views not configured with this option cannot provide robust row-level security, but will perform far better. Patch by KaiGai Kohei; original problem report by Heikki Linnakangas (in October 2009!). Review (in earlier versions) by Noah Misch and others. Design advice by Tom Lane and myself. Further review and cleanup by me.	2011-12-22 16:16:31 -05:00
Peter Eisentraut	f90dd28062	Add ALTER DOMAIN ... RENAME You could already rename domains using ALTER TYPE, but with this new command it is more consistent with how other commands treat domains as a subcategory of types.	2011-12-22 22:43:56 +02:00
Robert Haas	cbe24a6dd8	Improve behavior of concurrent CLUSTER. In the previous coding, a user could queue up for an AccessExclusiveLock on a table they did not have permission to cluster, thus potentially interfering with access by authorized users who got stuck waiting behind the AccessExclusiveLock. This approach avoids that. cluster() has the same permissions-checking requirements as REINDEX TABLE, so this commit moves the now-shared callback to tablecmds.c and renames it, per discussion with Noah Misch.	2011-12-21 15:17:28 -05:00
Robert Haas	d573e239f0	Take fewer snapshots. When a PORTAL_ONE_SELECT query is executed, we can opportunistically reuse the parse/plan shot for the execution phase. This cuts down the number of snapshots per simple query from 2 to 1 for the simple protocol, and 3 to 2 for the extended protocol. Since we are only reusing a snapshot taken early in the processing of the same protocol message, the change shouldn't be user-visible, except that the remote possibility of the planning and execution snapshots being different is eliminated. Note that this change does not make it safe to assume that the parse/plan snapshot will certainly be reused; that will currently only happen if PortalStart() decides to use the PORTAL_ONE_SELECT strategy. It might be worth trying to provide some stronger guarantees here in the future, but for now we don't. Patch by me; review by Dimitri Fontaine.	2011-12-21 09:16:55 -05:00
Peter Eisentraut	729205571e	Add support for privileges on types This adds support for the more or less SQL-conforming USAGE privilege on types and domains. The intent is to be able restrict which users can create dependencies on types, which restricts the way in which owners can alter types. reviewed by Yeb Havinga	2011-12-20 00:05:19 +02:00
Alvaro Herrera	05e992e90e	Forgot catversion bump on previous patch Per Tom	2011-12-19 17:45:17 -03:00
Tom Lane	8f57b064fd	Rename updateNodeLink to spgUpdateNodeLink. On reflection, the original name seems way too generic for a global symbol. A quick check shows this is the only exported function name in SP-GiST that doesn't begin with "spg" or contain "SpGist", so the rest of them seem all right.	2011-12-19 15:38:32 -05:00
Alvaro Herrera	61d81bd28d	Allow CHECK constraints to be declared ONLY This makes them enforceable only on the parent table, not on children tables. This is useful in various situations, per discussion involving people bitten by the restrictive behavior introduced in 8.4. Message-Id: 8762mp93iw.fsf@comcast.net CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com Authors: Nikhil Sontakke, Alex Hunsaker Reviewed by Robert Haas and myself	2011-12-19 17:30:23 -03:00
Tom Lane	9220362493	Teach SP-GiST to do index-only scans. Operator classes can specify whether or not they support this; this preserves the flexibility to use lossy representations within an index. In passing, move constant data about a given index into the rd_amcache cache area, instead of doing fresh lookups each time we start an index operation. This is mainly to try to make sure that spgcanreturn() has insignificant cost; I still don't have any proof that it matters for actual index accesses. Also, get rid of useless copying of FmgrInfo pointers; we can perfectly well use the relcache's versions in-place.	2011-12-19 14:58:41 -05:00
Tom Lane	3695a55513	Replace simple constant pg_am.amcanreturn with an AM support function. The need for this was debated when we put in the index-only-scan feature, but at the time we had no near-term expectation of having AMs that could support such scans for only some indexes; so we kept it simple. However, the SP-GiST AM forces the issue, so let's fix it. This patch only installs the new API; no behavior actually changes.	2011-12-18 15:50:37 -05:00
Tom Lane	5577ca5bfb	Remove bogus entries in gist point_ops operator class. These entries could never be matched to an index clause because they don't have the index datatype on the left-hand side of the operator. (Their commutators are in the opclass, which is sensible, but that doesn't mean these operators should be.) Spotted by a test that I recently added to opr_sanity to catch exactly this type of thinko. AFAICT there is no code in gistproc.c that is specifically meant to cover these cases, so nothing to remove at that level.	2011-12-17 18:51:00 -05:00
Tom Lane	8daeb5ddd6	Add SP-GiST (space-partitioned GiST) index access method. SP-GiST is comparable to GiST in flexibility, but supports non-balanced partitioned search structures rather than balanced trees. As described at PGCon 2011, this new indexing structure can beat GiST in both index build time and query speed for search problems that it is well matched to. There are a number of areas that could still use improvement, but at this point the code seems committable. Teodor Sigaev and Oleg Bartunov, with considerable revisions by Tom Lane	2011-12-17 16:42:30 -05:00
Robert Haas	0d76b60db4	Various micro-optimizations for GetSnapshopData(). Heikki Linnakangas had the idea of rearranging GetSnapshotData to avoid checking for sub-XIDs when no top-level XID is present. This patch does that plus further a bit of further, related rearrangement. Benchmarking show a significant improvement on unlogged tables at higher concurrency levels, and mostly indifferent result on permanent tables (which are presumably bottlenecked elsewhere). Most of the benefit seems to come from using the new NormalTransactionIdPrecedes() macro rather than the function call TransactionIdPrecedes().	2011-12-16 21:48:47 -05:00
Andrew Dunstan	6d09b2105f	include_if_exists facility for config file. This works the same as include, except that an error is not thrown if the file is missing. Instead the fact that it's missing is logged. Greg Smith, reviewed by Euler Taveira de Oliveira.	2011-12-15 19:40:58 -05:00
Robert Haas	74a1d4fe7c	Improve behavior of concurrent rename statements. Previously, renaming a table, sequence, view, index, foreign table, column, or trigger checked permissions before locking the object, which meant that if permissions were revoked during the lock wait, we would still allow the operation. Similarly, if the original object is dropped and a new one with the same name is created, the operation will be allowed if we had permissions on the old object; the permissions on the new object don't matter. All this is now fixed. Along the way, attempting to rename a trigger on a foreign table now gives the same error message as trying to create one there in the first place (i.e. that it's not a table or view) rather than simply stating that no trigger by that name exists. Patch by me; review by Noah Misch.	2011-12-15 19:02:38 -05:00
Robert Haas	f6835ea90a	Fix typo.	2011-12-15 18:22:29 -05:00
Tom Lane	2dd9322ba6	Move BKP_REMOVABLE bit from individual WAL records to WAL page headers. Removing this bit from xl_info allows us to restore the old limit of four (not three) separate pages touched by a WAL record, which is needed for the upcoming SP-GiST feature, and will likely be useful elsewhere in future. When we implemented XLR_BKP_REMOVABLE in 2007, we had to do it like that because no special WAL-visible action was taken when starting a backup. However, now we force a segment switch when starting a backup, so a compressing WAL archiver (such as pglesslog) that uses the state shown in the current page header will not be fooled as to removability of backup blocks. The only downside is that the archiver will not return to compressing mode for up to one WAL page after the backup is over, which is a small price to pay for getting back the extra xl_info bit. In any case the archiver could look for XLOG_BACKUP_END records if it thought it was worth the trouble to do so. Bump XLOG_PAGE_MAGIC since this is effectively a change in WAL format.	2011-12-12 16:22:14 -05:00
Heikki Linnakangas	8409b60476	Revert the behavior of inet/cidr functions to not unpack the arguments. I forgot to change the functions to use the PG_GETARG_INET_PP() macro, when I changed DatumGetInetP() to unpack the datum, like Datum*P macros usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP() macro, and didn't notice because it wasn't used. This fixes the memory leak when sorting inet values, as reported by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like the previous patch that broke it.	2011-12-12 10:10:53 +02:00
Andrew Dunstan	8e461ca5a9	Remove define inadvertantly left over from testing.	2011-12-10 16:29:37 -05:00
Andrew Dunstan	1a0c76c32f	Enable compiling with the mingw-w64 32 bit compiler. Original patch by Lars Kanis, reviewed by Nishiyama Tomoaki and tweaked some by me. This compiler, or at least the latest version of it, is currently broken, and only passes the regression tests if built with -O0.	2011-12-10 15:35:41 -05:00
Peter Eisentraut	5bcf8ede45	Add ALTER FOREIGN DATA WRAPPER / RENAME and ALTER SERVER / RENAME	2011-12-09 20:42:30 +02:00
Heikki Linnakangas	9f0d2bdc88	Don't set reachedMinRecoveryPoint during crash recovery. In crash recovery, we don't reach consistency before replaying all of the WAL. Rename the variable to reachedConsistency, to make its intention clearer. In master, that was an active bug because of the recent patch to immediately PANIC if a reference to a missing page is found in WAL after reaching consistency, as Tom Lane's test case demonstrated. In 9.1 and 9.0, the only consequence was a misleading "consistent recovery state reached at %X/%X" message in the log at the beginning of crash recovery (the database is not consistent at that point yet). In 8.4, the log message was not printed in crash recovery, even though there was a similar reachedMinRecoveryPoint local variable that was also set early. So, backpatch to 9.1 and 9.0.	2011-12-09 15:21:12 +02:00
Heikki Linnakangas	5d8a894e30	Cancel running query if it is detected that the connection to the client is lost. The only way we detect that at the moment is when write() fails when we try to write to the socket. Florian Pflug with small changes by me, reviewed by Greg Jaskiewicz.	2011-12-09 14:21:36 +02:00
Peter Eisentraut	d5f23af6bf	Add const qualifiers to node inspection functions Thomas Munro	2011-12-07 21:46:56 +02:00
Magnus Hagander	16d8e594ac	Remove spclocation field from pg_tablespace Instead, add a function pg_tablespace_location(oid) used to return the same information, and do this by reading the symbolic link. Doing it this way makes it possible to relocate a tablespace when the database is down by simply changing the symbolic link.	2011-12-07 10:37:33 +01:00
Tom Lane	c6e3ac11b6	Create a "sort support" interface API for faster sorting. This patch creates an API whereby a btree index opclass can optionally provide non-SQL-callable support functions for sorting. In the initial patch, we only use this to provide a directly-callable comparator function, which can be invoked with a bit less overhead than the traditional SQL-callable comparator. While that should be of value in itself, the real reason for doing this is to provide a datatype-extensible framework for more aggressive optimizations, as in Peter Geoghegan's recent work. Robert Haas and Tom Lane	2011-12-07 00:19:39 -05:00
Heikki Linnakangas	1e616f6391	During recovery, if we reach consistent state and still have entries in the invalid-page hash table, PANIC immediately. Immediate PANIC is much better than waiting for end-of-recovery, which is what we did before, because the end-of-recovery might not come until months later if this is a standby server. Also refrain from creating a restartpoint if there are invalid-page entries in the hash table. Restarting recovery from such a restartpoint would not see the invalid references, and wouldn't be able to cross-check them when consistency is reached. That wouldn't matter when things are going smoothly, but the more sanity checks you have the better. Fujii Masao	2011-12-02 10:49:54 +02:00
Robert Haas	2ad36c4e44	Improve table locking behavior in the face of current DDL. In the previous coding, callers were faced with an awkward choice: look up the name, do permissions checks, and then lock the table; or look up the name, lock the table, and then do permissions checks. The first choice was wrong because the results of the name lookup and permissions checks might be out-of-date by the time the table lock was acquired, while the second allowed a user with no privileges to interfere with access to a table by users who do have privileges (e.g. if a malicious backend queues up for an AccessExclusiveLock on a table on which AccessShareLock is already held, further attempts to access the table will be blocked until the AccessExclusiveLock is obtained and the malicious backend's transaction rolls back). To fix, allow callers of RangeVarGetRelid() to pass a callback which gets executed after performing the name lookup but before acquiring the relation lock. If the name lookup is retried (because invalidation messages are received), the callback will be re-executed as well, so we get the best of both worlds. RangeVarGetRelid() is renamed to RangeVarGetRelidExtended(); callers not wishing to supply a callback can continue to invoke it as RangeVarGetRelid(), which is now a macro. Since the only one caller that uses nowait = true now passes a callback anyway, the RangeVarGetRelid() macro defaults nowait as well. The callback can also be used for supplemental locking - for example, REINDEX INDEX needs to acquire the table lock before the index lock to reduce deadlock possibilities. There's a lot more work to be done here to fix all the cases where this can be a problem, but this commit provides the general infrastructure and fixes the following specific cases: REINDEX INDEX, REINDEX TABLE, LOCK TABLE, and and DROP TABLE/INDEX/SEQUENCE/VIEW/FOREIGN TABLE. Per discussion with Noah Misch and Alvaro Herrera.	2011-11-30 10:27:00 -05:00
Tom Lane	dd3bab5fd7	Ensure that whole-row junk Vars are always of composite type. The EvalPlanQual machinery assumes that whole-row Vars generated for the outputs of non-table RTEs will be of composite types. However, for the case where the RTE is a function call returning a scalar type, we were doing the wrong thing, as a result of sharing code with a parser case where the function's scalar output is wanted. (Or at least, that's what that case has done historically; it does seem a bit inconsistent.) To fix, extend makeWholeRowVar's API so that it can support both use-cases. This fixes Belinda Cussen's report of crashes during concurrent execution of UPDATEs involving joins to the result of UNNEST() --- in READ COMMITTED mode, we'd run the EvalPlanQual machinery after a conflicting row update commits, and it was expecting to get a HeapTuple not a scalar datum from the "wholerowN" variable referencing the function RTE. Back-patch to 9.0 where the current EvalPlanQual implementation appeared. In 9.1 and up, this patch also fixes failure to attach the correct collation to the Var generated for a scalar-result case. An example: regression=# select upper(x.*) from textcat('ab', 'cd') x; ERROR: could not determine which collation to use for upper() function	2011-11-27 22:27:24 -05:00
Tom Lane	c66e4f138b	Improve GiST range-contained-by searches by adding a flag for empty ranges. In the original implementation, a range-contained-by search had to scan the entire index because an empty range could be lurking anywhere. Improve that by adding a flag to upper GiST entries that says whether the represented subtree contains any empty ranges. Also, make a simple mod to the penalty function to discourage empty ranges from getting pushed into subtrees without any. This needs more work, and the picksplit function should be taught about it too, but that code can be improved without causing an on-disk compatibility break; so we'll leave it for another day. Since we're breaking on-disk compatibility of range values anyway, I took the opportunity to reorganize the range flags bits; the unused RANGE_xB_NULL bits are now adjacent, which might open the door for using them in some other way later. In passing, remove the GiST range opclass entry for <>, which doesn't seem like it can really be indexed usefully. Alexander Korotkov, with some editorializing by Tom	2011-11-27 16:51:29 -05:00
Alvaro Herrera	9d3b502443	Improve logging of autovacuum I/O activity This adds some I/O stats to the logging of autovacuum (when the operation takes long enough that log_autovacuum_min_duration causes it to be logged), so that it is easier to tune. Notably, it adds buffer I/O counts (hits, misses, dirtied) and read and write rate. Authors: Greg Smith and Noah Misch	2011-11-25 16:34:32 -03:00
Robert Haas	ed0b409d22	Move "hot" members of PGPROC into a separate PGXACT array. This speeds up snapshot-taking and reduces ProcArrayLock contention. Also, the PGPROC (and PGXACT) structures used by two-phase commit are now allocated as part of the main array, rather than in a separate array, and we keep ProcArray sorted in pointer order. These changes are intended to minimize the number of cache lines that must be pulled in to take a snapshot, and testing shows a substantial increase in performance on both read and write workloads at high concurrencies. Pavan Deolasee, Heikki Linnakangas, Robert Haas	2011-11-25 08:02:10 -05:00
Tom Lane	9ed439a9c0	Fix unsupported options in CREATE TABLE ... AS EXECUTE. The WITH [NO] DATA option was not supported, nor the ability to specify replacement column names; the former limitation wasn't even documented, as per recent complaint from Naoya Anzai. Fix by moving the responsibility for supporting these options into the executor. It actually takes less code this way ... catversion bump due to change in representation of IntoClause, which might affect stored rules.	2011-11-24 23:21:45 -05:00
Tom Lane	74c1723fc8	Remove user-selectable ANALYZE option for range types. It's not clear that a per-datatype typanalyze function would be any more useful than a generic typanalyze for ranges. What is clear is that letting unprivileged users select typanalyze functions is a crash risk or worse. So remove the option from CREATE TYPE AS RANGE, and instead put in a generic typanalyze function for ranges. The generic function does nothing as yet, but hopefully we'll improve that before 9.2 release.	2011-11-23 00:03:22 -05:00
Tom Lane	df73584431	Remove zero- and one-argument range constructor functions. Per discussion, the zero-argument forms aren't really worth the catalog space (just write 'empty' instead). The one-argument forms have some use, but they also have a serious problem with looking too much like functional cast notation; to the point where in many real use-cases, the parser would misinterpret what was wanted. Committing this as a separate patch, with the thought that we might want to revert part or all of it if we can think of some way around the cast ambiguity.	2011-11-22 20:45:05 -05:00
Tom Lane	cddc819e45	Improve implementation of range-contains-element tests. Implement these tests directly instead of constructing a singleton range and then applying range-contains. This saves a range serialize/deserialize cycle as well as a couple of redundant bound-comparison steps, and adds very little code on net. Remove elem_contained_by_range from the GiST opclass: it doesn't belong there because there is no way to use it in an index clause (where the indexed column would have to be on the left). Its commutator is in the opclass, and that's what counts.	2011-11-22 17:45:37 -05:00
Tom Lane	766948bedd	Still more review for range-types patch. Per discussion, relax the range input/construction rules so that the only hard error is lower bound > upper bound. Cases where the lower bound is <= upper bound, but the range nonetheless normalizes to empty, are now permitted. Fix core dump in range_adjacent when bounds are infinite. Marginal cleanup of regression test cases, some more code commenting.	2011-11-22 16:06:26 -05:00
Tom Lane	a4ffcc8e11	More code review for rangetypes patch. Fix up some infelicitous coding in DefineRange, and add some missing error checks. Rearrange operator strategy number assignments for GiST anyrange opclass so that they don't make such a mess of opr_sanity's table of operator names associated with different strategy numbers. Assign hopefully-temporary selectivity estimators to range operators that didn't have one --- poor as the estimates are, they're still a lot better than the default 0.5 estimate, and they'll shut up the opr_sanity test that wants to see selectivity estimators on all built-in operators.	2011-11-21 16:19:53 -05:00
Tom Lane	b985d48779	Further code review for range types patch. Fix some bugs in coercion logic and pg_dump; more comment cleanup; minor cosmetic improvements.	2011-11-20 23:50:27 -05:00
Tom Lane	a1a233af66	Further review of range-types patch. Lots of documentation cleanup today, and still more type_sanity tests.	2011-11-18 18:24:32 -05:00
Tom Lane	f6438f6622	Do missed autoheader run for previous commit.	2011-11-17 22:39:14 -05:00
Robert Haas	fc6d1006bd	Further consolidation of DROP statement handling. This gets rid of an impressive amount of duplicative code, with only minimal behavior changes. DROP FOREIGN DATA WRAPPER now requires object ownership rather than superuser privileges, matching the documentation we already have. We also eliminate the historical warning about dropping a built-in function as unuseful. All operations are now performed in the same order for all object types handled by dropcmds.c. KaiGai Kohei, with minor revisions by me	2011-11-17 21:32:34 -05:00
Tom Lane	709aca5960	Declare range inclusion operators as taking anyelement not anynonarray. Use of anynonarray was a crude hack to get around ambiguity versus the array inclusion operators of the same names. My previous patch to extend the parser's type resolution heuristics makes that unnecessary, so use the more general declaration instead. This eliminates a wart that these operators couldn't be used with ranges over arrays, which are otherwise supported just fine. Also, mark range_before and range_after as commutator operators, per discussion with Jeff Davis.	2011-11-17 18:56:33 -05:00
Robert Haas	67dc4eed42	Remove ancient downcasing code from procedural language operations. A very long time ago, language names were specified as literals rather than identifiers, so this code was added to do case-folding. But that style has ben deprecated for many years so this isn't needed any more. Language names will still be downcased when specified as unquoted identifiers, but quoted identifiers or the old style using string literals will be left as-is.	2011-11-17 14:25:18 -05:00
Tom Lane	4509033a00	Code review for range-types catalog entries. Fix assorted infelicities, such as dependency on OIDs that aren't hardwired, as well as outright misdeclaration of daterange_canonical(), which resulted in crashes if you invoked it directly. Add some more regression tests to try to catch similar mistakes in future.	2011-11-16 18:21:34 -05:00
Tom Lane	37ee4b75db	Restructure function-internal caching in the range type code. Move the responsibility for caching specialized information about range types into the type cache, so that the catalog lookups only have to occur once per session. Rearrange APIs a bit so that fn_extra caching is actually effective in the GiST support code. (Use of OidFunctionCallN is bad enough for performance in itself, but it also prevents the function from exploiting fn_extra caching.) The range I/O functions are still not very bright about caching repeated lookups, but that seems like material for a separate patch. Also, avoid unnecessary use of memcpy to fetch/store the range type OID and flags, and don't use the full range_deserialize machinery when all we need to see is the flags value. Also fix API error in range_gist_penalty --- it was failing to set *penalty for any case involving an empty range.	2011-11-15 13:05:45 -05:00
Tom Lane	ad50934eaa	Fix alignment and toasting bugs in range types. A range type whose element type has 'd' alignment must have 'd' alignment itself, else there is no guarantee that the element value can be used in-place. (Because range_deserialize uses att_align_pointer which forcibly aligns the given pointer, violations of this rule did not lead to SIGBUS but rather to garbage data being extracted, as in one of the added regression test cases.) Also, you can't put a toast pointer inside a range datum, since the referenced value could disappear with the range datum still present. For consistency with the handling of arrays and records, I also forced decompression of in-line-compressed bound values. It would work to store them as-is, but our policy is to avoid situations that might result in double compression. Add assorted regression tests for this, and bump catversion because of fixes to built-in pg_type entries. Also some marginal cleanup of inconsistent/unnecessary error checks.	2011-11-14 21:42:04 -05:00
Tom Lane	f158536285	Fix copyright notices, other minor editing in new range-types code. No functional changes in this commit (except I could not resist the temptation to re-word a couple of error messages). This is just manual cleanup after pgindent to make the code look reasonably like other PG code, in preparation for more detailed code review to come.	2011-11-14 13:59:34 -05:00
Simon Riggs	4de82f7d7c	Wakeup WALWriter as needed for asynchronous commit performance. Previously we waited for wal_writer_delay before flushing WAL. Now we also wake WALWriter as soon as a WAL buffer page has filled. Significant effect observed on performance of asynchronous commits by Robert Haas, attributed to the ability to set hint bits on tuples earlier and so reducing contention caused by clog lookups.	2011-11-13 09:00:57 +00:00
Robert Haas	71b2b657c0	Revert removal of trace_userlocks, because userlocks aren't gone. This reverts commit `0180bd6180`. contrib/userlock is gone, but user-level locking still exists, and is exposed via the pg_advisory* family of functions.	2011-11-10 17:54:27 -05:00
Peter Eisentraut	409b8c75ba	Fix server header file installation with vpath builds Several server header files would not be installed in vpath builds because they live in the build directory.	2011-11-10 20:52:54 +02:00
Heikki Linnakangas	d326d9e8ea	In COPY, insert tuples to the heap in batches. This greatly reduces the WAL volume, especially when the table is narrow. The overhead of locking the heap page is also reduced. Reduced WAL traffic also makes it scale a lot better, if you run multiple COPY processes at the same time.	2011-11-09 10:54:41 +02:00
Tom Lane	57664ed25e	Wrap appendrel member outputs in PlaceHolderVars in additional cases. Add PlaceHolderVar wrappers as needed to make UNION ALL sub-select output expressions appear non-constant and distinct from each other. This makes the world safe for add_child_rel_equivalences to do what it does. Before, it was possible for that function to add identical expressions to different EquivalenceClasses, which logically should imply merging such ECs, which would be wrong; or to improperly add a constant to an EquivalenceClass, drastically changing its behavior. Per report from Teodor Sigaev. The only currently known consequence of this bug is "MergeAppend child's targetlist doesn't match MergeAppend" planner failures in 9.1 and later. I am suspicious that there may be other failure modes that could affect older release branches; but in the absence of any hard evidence, I'll refrain from back-patching further than 9.1.	2011-11-08 21:14:21 -05:00
Heikki Linnakangas	3b8161723c	Make DatumGetInetP() unpack inet datums with a 1-byte header, and add a new macro, DatumGetInetPP(), that does not. This brings these macros in line with other DatumGet*P() macros. Backpatch to 8.3, where 1-byte header varlenas were introduced.	2011-11-08 22:39:43 +02:00
Robert Haas	bbb6e559c4	Make VACUUM avoid waiting for a cleanup lock, where possible. In a regular VACUUM, it's OK to skip pages for which a cleanup lock isn't immediately available; the next VACUUM will deal with them. If we're scanning the entire relation to advance relfrozenxid, we might need to wait, but only if there are tuples on the page that actually require freezing. These changes should greatly reduce the incidence of of vacuum processes getting "stuck". Simon Riggs and Robert Haas	2011-11-07 21:39:40 -05:00
Heikki Linnakangas	780571cc9f	Oops, forgot to fix the catversion when I committed the range types patch. It was inadvertently changed to 201111111, which is a wrong date. Change it to current date, and remove the comment that was supposed to remind me to fix it before committing.	2011-11-06 14:36:36 +02:00
Simon Riggs	a030bfa6e4	Move user functions related to WAL into xlogfuncs.c	2011-11-04 09:37:17 +00:00
Tom Lane	a0d2f05a0d	Improve comments for TSLexeme data structure. Mostly, clean up long-ago pgindent damage.	2011-11-03 18:47:28 -04:00
Heikki Linnakangas	4429f6a9e3	Support range data types. Selectivity estimation functions are missing for some range type operators, which is a TODO. Jeff Davis	2011-11-03 13:42:15 +02:00
Tom Lane	7e3bf99baa	Fix handling of PlaceHolderVars in nestloop parameter management. If we use a PlaceHolderVar from the outer relation in an inner indexscan, we need to reference the PlaceHolderVar as such as the value to be passed in from the outer relation. The previous code effectively tried to reconstruct the PHV from its component expression, which doesn't work since (a) the Vars therein aren't necessarily bubbled up far enough, and (b) it would be the wrong semantics anyway because of the possibility that the PHV is supposed to have gone to null at some point before the current join. Point (a) led to "variable not found in subplan target list" planner errors, but point (b) would have led to silently wrong answers. Per report from Roger Niederland.	2011-11-03 00:50:58 -04:00
Simon Riggs	9aceb6ab3c	Refactor xlog.c to create src/backend/postmaster/startup.c Startup process now has its own dedicated file, just like all other special/background processes. Reduces role and size of xlog.c	2011-11-02 14:25:01 +00:00
Simon Riggs	86e3364899	Derive oldestActiveXid at correct time for Hot Standby. There was a timing window between when oldestActiveXid was derived and when it should have been derived that only shows itself under heavy load. Move code around to ensure correct timing of derivation. No change to StartupSUBTRANS() code, which is where this failed. Bug report by Chris Redekop	2011-11-02 08:54:56 +00:00
Simon Riggs	f8409b39d1	Fix timing of Startup CLOG and MultiXact during Hot Standby Patch by me, bug report by Chris Redekop, analysis by Florian Pflug	2011-11-02 08:07:44 +00:00
Tom Lane	08e261cbc9	Fix race condition with toast table access from a stale syscache entry. If a tuple in a syscache contains an out-of-line toasted field, and we try to fetch that field shortly after some other transaction has committed an update or deletion of the tuple, there is a race condition: vacuum could come along and remove the toast tuples before we can fetch them. This leads to transient failures like "missing chunk number 0 for toast value NNNNN in pg_toast_2619", as seen in recent reports from Andrew Hammond and Tim Uckun. The design idea of syscache is that access to stale syscache entries should be prevented by relation-level locks, but that fails for at least two cases where toasted fields are possible: ANALYZE updates pg_statistic rows without locking out sessions that might want to plan queries on the same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without any meaningful lock at all. The least risky fix seems to be an idea that Heikki suggested when we were dealing with a related problem back in August: forcibly detoast any out-of-line fields before putting a tuple into syscache in the first place. This avoids the problem because at the time we fetch the parent tuple from the catalog, we should be holding an MVCC snapshot that will prevent removal of the toast tuples, even if the parent tuple is outdated immediately after we fetch it. (Note: I'm not convinced that this statement holds true at every instant where we could be fetching a syscache entry at all, but it does appear to hold true at the times where we could fetch an entry that could have a toasted field. We will need to be a bit wary of adding toast tables to low-level catalogs that don't have them already.) An additional benefit is that subsequent uses of the syscache entry should be faster, since they won't have to detoast the field. Back-patch to all supported versions. The problem is significantly harder to reproduce in pre-9.0 releases, because of their willingness to flush every entry in a syscache whenever the underlying catalog is vacuumed (cf CatalogCacheFlushRelation); but there is still a window for trouble.	2011-11-01 19:49:58 -04:00
Simon Riggs	806a2aee37	Split work of bgwriter between 2 processes: bgwriter and checkpointer. bgwriter is now a much less important process, responsible for page cleaning duties only. checkpointer is now responsible for checkpoints and so has a key role in shutdown. Later patches will correct doc references to the now old idea that bgwriter performs checkpoints. Has beneficial effect on performance at high write rates, but mainly refactoring to more easily allow changes for power reduction by simplifying previously tortuous code around required to allow page cleaning and checkpointing to time slice in the same process. Patch by me, Review by Dickson Guedes	2011-11-01 17:14:47 +00:00
Robert Haas	78d523b633	Improve make_greater_string() with encoding-specific incrementers. This infrastructure doesn't in any way guarantee that the character we produce will sort before the one we incremented; but it does at least make it much more likely that we'll end up with something that is a valid character, which improves our chances. Kyotaro Horiguchi, with various adjustments by me.	2011-10-29 14:22:20 -04:00
Robert Haas	53f1ca59b5	Allow hint bits to be set sooner for temporary and unlogged tables. We need not wait until the commit record is durably on disk, because in the event of a crash the page we're updating with hint bits will be gone anyway. Per off-list report from Heikki Linnakangas, this can significantly degrade the performance of unlogged tables; I was able to show a 2x speedup from this patch on a pgbench run with scale factor 15. In practice, this will mostly help small, heavily updated tables, because on larger tables you're unlikely to run into the same row again before the commit record makes it out to disk.	2011-10-28 17:08:09 -04:00
Robert Haas	b6335a3f1b	Demote some sanity checks in BufferIsValid() to assertions. Testing reveals that this macro is a hot-spot for index-only-scans. Per discussion with Tom Lane.	2011-10-28 17:04:22 -04:00
Tom Lane	3e4b3465b6	Improve planner's ability to recognize cases where an IN's RHS is unique. If the right-hand side of a semijoin is unique, then we can treat it like a normal join (or another way to say that is: we don't need to explicitly unique-ify the data before doing it as a normal join). We were recognizing such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP BY decoration, but there's another way: if the RHS is a plain relation with unique indexes, we can check if any of the indexes prove the output is unique. Most of the infrastructure for that was there already in the join removal code, though I had to rearrange it a bit. Per reflection about a recent example in pgsql-performance.	2011-10-26 17:52:29 -04:00
Tom Lane	0f39d5050d	Don't trust deferred-unique indexes for join removal. The uniqueness condition might fail to hold intra-transaction, and assuming it does can give incorrect query results. Per report from Marti Raudsepp, though this is not his proposed patch. Back-patch to 9.0, where both these features were introduced. In the released branches, add the new IndexOptInfo field to the end of the struct, to try to minimize ABI breakage for third-party code that may be examining that struct.	2011-10-23 00:43:39 -04:00
Tom Lane	bb446b689b	Support synchronization of snapshots through an export/import procedure. A transaction can export a snapshot with pg_export_snapshot(), and then others can import it with SET TRANSACTION SNAPSHOT. The data does not leave the server so there are not security issues. A snapshot can only be imported while the exporting transaction is still running, and there are some other restrictions. I'm not totally convinced that we've covered all the bases for SSI (true serializable) mode, but it works fine for lesser isolation modes. Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified by Tom Lane	2011-10-22 18:23:30 -04:00
Tom Lane	f9c92a5a3e	Code review for pgstat_get_crashed_backend_activity patch. Avoid possibly dumping core when pgstat_track_activity_query_size has a less-than-default value; avoid uselessly searching for the query string of a successfully-exited backend; don't bother putting out an ERRDETAIL if we don't have a query to show; some other minor stylistic improvements.	2011-10-21 16:36:04 -04:00
Robert Haas	c8e8b5a6e2	Try to log current the query string when a backend crashes. To avoid minimize risk inside the postmaster, we subject this feature to a number of significant limitations. We very much wish to avoid doing any complex processing inside the postmaster, due to the posssibility that the crashed backend has completely corrupted shared memory. To that end, no encoding conversion is done; instead, we just replace anything that doesn't look like an ASCII character with a question mark. We limit the amount of data copied to 1024 characters, and carefully sanity check the source of that data. While these restrictions would doubtless be unacceptable in a general-purpose logging facility, even this limited facility seems like an improvement over the status quo ante. Marti Raudsepp, reviewed by PDXPUG and myself	2011-10-21 13:26:40 -04:00
Robert Haas	82a4a777d9	Consolidate DROP handling for some object types. This gets rid of a significant amount of duplicative code. KaiGai Kohei, reviewed in earlier versions by Dimitri Fontaine, with further review and cleanup by me.	2011-10-19 23:27:19 -04:00
Tom Lane	7c19e0446c	Remove unnecessary AssertMacro() to suppress gcc 4.6 compiler warning. There's no particular value in doing AssertMacro((tup) != NULL) in front of code that's certain to crash anyway if tup is NULL. And if "tup" is actually the address of a local variable, gcc 4.6 whinges about it. That's arguably pretty broken on gcc's part, but we might as well remove the useless test to silence the warnings. This gets rid of all the -Waddress warnings in the backend; there are some in libpq and psql that are a bit harder to avoid.	2011-10-18 17:39:14 -04:00
Tom Lane	336c1d7a51	Avoid assuming that index-only scan data matches the index's rowtype. In general the data returned by an index-only scan should have the datatypes originally computed by FormIndexDatum. If the index opclasses use "storage" datatypes different from their input datatypes, the scan tuple will not have the same rowtype attributed to the index; but we had a hard-wired assumption that that was true in nodeIndexonlyscan.c. We'd already hacked around the issue for the one case where the types are different in btree indexes (btree name_ops), but this would definitely come back to bite us if we ever implement index-only scans in GiST. To fix, require the index AM to explicitly provide the tupdesc for the tuple it is returning. btree can just pass back the index's tupdesc, but GiST will have to work harder when and if it supports index-only scans. I had previously proposed fixing this by allowing the index AM to fill the scan tuple slot directly; but on reflection that seemed like a module layering violation, since TupleTableSlots are creatures of the executor. At least in the btree case, it would also be less efficient, since the tuple deconstruction work would occur even for rows later found to be invisible to the scan's snapshot.	2011-10-16 19:15:04 -04:00
Tom Lane	9e8da0f757	Teach btree to handle ScalarArrayOpExpr quals natively. This allows "indexedcol op ANY(ARRAY[...])" conditions to be used in plain indexscans, and particularly in index-only scans.	2011-10-16 15:39:24 -04:00
Tom Lane	e6858e6657	Measure the number of all-visible pages for use in index-only scan costing. Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. This is pretty primitive and will no doubt see refinements once we've acquired more field experience with the index-only scan mechanism, but it's way better than using a constant. Note: I had to adjust an underspecified query in the window.sql regression test, because it was changing answers when the plan changed to use an index-only scan. Some of the adjacent tests perhaps should be adjusted as well, but I didn't do that here.	2011-10-14 17:23:46 -04:00
Bruce Momjian	0180bd6180	Remove all "traces" of trace_userlocks, because userlocks were removed in PG 8.2.	2011-10-13 19:59:57 -04:00
Bruce Momjian	484af9b376	Modify RelationGetBufferForTuple() to use a typedef, rather than a struct, to help pgindent.	2011-10-12 16:53:54 -04:00
Tom Lane	a0185461dd	Rearrange the implementation of index-only scans. This commit changes index-only scans so that data is read directly from the index tuple without first generating a faux heap tuple. The only immediate benefit is that indexes on system columns (such as OID) can be used in index-only scans, but this is necessary infrastructure if we are ever to support index-only scans on expression indexes. The executor is now ready for that, though the planner still needs substantial work to recognize the possibility. To do this, Vars in index-only plan nodes have to refer to index columns not heap columns. I introduced a new special varno, INDEX_VAR, to mark such Vars to avoid confusion. (In passing, this commit renames the two existing special varnos to OUTER_VAR and INNER_VAR.) This allows ruleutils.c to handle them with logic similar to what we use for subplan reference Vars. Since index-only scans are now fundamentally different from regular indexscans so far as their expression subtrees are concerned, I also chose to change them to have their own plan node type (and hence, their own executor source file).	2011-10-11 14:21:30 -04:00
Robert Haas	c980426c69	Revert accidental change to pg_config_manual.h. This was broken in commit `53dbc27c62`, which introduced unlogged tables. Fortunately, as debugging tools go, this one is pretty cheap, which is probably why it took nine months for someone to notice, but it's not intended to be enabled by default, so revert. Noted by Fujii Masao.	2011-10-09 22:20:44 -04:00
Tom Lane	cbfa92c23c	Improve index-only scans to avoid repeated access to the index page. We copy all the matched tuples off the page during _bt_readpage, instead of expensively re-locking the page during each subsequent tuple fetch. This costs a bit more local storage, but not more than 2*BLCKSZ worth, and the reduction in LWLock traffic is certainly worth that. What's more, this lets us get rid of the API wart in the original patch that said an index AM could randomly decline to supply an index tuple despite having asserted pg_am.amcanreturn. That will be important for future improvements in the index-only-scan feature, since the executor will now be able to rely on having the index data available.	2011-10-09 00:21:08 -04:00
Heikki Linnakangas	041dceb259	Fix typo.	2011-10-08 11:04:07 +03:00
Robert Haas	6a6082c27c	Try to fix memory barriers on x86_64. %esp is no good; must use %rsp there.	2011-10-07 23:42:57 -04:00
Tom Lane	a2822fb933	Support index-only scans using the visibility map to avoid heap fetches. When a btree index contains all columns required by the query, and the visibility map shows that all tuples on a target heap page are visible-to-all, we don't need to fetch that heap page. This patch depends on the previous patches that made the visibility map reliable. There's a fair amount left to do here, notably trying to figure out a less chintzy way of estimating the cost of an index-only scan, but the core functionality seems ready to commit. Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.	2011-10-07 20:14:13 -04:00
Tom Lane	ba6f629326	Improve and simplify CREATE EXTENSION's management of GUC variables. CREATE EXTENSION needs to transiently set search_path, as well as client_min_messages and log_min_messages. We were doing this by the expedient of saving the current string value of each variable, doing a SET LOCAL, and then doing another SET LOCAL with the previous value at the end of the command. This is a bit expensive though, and it also fails badly if there is anything funny about the existing search_path value, as seen in a recent report from Roger Niederland. Fortunately, there's a much better way, which is to piggyback on the GUC infrastructure previously developed for functions with SET options. We just open a new GUC nesting level, do our assignments with GUC_ACTION_SAVE, and then close the nesting level when done. This automatically restores the prior settings without a re-parsing pass, so (in principle anyway) there can't be an error. And guc.c still takes care of cleanup in event of an error abort. The CREATE EXTENSION code for this was modeled on some much older code in ri_triggers.c, which I also changed to use the better method, even though there wasn't really much risk of failure there. Also improve the comments in guc.c to reflect this additional usage.	2011-10-05 20:44:16 -04:00
Tom Lane	41e461d36f	Improve define_custom_variable's handling of pre-existing settings. Arrange for any problems with pre-existing settings to be reported as WARNING not ERROR, so that we don't undesirably abort the loading of the incoming add-on module. The bad setting is just discarded, as though it had never been applied at all. (This requires a change in the API of set_config_option. After some thought I decided the most potentially useful addition was to allow callers to just pass in a desired elevel.) Arrange to restore the complete stacked state of the variable, rather than cheesily reinstalling only the active value. This ensures that custom GUCs will behave unsurprisingly even when the module loading operation occurs within nested subtransactions that have changed the active value. Since a module load could occur as a result of, eg, a PL function call, this is not an unlikely scenario.	2011-10-04 19:57:21 -04:00
Tom Lane	9f5836d224	Remember the source GucContext for each GUC parameter. We used to just remember the GucSource, but saving GucContext too provides a little more information --- notably, whether a SET was done by a superuser or regular user. This allows us to rip out the fairly dodgy code that define_custom_variable used to use to try to infer the context to re-install a pre-existing setting with. In particular, it now works for a superuser to SET a extension's SUSET custom variable before loading the associated extension, because GUC can remember whether the SET was done as a superuser or not. The plperl regression tests contain an example where this is useful.	2011-10-04 16:13:50 -04:00
Alvaro Herrera	09e196e453	Use callbacks in SlruScanDirectory for the actual action Previously, the code assumed that the only possible action to take was to delete files behind a certain cutoff point. The async notify code was already a crock: it used a different "pagePrecedes" function for truncation than for regular operation. By allowing it to pass a callback to SlruScanDirectory it can do cleanly exactly what it needs to do. The clog.c code also had its own use for SlruScanDirectory, which is made a bit simpler with this.	2011-10-04 14:03:23 -03:00
Tom Lane	d56b3afc03	Restructure error handling in reading of postgresql.conf. This patch has two distinct purposes: to report multiple problems in postgresql.conf rather than always bailing out after the first one, and to change the policy for whether changes are applied when there are unrelated errors in postgresql.conf. Formerly the policy was to apply no changes if any errors could be detected, but that had a significant consistency problem, because in some cases specific values might be seen as valid by some processes but invalid by others. This meant that the latter processes would fail to adopt changes in other parameters even though the former processes had done so. The new policy is that during SIGHUP, the file is rejected as a whole if there are any errors in the "name = value" syntax, or if any lines attempt to set nonexistent built-in parameters, or if any lines attempt to set custom parameters whose prefix is not listed in (the new value of) custom_variable_classes. These tests should always give the same results in all processes, and provide what seems a reasonably robust defense against loading values from badly corrupted config files. If these tests pass, all processes will apply all settings that they individually see as good, ignoring (but logging) any they don't. In addition, the postmaster does not abandon reading a configuration file after the first syntax error, but continues to read the file and report syntax errors (up to a maximum of 100 syntax errors per file). The postmaster will still refuse to start up if the configuration file contains any errors at startup time, but these changes allow multiple errors to be detected and reported before quitting. Alexey Klyukin, reviewed by Andy Colson and av (Alexander ?) with some additional hacking by Tom Lane	2011-10-02 16:50:04 -04:00
Tom Lane	d22a09dc70	Support GiST index support functions that want to cache data across calls. pg_trgm was already doing this unofficially, but the implementation hadn't been thought through very well and leaked memory. Restructure the core GiST code so that it actually works, and document it. Ordinarily this would have required an extra memory context creation/destruction for each GiST index search, but I was able to avoid that in the normal case of a non-rescanned search by finessing the handling of the RBTree. It used to have its own context always, but now shares a context with the scan-lifespan data structures, unless there is more than one rescan call. This should make the added overhead unnoticeable in typical cases.	2011-09-30 19:48:57 -04:00
Tom Lane	57eb009092	Allow snapshot references to still work during transaction abort. In REPEATABLE READ (nee SERIALIZABLE) mode, an attempt to do GetTransactionSnapshot() between AbortTransaction and CleanupTransaction failed, because GetTransactionSnapshot would recompute the transaction snapshot (which is already wrong, given the isolation mode) and then re-register it in the TopTransactionResourceOwner, leading to an Assert because the TopTransactionResourceOwner should be empty of resources after AbortTransaction. This is the root cause of bug #6218 from Yamamoto Takashi. While changing plancache.c to avoid requesting a snapshot when handling a ROLLBACK masks the problem, I think this is really a snapmgr.c bug: it's lower-level than the resource manager mechanism and should not be shutting itself down before we unwind resource manager resources. However, just postponing the release of the transaction snapshot until cleanup time didn't work because of the circular dependency with TopTransactionResourceOwner. Fix by managing the internal reference to that snapshot manually instead of depending on TopTransactionResourceOwner. This saves a few cycles as well as making the module layering more straightforward. predicate.c's dependencies on TopTransactionResourceOwner go away too. I think this is a longstanding bug, but there's no evidence that it's more than a latent bug, so it doesn't seem worth any risk of back-patching.	2011-09-26 22:25:28 -04:00
Tom Lane	7741dd6590	Recognize self-contradictory restriction clauses for non-table relations. The constraint exclusion feature checks for contradictions among scan restriction clauses, as well as contradictions between those clauses and a table's CHECK constraints. The first aspect of this testing can be useful for non-table relations (such as subqueries or functions-in-FROM), but the feature was coded with only the CHECK case in mind so we were applying it only to plain-table RTEs. Move the relation_excluded_by_constraints call so that it is applied to all RTEs not just plain tables. With the default setting of constraint_exclusion this results in no extra work, but with constraint_exclusion = ON we will detect optimizations that we missed before (at the cost of more planner cycles than we expended before). Per a gripe from Gunnlaugur Þór Briem. Experimentation with his example also showed we were not being very bright about the case where constraint exclusion is proven within a subquery within UNION ALL, so tweak the code to allow set_append_rel_pathlist to recognize such cases.	2011-09-24 19:33:16 -04:00
Robert Haas	0c8eda6258	Memory barrier support for PostgreSQL. This is not actually used anywhere yet, but it gets the basic infrastructure in place. It is fairly likely that there are bugs, and support for some important platforms may be missing, so we'll need to refine this as we go along.	2011-09-23 17:52:43 -04:00
Tom Lane	f197272365	Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps. This provides information about the numbers of tuples that were visited but not returned by table scans, as well as the numbers of join tuples that were considered and discarded within a join plan node. There is still some discussion going on about the best way to report counts for outer-join situations, but I think most of what's in the patch would not change if we revise that, so I'm going to go ahead and commit it as-is. Documentation changes to follow (they weren't in the submitted patch either). Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom	2011-09-22 11:30:11 -04:00
Tom Lane	e6faf910d7	Redesign the plancache mechanism for more flexibility and efficiency. Rewrite plancache.c so that a "cached plan" (which is rather a misnomer at this point) can support generation of custom, parameter-value-dependent plans, and can make an intelligent choice between using custom plans and the traditional generic-plan approach. The specific choice algorithm implemented here can probably be improved in future, but this commit is all about getting the mechanism in place, not the policy. In addition, restructure the API to greatly reduce the amount of extraneous data copying needed. The main compromise needed to make that possible was to split the initial creation of a CachedPlanSource into two steps. It's worth noting in particular that SPI_saveplan is now deprecated in favor of SPI_keepplan, which accomplishes the same end result with zero data copying, and no need to then spend even more cycles throwing away the original SPIPlan. The risk of long-term memory leaks while manipulating SPIPlans has also been greatly reduced. Most of this improvement is based on use of the recently-added MemoryContextSetParent primitive.	2011-09-16 00:43:52 -04:00
Alvaro Herrera	86822df9b5	Split walsender.h in public/private headers This dramatically cuts short the number of headers the public one brings into whatever includes it.	2011-09-13 21:42:49 -03:00
Tom Lane	b0025bd957	Invent a new memory context primitive, MemoryContextSetParent. This function will be useful for altering the lifespan of a context after creation (for example, by creating it under a transient context and later reparenting it to belong to a long-lived context). It costs almost no new code, since we can refactor what was there. Per my proposal of yesterday.	2011-09-11 16:29:42 -04:00
Peter Eisentraut	1b81c2fe6e	Remove many -Wcast-qual warnings This addresses only those cases that are easy to fix by adding or moving a const qualifier or removing an unnecessary cast. There are many more complicated cases remaining.	2011-09-11 21:54:32 +03:00
Peter Eisentraut	52ce20589a	Add missing format attributes Add __attribute__ decorations for printf format checking to the places that were missing them. Fix the resulting warnings. Add -Wmissing-format-attribute to the standard set of warnings for GCC, so these don't happen again. The warning fixes here are relatively harmless. The one serious problem discovered by this was already committed earlier in `cf15fb5cab`.	2011-09-10 23:12:46 +03:00
Itagaki Takahiro	96a8aed4cb	Add datatype directory to SUBDIRS. New header datatype/timestamp.h should be installed for server-side dev.	2011-09-11 04:07:12 +09:00
Tom Lane	ca4af308c3	Simplify handling of the timezone GUC by making initdb choose the default. We were doing some amazingly complicated things in order to avoid running the very expensive identify_system_timezone() procedure during GUC initialization. But there is an obvious fix for that, which is to do it once during initdb and have initdb install the system-specific default into postgresql.conf, as it already does for most other GUC variables that need system-environment-dependent defaults. This means that the timezone (and log_timezone) settings no longer have any magic behavior in the server. Per discussion.	2011-09-09 17:59:11 -04:00
Tom Lane	a7801b62f2	Move Timestamp/Interval typedefs and basic macros into datatype/timestamp.h. As per my recent proposal, this refactors things so that these typedefs and macros are available in a header that can be included in frontend-ish code. I also changed various headers that were undesirably including utils/timestamp.h to include datatype/timestamp.h instead. Unsurprisingly, this showed that half the system was getting utils/timestamp.h by way of xlog.h. No actual code changes here, just header refactoring.	2011-09-09 13:23:41 -04:00
Heikki Linnakangas	5edb24a898	Buffering GiST index build algorithm. When building a GiST index that doesn't fit in cache, buffers are attached to some internal nodes in the index. This speeds up the build by avoiding random I/O that would otherwise be needed to traverse all the way down the tree to the find right leaf page for tuple. Alexander Korotkov	2011-09-08 17:51:23 +03:00
Bruce Momjian	f81fb4f690	Fix bug introduced by pgrminclude where the tablespace version name was not expanded. Bump catalog version number to force initdb for all tablespaces.	2011-09-07 12:41:16 -04:00
Tom Lane	4c2777d0b7	Change get_variable_numdistinct's API to flag default estimates explicitly. Formerly, callers tested for DEFAULT_NUM_DISTINCT, which had the problem that a perfectly solid estimate might be mistaken for a content-free default.	2011-09-04 15:41:49 -04:00
Tom Lane	1609797c25	Clean up the #include mess a little. walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.	2011-09-04 01:13:16 -04:00
Tom Lane	f116b1f5b8	Remove unnecessary and circular #include. storage/proc.h should not include replication/syncrep.h, especially not when the latter includes storage/proc.h; but in any case this was a pretty poor thing from a modular layering standpoint.	2011-09-03 22:14:45 -04:00
Bruce Momjian	5bce637a4b	walsender.h doesn't need xlog.h, per Tom.	2011-09-03 21:25:00 -04:00
Bruce Momjian	85e6e1662b	Move AllowCascadeReplication() define from xlog.h to replication include file. Per suggestion from Alvaro.	2011-09-03 20:46:19 -04:00
Tom Lane	b3aaf9081a	Rearrange planner to save the whole PlannerInfo (subroot) for a subquery. Formerly, set_subquery_pathlist and other creators of plans for subqueries saved only the rangetable and rowMarks lists from the lower-level PlannerInfo. But there's no reason not to remember the whole PlannerInfo, and indeed this turns out to simplify matters in a number of places. The immediate reason for doing this was so that the subroot will still be accessible when we're trying to extract column statistics out of an already-planned subquery. But now that I've done it, it seems like a good code-beautification effort in its own right. I also chose to get rid of the transient subrtable and subrowmark fields in SubqueryScan nodes, in favor of having setrefs.c look up the subquery's RelOptInfo. That required changing all the APIs in setrefs.c to pass PlannerInfo not PlannerGlobal, which was a large but quite mechanical transformation. One side-effect not foreseen at the beginning is that this finally broke inheritance_planner's assumption that replanning the same subquery RTE N times would necessarily give interchangeable results each time. That assumption was always pretty risky, but now we really have to make a separate RTE for each instance so that there's a place to carry the separate subroots.	2011-09-03 15:36:24 -04:00
Tom Lane	5b562644fe	Teach ANALYZE to clear pg_class.relhassubclass when appropriate. In the past, relhassubclass always remained true if a relation had ever had child relations, even if the last subclass was long gone. While this had only marginal performance implications in most cases, it was annoying, and I'm now considering some planner changes that would raise the cost of a false positive. It was previously impractical to fix this because of race condition concerns. However, given the recent change that made tablecmds.c take ShareExclusiveLock on relations that are gaining a child (commit `fbcf4b92aa`), we can now allow ANALYZE to clear the flag when it's no longer relevant. There is no additional locking cost to do so, since ANALYZE takes ShareExclusiveLock anyway.	2011-09-02 14:29:31 -04:00
Bruce Momjian	418d04ea73	Improve method of avoiding fcinfo compile errors. Fix pgrminclude C comment marker.	2011-09-01 14:16:13 -04:00
Bruce Momjian	7805b11856	Add C comment about necessary NetBSD include.	2011-09-01 11:20:47 -04:00
Bruce Momjian	5352bf39ff	Add missing hba.h include for NetBSD.	2011-09-01 10:34:04 -04:00
Bruce Momjian	6416a82a62	Remove unnecessary #include references, per pgrminclude script.	2011-09-01 10:04:27 -04:00
Heikki Linnakangas	a88b6e4cfb	setlocale() on Windows doesn't work correctly if the locale name contains dots. I previously worked around this in initdb, mapping the known problematic locale names to aliases that work, but Hiroshi Inoue pointed out that that's not enough because even if you use one of the aliases, like "Chinese_HKG", setlocale(LC_CTYPE, NULL) returns back the long form, ie. "Chinese_Hong Kong S.A.R.". When we try to restore an old locale value by passing that value back to setlocale(), it fails. Note that you are affected by this bug also if you use one of those short-form names manually, so just reverting the hack in initdb won't fix it. To work around that, move the locale name mapping from initdb to a wrapper around setlocale(), so that the mapping is invoked on every setlocale() call. Also, add a few checks for failed setlocale() calls in the backend. These calls shouldn't fail, and if they do there isn't much we can do about it, but at least you'll get a warning. Backpatch to 9.1, where the initdb hack was introduced. The Windows bug affects older versions too if you set locale manually to one of the aliases, but given the lack of complaints from the field, I'm hesitent to backpatch.	2011-09-01 11:08:32 +03:00
Heikki Linnakangas	8ea0257067	Move the line to undefine setlocale() macro on Win32 outside USE_REPL_SNPRINTF ifdef block. It has nothing to do with whether the replacement snprintf function is used. It caused no live bug, because the replacement snprintf function is always used on Win32, but it was nevertheless misplaced.	2011-09-01 09:13:37 +03:00
Tom Lane	be1e8053f4	Use a non-locking test in TAS_SPIN() on all IA64 platforms. Per my testing, this works just as well with gcc as it does with HP's compiler; and there is no reason to think that the effect doesn't occur with icc, either. Also, rewrite the header comment about enforcing sequencing around spinlock operations, per Robert's gripe that it was misleading.	2011-08-29 13:18:44 -04:00
Robert Haas	c01c25fbe5	Improve spinlock performance for HP-UX, ia64, non-gcc. At least on this architecture, it's very important to spin on a non-atomic instruction and only retry the atomic once it appears that it will succeed. To fix this, split TAS() into two macros: TAS(), for trying to grab the lock the first time, and TAS_SPIN(), for spinning until we get it. TAS_SPIN() defaults to same as TAS(), but we can override it when we know there's a better way. It's likely that some of the other cases in s_lock.h require similar treatment, but this is the only one we've got conclusive evidence for at present.	2011-08-29 10:05:48 -04:00
Bruce Momjian	4bd7333b14	Allow more include files to be compiled in their own by adding missing include dependencies. Modify pgcompinclude to skip a common fcinfo error.	2011-08-27 11:05:33 -04:00
Bruce Momjian	48423d949f	Add markers.	2011-08-26 18:15:14 -04:00
Bruce Momjian	f8fc37b337	Add markers for skips.	2011-08-26 18:15:13 -04:00
Tom Lane	40271811cb	Improve comments describing tsvector data structure.	2011-08-26 16:17:42 -04:00
Bruce Momjian	2ab15afcdd	Add missing include so include file compiles cleanly on its own.	2011-08-22 23:19:21 -04:00
Tom Lane	b33f78df17	Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist. Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger fired for the same update. Fix by not trying to overload the use of estate->es_trig_tuple_slot. Per report from Yoran Heling. Back-patch to 9.0, when trigger WHEN conditions were introduced.	2011-08-21 18:15:55 -04:00
Tom Lane	b5282aa893	Revise sinval code to remove no-longer-used tuple TID from inval messages. This requires adjusting the API for syscache callback functions: they now get a hash value, not a TID, to identify the target tuple. Most of them weren't paying any attention to that argument anyway, but plancache did require a small amount of fixing. Also, improve performance a trifle by avoiding sending duplicate inval messages when a heap_update isn't changing the catcache lookup columns.	2011-08-16 19:27:46 -04:00
Tom Lane	2ada6779c5	Fix race condition in relcache init file invalidation. The previous code tried to synchronize by unlinking the init file twice, but that doesn't actually work: it leaves a window wherein a third process could read the already-stale init file but miss the SI messages that would tell it the data is stale. The result would be bizarre failures in catalog accesses, typically "could not read block 0 in file ..." later during startup. Instead, hold RelCacheInitLock across both the unlink and the sending of the SI messages. This is more straightforward, and might even be a bit faster since only one unlink call is needed. This has been wrong since it was put in (in 2002!), so back-patch to all supported releases.	2011-08-16 13:11:54 -04:00
Bruce Momjian	6d7bd5dec9	Make USECS_PER_* timestamp macros visible even when we are not using integer timestamps.	2011-08-12 21:32:19 -04:00
Tom Lane	cff75130b5	Remove wal_sender_delay GUC, because it's no longer useful. The latch infrastructure is now capable of detecting all cases where the walsender loop needs to wake up, so there is no reason to have an arbitrary timeout. Also, modify the walsender loop logic to follow the standard pattern of ResetLatch, test for work to do, WaitLatch. The previous coding was both hard to follow and buggy: it would sometimes busy-loop despite having nothing available to do, eg between receipt of a signal and the next time it was caught up with new WAL, and it also had interesting choices like deciding to update to WALSNDSTATE_STREAMING on the strength of information known to be obsolete.	2011-08-10 18:50:28 -04:00
Tom Lane	4dab3d5ae1	Change the autovacuum launcher to use WaitLatch instead of a poll loop. In pursuit of this (and with the expectation that WaitLatch will be needed in more places), convert the latch field that was already added to PGPROC for sync rep into a generic latch that is activated for all PGPROC-owning processes, and change many of the standard backend signal handlers to set that latch when a signal happens. This will allow WaitLatch callers to be wakened properly by these signals. In passing, fix a whole bunch of signal handlers that had been hacked to do things that might change errno, without adding the necessary save/restore logic for errno. Also make some minor fixes in unix_latch.c, and clean up bizarre and unsafe scheme for disowning the process's latch. Much of this has to be back-patched into 9.1. Peter Geoghegan, with additional work by Tom	2011-08-10 12:22:21 -04:00
Heikki Linnakangas	1f1b70a7cf	Oops, we're working on version 9.2 already, not 9.1. Update the PG_CONTROL_VERSION accordingly; I updated it wrong in previous commit.	2011-08-10 09:28:26 +03:00
Heikki Linnakangas	41f9ffd928	If backup-end record is not seen, and we reach end of recovery from a streamed backup, throw an error and refuse to start up. The restore has not finished correctly in that case and the data directory is possibly corrupt. We already errored out in case of archive recovery, but could not during crash recovery because we couldn't distinguish between the case that pg_start_backup() was called and the database then crashed (must not error, data is OK), and the case that we're restoring from a backup and not all the needed WAL was replayed (data can be corrupt). To distinguish those cases, add a line to backup_label to indicate whether the backup was taken with pg_start/stop_backup(), or by streaming (ie. pg_basebackup). This requires re-initdb, because of a new field added to the control file.	2011-08-10 09:22:49 +03:00
Tom Lane	4e15a4db5e	Documentation improvement and minor code cleanups for the latch facility. Improve the documentation around weak-memory-ordering risks, and do a pass of general editorialization on the comments in the latch code. Make the Windows latch code more like the Unix latch code where feasible; in particular provide the same Assert checks in both implementations. Fix poorly-placed WaitLatch call in syncrep.c. This patch resolves, for the moment, concerns around weak-memory-ordering bugs in latch-related code: we have documented the restrictions and checked that existing calls meet them. In 9.2 I hope that we will install suitable memory barrier instructions in SetLatch/ResetLatch, so that their callers don't need to be quite so careful.	2011-08-09 15:30:45 -04:00
Heikki Linnakangas	77949a2913	Change the way string relopts are allocated. Don't try to allocate the default value for a string relopt in the same palloc chunk as the relopt_string struct. That didn't work too well if you added a built-in string relopt in the stringRelOpts array, as it's not possible to have an initializer for a variable length struct in C. This makes the code slightly simpler too. While we're at it, move the call to validator function in add_string_reloption to before the allocation, so that if someone does pass a bogus default value, we don't leak memory.	2011-08-09 15:25:44 +03:00
Tom Lane	77ba232564	Fix nested PlaceHolderVar expressions that appear only in targetlists. A PlaceHolderVar's expression might contain another, lower-level PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one certainly will be also, and so we have to make sure that both of them get into the placeholder_list with correct ph_may_need values during the initial pre-scan of the query (before deconstruct_jointree starts). We did this correctly for PlaceHolderVars appearing in the query quals, but overlooked the issue for those appearing in the top-level targetlist; with the result that nested placeholders referenced only in the targetlist did not work correctly, as illustrated in bug #6154. While at it, add some error checking to find_placeholder_info to ensure that we don't try to create new placeholders after it's too late to do so; they have to all be created before deconstruct_jointree starts. Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.	2011-08-09 00:50:07 -04:00
Andrew Dunstan	f54e373d93	Correct the lie in pg_config.h.win32 about having inttypes.h. This lie has been harmless until now, but has been exposed by the change to include postgres.h before the python headers, which in some versions include inttypes.h if HAVE_INTTYPES_H is set.	2011-08-08 08:52:05 -04:00
Tom Lane	05e8396892	Clean up ill-advised attempt to invent a private set of Node tags. Somebody thought it'd be cute to invent a set of Node tag numbers that were defined independently of, and indeed conflicting with, the main tag-number list. While this accidentally failed to fail so far, it would certainly lead to trouble as soon as anyone wanted to, say, apply copyObject to these node types. Clang was already complaining about the use of makeNode on these tags, and I think quite rightly so. Fix by pushing these node definitions into the mainstream, including putting replnodes.h where it belongs.	2011-08-06 14:53:49 -04:00
Robert Haas	c4096c7639	Allow per-column foreign data wrapper options. Shigeru Hanada, with fairly minor editing by me.	2011-08-05 13:24:03 -04:00
Robert Haas	84e3712677	Create VXID locks "lazily" in the main lock table. Instead of entering them on transaction startup, we materialize them only when someone wants to wait, which will occur only during CREATE INDEX CONCURRENTLY. In Hot Standby mode, the startup process must also be able to probe for conflicting VXID locks, but the lock need never be fully materialized, because the startup process does not use the normal lock wait mechanism. Since most VXID locks never need to touch the lock manager partition locks, this can significantly reduce blocking contention on read-heavy workloads. Patch by me. Review by Jeff Davis.	2011-08-04 12:38:33 -04:00
Robert Haas	4af43ee3f1	Make pgbench use erand48() rather than random(). glibc renders random() thread-safe by wrapping a futex lock around it; testing reveals that this limits the performance of pgbench on machines with many CPU cores. Rather than switching to random_r(), which is only available on GNU systems and crashes unless you use undocumented alchemy to initialize the random state properly, switch to our built-in implementation of erand48(), which is both thread-safe and concurrent. Since the list of reasons not to use the operating system's erand48() is getting rather long, rename ours to pg_erand48() (and similarly for our implementations of lrand48() and srand48()) and just always use those. We were already doing this on Cygwin anyway, and the glibc implementation is not quite thread-safe, so pgbench wouldn't be able to use that either. Per discussion with Tom Lane.	2011-08-03 16:26:40 -04:00
Tom Lane	ac36e6f71f	Move CheckRecoveryConflictDeadlock() call to a safer place. This kluge was inserted in a spot apparently chosen at random: the lock manager's state is not yet fully set up for the wait, and in particular LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock will not get cleaned up if the ereport is thrown. This seems to not cause any observable problem in trivial test cases, because LockReleaseAll will silently clean up the debris; but I was able to cause failures with tests involving subtransactions. Fixes breakage induced by commit `c85c941470`. Back-patch to all affected branches.	2011-08-02 15:16:29 -04:00
Tom Lane	2e53bd5517	Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId. It was initialized in the wrong place and to the wrong value. With bad luck this could result in incorrect query-cancellation failures in hot standby sessions, should a HS backend be holding pin on buffer number 1 while trying to acquire a lock.	2011-08-02 13:23:52 -04:00
Tom Lane	988cccc620	Rethink behavior of CREATE OR REPLACE during CREATE EXTENSION. The original implementation simply did nothing when replacing an existing object during CREATE EXTENSION. The folly of this was exposed by a report from Marc Munro: if the existing object belongs to another extension, we are left in an inconsistent state. We should insist that the object does not belong to another extension, and then add it to the current extension if not already a member.	2011-07-23 16:59:39 -04:00
Robert Haas	463f2625a5	Support SECURITY LABEL on databases, tablespaces, and roles. This requires a new shared catalog, pg_shseclabel. Along the way, fix the security_label regression tests so that they don't monkey with the labels of any pre-existing objects. This is unlikely to matter in practice, since only the label for the "dummy" provider was being manipulated. But this way still seems cleaner. KaiGai Kohei, with fairly extensive hacking by me.	2011-07-20 13:18:24 -04:00
Tom Lane	cacd42d62c	Rewrite libxml error handling to be more robust. libxml reports some errors (like invalid xmlns attributes) via the error handler hook, but still returns a success indicator to the library caller. This causes us to miss some errors that are important to report. Since the "generic" error handler hook doesn't know whether the message it's getting is for an error, warning, or notice, stop using that and instead start using the "structured" error handler hook, which gets enough information to be useful. While at it, arrange to save and restore the error handler hook setting in each libxml-using function, rather than assuming we can set and forget the hook. This should improve the odds of working nicely with third-party libraries that also use libxml. In passing, volatile-ize some local variables that get modified within PG_TRY blocks. I noticed this while testing with an older gcc version than I'd previously tried to compile xml.c with. Florian Pflug and Tom Lane, with extensive review/testing by Noah Misch	2011-07-20 13:03:49 -04:00
Simon Riggs	4bd8ed31b7	Introduce sending servers as new category for replication params Fujii Masao	2011-07-19 08:59:55 +01:00
Simon Riggs	5286105800	Cascading replication feature for streaming log-based replication. Standby servers can now have WALSender processes, which can work with either WALReceiver or archive_commands to pass data. Fully updated docs, including new conceptual terms of sending server, upstream and downstream servers. WALSenders terminated when promote to master. Fujii Masao, review, rework and doc rewrite by Simon Riggs	2011-07-19 03:40:03 +01:00
Robert Haas	367bc426a1	Avoid index rebuild for no-rewrite ALTER TABLE .. ALTER TYPE. Noah Misch. Review and minor cosmetic changes by me.	2011-07-18 11:04:43 -04:00
Robert Haas	3cba8999b3	Create a "fast path" for acquiring weak relation locks. When an AccessShareLock, RowShareLock, or RowExclusiveLock is requested on an unshared database relation, and we can verify that no conflicting locks can possibly be present, record the lock in a per-backend queue, stored within the PGPROC, rather than in the primary lock table. This eliminates a great deal of contention on the lock manager LWLocks. This patch also refactors the interface between GetLockStatusData() and pg_lock_status() to be a bit more abstract, so that we don't rely so heavily on the lock manager's internal representation details. The new fast path lock structures don't have a LOCK or PROCLOCK structure to return, so we mustn't depend on that for purposes of listing outstanding locks. Review by Jeff Davis.	2011-07-18 00:49:28 -04:00
Robert Haas	b59d2fe497	Add pg_opfamily_is_visible. We already have similar functions for many other object types, including operator classes, so it seems like we should have this one, too. Extracted from a larger patch by Josh Kupershmidt	2011-07-17 23:23:55 -04:00
Tom Lane	23e5b16c71	Add temp_file_limit GUC parameter to constrain temporary file space usage. The limit is enforced against the total amount of temp file space used by each session. Mark Kirkwood, reviewed by Cédric Villemain and Tatsuo Ishii	2011-07-17 14:19:31 -04:00
Tom Lane	ed7ed76712	Add an errdetail_internal() ereport auxiliary routine. This function supports untranslated detail messages, in the same way that errmsg_internal supports untranslated primary messages. We've needed this for some time IMO, but discussion of some cases in the SSI code provided the impetus to actually add it. Kevin Grittner, with minor adjustments by me	2011-07-16 14:22:15 -04:00
Heikki Linnakangas	8d260911e8	Change the way the offset of downlink is stored in GISTInsertStack. GISTInsertStack.childoffnum used to mean "offset of the downlink in this node, pointing to the child node in the stack". It's now replaced with downlinkoffnum, which means "offset of the downlink in the parent of this node". gistFindPath() already used childoffnum with this new meaning, and had an extra step at the end to pull all the childoffnum values down one node in the stack, to adjust the stack for the meaning that childoffnum had elsewhere. That's no longer required. The reason to do this now is this new representation is more convenient for the GiST fast build patch that Alexander Korotkov is working on. While we're at it, replace the linked list used in gistFindPath with a standard List, and make gistFindPath() static. Alexander Korotkov, with some changes by me.	2011-07-15 12:18:30 +03:00
Tom Lane	c1d9579dd8	Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window. Regular aggregate functions in combination with, or within the arguments of, window functions are OK per spec; they have the semantics that the aggregate output rows are computed and then we run the window functions over that row set. (Thus, this combination is not really useful unless there's a GROUP BY so that more than one aggregate output row is possible.) The case without GROUP BY could fail, as recently reported by Jeff Davis, because sloppy construction of the Agg node's targetlist resulted in extra references to possibly-ungrouped Vars appearing outside the aggregate function calls themselves. See the added regression test case for an example. Fixing this requires modifying the API of flatten_tlist and its underlying function pull_var_clause. I chose to make pull_var_clause's API for aggregates identical to what it was already doing for placeholders, since the useful behaviors turn out to be the same (error, report node as-is, or recurse into it). I also tightened the error checking in this area a bit: if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here, that was a long time ago, so complain instead of ignoring them. Backpatch into 9.1. The failure exists in 8.4 and 9.0 as well, but seeing that it only occurs in a basically-useless corner case, it doesn't seem worth the risks of changing a function API in a minor release. There might be third-party code using pull_var_clause.	2011-07-12 18:24:39 -04:00
Robert Haas	4240e429d0	Try to acquire relation locks in RangeVarGetRelid. In the previous coding, we would look up a relation in RangeVarGetRelid, lock the resulting OID, and then AcceptInvalidationMessages(). While this was sufficient to ensure that we noticed any changes to the relation definition before building the relcache entry, it didn't handle the possibility that the name we looked up no longer referenced the same OID. This was particularly problematic in the case where a table had been dropped and recreated: we'd latch on to the entry for the old relation and fail later on. Now, we acquire the relation lock inside RangeVarGetRelid, and retry the name lookup if we notice that invalidation messages have been processed meanwhile. Many operations that would previously have failed with an error in the presence of concurrent DDL will now succeed. There is a good deal of work remaining to be done here: many callers of RangeVarGetRelid still pass NoLock for one reason or another. In addition, nothing in this patch guards against the possibility that the meaning of an unqualified name might change due to the creation of a relation in a schema earlier in the user's search path than the one where it was previously found. Furthermore, there's nothing at all here to guard against similar race conditions for non-relations. For all that, it's a start. Noah Misch and Robert Haas	2011-07-08 22:19:30 -04:00
Tom Lane	9d522cb35d	Fix another oversight in logging of changes in postgresql.conf settings. We were using GetConfigOption to collect the old value of each setting, overlooking the possibility that it didn't exist yet. This does happen in the case of adding a new entry within a custom variable class, as exhibited in bug #6097 from Maxim Boguk. To fix, add a missing_ok parameter to GetConfigOption, but only in 9.1 and HEAD --- it seems possible that some third-party code is using that function, so changing its API in a minor release would cause problems. In 9.0, create a near-duplicate function instead.	2011-07-08 17:02:58 -04:00
Heikki Linnakangas	89fd72cbf2	Introduce a pipe between postmaster and each backend, which can be used to detect postmaster death. Postmaster keeps the write-end of the pipe open, so when it dies, children get EOF in the read-end. That can conveniently be waited for in select(), which allows eliminating some of the polling loops that check for postmaster death. This patch doesn't yet change all the loops to use the new mechanism, expect a follow-on patch to do that. This changes the interface to WaitLatch, so that it takes as argument a bitmask of events that it waits for. Possible events are latch set, timeout, postmaster death, and socket becoming readable or writeable. The pipe method behaves slightly differently from the kill() method previously used in PostmasterIsAlive() in the case that postmaster has died, but its parent has not yet read its exit code with waitpid(). The pipe returns EOF as soon as the process dies, but kill() continues to return true until waitpid() has been called (IOW while the process is a zombie). Because of that, change PostmasterIsAlive() to use the pipe too, otherwise WaitLatch() would return immediately with WL_POSTMASTER_DEATH, while PostmasterIsAlive() would claim it's still alive. That could easily lead to busy-waiting while postmaster is in zombie state. Peter Geoghegan with further changes by me, reviewed by Fujii Masao and Florian Pflug.	2011-07-08 18:44:07 +03:00
Heikki Linnakangas	406d61835b	SSI has a race condition, where the order of commit sequence numbers of transactions might not match the order the work done in those transactions become visible to others. The logic in SSI, however, assumed that it does. Fix that by having two sequence numbers for each serializable transaction, one taken before a transaction becomes visible to others, and one after it. This is easier than trying to make the the transition totally atomic, which would require holding ProcArrayLock and SerializableXactHashLock at the same time. By using prepareSeqNo instead of commitSeqNo in a few places where commit sequence numbers are compared, we can make those comparisons err on the safe side when we don't know for sure which committed first. Per analysis by Kevin Grittner and Dan Ports, but this approach to fix it is different from the original patch.	2011-07-07 23:26:34 +03:00
Tom Lane	60a81ad133	Reclassify replication-related GUC variables as "master" and "standby". Per discussion, this structure seems more understandable than what was there before. Make config.sgml and postgresql.conf.sample agree. In passing do a bit of editorial work on the variable descriptions.	2011-07-07 15:11:41 -04:00
Tom Lane	14f67192c2	Remove assumptions that not-equals operators cannot be in any opclass. get_op_btree_interpretation assumed this in order to save some duplication of code, but it's not true in general anymore because we added <> support to btree_gist. (We still assume it for btree opclasses, though.) Also, essentially the same logic was baked into predtest.c. Get rid of that duplication by generalizing get_op_btree_interpretation so that it can be used by predtest.c. Per bug report from Denis de Bernardy and investigation by Jeff Davis, though I didn't use Jeff's patch exactly as-is. Back-patch to 9.1; we do not support this usage before that.	2011-07-06 14:53:16 -04:00
Robert Haas	c7f23494c1	Add \ir command to psql. \ir is short for "include relative"; when used from a script, the supplied pathname will be interpreted relative to the input file, rather than to the current working directory. Gurjeet Singh, reviewed by Josh Kupershmidt, with substantial further cleanup by me.	2011-07-06 11:45:13 -04:00
Alvaro Herrera	b93f5a5673	Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.h This lets us stop including rel.h into execnodes.h, which is a widely used header.	2011-07-04 14:35:58 -04:00
Tom Lane	e54ae784e6	Remove missed reference to SilentMode.	2011-07-04 10:35:52 -04:00
Robert Haas	5da79169d3	Fix bugs in relpersistence handling during table creation. Unlike the relistemp field which it replaced, relpersistence must be set correctly quite early during the table creation process, as we rely on it quite early on for a number of purposes, including security checks. Normally, this is set based on whether the user enters CREATE TABLE, CREATE UNLOGGED TABLE, or CREATE TEMPORARY TABLE, but a relation may also be made implicitly temporary by creating it in pg_temp. This patch fixes the handling of that case, and also disables creation of unlogged tables in temporary tablespace (such table indeed skip WAL-logging, but we reject an explicit specification) and creation of relations in the temporary schemas of other sessions (which is not very sensible, and didn't work right anyway). Report by Amit Khandekar.	2011-07-03 17:34:47 -04:00
Magnus Hagander	24e2d4b6ba	Mark pg_stat_reset_shared as strict This is the proper fix for bug #6082 about pg_stat_reset_shared(NULL) causing a crash, and it reverts commit `79aa44536f` on head. The workaround of throwing an error from inside the function is left on backbranches (including 9.1) since this change requires a new initdb.	2011-07-03 13:15:58 +02:00
Alvaro Herrera	897795240c	Enable CHECK constraints to be declared NOT VALID This means that they can initially be added to a large existing table without checking its initial contents, but new tuples must comply to them; a separate pass invoked by ALTER TABLE / VALIDATE can verify existing data and ensure it complies with the constraint, at which point it is marked validated and becomes a normal part of the table ecosystem. An non-validated CHECK constraint is ignored in the planner for constraint_exclusion purposes; when validated, cached plans are recomputed so that partitioning starts working right away. This patch also enables domains to have unvalidated CHECK constraints attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT VALID, which can later be validated with ALTER DOMAIN / VALIDATE CONSTRAINT. Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various reviews, and Robert Hass for documentation wording improvement suggestions. This patch was sponsored by Enova Financial.	2011-06-30 11:24:31 -04:00
Heikki Linnakangas	cd70dd6bef	Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. It's more consistent that way, since all the other PredicateLock* calls are made in various heapam.c and index AM functions. The call in nodeSeqscan.c was unnecessarily aggressive anyway, there's no need to try to lock the relation every time a tuple is fetched, it's enough to do it once. This has the user-visible effect that if a seq scan is initialized in the executor, but never executed, we now acquire the predicate lock on the heap relation anyway. We could avoid that by taking the lock on the first heap_getnext() call instead, but it doesn't seem worth the trouble given that it feels more natural to do it in heap_beginscan(). Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In a seqscan, started with heap_begin(), we're holding a whole-relation predicate lock on the heap so there's no need to lock the tuples individually. Kevin Grittner and me	2011-06-29 21:57:43 +03:00
Simon Riggs	465883b0a2	Introduce compact WAL record for the common case of commit (non-DDL). XLOG_XACT_COMMIT_COMPACT leaves out invalidation messages and relfilenodes, saving considerable space for the vast majority of transaction commits. XLOG_XACT_COMMIT keeps same definition as XLOG_PAGE_MAGIC 0xD067 and earlier. Leonardo Francalanci and Simon Riggs	2011-06-28 22:58:17 +01:00
Alvaro Herrera	6f3efa76b0	Remove rel.h from objectaddress.h; only relcache.h is necessary. Add rel.h to some files that now need it.	2011-06-28 17:08:29 -04:00
Alvaro Herrera	e5e2fc842c	Modernise pg_hba.conf token processing The previous coding was ugly, as it marked special tokens as such in the wrong stage, relying on workarounds to figure out if they had been quoted in the original or not. This made it impossible to have specific keywords be recognized as such only in certain positions in HBA lines, for example. Fix by restructuring the parser code so that it remembers whether tokens were quoted or not. This eliminates widespread knowledge of possible known keywords for all fields. Also improve memory management in this area, to use memory contexts that are reset as a whole instead of using retail pfrees; this removes a whole lotta crufty (and probably slow) code. Instead of calling strlen() three times in next_field_expand on the returned token to find out whether there was a comma (and strip it), pass back the info directly from the callee, which is simpler. In passing, update historical artifacts in hba.c API. Authors: Brendan Jurd, Alvaro Herrera Reviewed by Pavel Stehule	2011-06-28 15:57:24 -04:00
Robert Haas	c533c1477f	Add a missing_ok argument to get_object_address(). This lays the groundwork for an upcoming patch to streamline the handling of DROP commands. KaiGai Kohei	2011-06-27 21:19:31 -04:00
Robert Haas	9abbed0629	Allow callers to pass a missing_ok flag when opening a relation. Since the names try_relation_openrv() and try_heap_openrv() don't seem quite appropriate, rename the functions to relation_openrv_extended() and heap_openrv_extended(). This is also more general, if we have a future need for additional parameters that are of interest to only a few callers. This is infrastructure for a forthcoming patch to allow get_object_address() to take a missing_ok argument as well. Patch by me, review by Noah Misch.	2011-06-27 15:25:44 -04:00
Robert Haas	e16954f3d2	Try again to make the visibility map crash safe. My previous attempt was quite a bit less than half-baked with respect to heap_update().	2011-06-27 13:55:55 -04:00
Robert Haas	4da99ea423	Avoid having two copies of the HOT-chain search logic. It's been like this since HOT was originally introduced, but the logic is complex enough that this is a recipe for bugs, as we've already found out with SSI. So refactor heap_hot_search_buffer() so that it can satisfy the needs of index_getnext(), and make index_getnext() use that rather than duplicating the logic. This change was originally proposed by Heikki Linnakangas as part of a larger refactoring oriented towards allowing index-only scans. I extracted and adjusted this part, since it seems to have independent merit. Review by Jeff Davis.	2011-06-27 10:27:17 -04:00
Heikki Linnakangas	5da417f7c4	Remove pointless const qualifiers from function arguments in the SSI code. As Tom Lane pointed out, "const Relation foo" doesn't guarantee that you can't modify the data the "foo" pointer points to. It just means that you can't change the pointer to point to something else within the function, which is not very useful.	2011-06-22 12:18:39 +03:00
Robert Haas	503c7305a1	Make the visibility map crash-safe. This involves two main changes from the previous behavior. First, when we set a bit in the visibility map, emit a new WAL record of type XLOG_HEAP2_VISIBLE. Replay sets the page-level PD_ALL_VISIBLE bit and the visibility map bit. Second, when inserting, updating, or deleting a tuple, we can no longer get away with clearing the visibility map bit after releasing the lock on the corresponding heap page, because an intervening crash might leave the visibility map bit set and the page-level bit clear. Making this work requires a bit of interface refactoring. In passing, a few minor but related cleanups: change the test in visibilitymap_set and visibilitymap_clear to throw an error if the wrong page (or no page) is pinned, rather than silently doing nothing; this case should never occur. Also, remove duplicate definitions of InvalidXLogRecPtr. Patch by me, review by Noah Misch.	2011-06-21 23:04:40 -04:00
Robert Haas	8f9fe6edce	Add notion of a "transform function" that can simplify function calls. Initially, we use this only to eliminate calls to the varchar() function in cases where the length is not being reduced and, therefore, the function call is equivalent to a RelabelType operation. The most significant effect of this is that we can avoid a table rewrite when changing a varchar(X) column to a varchar(Y) column, where Y > X. Noah Misch, reviewed by me and Alexey Klyukin	2011-06-21 22:21:24 -04:00
Heikki Linnakangas	7cb2ff9621	Fix bug introduced by recent SSI patch to merge ROLLED_BACK and MARKED_FOR_DEATH flags into one. We still need the ROLLED_BACK flag to mark transactions that are in the process of being rolled back. To be precise, ROLLED_BACK now means that a transaction has already been discounted from the count of transactions with the oldest xmin, but not yet removed from the list of active transactions. Dan Ports	2011-06-21 14:49:50 +03:00
Andrew Dunstan	ddef31c15c	Set FLEXIBLE_ARRAY_MEMBER to empty for MSVC. Per gripe from Tom Lane. I have tested this with VC 2008, and assume it will work with earlier versions.	2011-06-17 18:22:03 -04:00
Andrew Dunstan	236a11dc65	Define FLEXIBLE_ARRAY_MEMBER for MSVC.	2011-06-16 22:30:24 -04:00
Peter Eisentraut	dbbba5279f	Start using flexible array members Flexible array members are a C99 feature that avoids "cheating" in the declaration of variable-length arrays at the end of structs. With Autoconf support, this should be transparent for older compilers. We start with one use in gist.h because gcc 4.6 started to raise a warning there. Over time, it can be expanded to other places in the source, but they will likely need some review of sizeof and offsetof usage. The current change in gist.h appears to be safe in this regard.	2011-06-16 22:45:38 +03:00
Heikki Linnakangas	cb94db91b2	pgindent run of recent SSI changes. Also, remove an unnecessary #include. Kevin Grittner	2011-06-16 16:17:22 +03:00
Tom Lane	e1ccaff6ee	Rework parsing of ConstraintAttributeSpec to improve NOT VALID handling. The initial commit of the ALTER TABLE ADD FOREIGN KEY NOT VALID feature failed to support labeling such constraints as deferrable. The best fix for this seems to be to fold NOT VALID into ConstraintAttributeSpec. That's a bit more general than the documented syntax, but it allows better-targeted syntax error messages. In addition, do some mostly-but-not-entirely-cosmetic code review for the whole NOT VALID patch.	2011-06-15 19:06:21 -04:00
Heikki Linnakangas	264a6b127a	The rolled-back flag on serializable xacts was pointless and redundant with the marked-for-death flag. It was only set for a fleeting moment while a transaction was being cleaned up at rollback. All the places that checked for the rolled-back flag should also check the marked-for-death flag, as both flags mean that the transaction will roll back. I also renamed the marked-for-death into "doomed", which is a lot shorter name.	2011-06-15 13:35:28 +03:00
Heikki Linnakangas	0a0e2b52a5	Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCC snapshots, like in REINDEX, are basically non-transactional operations. The DDL operation itself might participate in SSI, but there's separate functions for that. Kevin Grittner and Dan Ports, with some changes by me.	2011-06-15 12:11:18 +03:00
Heikki Linnakangas	b81831acbc	Renumber 2PC resource managers so that compared to 9.0, predicate lock rmgr is added to the end, and existing resource managers keep their old ids. We're not going to guarantee on-disk compatibility for 2PC state files over major releases, but it seems better to avoid changing the ids them anyway. It will help anyone who might want to write external tools to inspect the state files to work with files from different versions, if nothing else. Per complaint from Tom Lane.	2011-06-14 12:36:31 +03:00
Tom Lane	c962792211	Stamp HEAD as 9.2devel.	2011-06-11 17:46:49 -04:00
Tom Lane	c2ba0121c7	Work around gcc 4.6.0 bug that breaks WAL replay. ReadRecord's habit of using both direct references to tmpRecPtr and references to *RecPtr (which is pointing at tmpRecPtr) triggers an optimization bug in gcc 4.6.0, which apparently has forgotten about aliasing rules. Avoid the compiler bug, and make the code more readable to boot, by getting rid of the direct references. Improve the comments while at it. Back-patch to all supported versions, in case they get built with 4.6.0. Tom Lane, with some cosmetic suggestions from Alex Hunsaker	2011-06-10 17:04:29 -04:00
Heikki Linnakangas	cb2d158c58	Fix locking while setting flags in MySerializableXact. Even if a flag is modified only by the backend owning the transaction, it's not safe to modify it without a lock. Another backend might be setting or clearing a different flag in the flags field concurrently, and that operation might be lost because setting or clearing a bit in a word is not atomic. Make did-write flag a simple backend-private boolean variable, because it was only set or tested in the owning backend (except when committing a prepared transaction, but it's not worthwhile to optimize for the case of a read-only prepared transaction). This also eliminates the need to add locking where that flag is set. Also, set the did-write flag when doing DDL operations like DROP TABLE or TRUNCATE -- that was missed earlier.	2011-06-10 23:41:10 +03:00
Alvaro Herrera	fba105b109	Use "transient" files for blind writes, take 2 "Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.) This commit fixes a bug in the previous one, which neglected to cleanly handle the LRU ring that fd.c uses to manage open files, and caused an unacceptable failure just before beta2 and was thus reverted.	2011-06-10 13:43:02 -04:00
Heikki Linnakangas	c79c570bd8	Small comment fixes and enhancements.	2011-06-10 17:22:46 +03:00
Tom Lane	829ae4bf83	Tag 9.1beta2.	2011-06-09 19:40:42 -04:00
Alvaro Herrera	9261557eb1	Revert "Use "transient" files for blind writes" This reverts commit `54d9e8c6c1`, which caused a failure on the buildfarm. Not a good thing to have just before a beta release.	2011-06-09 16:41:44 -04:00
Alvaro Herrera	54d9e8c6c1	Use "transient" files for blind writes "Blind writes" are a mechanism to push buffers down to disk when evicting them; since they may belong to different databases than the one a backend is connected to, the backend does not necessarily have a relation to link them to, and thus no way to blow them away. We were keeping those files open indefinitely, which would cause a problem if the underlying table was deleted, because the operating system would not be able to reclaim the disk space used by those files. To fix, have bufmgr mark such files as transient to smgr; the lower layer is allowed to close the file descriptor when the current transaction ends. We must be careful to have any other access of the file to remove the transient markings, to prevent unnecessary expensive system calls when evicting buffers belonging to our own database (which files we're likely to require again soon.)	2011-06-09 16:25:49 -04:00
Bruce Momjian	6560407c7d	Pgindent run before 9.1 beta2.	2011-06-09 14:32:50 -04:00
Heikki Linnakangas	8f9622bbb3	Make DDL operations play nicely with Serializable Snapshot Isolation. Truncating or dropping a table is treated like deletion of all tuples, and check for conflicts accordingly. If a table is clustered or rewritten by ALTER TABLE, all predicate locks on the heap are promoted to relation-level locks, because the tuple or page ids of any existing tuples will change and won't be valid after rewriting the table. Arguably ALTER TABLE should be treated like a mass-UPDATE of every row, but if you e.g change the datatype of a column, you could also argue that it's just a change to the physical layout, not a logical change. Reindexing promotes all locks on the index to relation-level lock on the heap. Kevin Grittner, with a lot of cosmetic changes by me.	2011-06-08 14:02:43 +03:00
Tom Lane	ea8e42f3a0	Fix failure to check whether a rowtype's component types are sortable. The existence of a btree opclass accepting composite types caused us to assume that every composite type is sortable. This isn't true of course; we need to check if the column types are all sortable. There was logic for this for the case of array comparison (ie, check that the element type is sortable), but we missed the point for rowtypes. Per Teodor's report of an ANALYZE failure for an unsortable composite type. Rather than just add some more ad-hoc logic for this, I moved knowledge of the issue into typcache.c. The typcache will now only report out array_eq, record_cmp, and friends as usable operators if the array or composite type will work with those functions. Unfortunately we don't have enough info to do this for anonymous RECORD types; in that case, just assume it will work, and take the runtime failure as before if it doesn't. This patch might be a candidate for back-patching at some point, but given the lack of complaints from the field, I'd rather just test it in HEAD for now. Note: most of the places touched in this patch will need further work when we get around to supporting hashing of record types.	2011-06-03 15:39:17 -04:00
Heikki Linnakangas	c8630919e0	SSI comment fixes and enhancements. Notably, document that the conflict-out flag actually means that the transaction has a conflict out to a transaction that committed before the flagged transaction. Kevin Grittner	2011-06-03 12:45:42 +03:00
Tom Lane	680ea6a6df	Looks like we can't declare getpeereid on Windows anyway. ... for lack of the uid_t and gid_t typedefs. Per buildfarm.	2011-06-02 17:27:30 -04:00
Tom Lane	3980f7fc6e	Implement getpeereid() as a src/port compatibility function. This unifies a bunch of ugly #ifdef's in one place. Per discussion, we only need this where HAVE_UNIX_SOCKETS, so no need to cover Windows. Marko Kreen, some adjustment by Tom Lane	2011-06-02 13:05:01 -04:00
Tom Lane	be4585b1c2	Replace use of credential control messages with getsockopt(LOCAL_PEERCRED). It turns out the reason we hadn't found out about the portability issues with our credential-control-message code is that almost no modern platforms use that code at all; the ones that used to need it now offer getpeereid(), which we choose first. The last holdout was NetBSD, and they added getpeereid() as of 5.0. So far as I can tell, the only live platform on which that code was being exercised was Debian/kFreeBSD, ie, FreeBSD kernel with Linux userland --- since glibc doesn't provide getpeereid(), we fell back to the control message code. However, the FreeBSD kernel provides a LOCAL_PEERCRED socket parameter that's functionally equivalent to Linux's SO_PEERCRED. That is both much simpler to use than control messages, and superior because it doesn't require receiving a message from the other end at just the right time. Therefore, add code to use LOCAL_PEERCRED when necessary, and rip out all the credential-control-message code in the backend. (libpq still has such code so that it can still talk to pre-9.1 servers ... but eventually we can get rid of it there too.) Clean up related autoconf probes, too. This means that libpq's requirepeer parameter now works on exactly the same platforms where the backend supports peer authentication, so adjust the documentation accordingly.	2011-05-31 16:10:46 -04:00
Tom Lane	b4b6923e03	Fix VACUUM so that it always updates pg_class.reltuples/relpages. When we added the ability for vacuum to skip heap pages by consulting the visibility map, we made it just not update the reltuples/relpages statistics if it skipped any pages. But this could leave us with extremely out-of-date stats for a table that contains any unchanging areas, especially for TOAST tables which never get processed by ANALYZE. In particular this could result in autovacuum making poor decisions about when to process the table, as in recent report from Florian Helmberger. And in general it's a bad idea to not update the stats at all. Instead, use the previous values of reltuples/relpages as an estimate of the tuple density in unvisited pages. This approach results in a "moving average" estimate of reltuples, which should converge to the correct value over multiple VACUUM and ANALYZE cycles even when individual measurements aren't very good. This new method for updating reltuples is used by both VACUUM and ANALYZE, with the result that we no longer need the grotty interconnections that caused ANALYZE to not update the stats depending on what had happened in the parent VACUUM command. Also, fix the logic for skipping all-visible pages during VACUUM so that it looks ahead rather than behind to decide what to do, as per a suggestion from Greg Stark. This eliminates useless scanning of all-visible pages at the start of the relation or just after a not-all-visible page. In particular, the first few pages of the relation will not be invariably included in the scanned pages, which seems to help in not overweighting them in the reltuples estimate. Back-patch to 8.4, where the visibility map was introduced.	2011-05-30 17:06:52 -04:00
Heikki Linnakangas	3103f9a77d	The row-version chaining in Serializable Snapshot Isolation was still wrong. On further analysis, it turns out that it is not needed to duplicate predicate locks to the new row version at update, the lock on the version that the transaction saw as visible is enough. However, there was a different bug in the code that checks for dangerous structures when a new rw-conflict happens. Fix that bug, and remove all the row-version chaining related code. Kevin Grittner & Dan Ports, with some comment editorialization by me.	2011-05-30 20:47:17 +03:00
Robert Haas	7149b128dc	Improve hash_array() logic for combining hash values. The new logic is less vulnerable to transpositions. This invalidates the contents of hash indexes built with the old functions; hence, bump catversion. Dean Rasheed	2011-05-23 15:17:18 -04:00
Tom Lane	299d171652	Install defenses against overflow in BuildTupleHashTable(). The planner can sometimes compute very large values for numGroups, and in cases where we have no alternative to building a hashtable, such a value will get fed directly to BuildTupleHashTable as its nbuckets parameter. There were two ways in which that could go bad. First, BuildTupleHashTable declared the parameter as "int" but most callers were passing "long"s, so on 64-bit machines undetected overflow could occur leading to a bogus negative value. The obvious fix for that is to change the parameter to "long", which is what I've done in HEAD. In the back branches that seems a bit risky, though, since third-party code might be calling this function. So for them, just put in a kluge to treat negative inputs as INT_MAX. Second, hash_create can go nuts with extremely large requested table sizes (notably, my_log2 becomes an infinite loop for inputs larger than LONG_MAX/2). What seems most appropriate to avoid that is to bound the initial table size request to work_mem. This fixes bug #6035 reported by Daniel Schreiber. Although the reported case only occurs back to 8.4 since it involves WITH RECURSIVE, I think it's a good idea to install the defenses in all supported branches.	2011-05-23 12:52:46 -04:00
Heikki Linnakangas	30e98a7e6e	Pull up isReset flag from AllocSetContext to MemoryContext struct. This avoids the overhead of one function call when calling MemoryContextReset(), and it seems like the isReset optimization would be applicable to any new memory context we might invent in the future anyway. This buys back the overhead I just added in previous patch to always call MemoryContextReset() in ExecScan, even when there's no quals or projections.	2011-05-21 14:47:19 -04:00
Robert Haas	9bb6d97952	More cleanup of FOREIGN TABLE permissions handling. This commit fixes psql, pg_dump, and the information schema to be consistent with the backend changes which I made as part of commit `be90032e0d`, and also includes a related documentation tweak. Shigeru Hanada, with slight adjustment.	2011-05-13 15:51:03 -04:00
Tom Lane	e05b866447	Split PGC_S_DEFAULT into two values, for true boot_val vs computed default. Failure to distinguish these cases is the real cause behind the recent reports of Windows builds crashing on 'infinity'::timestamp, which was directly due to failure to establish a value of timezone_abbreviations in postmaster child processes. The postmaster had the desired value, but write_one_nondefault_variable() didn't transmit it to backends. To fix that, invent a new value PGC_S_DYNAMIC_DEFAULT, and be sure to use that or PGC_S_ENV_VAR (as appropriate) for "default" settings that are computed during initialization. (We need both because there's at least one variable that could receive a value from either source.) This commit also fixes ProcessConfigFile's failure to restore the correct default value for certain GUC variables if they are set in postgresql.conf and then removed/commented out of the file. We have to recompute and reinstall the value for any GUC variable that could have received a value from PGC_S_DYNAMIC_DEFAULT or PGC_S_ENV_VAR sources, and there were a number of oversights. (That whole thing is a crock that needs to be redesigned, but not today.) However, I intentionally didn't make it work "exactly right" for the cases of timezone and log_timezone. The exactly right behavior would involve running select_default_timezone, which we'd have to do independently in each postgres process, causing the whole database to become entirely unresponsive for as much as several seconds. That didn't seem like a good idea, especially since the variable's removal from postgresql.conf might be just an accidental edit. Instead the behavior is to adopt the previously active setting as if it were default. Note that this patch creates an ABI break for extensions that use any of the PGC_S_XXX constants; they'll need to be recompiled.	2011-05-11 19:57:38 -04:00
Andrew Dunstan	c02d5b7c27	Use a macro variable PG_PRINTF_ATTRIBUTE for the style used for checking printf type functions. The style is set to "printf" for backwards compatibility everywhere except on Windows, where it is set to "gnu_printf", which eliminates hundreds of false error messages from modern versions of gcc arising from %m and %ll{d,u} formats.	2011-04-28 10:56:14 -04:00
Tom Lane	993c5e5904	Tag 9.1beta1.	2011-04-27 17:17:22 -04:00
Andrew Dunstan	6693fec0e8	Revert "Force use of "%I64d" format for 64 bit ints on MinGW." This reverts commit `52d01c2f52`. the UINT64_FORMAT bit broke the b uildfarm, so I'm reverting the whole thing pending further investigation.	2011-04-27 14:55:18 -04:00
Andrew Dunstan	52d01c2f52	Force use of "%I64d" format for 64 bit ints on MinGW. Both this and "%lld" work, but the compiler's format checking doesn't like "%lld", so we get all sorts of spurious warnings.	2011-04-27 10:09:23 -04:00
Robert Haas	68ef051f5c	Refactor broken CREATE TABLE IF NOT EXISTS support. Per bug #5988, reported by Marko Tiikkaja, and further analyzed by Tom Lane, the previous coding was broken in several respects: even if the target table already existed, a subsequent CREATE TABLE IF NOT EXISTS might try to add additional constraints or sequences-for-serial specified in the new CREATE TABLE statement. In passing, this also fixes a minor information leak: it's no longer possible to figure out whether a schema to which you don't have CREATE access contains a sequence named like "x_y_seq" by attempting to create a table in that schema called "x" with a serial column called "y". Some more refactoring of this code in the future might be warranted, but that will need to wait for a later major release.	2011-04-25 16:55:11 -04:00
Robert Haas	be90032e0d	Remove partial and undocumented GRANT .. FOREIGN TABLE support. Instead, foreign tables are treated just like views: permissions can be granted using GRANT privilege ON [TABLE] foreign_table_name TO role, and revoked similarly. GRANT/REVOKE .. FOREIGN TABLE is no longer supported, just as we don't support GRANT/REVOKE .. VIEW. The set of accepted permissions for foreign tables is now identical to the set for regular tables, and views. Per report from Thom Brown, and subsequent discussion.	2011-04-25 16:39:18 -04:00
Andrew Dunstan	860be17ec3	Assorted minor changes to silence Windows compiler warnings. Mostly to do with macro redefinitions or object signedness.	2011-04-25 12:56:53 -04:00
Bruce Momjian	76dd09bbec	Add postmaster/postgres undocumented -b option for binary upgrades. This option turns off autovacuum, prevents non-super-user connections, and enables oid setting hooks in the backend. The code continues to use the old autoavacuum disable settings for servers with earlier catalog versions. This includes a catalog version bump to identify servers that support the -b option.	2011-04-25 12:00:21 -04:00
Tom Lane	e6a30a8c3c	Improve cost estimation for aggregates and window functions. The previous coding failed to account properly for the costs of evaluating the input expressions of aggregates and window functions, as seen in a recent gripe from Claudio Freire. (I said at the time that it wasn't counting these costs at all; but on closer inspection, it was effectively charging these costs once per output tuple. That is completely wrong for aggregates, and not exactly right for window functions either.) There was also a hard-wired assumption that aggregates and window functions had procost 1.0, which is now fixed to respect the actual cataloged costs. The costing of WindowAgg is still pretty bogus, since it doesn't try to estimate the effects of spilling data to disk, but that seems like a separate issue.	2011-04-24 16:55:20 -04:00
Tom Lane	2ab0796d7a	Fix char2wchar/wchar2char to support collations properly. These functions should take a pg_locale_t, not a collation OID, and should call mbstowcs_l/wcstombs_l where available. Where those functions are not available, temporarily select the correct locale with uselocale(). This change removes the bogus assumption that all locales selectable in a given database have the same wide-character conversion method; in particular, the collate.linux.utf8 regression test now passes with LC_CTYPE=C, so long as the database encoding is UTF8. I decided to move the char2wchar/wchar2char functions out of mbutils.c and into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus don't really belong with the mbutils.c functions. Keeping them where they were would have required importing pg_locale_t into pg_wchar.h somehow, which did not seem like a good plan.	2011-04-23 12:35:41 -04:00
Tom Lane	ae20bf1740	Make GIN and GIST pass the index collation to all their support functions. Experimentation with contrib/btree_gist shows that the majority of the GIST support functions potentially need collation information. Safest policy seems to be to pass it to all of them, instead of making assumptions about which ones could possibly need it.	2011-04-22 20:13:12 -04:00
Robert Haas	68739ba856	Allow ALTER TABLE name {OF type \| NOT OF}. This syntax allows a standalone table to be made into a typed table, or a typed table to be made standalone. This is possibly a mildly useful feature in its own right, but the real motivation for this change is that we need it to make pg_upgrade work with typed tables. This doesn't actually fix that problem, but it's necessary infrastructure. Noah Misch	2011-04-20 21:38:47 -04:00
Tom Lane	8c19977e9c	Avoid changing an index's indcheckxmin horizon during REINDEX. There can never be a need to push the indcheckxmin horizon forward, since any HOT chains that are actually broken with respect to the index must pre-date its original creation. So we can just avoid changing pg_index altogether during a REINDEX operation. This offers a cleaner solution than my previous patch for the problem found a few days ago that we mustn't try to update pg_index while we are reindexing it. System catalog indexes will always be created with indcheckxmin = false during initdb, and with this modified code we should never try to change their pg_index entries. This avoids special-casing system catalogs as the former patch did, and should provide a performance benefit for many cases where REINDEX formerly caused an index to be considered unusable for a short time. Back-patch to 8.3 to cover all versions containing HOT. Note that this patch changes the API for index_build(), but I believe it is unlikely that any add-on code is calling that directly.	2011-04-19 18:50:56 -04:00
Tom Lane	918854cc08	Fix handling of collations in multi-row VALUES constructs. Per spec we ought to apply select_common_collation() across the expressions in each column of the VALUES table. The original coding was just taking the first row and assuming it was representative. This patch adds a field to struct RangeTblEntry to carry the resolved collations, so initdb is forced for changes in stored rule representation.	2011-04-18 15:31:52 -04:00
Tom Lane	2d3320d3d2	Simplify reindex_relation's API. For what seem entirely historical reasons, a bitmask "flags" argument was recently added to reindex_relation without subsuming its existing boolean argument into that bitmask. This seems a bit bizarre, so fold them together.	2011-04-16 17:26:41 -04:00
Tom Lane	121f49a00e	Clean up collation processing in prepunion.c. This area was a few bricks shy of a load, and badly under-commented too. We have to ensure that the generated targetlist entries for a set-operation node expose the correct collation for each entry, since higher-level processing expects the tlist to reflect the true ordering of the plan's output. This hackery wouldn't be necessary if SortGroupClause carried collation info ... but making it do so would inject more pain in the parser than would be saved here. Still, we might want to rethink that sometime.	2011-04-16 16:40:42 -04:00
Tom Lane	d64713df7e	Pass collations to functions in FunctionCallInfoData, not FmgrInfo. Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...	2011-04-12 19:19:24 -04:00
Tom Lane	921b993677	Fix RI_Initial_Check to use a COLLATE clause when needed in its query. If the referencing and referenced columns have different collations, the parser will be unable to resolve which collation to use unless it's helped out in this way. The effects are sometimes masked, if we end up using a non-collation-sensitive plan; but if we do use a mergejoin we'll see a failure, as recently noted by Robert Haas. The SQL spec states that the referenced column's collation should be used to resolve RI checks, so that's what we do. Note however that we currently don't append a COLLATE clause when writing a query that examines only the referencing column. If we ever support collations that have varying notions of equality, that will have to be changed. For the moment, though, it's preferable to leave it off so that we can use a normal index on the referencing column.	2011-04-11 21:32:53 -04:00
Tom Lane	3c381a55b0	Teach pattern_fixed_prefix() about collations. This is necessary, not optional, now that ILIKE and regexes are collation aware --- else we might derive a wrong comparison constant for index optimized pattern matches.	2011-04-11 12:28:28 -04:00
Heikki Linnakangas	7c797e7194	Fix the size of predicate lock manager's shared memory hash tables at creation. This way they don't compete with the regular lock manager for the slack shared memory, making the behavior more predictable.	2011-04-11 13:43:31 +03:00
Tom Lane	f510fc1d90	Add some more mapping macros for Microsoft wide-character API. Per buildfarm.	2011-04-10 19:37:24 -04:00
Tom Lane	1e16a8107d	Teach regular expression operators to honor collations. This involves getting the character classification and case-folding functions in the regex library to use the collations infrastructure. Most of this work had been done already in connection with the upper/lower and LIKE logic, so it was a simple matter of transposition. While at it, split out these functions into a separate source file regc_pg_locale.c, so that they can be correctly labeled with the Postgres project's license rather than the Scriptics license. These functions are 100% Postgres-written code whereas what remains in regc_locale.c is still mostly not ours, so lumping them both under the same copyright notice was getting more and more misleading.	2011-04-10 18:03:09 -04:00
Bruce Momjian	bf50caf105	pgindent run before PG 9.1 beta 1.	2011-04-10 11:42:00 -04:00
Peter Eisentraut	11745364d0	Add collation support on Windows (MSVC build) There is not yet support in initdb to populate the pg_collation catalog, but if that is done manually, the rest should work.	2011-04-10 00:15:41 +03:00
Tom Lane	c5ff3ff492	Avoid an unnecessary syscache lookup in parse_coerce.c. All the other fields of the constant are being extracted from the syscache entry we already have, so handle collation similarly. (There don't seem to be any other uses for the new function at the moment.)	2011-04-08 16:11:41 -04:00
Tom Lane	2594cf0e8c	Revise the API for GUC variable assign hooks. The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment. And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead). This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".	2011-04-07 00:12:02 -04:00
Robert Haas	f5e524d92b	Add casts from int4 and int8 to numeric. Joey Adams, per gripe from Ramanujam. Review by myself and Tom Lane.	2011-04-05 09:35:43 -04:00
Simon Riggs	88f32b7ca2	Avoid assuming there will be only 3 states for synchronous_commit. Also avoid hardcoding the current default state by giving it the name "on" and replace with a meaningful name that reflects its behaviour. Coding only, no change in behaviour.	2011-04-04 23:23:13 +01:00
Robert Haas	240067b3b0	Merge synchronous_replication setting into synchronous_commit. This means one less thing to configure when setting up synchronous replication, and also avoids some ambiguity around what the behavior should be when the settings of these variables conflict. Fujii Masao, with additional hacking by me.	2011-04-04 16:25:52 -04:00
Robert Haas	6c57239985	Rearrange "add column" logic to merge columns at exec time. The previous coding set attinhcount too high in some cases, resulting in an undumpable, undroppable column. Per bug #5856, reported by Naoya Anzai. See also commit `31b6fc06d8`, which fixes a similar bug in ALTER TABLE .. ADD CONSTRAINT. Patch by Noah Misch.	2011-04-03 21:53:32 -04:00
Robert Haas	38b27792ea	Avoid possible hang during smart shutdown. If a smart shutdown occurs just as a child is starting up, and the child subsequently becomes a walsender, there is a race condition: the postmaster might count the exstant backends, determine that there is one normal backend, and wait for it to die off. Had the walsender transition already occurred before the postmaster counted, it would have proceeded with the shutdown. To fix this, have each child that transforms into a walsender kick the postmaster just after doing so, so that the state machine is certain to advance. Fujii Masao	2011-04-03 19:42:00 -04:00
Robert Haas	50533a6dc5	Support comments on FOREIGN DATA WRAPPER and SERVER objects. This mostly involves making it work with the objectaddress.c framework, which does most of the heavy lifting. In that vein, change GetForeignDataWrapperOidByName to get_foreign_data_wrapper_oid and GetForeignServerOidByName to get_foreign_server_oid, to match the pattern we use for other object types. Robert Haas and Shigeru Hanada	2011-04-01 11:28:28 -04:00
Heikki Linnakangas	c8ae318cbe	Increase SHMEM_INDEX_SIZE from 32 to 64. We're currently at 40 entries in ShmemIndex, so 64 leaves some headroom. Kevin Grittner	2011-03-31 13:37:01 +03:00
Heikki Linnakangas	754baa21f7	Automatically terminate replication connections that are idle for more than replication_timeout (a new GUC) milliseconds. The TCP timeout is often too long, you want the master to notice a dead connection much sooner. People complained about that in 9.0 too, but with synchronous replication it's even more important to notice dead connections promptly. Fujii Masao and Heikki Linnakangas	2011-03-30 10:20:37 +03:00
Peter Eisentraut	6c0dfc0356	Add maintainer-check target This can do various source code checks that are not appropriate for either the build or the regression tests. Currently: duplicate_oids, SGML syntax and tabs check, NLS syntax check.	2011-03-28 22:56:52 +03:00
Peter Eisentraut	aa6fdd186c	Make duplicate_oids return nonzero exit status if duplicates were found Automatic detection of errors is easier that way.	2011-03-28 22:56:52 +03:00
Tom Lane	eb51af71f2	Prevent a rowtype from being included in itself. Eventually we might be able to allow that, but it's not clear how many places need to be fixed to prevent infinite recursion when there's a direct or indirect inclusion of a rowtype in itself. One such place is CheckAttributeType(), which will recurse to stack overflow in cases such as those exhibited in bug #5950 from Alex Perepelica. If we were sure it was the only such place, we could easily modify the code added by this patch to stop the recursion without a complaint ... but it probably isn't the only such place. Hence, throw error until such time as someone is excited enough about this type of usage to put work into making it safe. Back-patch as far as 8.3. 8.2 doesn't have the recursive call in CheckAttributeType in the first place, so I see no need to add code there in the absence of clear evidence of a problem elsewhere.	2011-03-28 15:46:04 -04:00
Tom Lane	7208fae18f	Clean up cruft around collation initialization for tupdescs and scankeys. I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.	2011-03-26 18:28:40 -04:00
Tom Lane	0c9d9e8dd6	More collations cleanup, from trawling for missed collation assignments. Mostly cosmetic, though I did find that generateClonedIndexStmt failed to clone the index's collations.	2011-03-26 16:35:25 -04:00
Tom Lane	b23c9fa929	Clean up a few failures to set collation fields in expression nodes. I'm not sure these have any non-cosmetic implications, but I'm not sure they don't, either. In particular, ensure the CaseTestExpr generated by transformAssignmentIndirection to represent the base target column carries the correct collation, because parse_collate.c won't fix that. Tweak lsyscache.c API so that we can get the appropriate collation without an extra syscache lookup.	2011-03-26 14:25:48 -04:00
Tom Lane	bfa4440ca5	Pass collation to makeConst() instead of looking it up internally. In nearly all cases, the caller already knows the correct collation, and in a number of places, the value the caller has handy is more correct than the default for the type would be. (In particular, this patch makes it significantly less likely that eval_const_expressions will result in changing the exposed collation of an expression.) So an internal lookup is both expensive and wrong.	2011-03-25 20:10:42 -04:00
Tom Lane	27dc7e240b	Fix handling of collation in SQL-language functions. Ensure that parameter symbols receive collation from the function's resolved input collation, and fix inlining to behave properly. BTW, this commit lays about 90% of the infrastructure needed to support use of argument names in SQL functions. Parsing of parameters is now done via the parser-hook infrastructure ... we'd just need to supply a column-ref hook ...	2011-03-24 20:30:23 -04:00
Simon Riggs	ec497a5ad6	Make FKs valid at creation when added as column constraints. Bug report from Alvaro Herrera	2011-03-22 23:10:35 +00:00
Tom Lane	8df08c8489	Reimplement planner's handling of MIN/MAX aggregate optimization (again). Instead of playing cute games with pathkeys, just build a direct representation of the intended sub-select, and feed it through query_planner to get a Path for the index access. This is a bit slower than 9.1's previous method, since we'll duplicate most of the overhead of query_planner; but since the whole optimization only applies to rather simple single-table queries, that probably won't be much of a problem in practice. The advantage is that we get to do the right thing when there's a partial index that needs the implicit IS NOT NULL clause to be usable. Also, although this makes planagg.c be a bit more closely tied to the ordering of operations in grouping_planner, we can get rid of some coupling to lower-level parts of the planner. Per complaint from Marti Raudsepp.	2011-03-22 00:34:31 -04:00
Tom Lane	176d5bae1d	Fix up handling of C/POSIX collations. Install just one instance of the "C" and "POSIX" collations into pg_collation, rather than one per encoding. Make these instances exist and do something useful even in machines without locale_t support: to wit, it's now possible to force comparisons and case-folding functions to use C locale in an otherwise non-C database, whether or not the platform has support for using any additional collations. Fix up severely broken upper/lower/initcap functions, too: the C/POSIX fastpath now does what it is supposed to, and non-default collations are handled correctly in single-byte database encodings. Merge the two separate collation hashtables that were being maintained in pg_locale.c, and be more wary of the possibility that we fail partway through filling a cache entry.	2011-03-20 12:44:13 -04:00
Tom Lane	b310b6e31c	Revise collation derivation method and expression-tree representation. All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.	2011-03-19 20:30:08 -04:00
Magnus Hagander	6f9192df61	Rename ident authentication over local connections to peer This removes an overloading of two authentication options where one is very secure (peer) and one is often insecure (ident). Peer is also the name used in libpq from 9.1 to specify the same type of authentication. Also make initdb select peer for local connections when ident is chosen, and ident for TCP connections when peer is chosen. ident keyword in pg_hba.conf is still accepted and maps to peer authentication.	2011-03-19 18:44:35 +01:00
Robert Haas	9a56dc3389	Fix various possible problems with synchronous replication. 1. Don't ignore query cancel interrupts. Instead, if the user asks to cancel the query after we've already committed it, but before it's on the standby, just emit a warning and let the COMMIT finish. 2. Don't ignore die interrupts (pg_terminate_backend or fast shutdown). Instead, emit a warning message and close the connection without acknowledging the commit. Other backends will still see the effect of the commit, but there's no getting around that; it's too late to abort at this point, and ignoring die interrupts altogether doesn't seem like a good idea. 3. If synchronous_standby_names becomes empty, wake up all backends waiting for synchronous replication to complete. Without this, someone attempting to shut synchronous replication off could easily wedge the entire system instead. 4. Avoid depending on the assumption that if a walsender updates MyProc->syncRepState, we'll see the change even if we read it without holding the lock. The window for this appears to be quite narrow (and probably doesn't exist at all on machines with strong memory ordering) but protecting against it is practically free, so do that. 5. Remove useless state SYNC_REP_MUST_DISCONNECT, which isn't needed and doesn't actually do anything. There's still some further work needed here to make the behavior of fast shutdown plausible, but that looks complex, so I'm leaving it for a separate commit. Review by Fujii Masao.	2011-03-17 13:12:21 -04:00
Bruce Momjian	ddd6ff289f	Add database comments to template0 and postgres databases, and improve the comments on the template1 database. No catalog version bump because they are just comments.	2011-03-15 11:26:57 -04:00
Robert Haas	5ca4dfc79f	Remove 13 keywords that are used only for ROLE options. Review by Tom Lane.	2011-03-15 10:22:58 -04:00
Bruce Momjian	b051a34fd8	Remove duplicate time-based macros recently added.	2011-03-14 10:40:14 -04:00
Tom Lane	696d1f7f06	Make all comparisons done for/with statistics use the default collation. While this will give wrong answers when estimating selectivity for a comparison operator that's using a non-default collation, the estimation error probably won't be large; and anyway the former approach created estimation errors of its own by trying to use a histogram that might have been computed with some other collation. So we'll adopt this simplified approach for now and perhaps improve it sometime in the future. This patch incorporates changes from Andres Freund to make sure that selfuncs.c passes a valid collation OID to any datatype-specific function it calls, in case that function wants collation information. Said OID will now always be DEFAULT_COLLATION_OID, but at least we won't get errors.	2011-03-12 16:30:36 -05:00
Bruce Momjian	3a3f39fdc0	Use macros for time-based constants, rather than constants.	2011-03-12 09:35:56 -05:00
Tom Lane	8acdb8bf9c	Split CollateClause into separate raw and analyzed node types. CollateClause is now used only in raw grammar output, and CollateExpr after parse analysis. This is for clarity and to avoid carrying collation names in post-analysis parse trees: that's both wasteful and possibly misleading, since the collation's name could be changed while the parsetree still exists. Also, clean up assorted infelicities and omissions in processing of the node type.	2011-03-11 16:28:18 -05:00
Tom Lane	e3c732a85c	Create an explicit concept of collations that work for any encoding. Use collencoding = -1 to represent such a collation in pg_collation. We need this to make the "default" entry work sanely, and a later patch will fix the C/POSIX entries to be represented this way instead of duplicating them across all encodings. All lookup operations now search first for an entry that's database-encoding-specific, and then for the same name with collencoding = -1. Also some incidental code cleanup in collationcmds.c and pg_collation.c.	2011-03-11 13:20:11 -05:00
Bruce Momjian	7d23e0f803	Update C comment about O_DIRECT and fsync().	2011-03-11 06:46:44 -05:00
Tom Lane	7564654adf	Revert addition of third argument to format_type(). Including collation in the behavior of that function promotes a world view we do not want. Moreover, it was producing the wrong behavior for pg_dump anyway: what we want is to dump a COLLATE clause on attributes whose attcollation is different from the underlying type, and likewise for domains, and the function cannot do that for us. Doing it the hard way in pg_dump is a bit more tedious but produces more correct output. In passing, fix initdb so that the initial entry in pg_collation is properly pinned. It was droppable before :-(	2011-03-10 17:30:46 -05:00
Robert Haas	2e019c8611	More synchronous replication typo fixes. Fujii Masao	2011-03-10 15:56:18 -05:00
Robert Haas	b8bb8dbf20	More synchronous replication tweaks. SyncRepRequested() must check not only the value of the synchronous_replication GUC but also whether max_wal_senders > 0. Otherwise, we might end up waiting for sync rep even when there's no possibility of a standby ever managing to connect. There are some existing cross-checks to prevent this, but they're not quite sufficient: the user can start the server with max_wal_senders=0, synchronous_standby_names='', and synchronous_replication=off and then subsequent make synchronous_standby_names not empty using pg_ctl reload, and then SET synchronous_standby=on, leading to an indefinite hang. Along the way, rename the global variable for the synchronous_replication GUC to match the name of the GUC itself, for clarity. Report by Fujii Masao, though I didn't use his patch.	2011-03-10 15:43:37 -05:00
Robert Haas	e397d2ee64	Remove obsolete comment. In earlier versions of the sync rep patch, waiters removed themselves from the queue, but now walsender removes them before doing the wakeup. Report by Fujii Masao.	2011-03-10 15:00:20 -05:00
Robert Haas	6436098795	Minor sync rep corrections. Fujii Masao, with a bit of additional wordsmithing by me.	2011-03-10 14:57:02 -05:00
Itagaki Takahiro	2d8de0a50b	Cleanup copyright years and file names in the header comments of some files.	2011-03-10 15:05:33 +09:00
Tom Lane	a051ef699c	Remove collation information from TypeName, where it does not belong. The initial collations patch treated a COLLATE spec as part of a TypeName, following what can only be described as brain fade on the part of the SQL committee. It's a lot more reasonable to treat COLLATE as a syntactically separate object, so that it can be added in only the productions where it actually belongs, rather than needing to reject it in a boatload of places where it doesn't belong (something the original patch mostly failed to do). In addition this change lets us meet the spec's requirement to allow COLLATE anywhere in the clauses of a ColumnDef, and it avoids unfriendly behavior for constructs such as "foo::type COLLATE collation". To do this, pull collation information out of TypeName and put it in ColumnDef instead, thus reverting most of the collation-related changes in parse_type.c's API. I made one additional structural change, which was to use a ColumnDef as an intermediate node in AT_AlterColumnType AlterTableCmd nodes. This provides enough room to get rid of the "transform" wart in AlterTableCmd too, since the ColumnDef can carry the USING expression easily enough. Also fix some other minor bugs that have crept in in the same areas, like failure to copy recently-added fields of ColumnDef in copyfuncs.c. While at it, document the formerly secret ability to specify a collation in ALTER TABLE ALTER COLUMN TYPE, ALTER TYPE ADD ATTRIBUTE, and ALTER TYPE ALTER ATTRIBUTE TYPE; and correct some misstatements about what the default collation selection will be when COLLATE is omitted. BTW, the three-parameter form of format_type() should go away too, since it just contributes to the confusion in this area; but I'll do that in a separate patch.	2011-03-09 22:39:20 -05:00
Tom Lane	49a08ca1e9	Adjust the permissions required for COMMENT ON ROLE. Formerly, any member of a role could change the role's comment, as of course could superusers; but holders of CREATEROLE privilege could not, unless they were also members. This led to the odd situation that a CREATEROLE holder could create a role but then could not comment on it. It also seems a bit dubious to let an unprivileged user change his own comment, let alone those of group roles he belongs to. So, change the rule to be "you must be superuser to comment on a superuser role, or hold CREATEROLE to comment on non-superuser roles". This is the same as the privilege check for creating/dropping roles, and thus fits much better with the rule for other object types, namely that only the owner of an object can comment on it. In passing, clean up the documentation for COMMENT a little bit. Per complaint from Owen Jacobson and subsequent discussion.	2011-03-09 11:28:34 -05:00
Heikki Linnakangas	4cd3fb6e12	Truncate predicate lock manager's SLRU lazily at checkpoint. That's safer than doing it aggressively whenever the tail-XID pointer is advanced, because this way we don't need to do it while holding SerializableXactHashLock. This also fixes bug #5915 spotted by YAMAMOTO Takashi, and removes an obsolete comment spotted by Kevin Grittner.	2011-03-08 12:12:54 +02:00
Simon Riggs	dcfe3f60c1	Catversion increment for pg_stat_replication changes for syncrep	2011-03-06 23:44:44 +00:00
Simon Riggs	966fb05b58	Add new files for syncrep missed in previous commit	2011-03-06 23:39:14 +00:00
Simon Riggs	a8a8a3e096	Efficient transaction-controlled synchronous replication. If a standby is broadcasting reply messages and we have named one or more standbys in synchronous_standby_names then allow users who set synchronous_replication to wait for commit, which then provides strict data integrity guarantees. Design avoids sending and receiving transaction state information so minimises bookkeeping overheads. We synchronize with the highest priority standby that is connected and ready to synchronize. Other standbys can be defined to takeover in case of standby failure. This version has very strict behaviour; more relaxed options may be added at a later date. Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime Casanova, Heikki Linnakangas and Robert Haas, plus the assistance of many other design reviewers.	2011-03-06 22:49:16 +00:00
Tom Lane	bfd7f8cbb2	Make plpythonu language use plpython2 shared library directly. The original scheme for this was to symlink plpython.$DLSUFFIX to plpython2.$DLSUFFIX, but that doesn't work on Windows, and only accidentally failed to fail because of the way that CREATE LANGUAGE created or didn't create new C functions. My changes of yesterday exposed the weakness of that approach. To fix, get rid of the symlink and make pg_pltemplate show what's really going on.	2011-03-05 15:13:15 -05:00
Tom Lane	63b656b7bf	Create extension infrastructure for the core procedural languages. This mostly just involves creating control, install, and update-from-unpackaged scripts for them. However, I had to adjust plperl and plpython to not share the same support functions between variants, because we can't put the same function into multiple extensions. catversion bump forced due to new contents of pg_pltemplate, and because initdb now installs plpgsql as an extension not a bare language. Add support for regression testing these as extensions not bare languages. Fix a couple of other issues that popped up while testing this: my initial hack at pg_dump binary-upgrade support didn't work right, and we don't want an extra schema permissions test after all. Documentation changes still to come, but I'm committing now to see whether the MSVC build scripts need work (likely they do).	2011-03-04 21:51:14 -05:00
Peter Eisentraut	b9cff97fdf	Don't allow CREATE TABLE AS to create a column with invalid collation It is possible that an expression ends up with a collatable type but without a collation. CREATE TABLE AS could then create a table based on that. But such a column cannot be dumped with valid SQL syntax, so we disallow creating such a column. per test report from Noah Misch	2011-03-04 23:42:07 +02:00
Tom Lane	8d3b421f5f	Allow non-superusers to create (some) extensions. Remove the unconditional superuser permissions check in CREATE EXTENSION, and instead define a "superuser" extension property, which when false (not the default) skips the superuser permissions check. In this case the calling user only needs enough permissions to execute the commands in the extension's installation script. The superuser property is also enforced in the same way for ALTER EXTENSION UPDATE cases. In other ALTER EXTENSION cases and DROP EXTENSION, test ownership of the extension rather than superuserness. ALTER EXTENSION ADD/DROP needs to insist on ownership of the target object as well; to do that without duplicating code, refactor comment.c's big switch for permissions checks into a separate function in objectaddress.c. I also removed the superuserness checks in pg_available_extensions and related functions; there's no strong reason why everybody shouldn't be able to see that info. Also invent an IF NOT EXISTS variant of CREATE EXTENSION, and use that in pg_dump, so that dumps won't fail for installed-by-default extensions. We don't have any of those yet, but we will soon. This is all per discussion of wrapping the standard procedural languages into extensions. I'll make those changes in a separate commit; this is just putting the core infrastructure in place.	2011-03-04 16:08:53 -05:00
Tom Lane	908ab80286	Further refine patch for commenting operator implementation functions. Instead of manually maintaining the "implementation of XXX operator" comments in pg_proc.h, delete all those entries and let initdb create them via a join. To let initdb figure out which name to use when there is a conflict, change the comments for deprecated operators to say they are deprecated --- which seems like a good thing to do anyway.	2011-03-03 15:55:47 -05:00
Tom Lane	6252c4f9e2	Run a portal's cleanup hook immediately when pushing it to DONE state. This works around the problem noted by Yamamoto Takashi in bug #5906, that there were code paths whereby we could reach AtCleanup_Portals with a portal's cleanup hook still unexecuted. The changes I made a few days ago were intended to prevent that from happening, and I think that on balance it's still a good thing to avoid, so I don't want to remove the Assert in AtCleanup_Portals. Hence do this instead.	2011-03-03 13:04:06 -05:00
Tom Lane	94133a9354	Mark operator implementation functions as such in their comments. Historically, we've not had separate comments for built-in pg_operator entries, but relied on the comments for the underlying functions. The trouble with this approach is that there isn't much of anything to suggest to users that they'd be better off using the operators instead. So, move all the relevant comments into pg_operator, and give each underlying function a comment that just says "implementation of XXX operator". There are only about half a dozen cases where it seems reasonable to use the underlying function interchangeably with the operator; in these cases I left the same comment in place on the function as on the operator. While at it, establish a policy that every built-in function and operator entry should have a comment: there are now queries in the opr_sanity regression test that will complain if one doesn't. This only required adding a dozen or two more entries than would have been there anyway. I also spent some time trying to eliminate gratuitous inconsistencies in the style of the comments, though it's hopeless to suppose that more won't creep in soon enough. Per my proposal of 2010-10-15.	2011-03-03 01:34:17 -05:00
Heikki Linnakangas	6eba5a7c57	Change pg_last_xlog_receive_location() not to move backwards. That makes it a lot more useful for determining which standby is most up-to-date, for example. There was long discussions on whether overwriting existing existing WAL makes sense to begin with, and whether we should do some more extensive variable renaming, but this change nevertheless seems quite uncontroversial. Fujii Masao, reviewed by Jeff Janes, Robert Haas, Stephen Frost.	2011-03-01 20:54:35 +02:00
Heikki Linnakangas	47ad79122b	Fix bugs in Serializable Snapshot Isolation. Change the way UPDATEs are handled. Instead of maintaining a chain of tuple-level locks in shared memory, copy any existing locks on the old tuple to the new tuple at UPDATE. Any existing page-level lock needs to be duplicated too, as a lock on the new tuple. That was neglected previously. Store xmin on tuple-level predicate locks, to distinguish a lock on an old already-recycled tuple from a new tuple at the same physical location. Failure to distinguish them caused loops in the tuple-lock chains, as reported by YAMAMOTO Takashi. Although we don't use the chain representation of UPDATEs anymore, it seems like a good idea to store the xmin to avoid some false positives if no other reason. CheckSingleTargetForConflictsIn now correctly handles the case where a lock that's being held is not reflected in the local lock table. That happens if another backend acquires a lock on our behalf due to an UPDATE or a page split. PredicateLockPageCombine now retains locks for the page that is being removed, rather than removing them. This prevents a potentially dangerous false-positive inconsistency where the local lock table believes that a lock is held, but it is actually not. Dan Ports and Kevin Grittner	2011-03-01 19:05:16 +02:00
Tom Lane	c0b0076036	Rearrange snapshot handling to make rule expansion more consistent. With this patch, portals, SQL functions, and SPI all agree that there should be only a CommandCounterIncrement between the queries that are generated from a single SQL command by rule expansion. Fetching a whole new snapshot now happens only between original queries. This is equivalent to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the best choice since it eliminates one source of concurrency hazards for rules. The patch should also make things marginally faster by reducing the number of snapshot push/pop operations. The patch removes pg_parse_and_rewrite(), which is no longer used anywhere. There was considerable discussion about more aggressive refactoring of the query-processing functions exported by postgres.c, but for the moment nothing more has been done there. I also took the opportunity to refactor snapmgr.c's API slightly: the former PushUpdatedSnapshot() has been split into two functions. Marko Tiikkaja, reviewed by Steve Singer and Tom Lane	2011-02-28 23:28:06 -05:00
Robert Haas	92c30fd2ed	Rename pg_stat_replication.apply_location to replay_location. For consistency with pg_last_xlog_replay_location. Per discussion.	2011-02-28 12:49:57 -05:00
Tom Lane	a874fe7b4c	Refactor the executor's API to support data-modifying CTEs better. The originally committed patch for modifying CTEs didn't interact well with EXPLAIN, as noted by myself, and also had corner-case problems with triggers, as noted by Dean Rasheed. Those problems show it is really not practical for ExecutorEnd to call any user-defined code; so split the cleanup duties out into a new function ExecutorFinish, which must be called between the last ExecutorRun call and ExecutorEnd. Some Asserts have been added to these functions to help verify correct usage. It is no longer necessary for callers of the executor to call AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now done by ExecutorStart/ExecutorFinish respectively. If you really need to suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to ExecutorStart. Also, refactor portal commit processing to allow for the possibility that PortalDrop will invoke user-defined code. I think this is not actually necessary just yet, since the portal-execution-strategy logic forces any non-pure-SELECT query to be run to completion before we will consider committing. But it seems like good future-proofing.	2011-02-27 13:44:12 -05:00
Tom Lane	389af95155	Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH. This patch implements data-modifying WITH queries according to the semantics that the updates all happen with the same command counter value, and in an unspecified order. Therefore one WITH clause can't see the effects of another, nor can the outer query see the effects other than through the RETURNING values. And attempts to do conflicting updates will have unpredictable results. We'll need to document all that. This commit just fixes the code; documentation updates are waiting on author. Marko Tiikkaja and Hitoshi Harada	2011-02-25 18:58:02 -05:00
Tom Lane	bdca82f44d	Add a relkind field to RangeTblEntry to avoid some syscache lookups. The recent additions for FDW support required checking foreign-table-ness in several places in the parse/plan chain. While it's not clear whether that would really result in a noticeable slowdown, it seems best to avoid any performance risk by keeping a copy of the relation's relkind in RangeTblEntry. That might have some other uses later, anyway. Per discussion.	2011-02-22 19:24:40 -05:00
Peter Eisentraut	1c51c7d5ff	Add PL/Python functions for quoting strings Add functions plpy.quote_ident, plpy.quote_literal, plpy.quote_nullable, which wrap the equivalent SQL functions. To be able to propagate char * constness properly, make the argument of quote_literal_cstr() const char *. This also makes it more consistent with quote_identifier(). Jan Urbański, reviewed by Hitoshi Harada, some refinements by Peter Eisentraut	2011-02-22 23:41:23 +02:00
Tom Lane	1ab9b012bd	Allow binary I/O of type "void". void_send is useful for the same reason that void_out doesn't throw error, namely that someone might do "select void_returning_func(...)" from a client that prefers to operate in binary mode. The void_recv function may or may not have any practical use, but we provide it for symmetry. Radosław Smogura	2011-02-22 13:08:22 -05:00
Tom Lane	2e852e541c	Remove ExecRemoveJunk(), which is no longer used anywhere. This was a leftover from the pre-8.1 design of junkfilters. It doesn't seem to have any reason to live, since it's merely a combination of two easy function calls, and not a well-designed combination at that (it encourages callers to leak the result tuple).	2011-02-21 21:41:08 -05:00
Tom Lane	a210be7720	Fix dangling-pointer problem in before-row update trigger processing. ExecUpdate checked for whether ExecBRUpdateTriggers had returned a new tuple value by seeing if the returned tuple was pointer-equal to the old one. But the "old one" was in estate->es_junkFilter's result slot, which would be scribbled on if we had done an EvalPlanQual update in response to a concurrent update of the target tuple; therefore we were comparing a dangling pointer to a live one. Given the right set of circumstances we could get a false match, resulting in not forcing the tuple to be stored in the slot we thought it was stored in. In the case reported by Maxim Boguk in bug #5798, this led to "cannot extract system attribute from virtual tuple" failures when trying to do "RETURNING ctid". I believe there is a very-low-probability chance of more serious errors, such as generating incorrect index entries based on the original rather than the trigger-modified version of the row. In HEAD, change all of ExecBRInsertTriggers, ExecIRInsertTriggers, ExecBRUpdateTriggers, and ExecIRUpdateTriggers so that they continue to have similar APIs. In the back branches I just changed ExecBRUpdateTriggers, since there is no bug in the ExecBRInsertTriggers case.	2011-02-21 21:19:50 -05:00
Itagaki Takahiro	3cba8240a1	Add ENCODING option to COPY TO/FROM and file_fdw. File encodings can be specified separately from client encoding. If not specified, client encoding is used for backward compatibility. Cases when the encoding doesn't match client encoding are slower than matched cases because we don't have conversion procs for other encodings. Performance improvement would be be a future work. Original patch by Hitoshi Harada, and modified by me.	2011-02-21 14:32:40 +09:00
Tom Lane	7c5d0ae707	Add contrib/file_fdw foreign-data wrapper for reading files via COPY. This is both very useful in its own right, and an important test case for the core FDW support. This commit includes a small refactoring of copy.c to expose its option checking code as a separately callable function. The original patch submission duplicated hundreds of lines of that code, which seemed pretty unmaintainable. Shigeru Hanada, reviewed by Itagaki Takahiro and Tom Lane	2011-02-20 14:06:59 -05:00
Tom Lane	bb74240794	Implement an API to let foreign-data wrappers actually be functional. This commit provides the core code and documentation needed. A contrib module test case will follow shortly. Shigeru Hanada, Jan Urbanski, Heikki Linnakangas	2011-02-20 00:18:14 -05:00
Tom Lane	327e025071	Create the catalog infrastructure for foreign-data-wrapper handlers. Add a fdwhandler column to pg_foreign_data_wrapper, plus HANDLER options in the CREATE FOREIGN DATA WRAPPER and ALTER FOREIGN DATA WRAPPER commands, plus pg_dump support for same. Also invent a new pseudotype fdw_handler with properties similar to language_handler. This is split out of the "FDW API" patch for ease of review; it's all stuff we will certainly need, regardless of any other details of the FDW API. FDW handler functions will not actually get called yet. In passing, fix some omissions and infelicities in foreigncmds.c. Shigeru Hanada, Jan Urbanski, Heikki Linnakangas	2011-02-19 00:07:15 -05:00
Simon Riggs	06828c5feb	Separate messages for standby replies and hot standby feedback. Allow messages to be sent at different times, and greatly reduce the frequency of hot standby feedback. Refactor to allow additional message types.	2011-02-18 11:31:49 +00:00
Itagaki Takahiro	62c7bd31c8	Add transaction-level advisory locks. They share the same locking namespace with the existing session-level advisory locks, but they are automatically released at the end of the current transaction and cannot be released explicitly via unlock functions. Marko Tiikkaja, reviewed by me.	2011-02-18 14:05:12 +09:00
Tom Lane	52b60530f2	Fix tsmatchsel() to account properly for null rows. ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents. Back-patch to 8.4 where this stuff was introduced. Jesper Krogh	2011-02-17 19:00:49 -05:00
Robert Haas	4a25bc145a	Add client_hostname field to pg_stat_activity. Peter Eisentraut, reviewed by Steve Singer, Alvaro Herrera, and me.	2011-02-17 16:03:28 -05:00
Tom Lane	a2095f7fb5	Fix bogus test for hypothetical indexes in get_actual_variable_range(). That function was supposing that indexoid == 0 for a hypothetical index, but that is not likely to be true in any non-toy implementation of an index adviser, since assigning a fake OID is the only way to know at EXPLAIN time which hypothetical index got selected. Fix by adding a flag to IndexOptInfo to mark hypothetical indexes. Back-patch to 9.0 where get_actual_variable_range() was added. Gurjeet Singh	2011-02-16 19:24:45 -05:00
Tom Lane	6595dd04d1	Add backwards-compatible declarations of some core GIN support functions. These are needed to support reloading dumps of 9.0 installations containing contrib/intarray or contrib/tsearch2. Since not only regular dump/reload but binary upgrade would fail, it seems worth the trouble to carry these stubs for awhile. Note that the contrib opclasses referencing these functions will still work fine, since GIN doesn't actually pay any attention to the declared signature of a support function.	2011-02-16 17:24:46 -05:00
Simon Riggs	bca8b7f16a	Hot Standby feedback for avoidance of cleanup conflicts on standby. Standby optionally sends back information about oldestXmin of queries which is then checked and applied to the WALSender's proc->xmin. GetOldestXmin() is modified slightly to agree with GetSnapshotData(), so that all backends on primary include WALSender within their snapshots. Note this does nothing to change the snapshot xmin on either master or standby. Feedback piggybacks on the standby reply message. vacuum_defer_cleanup_age is no longer used on standby, though parameter still exists on primary, since some use cases still exist. Simon Riggs, review comments from Fujii Masao, Heikki Linnakangas, Robert Haas	2011-02-16 19:29:37 +00:00
Tom Lane	6e02755b22	Add FOREACH IN ARRAY looping to plpgsql. (I'm not entirely sure that we've finished bikeshedding the syntax details, but the functionality seems OK.) Pavel Stehule, reviewed by Stephen Frost and Tom Lane	2011-02-16 01:53:03 -05:00
Robert Haas	4695da5ae9	pg_ctl promote Fujii Masao, reviewed by Robert Haas, Stephen Frost, and Magnus Hagander.	2011-02-15 21:30:23 -05:00
Itagaki Takahiro	8ddc05fb01	Export the external file reader used in COPY FROM as APIs. They are expected to be used by extension modules like file_fdw. There are no user-visible changes. Itagaki Takahiro Reviewed and tested by Kevin Grittner and Noah Misch.	2011-02-16 11:19:11 +09:00
Tom Lane	887dd041a6	Fix obsolete comment. Comment about MaxAllocSize was not updated when the TOAST-header macros were replaced in 8.3 "varvarlena" changes. Per report from Frederik Ramm.	2011-02-15 13:27:54 -05:00
Tom Lane	555353c0c5	Rearrange extension-related views as per recent discussion. The original design of pg_available_extensions did not consider the possibility of version-specific control files. Split it into two views: pg_available_extensions shows information that is generic about an extension, while pg_available_extension_versions shows all available versions together with information that could be version-dependent. Also, add an SRF pg_extension_update_paths() to assist in checking that a collection of update scripts provide sane update path sequences.	2011-02-14 19:22:36 -05:00
Bruce Momjian	0de0cc150a	Properly handle Win32 paths of 'E:abc', which can be either absolute or relative, by creating a function path_is_relative_and_below_cwd() to check for specific requirements. It is unclear if this fixes a security problem or not but the new code is more robust.	2011-02-12 09:47:51 -05:00
Peter Eisentraut	b313bca0af	DDL support for collations - collowner field - CREATE COLLATION - ALTER COLLATION - DROP COLLATION - COMMENT ON COLLATION - integration with extensions - pg_dump support for the above - dependency management - psql tab completion - psql \dO command	2011-02-12 15:55:18 +02:00
Tom Lane	1214749901	Add support for multiple versions of an extension and ALTER EXTENSION UPDATE. This follows recent discussions, so it's quite a bit different from Dimitri's original. There will probably be more changes once we get a bit of experience with it, but let's get it in and start playing with it. This is still just core code. I'll start converting contrib modules shortly. Dimitri Fontaine and Tom Lane	2011-02-11 21:25:57 -05:00
Robert Haas	2c20ba1fd2	Tweak find_composite_type_dependencies API a bit more. Per discussion with Noah Misch, the previous coding, introduced by my commit `65377e0b9c` on 2011-02-06, was really an abuse of RELKIND_COMPOSITE_TYPE, since the caller in typecmds.c is actually passing the name of a domain. So go back having a type name argument, but make the first argument a Relation rather than just a string so we can tell whether it's a table or a foreign table and emit the proper error message.	2011-02-11 08:47:38 -05:00
Tom Lane	01467d3e4f	Extend "ALTER EXTENSION ADD object" to permit "DROP object" as well. Per discussion, this is something we should have sooner rather than later, and it doesn't take much additional code to support it.	2011-02-10 17:37:22 -05:00
Heikki Linnakangas	b186523fd9	Send status updates back from standby server to master, indicating how far the standby has written, flushed, and applied the WAL. At the moment, this is for informational purposes only, the values are only shown in pg_stat_replication system view, but in the future they will also be needed for synchronous replication. Extracted from Simon riggs' synchronous replication patch by Robert Haas, with some tweaking by me.	2011-02-10 21:04:02 +02:00
Magnus Hagander	4c468b37a2	Track last time for statistics reset on databases and bgwriter Tracks one counter for each database, which is reset whenever the statistics for any individual object inside the database is reset, and one counter for the background writer. Tomas Vondra, reviewed by Greg Smith	2011-02-10 15:14:04 +01:00
Tom Lane	e617f0d7e4	Fix improper matching of resjunk column names for FOR UPDATE in subselect. Flattening of subquery range tables during setrefs.c could lead to the rangetable indexes in PlanRowMark nodes not matching up with the column names previously assigned to the corresponding resjunk ctid (resp. tableoid or wholerow) columns. Typical symptom would be either a "cannot extract system attribute from virtual tuple" error or an Assert failure. This wasn't a problem before 9.0 because we didn't support FOR UPDATE below the top query level, and so the final flattening could never renumber an RTE that was relevant to FOR UPDATE. Fix by using a plan-tree-wide unique number for each PlanRowMark to label the associated resjunk columns, so that the number need not change during flattening. Per report from David Johnston (though I'm darned if I can see how this got past initial testing of the relevant code). Back-patch to 9.0.	2011-02-09 23:27:42 -05:00
Tom Lane	caddcb8f4b	Fix pg_upgrade to handle extensions. This follows my proposal of yesterday, namely that we try to recreate the previous state of the extension exactly, instead of allowing CREATE EXTENSION to run a SQL script that might create some entirely-incompatible on-disk state. In --binary-upgrade mode, pg_dump won't issue CREATE EXTENSION at all, but instead uses a kluge function provided by pg_upgrade_support to recreate the pg_extension row (and extension-level pg_depend entries) without creating any member objects. The member objects are then restored in the same way as if they weren't members, in particular using pg_upgrade's normal hacks to preserve OIDs that need to be preserved. Then, for each member object, ALTER EXTENSION ADD is issued to recreate the pg_depend entry that marks it as an extension member. In passing, fix breakage in pg_upgrade's enum-type support: somebody didn't fix it when the noise word VALUE got added to ALTER TYPE ADD. Also, rationalize parsetree representation of COMMENT ON DOMAIN and fix get_object_address() to allow OBJECT_DOMAIN.	2011-02-09 19:18:08 -05:00
Peter Eisentraut	2e2d56fea9	Information schema views for collation support Add the views character_sets, collations, and collation_character_set_applicability.	2011-02-09 23:26:48 +02:00
Tom Lane	5bc178b89f	Implement "ALTER EXTENSION ADD object". This is an essential component of making the extension feature usable; first because it's needed in the process of converting an existing installation containing "loose" objects of an old contrib module into the extension-based world, and second because we'll have to use it in pg_dump --binary-upgrade, as per recent discussion. Loosely based on part of Dimitri Fontaine's ALTER EXTENSION UPGRADE patch.	2011-02-09 11:56:37 -05:00
Magnus Hagander	3144c33a2f	Implement NOWAIT option for BASE_BACKUP command Specifying this option makes the server not wait for the xlog to be archived, or emit a warning that it can't, instead leaving the responsibility with the client. This is useful when the log is being streamed using the streaming protocol in parallel with the backup, without having log archiving enabled.	2011-02-09 10:59:53 +01:00
Tom Lane	d9572c4e3b	Core support for "extensions", which are packages of SQL objects. This patch adds the server infrastructure to support extensions. There is still one significant loose end, namely how to make it play nice with pg_upgrade, so I am not yet committing the changes that would make all the contrib modules depend on this feature. In passing, fix a disturbingly large amount of breakage in AlterObjectNamespace() and callers. Dimitri Fontaine, reviewed by Anssi Kääriäinen, Itagaki Takahiro, Tom Lane, and numerous others	2011-02-08 16:13:22 -05:00
Peter Eisentraut	414c5a2ea6	Per-column collation support This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch	2011-02-08 23:04:18 +02:00
Simon Riggs	7a7d36ec33	Continue long tradition of bumping the catalog version a little late.	2011-02-08 19:44:50 +00:00
Simon Riggs	c016ce7281	Named restore points in recovery. Users can record named points, then new recovery.conf parameter recovery_target_name allows PITR to specify named points as recovery targets. Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits	2011-02-08 19:39:08 +00:00
Simon Riggs	8c6e3adbf7	Basic Recovery Control functions for use in Hot Standby. Pause, Resume, Status check functions only. Also, new recovery.conf parameter to pause_at_recovery_target, default on. Simon Riggs, reviewed by Fujii Masao	2011-02-08 18:30:22 +00:00
Heikki Linnakangas	f9f9d696a9	UINT64_MAX isn't defined on MSVC.	2011-02-08 18:15:53 +02:00
Simon Riggs	722bf7017b	Extend ALTER TABLE to allow Foreign Keys to be added without initial validation. FK constraints that are marked NOT VALID may later be VALIDATED, which uses an ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced table. Significantly reduces lock strength and duration when adding FKs. New state visible from psql. Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas	2011-02-08 12:23:20 +00:00
Robert Haas	32896c40ca	Avoid having autovacuum workers wait for relation locks. Waiting for relation locks can lead to starvation - it pins down an autovacuum worker for as long as the lock is held. But if we're doing an anti-wraparound vacuum, then we still wait; maintenance can no longer be put off. To assist with troubleshooting, if log_autovacuum_min_duration >= 0, we log whenever an autovacuum or autoanalyze is skipped for this reason. Per a gripe by Josh Berkus, and ensuing discussion.	2011-02-07 22:04:29 -05:00
Heikki Linnakangas	47082fa875	Oops, forgot to bump catversion in the Serializable Snapshot Isolation patch. I thought we didn't need that, but then I remembered that it added a new SLRU subdirectory, pg_serial. While we're at it, document what pg_serial is.	2011-02-08 00:24:23 +02:00
Heikki Linnakangas	dafaa3efb7	Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen	2011-02-08 00:09:08 +02:00
Robert Haas	65377e0b9c	Tighten ALTER FOREIGN TABLE .. SET DATA TYPE checks. If the foreign table's rowtype is being used as the type of a column in another table, we can't just up and change its data type. This was already checked for composite types and ordinary tables, but we previously failed to enforce it for foreign tables.	2011-02-06 00:26:27 -05:00
Robert Haas	356f2cbbb4	Make handling of errcodes.h more consistent with other generated headers. This fixes make distprep, and seems more robust in other ways as well. Some special handling is required because errcodes.txt is needed by some stuff in src/port, but just by src/backend as is the case for the other generated headers. While I'm at it, fix a few other things that were overlooked in the original patch.	2011-02-04 09:29:10 -05:00
Robert Haas	ddfe26f644	Avoid maintaining three separate copies of the error codes list. src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a big chunk of errcodes.sgml are now automatically generated from a single file, src/backend/utils/errcodes.txt. Jan Urbański, reviewed by Tom Lane.	2011-02-03 22:32:49 -05:00
Bruce Momjian	35b0a6b205	Simplify code used in is_absolute_path() macro; also add comment about 'E:abc' Win32 path handling.	2011-02-03 10:47:06 -05:00
Bruce Momjian	426227850b	Rename function to first_path_var_separator() to clarify it works with path variables, not directory paths.	2011-02-02 22:49:54 -05:00
Peter Eisentraut	15f55cc38a	Add validator to PL/Python Jan Urbański, reviewed by Hitoshi Harada	2011-02-01 22:55:04 +02:00
Magnus Hagander	5273f21434	Undefine setlocale() macro on Win32 New versions of libintl redefine setlocale() to a macro which causes problems when the backend and libintl are linked against different versions of the runtime, which is often the case in msvc builds. Hiroshi Inoue, slightly updated comment by me	2011-02-01 13:19:18 +01:00
Simon Riggs	56b21b7ae3	Re-classify ERRCODE_DATABASE_DROPPED to 57P04	2011-02-01 08:44:01 +00:00
Simon Riggs	9e95c9ad55	Create new errcode for recovery conflict caused by db drop on master. Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change. Unlikely to happen on most servers, so low impact change to allow session poolers to correctly handle this situation. Tatsuo Ishii, edits by me, review by Robert Haas	2011-02-01 00:20:53 +00:00
Heikki Linnakangas	997b48ed96	Support multiple concurrent pg_basebackup backups. With this patch, pg_basebackup doesn't write a backup_label file in the data directory, so it doesn't interfere with a pg_start/stop_backup() based backup anymore. backup_label is still included in the backup, but it is injected directly into the tar stream. Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.	2011-01-31 18:25:39 +02:00
Andrew Dunstan	48c9de8028	Fix typo	2011-01-30 20:34:05 -05:00
Andrew Dunstan	91812df4ed	Enable building with the Mingw64 compiler. This can be used to build 64 bit Windows binaries, not only on 64 bit Windows but on supported cross-compiling hosts including 32 bit Windows, Cygwin, Darwin and Linux.	2011-01-30 19:56:46 -05:00
Magnus Hagander	507069de6d	Add option to include WAL in base backup When included, this makes the base backup a complete working "clone" of the initial database, ready to have a postmaster started against it without the need to set up any log archiving or similar. Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas	2011-01-30 21:30:09 +01:00
Peter Eisentraut	6fe5e4e63e	autoreconf Synchronize pg_config.h.in with configure.in (someone must have forgotten to run autoheader or autoreconf), and clean up some spurious change in configure introduced by the last commit there.	2011-01-27 01:19:45 +02:00
Tom Lane	bd1ad1b019	Replace pg_class.relhasexclusion with pg_index.indisexclusion. There isn't any need to track this state on a table-wide basis, and trying to do so introduces undesirable semantic fuzziness. Move the flag to pg_index, where it clearly describes just a single index and can be immutable after index creation.	2011-01-25 17:51:59 -05:00
Tom Lane	88452d5ba6	Implement ALTER TABLE ADD UNIQUE/PRIMARY KEY USING INDEX. This feature allows a unique or pkey constraint to be created using an already-existing unique index. While the constraint isn't very functionally different from the bare index, it's nice to be able to do that for documentation purposes. The main advantage over just issuing a plain ALTER TABLE ADD UNIQUE/PRIMARY KEY is that the index can be created with CREATE INDEX CONCURRENTLY, so that there is not a long interval where the table is locked against updates. On the way, refactor some of the code in DefineIndex() and index_create() so that we don't have to pass through those functions in order to create the index constraint's catalog entries. Also, in parse_utilcmd.c, pass around the ParseState pointer in struct CreateStmtContext to save on notation, and add error location pointers to some error reports that didn't have one before. Gurjeet Singh, reviewed by Steve Singer and Tom Lane	2011-01-25 15:43:05 -05:00
Magnus Hagander	e5487f65fd	Make walsender options order-independent While doing this, also move base backup options into a struct instead of increasing the number of parameters to multiple functions for each new option.	2011-01-23 23:39:18 +01:00
Magnus Hagander	048d148fe6	Add pg_basebackup tool for streaming base backups This tool makes it possible to do the pg_start_backup/ copy files/pg_stop_backup step in a single command. There are still some steps to be done before this is a complete backup solution, such as the ability to stream the required WAL logs, but it's still usable, and could do with some buildfarm coverage. In passing, make the checkpoint request optionally fast instead of hardcoding it. Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine	2011-01-23 12:21:23 +01:00
Robert Haas	6f59777c65	Code cleanup for assign_transaction_read_only. As in commit `fb4c5d2798` on 2011-01-21, this avoids spurious debug messages and allows idempotent changes at any time. Along the way, make assign_XactIsoLevel allow idempotent changes even when not within a subtransaction, to be consistent with the new coding of assign_transaction_read_only and because there's no compelling reason to do otherwise. Kevin Grittner, with some adjustments.	2011-01-22 20:55:50 -05:00
Robert Haas	fb4c5d2798	Code cleanup for assign_XactIsoLevel. The new coding avoids a spurious debug message when a transaction that has changed the isolation level has been rolled back. It also allows the property to be freely changed to the current value within a subtransaction. Kevin Grittner, with one small change by me.	2011-01-21 21:49:19 -05:00
Robert Haas	8ceb245680	Make ALTER TABLE revalidate uniqueness and exclusion constraints. Failure to do so can lead to constraint violations. This was broken by commit `1ddc2703a9` on 2010-02-07, so back-patch to 9.0. Noah Misch. Regression test by me.	2011-01-20 22:44:10 -05:00
Tom Lane	6ca452ba7f	Move a couple of declarations to reflect where the routines really are.	2011-01-15 16:09:05 -05:00
Heikki Linnakangas	8f5d65e916	Treat a WAL sender process that hasn't started streaming yet as a regular backend, as far as the postmaster shutdown logic is concerned. That means, fast shutdown will wait for WAL sender processes to exit before signaling bgwriter to finish. This avoids race conditions between a base backup stopping or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't want e.g the end-of-backup WAL record to be written after the shutdown checkpoint.	2011-01-15 16:38:21 +02:00
Magnus Hagander	fcd810c69a	Use a lexer and grammar for parsing walsender commands Makes it easier to parse mainly the BASE_BACKUP command with it's options, and avoids having to manually deal with quoted identifiers in the label (previously broken), and makes it easier to add new commands and options in the future. In passing, refactor the case statement in the walsender to put each command in it's own function.	2011-01-14 16:30:33 +01:00
Magnus Hagander	688423d004	Exit from base backups when shutdown is requested When the exit waits until the whole backup completes, it may take a very long time. In passing, add back an error check in the main loop so we detect clients that disconnect much earlier if the backup is large.	2011-01-14 12:36:45 +01:00
Tom Lane	52948169bc	Code review for postmaster.pid contents changes. Fix broken test for pre-existing postmaster, caused by wrong code for appending lines to the lockfile; don't write a failed listen_address setting into the lockfile; don't arbitrarily change the location of the data directory in the lockfile compared to previous releases; provide more consistent and useful definitions of the socket path and listen_address entries; avoid assuming that pg_ctl has the same DEFAULT_PGSOCKET_DIR as the postmaster; assorted code style improvements.	2011-01-13 19:01:28 -05:00
Tom Lane	d487afbb81	Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly. In an inherited UPDATE/DELETE, each target table has its own subplan, because it might have a column set different from other targets. This means that the resjunk columns we add to support EvalPlanQual might be at different physical column numbers in each subplan. The EvalPlanQual rewrite I did for 9.0 failed to account for this, resulting in possible misbehavior or even crashes during concurrent updates to the same row, as seen in a recent report from Gordon Shannon. Revise the data structure so that we track resjunk column numbers separately for each subplan. I also chose to move responsibility for identifying the physical column numbers back to executor startup, instead of assuming that numbers derived during preprocess_targetlist would stay valid throughout subsequent massaging of the plan. That's a bit slower, so we might want to consider undoing it someday; but it would complicate the patch considerably and didn't seem justifiable in a bug fix that has to be back-patched to 9.0.	2011-01-12 20:47:02 -05:00
Magnus Hagander	4c8e20f815	Track walsender state in shared memory and expose in pg_stat_replication	2011-01-11 21:25:28 +01:00
Magnus Hagander	0eb59c4591	Backend support for streaming base backups Add BASE_BACKUP command to walsender, allowing it to stream a base backup to the client (in tar format). The syntax is still far from ideal, that will be fixed in the switch to use a proper grammar for walsender. No client included yet, will come as a separate commit. Magnus Hagander and Heikki Linnakangas	2011-01-10 14:04:19 +01:00
Magnus Hagander	4448917d51	Split pg_start_backup() and pg_stop_backup() into two pieces Move the actual functionality into a separate function that's easier to call internally, and change the SQL-callable function to be a wrapper calling this. Also create a pg_abort_backup() function, only callable internally, that does only the most vital parts of pg_stop_backup(), making it safe(r) to call from error handlers.	2011-01-09 21:00:28 +01:00
Tom Lane	52fd2d65a3	Fix up core tsquery GIN support for new extractQuery API. No need for the empty-prefix-match kluge to force a full scan anymore.	2011-01-09 14:34:50 -05:00
Magnus Hagander	db4d22d0ef	Add pgreadlink() on Windows to read junction points Add support for reading back information about the symbolic links we've created with pgsymlink(), which are actually Junction Points. Just like pgsymlink() can only create directory symlinks, pgreadlink() can only read directory symlinks.	2011-01-09 15:09:19 +01:00
Tom Lane	adf328c0e1	Add array_contains_nulls() function in arrayfuncs.c. This will support fixing contrib/intarray (and probably other places) so that they don't have to fail on arrays that contain a null bitmap but no live null entries.	2011-01-08 20:26:14 -05:00
Tom Lane	7e2f906201	Remove pg_am.amindexnulls. The only use we have had for amindexnulls is in determining whether an index is safe to cluster on; but since the addition of the amclusterable flag, that usage is pretty redundant. In passing, clean up assorted sloppiness from the last patch that touched pg_am.h: Natts_pg_am was wrong, and ambuildempty was not documented.	2011-01-08 16:08:05 -05:00
Tom Lane	56a57473a9	Refactor GIN's handling of duplicate search entries. The original coding could combine duplicate entries only when they originated from the same qual condition. In particular it could not combine cases where multiple qual conditions all give rise to full-index scan requests, which is an expensive case well worth optimizing. Refactor so that duplicates are recognized across all the quals.	2011-01-08 14:48:08 -05:00
Tom Lane	a032d50128	Fix the built-in GIN support procedure declarations in pg_proc.h. Add more "internal" arguments so that these pg_proc entries reflect the current preferred API. This is purely a cosmetic change, since GIN doesn't actually consult the pg_proc entry when calling a support function. Accordingly, no catversion bump.	2011-01-07 20:40:48 -05:00
Tom Lane	73912e7fbd	Fix GIN to support null keys, empty and null items, and full index scans. Per my recent proposal(s). Null key datums can now be returned by extractValue and extractQuery functions, and will be stored in the index. Also, placeholder entries are made for indexable items that are NULL or contain no keys according to extractValue. This means that the index is now always complete, having at least one entry for every indexed heap TID, and so we can get rid of the prohibition on full-index scans. A full-index scan is implemented much the same way as partial-match scans were already: we build a bitmap representing all the TIDs found in the index, and then drive the results off that. Also, introduce a concept of a "search mode" that can be requested by extractQuery when the operator requires matching to empty items (this is just as cheap as matching to a single key) or requires a full index scan (which is not so cheap, but it sure beats failing or giving wrong answers). The behavior remains backward compatible for opclasses that don't return any null keys or request a non-default search mode. Using these features, we can now make the GIN index opclass for anyarray behave in a way that matches the actual anyarray operators for &&, <@, @>, and = ... which it failed to do before in assorted corner cases. This commit fixes the core GIN code and ginarrayprocs.c, updates the documentation, and adds some simple regression test cases for the new behaviors using the array operators. The tsearch and contrib GIN opclass support functions still need to be looked over and probably fixed. Another thing I intend to fix separately is that this is pretty inefficient for cases where more than one scan condition needs a full-index search: we'll run duplicate GinScanEntrys, each one of which builds a large bitmap. There is some existing logic to merge duplicate GinScanEntrys but it needs refactoring to make it work for entries belonging to different scan keys. Note that most of gin.h has been split out into a new file gin_private.h, so that gin.h doesn't export anything that's not supposed to be used by GIN opclasses or the rest of the backend. I did quite a bit of other code beautification work as well, mostly fixing comments and choosing more appropriate names for things.	2011-01-07 19:16:24 -05:00
Robert Haas	9b4271deb9	Document pg_stat_replication, bump catversion since that was overlooked. Itagaki Takahiro, edited by me.	2011-01-07 11:06:55 -05:00
Itagaki Takahiro	a755ea33ae	New system view pg_stat_replication displays activity of wal sender processes. Itagaki Takahiro and Simon Riggs.	2011-01-07 20:35:38 +09:00
Magnus Hagander	66a8a0428d	Give superusers REPLIACTION permission by default This can be overriden by using NOREPLICATION on the CREATE ROLE statement, but by default they will have it, making it backwards compatible and "less surprising" (given that superusers normally override all checks).	2011-01-05 14:24:17 +01:00
Robert Haas	7f60be72b0	Fix crash in ALTER OPERATOR CLASS/FAMILY .. SET SCHEMA. In the previous coding, the parser emitted a List containing a C string, which is no good, because copyObject() can't handle it. Dimitri Fontaine	2011-01-03 22:08:55 -05:00
Magnus Hagander	77745cc7f1	Bump catversion, forgot in previous commit.	2011-01-03 12:50:30 +01:00
Magnus Hagander	40d9e94bd7	Add views and functions to monitor hot standby query conflicts Add the view pg_stat_database_conflicts and a column to pg_stat_database, and the underlying functions to provide the information.	2011-01-03 12:46:03 +01:00
Peter Eisentraut	39b8843296	Implement remaining fields of information_schema.sequences view Add new function pg_sequence_parameters that returns a sequence's start, minimum, maximum, increment, and cycle values, and use that in the view. (bug #5662; design suggestion by Tom Lane) Also slightly adjust the view's column order and permissions after review of SQL standard.	2011-01-02 15:15:21 +02:00
Robert Haas	0d692a0dc9	Basic foreign table support. Foreign tables are a core component of SQL/MED. This commit does not provide a working SQL/MED infrastructure, because foreign tables cannot yet be queried. Support for foreign table scans will need to be added in a future patch. However, this patch creates the necessary system catalog structure, syntax support, and support for ancillary operations such as COMMENT and SECURITY LABEL. Shigeru Hanada, heavily revised by Robert Haas	2011-01-01 23:48:11 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Bruce Momjian	30aeda4394	Include the first valid listen address in pg_ctl to improve server start "wait" detection and add postmaster start time to help determine if the postmaster is actually using the specified data directory.	2010-12-31 17:25:02 -05:00
Tom Lane	7b46401557	Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c. There's no reason for these values to be known anywhere else. After doing this, executor/execdefs.h is vestigial and can be removed.	2010-12-30 22:12:40 -05:00
Tom Lane	f4e4b32743	Support RIGHT and FULL OUTER JOIN in hash joins. This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.	2010-12-30 20:26:08 -05:00
Alvaro Herrera	55573990ca	Avoid unnecessary public struct declaration in slru.h Instead, declare a public wrapper of the sole function using it for external callers, so that they don't have to always pass a NULL argument. Author: Kevin Grittner	2010-12-30 12:09:17 -03:00
Robert Haas	d2bc1c9907	Bump XLOG_PAGE_MAGIC. The unlogged tables patch (commit `53dbc27c62`, 2010-12-29) should have done this, since it changes the format of an XLOG_SMGR_CREATE record.	2010-12-29 07:19:21 -05:00
Robert Haas	53dbc27c62	Support unlogged tables. The contents of an unlogged table are WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.	2010-12-29 06:48:53 -05:00
Magnus Hagander	9b8aff8c19	Add REPLICATION privilege for ROLEs This privilege is required to do Streaming Replication, instead of superuser, making it possible to set up a SR slave that doesn't have write permissions on the master. Superuser privileges do NOT override this check, so in order to use the default superuser account for replication it must be explicitly granted the REPLICATION permissions. This is backwards incompatible change, in the interest of higher default security.	2010-12-29 11:05:03 +01:00
Tom Lane	f2ba1e994c	Avoid unexpected conversion overflow in planner for distant date values. The "date" type supports a wider range of dates than int64 timestamps do. However, there is pre-int64-timestamp code in the planner that assumes that all date values can be converted to timestamp with impunity. Fortunately, what we really need out of the conversion is always a double (float8) value; so even when the date is out of timestamp's range it's possible to produce a sane answer. All we need is a code path that doesn't try to force the result into int64. Per trouble report from David Rericha. Back-patch to all supported versions. Although this is surely a corner case, there's not much point in advertising a date range wider than timestamp's if we will choke on such values in unexpected places.	2010-12-28 22:49:57 -05:00
Tom Lane	84fc571395	Rename the C functions bitand(), bitor() to bit_and(), bit_or(). This is to avoid use of the C++ keywords "bitand" and "bitor" in the header file utils/varbit.h. Note the functions' SQL-level names are not changed, only their C-level names. In passing, make some comments in varbit.c conform to project-standard layout.	2010-12-27 14:57:41 -05:00
Tom Lane	37b61a69f3	Fix failure of executor/hashjoin.h to compile standalone. Noted while experimenting with cpluspluscheck.	2010-12-27 12:20:09 -05:00
Tom Lane	275411912d	Fix ill-chosen use of "private" as an argument and struct field name. "private" is a keyword in C++, so this breaks the poorly-enforced policy that header files should be include-able in C++ code. Per report from Craig Ringer and some investigation with cpluspluscheck.	2010-12-27 11:26:19 -05:00
Robert Haas	63676ebff4	Corrections to patch adding SQL/MED error codes. My previous commit, `85cff3ce7f` on 2010-12-25, failed to update errcodes.sgml or plerrcodes.h. This patch corrects that oversight, per a gripe from Tom Lane, and also corrects a typographical error.	2010-12-26 21:35:25 -05:00
Andrew Dunstan	a534728afb	Only build in crashdump support on Windows if there's a working dbghelp.h.	2010-12-26 10:34:47 -05:00
Robert Haas	85cff3ce7f	Add foreign data wrapper error code values for SQL/MED. Extracted from a much larger patch by Shigeru Hanada.	2010-12-25 13:57:39 -05:00
Heikki Linnakangas	9de3aa65f0	Rewrite the GiST insertion logic so that we don't need the post-recovery cleanup stage to finish incomplete inserts or splits anymore. There was two reasons for the cleanup step: 1. When a new tuple was inserted to a leaf page, the downlink in the parent needed to be updated to contain (ie. to be consistent with) the new key. Updating the parent in turn might require recursively updating the parent of the parent. We now handle that by updating the parent while traversing down the tree, so that when we insert the leaf tuple, all the parents are already consistent with the new key, and the tree is consistent at every step. 2. When a page is split, we need to insert the downlink for the new right page(s), and update the downlink for the original page to not include keys that moved to the right page(s). We now handle that by setting a new flag, F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is set, scans always follow the rightlink, regardless of the NSN mechanism used to detect concurrent page splits. That way the tree is consistent right after split, even though the downlink is still missing. This is very similar to the way B-tree splits are handled. When the downlink is inserted in the parent, the flag is cleared. To keep the insertion algorithm simple, when an insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it finishes the split before doing anything else. These changes allow removing the whole "invalid tuple" mechanism, but I retained the scan code to still follow invalid tuples correctly. While we don't create any such tuples anymore, we want to handle them gracefully in case you pg_upgrade a GiST index that has them. If we encounter any on an insert, though, we just throw an error saying that you need to REINDEX. The issue that got me into doing this is that if you did a checkpoint while an insert or split was in progress, and the checkpoint finishes quickly so that there is no WAL record related to the insert between RedoRecPtr and the checkpoint record, recovery from that checkpoint would not know to finish the incomplete insert. IOW, we have the same issue we solved with the rm_safe_restartpoint mechanism during normal operation too. It's highly unlikely to happen in practice, and this fix is far too large to backpatch, so we're just going to live with in previous versions, but this refactoring fixes it going forward. With this patch, you don't get the annoying 'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices anymore if you crash at an unfortunate moment.	2010-12-23 16:21:47 +02:00
Magnus Hagander	dcb09b595f	Support for collecting crash dumps on Windows Add support for collecting "minidump" style crash dumps on Windows, by setting up an exception handling filter. Crash dumps will be generated in PGDATA/crashdumps if the directory is created (the existance of the directory is used as on/off switch for the generation of the dumps). Craig Ringer and Magnus Hagander	2010-12-19 16:45:28 +01:00
Tom Lane	61b53695fb	Remove optreset from src/port/ implementations of getopt and getopt_long. We don't actually need optreset, because we can easily fix the code to ensure that it's cleanly restartable after having completed a scan over the argv array; which is the only case we need to restart in. Getting rid of it avoids a class of interactions with the system libraries and allows reversion of my change of yesterday in postmaster.c and postgres.c. Back-patch to 8.4. Before that the getopt code was a bit different anyway.	2010-12-16 16:23:05 -05:00
Itagaki Takahiro	03db44eae3	Add pg_read_binary_file() and whole-file-at-once versions of pg_read_file(). One of the usages of the binary version is to read files in a different encoding from the server encoding. Dimitri Fontaine and Itagaki Takahiro.	2010-12-16 06:56:28 +09:00
Robert Haas	34c70c7ac4	Instrument checkpoint sync calls. Greg Smith, reviewed by Jeff Janes	2010-12-14 09:26:19 -05:00
Robert Haas	d368e1a2a7	Allow plugins to suppress inlining and hook function entry/exit/abort. This is intended as infrastructure to allow an eventual SE-Linux plugin to support trusted procedures. KaiGai Kohei	2010-12-13 19:15:53 -05:00
Robert Haas	5f7b58fad8	Generalize concept of temporary relations to "relation persistence". This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previous depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.	2010-12-13 12:34:26 -05:00
Tom Lane	5132ad8bdf	Make S_IRGRP etc available in mingw builds as well as MSVC. (Hm, I wonder whether BCC defines them either...) Also label dangling endifs a bit better in this area.	2010-12-12 13:43:44 -05:00
Tom Lane	1319002e2e	Provide a complete set of file-permission-bit macros in win32.h. My previous patch exposed the fact that we didn't have these. Those hard-wired octal constants were actually wrong on Windows, not just inconsistent.	2010-12-11 13:11:18 -05:00
Robert Haas	d3d414696f	Allow bidirectional copy messages in streaming replication mode. Fujii Masao. Review by Alvaro Herrera, Tom Lane, and myself.	2010-12-11 09:27:37 -05:00
Tom Lane	671199929d	Move a couple of initdb's subroutines into src/port/. mkdir_p and check_data_dir will be useful in CREATE TABLESPACE, since we have agreed that that command should handle subdirectory creation just like initdb creates the PGDATA directory. Push them into src/port/ so that they are available to both initdb and the backend. Rename to pg_mkdir_p and pg_check_dir, just to be on the safe side. Add FreeBSD's copyright notice to pgmkdirp.c, since that's where the code came from originally (this really should have been in initdb.c). Very marginal code/comment cleanup.	2010-12-10 19:42:44 -05:00
Tom Lane	576477e73c	Force default wal_sync_method to be fdatasync on Linux. Recent versions of the Linux system header files cause xlogdefs.h to believe that open_datasync should be the default sync method, whereas formerly fdatasync was the default on Linux. open_datasync is a bad choice, first because it doesn't actually outperform fdatasync (in fact the reverse), and second because we try to use O_DIRECT with it, causing failures on certain filesystems (e.g., ext4 with data=journal option). This part of the patch is largely per a proposal from Marti Raudsepp. More extensive changes are likely to follow in HEAD, but this is as much change as we want to back-patch. Also clean up confusing code and incorrect documentation surrounding the fsync_writethrough option. Those changes shouldn't result in any actual behavioral change, but I chose to back-patch them anyway to keep the branches looking similar in this area. In 9.0 and HEAD, also do some copy-editing on the WAL Reliability documentation section. Back-patch to all supported branches, since any of them might get used on modern Linux versions.	2010-12-08 20:01:09 -05:00
Simon Riggs	e620ee35b2	Optimize commit_siblings in two ways to improve group commit. First, avoid scanning the whole ProcArray once we know there are at least commit_siblings active; second, skip the check altogether if commit_siblings = 0. Greg Smith	2010-12-08 18:48:03 +00:00
Heikki Linnakangas	5a031a5556	Fix bugs in the hot standby known-assigned-xids tracking logic. If there's an old transaction running in the master, and a lot of transactions have started and finished since, and a WAL-record is written in the gap between the creating the running-xacts snapshot and WAL-logging it, recovery will fail with "too many KnownAssignedXids" error. This bug was reported by Joachim Wieland on Nov 19th. In the same scenario, when fewer transactions have started so that all the xids fit in KnownAssignedXids despite the first bug, a more serious bug arises. We incorrectly initialize the clog code with the oldest still running transaction, and when we see the WAL record belonging to a transaction with an XID larger than one that committed already before the checkpoint we're recovering from, we zero the clog page containing the already committed transaction, leading to data loss. In hindsight, trying to track xids in the known-assigned-xids array before seeing the running-xacts record was too complicated. To fix that, hold XidGenLock while the running-xacts snapshot is taken and WAL-logged. That ensures that no transaction can begin or end in that gap, so that in recvoery we know that the snapshot contains all transactions running at that point in WAL.	2010-12-07 09:23:30 +01:00
Tom Lane	e194a942f9	Update comment to match later code changes.	2010-12-04 03:21:49 -05:00
Tom Lane	554506871b	KNNGIST, otherwise known as order-by-operator support for GIST. This commit represents a rather heavily editorialized version of Teodor's builtin_knngist_itself-0.8.2 and builtin_knngist_proc-0.8.1 patches. I redid the opclass API to add a separate Distance method instead of turning the Consistent method into an illogical mess, fixed some bit-rot in the rbtree interfaces, and generally worked over the code style and comments. There's still no non-code documentation to speak of, but I'll work on that separately. Some contrib-module changes are also yet to come (right now, point <-> point is the only KNN-ified operator). Teodor Sigaev and Tom Lane	2010-12-03 20:53:29 -05:00
Robert Haas	970a18687f	Use GUC lexer for recovery.conf parsing. This eliminates some crufty, special-purpose code and, as a non-trivial side benefit, allows recovery.conf parameters to be unquoted. Dimitri Fontaine, with review and cleanup by Alvaro Herrera, Itagaki Takahiro, and me.	2010-12-03 08:56:44 -05:00
Tom Lane	d583f10b7e	Create core infrastructure for KNNGIST. This is a heavily revised version of builtin_knngist_core-0.9. The ordering operators are no longer mixed in with actual quals, which would have confused not only humans but significant parts of the planner. Instead, ordering operators are carried separately throughout planning and execution. Since the API for ambeginscan and amrescan functions had to be changed anyway, this commit takes the opportunity to rationalize that a bit. RelationGetIndexScan no longer forces a premature index_rescan call; instead, callers of index_beginscan must call index_rescan too. Aside from making the AM-side initialization logic a bit less peculiar, this has the advantage that we do not make a useless extra am_rescan call when there are runtime key values. AMs formerly could not assume that the key values passed to amrescan were actually valid; now they can. Teodor Sigaev and Tom Lane	2010-12-02 20:51:37 -05:00
Tom Lane	c0b5fac701	Simplify and speed up mapping of index opfamilies to pathkeys. Formerly we looked up the operators associated with each index (caching them in relcache) and then the planner looked up the btree opfamily containing such operators in order to build the btree-centric pathkey representation that describes the index's sort order. This is quite pointless for btree indexes: we might as well just use the index's opfamily information directly. That saves syscache lookup cycles during planning, and furthermore allows us to eliminate the relcache's caching of operators altogether, which may help in reducing backend startup time. I added code to plancat.c to perform the same type of double lookup on-the-fly if it's ever faced with a non-btree amcanorder index AM. If such a thing actually becomes interesting for production, we should replace that logic with some more-direct method for identifying the corresponding btree opfamily; but it's not worth spending effort on now. There is considerably more to do pursuant to my recent proposal to get rid of sort-operator-based representations of sort orderings, but this patch grabs some of the low-hanging fruit. I'll look at the remainder of that work after the current commitfest.	2010-11-29 12:30:43 -05:00
Simon Riggs	ed78384acd	Move call to GetTopTransactionId() earlier in LockAcquire(), removing an infrequently occurring race condition in Hot Standby. An xid must be assigned before a lock appears in shared memory, rather than immediately after, else GetRunningTransactionLocks() may see InvalidTransactionId, causing assertion failures during lock processing on standby. Bug report and diagnosis by Fujii Masao, fix by me.	2010-11-29 01:08:02 +00:00
Robert Haas	55109313f9	Add more ALTER <object> .. SET SCHEMA commands. This adds support for changing the schema of a conversion, operator, operator class, operator family, text search configuration, text search dictionary, text search parser, or text search template. Dimitri Fontaine, with assorted corrections and other kibitzing.	2010-11-26 17:31:54 -05:00
Robert Haas	cc1ed40d57	Object access hook framework, with post-creation hook. After a SQL object is created, we provide an opportunity for security or logging plugins to get control; for example, a security label provider could use this to assign an initial security label to newly created objects. The basic infrastructure is (hopefully) reusable for other types of events that might require similar treatment. KaiGai Kohei, with minor adjustments.	2010-11-25 11:50:13 -05:00
Bruce Momjian	ba11258ccb	When reporting the server as not responding, if the hostname was supplied, also print the IP address. This allows IPv4 and IPv6 failures to be distinguished. Also useful when a hostname resolves to multiple IP addresses. Also, remove use of inet_ntoa() and use our own inet_net_ntop() in all places, including in libpq, because it is thread-safe.	2010-11-24 17:04:19 -05:00
Tom Lane	725d52d0c2	Create the system catalog infrastructure needed for KNNGIST. This commit adds columns amoppurpose and amopsortfamily to pg_amop, and column amcanorderbyop to pg_am. For the moment all the entries in amcanorderbyop are "false", since the underlying support isn't there yet. Also, extend the CREATE OPERATOR CLASS/ALTER OPERATOR FAMILY commands with [ FOR SEARCH \| FOR ORDER BY sort_operator_family ] clauses to allow the new columns of pg_amop to be populated, and create pg_dump support for dumping that information. I also added some documentation, although it's perhaps a bit premature given that the feature doesn't do anything useful yet. Teodor Sigaev, Robert Haas, Tom Lane	2010-11-24 14:22:17 -05:00
Peter Eisentraut	f2a4278330	Propagate ALTER TYPE operations to typed tables This adds RESTRICT/CASCADE flags to ALTER TYPE ... ADD/DROP/ALTER/ RENAME ATTRIBUTE to control whether to alter typed tables as well.	2010-11-23 22:50:17 +02:00
Peter Eisentraut	fc946c39ae	Remove useless whitespace at end of lines	2010-11-23 22:34:55 +02:00
Robert Haas	44475e782f	Centralize some ALTER <whatever> .. SET SCHEMA checks. Any flavor of ALTER <whatever> .. SET SCHEMA fails if (1) the object is already in the new schema, (2) either the old or new schema is a temp schema, or (3) either the old or new schema is the TOAST schema. Extraced from a patch by Dimitri Fontaine, with additional hacking by me.	2010-11-22 19:53:34 -05:00
Robert Haas	506070be34	Bump catversion. Should have done this as part of format(text) patch.	2010-11-21 06:34:42 -05:00
Robert Haas	7504870778	Add new SQL function, format(text). Currently, three conversion format specifiers are supported: %s for a string, %L for an SQL literal, and %I for an SQL identifier. The latter two are deliberately designed not to overlap with what sprintf() already supports, in case we want to add more of sprintf()'s functionality here later. Patch by Pavel Stehule, heavily revised by me. Reviewed by Jeff Janes and, in earlier versions, by Itagaki Takahiro and Tom Lane.	2010-11-20 22:33:27 -05:00
Tom Lane	89a368418c	Further cleanup of indxpath logic related to IndexOptInfo.opfamily array. We no longer need the terminating zero entry in opfamily[], so get rid of it. Also replace assorted ad-hoc looping logic with simple for and foreach constructs. This code is now noticeably more readable than it was an hour ago; credit to Robert for seeing that it could be simplified.	2010-11-20 15:07:16 -05:00
Robert Haas	4343c0e546	Expose quote_literal_cstr() from core. This eliminates the need for inefficient implementions of this functionality in both contrib/dblink and contrib/tablefunc, so remove them. The upcoming patch implementing an in-core format() function will also require this functionality. In passing, add some regression tests.	2010-11-20 10:04:48 -05:00
Robert Haas	4fc115b2e9	Speed up conversion of signed integers to C strings. A hand-coded implementation turns out to be much faster than calling printf(). In passing, add a few more regresion tests. Andres Freund, with assorted, mostly cosmetic changes.	2010-11-19 22:13:11 -05:00
Tom Lane	0f61d4dd1b	Improve relation width estimation for subqueries. As per the ancient comment for set_rel_width, it really wasn't much good for relations that aren't plain tables: it would never find any stats and would always fall back on datatype-based estimates, which are often pretty silly. Fix that by copying up width estimates from the subquery planning process. At some point we might want to do this for CTEs too, but that would be a significantly more invasive patch because the sub-PlannerInfo is no longer accessible by the time it's needed. I refrained from doing anything about that, partly for fear of breaking the unmerged CTE-related patches. In passing, also generate less bogus width estimates for whole-row Vars. Per a gripe from Jon Nelson.	2010-11-19 17:31:50 -05:00
Alvaro Herrera	6cc2deb86e	Add pg_describe_object function This function is useful to obtain textual descriptions of objects as stored in pg_depend.	2010-11-18 17:06:19 -03:00
Tom Lane	6fbc323c80	Further fallout from the MergeAppend patch. Fix things so that top-N sorting can be used in child Sort nodes of a MergeAppend node, when there is a LIMIT and no intervening joins or grouping. Actually doing this on the executor side isn't too bad, but it's a bit messier to get the planner to cost it properly. Per gripe from Robert Haas. In passing, fix an oversight in the original top-N-sorting patch: query_planner should not assume that a LIMIT can be used to make an explicit sort cheaper when there will be grouping or aggregation in between. Possibly this should be back-patched, but I'm not sure the mistake is serious enough to be a real problem in practice.	2010-11-18 00:30:10 -05:00
Tom Lane	511e902b51	Make TRUNCATE ... RESTART IDENTITY restart sequences transactionally. In the previous coding, we simply issued ALTER SEQUENCE RESTART commands, which do not roll back on error. This meant that an error between truncating and committing left the sequences out of sync with the table contents, with potentially bad consequences as were noted in a Warning on the TRUNCATE man page. To fix, create a new storage file (relfilenode) for a sequence that is to be reset due to RESTART IDENTITY. If the transaction aborts, we'll automatically revert to the old storage file. This acts just like a rewriting ALTER TABLE operation. A penalty is that we have to take exclusive lock on the sequence, but since we've already got exclusive lock on its owning table, that seems unlikely to be much of a problem. The interaction of this with usual nontransactional behaviors of sequence operations is a bit weird, but it's hard to see what would be completely consistent. Our choice is to discard cached-but-unissued sequence values both when the RESTART is executed, and at rollback if any; but to not touch the currval() state either time. In passing, move the sequence reset operations to happen before not after any AFTER TRUNCATE triggers are fired. The previous ordering was not logically sensible, but was forced by the need to minimize inconsistency if the triggers caused an error. Transactional rollback is a much better solution to that. Patch by Steve Singer, rather heavily adjusted by me.	2010-11-17 16:42:18 -05:00
Heikki Linnakangas	2edc5cd493	The GiST scan algorithm uses LSNs to detect concurrent pages splits, but temporary indexes are not WAL-logged. We used a constant LSN for temporary indexes, on the assumption that we don't need to worry about concurrent page splits in temporary indexes because they're only visible to the current session. But that assumption is wrong, it's possible to insert rows and split pages in the same session, while a scan is in progress. For example, by opening a cursor and fetching some rows, and INSERTing new rows before fetching some more. Fix by generating fake increasing LSNs, used in place of real LSNs in temporary GiST indexes.	2010-11-16 11:32:21 +02:00
Robert Haas	3134d8863e	Add new buffers_backend_fsync field to pg_stat_bgwriter. This new field counts the number of times that a backend which writes a buffer out to the OS must also fsync() it. This happens when the bgwriter fsync request queue is full, and is generally detrimental to performance, so it's good to know when it's happening. Along the way, log a new message at level DEBUG1 whenever we fail to hand off an fsync, so that the problem can also be seen in examination of log files (if the logging level is cranked up high enough). Greg Smith, with minor tweaks by me.	2010-11-15 12:42:59 -05:00
Robert Haas	20cf8ae478	Fix copy-and-pasteo a little more completely. copydir.c is no longer in src/port	2010-11-15 10:10:58 -05:00
Alvaro Herrera	ae4b17edee	Fix copy-and-pasteo.	2010-11-15 11:52:56 -03:00
Robert Haas	11e482c350	Move copydir() prototype into its own header file. Having this in src/include/port.h makes no sense, now that copydir.c lives in src/backend/strorage rather than src/port. Along the way, remove an obsolete comment from contrib/pg_upgrade that makes reference to the old location.	2010-11-12 16:39:53 -05:00
Robert Haas	7ba6e4f0e0	Add monitoring function pg_last_xact_replay_timestamp. Fujii Masao, with a little wordsmithing by me.	2010-11-09 22:52:19 -05:00
Itagaki Takahiro	844ed5dc97	Don't use __declspec (dllimport) for PGDLLEXPORT to reduce warnings by gcc version 4 on mingw and cygwin. We don't use dllexport here because dllexport and dllwrap don't work well together.	2010-11-10 12:17:43 +09:00
Tom Lane	947d0c862c	Use appendrel planning logic for top-level UNION ALL structures. Formerly, we could convert a UNION ALL structure inside a subquery-in-FROM into an appendrel, as a side effect of pulling up the subquery into its parent; but top-level UNION ALL always caused use of plan_set_operations(). That didn't matter too much because you got an Append-based plan either way. However, now that the appendrel code can do things with MergeAppend, it's worthwhile to hack up the top-level case so it also uses appendrels. This is a bit of a stopgap; but going much further than this will require a major rewrite of the planner's set-operations support, which I'm not prepared to undertake now. For the moment let's grab the low-hanging fruit.	2010-11-08 15:15:02 -05:00
Tom Lane	034967bdcb	Reimplement planner's handling of MIN/MAX aggregate optimization. Per my recent proposal, get rid of all the direct inspection of indexes and manual generation of paths in planagg.c. Instead, set up EquivalenceClasses for the aggregate argument expressions, and let the regular path generation logic deal with creating paths that can satisfy those sort orders. This makes planagg.c a bit more visible to the rest of the planner than it was originally, but the approach is basically a lot cleaner than before. A major advantage of doing it this way is that we get MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans) practically for free, whereas in the old way we'd have had to add a whole lot more duplicative logic. One small disadvantage of this approach is that MIN/MAX aggregates can no longer exploit partial indexes having an "x IS NOT NULL" predicate, unless that restriction or something that implies it is specified in the query. The previous implementation was able to use the added "x IS NOT NULL" condition as an extra predicate proof condition, but in this version we rely entirely on indexes that are considered usable by the main planning process. That seems a fair tradeoff for the simplicity and functionality gained.	2010-11-04 12:01:17 -04:00
Tom Lane	0811ff2063	Avoid using a local FunctionCallInfoData struct in ExecMakeFunctionResult and related routines. We already had a redundant FunctionCallInfoData struct in FuncExprState, but were using that copy only in set-returning-function cases, to avoid keeping function evaluation state in the expression tree for the benefit of plpgsql's "simple expression" logic. But of course that didn't work anyway. Given the recent fixes in plpgsql there is no need to have two separate behaviors here. Getting rid of the local FunctionCallInfoData structs should make things a little faster (because we don't need to do InitFunctionCallInfoData each time), and it also makes for a noticeable reduction in stack space consumption during recursive calls.	2010-11-01 13:54:21 -04:00
Tom Lane	186cbbda8f	Provide hashing support for arrays. The core of this patch is hash_array() and associated typcache infrastructure, which works just about exactly like the existing support for array comparison. In addition I did some work to ensure that the planner won't think that an array type is hashable unless its element type is hashable, and similarly for sorting. This includes adding a datatype parameter to op_hashjoinable and op_mergejoinable, and adding an explicit "hashable" flag to SortGroupClause. The lack of a cross-check on the element type was a pre-existing bug in mergejoin support --- but it didn't matter so much before, because if you couldn't sort the element type there wasn't any good alternative to failing anyhow. Now that we have the alternative of hashing the array type, there are cases where we can avoid a failure by being picky at the planner stage, so it's time to be picky. The issue of exactly how to combine the per-element hash values to produce an array hash is still open for discussion, but the rest of this is pretty solid, so I'll commit it as-is.	2010-10-30 21:56:11 -04:00
Tom Lane	14231a41a9	Avoid creation of useless EquivalenceClasses during planning. Zoltan Boszormenyi exhibited a test case in which planning time was dominated by construction of EquivalenceClasses and PathKeys that had no actual relevance to the query (and in fact got discarded immediately). This happened because we generated PathKeys describing the sort ordering of every index on every table in the query, and only after that checked to see if the sort ordering was relevant. The EC/PK construction code is O(N^2) in the number of ECs, which is all right for the intended number of such objects, but it gets out of hand if there are ECs for lots of irrelevant indexes. To fix, twiddle the handling of mergeclauses a little bit to ensure that every interesting EC is created before we begin path generation. (This doesn't cost anything --- in fact I think it's a bit cheaper than before --- since we always eventually created those ECs anyway.) Then, if an index column can't be found in any pre-existing EC, we know that that sort ordering is irrelevant for the query. Instead of creating a useless EC, we can just not build a pathkey for the index column in the first place. The index will still be considered if it's useful for non-order-related reasons, but we will think of its output as unsorted.	2010-10-29 11:52:50 -04:00
Robert Haas	20709f8136	Add a client authentication hook. KaiGai Kohei, with minor cleanup of the comments by me.	2010-10-26 21:20:38 -04:00
Itagaki Takahiro	bf76ad07fe	Fix typos "are are".	2010-10-26 17:15:17 +09:00
Peter Eisentraut	35670340f5	Refactor typenameTypeId() Split the old typenameTypeId() into two functions: A new typenameTypeId() that returns only a type OID, and typenameTypeIdAndMod() that returns type OID and typmod. This isolates call sites better that actually care about the typmod.	2010-10-25 21:44:49 +03:00
Tom Lane	84c123be1d	Allow new values to be added to an existing enum type. After much expenditure of effort, we've got this to the point where the performance penalty is pretty minimal in typical cases. Andrew Dunstan, reviewed by Brendan Jurd, Dean Rasheed, and Tom Lane	2010-10-24 23:05:41 -04:00
Heikki Linnakangas	5c84fe4607	Make OFF keyword unreserved. It's not hard to imagine wanting to use 'off' as a variable or column name, and it's not reserved in recent versions of the SQL spec either. This became particularly annoying in 9.0, before that PL/pgSQL replaced variable names in queries with parameter markers, so it was possible to use OFF and many other backend parser keywords as variable names. Because of that, backpatch to 9.0.	2010-10-22 17:44:50 +03:00
Tom Lane	529cb267a6	Improve handling of domains over arrays. This patch eliminates various bizarre behaviors caused by sloppy thinking about the difference between a domain type and its underlying array type. In particular, the operation of updating one element of such an array has to be considered as yielding a value of the underlying array type, not a value of the domain, because there's no assurance that the domain's CHECK constraints are still satisfied. If we're intending to store the result back into a domain column, we have to re-cast to the domain type so that constraints are re-checked. For similar reasons, such a domain can't be blindly matched to an ANYARRAY polymorphic parameter, because the polymorphic function is likely to apply array-ish operations that could invalidate the domain constraints. For the moment, we just forbid such matching. We might later wish to insert an automatic downcast to the underlying array type, but such a change should also change matching of domains to ANYELEMENT for consistency. To ensure that all such logic is rechecked, this patch removes the original hack of setting a domain's pg_type.typelem field to match its base type; the typelem will always be zero instead. In those places where it's really okay to look through the domain type with no other logic changes, use the newly added get_base_element_type function in place of get_element_type. catversion bumped due to change in pg_type contents. Per bug #5717 from Richard Huxton and subsequent discussion.	2010-10-21 16:07:17 -04:00
Tom Lane	6e74a91b2b	Fix incorrect generation of whole-row variables in planner. A couple of places in the planner need to generate whole-row Vars, and were cutting corners by setting vartype = RECORDOID in the Vars, even in cases where there's an identifiable named composite type for the RTE being referenced. While we mostly got away with this, it failed when there was also a parser-generated whole-row reference to the same RTE, because the two Vars weren't equal() due to the difference in vartype. Fix by providing a subroutine the planner can call to generate whole-row Vars the same way the parser does. Per bug #5716 from Andrew Tipton. Back-patch to 9.0 where one of the bogus calls was introduced (the other one is new in HEAD).	2010-10-19 15:09:23 -04:00
Peter Eisentraut	bc8624b15d	Support key word 'all' in host column of pg_hba.conf	2010-10-18 22:15:44 +03:00
Tom Lane	419d2374bf	Fix a passel of inappropriately-named global functions in GIN. The GIN code has absolutely no business exporting GIN-specific functions with names as generic as compareItemPointers() or newScanKey(); that's just trouble waiting to happen. I got annoyed about this again just now and decided to fix it. This commit ensures that all global symbols defined in access/gin/ have names including "gin" or "Gin". There were a couple of cases, like names involving "PostingItem", where arguably the names were already sufficiently nongeneric; but I figured as long as I was risking creating merge problems for unapplied GIN patches I might as well impose a uniform policy. I didn't touch any static symbol names. There might be some places where it'd be appropriate to rename some static functions to match siblings that are exported, but I'll leave that for another time.	2010-10-17 21:43:26 -04:00
Tom Lane	48c7d9f6ff	Improve GIN indexscan cost estimation. The better estimate requires more statistics than we previously stored: in particular, counts of "entry" versus "data" pages within the index, as well as knowledge of the number of distinct key values. We collect this information during initial index build and update it during VACUUM, storing the info in new fields on the index metapage. No initdb is required because these fields will read as zeroes in a pre-existing index, and the new gincostestimate code is coded to behave (reasonably) sanely if they are zeroes. Teodor Sigaev, reviewed by Jan Urbanski, Tom Lane, and Itagaki Takahiro.	2010-10-17 20:52:32 -04:00
Tom Lane	07f1264dda	Allow WITH clauses to be attached to INSERT, UPDATE, DELETE statements. This is not the hoped-for facility of using INSERT/UPDATE/DELETE inside a WITH, but rather the other way around. It seems useful in its own right anyway. Note: catversion bumped because, although the contents of stored rules might look compatible, there's actually a subtle semantic change. A single Query containing a WITH and INSERT...VALUES now represents writing the WITH before the INSERT, not before the VALUES. While it's not clear that that matters to anyone, it seems like a good idea to have it cited in the git history for catversion.h. Original patch by Marko Tiikkaja, with updating and cleanup by Hitoshi Harada.	2010-10-15 19:55:25 -04:00
Peter Eisentraut	6ab42ae367	Support host names in pg_hba.conf Peter Eisentraut, reviewed by KaiGai Kohei and Tom Lane	2010-10-15 22:56:18 +03:00
Tom Lane	11cad29c91	Support MergeAppend plans, to allow sorted output from append relations. This patch eliminates the former need to sort the output of an Append scan when an ordered scan of an inheritance tree is wanted. This should be particularly useful for fast-start cases such as queries with LIMIT. Original patch by Greg Stark, with further hacking by Hans-Jurgen Schonig, Robert Haas, and Tom Lane.	2010-10-14 16:57:57 -04:00
Peter Eisentraut	1a996d6c29	Remove executable permission from files where it doesn't belong	2010-10-13 22:30:25 +03:00
Tom Lane	f4d242ef94	Remove some unnecessary tests of pgstat_track_counts. We may as well make pgstat_count_heap_scan() and related macros just count whenever rel->pgstat_info isn't null. Testing pgstat_track_counts buys nothing at all in the normal case where that flag is ON; and when it's OFF, the pgstat_info link will be null, so it's still a useless test. This change is unlikely to buy any noticeable performance improvement, but a cycle shaved is a cycle earned; and my investigations earlier today convinced me that we're down to the point where individual instructions in the inner execution loops are starting to matter.	2010-10-12 14:44:25 -04:00
Tom Lane	220e45bf32	Improve the planner's simplification of NOT constructs. This patch merges the responsibility for NOT-flattening into eval_const_expressions' processing. It wasn't done that way originally because prepqual.c is far older than eval_const_expressions. But putting this work into eval_const_expressions saves one pass over the qual trees, and in fact saves even more than that because we can exploit the knowledge that the subexpressions have already been recursively simplified. Doing it this way also lets us do it uniformly over all expressions, whereas prepqual.c formerly just did it at top level to save cycles. That should improve the planner's ability to recognize logically-equivalent constructs. While at it, also add the ability to fold a NOT into BooleanTest and NullTest constructs (the latter only for the scalar-datatype case). Per discussion of bug #5702.	2010-10-10 23:19:50 -04:00
Tom Lane	2ec993a7cb	Support triggers on views. This patch adds the SQL-standard concept of an INSTEAD OF trigger, which is fired instead of performing a physical insert/update/delete. The trigger function is passed the entire old and/or new rows of the view, and must figure out what to do to the underlying tables to implement the update. So this feature can be used to implement updatable views using trigger programming style rather than rule hacking. In passing, this patch corrects the names of some columns in the information_schema.triggers view. It seems the SQL committee renamed them somewhere between SQL:99 and SQL:2003. Dean Rasheed, reviewed by Bernd Helmle; some additional hacking by me.	2010-10-10 13:45:07 -04:00
Tom Lane	9cc8c84e73	Improve logging in VACUUM FULL VERBOSE and CLUSTER VERBOSE. This patch resurrects some of the information that could be logged by the old, now-dead implementation of VACUUM FULL, in particular counts of live and dead tuples and the time taken for the table rebuild proper. There's still no logging about the ensuing index rebuilds, though. Itagaki Takahiro	2010-10-07 21:46:46 -04:00
Tom Lane	3ba11d3df2	Teach CLUSTER to use seqscan-and-sort when it's faster than indexscan. ... or at least, when the planner's cost estimates say it will be faster. Leonardo Francalanci, reviewed by Itagaki Takahiro and Tom Lane	2010-10-07 20:00:28 -04:00
Tom Lane	3e5f9412d0	Reduce the memory requirement for large ispell dictionaries. This patch eliminates per-chunk palloc overhead for most small allocations needed in the representation of an ispell dictionary. This saves close to a factor of 2 on the current Czech ispell data. While it doesn't cover every last small allocation in the ispell code, we are at the point of diminishing returns, because about 95% of the allocations are covered already. Pavel Stehule, rather heavily revised by Tom	2010-10-06 19:31:05 -04:00
Tom Lane	9b910def24	Clean up temporary-memory management during ispell dictionary loading. Add explicit initialization and cleanup functions to spell.c, and keep all working state in the already-existing ISpellDict struct. This lets us get rid of a static variable along with some extremely shaky assumptions about usage of child memory contexts. This commit is just code beautification and has no impact on functionality or performance, but it opens the way to a less-grotty implementation of Pavel's memory-saving hack, which will follow shortly.	2010-10-06 15:15:15 -04:00
Tom Lane	eb22950510	Fix PlaceHolderVar mechanism's interaction with outer joins. The point of a PlaceHolderVar is to allow a non-strict expression to be evaluated below an outer join, after which its value bubbles up like a Var and can be forced to NULL when the outer join's semantics require that. However, there was a serious design oversight in that, namely that we didn't ensure that there was actually a correct place in the plan tree to evaluate the placeholder :-(. It may be necessary to delay evaluation of an outer join to ensure that a placeholder that should be evaluated below the join can be evaluated there. Per recent bug report from Kirill Simonov. Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.	2010-09-28 14:19:00 -04:00
Robert Haas	eacb22ec47	Fix duplicate OIDs introduced by SECURITY LABEL patch. Report by Shigeru Hanada.	2010-09-28 07:07:03 -04:00
Robert Haas	4d355a8336	Add a SECURITY LABEL command. This is intended as infrastructure to support integration with label-based mandatory access control systems such as SE-Linux. Further changes (mostly hooks) will be needed, but this is a big chunk of it. KaiGai Kohei and Robert Haas	2010-09-27 20:55:27 -04:00
Peter Eisentraut	e440e12c56	Add ALTER TYPE ... ADD/DROP/ALTER/RENAME ATTRIBUTE Like with tables, this also requires allowing the existence of composite types with zero attributes. reviewed by KaiGai Kohei	2010-09-26 14:41:03 +03:00
Magnus Hagander	fe9b36fd59	Convert cvsignore to gitignore, and add .gitignore for build targets.	2010-09-22 12:57:04 +02:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Tom Lane	9eef3318a2	Fix several broken $PostgreSQL$ keywords. Noted while experimenting with Magnus's script to remove these.	2010-09-19 16:17:45 +00:00
Heikki Linnakangas	723d0184e2	Use a latch to make startup process wake up and replay immediately when new WAL arrives via streaming replication. This reduces the latency, and also allows us to use a longer polling interval, which is good for energy efficiency. We still need to poll to check for the appearance of a trigger file, but the interval is now 5 seconds (instead of 100ms), like when waiting for a new WAL segment to appear in WAL archive.	2010-09-15 10:35:05 +00:00
Heikki Linnakangas	236b6bc29e	Simplify Windows implementation of latches. There's no need to keep a dynamic pool of event handles, we can permanently assign one for each shared latch. Thanks to that, we no longer need a separate shared memory block for latches, and we don't need to know in advance how many shared latches there is, so you no longer need to remember to update NumSharedLatches when you introduce a new latch to the system.	2010-09-15 10:06:21 +00:00
Tom Lane	4e97631e6a	Fix join-removal logic for pseudoconstant and outerjoin-delayed quals. In these cases a qual can get marked with the removable rel in its required_relids, but this is just to schedule its evaluation correctly, not because it really depends on the rel. We were assuming that, in effect, we could throw away all quals so marked, which is nonsense. Tighten up the logic to be a little more paranoid about which quals belong to the outer join being considered for removal, and arrange for all quals that don't belong to be updated so they will still get evaluated correctly. Also fix another problem that happened to be exposed by this test case, which was that make_join_rel() was failing to notice some cases where a constant-false qual could be used to prove a join relation empty. If it's a pushed-down constant false, then the relation is empty even if it's an outer join, because the qual applies after the outer join expansion. Per report from Nathan Grange. Back-patch into 9.0.	2010-09-14 23:15:29 +00:00
Heikki Linnakangas	d1c33ccf62	Remove prototype for non-existent function from walreceiver.h. Tidy up by separating prototypes for functions in walreceiver.c and walreceiverfuncs.c with comments.	2010-09-13 10:14:25 +00:00
Joe Conway	5eb15c9942	SERIALIZABLE transactions are actually implemented beneath the covers with transaction snapshots, i.e. a snapshot registered at the beginning of a transaction. Change variable naming and comments to reflect this reality in preparation for a future, truly serializable mode, e.g. Serializable Snapshot Isolation (SSI). For the moment transaction snapshots are still used to implement SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin Grittner and Dan Ports with review and some minor wording changes by me.	2010-09-11 18:38:58 +00:00
Heikki Linnakangas	2746e5f21d	Introduce latches. A latch is a boolean variable, with the capability to wait until it is set. Latches can be used to reliably wait until a signal arrives, which is hard otherwise because signals don't interrupt select() on some platforms, and even when they do, there's race conditions. On Unix, latches use the so called self-pipe trick under the covers to implement the sleep until the latch is set, without race conditions. On Windows, Windows events are used. Use the new latch abstraction to sleep in walsender, so that as soon as a transaction finishes, walsender is woken up to immediately send the WAL to the standby. This reduces the latency between master and standby, which is good. Preliminary work by Fujii Masao. The latch implementation is by me, with helpful comments from many people.	2010-09-11 15:48:04 +00:00
Tom Lane	303696c3b4	Install a data-type-based solution for protecting pg_get_expr(). Since the code underlying pg_get_expr() is not secure against malformed input, and can't practically be made so, we need to prevent miscreants from feeding arbitrary data to it. We can do this securely by declaring pg_get_expr() to take a new datatype "pg_node_tree" and declaring the system catalog columns that hold nodeToString output to be of that type. There is no way at SQL level to create a non-null value of type pg_node_tree. Since the backend-internal operations that fill those catalog columns operate below the SQL level, they are oblivious to the datatype relabeling and don't need any changes.	2010-09-03 01:34:55 +00:00
Tom Lane	8ab6a6b456	In HEAD only, revert kluge solution for preventing misuse of pg_get_expr(). A data-type-based solution, which is much cleaner and more bulletproof, will follow shortly. It seemed best to make this a separate commit though.	2010-09-03 01:26:52 +00:00
Tom Lane	9513918c6c	Fix up flushing of composite-type typcache entries to be driven directly by SI invalidation events, rather than indirectly through the relcache. In the previous coding, we had to flush a composite-type typcache entry whenever we discarded the corresponding relcache entry. This caused problems at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report from Jeff Davis, and might result in real-world problems given the kind of unexpected relcache flush that that test mechanism is intended to model. The new coding decouples relcache and typcache management, which is a good thing anyway from a structural perspective. The cost is that we have to search the typcache linearly to find entries that need to be flushed. There are a couple of ways we could avoid that, but at the moment it's not clear it's worth any extra trouble, because the typcache contains very few entries in typical operation. Back-patch to 8.2, the same as some other recent fixes in this general area. The patch could be carried back to 8.0 with some additional work, but given that it's only hypothetical whether we're fixing any problem observable in the field, it doesn't seem worth the work now.	2010-09-02 03:16:46 +00:00
Peter Eisentraut	2355b69b1e	Small refactoring of makeVar() from a TargetEntry	2010-08-27 20:30:08 +00:00
Robert Haas	c10575ff00	Rewrite comment code for better modularity, and add necessary locking. Review by Alvaro Herrera, KaiGai Kohei, and Tom Lane.	2010-08-27 11:47:41 +00:00
Itagaki Takahiro	49b27ab551	Add string functions: concat(), concat_ws(), left(), right(), and reverse(). Pavel Stehule, reviewed by me.	2010-08-24 06:30:44 +00:00
Tom Lane	b9defe0405	Marginal code cleanup for streaming replication. There is no reason that proc.c should have to get involved in this dirty hack for letting the postmaster know which children are walsenders. Revert that file to the way it was, and confine the kluge to pmsignal.c and postmaster.c.	2010-08-23 17:20:01 +00:00
Magnus Hagander	946045f04d	Add vacuum and analyze counters to pg_stat_*_tables views.	2010-08-21 10:59:17 +00:00
Peter Eisentraut	3f11971916	Remove extra newlines at end and beginning of files, add missing newlines at end of files.	2010-08-19 05:57:36 +00:00
Tom Lane	2d8314bd43	Rename utf2ucs() to utf8_to_unicode(), and export it so it can be used elsewhere. Similarly rename the version in mbprint.c, not because this affects anything but just to keep the two copies in exact sync. There was some discussion of having only one copy in src/port/ instead, but this function is so small and unlikely to change that that seems like overkill. Slightly editorialized version of a patch by Joseph Adams. (The bug-fix aspect of his patch was applied separately, and back-patched.)	2010-08-18 19:54:01 +00:00
Tom Lane	b5565bca11	Fix failure of "ALTER TABLE t ADD COLUMN c serial" when done by non-owner. The implicitly created sequence was created as owned by the current user, who could be different from the table owner, eg if current user is a superuser or some member of the table's owning role. This caused sanity checks in the SEQUENCE OWNED BY code to spit up. Although possibly we don't need those sanity checks, the safest fix seems to be to make sure the implicit sequence is assigned the same owner role as the table has. (We still do all permissions checks as the current user, however.) Per report from Josh Berkus. Back-patch to 9.0. The bug goes back to the invention of SEQUENCE OWNED BY in 8.2, but the fix requires an API change for DefineRelation(), which seems to have potential for breaking third-party code if done in a minor release. Given the lack of prior complaints, it's probably not worth fixing in the stable branches.	2010-08-18 18:35:21 +00:00
Tom Lane	946b078044	MyBackendId now needs to be PGDLLIMPORT, so that contrib modules can access it on Windows. Per buildfarm.	2010-08-14 13:37:21 +00:00
Robert Haas	debcec7dc3	Include the backend ID in the relpath of temporary relations. This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.	2010-08-13 20:10:54 +00:00
Tom Lane	a0b7b717a4	Add xml_is_well_formed, xml_is_well_formed_document, xml_is_well_formed_content functions to the core XML code. Per discussion, the former depends on XMLOPTION while the others do not. These supersede a version previously offered by contrib/xml2. Mike Fowler, reviewed by Pavel Stehule	2010-08-13 18:36:26 +00:00
Robert Haas	30c22eb8fc	Correct sundry errors in Hot Standby-related comments. Fujii Masao	2010-08-12 23:24:54 +00:00
Tom Lane	33f43725fb	Add three-parameter forms of array_to_string and string_to_array, to allow better handling of NULL elements within the arrays. The third parameter is a string that should be used to represent a NULL element, or should be translated into a NULL element, respectively. If the third parameter is NULL it behaves the same as the two-parameter form. There are two incompatible changes in the behavior of the two-parameter form of string_to_array. First, it will return an empty (zero-element) array rather than NULL when the input string is of zero length. Second, if the field separator is NULL, the function splits the string into individual characters, rather than returning NULL as before. These two changes make this form fully compatible with the behavior of the new three-parameter form. Pavel Stehule, reviewed by Brendan Jurd	2010-08-10 21:51:00 +00:00
Tom Lane	4dfc457854	Add an xpath_exists() function. This is equivalent to XMLEXISTS except that it offers support for namespace mapping. Mike Fowler, reviewed by David Fetter	2010-08-08 19:15:27 +00:00
Tom Lane	46aa77c7bd	Add stats functions and views to provide access to a transaction's own statistics counts. These numbers are being accumulated but haven't yet been transmitted to the collector (and won't be, until the transaction ends). For some purposes, though, it's handy to be able to look at them. Joel Jacobson, reviewed by Itagaki Takahiro	2010-08-08 16:27:06 +00:00
Tom Lane	e49ae8d3bc	Recognize functional dependency on primary keys. This allows a table's other columns to be referenced without listing them in GROUP BY, so long as the primary key column(s) are listed in GROUP BY. Eventually we should also allow functional dependency on a UNIQUE constraint when the columns are marked NOT NULL, but that has to wait until NOT NULL constraints are represented in pg_constraint, because we need to have pg_constraint OIDs for all the conditions needed to ensure functional dependency. Peter Eisentraut, reviewed by Alex Hunsaker and Tom Lane	2010-08-07 02:44:09 +00:00
Tom Lane	b0c451e145	Remove the single-argument form of string_agg(). It added nothing much in functionality, while creating an ambiguity in usage with ORDER BY that at least two people have already gotten seriously confused by. Also, add an opr_sanity test to check that we don't in future violate the newly minted policy of not having built-in aggregates with the same name and different numbers of parameters. Per discussion of a complaint from Thom Brown.	2010-08-05 18:21:19 +00:00
Robert Haas	fd1843ff89	Standardize get_whatever_oid functions for other object types. - Rename TSParserGetPrsid to get_ts_parser_oid. - Rename TSDictionaryGetDictid to get_ts_dict_oid. - Rename TSTemplateGetTmplid to get_ts_template_oid. - Rename TSConfigGetCfgid to get_ts_config_oid. - Rename FindConversionByName to get_conversion_oid. - Rename GetConstraintName to get_constraint_oid. - Add new functions get_opclass_oid, get_opfamily_oid, get_rewrite_oid, get_rewrite_oid_without_relid, get_trigger_oid, and get_cast_oid. The name of each function matches the corresponding catalog. Thanks to KaiGai Kohei for the review.	2010-08-05 15:25:36 +00:00
Robert Haas	2a6ef3445c	Standardize get_whatever_oid functions for object types with unqualified names. - Add a missing_ok parameter to get_tablespace_oid. - Avoid duplicating get_tablespace_od guts in objectNamesToOids. - Add a missing_ok parameter to get_database_oid. - Replace get_roleid and get_role_checked with get_role_oid. - Add get_namespace_oid, get_language_oid, get_am_oid. - Refactor existing code to use new interfaces. Thanks to KaiGai Kohei for the review.	2010-08-05 14:45:09 +00:00
Peter Eisentraut	641459f269	Add xmlexists function by Mike Fowler, reviewed by Peter Eisentraut	2010-08-05 04:21:54 +00:00
Robert Haas	26e47efb66	Fix declared argument name for numeric_maximum_size. The previous commit changed the function to say 'typmod' rather than 'typemod', but I forgot to update the header file.	2010-08-04 17:35:59 +00:00
Tom Lane	db04f2b322	Replace the naive HYPOT() macro with a standards-conformant hypotenuse function. This avoids unnecessary overflows and probably gives a more accurate result as well. Paul Matthews, reviewed by Andrew Geery	2010-08-03 21:21:03 +00:00
Tom Lane	67becf8d41	Fix ANALYZE's ancient deficiency of not trying to collect stats for expression indexes when the index column type (the opclass opckeytype) is different from the expression's datatype. When coded, this limitation wasn't worth worrying about because we had no intelligence to speak of in stats collection for the datatypes used by such opclasses. However, now that there's non-toy estimation capability for tsvector queries, it amounts to a bug that ANALYZE fails to do this. The fix changes struct VacAttrStats, and therefore constitutes an API break for custom typanalyze functions. Therefore we can't back-patch it into released branches, but it was agreed that 9.0 isn't yet frozen hard enough to make such a change unacceptable. Ergo, back-patch to 9.0 but no further. The API break had better be mentioned in 9.0 release notes.	2010-08-01 22:38:11 +00:00
Tom Lane	d4fe61b083	Fix an additional set of problems in GIN's handling of lossy page pointers. Although the key-combining code claimed to work correctly if its input contained both lossy and exact pointers for a single page in a single TID stream, in fact this did not work, and could not work without pretty fundamental redesign. Modify keyGetItem so that it will not return such a stream, by handling lossy-pointer cases a bit more explicitly than we did before. Per followup investigation of a gripe from Artur Dabrowski. An example of a query that failed given his data set is select count() from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee: \| dd:')) and (to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:')); Back-patch to 8.4 where the lossy pointer code was introduced.	2010-08-01 19:16:39 +00:00
Tom Lane	0454f13161	Rewrite the rbtree routines so that an RBNode is the first field of the struct representing a tree entry, rather than being a separately allocated piece of storage. This API is at least as clean as the old one (if not more so --- there were some bizarre choices in there) and it permits a very substantial memory savings, on the order of 2X in ginbulk.c's usage. Also, fix minor memory leaks in code called by ginEntryInsert, in particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert itself. These leaks resulted in the GIN index build context continuing to bloat even after we'd filled it to maintenance_work_mem and started to dump data out to the index. In combination these fixes restore the GIN index build code to honoring the maintenance_work_mem limit about as well as it did in 8.4. Speed seems on par with 8.4 too, maybe even a bit faster, for a non-pathological case in which HEAD was formerly slower. Back-patch to 9.0 so we don't have a performance regression from 8.4.	2010-08-01 02:12:42 +00:00
Tom Lane	2ab57e089b	Rewrite the key-combination logic in GIN's keyGetItem() and scanGetItem() routines to make them behave better in the presence of "lossy" index pointers. The previous coding was outright incorrect for some cases, as recently reported by Artur Dabrowski: scanGetItem would fail to return index entries in cases where one index key had multiple exact pointers on the same page as another key had a lossy pointer. Also, keyGetItem was extremely inefficient for cases where a single index key generates multiple "entry" streams, such as an @@ operator with a multiple-clause tsquery. The presence of a lossy page pointer in any one stream defeated its ability to use the opclass consistentFn, resulting in probing many heap pages that didn't really need to be visited. In Artur's example case, a query like WHERE tsvector @@ to_tsquery('a & b') was about 50X slower than the theoretically equivalent WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b') The way that I chose to fix this was to have GIN call the consistentFn twice with both TRUE and FALSE values for the in-doubt entry stream, returning a hit if either call produces TRUE, but not if they both return FALSE. The code handles this for the case of a single in-doubt entry stream, but punts (falling back to the stupid behavior) if there's more than one lossy reference to the same page. The idea could be scaled up to deal with multiple lossy references, but I think that would probably be wasted complexity. At least to judge by Artur's example, such cases don't occur often enough to be worth trying to optimize. Back-patch to 8.4. 8.3 did not have lossy GIN index pointers, so not subject to these problems.	2010-07-31 00:30:54 +00:00
Robert Haas	8a4dc94ca0	Make details of the Numeric representation private to numeric.c. Review by Tom Lane.	2010-07-30 04:30:23 +00:00
Tom Lane	f223bb7a41	Improved version of patch to protect pg_get_expr() against misuse: look through join alias Vars to avoid breaking join queries, and move the test to someplace where it will catch more possible ways of calling a function. We still ought to throw away the whole thing in favor of a data-type-based solution, but that's not feasible in the back branches. This needs to be back-patched further than 9.0, but I don't have time to do so today. Committing now so that the fix gets into 9.0beta4.	2010-07-29 23:16:33 +00:00
Simon Riggs	5b8bd0529e	Rename asyncCommitLSN to asyncXactLSN to reflect changed role in 9.0. Transaction aborts now record their LSN to avoid corner case behaviour in SR/HS, hence change of name of variables and functions. As pointed out by Fujii Masao. Cosmetic changes only.	2010-07-29 22:27:27 +00:00
Tom Lane	aab353a60b	Clean up some inconsistencies in the volatility marking of various I/O related functions. Per today's discussion, we will henceforth assume that datatype I/O functions are either stable or immutable, never volatile. (This implies in particular that domain CHECK constraint expressions shouldn't be volatile, since domain_in executes them.) In turn, functions that execute the I/O functions of arbitrary datatypes should always be labeled stable. This affects the labeling of array_to_string, which was unsafely marked immutable, and record_in, record_out, record_recv, record_send, domain_in, domain_recv, which were over-conservatively marked volatile. The array I/O functions were already marked stable, which is correct per this policy but would have been wrong if we maintained domain_in as volatile. Back-patch to 9.0, along with an earlier fix to correctly mark cash_in and cash_out as stable not immutable (since they depend on lc_monetary). No catversion bump --- the implications of this are not currently severe enough to justify a forced initdb.	2010-07-29 20:09:25 +00:00
Simon Riggs	04e17bae50	Add explicit regression tests for ALTER TABLE lock levels. Use this to catch a couple of lock level assignments that slipped through manual testing, per Peter Eisentraut.	2010-07-29 11:06:34 +00:00
Simon Riggs	2dbbda02e7	Reduce lock levels of CREATE TRIGGER and some ALTER TABLE, CREATE RULE actions. Avoid hard-coding lockmode used for many altering DDL commands, allowing easier future changes of lock levels. Implementation of initial analysis on DDL sub-commands, so that many lock levels are now at ShareUpdateExclusiveLock or ShareRowExclusiveLock, allowing certain DDL not to block reads/writes. First of number of planned changes in this area; additional docs required when full project complete.	2010-07-28 05:22:24 +00:00
Tom Lane	133924e13e	Fix potential failure when hashing the output of a subplan that produces a pass-by-reference datatype with a nontrivial projection step. We were using the same memory context for the projection operation as for the temporary context used by the hashtable routines in execGrouping.c. However, the hashtable routines feel free to reset their temp context at any time, which'd lead to destroying input data that was still needed. Report and diagnosis by Tao Ma. Back-patch to 8.1, where the problem was introduced by the changes that allowed us to work with "virtual" tuples instead of materializing intermediate tuple values everywhere. The earlier code looks quite similar, but it doesn't suffer the problem because the data gets copied into another context as a result of having to materialize ExecProject's output tuple.	2010-07-28 04:50:50 +00:00
Robert Haas	a3b012b560	CREATE TABLE IF NOT EXISTS. Reviewed by Bernd Helmle.	2010-07-25 23:21:22 +00:00
Robert Haas	ce68df468a	Add options to force quoting of all identifiers. I've added a quote_all_identifiers GUC which affects the behavior of the backend, and a --quote-all-identifiers argument to pg_dump and pg_dumpall which sets the GUC and also affects the quoting done internally by those applications. Design by Tom Lane; review by Alex Hunsaker; in response to bug #5488 filed by Hartmut Goebel.	2010-07-22 01:22:35 +00:00
Robert Haas	b8c6c71d1c	Centralize DML permissions-checking logic. Remove bespoke code in DoCopy and RI_Initial_Check, which now instead fabricate call ExecCheckRTPerms with a manufactured RangeTblEntry. This is intended to make it feasible for an enhanced security provider to actually make use of ExecutorCheckPerms_hook, but also has the advantage that RI_Initial_Check can allow use of the fast-path when column-level but not table-level permissions are present. KaiGai Kohei. Reviewed (in an earlier version) by Stephen Frost, and by me. Some further changes to the comments by me.	2010-07-22 00:47:59 +00:00
Robert Haas	5ffaa9005c	Add restart_after_crash GUC. Normally, we automatically restart after a backend crash, but in some cases when PostgreSQL is invoked by clusterware it may be desirable to suppress this behavior, so we provide an option which does this. Since no existing GUC group quite fits, create a new group called "error handling options" for this and the previously undocumented GUC exit_on_error, which is now documented. Review by Fujii Masao.	2010-07-20 00:47:53 +00:00
Tom Lane	3ec694e17b	Add a log_file_mode GUC that allows control of the file permissions set on log files created by the syslogger process. In passing, make unix_file_permissions display its value in octal, same as log_file_mode now does. Martin Pihlak	2010-07-16 22:25:51 +00:00
Tom Lane	7590ddb3eb	Add support for dividing money by money (yielding a float8 result) and for casting between money and numeric. Andy Balholm, reviewed by Kevin Grittner	2010-07-16 02:15:56 +00:00
Tom Lane	1cc29fe7c6	Teach EXPLAIN to print PARAM_EXEC Params as the referenced expressions, rather than just $N. This brings the display of nestloop-inner-indexscan plans back to where it's been, and incidentally improves the display of SubPlan parameters as well. In passing, simplify the EXPLAIN code by having it deal primarily in the PlanState tree rather than separately searching Plan and PlanState trees. This is noticeably cleaner for subplans, and about a wash elsewhere. One small difference from previous behavior is that EXPLAIN will no longer qualify local variable references in inner-indexscan plan nodes, since it no longer sees such nodes as possibly referencing multiple tables. Vars referenced through PARAM_EXEC Params are still forcibly qualified, though, so I don't think the display is any more confusing than before. Adjust a couple of examples in the documentation to match this behavior.	2010-07-13 20:57:19 +00:00
Tom Lane	53e757689c	Make NestLoop plan nodes pass outer-relation variables into their inner relation using the general PARAM_EXEC executor parameter mechanism, rather than the ad-hoc kluge of passing the outer tuple down through ExecReScan. The previous method was hard to understand and could never be extended to handle parameters coming from multiple join levels. This patch doesn't change the set of possible plans nor have any significant performance effect, but it's necessary infrastructure for future generalization of the concept of an inner indexscan plan. ExecReScan's second parameter is now unused, so it's removed.	2010-07-12 17:01:06 +00:00
Robert Haas	f4122a8d50	Add a hook in ExecCheckRTPerms(). This hook allows a loadable module to gain control when table permissions are checked. It is expected to be used by an eventual SE-PostgreSQL implementation, but there are other possible applications as well. A sample contrib module can be found in the archives at: http://archives.postgresql.org/pgsql-hackers/2010-05/msg01095.php Robert Haas and Stephen Frost	2010-07-09 14:06:01 +00:00
Tom Lane	b40466c337	Stamp HEAD as 9.1devel. (And there was much rejoicing.)	2010-07-09 04:10:58 +00:00
Marc G. Fournier	1084f31770	tag beta3	2010-07-09 02:43:12 +00:00
Bruce Momjian	239d769e7e	pgindent run for 9.0, second run	2010-07-06 19:19:02 +00:00
Heikki Linnakangas	eb81b6509f	The previous fix in CVS HEAD and 8.4 for handling the case where a cursor being used in a PL/pgSQL FOR loop is closed was inadequate, as Tom Lane pointed out. The bug affects FOR statement variants too, because you can close an implicitly created cursor too by guessing the "<unnamed portal X>" name created for it. To fix that, "pin" the portal to prevent it from being dropped while it's being used in a PL/pgSQL FOR loop. Backpatch all the way to 7.4 which is the oldest supported version.	2010-07-05 09:27:18 +00:00
Tom Lane	e76c1a0f4d	Replace max_standby_delay with two parameters, max_standby_archive_delay and max_standby_streaming_delay, and revise the implementation to avoid assuming that timestamps found in WAL records can meaningfully be compared to clock time on the standby server. Instead, the delay limits are compared to the elapsed time since we last obtained a new WAL segment from archive or since we were last "caught up" to WAL data arriving via streaming replication. This avoids problems with clock skew between primary and standby, as well as other corner cases that the original coding would misbehave in, such as the primary server having significant idle time between transactions. Per my complaint some time ago and considerable ensuing discussion. Do some desultory editing on the hot standby documentation, too.	2010-07-03 20:43:58 +00:00
Robert Haas	b3b7d603fb	Allow REASSIGNED OWNED to handle opclasses and opfamilies. Backpatch to 8.3, which is as far back as we have opfamilies. The opclass portion could probably be backpatched to 8.2, when REASSIGN OWNED was added, but for now I have not done that. Asko Tiidumaa, with minor adjustments by me.	2010-07-03 13:53:13 +00:00
Peter Eisentraut	a3401bea9c	Use different function names for plpython3 handlers, to avoid clashes in pg_pltemplate This should have a catversion bump, but it's still being debated whether it's worth it during beta.	2010-06-29 00:18:11 +00:00
Tom Lane	78e8f0025e	Clean up some randomness associated with trace_recovery_messages: don't put the variable declaration in the middle of a bunch of externs, and do use extern where it should be used.	2010-06-17 17:44:40 +00:00
Tom Lane	07e8b6aabc	Don't allow walsender to send WAL data until it's been safely fsync'd on the master. Otherwise a subsequent crash could cause the master to lose WAL that has already been applied on the slave, resulting in the slave being out of sync and soon corrupt. Per recent discussion and an example from Robert Haas. Fujii Masao	2010-06-17 16:41:25 +00:00
Itagaki Takahiro	147c665d01	Remove prototype of GetOldestWALSendPointer(), that is marked as NOT_USED.	2010-06-17 00:06:34 +00:00
Itagaki Takahiro	41f302b52a	Add new GUC categories corresponding to sections in docs, and move description for vacuum_defer_cleanup_age to the correct category. Sections in postgresql.conf are also sorted in the same order with docs. Per gripe by Fujii Masao, suggestion by Heikki Linnakangas, and patch by me.	2010-06-15 07:52:11 +00:00
Robert Haas	26b7abfa32	Fix ALTER LARGE OBJECT and GRANT ... ON LARGE OBJECT for large OIDs. The previous coding failed for OIDs too large to be represented by a signed integer.	2010-06-13 17:43:13 +00:00
Heikki Linnakangas	0a7cb85531	Make TriggerFile variable static. It's not used outside xlog.c. Fujii Masao	2010-06-10 07:49:23 +00:00
Marc G. Fournier	dcd52a64bd	tag 9.0beta2	2010-06-04 07:28:30 +00:00
Tom Lane	0cc59cc1f3	Add current WAL end (as seen by walsender, ie, GetWriteRecPtr() result) and current server clock time to SR data messages. These are not currently used on the slave side but seem likely to be useful in future, and it'd be better not to change the SR protocol after release. Per discussion. Also do some minor code review and cleanup on walsender.c, and improve the protocol documentation.	2010-06-03 22:17:32 +00:00
Alvaro Herrera	667440162c	Add comments about definitions that may affect PG_CONTROL_VERSION, per recent unintended-initdb-forcing fiasco	2010-06-03 20:37:13 +00:00
Tom Lane	34e543763c	Bump PG_CONTROL_VERSION to account for the incompatible change committed earlier.	2010-06-03 14:50:30 +00:00
Robert Haas	d561430b66	On clean shutdown during recovery, don't warn about possible corruption. Fujii Masao. Review by Heikki Linnakangas and myself.	2010-06-03 03:20:00 +00:00
Itagaki Takahiro	e54b0cba96	PGDLLEXPORT is __declspec (dllexport) only on MSVC, but is __declspec (dllimport) on other compilers because cygwin and mingw don't like dllexport.	2010-05-28 16:34:15 +00:00
Tom Lane	c82d931dd1	Fix the volatility marking of textanycat() and anytextcat(): they were marked immutable, but that is wrong in general because the cast from the polymorphic argument to text could be stable or even volatile. Mark them volatile for safety. In the typical case where the cast isn't volatile, the planner will deduce the correct expression volatility after inlining the function, so performance is not lost. The just-committed fix in CREATE INDEX also ensures this won't break any indexing cases that ought to be allowed. Per discussion, I'm not bumping catversion for this change, as it doesn't seem critical enough to force an initdb on beta testers.	2010-05-27 16:20:11 +00:00
Itagaki Takahiro	77e50a61ff	Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT independently from BUILDING_DLL. It is always __declspec(dllexport).	2010-05-27 07:59:48 +00:00
Simon Riggs	f9dbac9476	HS Defer buffer pin deadlock check until deadlock_timeout has expired. During Hot Standby we need to check for buffer pin deadlocks when the Startup process begins to wait, in case it never wakes up again. We previously made the deadlock check immediately on the basis it was cheap, though clearer thinking and prima facie evidence shows that was too simple. Refactor existing code to make it easy to add in deferral of deadlock check until deadlock_timeout allowing a good reduction in deadlock checks since far few buffer pins are held for that duration. It's worth doing anyway, though major goal is to prevent further reports of context switching with high numbers of users on occasional tests.	2010-05-26 19:52:52 +00:00
Michael Meskes	29259531c7	Replace self written 'long long int' configure test by standard 'AC_TYPE_LONG_LONG_INT' macro call.	2010-05-25 17:28:20 +00:00
Michael Meskes	555a02f910	Added a configure test for "long long" datatypes. So far this is only used in ecpg and replaces the old test that was kind of hackish.	2010-05-25 14:32:55 +00:00
Robert Haas	ea9968c331	Rename PM_RECOVERY_CONSISTENT and PMSIGNAL_RECOVERY_CONSISTENT. The new names PM_HOT_STANDBY and PMSIGNAL_BEGIN_HOT_STANDBY more accurately reflect their actual function.	2010-05-15 20:01:32 +00:00
Tom Lane	c453569f0d	Spell __NetBSD__ the same way everywhere. Per Giles Lean.	2010-05-15 14:44:13 +00:00
Bruce Momjian	5b79fdadda	Use __bsdi__ consistently.	2010-05-15 10:14:20 +00:00
Tom Lane	382ff21203	Fix up lame idea of not using autoconf to determine if platform has scandir(). Should fix buildfarm failures.	2010-05-13 22:07:43 +00:00
Simon Riggs	8431e296ea	Cleanup initialization of Hot Standby. Clarify working with reanalysis of requirements and documentation on LogStandbySnapshot(). Fixes two minor bugs reported by Tom Lane that would lead to an incorrect snapshot after transaction wraparound. Also fix two other problems discovered that would give incorrect snapshots in certain cases. ProcArrayApplyRecoveryInfo() substantially rewritten. Some minor refactoring of xact_redo_apply() and ExpireTreeKnownAssignedTransactionIds().	2010-05-13 11:15:38 +00:00
Robert Haas	dd6fcd35e3	Change typedef for rb_appendator to avoid conflict with C++ reserved words. Fixes a complaint from src/tools/pginclude/cpluspluscheck reported by Peter Eisentraut.	2010-05-11 18:14:01 +00:00
Marc G. Fournier	f9d9b2b34a	tag for 9.0beta1	2010-04-30 03:16:58 +00:00
Tom Lane	f0488bd57c	Rename the parameter recovery_connections to hot_standby, to reduce possible confusion with streaming-replication settings. Also, change its default value to "off", because of concern about executing new and poorly-tested code during ordinary non-replicating operation. Per discussion. In passing do some minor editing of related documentation.	2010-04-29 21:36:19 +00:00
Heikki Linnakangas	9b8a73326e	Introduce wal_level GUC to explicitly control if information needed for archival or hot standby should be WAL-logged, instead of deducing that from other options like archive_mode. This replaces recovery_connections GUC in the primary, where it now has no effect, but it's still used in the standby to enable/disable hot standby. Remove the WAL-logging of "unlogged operations", like creating an index without WAL-logging and fsyncing it at the end. Instead, we keep a copy of the wal_mode setting and the settings that affect how much shared memory a hot standby server needs to track master transactions (max_connections, max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings change, at server restart, write a WAL record noting the new settings and update pg_control. This allows us to notice the change in those settings in the standby at the right moment, they used to be included in checkpoint records, but that meant that a changed value was not reflected in the standby until the first checkpoint after the change. Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to the sequence it used to follow, before hot standby and subsequent patches changed it to 0x9003.	2010-04-28 16:10:43 +00:00
Bruce Momjian	75c5738177	Reorder pg_stat_activity columns to be more consistent, using layout suggested by Tom Lane. Catalog version bumped due to system view change.	2010-04-26 14:22:37 +00:00
Simon Riggs	90e04bab39	Patch revoked because of objections.	2010-04-24 16:20:32 +00:00
Robert Haas	33980a0640	Fix various instances of "the the". Two of these were pointed out by Erik Rijkers; the rest I found.	2010-04-23 23:21:44 +00:00
Simon Riggs	473af39737	Add missing optimizer hooks for function cost and number of rows. Closely follow design of other optimizer hooks: if hook exists retrieve value from plugin; if still not set then get from cache.	2010-04-23 22:23:39 +00:00
Simon Riggs	491d1ea5b3	Previous patch revoked following objections.	2010-04-23 20:21:31 +00:00
Simon Riggs	6ca23b1a29	Make CheckRequiredParameterValues() depend upon correct combination of parameters. Fix bug report by Robert Haas that error message and hint was incorrect if wrong mode parameters specified on master. Internal changes only. Proposals for parameter simplification on master/primary still under way.	2010-04-23 19:57:19 +00:00
Simon Riggs	bc2b85d904	Fix oversight in collecting values for cleanup_info records. vacuum_log_cleanup_info() now generates log records with a valid latestRemovedXid set in all cases. Also be careful not to zero the value when we do a round of vacuuming part-way through lazy_scan_heap(). Incidentally, this reduces frequency of conflicts in Hot Standby.	2010-04-21 17:20:56 +00:00
Tom Lane	ea46000a40	Arrange for client authentication to occur before we select a specific database to connect to. This is necessary for the walsender code to work properly (it was previously using an untenable assumption that template1 would always be available to connect to). This also gets rid of a small security shortcoming that was introduced in the original patch to eliminate the flat authentication files: before, you could find out whether or not the requested database existed even if you couldn't pass the authentication checks. The changes needed to support this are mainly just to treat pg_authid and pg_auth_members as nailed relations, so that we can read them without having to be able to locate real pg_class entries for them. This mechanism was already debugged for pg_database, but we hadn't recognized the value of applying it to those catalogs too. Since the current code doesn't have support for accessing toast tables before we've brought up all of the relcache, remove pg_authid's toast table to ensure that no one can store an out-of-line toasted value of rolpassword. The case seems quite unlikely to occur in practice, and was effectively unsupported anyway in the old "flatfiles" implementation. Update genbki.pl to actually implement the same rules as bootstrap.c does for not-nullability of catalog columns. The previous coding was a bit cheesy but worked all right for the previous set of bootstrap catalogs. It does not work for pg_authid, where rolvaliduntil needs to be nullable. Initdb forced due to minor catalog changes (mainly the toast table removal).	2010-04-20 23:48:47 +00:00
Robert Haas	481cb5d9b5	Rename standby_keep_segments to wal_keep_segments. Also, make the name of the GUC and the name of the backing variable match. Alnong the way, clean up a couple of slight typographical errors in the related docs.	2010-04-20 11:15:06 +00:00
Simon Riggs	cfac702223	Add new message for explicit rejection by pg_hba.conf. Implicit rejection retains same message as before.	2010-04-19 19:02:18 +00:00
Robert Haas	5b89ef384c	Add an 'enable_material' GUC. The logic for determining whether to materialize has been significantly overhauled for 9.0. In case there should be any doubt about whether materialization is a win in any particular case, this should provide a convenient way of seeing what happens without it; but even with enable_material turned off, we still materialize in cases where it is required for correctness. Thanks to Tom Lane for the review.	2010-04-19 00:55:26 +00:00
Simon Riggs	2847de9df2	Remove some additional changes in previous commit that belong elsewhere.	2010-04-18 18:17:12 +00:00
Simon Riggs	21d6a6a128	Tune GetSnapshotData() during Hot Standby by avoiding loop through normal backends. Makes code clearer also, since we avoid various Assert()s. Performance of snapshots taken during recovery no longer depends upon number of read-only backends.	2010-04-18 18:06:07 +00:00
Heikki Linnakangas	361bd1662e	Allow Hot Standby to begin from a shutdown checkpoint. Patch by Simon Riggs & me	2010-04-13 14:17:46 +00:00
Heikki Linnakangas	30556568f5	Update the location of last removed WAL segment in shared memory only after actually removing one, so that if we can't remove segments because WAL archiving is lagging behind, we don't unnecessarily forbid streaming the old not-yet-archived segments that are still perfectly valid. Per suggestion from Fujii Masao.	2010-04-12 10:40:43 +00:00
Heikki Linnakangas	e57cd7f0a1	Change the logic to decide when to delete old WAL segments, so that it doesn't take into account how far the WAL senders are. This way a hung WAL sender doesn't prevent old WAL segments from being recycled/removed in the primary, ultimately causing the disk to fill up. Instead add standby_keep_segments setting to control how many old WAL segments are kept in the primary. This also makes it more reliable to use streaming replication without WAL archiving, assuming that you set standby_keep_segments high enough.	2010-04-12 09:52:29 +00:00
Tom Lane	9029df17c4	Fix updateAclDependencies() to not assume that ACL role dependencies can only be added during GRANT and can only be removed during REVOKE; and fix its callers to not lie to it about the existing set of dependencies when instantiating a formerly-default ACL. The previous coding accidentally failed to malfunction so long as default ACLs contain only references to the object's owning role, because that role is ignored by updateAclDependencies. However this is obviously pretty fragile, as well as being an undocumented assumption. The new coding is a few lines longer but IMO much clearer.	2010-04-05 01:09:53 +00:00
Magnus Hagander	4c10623306	Update a number of broken links in comments. Josh Kupershmidt	2010-04-02 15:21:20 +00:00
Robert Haas	54943734f8	Refer to max_wal_senders in a more consistent fashion. The error message now makes explicit reference to the GUC that must be changed to fix the problem, using wording suggested by Tom Lane. Along the way, rename the GUC from MaxWalSenders to max_wal_senders for consistency and grep-ability.	2010-04-01 00:43:29 +00:00
Tom Lane	d174a4adbb	Fix "constraint_exclusion = partition" logic so that it will also attempt constraint exclusion on an inheritance set that is the target of an UPDATE or DELETE query. Per gripe from Marc Cousin. Back-patch to 8.4 where the feature was introduced.	2010-03-30 21:58:11 +00:00
Tom Lane	b78f6264eb	Rework join-removal logic as per recent discussion. In particular this fixes things so that it works for cases where nested removals are possible. The overhead of the optimization should be significantly less, as well.	2010-03-28 22:59:34 +00:00
Simon Riggs	a760893dbd	Derive latestRemovedXid for btree deletes by reading heap pages. The WAL record for btree delete contains a list of tids, even when backup blocks are present. We follow the tids to their heap tuples, taking care to follow LP_REDIRECT tuples. We ignore LP_DEAD tuples on the understanding that they will always have xmin/xmax earlier than any LP_NORMAL tuples referred to by killed index tuples. Iff all tuples are LP_DEAD we return InvalidTransactionId. The heap relfilenode is added to the WAL record, requiring API changes to pass down the heap Relation. XLOG_PAGE_MAGIC updated.	2010-03-28 09:27:02 +00:00
Alvaro Herrera	be8cebc717	Prevent ALTER USER f RESET ALL from removing the settings that were put there by a superuser -- "ALTER USER f RESET setting" already disallows removing such a setting. Apply the same treatment to ALTER DATABASE d RESET ALL when run by a database owner that's not superuser.	2010-03-25 14:44:34 +00:00
Simon Riggs	bf6285b3a7	Further corrections of mismatching struct and btree SizeOf macros. In this case, correction is to remove now unused fields from struct. Since these were unused and full of garbage anyway, no version change.	2010-03-20 07:49:48 +00:00
Tom Lane	865b29540e	Fix oversight in btpo.xact patch; it was in fact installing garbage in the xact field on replay, due to not writing out all the data in the wal log struct.	2010-03-19 20:51:30 +00:00
Simon Riggs	aa36bd2039	Update XLOG_PAGE_MAGIC to recognise WAL format changes.	2010-03-19 17:42:10 +00:00
Simon Riggs	3cdafe40e7	Adjust comment in .history file to match recovery target specified. Comment present since 8.0 was never fully meaningful, since two recovery targets cannot be specified. Refactor recovery target type to make this change and associated code easier to understand. No change in function. Bug report arising from internal support question.	2010-03-19 11:05:15 +00:00
Simon Riggs	5c73ae17d1	Reset btpo.xact following recovery of btree delete page. Add btpo_xact field into WAL record and reset it from there, rather than using FrozenTransactionId which can lead to some corner case bugs. Problem report and suggested route to a fix from Heikki, details by me.	2010-03-19 10:41:22 +00:00
Tom Lane	93324355eb	Pass incompletely-transformed aggregate argument lists as separate parameters to transformAggregateCall, instead of abusing fields in Aggref to carry them temporarily. No change in functionality but hopefully the code is a bit clearer now. Per gripe from Gokulakannan Somasundaram.	2010-03-17 16:52:38 +00:00
Bruce Momjian	a6c1cea2b7	Add libpq warning message if the .pgpass-retrieved password fails. Add ERRCODE_INVALID_PASSWORD sqlstate error code.	2010-03-13 14:55:57 +00:00
Tom Lane	4df5c6c719	Update comment for pg_constraint.conindid to mention that it's used for exclusion constraints. Not sure how we managed to update the comment for it in catalogs.sgml but miss this one.	2010-03-11 03:36:22 +00:00
Tom Lane	8bf14182cf	Export xml.c's libxml-error-handling support so that contrib/xml2 can use it too, instead of duplicating the functionality (badly). I renamed xml_init to pg_xml_init, because the former seemed just a bit too generic to be safe as a global symbol. I considered likewise renaming xml_ereport to pg_xml_ereport, but felt that the reference to ereport probably made it sufficiently PG-centric already.	2010-03-03 17:29:45 +00:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Tom Lane	11b5847058	Add an OR REPLACE option to CREATE LANGUAGE. This operates in the same way as other CREATE OR REPLACE commands, ie, it replaces everything but the ownership and ACL lists of an existing entry, and requires the caller to have owner privileges for that entry. While modifying an existing language has some use in development scenarios, in typical usage all the "replaced" values come from pg_pltemplate so there will be no actual change in the language definition. The reason for adding this is mainly to allow programs to ensure that a language exists without triggering an error if it already does exist. This commit just adds and documents the new option. A followon patch will use it to clean up some unpleasant cases in pg_dump and pg_regress.	2010-02-23 22:51:43 +00:00
Tom Lane	05d8a561ff	Clean up handling of XactReadOnly and RecoveryInProgress checks. Add some checks that seem logically necessary, in particular let's make real sure that HS slave sessions cannot create temp tables. (If they did they would think that temp tables belonging to the master's session with the same BackendId were theirs. We must not allow myTempNamespace to become set in a slave session.) Change setval() and nextval() so that they are only allowed on temp sequences in a read-only transaction. This seems consistent with what we allow for table modifications in read-only transactions. Since an HS slave can't have a temp sequence, this also provides a nicer cure for the setval PANIC reported by Erik Rijkers. Make the error messages more uniform, and have them mention the specific command being complained of. This seems worth the trifling amount of extra code, since people are likely to see such messages a lot more than before.	2010-02-20 21:24:02 +00:00
Peter Eisentraut	2f6cf9192c	Revert version stamping in wrong branch	2010-02-19 18:42:30 +00:00
Peter Eisentraut	a779afb40c	Version stamp 9.0alpha4	2010-02-19 16:03:22 +00:00
Heikki Linnakangas	ad458cfe81	Don't use O_DIRECT when writing WAL files if archiving or streaming is enabled. Bypassing the kernel cache is counter-productive in that case, because the archiver/walsender process will read from the WAL file soon after it's written, and if it's not cached the read will cause a physical read, eating I/O bandwidth available on the WAL drive. Also, walreceiver process does unaligned writes, so disable O_DIRECT in walreceiver process for that reason too.	2010-02-19 10:51:04 +00:00
Tom Lane	50a90fac40	Stamp HEAD as 9.0devel, and update various places that were referring to 8.5 (hope I got 'em all). Per discussion, this release will be 9.0 not 8.5.	2010-02-17 04:19:41 +00:00
Tom Lane	d1e027221d	Replace the pg_listener-based LISTEN/NOTIFY mechanism with an in-memory queue. In addition, add support for a "payload" string to be passed along with each notify event. This implementation should be significantly more efficient than the old one, and is also more compatible with Hot Standby usage. There is not yet any facility for HS slaves to receive notifications generated on the master, although such a thing is possible in future. Joachim Wieland, reviewed by Jeff Davis; also hacked on by me.	2010-02-16 22:34:57 +00:00
Andrew Dunstan	fc5173ad51	Add query text to auto_explain output. Still to be done: fix docs and fix regression failures under auto_explain.	2010-02-16 22:19:59 +00:00
Magnus Hagander	215cbc90f8	Add emulation of non-blocking sockets to the win32 socket/signal layer, and use this in pq_getbyte_if_available. It's only a limited implementation which swithes the whole emulation layer no non-blocking mode, but that's enough as long as non-blocking is only used during a short period of time, and only one socket is accessed during this time.	2010-02-16 19:26:02 +00:00
Greg Stark	f8c183a1ac	Speed up CREATE DATABASE by deferring the fsyncs until after copying all the data and using posix_fadvise to nudge the OS into flushing it earlier. This also hopefully makes CREATE DATABASE avoid spamming the cache. Tests show a big speedup on Linux at least on some filesystems. Idea and patch from Andres Freund.	2010-02-15 00:50:57 +00:00
Robert Haas	e26c539e9f	Wrap calls to SearchSysCache and related functions using macros. The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.	2010-02-14 18:42:19 +00:00
Tom Lane	7507b193bc	Don't expose the inline definition of MemoryContextSwitchTo when FRONTEND is defined. Its reference to CurrentMemoryContext causes link failures on some platforms, evidently because the inline function gets compiled despite lack of use. Per buildfarm member warthog.	2010-02-13 20:46:52 +00:00
Simon Riggs	dd428c79a4	Fix relcache init file invalidation during Hot Standby for the case where a database has a non-default tablespaceid. Pass thru MyDatabaseId and MyDatabaseTableSpace to allow file path to be re-created in standby and correct invalidation to take place in all cases. Update and rework xact_commit_desc() debug messages. Bug report from Tom by code inspection. Fix by me.	2010-02-13 16:15:48 +00:00
Tom Lane	e08ab7c312	Support inlining various small performance-critical functions on non-GCC compilers, by applying a configure check to see if the compiler will accept an unreferenced "static inline foo ..." function without warnings. It is believed that such warnings are the only reason not to declare inlined functions in headers, if the compiler understands "inline" at all. Kurt Harriman	2010-02-13 02:34:16 +00:00
Simon Riggs	b95a720a48	Re-enable max_standby_delay = -1 using deadlock detection on startup process. If startup waits on a buffer pin we send a request to all backends to cancel themselves if they are holding the buffer pin required and they are also waiting on a lock. If not, startup waits until max_standby_delay before cancelling any backend waiting for the requested buffer pin.	2010-02-13 01:32:20 +00:00
Simon Riggs	fafa374f2d	Introduce WAL records to log reuse of btree pages, allowing conflict resolution during Hot Standby. Page reuse interlock requested by Tom. Analysis and patch by me.	2010-02-13 00:59:58 +00:00
Tom Lane	ec4be2ee68	Extend the set of frame options supported for window functions. This patch allows the frame to start from CURRENT ROW (in either RANGE or ROWS mode), and it also adds support for ROWS n PRECEDING and ROWS n FOLLOWING start and end points. (RANGE value PRECEDING/FOLLOWING isn't there yet --- the grammar works, but that's all.) Hitoshi Harada, reviewed by Pavel Stehule	2010-02-12 17:33:21 +00:00
Teodor Sigaev	5209c084a6	Generic implementation of red-black binary tree. It's planned to use in several places, but for now only GIN uses it during index creation. Using self-balanced tree greatly speeds up index creation in corner cases with preordered data.	2010-02-11 14:29:50 +00:00
Tom Lane	cbe9d6beb4	Fix up rickety handling of relation-truncation interlocks. Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state. In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.	2010-02-09 21:43:30 +00:00
Tom Lane	d5768dce10	Create an official API function for C functions to use to check if they are being called as aggregates, and to get the aggregate transition state memory context if needed. Use it instead of poking directly into AggState and WindowAggState in places that shouldn't know so much. We should have done this in 8.4, probably, but better late than never. Revised version of a patch by Hitoshi Harada.	2010-02-08 20:39:52 +00:00
Bruce Momjian	dfc902854a	Add C comments that HEAP_MOVED_* define usage is only for pre-9.0 binary upgrades.	2010-02-08 14:10:21 +00:00
Tom Lane	9a75803b1a	Remove CatalogCacheFlushRelation, and the reloidattr infrastructure that was needed by nothing else. The restructuring I just finished doing on cache management exposed to me how silly this routine was. Its function was to go into the catcache and blow away all entries related to a given relation when there was a relcache flush on that relation. However, there is no point in removing a catcache entry if the catalog row it represents is still valid --- and if it isn't valid, there must have been a catcache entry flush on it, because that's triggered directly by heap_update or heap_delete on the catalog row. So this routine accomplished nothing except to blow away valid cache entries that we'd very likely be wanting in the near future to help reconstruct the relcache entry. Dumb. On top of which, it required a subtle and easy-to-get-wrong attribute in syscache definitions, ie, the column containing the OID of the related relation if any. Removing that is a very useful maintenance simplification.	2010-02-08 05:53:55 +00:00
Tom Lane	0a469c8769	Remove old-style VACUUM FULL (which was known for a little while as VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.	2010-02-08 04:33:55 +00:00
Tom Lane	1ddc2703a9	Work around deadlock problems with VACUUM FULL/CLUSTER on system catalogs, as per my recent proposal. First, teach IndexBuildHeapScan to not wait for INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples to commit unless the index build is checking uniqueness/exclusion constraints. If it isn't, there's no harm in just indexing the in-doubt tuple. Second, modify VACUUM FULL/CLUSTER to suppress reverifying uniqueness/exclusion constraint properties while rebuilding indexes of the target relation. This is reasonable because these commands aren't meant to deal with corrupted-data situations. Constraint properties will still be rechecked when an index is rebuilt by a REINDEX command. This gets us out of the problem that new-style VACUUM FULL would often wait for other transactions while holding exclusive lock on a system catalog, leading to probable deadlock because those other transactions need to look at the catalogs too. Although the real ultimate cause of the problem is a debatable choice to release locks early after modifying system catalogs, changing that choice would require pretty serious analysis and is not something to be undertaken lightly or on a tight schedule. The present patch fixes the problem in a fairly reasonable way and should also improve the speed of VACUUM FULL/CLUSTER a little bit.	2010-02-07 22:40:33 +00:00
Tom Lane	b9b8831ad6	Create a "relation mapping" infrastructure to support changing the relfilenodes of shared or nailed system catalogs. This has two key benefits: * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs. * We no longer have to use an unsafe reindex-in-place approach for reindexing shared catalogs. CLUSTER on nailed catalogs now works too, although I left it disabled on shared catalogs because the resulting pg_index.indisclustered update would only be visible in one database. Since reindexing shared system catalogs is now fully transactional and crash-safe, the former special cases in REINDEX behavior have been removed; shared catalogs are treated the same as non-shared. This commit does not do anything about the recently-discussed problem of deadlocks between VACUUM FULL/CLUSTER on a system catalog and other concurrent queries; will address that in a separate patch. As a stopgap, parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid such failures during the regression tests.	2010-02-07 20:48:13 +00:00
Tom Lane	9727c583fe	Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping of old and new toast tables can be done either at the logical level (by swapping the heaps' reltoastrelid links) or at the physical level (by swapping the relfilenodes of the toast tables and their indexes). This is necessary infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared system catalogs, where we cannot change reltoastrelid. The physical swap saves a few catalog updates too. We unfortunately have to keep the logical-level swap logic because in some cases we will be adding or deleting a toast table, so there's no possibility of a physical swap. However, that only happens as a consequence of schema changes in the table, which we do not need to support for system catalogs, so such cases aren't an obstacle for that. In passing, refactor the cluster support functions a little bit to eliminate unnecessarily-duplicated code; and fix the problem that while CLUSTER had been taught to rename the final toast table at need, ALTER TABLE had not.	2010-02-04 00:09:14 +00:00
Heikki Linnakangas	808969d0e7	Add a message type header to the CopyData messages sent from primary to standby in streaming replication. While we only have one message type at the moment, adding a message type header makes this easier to extend.	2010-02-03 09:47:19 +00:00
Tom Lane	70a2b05a59	Assorted cleanups in preparation for using a map file to support altering the relfilenode of currently-not-relocatable system catalogs. 1. Get rid of inval.c's dependency on relfilenode, by not having it emit smgr invalidations as a result of relcache flushes. Instead, smgr sinval messages are sent directly from smgr.c when an actual relation delete or truncate is done. This makes considerably more structural sense and allows elimination of a large number of useless smgr inval messages that were formerly sent even in cases where nothing was changing at the physical-relation level. Note that this reintroduces the concept of nontransactional inval messages, but that's okay --- because the messages are sent by smgr.c, they will be sent in Hot Standby slaves, just from a lower logical level than before. 2. Move setNewRelfilenode out of catalog/index.c, where it never logically belonged, into relcache.c; which is a somewhat debatable choice as well but better than before. (I considered catalog/storage.c, but that seemed too low level.) Rename to RelationSetNewRelfilenode. 3. Cosmetic cleanups of some other relfilenode manipulations.	2010-02-03 01:14:17 +00:00
Magnus Hagander	0a27347141	Make RADIUS authentication use pg_getaddrinfo_all() to get address of the server. Gets rid of a fairly ugly hack for Solaris, and also provides hostname and IPV6 support.	2010-02-02 19:09:37 +00:00
Robert Haas	d8db6a6096	Fold FindConversion() into FindConversionByName() and remove ACL check. All callers of FindConversionByName() already do suitable permissions checking already apart from this function, but this is not just dead code removal: the unnecessary permissions check can actually lead to spurious failures - there's no reason why inability to execute the underlying function should prohibit renaming the conversion, for example. (The error messages in these cases were also rather poor: FindConversion would return InvalidOid, eventually leading to a complaint that the conversion "did not exist", which was not correct.) KaiGai Kohei	2010-02-02 18:52:33 +00:00
Robert Haas	63f9282f6e	Tighten integrity checks on ALTER TABLE ... ALTER COLUMN ... RENAME. When a column is renamed, we recursively rename the same column in all descendent tables. But if one of those tables also inherits that column from a table outside the inheritance hierarchy rooted at the named table, we must throw an error. The previous coding correctly prohibited the rename when the parent had inherited the column from elsewhere, but overlooked the case where the parent was OK but a child table also inherited the same column from a second, unrelated parent. For now, not backpatched due to lack of complaints from the field. KaiGai Kohei, with further changes by me. Reviewed by Bernd Helme and Tom Lane.	2010-02-01 19:28:56 +00:00
Robert Haas	42a8ab0a14	Augment EXPLAIN output with more details on Hash nodes. We show the number of buckets, the number of batches (and also the original number if it has changed), and the peak space used by the hash table. Minor executor changes to track peak space used.	2010-02-01 15:43:36 +00:00
Simon Riggs	296578feb4	Revoke augmentation of WAL records for btree delete, per discussion.	2010-02-01 13:40:28 +00:00
Itagaki Takahiro	9ea9918e37	Add string_agg aggregate functions. The one argument version concatenates the input values into a string. The two argument version also does the same thing, but inserts delimiters between elements. Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.	2010-02-01 03:14:45 +00:00
Simon Riggs	c85c941470	Detect early deadlock in Hot Standby when Startup is already waiting. First stage of required deadlock detection to allow re-enabling max_standby_delay setting of -1, which is now essential in the absence of improved relation- specific conflict resoluton. Requested by Greg Stark et al.	2010-01-31 19:01:11 +00:00
Tom Lane	0c0203d0a9	Parenthesize this macro, just in case.	2010-01-31 17:35:46 +00:00
Simon Riggs	6d2bc0a6cf	Augment WAL records for btree delete with GetOldestXmin() to reduce false positives during Hot Standby conflict processing. Simple patch to enhance conflict processing, following previous discussions. Controlled by parameter minimize_standby_conflicts = on \| off, with default off allows measurement of performance impact to see whether it should be set on all the time.	2010-01-29 18:39:05 +00:00
Simon Riggs	76be0c81cc	Filter recovery conflicts based upon dboid from relfilenode of WAL records for heap and btree. Minor change, mostly API changes to pass through the required values. This is a simple change though also provides the refactoring required for further enhancements to conflict processing using the relOid. Changes only have effect during Hot Standby.	2010-01-29 17:10:05 +00:00
Peter Eisentraut	e7b3349a8a	Type table feature This adds the CREATE TABLE name OF type command, per SQL standard.	2010-01-28 23:21:13 +00:00
Magnus Hagander	083e1b0f27	Add functions to reset the statistics counter for a single table/index or a single function.	2010-01-28 14:25:41 +00:00
Magnus Hagander	21d3ae09da	Define INADDR_NONE on Solaris when it's missing. Per a couple of buildfarm members complaining.	2010-01-28 11:36:14 +00:00
Heikki Linnakangas	e0e8b96345	Change a few remaining calls of XLogArchivingActive() to use XLogIsNeeded() instead, to determine if an otherwise non-logged operation needs to be logged in WAL for standby servers. Fujii Masao	2010-01-28 07:31:42 +00:00
Heikki Linnakangas	1bb2558046	Make standby server continuously retry restoring the next WAL segment with restore_command, if the connection to the primary server is lost. This ensures that the standby can recover automatically, if the connection is lost for a long time and standby falls behind so much that the required WAL segments have been archived and deleted in the master. This also makes standby_mode useful without streaming replication; the server will keep retrying restore_command every few seconds until the trigger file is found. That's the same basic functionality pg_standby offers, but without the bells and whistles. To implement that, refactor the ReadRecord/FetchRecord functions. The FetchRecord() function introduced in the original streaming replication patch is removed, and all the retry logic is now in a new function called XLogReadPage(). XLogReadPage() is now responsible for executing restore_command, launching walreceiver, and waiting for new WAL to arrive from primary, as required. This also changes the life cycle of walreceiver. When launched, it now only tries to connect to the master once, and exits if the connection fails, or is lost during streaming for any reason. The startup process detects the death, and re-launches walreceiver if necessary.	2010-01-27 15:27:51 +00:00
Magnus Hagander	b3daac5a9c	Add support for RADIUS authentication.	2010-01-27 12:12:00 +00:00
Tom Lane	d879697cd2	Remove the default_do_language parameter, instead making DO use a hardwired default of "plpgsql". This is more reasonable than it was when the DO patch was written, because we have since decided that plpgsql should be installed by default. Per discussion, having a parameter for this doesn't seem useful enough to justify the risk of application breakage if the value is changed unexpectedly.	2010-01-26 16:33:40 +00:00
Tom Lane	9507c8a1db	Add get_bit/set_bit functions for bit strings, paralleling those for bytea, and implement OVERLAY() for bit strings and bytea. In passing also convert text OVERLAY() to a true built-in, instead of relying on a SQL function. Leonardo F, reviewed by Kevin Grittner	2010-01-25 20:55:32 +00:00
Simon Riggs	959ac58c04	In HS, Startup process sets SIGALRM when waiting for buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock.	2010-01-23 16:37:12 +00:00
Robert Haas	d779199175	Fix several oversights in previous commit - attribute options patch. I failed to 'cvs add' the new files and also neglected to bump catversion.	2010-01-22 16:42:31 +00:00
Robert Haas	76a47c0e74	Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism. Attributes can now have options, just as relations and tablespaces do, and the reloptions code is used to parse, validate, and store them. For simplicity and because these options are not performance critical, we store them in a separate cache rather than the main relcache. Thanks to Alex Hunsaker for the review.	2010-01-22 16:40:19 +00:00
Peter Eisentraut	adb7764030	PL/Python DO handler Also cleaned up some redundancies between the primary error messages and the error context in PL/Python. Hannu Valtonen	2010-01-22 15:45:15 +00:00
Heikki Linnakangas	09b115f706	Write a WAL record whenever we perform an operation without WAL-logging that would've been WAL-logged if archiving was enabled. If we encounter such records in archive recovery anyway, we know that some data is missing from the log. A WARNING is emitted in that case. Original patch by Fujii Masao, with changes by me.	2010-01-20 19:43:40 +00:00
Heikki Linnakangas	47ce95a7b9	Now that much of walreceiver has been pulled back into the postgres binary, revert PGDLLIMPORT decoration of global variables. I'm not sure if there's any real harm from unnecessary PGDLLIMPORTs, but these are all internal variables that external modules really shouldn't be messing with. ThisTimeLineID still needs PGDLLIMPORT.	2010-01-20 18:54:27 +00:00
Heikki Linnakangas	32bc08b1d4	Rethink the way walreceiver is linked into the backend. Instead than shoving walreceiver as whole into a dynamically loaded module, split the libpq-specific parts of it into dynamically loaded module and keep the rest in the main backend binary. Although Tom fixed the Windows compilation problems with the old walreceiver module already, this is a cleaner division of labour and makes the code more readable. There's also the prospect of adding new transport methods as pluggable modules in the future, which this patch makes easier, though for now the API between libpqwalreceiver and walreceiver process should be considered private. The libpq-specific module is now in src/backend/replication/libpqwalreceiver, and the part linked with postgres binary is in src/backend/replication/walreceiver.c.	2010-01-20 09:16:24 +00:00
Magnus Hagander	7e40cdc075	Add pg_stat_reset_shared('bgwriter') to reset the cluster-wide shared statistics of the bgwriter. Greg Smith	2010-01-19 14:11:32 +00:00
Tom Lane	4f15699d70	Add pg_table_size() and pg_indexes_size() to provide more user-friendly wrappers around the pg_relation_size() function. Bernd Helmle, reviewed by Greg Smith	2010-01-19 05:50:18 +00:00
Tom Lane	9a915e596f	Improve the handling of SET CONSTRAINTS commands by having them search pg_constraint before searching pg_trigger. This allows saner handling of corner cases; in particular we now say "constraint is not deferrable" rather than "constraint does not exist" when the command is applied to a constraint that's inherently non-deferrable. Per a gripe several months ago from hubert depesz lubaczewski. To make this work without breaking user-defined constraint triggers, we have to add entries for them to pg_constraint. However, in return we can remove the pgconstrname column from pg_constraint, which represents a fairly sizable space savings. I also replaced the tgisconstraint column with tgisinternal; the old meaning of tgisconstraint can now be had by testing for nonzero tgconstraint, while there is no other way to get the old meaning of nonzero tgconstraint, namely that the trigger was internally generated rather than being user-created. In passing, fix an old misstatement in the docs and comments, namely that pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable. Actually, we mark RI action triggers as nondeferrable even when they belong to a nominally deferrable FK constraint. The SET CONSTRAINTS code now relies on that instead of hard-coding a list of exception OIDs.	2010-01-17 22:56:23 +00:00
Simon Riggs	a8ce974cdd	Teach standby conflict resolution to use SIGUSR1 Conflict reason is passed through directly to the backend, so we can take decisions about the effect of the conflict based upon the local state. No specific changes, as yet, though this prepares for later work. CancelVirtualTransaction() sends signals while holding ProcArrayLock. Introduce errdetail_abort() to give message detail explaining that the abort was caused by conflict processing. Remove CONFLICT_MODE states in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.	2010-01-16 10:05:59 +00:00
Tom Lane	c9dc53be77	Huh, apparently on cygwin we HAVE_SIGPROCMASK, so both variants of the BlockSig/UnBlockSig declaration have to be PGDLLIMPORT'ified. Per buildfarm results.	2010-01-16 05:52:29 +00:00
Tom Lane	47a09eda89	PGDLLIMPORT-ize the remaining variables needed by walreceiver.	2010-01-16 00:04:41 +00:00
Tom Lane	08f8d478eb	Do parse analysis of an EXPLAIN's contained statement during the normal parse analysis phase, rather than at execution time. This makes parameter handling work the same as it does in ordinary plannable queries, and in particular fixes the incompatibility that Pavel pointed out with plpgsql's new handling of variable references. plancache.c gets a little bit grottier, but the alternatives seem worse.	2010-01-15 22:36:35 +00:00
Heikki Linnakangas	40f908bdcd	Introduce Streaming Replication. This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client. Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceivxer. Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication. Fujii Masao, with additional hacking by me	2010-01-15 09:19:10 +00:00
Teodor Sigaev	4cbe473938	Add point_ops opclass for GiST.	2010-01-14 16:31:09 +00:00
Simon Riggs	e99767bc28	First part of refactoring of code for ResolveRecoveryConflict. Purposes of this are to centralise the conflict code to allow further change, as well as to allow passing through the full reason for the conflict through to the conflicting backends. Backend state alters how we can handle different types of conflict so this is now required. As originally suggested by Heikki, no longer optional.	2010-01-14 11:08:02 +00:00
Bruce Momjian	228170410d	Please tablespace directories in their own subdirectory so pg_migrator can upgrade clusters without renaming the tablespace directories. New directory structure format is, e.g.: $PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814	2010-01-12 02:42:52 +00:00
Tom Lane	6164b4bc19	Some trivial adjustments in comments for struct RelationData.	2010-01-10 22:19:17 +00:00
Simon Riggs	3bfcccc295	During Hot Standby, fix drop database when sessions idle. Previously we only cancelled sessions that were in-transaction. Simple fix is to just cancel all sessions without waiting. Doing it this way avoids complicating common code paths, which would not be worth the trouble to cover this rare case. Problem report and fix by Andres Freund, edited somewhat by me	2010-01-10 15:44:28 +00:00
Magnus Hagander	87091cb1f1	Create typedef pgsocket for storing socket descriptors. This silences some warnings on Win64. Not using the proper SOCKET datatype was actually wrong on Win32 as well, but didn't cause any warnings there. Also create define PGINVALID_SOCKET to indicate an invalid/non-existing socket, instead of using a hardcoded -1 value.	2010-01-10 14:16:08 +00:00
Robert Haas	84b6d5f359	Remove partial, broken support for NULL pointers when fetching attributes. Previously, fastgetattr() and heap_getattr() tested their fourth argument against a null pointer, but any attempt to use them with a literal-NULL fourth argument evaluated to (void )0, resulting in a compiler error. Remove these NULL tests to avoid leading future readers of this code to believe that this has a chance of working. Also clean up related legacy code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr(). The new coding standard is that any code which calls a getattr-type function or macro which takes an isnull argument MUST pass a valid boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro Herrera.	2010-01-10 04:26:36 +00:00
Simon Riggs	42edbd16fb	During Hot Standby, set DatabasePath correctly during relcache init file deletion, so that we attempt to unlink the correct filepath. unlink() errors are ignorable there, so lack of a DatabasePath initialization step did not cause visible problems until a related bug showed up on Solaris. Code refactored from xact_redo_commit() to ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay shared invalidation messages for many databases, so we cannot SetDatabasePath() once as we do in normal backends. Read the databaseid from the shared invalidation messages, then set DatabasePath temporarily before calling RelationCacheInitFileInvalidate(). Problem report by Robert Treat, analysis and fix by me.	2010-01-09 16:49:27 +00:00
Itagaki Takahiro	83a5a338bf	pgBufferUsage needs PGDLLIMPORT for pg_stat_statements on Windows.	2010-01-08 00:48:56 +00:00
Tom Lane	50626efe0a	Fix 3-parameter form of bit substring() to throw error for negative length, as required by SQL standard.	2010-01-07 20:17:44 +00:00
Tom Lane	901be0fad4	Remove all the special-case code for INT64_IS_BUSTED, per decision that we're not going to support that anymore. I did keep the 64-bit-CRC-with-32-bit-arithmetic code, since it has a performance excuse to live. It's a bit moot since that's all ifdef'd out, of course.	2010-01-07 04:53:35 +00:00
Robert Haas	814c8a03ba	Further fixes for per-tablespace options patch. Add missing varlena header to TableSpaceOpts structure. And, per Tom Lane, instead of calling tablespace_reloptions in CacheMemoryContext, call it in the caller's memory context and copy the value over afterwards, to reduce the chances of a session-lifetime memory leak.	2010-01-07 03:53:08 +00:00
Tom Lane	d15cb38dec	Alter the configure script to fail immediately if the C compiler does not provide a working 64-bit integer datatype. As recently noted, we've been broken on such platforms since early in the 8.4 development cycle. Since it took nearly two years for anyone to even notice, it seems that the rationale for continuing to support such platforms has reached the point of non-existence. Rather than thrashing around to try to make it work again, we'll just admit up front that this no longer works. Back-patch to 8.4 since that branch is also broken. We should go around to remove INT64_IS_BUSTED support, but just in HEAD, so that seems like material for a separate commit.	2010-01-07 00:25:05 +00:00
Itagaki Takahiro	946cf229e8	Support rewritten-based full vacuum as VACUUM FULL. Traditional VACUUM FULL was renamed to VACUUM FULL INPLACE. Also added a new option -i, --inplace for vacuumdb to perform FULL INPLACE vacuuming. Since the new VACUUM FULL uses CLUSTER infrastructure, we cannot use it for system tables. VACUUM FULL for system tables always fall back into VACUUM FULL INPLACE silently. Itagaki Takahiro, reviewed by Jeff Davis and Simon Riggs.	2010-01-06 05:31:14 +00:00
Bruce Momjian	28f6cab61a	binary upgrade: Preserve relfilenodes for views and composite types --- even though we don't store data in, them, they do consume relfilenodes. Bump catalog version.	2010-01-06 05:18:18 +00:00
Bruce Momjian	2391396b3d	Update catalog version for recent relfilenode patch, so pg_migrator can identify the new API.	2010-01-06 03:07:24 +00:00
Bruce Momjian	f98fbc78c3	Preserve relfilenodes: Add support to pg_dump --binary-upgrade to preserve all relfilenodes, for use by pg_migrator.	2010-01-06 03:04:03 +00:00
Bruce Momjian	8cdb85b512	Remove tabs in SGML. Move OIDCHARS to proper include file.	2010-01-06 02:41:37 +00:00
Tom Lane	90f4c2d960	Add support for doing FULL JOIN ON FALSE. While this is really a rather peculiar variant of UNION ALL, and so wouldn't likely get written directly as-is, it's possible for it to arise as a result of simplification of less-obviously-silly queries. In particular, now that we can do flattening of subqueries that have constant outputs and are underneath an outer join, it's possible for the case to result from simplification of queries of the type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality regression for this type of query.	2010-01-05 23:25:36 +00:00
Robert Haas	d86d51a958	Support ALTER TABLESPACE name SET/RESET ( tablespace_options ). This patch only supports seq_page_cost and random_page_cost as parameters, but it provides the infrastructure to scalably support many more. In particular, we may want to add support for effective_io_concurrency, but I'm leaving that as future work for now. Thanks to Tom Lane for design help and Alvaro Herrera for the review.	2010-01-05 21:54:00 +00:00
Magnus Hagander	ce92f8b463	Use _mm_pause() for win64 spin_delay(), per note from Tsutomu Yamada.	2010-01-05 11:06:28 +00:00
Tom Lane	64737e9313	Get rid of the need for manual maintenance of the initial contents of pg_attribute, by having genbki.pl derive the information from the various catalog header files. This greatly simplifies modification of the "bootstrapped" catalogs. This patch finally kills genbki.sh and Gen_fmgrtab.sh; we now rely entirely on Perl scripts for those build steps. To avoid creating a Perl build dependency where there was not one before, the output files generated by these scripts are now treated as distprep targets, ie, they will be built and shipped in tarballs. But you will need a reasonably modern Perl (probably at least 5.6) if you want to build from a CVS pull. The changes to the MSVC build process are untested, and may well break --- we'll soon find out from the buildfarm. John Naylor, based on ideas from Robert Haas and others	2010-01-05 01:06:57 +00:00
Magnus Hagander	305e85b098	Add a Win64-specific spin_delay() function. We can't use the same as before, since MSVC on Win64 doesn't support inline assembly.	2010-01-04 17:10:24 +00:00
Heikki Linnakangas	06f82b2961	Write an end-of-backup WAL record at pg_stop_backup(), and wait for it at recovery instead of reading the backup history file. This is more robust, as it stops you from prematurely starting up an inconsisten cluster if the backup history file is lost for some reason, or if the base backup was never finished with pg_stop_backup(). This also paves the way for a simpler streaming replication patch, which doesn't need to care about backup history files anymore. The backup history file is still created and archived as before, but it's not used by the system anymore. It's just for informational purposes now. Bump PG_CONTROL_VERSION as the location of the backup startpoint is now written to a new field in pg_control, and catversion because initdb is required Original patch by Fujii Masao per Simon's idea, with further fixes by me.	2010-01-04 12:50:50 +00:00
Tom Lane	40608e7f94	When estimating the selectivity of an inequality "column > constant" or "column < constant", and the comparison value is in the first or last histogram bin or outside the histogram entirely, try to fetch the actual column min or max value using an index scan (if there is an index on the column). If successful, replace the lower or upper histogram bound with that value before carrying on with the estimate. This limits the estimation error caused by moving min/max values when the comparison value is close to the min or max. Per a complaint from Josh Berkus. It is tempting to consider using this mechanism for mergejoinscansel as well, but that would inject index fetches into main-line join estimation not just endpoint cases. I'm refraining from that until we can get a better handle on the costs of doing this type of lookup.	2010-01-04 02:44:40 +00:00
Magnus Hagander	c3705d8cc5	Make ssize_t 64-bit on Win64, for compatibility with for example plpython.	2010-01-02 22:47:37 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Magnus Hagander	8491998d3d	Set proper sizes for size_t and void* on 64-bit Windows builds. Tsutomu Yamada	2010-01-02 13:56:37 +00:00
Magnus Hagander	2de9a463ff	Support 64-bit shared memory when building on 64-bit Windows. Tsutomu Yamada	2010-01-02 12:18:45 +00:00
Tom Lane	7839d35991	Add an "argisrow" field to NullTest nodes, following a plan made way back in 8.2beta but never carried out. This avoids repetitive tests of whether the argument is of scalar or composite type. Also, be a bit more paranoid about composite arguments in some places where we previously weren't checking.	2010-01-01 23:03:10 +00:00
Tom Lane	29c4ad9829	Support "x IS NOT NULL" clauses as indexscan conditions. This turns out to be just a minor extension of the previous patch that made "x IS NULL" indexable, because we can treat the IS NOT NULL condition as if it were "x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option), just like IS NULL is treated like "x = NULL". Aside from any possible usefulness in its own right, this is an important improvement for index-optimized MAX/MIN aggregates: it is now reliably possible to get a column's min or max value cheaply, even when there are a lot of nulls cluttering the interesting end of the index.	2010-01-01 21:53:49 +00:00
Tom Lane	85d02a6586	Redefine Datum as uintptr_t, instead of unsigned long. This is more in keeping with modern practice, and is a first step towards porting to Win64 (which has sizeof(pointer) > sizeof(long)). Tsutomu Yamada, Magnus Hagander, Tom Lane	2009-12-31 19:41:37 +00:00
Tom Lane	48c192c15e	Revise pgstat's tracking of tuple changes to improve the reliability of decisions about when to auto-analyze. The previous code depended on n_live_tuples + n_dead_tuples - last_anl_tuples, where all three of these numbers could be bad estimates from ANALYZE itself. Even worse, in the presence of a steady flow of HOT updates and matching HOT-tuple reclamations, auto-analyze might never trigger at all, even if all three numbers are exactly right, because n_dead_tuples could hold steady. To fix, replace last_anl_tuples with an accurately tracked count of the total number of committed tuple inserts + updates + deletes since the last ANALYZE on the table. This can still be compared to the same threshold as before, but it's much more trustworthy than the old computation. Tracking this requires one more intra-transaction counter per modified table within backends, but no additional memory space in the stats collector. There probably isn't any measurable speed difference; if anything it might be a bit faster than before, since I was able to eliminate some per-tuple arithmetic operations in favor of adding sums once per (sub)transaction. Also, simplify the logic around pgstat vacuum and analyze reporting messages by not trying to fold VACUUM ANALYZE into a single pgstat message. The original thought behind this patch was to allow scheduling of analyzes on parent tables by artificially inflating their changes_since_analyze count. I've left that for a separate patch since this change seems to stand on its own merit.	2009-12-30 20:32:14 +00:00
Tom Lane	540e69a061	Add an index on pg_inherits.inhparent, and use it to avoid seqscans in find_inheritance_children(). This is a complete no-op in databases without any inheritance. In databases where there are just a few entries in pg_inherits, it could conceivably be a small loss. However, in databases with many inheritance parents, it can be a big win.	2009-12-29 22:00:14 +00:00
Tom Lane	649b5ec7c8	Add the ability to store inheritance-tree statistics in pg_statistic, and teach ANALYZE to compute such stats for tables that have subclasses. Per my proposal of yesterday. autovacuum still needs to be taught about running ANALYZE on parent tables when their subclasses change, but the feature is useful even without that.	2009-12-29 20:11:45 +00:00
Bruce Momjian	e5b457c2ac	Add backend and pg_dump code to allow preservation of pg_enum oids, for use in binary upgrades. Bump catalog version for detection by pg_migrator of new backend API.	2009-12-27 14:50:46 +00:00
Bruce Momjian	c44327afa4	Binary upgrade: Modify pg_dump --binary-upgrade and add backend support routines to support the preservation of pg_type oids when doing a binary upgrade. This allows user-defined composite types and arrays to be binary upgraded.	2009-12-24 22:09:24 +00:00
Tom Lane	d68e08d1fe	Allow the index name to be omitted in CREATE INDEX, causing the system to choose an index name the same as it would do for an unnamed index constraint. (My recent changes to the index naming logic have helped to ensure that this will be a reasonable choice.) Per a suggestion from Peter. A necessary side-effect is to promote CONCURRENTLY to type_func_name_keyword status, ie, it can't be a table/column/index name anymore unless quoted. This is not all bad, since we have heard more than once of people typing CREATE INDEX CONCURRENTLY ON foo (...) and getting a normal index build of an index named "concurrently", which was not what they wanted. Now this syntax will result in a concurrent build of an index with system-chosen name; which they can rename afterwards if they want something else.	2009-12-23 17:41:45 +00:00
Tom Lane	cfc5008a51	Adjust naming of indexes and their columns per recent discussion. Index expression columns are now named after the FigureColname result for their expressions, rather than always being "pg_expression_N". Digits are appended to this name if needed to make the column name unique within the index. (That happens for regular columns too, thus fixing the old problem that CREATE INDEX fooi ON foo (f1, f1) fails. Before exclusion indexes there was no real reason to do such a thing, but now maybe there is.) Default names for indexes and associated constraints now include the column names of all their columns, not only the first one as in previous practice. (Of course, this will be truncated as needed to fit in NAMEDATALEN. Also, pkey indexes retain the historical behavior of not naming specific columns at all.) An example of the results: regression=# create table foo (f1 int, f2 text, regression(# exclude (f1 with =, lower(f2) with =)); NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_f1_lower_exclusion" for table "foo" CREATE TABLE regression=# \d foo_f1_lower_exclusion Index "public.foo_f1_lower_exclusion" Column \| Type \| Definition --------+---------+------------ f1 \| integer \| f1 lower \| text \| lower(f2) btree, for table "public.foo"	2009-12-23 02:35:25 +00:00
Tom Lane	4fca795de4	Bump catversion to reflect the fact that HS patch changed pg_proc contents, and PG_CONTROL_VERSION to reflect the fact that it changed pg_control contents. (I see we did at least remember to change XLOG_PAGE_MAGIC for the WAL contents changes.)	2009-12-19 04:08:32 +00:00
Simon Riggs	efc16ea520	Allow read only connections during recovery, known as Hot Standby. Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.	2009-12-19 01:32:45 +00:00
Peter Eisentraut	d6de43099a	Don't unblock SIGQUIT in the SIGQUIT handler This was possibly linked to a deadlock-like situation in glibc syslog code invoked by the ereport call in quickdie(). In any case, a signal handler should not unblock its own signal unless there is a specific reason to.	2009-12-16 23:05:00 +00:00
Peter Eisentraut	b63b967a7e	If there is no sigdelset(), define it as a macro. This removes some duplicate code that recreated the identical workaround when the newer signal API is missing.	2009-12-16 22:55:34 +00:00
Peter Eisentraut	dd4cd55c15	Python 3 support in PL/Python Behaves more or less unchanged compared to Python 2, but the new language variant is called plpython3u. Documentation describing the naming scheme is included.	2009-12-15 22:59:55 +00:00
Tom Lane	a5495cd841	Add a hook to let loadable modules get control at ProcessUtility execution, and use it to extend contrib/pg_stat_statements to track utility commands. Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.	2009-12-15 20:04:49 +00:00
Tom Lane	34d26872ed	Support ORDER BY within aggregate function calls, at long last providing a non-kluge method for controlling the order in which values are fed to an aggregate function. At the same time eliminate the old implementation restriction that DISTINCT was only supported for single-argument aggregates. Possibly release-notable behavioral change: formerly, agg(DISTINCT x) dropped null values of x unconditionally. Now, it does so only if the agg transition function is strict; otherwise nulls are treated as DISTINCT normally would, ie, you get one copy. Andrew Gierth, reviewed by Hitoshi Harada	2009-12-15 17:57:48 +00:00
Robert Haas	cddca5ec13	Add an EXPLAIN (BUFFERS) option to show buffer-usage statistics. This patch also removes buffer-usage statistics from the track_counts output, since this (or the global server statistics) is deemed to be a better interface to this information. Itagaki Takahiro, reviewed by Euler Taveira de Oliveira.	2009-12-15 04:57:48 +00:00
Tom Lane	a620d5005d	Fix a bug introduced when set-returning SQL functions were made inline-able: we have to cope with the possibility that the declared result rowtype contains dropped columns. This fails in 8.4, as per bug #5240. While at it, be more paranoid about inserting binary coercions when inlining. The pre-8.4 code did not really need to worry about that because it could not inline at all in any case where an added coercion could change the behavior of the function's statement. However, when inlining a SRF we allow sorting, grouping, and set-ops such as UNION. In these cases, modifying one of the targetlist entries that the sort/group/setop depends on could conceivably change the behavior of the function's statement --- so don't inline when such a case applies.	2009-12-14 02:15:54 +00:00
Magnus Hagander	0182d6f646	Allow LDAP authentication to operate in search+bind mode, meaning it does a search for the user in the directory first, and then binds with the DN found for this user. This allows for LDAP logins in scenarios where the DN of the user cannot be determined simply by prefix and suffix, such as the case where different users are located in different containers. The old way of authentication can be significantly faster, so it's kept as an option. Robert Fleming and Magnus Hagander	2009-12-12 21:35:21 +00:00
Robert Haas	02490d4692	Export ExplainBeginOutput() and ExplainEndOutput() for auto_explain. Without these functions, anyone outside of explain.c can't actually use ExplainPrintPlan, because the ExplainState won't be initialized properly. The user-visible result of this was a crash when using auto_explain with the JSON output format. Report by Euler Taveira de Oliveira. Analysis by Tom Lane. Patch by me.	2009-12-12 00:35:34 +00:00
Itagaki Takahiro	f1325ce213	Add large object access control. A new system catalog pg_largeobject_metadata manages ownership and access privileges of large objects. KaiGai Kohei, reviewed by Jaime Casanova.	2009-12-11 03:34:57 +00:00
Andrew Dunstan	324385d67f	Add YAML to list of EXPLAIN formats. Greg Sabino Mullane, reviewed by Takahiro Itagaki.	2009-12-11 01:33:35 +00:00
Tom Lane	62aba76568	Prevent indirect security attacks via changing session-local state within an allegedly immutable index function. It was previously recognized that we had to prevent such a function from executing SET/RESET ROLE/SESSION AUTHORIZATION, or it could trivially obtain the privileges of the session user. However, since there is in general no privilege checking for changes of session-local state, it is also possible for such a function to change settings in a way that might subvert later operations in the same session. Examples include changing search_path to cause an unexpected function to be called, or replacing an existing prepared statement with another one that will execute a function of the attacker's choosing. The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against these threats, which are the same places previously deemed to need protection against the SET ROLE issue. GUC changes are still allowed, since there are many useful cases for that, but we prevent security problems by forcing a rollback of any GUC change after completing the operation. Other cases are handled by throwing an error if any change is attempted; these include temp table creation, closing a cursor, and creating or deleting a prepared statement. (In 7.4, the infrastructure to roll back GUC changes doesn't exist, so we settle for rejecting changes of "search_path" in these contexts.) Original report and patch by Gurjeet Singh, additional analysis by Tom Lane. Security: CVE-2009-4136	2009-12-09 21:57:51 +00:00
Tom Lane	0cb65564e5	Add exclusion constraints, which generalize the concept of uniqueness to support any indexable commutative operator, not just equality. Two rows violate the exclusion constraint if "row1.col OP row2.col" is TRUE for each of the columns in the constraint. Jeff Davis, reviewed by Robert Haas	2009-12-07 05:22:23 +00:00
Tom Lane	8de7472b45	Don't use a duplicate OID for aclexplode().	2009-12-06 02:55:54 +00:00
Peter Eisentraut	36f887c41c	Speed up information schema privilege views Instead of expensive cross joins to resolve the ACL, add table-returning function aclexplode() that expands the ACL into a useful form, and join against that. Also, implement the role__grants views as a thin layer over the respective _privileges views instead of essentially repeating the same code twice. fixes bug #4596 by Joachim Wieland, with cleanup by me	2009-12-05 21:43:36 +00:00
Heikki Linnakangas	ab3148b712	Fix bug in temporary file management with subtransactions. A cursor opened in a subtransaction stays open even if the subtransaction is aborted, so any temporary files related to it must stay alive as well. With the patch, we use ResourceOwners to track open temporary files and don't automatically close them at subtransaction end (though in the normal case temporary files are registered with the subtransaction resource owner and will therefore be closed). At end of top transaction, we still check that there's no temporary files marked as close-at-end-of-transaction open, but that's now just a debugging cross-check as the resource owner cleanup should've closed them already.	2009-12-03 11:03:29 +00:00
Tom Lane	0d32342501	Teach the regular expression functions to do case-insensitive matching and locale-dependent character classification properly when the database encoding is UTF8. The previous coding worked okay in single-byte encodings, or in any case for ASCII characters, but failed entirely on multibyte characters. The fix assumes that the <wctype.h> functions use Unicode code points as the wchar representation for Unicode, ie, wchar matches pg_wchar. This is only a partial solution, since we're still stupid about non-ASCII characters in multibyte encodings other than UTF8. The practical effect of that is limited, however, since those cases are generally Far Eastern glyphs for which concepts like case-folding don't apply anyway. Certainly all or nearly all of the field reports of problems have been about UTF8. A more general solution would require switching to the platform's wchar representation for all regex operations; which is possible but would have substantial disadvantages. Let's try this and see if it's sufficient in practice.	2009-12-01 21:00:24 +00:00
Bruce Momjian	ef51395e24	Revert due to Tom's concerns: Add ProcessUtility_hook() to handle all DDL to contrib/pg_stat_statements.	2009-12-01 02:31:13 +00:00
Bruce Momjian	d85cb27293	ProcessUtility_hook: Add ProcessUtility_hook() to handle all DDL to contrib/pg_stat_statements. Itagaki Takahiro	2009-12-01 01:08:46 +00:00
Tom Lane	0c61cff57a	Make pg_stat_activity.application_name visible to all users, rather than being hidden when current_query is. Relocate it to a column position more consistent with that behavior. Per discussion.	2009-11-29 18:14:32 +00:00
Tom Lane	42b2907d12	Add support for anonymous code blocks (DO blocks) to PL/Perl. Joshua Tolley, reviewed by Brendan Jurd and Tim Bunce	2009-11-29 03:02:27 +00:00
Tom Lane	8217cfbd99	Add support for an application_name parameter, which is displayed in pg_stat_activity and recorded in log entries. Dave Page, reviewed by Andres Freund	2009-11-28 23:38:08 +00:00
Tom Lane	1a95f12702	Eliminate a lot of list-management overhead within join_search_one_level by adding a requirement that build_join_rel add new join RelOptInfos to the appropriate list immediately at creation. Per report from Robert Haas, the list_concat_unique_ptr() calls that this change eliminates were taking the lion's share of the runtime in larger join problems. This doesn't do anything to fix the fundamental combinatorial explosion in large join problems, but it should push out the threshold of pain a bit further. Note: because this changes the order in which joinrel lists are built, it might result in changes in selected plans in cases where different alternatives have exactly the same costs. There is one example in the regression tests.	2009-11-28 00:46:19 +00:00
Heikki Linnakangas	cd87b6f8a5	Fix an old bug in multixact and two-phase commit. Prepared transactions can be part of multixacts, so allocate a slot for each prepared transaction in the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer the oldest member value from the current backends slot to the prepared xact slot. Also save and recover the value from the 2pc state file. The symptom of the bug was that after a transaction prepared, a shared lock still held by the prepared transaction was sometimes ignored by other transactions. Fix back to 8.1, where both 2PC and multixact were introduced.	2009-11-23 09:58:36 +00:00
Tom Lane	7fc0f06221	Add a WHEN clause to CREATE TRIGGER, allowing a boolean expression to be checked to determine whether the trigger should be fired. For BEFORE triggers this is mostly a matter of spec compliance; but for AFTER triggers it can provide a noticeable performance improvement, since queuing of a deferred trigger event and re-fetching of the row(s) at end of statement can be short-circuited if the trigger does not need to be fired. Takahiro Itagaki, reviewed by KaiGai Kohei.	2009-11-20 20:38:12 +00:00
Tom Lane	c742b795dd	Add a hook to CREATE/ALTER ROLE to allow an external module to check the strength of database passwords, and create a sample implementation of such a hook as a new contrib module "passwordcheck". Laurenz Albe, reviewed by Takahiro Itagaki	2009-11-18 21:57:56 +00:00
Tom Lane	5e66a51c2e	Provide a parenthesized-options syntax for VACUUM, analogous to that recently adopted for EXPLAIN. This will allow additional options to be implemented in future without having to make them fully-reserved keywords. The old syntax remains available for existing options, however. Itagaki Takahiro	2009-11-16 21:32:07 +00:00
Tom Lane	caf9c830d9	Improve planning of Materialize nodes inserted atop the inner input of a mergejoin to shield it from doing mark/restore and refetches. Put an explicit flag in MergePath so we can centralize the logic that knows about this, and add costing logic that considers using Materialize even when it's not forced by the previously-existing considerations. This is in response to a discussion back in August that suggested that materializing an inner indexscan can be helpful when the refetch percentage is high enough.	2009-11-15 02:45:35 +00:00
Magnus Hagander	da8d684d39	Add inheritable ACE when creating a restricted token for execution on Win32. Also refactor the code around it to be more clear. Jesse Morris	2009-11-14 15:39:36 +00:00
Tom Lane	82121aff12	Avoid assuming that enum CreateStmtLikeOption is unsigned. Zdenek Kotala	2009-11-13 23:44:19 +00:00
Tom Lane	19d802767d	Remove pg_parse_string_token() --- not needed anymore.	2009-11-12 01:13:12 +00:00
Alvaro Herrera	e7ec022266	Fix longstanding problems in VACUUM caused by untimely interruptions In VACUUM FULL, an interrupt after the initial transaction has been recorded as committed can cause postmaster to restart with the following error message: PANIC: cannot abort transaction NNNN, it was already committed This problem has been reported many times. In lazy VACUUM, an interrupt after the table has been truncated by lazy_truncate_heap causes other backends' relcache to still point to the removed pages; this can cause future INSERT and UPDATE queries to error out with the following error message: could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes The window to this race condition is extremely narrow, but it has been seen in the wild involving a cancelled autovacuum process. The solution for both problems is to inhibit interrupts in both operations until after the respective transactions have been committed. It's not a complete solution, because the transaction could theoretically be aborted by some other error, but at least fixes the most common causes of both problems.	2009-11-10 18:00:06 +00:00
Tom Lane	10bcfa189b	Re-refactor the core scanner's API, in order to get out from under the problem of different parsers having different YYSTYPE unions that they want to use with it. I defined a new union core_YYSTYPE that is just the (very short) list of semantic values returned by the core scanner. I had originally worried that this would require an extra interface layer, but actually we can have parser.c's base_yylex (formerly filtered_base_yylex) take care of that at no extra cost. Names associated with the core scanner are now "core_yy_foo", with "base_yy_foo" being used in the core Bison parser and the parser.c interface layer. This solves the last serious stumbling block to eliminating plpgsql's separate lexer. One restriction that will still be present is that plpgsql and the core will have to agree on the token numbers assigned to tokens that can be returned by the core lexer. Since Bison doesn't seem willing to accept external assignments of those numbers, we'll have to live with decreeing that core and plpgsql grammars declare these tokens first and in the same order.	2009-11-09 18:38:48 +00:00
Andrew Dunstan	b79f49c780	Keep track of language's trusted flag in InlineCodeBlock. Needed to support DO blocks for languages that have both trusted and untrusted variants.	2009-11-06 21:57:57 +00:00
Tom Lane	593f4b854a	Don't treat NEW and OLD as reserved words anymore. For the purposes of rules it works just as well to have them be ordinary identifiers, and this gets rid of a number of ugly special cases. Plus we aren't interfering with non-rule usage of these names. catversion bump because the names change internally in stored rules.	2009-11-05 23:24:27 +00:00
Tom Lane	6bef82b38a	Rename some encoding conversion modules to keep pathnames in our source tarballs under 100 characters. This should avoid failures with certain untarring tools (WinZip and Midnight Commander have been mentioned as likely suspects). Per my proposal of yesterday. catversion bumped since the initial contents of pg_proc change.	2009-11-04 23:47:04 +00:00
Tom Lane	9bedd128d6	Add support for invoking parser callback hooks via SPI and in cached plans. As proof of concept, modify plpgsql to use the hooks. plpgsql is still inserting $n symbols textually, but the "back end" of the parsing process now goes through the ParamRef hook instead of using a fixed parameter-type array, and then execution only fetches actually-referenced parameters, using a hook added to ParamListInfo. Although there's a lot left to be done in plpgsql, this already cures the "if (TG_OP = 'INSERT' and NEW.foo ...)" problem, as illustrated by the changed regression test.	2009-11-04 22:26:08 +00:00
Tom Lane	7d535ebe5b	Dept of second thoughts: after studying index_getnext() a bit more I realize that it can scribble on scan->xs_ctup.t_self while following HOT chains, so we can't rely on that to stay valid between hashgettuple() calls. Introduce a private variable in HashScanOpaque, instead.	2009-11-01 22:30:54 +00:00
Tom Lane	c4afdca4c2	Fix two serious bugs introduced into hash indexes by the 8.4 patch that made hash indexes keep entries sorted by hash value. First, the original plans for concurrency assumed that insertions would happen only at the end of a page, which is no longer true; this could cause scans to transiently fail to find index entries in the presence of concurrent insertions. We can compensate by teaching scans to re-find their position after re-acquiring read locks. Second, neither the bucket split nor the bucket compaction logic had been fixed to preserve hashvalue ordering, so application of either of those processes could lead to permanent corruption of an index, in the sense that searches might fail to find entries that are present. This patch fixes the split and compaction logic to preserve hashvalue ordering, but it cannot do anything about pre-existing corruption. We will need to recommend reindexing all hash indexes in the 8.4.2 release notes. To buy back the performance loss hereby induced in split and compaction, fix them to use PageIndexMultiDelete instead of retail PageIndexDelete operations. We might later want to do something with qsort'ing the page contents rather than doing a binary search for each insertion, but that seemed more invasive than I cared to risk in a back-patch. Per bug #5157 from Jeff Janes and subsequent investigation.	2009-11-01 21:25:25 +00:00
Tom Lane	fb5d05805b	Implement parser hooks for processing ColumnRef and ParamRef nodes, as per my recent proposal. As proof of concept, remove knowledge of Params from the core parser, arranging for them to be handled entirely by parser hook functions. It turns out we need an additional hook for that --- I had forgotten about the code that handles inferring a parameter's type from context. This is a preliminary step towards letting plpgsql handle its variables through parser hooks. Additional work remains to be done to expose the facility through SPI, but I think this is all the changes needed in the core parser.	2009-10-31 01:41:31 +00:00
Tom Lane	cbcd1701f1	Fix AcquireRewriteLocks to be sure that it acquires the right lock strength when FOR UPDATE is propagated down into a sub-select expanded from a view. Similar bug to parser's isLockedRel issue that I fixed yesterday; likewise seems not quite worth the effort to back-patch.	2009-10-28 17:36:50 +00:00
Tom Lane	46e3a16b05	When FOR UPDATE/SHARE is used with LIMIT, put the LockRows plan node underneath the Limit node, not atop it. This fixes the old problem that such a query might unexpectedly return fewer rows than the LIMIT says, due to LockRows discarding updated rows. There is a related problem that LockRows might destroy the sort ordering produced by earlier steps; but fixing that by pushing LockRows below Sort would create serious performance problems that are unjustified in many real-world applications, as well as potential deadlock problems from locking many more rows than expected. Instead, keep the present semantics of applying FOR UPDATE after ORDER BY within a single query level; but allow the user to specify the other way by writing FOR UPDATE in a sub-select. To make that work, track whether FOR UPDATE appeared explicitly in sub-selects or got pushed down from the parent, and don't flatten a sub-select that contained an explicit FOR UPDATE.	2009-10-28 14:55:47 +00:00
Tom Lane	61e5328208	Make FOR UPDATE/SHARE in the primary query not propagate into WITH queries; for example in WITH w AS (SELECT * FROM foo) SELECT * FROM w, bar ... FOR UPDATE the FOR UPDATE will now affect bar but not foo. This is more useful and consistent than the original 8.4 behavior, which tried to propagate FOR UPDATE into the WITH query but always failed due to assorted implementation restrictions. Even though we are in process of removing those restrictions, it seems correct on philosophical grounds to not let the outer query's FOR UPDATE affect the WITH query. In passing, fix isLockedRel which frequently got things wrong in nested-subquery cases: "FOR UPDATE OF foo" applies to an alias foo in the current query level, not subqueries. This has been broken for a long time, but it doesn't seem worth back-patching further than 8.4 because the actual consequences are minimal. At worst the parser would sometimes get RowShareLock on a relation when it should be AccessShareLock or vice versa. That would only make a difference if someone were using ExclusiveLock concurrently, which no standard operation does, and anyway FOR UPDATE doesn't result in visible changes so it's not clear that the someone would notice any problem. Between that and the fact that FOR UPDATE barely works with subqueries at all in existing releases, I'm not excited about worrying about it.	2009-10-27 17:11:18 +00:00
Heikki Linnakangas	2078e384a3	Fix range check in date_recv that tried to limit accepted values to only those accepted by date_in(). I confused julian day numbers and number of days since the postgres epoch 2000-01-01 in the original patch. I just noticed that it's still easy to get such out-of-range values into the database using to_date or +- operators, but this patch doesn't do anything about those functions. Per report from James Pye.	2009-10-26 16:13:11 +00:00
Tom Lane	9f2ee8f287	Re-implement EvalPlanQual processing to improve its performance and eliminate a lot of strange behaviors that occurred in join cases. We now identify the "current" row for every joined relation in UPDATE, DELETE, and SELECT FOR UPDATE/SHARE queries. If an EvalPlanQual recheck is necessary, we jam the appropriate row into each scan node in the rechecking plan, forcing it to emit only that one row. The former behavior could rescan the whole of each joined relation for each recheck, which was terrible for performance, and what's much worse could result in duplicated output tuples. Also, the original implementation of EvalPlanQual could not re-use the recheck execution tree --- it had to go through a full executor init and shutdown for every row to be tested. To avoid this overhead, I've associated a special runtime Param with each LockRows or ModifyTable plan node, and arranged to make every scan node below such a node depend on that Param. Thus, by signaling a change in that Param, the EPQ machinery can just rescan the already-built test plan. This patch also adds a prohibition on set-returning functions in the targetlist of SELECT FOR UPDATE/SHARE. This is needed to avoid the duplicate-output-tuple problem. It seems fairly reasonable since the other restrictions on SELECT FOR UPDATE are meant to ensure that there is a unique correspondence between source tuples and result tuples, which an output SRF destroys as much as anything else does.	2009-10-26 02:26:45 +00:00
Tom Lane	ab61df9e52	Remove regex_flavor GUC, so that regular expressions are always "advanced" style by default. Per discussion, there seems to be hardly anything that really relies on being able to change the regex flavor, so the ability to select it via embedded options ought to be enough for any stragglers. Also, if we didn't remove the GUC, we'd really be morally obligated to mark the regex functions non-immutable, which'd possibly create performance issues.	2009-10-21 20:38:58 +00:00
Tom Lane	289e2905c8	Remove add_missing_from GUC and associated parser support for "implicit RTEs". Per recent discussion, add_missing_from has been deprecated for long enough to consider removing, and it's getting in the way of planned parser refactoring. The system now always behaves as though add_missing_from were OFF.	2009-10-21 20:22:38 +00:00
Magnus Hagander	748771379b	Write to the Windows eventlog in UTF16, converting the message encoding as necessary. Itagaki Takahiro with some changes from me	2009-10-17 00:24:51 +00:00
Tom Lane	b2734a0d79	Support SQL-compliant triggers on columns, ie fire only if certain columns are named in the UPDATE's SET list. Note: the schema of pg_trigger has not actually changed; we've just started to use a column that was there all along. catversion bumped anyway so that this commit is included in the history of potentially interesting changes to system catalog contents. Itagaki Takahiro	2009-10-14 22:14:25 +00:00
Alvaro Herrera	201e5b282b	Add new PGC_S_DATABASE_USER enum value to several places missed by my patch last week. Per note and patch from Jeff Davis.	2009-10-13 14:18:40 +00:00
Tom Lane	8d54c2482b	Code review for LIKE INCLUDING patch --- clean up some cosmetic and not so cosmetic stuff.	2009-10-13 00:53:08 +00:00
Tom Lane	11ca04b4b7	Support GRANT/REVOKE ON ALL TABLES/SEQUENCES/FUNCTIONS IN SCHEMA. Petr Jelinek	2009-10-12 20:39:42 +00:00
Andrew Dunstan	faa1afc6c1	CREATE LIKE INCLUDING COMMENTS and STORAGE, and INCLUDING ALL shortcut. Itagaki Takahiro.	2009-10-12 19:49:24 +00:00
Tom Lane	0adaf4cb31	Move the handling of SELECT FOR UPDATE locking and rechecking out of execMain.c and into a new plan node type LockRows. Like the recent change to put table updating into a ModifyTable plan node, this increases planning flexibility by allowing the operations to occur below the top level of the plan tree. It's necessary in any case to restore the previous behavior of having FOR UPDATE locking occur before ModifyTable does. This partially refactors EvalPlanQual to allow multiple rows-under-test to be inserted into the EPQ machinery before starting an EPQ test query. That isn't sufficient to fix EPQ's general bogosity in the face of plans that return multiple rows per test row, though. Since this patch is mostly about getting some plan node infrastructure in place and not about fixing ten-year-old bugs, I will leave EPQ improvements for another day. Another behavioral change that we could now think about is doing FOR UPDATE before LIMIT, but that too seems like it should be treated as a followon patch.	2009-10-12 18:10:51 +00:00
Tom Lane	8a5849b7ff	Split the processing of INSERT/UPDATE/DELETE operations out of execMain.c. They are now handled by a new plan node type called ModifyTable, which is placed at the top of the plan tree. In itself this change doesn't do much, except perhaps make the handling of RETURNING lists and inherited UPDATEs a tad less klugy. But it is necessary preparation for the intended extension of allowing RETURNING queries inside WITH. Marko Tiikkaja	2009-10-10 01:43:50 +00:00
Peter Eisentraut	b865d27582	Use pg_get_triggerdef in pg_dump Add a variant of pg_get_triggerdef with a second argument "pretty" that causes the output to be formatted in the way pg_dump used to do. Use this variant in pg_dump with server versions >= 8.5. This insulates pg_dump from most future trigger feature additions, such as the upcoming column triggers patch. Author: Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>	2009-10-09 21:02:56 +00:00
Tom Lane	c970292a94	Remove very ancient tuple-counting infrastructure (IncrRetrieved() and friends). This code has all been ifdef'd out for many years, and doesn't seem to have any prospect of becoming any more useful in the future. EXPLAIN ANALYZE is what people use in practice, and I think if we did want process-wide counters we'd be more likely to put in dtrace events for that than try to resurrect this code. Get rid of it so as to have one less detail to worry about while refactoring execMain.c.	2009-10-08 22:34:57 +00:00
Tom Lane	717fa274d1	Support use of function argument names to identify which actual arguments match which function parameters. The syntax uses AS, for example funcname(value AS arg1, anothervalue AS arg2) Pavel Stehule	2009-10-08 02:39:25 +00:00
Alvaro Herrera	2eda8dfb52	Make it possibly to specify GUC params per user and per database. Create a new catalog pg_db_role_setting where they are now stored, and better encapsulate the code that deals with settings into its realm. The old datconfig and rolconfig columns are removed. psql has gained a \drds command to display the settings. Backwards compatibility warning: while the backwards-compatible system views still have the config columns, they no longer completely represent the configuration for a user or database. Catalog version bumped.	2009-10-07 22:14:26 +00:00
Alvaro Herrera	07cefdfb7a	Fix snapshot management, take two. Partially revert the previous patch I installed and replace it with a more general fix: any time a snapshot is pushed as Active, we need to ensure that it will not be modified in the future. This means that if the same snapshot is used as CurrentSnapshot, it needs to be copied separately. This affects serializable transactions only, because CurrentSnapshot has already been copied by RegisterSnapshot and so PushActiveSnapshot does not think it needs another copy. However, CommandCounterIncrement would modify CurrentSnapshot, whereas ActiveSnapshots must not have their command counters incremented. I say "partially" because the regression test I added for the previous bug has been kept. (This restores 8.3 behavior, because before snapmgr.c existed, any snapshot set as Active was copied.) Per bug report from Stuart Bishop in 6bc73d4c0910042358k3d1adff3qa36f8df75198ecea@mail.gmail.com	2009-10-07 16:27:18 +00:00
Tom Lane	e0c433c4a3	Change CREATE TABLE so that column default expressions coming from different inheritance parent tables are compared using equal(), instead of doing strcmp() on the nodeToString representation. The old implementation was always a tad cheesy, and it finally fails completely as of 8.4, now that the node tree might contain syntax location information. equal() knows it's supposed to ignore those fields, but strcmp() hardly can. Per recent report from Scott Ribe.	2009-10-06 00:55:26 +00:00
Tom Lane	249724cb01	Create an ALTER DEFAULT PRIVILEGES command, which allows users to adjust the privileges that will be applied to subsequently-created objects. Such adjustments are always per owning role, and can be restricted to objects created in particular schemas too. A notable benefit is that users can override the traditional default privilege settings, eg, the PUBLIC EXECUTE privilege traditionally granted by default for functions. Petr Jelinek	2009-10-05 19:24:49 +00:00
Tom Lane	54d60bbd07	Fix a couple of issues in recent patch to print updates to postgresql.conf settings: avoid calling superuser() in contexts where it's not defined, don't leak the transient copies of GetConfigOption output, and avoid the whole exercise in postmaster child processes. I found that actually no current caller of GetConfigOption has any use for its internal check of GUC_SUPERUSER_ONLY. But rather than just remove that entirely, it seemed better to add a parameter indicating whether to enforce the check. Per report from Simon and subsequent testing.	2009-10-03 18:04:57 +00:00
Tom Lane	e66d714386	Make sure that GIN fast-insert and regular code paths enforce the same tuple size limit. Improve the error message for index-tuple-too-large so that it includes the actual size, the limit, and the index name. Sync with the btree occurrences of the same error. Back-patch to 8.4 because it appears that the out-of-sync problem is occurring in the field. Teodor and Tom	2009-10-02 21:14:04 +00:00
Alvaro Herrera	caa4cfa369	Ensure that a cursor has an immutable snapshot throughout its lifespan. The old coding was using a regular snapshot, referenced elsewhere, that was subject to having its command counter updated. Fix by creating a private copy of the snapshot exclusively for the cursor. Backpatch to 8.4, which is when the bug was introduced during the snapshot management rewrite.	2009-10-02 17:57:30 +00:00
Tom Lane	f3aec2c7f5	Support "samehost" and "samenet" specifications in pg_hba.conf, by enumerating the machine's IP interfaces to look for a match. Stef Walter	2009-10-01 01:58:58 +00:00
Tom Lane	421d7d8edb	Remove no-longer-needed ExecCountSlots infrastructure.	2009-09-27 21:10:53 +00:00
Tom Lane	f92e8a4b5e	Replace the array-style TupleTable data structure with a simple List of TupleTableSlot nodes. This eliminates the need to count in advance how many Slots will be needed, which seems more than worth the small increase in the amount of palloc traffic during executor startup. The ExecCountSlots infrastructure is now all dead code, but I'll remove it in a separate commit for clarity. Per a comment from Robert Haas.	2009-09-27 20:09:58 +00:00
Tom Lane	12d8fae4cd	Simplify the bootstrap (BKI) code by getting rid of a useless table of all the strings seen during the bootstrap run. There might have been some actual point to doing that, many years ago, but as far as I can see the only value now is to conserve a bit of memory. Even if we cared about wasting a megabyte or so during the initdb run, it'd be far more effective to arrange to release memory at the end of each BKI command, instead of intentionally hanging onto strings that might never be used again. Not maintaining the table probably makes it faster too; but the main point of this patch is to get rid of a couple hundred lines of unnecessary and rather crufty code.	2009-09-27 01:32:11 +00:00
Tom Lane	4985635230	Extend the BKI infrastructure to allow system catalogs to be given hand-assigned rowtype OIDs, even when they are not "bootstrapped" catalogs that have handmade type rows in pg_type.h. Give pg_database such an OID. Restore the availability of C macros for the rowtype OIDs of the bootstrapped catalogs. (These macros are now in the individual catalogs' .h files, though, not in pg_type.h.) This commit doesn't do anything especially useful by itself, but it's necessary infrastructure for reverting some ill-considered changes in relcache.c.	2009-09-26 22:42:03 +00:00
Peter Eisentraut	c2bb0378cf	Unicode escapes in E'...' strings Author: Marko Kreen <markokr@gmail.com>	2009-09-22 23:52:53 +00:00
Tom Lane	9048b73184	Implement the DO statement to support execution of PL code without having to create a function for it. Procedural languages now have an additional entry point, namely a function to execute an inline code block. This seemed a better design than trying to hide the transient-ness of the code from the PL. As of this patch, only plpgsql has an inline handler, but probably people will soon write handlers for the other standard PLs. In passing, remove the long-dead LANCOMPILER option of CREATE LANGUAGE. Petr Jelinek	2009-09-22 23:43:43 +00:00
Tom Lane	488d70ab46	Implement "join removal" for cases where the inner side of a left join is unique and is not referenced above the join. In this case the inner side doesn't affect the query result and can be thrown away entirely. Although perhaps nobody would ever write such a thing by hand, it's a reasonably common case in machine-generated SQL. The current implementation only recognizes the case where the inner side is a simple relation with a unique index matching the query conditions. This is enough for the use-cases that have been shown so far, but we might want to try to handle other cases later. Robert Haas, somewhat rewritten by Tom	2009-09-17 20:49:29 +00:00
Tom Lane	e97281c46c	Write psql's ~/.psql_history file using history_truncate_file() and append_history(), if libreadline is new enough to have those functions (they seem to be present at least since 4.2; but libedit may not have them). This gives significantly saner behavior when two or more sessions overlap in their use of the history file; although having two sessions exit at just the same time is still perilous to your history. The behavior of \s remains unchanged, ie, overwrite whatever was there. Per bug #5052 from Marek Wójtowicz.	2009-09-13 22:18:22 +00:00
Tom Lane	9bb342811b	Rewrite the planner's handling of materialized plan types so that there is an explicit model of rescan costs being different from first-time costs. The costing of Material nodes in particular now has some visible relationship to the actual runtime behavior, where before it was essentially fantasy. This also fixes up a couple of places where different materialized plan types were treated differently for no very good reason (probably just oversights). A couple of the regression tests are affected, because the planner now chooses to put the other relation on the inside of a nestloop-with-materialize. So far as I can see both changes are sane, and the planner is now more consistently following the expectation that it should prefer to materialize the smaller of two relations. Per a recent discussion with Robert Haas.	2009-09-12 22:12:09 +00:00
Peter Eisentraut	3ab8b7fa6f	Fix/improve bytea and boolean support in PL/Python Before, PL/Python converted data between SQL and Python by going through a C string representation. This broke for bytea in two ways: - On input (function parameters), you would get a Python string that contains bytea's particular external representation with backslashes etc., instead of a sequence of bytes, which is what you would expect in a Python environment. This problem is exacerbated by the new bytea output format. - On output (function return value), null bytes in the Python string would cause truncation before the data gets stored into a bytea datum. This is now fixed by converting directly between the PostgreSQL datum and the Python representation. The required generalized infrastructure also allows for other improvements in passing: - When returning a boolean value, the SQL datum is now true if and only if Python considers the value that was passed out of the PL/Python function to be true. Previously, this determination was left to the boolean data type input function. So, now returning 'foo' results in true, because Python considers it true, rather than false because PostgreSQL considers it false. - On input, we can convert the integer and float types directly to their Python equivalents without having to go through an intermediate string representation. original patch by Caleb Welton, with updates by myself	2009-09-09 19:00:09 +00:00
Tom Lane	255f66efa9	Fix bug with WITH RECURSIVE immediately inside WITH RECURSIVE. 99% of the code was already okay with this, but the hack that obtained the output column types of a recursive union in advance of doing real parse analysis of the recursive union forgot to handle the case where there was an inner WITH clause available to the non-recursive term. Best fix seems to be to refactor so that we don't need the "throwaway" parse analysis step at all. Instead, teach the transformSetOperationStmt code to set up the CTE's output column information after it's processed the non-recursive term normally. Per report from David Fetter.	2009-09-09 03:32:52 +00:00
Tom Lane	eeb6cb143a	Add a boolean GUC parameter "bonjour" to control whether a Bonjour-enabled build actually attempts to advertise itself via Bonjour. Formerly it always did so, which meant that packagers had to decide for their users whether this behavior was wanted or not. The default is "off" to be on the safe side, though this represents a change in the default behavior of a Bonjour-enabled build. Per discussion.	2009-09-08 17:08:36 +00:00
Magnus Hagander	9f0e84a65d	Change our WIN32 API version to be 5.01 (Windows XP), to bring in the proper IPV6 headers in newer SDKs.	2009-09-07 11:22:12 +00:00
Heikki Linnakangas	7be39bb0be	Tigthen binary receive functions so that they reject values that the text input functions don't accept either. While the backend can handle such values fine, they can cause trouble in clients and in pg_dump/restore. This is followup to the original issue on time datatype reported by Andrew McNamara a while ago. Like that one, none of these seem worth back-patching.	2009-09-04 11:20:23 +00:00
Tom Lane	187e5d8981	Disallow RESET ROLE and RESET SESSION AUTHORIZATION inside security-definer functions. This extends the previous patch that forbade SETting these variables inside security-definer functions. RESET is equally a security hole, since it would allow regaining privileges of the caller; furthermore it can trigger Assert failures and perhaps other internal errors, since the code is not expecting these variables to change in such contexts. The previous patch did not cover this case because assign hooks don't really have enough information, so move the responsibility for preventing this into guc.c. Problem discovered by Heikki Linnakangas. Security: no CVE assigned yet, extends CVE-2007-6600	2009-09-03 22:08:05 +00:00
Tom Lane	57c9dff9d1	Fix subquery pullup to wrap a PlaceHolderVar around the entire RowExpr that's generated for a whole-row Var referencing the subquery, when the subquery is in the nullable side of an outer join. The previous coding instead put PlaceHolderVars around the elements of the RowExpr. The effect was that when the outer join made the subquery outputs go to null, the whole-row Var produced ROW(NULL,NULL,...) rather than just NULL. There are arguments afoot about whether those things ought to be semantically indistinguishable, but for the moment they are not entirely so, and the planner needs to take care that its machinations preserve the difference. Per bug #5025. Making this feasible required refactoring ResolveNew() to allow more caller control over what is substituted for a Var. I chose to make ResolveNew() a wrapper around a new general-purpose function replace_rte_variables(). I also fixed the ancient bogosity that ResolveNew might fail to set a query's hasSubLinks field after inserting a SubLink in it. Although all current callers make sure that happens anyway, we've had bugs of that sort before, and it seemed like a good time to install a proper solution. Back-patch to 8.4. The problem can be demonstrated clear back to 8.0, but the fix would be too invasive in earlier branches; not to mention that people may be depending on the subtly-incorrect behavior. The 8.4 series is new enough that fixing this probably won't cause complaints, but it might in older branches. Also, 8.4 shows the incorrect behavior in more cases than older branches do, because it is able to flatten subqueries in more cases.	2009-09-02 17:52:24 +00:00
Tom Lane	794e3e81a0	Force VACUUM to recalculate oldestXmin even when we haven't changed our own database's datfrozenxid, if the current value is old enough to be forcing autovacuums or warning messages. This ensures that a bogus value is replaced as soon as possible. Per a comment from Heikki.	2009-09-01 04:46:49 +00:00
Tom Lane	b92f7a22b9	Bump catversion for flat-file-ectomy. Also remove a missed dead extern declaration.	2009-09-01 03:53:08 +00:00
Alvaro Herrera	a8bb8eb583	Remove flatfiles.c, which is now obsolete. Recent commits have removed the various uses it was supporting. It was a performance bottleneck, according to bug report #4919 by Lauris Ulmanis; seems it slowed down user creation after a billion users.	2009-09-01 02:54:52 +00:00
Tom Lane	0905e8aeeb	Move processing of startup-packet switches and GUC settings into InitPostgres, to fix the problem that SetClientEncoding needs to be done before InitializeClientEncoding, as reported by Zdenek Kotala. We get at least the small consolation of being able to remove the bizarre API detail that had InitPostgres returning whether user is a superuser.	2009-09-01 00:09:42 +00:00
Tom Lane	00e6a16d01	Change the autovacuum launcher to read pg_database directly, rather than via the "flat files" facility. This requires making it enough like a backend to be able to run transactions; it's no longer an "auxiliary process" but more like the autovacuum worker processes. Also, its signal handling has to be brought into line with backends/workers. In particular, since it now has to handle procsignal.c processing, the special autovac-launcher-only signal conditions are moved to SIGUSR2. Alvaro, with some cleanup from Tom	2009-08-31 19:41:00 +00:00
Tom Lane	25ec228ef7	Track the current XID wrap limit (or more accurately, the oldest unfrozen XID) in checkpoint records. This eliminates the need to recompute the value from scratch during database startup, which is one of the two remaining reasons for the flatfile code to exist. It should also simplify life for hot-standby operation. To avoid bloating the checkpoint records unreasonably, I switched from tracking the oldest database by name to tracking it by OID. This turns out to save cycles in general (everywhere but the warning-generating paths, which we hardly care about) and also helps us deal with the case that the oldest database got dropped instead of being vacuumed. The prior coding might go for a long time without updating the wrap limit in that case, which is bad because it might result in a lot of useless autovacuum activity.	2009-08-31 02:23:23 +00:00
Tom Lane	e710b65c1c	Remove the use of the pg_auth flat file for client authentication. (That flat file is now completely useless, but removal will come later.) To do this, postpone client authentication into the startup transaction that's run by InitPostgres. We still collect the startup packet and do SSL initialization (if needed) at the same time we did before. The AuthenticationTimeout is applied separately to startup packet collection and the actual authentication cycle. (This is a bit annoying, since it means a couple extra syscalls; but the signal handling requirements inside and outside a transaction are sufficiently different that it seems best to treat the timeouts as completely independent.) A small security disadvantage is that if the given database name is invalid, this will be reported to the client before any authentication happens. We could work around that by connecting to database "postgres" instead, but consensus seems to be that it's not worth introducing such surprising behavior. Processing of all command-line switches and GUC options received from the client is now postponed until after authentication. This means that PostAuthDelay is much less useful than it used to be --- if you need to investigate problems during InitPostgres you'll have to set PreAuthDelay instead. However, allowing an unauthenticated user to set any GUC options whatever seems a bit too risky, so we'll live with that.	2009-08-29 19:26:52 +00:00
Tom Lane	bb16dc49ab	Modify the definition of window-function PARTITION BY and ORDER BY clauses so that their elements are always taken as simple expressions over the query's input columns. It originally seemed like a good idea to make them act exactly like GROUP BY and ORDER BY, right down to the SQL92-era behavior of accepting output column names or numbers. However, that was not such a great idea, for two reasons: 1. It permits circular references, as exhibited in bug #5018: the output column could be the one containing the window function itself. (We actually had a regression test case illustrating this, but nobody thought twice about how confusing that would be.) 2. It doesn't seem like a good idea for, eg, "lead(foo) OVER (ORDER BY foo)" to potentially use two completely different meanings for "foo". Accordingly, narrow down the behavior of window clauses to use only the SQL99-compliant interpretation that the expressions are simple expressions.	2009-08-27 20:08:03 +00:00
Peter Eisentraut	9d182ef002	Update of install-sh, mkinstalldirs, and associated configury Update install-sh to that from Autoconf 2.63, plus our Darwin-specific changes (which I simplified a bit). install-sh is now able to install multiple files in one run, so we could simplify our makefiles sometime. install-sh also now has a -d option to create directories, so we don't need mkinstalldirs anymore. Use AC_PROG_MKDIR_P in configure.in, so we can use mkdir -p when available instead of install-sh -d. For consistency with the rest of the world, the corresponding make variable has been renamed from $(mkinstalldirs) to $(MKDIR_P).	2009-08-26 22:24:44 +00:00
Tom Lane	7fc7a7c4d0	Fix a violation of WAL coding rules in the recent patch to include an "all tuples visible" flag in heap page headers. The flag update must be applied before calling XLogInsert, but heap_update and the tuple moving routines in VACUUM FULL were ignoring this rule. A crash and replay could therefore leave the flag incorrectly set, causing rows to appear visible in seqscans when they should not be. This might explain recent reports of data corruption from Jeff Ross and others. In passing, do a bit of editorialization on comments in visibilitymap.c.	2009-08-24 02:18:32 +00:00
Tom Lane	cab9a0656c	Make TRUNCATE do truncate-in-place when processing a relation that was created or previously truncated in the current (sub)transaction. This is safe since if the (sub)transaction later rolls back, we'd just discard the rel's current physical file anyway. This avoids unreasonable growth in the number of transient files when a relation is repeatedly truncated. Per a performance gripe a couple weeks ago from Todd Cook.	2009-08-23 19:23:41 +00:00
Tom Lane	c38b75947e	Tweak ExecIndexEvalRuntimeKeys to forcibly detoast any toasted comparison values before they get passed to the index access method. This avoids repeated detoastings that will otherwise ensue as the comparison value is examined by various index support functions. We have seen a couple of reports of cases where repeated detoastings result in an order-of-magnitude slowdown, so it seems worth adding a bit of extra logic to prevent this. I had previously proposed trying to avoid duplicate detoastings in general, but this fix takes care of what seems the most important case in practice with very little effort or risk. Back-patch to 8.4 so that the PostGIS folk won't have to wait a year to have this fix in a production release. (The issue exists further back, of course, but the code's diverged enough to make backpatching further a higher-risk action. Also it appears that the possible gains may be limited in prior releases because of different handling of lossy operators.)	2009-08-23 18:26:08 +00:00
Tom Lane	3bd2241135	Fix overflow for INTERVAL 'x ms' where x is more than a couple million, and integer datetimes are in use. Per bug report from Hubert Depesz Lubaczewski. Alex Hunsaker	2009-08-18 21:23:14 +00:00
Teodor Sigaev	a88a48011c	Introduce filtering dictionary support to tsearch. Propagate --nolocale option to CREATE DATABASE command in pg_regress to allow correct checking of locale-sensitive contrib modules.	2009-08-18 10:30:41 +00:00
Peter Eisentraut	7902338192	Remove stray character in type description	2009-08-13 21:14:31 +00:00
Tom Lane	04011cc970	Allow backends to start up without use of the flat-file copy of pg_database. To make this work in the base case, pg_database now has a nailed-in-cache relation descriptor that is initialized using hardwired knowledge in relcache.c. This means pg_database is added to the set of relations that need to have a Schema_pg_xxx macro maintained in pg_attribute.h. When this path is taken, we'll have to do a seqscan of pg_database to find the row we need. In the normal case, we are able to do an indexscan to find the database's row by name. This is made possible by storing a global relcache init file that describes only the shared catalogs and their indexes (and therefore is usable by all backends in any database). A new backend loads this cache file, finds its database OID after an indexscan on pg_database, and then loads the local relcache init file for that database. This change should effectively eliminate number of databases as a factor in backend startup time, even with large numbers of databases. However, the real reason for doing it is as a first step towards getting rid of the flat files altogether. There are still several other sub-projects to be tackled before that can happen.	2009-08-12 20:53:31 +00:00
Tom Lane	e61fd4ac74	Support EEEE (scientific notation) in to_char(). Pavel Stehule, Brendan Jurd	2009-08-10 18:29:27 +00:00
Tom Lane	9bd27b7c9e	Extend EXPLAIN to support output in XML or JSON format. There are probably still some adjustments to be made in the details of the output, but this gets the basic structure in place. Robert Haas	2009-08-10 05:46:50 +00:00
Tom Lane	3783c9d4df	Remove long-since-unused file commands/version.h. Noticed by Itagaki Takahiro.	2009-08-07 16:19:57 +00:00
Tom Lane	dcb2bda9b7	Improve plpgsql's ability to cope with rowtypes containing dropped columns, by supporting conversions in places that used to demand exact rowtype match. Since this issue is certain to come up elsewhere (in fact, already has, in ExecEvalConvertRowtype), factor out the support code into new core functions for tuple conversion. I chose to put these in a new source file since heaptuple.c is already overly long. Heavily revised version of a patch by Pavel Stehule.	2009-08-06 20:44:32 +00:00
Heikki Linnakangas	23dc89d2c3	Improve error messages in md.c. When a filesystem operation like open() or fsync() fails, say "file" rather than "relation" when printing the filename. This makes messages that display block numbers a bit confusing. For example, in message 'could not read block 150000 of file "base/1234/5678.1"', 150000 is the block number from the beginning of the relation, ie. segment 0, not 150000th block within that segment. Per discussion, users aren't usually interested in the exact location within the file, so we can live with that. To ease constructing error messages, add FilePathName(File) function to return the pathname of a virtual fd.	2009-08-05 18:01:54 +00:00
Tom Lane	a2a8c7a662	Support hex-string input and output for type BYTEA. Both hex format and the traditional "escape" format are automatically handled on input. The output format is selected by the new GUC variable bytea_output. As committed, bytea_output defaults to HEX, which is an incompatible change. We will keep it this way for awhile for testing purposes, but should consider whether to switch to the more backwards-compatible default of ESCAPE before 8.5 is released. Peter Eisentraut	2009-08-04 16:08:37 +00:00
Tom Lane	f192e4a5d0	Cause pg_proc.probin to be declared as text, not bytea. Everything was already treating it as text anyway, to the point that I couldn't find anything to change except the datatype markings in catalog/*.h. The only effect that the bytea declaration had was to cause byteaout() to be invoked when pg_dump (or another client program) inspected the column value. Since pg_dump wasn't expecting that, but just treating what it got as text, the net result is that dump and reload would mangle any backslashes or non-ASCII characters in the filename string for a C-language function. That is a very long-standing bug, but given the lack of field complaints it doesn't seem worth trying to find a back-patchable fix. We'll just make this change to fix it going forward. This change will also forestall problems after the planned change to let bytea emit hex output instead of escaped characters.	2009-08-04 04:04:12 +00:00
Joe Conway	be6bca23b3	Implement has_sequence_privilege() Add family of functions that did not exist earlier, mainly due to historical omission. Original patch by Abhijit Menon-Sen, with review and modifications by Joe Conway. catversion.h bumped.	2009-08-03 21:11:40 +00:00
Tom Lane	9072592946	Add ALTER TABLE ... ALTER COLUMN ... SET STATISTICS DISTINCT Robert Haas	2009-08-02 22:14:53 +00:00
Tom Lane	527f0ae3fa	Department of second thoughts: let's show the exact key during unique index build failures, too. Refactor a bit more since that error message isn't spelled the same.	2009-08-01 20:59:17 +00:00
Tom Lane	b680ae4bdb	Improve unique-constraint-violation error messages to include the exact values being complained of. In passing, also remove the arbitrary length limitation in the similar error detail message for foreign key violations. Itagaki Takahiro	2009-08-01 19:59:41 +00:00
Tom Lane	2487d872e0	Create a multiplexing structure for signals to Postgres child processes. This patch gets us out from under the Unix limitation of two user-defined signal types. We already had done something similar for signals directed to the postmaster process; this adds multiplexing for signals directed to backends and auxiliary processes (so long as they're connected to shared memory). As proof of concept, replace the former usage of SIGUSR1 and SIGUSR2 for backends with use of the multiplexing mechanism. There are still some hard-wired definitions of SIGUSR1 and SIGUSR2 for other process types, but getting rid of those doesn't seem interesting at the moment. Fujii Masao	2009-07-31 20:26:23 +00:00
Tom Lane	060baf2784	Merge the Constraint and FkConstraint node types into a single type. This was foreseen to be a good idea long ago, but nobody had got round to doing it. The recent patch for deferred unique constraints made transformConstraintAttrs() ugly enough that I decided it was time. This change will also greatly simplify parsing of deferred CHECK constraints, if anyone ever gets around to implementing that. While at it, add a location field to Constraint, and use that to provide an error cursor for some of the constraint-related error messages.	2009-07-30 02:45:38 +00:00
Tom Lane	25d9bf2e3e	Support deferrable uniqueness constraints. The current implementation fires an AFTER ROW trigger for each tuple that looks like it might be non-unique according to the index contents at the time of insertion. This works well as long as there aren't many conflicts, but won't scale to massive unique-key reassignments. Improving that case is a TODO item. Dean Rasheed	2009-07-29 20:56:21 +00:00
Tom Lane	c1b9ec24ef	Add system catalog columns pg_constraint.conindid and pg_trigger.tgconstrindid. conindid is the index supporting a constraint. We can use this not only for unique/primary-key constraints, but also foreign-key constraints, which depend on the unique index that constrains the referenced columns. tgconstrindid is just copied from the constraint's conindid field, or is zero for triggers not associated with constraints. This is mainly intended as infrastructure for upcoming patches, but it has some virtue in itself, since it exposes a relationship that you formerly had to grovel in pg_depend to determine. I simplified one information_schema view accordingly. (There is a pg_dump query that could also use conindid, but I left it alone because it wasn't clear it'd get any faster.)	2009-07-28 02:56:31 +00:00
Tom Lane	aac3c301b5	Add s_lock support for SuperH architecture. After a patch originally submitted by Nobuhiro Iwamatsu, but corrected (I think) to match our guidelines for safe use of asm fragments. This should be considered untested ...	2009-07-27 05:31:05 +00:00
Tom Lane	d4382c4ae7	Extend EXPLAIN to allow generic options to be specified. The original syntax made it difficult to add options without making them into reserved words. This change parenthesizes the options to avoid that problem, and makes provision for an explicit (and perhaps non-Boolean) value for each option. The original syntax is still supported, but only for the two original options ANALYZE and VERBOSE. As a test case, add a COSTS option that can suppress the planner cost estimates. This may be useful for including EXPLAIN output in the regression tests, which are otherwise unable to cope with cross-platform variations in cost estimates. Robert Haas	2009-07-26 23:34:18 +00:00
Tom Lane	8af12bca3b	Assorted minor refactoring in EXPLAIN. This is believed to not change the output at all, with one known exception: "Subquery Scan foo" becomes "Subquery Scan on foo". (We can fix that if anyone complains, but it would be a wart, because the old code was clearly inconsistent.) The main intention is to remove duplicate coding and provide a cleaner base for subsequent EXPLAIN patching. Robert Haas	2009-07-24 21:08:42 +00:00
Magnus Hagander	a7e587863c	Reserve the shared memory region during backend startup on Windows, so that memory allocated by starting third party DLLs doesn't end up conflicting with it. Hopefully this solves the long-time issue with "could not reattach to shared memory" errors on Win32. Patch from Tsutomu Yamada and me, based on idea from Trevor Talbot.	2009-07-24 20:12:42 +00:00
Tom Lane	846c364dd4	Change do_tup_output() to take Datum/isnull arrays instead of a char * array, so it doesn't go through BuildTupleFromCStrings. This is more or less a wash for current uses, but will avoid inefficiency for planned changes to EXPLAIN. Robert Haas	2009-07-22 17:00:23 +00:00
Tom Lane	ca7c8168de	Tweak TOAST code so that columns marked with MAIN storage strategy are not forced out-of-line unless that is necessary to make the row fit on a page. Previously, they were forced out-of-line if needed to get the row down to the default target size (1/4th page). Kevin Grittner	2009-07-22 01:21:22 +00:00
Peter Eisentraut	5dedce6770	Change pg_listener attribute number constants to match the usual pattern It appears that, for no particularly good reason, pg_listener.h deviates from the usual convention for declaring attribute number constants. Normally, it's #define Anum_{catalog-name}_{column-name} {attribute-number} pg_listener.h, however substitutes a different string that is similar, but not the same as, the column name. This change fixes that. Author: Robert Haas <robertmhaas@gmail.com>	2009-07-21 20:24:51 +00:00
Alvaro Herrera	396a493c17	Install src/include/utils/fmgroids.h on VPATH builds too. The original coding was not dealing specially with this file being a symlink, with the end result that it was not installed in VPATH builds. Oddly enough, the clean target does know about it ...	2009-07-20 20:38:55 +00:00
Andrew Dunstan	e73131a16a	DROP IF EXISTS for columns and constraints. Andres Freund.	2009-07-20 02:42:28 +00:00
Tom Lane	31d1f23302	Teach simplify_boolean_equality to simplify the forms foo <> true and foo <> false, along with its previous duties of simplifying foo = true and foo = false. (All of these are equivalent to just foo or NOT foo as the case may be.) It's not clear how often this is really useful; but it costs almost nothing to do, and it seems some people think we should be smart about such cases. Per recent bug report.	2009-07-20 00:24:30 +00:00
Tom Lane	011eae60ef	Fix error cleanup failure caused by 8.4 changes in plpgsql to try to avoid memory leakage in error recovery. We were calling FreeExprContext, and therefore invoking ExprContextCallback callbacks, in both normal and error exits from subtransactions. However this isn't very safe, as shown in recent trouble report from Frank van Vugt, in which releasing a tupledesc refcount failed. It's also unnecessary, since the resources that callbacks might wish to release should be cleaned up by other error recovery mechanisms (ie the resource owners). We only really want FreeExprContext to release memory attached to the exprcontext in the error-exit case. So, add a bool parameter to FreeExprContext to tell it not to call the callbacks. A more general solution would be to pass the isCommit bool parameter on to the callbacks, so they could do only safe things during error exit. But that would make the patch significantly more invasive and possibly break third-party code that registers ExprContextCallback callbacks. We might want to do that later in HEAD, but for now I'll just do what seems reasonable to back-patch.	2009-07-18 19:15:42 +00:00
Tom Lane	f5bc74192d	Make GEQO's planning deterministic by having it start from a predictable random number seed each time. This is how it used to work years ago, but we got rid of the seed reset because it was resetting the main random() sequence and thus having undesirable effects on the rest of the system. To fix, establish a private random number state for each execution of geqo(), and initialize the state using the new GUC variable geqo_seed. People who want to experiment with different random searches can do so by changing geqo_seed, but you'll always get the same plan for the same value of geqo_seed (if holding all other planner inputs constant, of course). The new state is kept in PlannerInfo by adding a "void *" field reserved for use by join_search hooks. Most of the rather bulky code changes in this commit are just arranging to pass PlannerInfo around to all the GEQO functions (many of which formerly didn't receive it). Andres Freund, with some editorialization by Tom	2009-07-16 20:55:44 +00:00
Tom Lane	c43feefa80	Add erand48() to the set of functions supported by our src/port/ library, and extend configure to test for it properly instead of hard-wiring an assumption that everybody but Windows has the rand48 functions. (We do cheat to the extent of assuming that probing for erand48 will do for the entire rand48 family.) erand48() is unused as of this commit, but a followon patch will cause GEQO to depend on it. Andres Freund, additional hacking by Tom	2009-07-16 17:43:52 +00:00
Peter Eisentraut	de160e2c00	Make backend header files C++ safe This alters various incidental uses of C++ key words to use other similar identifiers, so that a C++ compiler won't choke outright. You still (probably) need extern "C" { }; around the inclusion of backend headers. based on a patch by Kurt Harriman <harriman@acm.org> Also add a script cpluspluscheck to check for C++ compatibility in the future. As of right now, this passes without error for me.	2009-07-16 06:33:46 +00:00
Tom Lane	1aa58d3a83	Tweak the core scanner so that it can be used by plpgsql too. Changes: Pass in the keyword lookup array instead of having it be hardwired. (This incidentally allows elimination of some duplicate coding in ecpg.) Re-order the token declarations in gram.y so that non-keyword tokens have numbers that won't change when keywords are added or removed. Add ".." and ":=" to the set of tokens recognized by scan.l. (Since these combinations are nowhere legal in core SQL, this does not change anything except the precise wording of the error you get when you write this.)	2009-07-14 20:24:10 +00:00
Tom Lane	91e71929ba	Convert the core lexer and parser into fully reentrant code, by making use of features added to flex and bison since this code was originally written. This change doesn't in itself offer any new capability, but it's needed infrastructure for planned improvements in plpgsql. Another feature now available in flex is the ability to make it use palloc instead of malloc, so do that to avoid possible memory leaks. (We should at some point change the other lexers likewise, but this commit doesn't touch them.)	2009-07-13 02:02:20 +00:00
Tom Lane	f1d9c299d6	Fix up PGDLLIMPORT marking for standard_conforming_strings. Moving it into a header file that plpgsql's scan.l can see broke the previous kluge. Per buildfarm results.	2009-07-13 00:42:18 +00:00
Tom Lane	6566e37e02	Move some declarations in the raw-parser header files to create a clearer distinction between the external API (parser.h) and declarations that only need to be visible within the raw parser code (gramparse.h, which now is only included by parser.c, gram.y, scan.l, and keywords.c). This is in preparation for the upcoming change to a reentrant lexer, which will require referencing YYSTYPE in the declarations of base_yylex and filtered_base_yylex, hence gram.h will have to be included by gramparse.h. We don't want any more files than absolutely necessary to depend on gram.h, so some cleanup is called for.	2009-07-12 17:12:34 +00:00
Peter Eisentraut	23d830bd9a	Alter some gratuitous uses of "ANSI" when "SQL standard" might have been meant or the reference to a standard was unnecessary.	2009-07-11 21:15:32 +00:00
Tom Lane	8fe8ac5f40	Fix typo in comment.	2009-07-07 18:49:16 +00:00
Peter Eisentraut	e292dbcf54	More sensible character_octet_length For character types with typmod, character_octet_length columns in the information schema now show the maximum character length times the maximum length of a character in the server encoding, instead of some huge value as before.	2009-07-07 18:23:15 +00:00
Tom Lane	9b27eab71c	Fix set_append_rel_pathlist() to deal intelligently with cases where substituting a child rel's output expressions into the appendrel's restriction clauses yields a pseudoconstant restriction. We might be able to skip scanning that child rel entirely (if we get constant FALSE), or generate a one-time filter. 8.3 more or less accidentally generated plans that weren't completely stupid in these cases, but that was only because an extra recursive level of subquery_planner() always occurred and allowed const-simplification to happen. 8.4's ability to pull up appendrel members with non-Var outputs exposes the fact that we need to work harder here. Per gripe from Sergey Burladyan.	2009-07-06 18:26:30 +00:00
Peter Eisentraut	7cc514ac65	Upgrade to Autoconf 2.63 This upgrades the configure infrastructure to the latest Autoconf version. Some notable news are: - The workaround for the broken fseeko() test is gone. - Checking for unknown options is now provided by Autoconf itself. - Fixes for Mac OS X	2009-07-02 18:55:40 +00:00
Tom Lane	f4ab0b032b	Stamp HEAD as 8.5devel.	2009-07-01 23:15:55 +00:00
Tom Lane	2c3c2c1c3c	Add missed src/include/foreign subdirectory to the set installed into INSTALLDIR/include/server/. Itagaki Takahiro	2009-06-30 17:38:50 +00:00
Marc G. Fournier	41f467f343	Bundle v8.4.0	2009-06-27 00:14:47 +00:00
Tom Lane	2de48a83e6	Cleanup and code review for the patch that made bgwriter active during archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments. Heikki and Tom	2009-06-26 20:29:04 +00:00
Heikki Linnakangas	7e48b77b1c	Fix some serious bugs in archive recovery, now that bgwriter is active during it: When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active. When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that. Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already. This fixes bug #4879 reported by Fujii Masao.	2009-06-25 21:36:00 +00:00
Marc G. Fournier	bc00ceb159	bundle RC2	2009-06-22 23:15:02 +00:00
Peter Eisentraut	12bc87e09b	Refine the use of terminology around bound and unbound cursors and cursor variables. Remove the confusing term "reference cursor".	2009-06-18 10:22:09 +00:00
Tom Lane	f08e5e92e8	Fix the just-reported problem that you can't specify all four trigger event types in CREATE TRIGGER. While at it, clean up the amazingly tedious and inextensible way that the trigger event type list was handled. Per report from Greg Sabino Mullane.	2009-06-18 01:27:02 +00:00
Marc G. Fournier	35daaa91d3	time to tag rc1 ...	2009-06-12 05:19:22 +00:00
Tom Lane	44aa60fa7c	Revisit AlterTableCreateToastTable's API once again, hoping to make it what pg_migrator actually needs and not just a partial solution. We have to be able to specify the OID that the new toast table should be created with.	2009-06-11 20:46:11 +00:00
Bruce Momjian	d747140279	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.	2009-06-11 14:49:15 +00:00
Peter Eisentraut	9b7304bc25	Fix xmlattribute escaping XML special characters twice (bug #4822 ). Author: Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>	2009-06-09 22:00:57 +00:00
Tom Lane	e343eaafc2	Mark internal_in as not strict, so that the construct "null::internal" will throw an error, rather than possibly allowing someone to synthesize a manual call to an internal-accepting function. As of CVS HEAD and existing releases, all such functions are either STRICT or careful about null inputs, so there is no current security issue here. But it seems like a good idea to lock this down to protect against future mistakes. In passing, similarly lock down trigger_in, language_handler_in, opaque_in, and shell_in. These are not believed to present any security risk, but there's still no good reason to allow nulls of these types to be created. I left the polymorphic pseudotypes (anyelement etc) alone, since a null of one of those types doesn't seem to be a problem --- the worst you can say about it is that it doesn't have an underlying non-polymorphic type. If we were to make this change during normal development, we'd just automatically bump catversion for a pg_proc.h change. But since this doesn't create a compatibility risk and isn't believed to be fixing a live bug, it seems better not to force a catversion bump in late beta.	2009-06-09 19:51:00 +00:00
Tom Lane	32ea236361	Improve the IndexVacuumInfo/IndexBulkDeleteResult API to allow somewhat sane behavior in cases where we don't know the heap tuple count accurately; in particular partial vacuum, but this also makes the API a bit more useful for ANALYZE. This patch adds "estimated_count" flags to both structs so that an approximate count can be flagged as such, and adjusts the logic so that approximate counts are not used for updating pg_class.reltuples. This fixes my previous complaint that VACUUM was putting ridiculous values into pg_class.reltuples for indexes. The actual impact of that bug is limited, because the planner only pays attention to reltuples for an index if the index is partial; which probably explains why beta testers hadn't noticed a degradation in plan quality from it. But it needs to be fixed. The whole thing is a bit messy and should be redesigned in future, because reltuples now has the potential to drift quite far away from reality when a long period elapses with no non-partial vacuums. But this is as good as it's going to get for 8.4.	2009-06-06 22:13:52 +00:00
Tom Lane	356eea24ce	Fix a serious bug introduced into GIN in 8.4: now that MergeItemPointers() is supposed to remove duplicate heap TIDs, we have to be sure to reduce the tuple size and posting-item count accordingly in addItemPointersToTuple(). Failing to do so resulted in the effective injection of garbage TIDs into the index contents, ie, whatever happened to be in the memory palloc'd for the new tuple. I'm not sure that this fully explains the index corruption reported by Tatsuo Ishii, but the test case I'm using no longer fails.	2009-06-06 02:39:40 +00:00
Tom Lane	52f0fc703f	GIN's ItemPointerIsMin, ItemPointerIsMax, and ItemPointerIsLossyPage macros should use GinItemPointerGetBlockNumber/GinItemPointerGetOffsetNumber, not ItemPointerGetBlockNumber/ItemPointerGetOffsetNumber, because the latter will Assert() on ip_posid == 0, ie a "Min" pointer. (Thus, ItemPointerIsMin has never worked at all, but it seems unused at present.) I'm not certain that the case can occur in normal functioning, but it's blowing up on me while investigating Tatsuo-san's data corruption problem. In any case it seems like a problem waiting to bite someone. Back-patch just in case this really is a problem for somebody in the field.	2009-06-05 18:50:47 +00:00
Tom Lane	76d4abf2d9	Improve the recently-added support for properly pluralized error messages by extending the ereport() API to cater for pluralization directly. This is better than the original method of calling ngettext outside the elog.c code because (1) it avoids double translation, which wastes cycles and in the worst case could give a wrong result; and (2) it avoids having to use a different coding method in PL code than in the core backend. The client-side uses of ngettext are not touched since neither of these concerns is very pressing in the client environment. Per my proposal of yesterday.	2009-06-04 18:33:08 +00:00
Tom Lane	b3b89fd1f1	Fix DecodeInterval to report an error for multiple occurrences of DAY, WEEK, YEAR, DECADE, CENTURY, or MILLENIUM fields, just as it always has done for other types of fields. The previous behavior seems to have been a hack to avoid defining bit-positions for all these field types in DTK_M() masks, rather than something that was really considered to be desired behavior. But there is room in the masks for these, and we really need to tighten up at least the behavior of DAY and YEAR fields to avoid unexpected behavior associated with the 8.4 changes to interpret ambiguous fields based on the interval qualifier (typmod) value. Per my example and proposed patch.	2009-06-01 16:55:11 +00:00
Tom Lane	99bf328237	Remove the useless and rather inconsistent return values of EncodeDateOnly, EncodeTimeOnly, EncodeDateTime, EncodeInterval. These don't have any good reason to fail, and their callers were mostly not checking anyway.	2009-05-26 02:17:50 +00:00
Tom Lane	c3707a4fcd	Use more-portable coding for the check on handing out the last available relopt_kind value in add_reloption_kind(). Per Zdenek Kotala.	2009-05-24 22:22:44 +00:00
Marc G. Fournier	abc924519a	commit for BETA2	2009-05-15 02:18:27 +00:00
Tom Lane	23543c732b	Rewrite xml.c's memory management (yet again). Give up on the idea of redirecting libxml's allocations into a Postgres context. Instead, just let it use malloc directly, and add PG_TRY blocks as needed to be sure we release libxml data structures in error recovery code paths. This is ugly but seems much more likely to play nicely with third-party uses of libxml, as seen in recent trouble reports about using Perl XML facilities in pl/perl and bug #4774 about contrib/xml2. I left the code for allocation redirection in place, but it's only built/used if you #define USE_LIBXMLCONTEXT. This is because I found it useful to corral libxml's allocations in a palloc context when hunting for libxml memory leaks, and we're surely going to have more of those in the future with this type of approach. But we don't want it turned on in a normal build because it breaks exactly what we need to fix. I have not re-indented most of the code sections that are now wrapped by PG_TRY(); that's for ease of review. pg_indent will fix it. This is a pre-existing bug in 8.3, but I don't dare back-patch this change until it's gotten a reasonable amount of field testing.	2009-05-13 20:27:17 +00:00
Tom Lane	f23bdda324	Fix LOCK TABLE to eliminate the race condition that could make it give weird errors when tables are concurrently dropped. To do this we must take lock on each relation before we check its privileges. The old code was trying to do that the other way around, which is a bit pointless when there are lots of other commands that lock relations before checking privileges. I did keep it checking each relation's privilege before locking the next relation, which is a detail that ALTER TABLE isn't too picky about.	2009-05-12 16:43:32 +00:00
Tom Lane	d4a363cdf2	Modify find_inheritance_children() and find_all_inheritors() to add the ability to lock relations as they scan pg_inherits, and to ignore any relations that have disappeared by the time we get lock on them. This makes uses of these functions safe against concurrent DROP operations on child tables: we will effectively ignore any just-dropped child, rather than possibly throwing an error as in recent bug report from Thomas Johansson (and similar past complaints). The behavior should not change otherwise, since the code was acquiring those same locks anyway, just a little bit later. An exception is LockTableCommand(), which is still behaving unsafely; but that seems to require some more discussion before we change it.	2009-05-12 03:11:02 +00:00
Tom Lane	0ada559187	Do some minor code refactoring in preparation for changing the APIs of find_inheritance_children() and find_all_inheritors(). I got annoyed that these are buried inside the planner but mostly used elsewhere. So, create a new file catalog/pg_inherits.c and put them there, along with a couple of other functions that search pg_inherits. The code that modifies pg_inherits is (still) in tablecmds.c --- it's kind of entangled with unrelated code that modifies pg_depend and other stuff, so pulling it out seemed like a bigger change than I wanted to make right now. But this file provides a natural home for it if anyone ever gets around to that. This commit just moves code around; it doesn't change anything, except I succumbed to the temptation to make a couple of trivial optimizations in typeInheritsFrom().	2009-05-12 00:56:05 +00:00
Tom Lane	8dcf18414b	Fix cost_nestloop and cost_hashjoin to model the behavior of semi and anti joins a bit better, ie, understand the differing cost functions for matched and unmatched outer tuples. There is more that could be done in cost_hashjoin but this already helps a great deal. Per discussions with Robert Haas.	2009-05-09 22:51:41 +00:00
Bruce Momjian	a600605bc1	'PGDLLIMPORT' ShmemVariableCache, needed for pg_migrator.so function linkage on Win32. Tested by Hiroshi Saito	2009-05-08 03:21:35 +00:00
Tom Lane	1e06ed1abe	Add an option to AlterTableCreateToastTable() to allow its caller to force a toast table to be built, even if the sum-of-column-widths calculation indicates one isn't needed. This is needed by pg_migrator because if the old table has a toast table, we have to migrate over the toast table since it might contain some live data, even though subsequent column drops could mean that no recently-added rows could require toasting.	2009-05-07 22:58:28 +00:00
Tom Lane	969d7cd431	Install a "dead man switch" to allow the postmaster to detect cases where a backend has done exit(0) or exit(1) without having disengaged itself from shared memory. We are at risk for this whenever third-party code is loaded into a backend, since such code might not know it's supposed to go through proc_exit() instead. Also, it is reported that under Windows there are ways to externally kill a process that cause the status code returned to the postmaster to be indistinguishable from a voluntary exit (thank you, Microsoft). If this does happen then the system is probably hosed --- for instance, the dead session might still be holding locks. So the best recovery method is to treat this like a backend crash. The dead man switch is armed for a particular child process when it acquires a regular PGPROC, and disarmed when the PGPROC is released; these should be the first and last touches of shared memory resources in a backend, or close enough anyway. This choice means there is no coverage for auxiliary processes, but I doubt we need that, since they shouldn't be executing any user-provided code anyway. This patch also improves the management of the EXEC_BACKEND ShmemBackendArray array a bit, by reducing search costs. Although this problem is of long standing, the lack of field complaints seems to mean it's not critical enough to risk back-patching; at least not till we get some more testing of this mechanism.	2009-05-05 19:59:00 +00:00
Tom Lane	c59d8dd44d	Improve pull_up_subqueries logic so that it doesn't insert unnecessary PlaceHolderVar nodes in join quals appearing in or below the lowest outer join that could null the subquery being pulled up. This improves the planner's ability to recognize constant join quals, and probably helps with detection of common sort keys (equivalence classes) as well.	2009-04-28 21:31:16 +00:00
Magnus Hagander	420ea68817	Move gettext encoding names into encnames.c, so we only have one place to update. Per discussion.	2009-04-24 08:43:51 +00:00
Tom Lane	ce53791b2a	Assorted portability fixes for Borland C, from Pavel Golub.	2009-04-19 22:37:13 +00:00
Tom Lane	85128e5d56	Rethink the idea of having plpgsql depend on parser/gram.h. Aside from the fact that this is breaking the MSVC build, it's probably not really a good idea to expand the dependencies of gram.h any further than the core parser; for instance the value of SCONST might depend on which bison version you'd built with. Better to expose an additional call point in parser.c, so move what I had put into pl_funcs.c into parser.c. Also PGDLLIMPORT'ify the reference to standard_conforming_strings, per buildfarm results.	2009-04-19 21:50:09 +00:00
Tom Lane	1d97c19a0f	Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from Stefan Kaltenbrunner. The most reasonable behavior (at least for the near term) seems to be to ignore the PlaceHolderVar and examine its argument instead. In support of this, change the API of pull_var_clause() to allow callers to request recursion into PlaceHolderVars. Currently estimate_num_groups() is the only customer for that behavior, but where there's one there may be others.	2009-04-19 19:46:33 +00:00
Tom Lane	d7a6a04dc7	Fix planner to restore its previous level of intelligence about pushing constants through full joins, as in select * from tenk1 a full join tenk1 b using (unique1) where unique1 = 42; which should generate a fairly cheap plan where we apply the constraint unique1 = 42 in each relation scan. This had been broken by my patch of 2008-06-27, which is now reverted in favor of a more invasive but hopefully less incorrect approach. That patch was meant to prevent incorrect extraction of OR'd indexclauses from OR conditions above an outer join. To do that correctly we need more information than the outerjoin_delay flag can provide, so add a nullable_relids field to RestrictInfo that records exactly which relations are nulled by outer joins that are underneath a particular qual clause. A side benefit is that we can make the test in create_or_index_quals more specific: it is now smart enough to extract an OR'd indexclause into the outer side of an outer join, even though it must not do so in the inner side. The old coding couldn't distinguish these cases so it could not do either.	2009-04-16 20:42:16 +00:00
Marc G. Fournier	4c9c0b85fb	commit and tag beta1	2009-04-10 00:20:10 +00:00
Tom Lane	06e2757277	Remove SQL-compatibility function cardinality(). It is not exactly clear how this ought to behave for multi-dimensional arrays. Per discussion, not having it at all seems better than having it with what might prove to be the wrong behavior. We can always add it later when we have consensus on the correct behavior.	2009-04-09 17:39:50 +00:00
Peter Eisentraut	77d67a4a3b	XMLATTRIBUTES() should send the attribute values through map_sql_value_to_xml_value() instead of directly through the data type output function. This is per SQL standard, and consistent with XMLELEMENT().	2009-04-08 21:51:38 +00:00
Heikki Linnakangas	1fe5020558	Tell gettext which codeset to use by calling bind_textdomain_codeset(). We already did that on Windows, but it's needed on other platforms too when LC_CTYPE=C. With other locales, we enforce (or trust) that the codeset of the locale matches the server encoding so we don't need to bind it explicitly. It should do no harm in that case either, but I don't have full faith in the PG encoding -> OS codeset mapping table yet. Per recent discussion on pgsql-hackers.	2009-04-08 09:50:48 +00:00
Tom Lane	387060951e	Add an optional parameter to pg_start_backup() that specifies whether to do the checkpoint in immediate or lazy mode. This is to address complaints that pg_start_backup() takes a long time even when there's no need to minimize its I/O consumption.	2009-04-07 00:31:26 +00:00
Heikki Linnakangas	1eef90d0a2	Rename the new CREATE DATABASE options to set collation and ctype into LC_COLLATE and LC_CTYPE, per discussion on pgsql-hackers.	2009-04-06 08:42:53 +00:00
Tom Lane	f2110a757d	Change cardinality() into a C-code function, instead of a SQL-language alias for array_length(v,1). The efficiency gain here is doubtless negligible --- what I'm interested in is making sure that if we have second thoughts about the definition, we will not have to force a post-beta initdb to change the implementation.	2009-04-05 22:28:59 +00:00
Tom Lane	fbcce08046	Change EXPLAIN output so that subplans and initplans (particularly CTEs) are individually labeled, rather than just grouped under an "InitPlan" or "SubPlan" heading. This in turn makes it possible for decompilation of a subplan reference to usefully identify which subplan it's referencing. I also made InitPlans identify which parameter symbol(s) they compute, so that references to those parameters elsewhere in the plan tree can be connected to the initplan that will be executed. Per a gripe from Robert Haas about EXPLAIN output of a WITH query being inadequate, plus some longstanding pet peeves of my own.	2009-04-05 19:59:40 +00:00
Tom Lane	27fbfd396c	Remove a boatload of useless definitions of 'int optreset'. If we are using our own ports of getopt or getopt_long, those will define the variable for themselves; and if not, we don't need these, because we never touch the variable anyway.	2009-04-05 04:19:59 +00:00
Tom Lane	2227e2f16d	I had always wondered why pg_config.h.win32 claimed that Windows provides optreset. Current mastodon results prove that in fact it does not; it was only because getopt.c defined the variable anyway that things failed to fall over.	2009-04-05 04:09:01 +00:00
Tom Lane	1d26226d95	Make an attempt at fixing our current Solaris 11 breakage: add a configure probe for opterr (exactly like the one for optreset) and have getopt.c define the variables only if configure doesn't find them in libc.	2009-04-04 21:55:50 +00:00
Tom Lane	090173a3f9	Remove the recently added node types ReloptElem and OptionDefElem in favor of adding optional namespace and action fields to DefElem. Having three node types that do essentially the same thing bloats the code and leads to errors of confusion, such as in yesterday's bug report from Khee Chin.	2009-04-04 21:12:31 +00:00
Tom Lane	c973051ae6	A session that does not have any live snapshots does not have to be waited for when we are waiting for old snapshots to go away during a concurrent index build. In particular, this rule lets us avoid waiting for idle-in-transaction sessions. This logic could be improved further if we had some way to wake up when the session we are currently waiting for goes idle-in-transaction. However that would be a significantly more complex/invasive patch, so it'll have to wait for some other day. Simon Riggs, with some improvements by Tom.	2009-04-04 17:40:36 +00:00
Alvaro Herrera	1c855f01ea	Disallow setting fillfactor for TOAST tables. To implement this without almost duplicating the reloption table, treat relopt_kind as a bitmask instead of an integer value. This decreases the range of allowed values, but it's not clear that there's need for that much values anyway. This patch also makes heap_reloptions explicitly a no-op for relation kinds other than heap and TOAST tables. Patch by ITAGAKI Takahiro with minor edits from me. (In particular I removed the bit about adding relation kind to an error message, which I intend to commit separately.)	2009-04-04 00:45:02 +00:00
Tom Lane	85369f888e	Refactor ExecProject and associated routines so that fast-path code is used for simple Var targetlist entries all the time, even when there are other entries that are not simple Vars. Also, ensure that we prefetch attributes (with slot_getsomeattrs) for all Vars in the targetlist, even those buried within expressions. In combination these changes seem to significantly reduce the runtime for cases where tlists are mostly but not exclusively Vars. Per my proposal of yesterday.	2009-04-02 22:39:30 +00:00
Tom Lane	c26ffb1ead	Fix SetClientEncoding() to maintain a cache of previously selected encoding conversion functions. This allows transaction rollback to revert to a previous client_encoding setting without doing fresh catalog lookups. I believe that this explains and fixes the recent report of "failed to commit client_encoding" failures. This bug is present in 8.3.x, but it doesn't seem prudent to back-patch the fix, at least not till it's had some time for field testing in HEAD. In passing, remove SetDefaultClientEncoding(), which was used nowhere.	2009-04-02 17:30:53 +00:00
Tom Lane	948d6ec90f	Modify the relcache to record the temp status of both local and nonlocal temp relations; this is no more expensive than before, now that we have pg_class.relistemp. Insert tests into bufmgr.c to prevent attempting to fetch pages from nonlocal temp relations. This provides a low-level defense against bugs-of-omission allowing temp pages to be loaded into shared buffers, as in the contrib/pgstattuple problem reported by Stuart Bishop. While at it, tweak a bunch of places to use new relcache tests (instead of expensive probes into pg_namespace) to detect local or nonlocal temp tables.	2009-03-31 22:12:48 +00:00
Tom Lane	df13324f08	Add a "relistemp" boolean column to pg_class, which is true for temporary relations (including a temp table's indexes and toast table/index), and false for normal relations. For ease of checking, this commit just adds the column and fills it correctly --- revising the relation access machinery to use it will come separately.	2009-03-31 17:59:56 +00:00
Tom Lane	793d5662e8	Fix an oversight in the support for storing/retrieving "minimal tuples" in TupleTableSlots. We have functions for retrieving a minimal tuple from a slot after storing a regular tuple in it, or vice versa; but these were implemented by converting the internal storage from one format to the other. The problem with that is it invalidates any pass-by-reference Datums that were already fetched from the slot, since they'll be pointing into the just-freed version of the tuple. The known problem cases involve fetching both a whole-row variable and a pass-by-reference value from a slot that is fed from a tuplestore or tuplesort object. The added regression tests illustrate some simple cases, but there may be other failure scenarios traceable to the same bug. Note that the added tests probably only fail on unpatched code if it's built with --enable-cassert; otherwise the bug leads to fetching from freed memory, which will not have been overwritten without additional conditions. Fix by allowing a slot to contain both formats simultaneously; which turns out not to complicate the logic much at all, if anything it seems less contorted than before. Back-patch to 8.2, where minimal tuples were introduced.	2009-03-30 04:08:43 +00:00
Tom Lane	25bf7f8b9b	Fix possible failures when a tuplestore switches from in-memory to on-disk mode while callers hold pointers to in-memory tuples. I reported this for the case of nodeWindowAgg's primary scan tuple, but inspection of the code shows that all of the calls in nodeWindowAgg and nodeCtescan are at risk. For the moment, fix it with a rather brute-force approach of copying whenever one of the at-risk callers requests a tuple. Later we might think of some sort of reference-count approach to reduce tuple copying.	2009-03-27 18:30:21 +00:00
Peter Eisentraut	8032d76b5b	Gettext plural support In the backend, I changed only a handful of exemplary or important-looking instances to make use of the plural support; there is probably more work there. For the rest of the source, this should cover all relevant cases.	2009-03-26 22:26:08 +00:00
Tom Lane	f38fbf31f5	If we expect a hash join to be performed in multiple batches, suppress "physical tlist" optimization on the outer relation (ie, force a projection step to occur in its scan). This avoids storing useless column values when the outer relation's tuples are written to temporary batch files. Modified version of a patch by Michael Henderson and Ramon Lawrence.	2009-03-26 17:15:35 +00:00
Tom Lane	87b8db3774	Adjust the APIs for GIN opclass support functions to allow the extractQuery() method to pass extra data to the consistent() and comparePartial() methods. This is the core infrastructure needed to support the soon-to-appear contrib/btree_gin module. The APIs are still upward compatible with the definitions used in 8.3 and before, although not with the previous 8.4devel function definitions. catversion bump for changes in pg_proc entries (although these are just cosmetic, since GIN doesn't actually look at the function signature before calling it...) Teodor Sigaev and Oleg Bartunov	2009-03-25 22:19:02 +00:00
Tom Lane	e5efda442c	Install a search tree depth limit in GIN bulk-insert operations, to prevent them from degrading badly when the input is sorted or nearly so. In this scenario the tree is unbalanced to the point of becoming a mere linked list, so insertions become O(N^2). The easiest and most safely back-patchable solution is to stop growing the tree sooner, ie limit the growth of N. We might later consider a rebalancing tree algorithm, but it's not clear that the benefit would be worth the cost and complexity. Per report from Sergey Burladyan and an earlier complaint from Heikki. Back-patch to 8.2; older versions didn't have GIN indexes.	2009-03-24 22:06:03 +00:00
Tom Lane	ff301d6e69	Implement "fastupdate" support for GIN indexes, in which we try to accumulate multiple index entries in a holding area before adding them to the main index structure. This helps because bulk insert is (usually) significantly faster than retail insert for GIN. This patch also removes GIN support for amgettuple-style index scans. The API defined for amgettuple is difficult to support with fastupdate, and the previously committed partial-match feature didn't really work with it either. We might eventually figure a way to put back amgettuple support, but it won't happen for 8.4. catversion bumped because of change in GIN's pg_am entry, and because the format of GIN indexes changed on-disk (there's a metapage now, and possibly a pending list). Teodor Sigaev	2009-03-24 20:17:18 +00:00
Tom Lane	1079564979	Const-ify the parse table passed to fillRelOptions. The previous coding meant it had to be built on-the-fly at each entry to default_reloptions.	2009-03-23 16:36:27 +00:00
Tom Lane	596efd27ed	Optimize multi-batch hash joins when the outer relation has a nonuniform distribution, by creating a special fast path for the (first few) most common values of the outer relation. Tuples having hashvalues matching the MCVs are effectively forced to be in the first batch, so that we never write them out to the batch temp files. Bryce Cutt and Ramon Lawrence, with some editorialization by me.	2009-03-21 00:04:40 +00:00
Tom Lane	dcf3902f02	Make SubPlan nodes carry the result's typmod as well as datatype OID. This is for consistency with the (relatively) recent addition of typmod to SubLink. An example of why it's a good idea is to be seen in the recent "failed to locate grouping columns" bug, which wouldn't have happened if a SubPlan exposed the same typmod info as the SubLink it was derived from. This could be back-patched, since it doesn't affect any on-disk data format, but for the moment it doesn't seem necessary to do so.	2009-03-10 22:09:26 +00:00
Peter Eisentraut	05a7db0582	Accept 'on' and 'off' as input for boolean data type, unifying the syntax that the data type and GUC accepts. ITAGAKI Takahiro	2009-03-09 14:34:35 +00:00
Alvaro Herrera	e43fd89762	Revert pg_bind_textdomain_codeset to a existant-but-empty function when ENABLE_NLS is not defined, for better compatibility of the backend with modules compiled the other way. Per note from Tom after my previous commit.	2009-03-09 00:01:32 +00:00
Alvaro Herrera	4022f94c24	pg_bind_textdomain_codeset must exist only on ENABLE_NLS.	2009-03-08 18:10:17 +00:00
Alvaro Herrera	c3b5d2f138	On Windows, call bind_textdomain_codeset on domains other than the default one, too, so that the codeset is properly mapped on the newly added PL domains.	2009-03-08 16:07:12 +00:00
Alvaro Herrera	328d235571	Separate the key word list that lived in keywords.c into a new header file kwlist.h, to avoid having to link the backend object file into other programs like pg_dump. We can now simply symlink a single source file from the backend (kwlookup.c, containing the shared routine ScanKeywordLookup) and compile it locally, which is a lot cleaner.	2009-03-07 00:13:58 +00:00
Tom Lane	00ce73778b	Teach the planner to support index access methods that only implement amgettuple or only implement amgetbitmap, instead of the former assumption that every AM supports both APIs. Extracted with minor editorialization from Teodor's fast-GIN-insert patch; whatever becomes of that, this seems like a simple and reasonable generalization of the index AM interface spec.	2009-03-05 23:06:45 +00:00
Peter Eisentraut	12f87b2c82	Add new SQL:2008 error codes for invalid LIMIT and OFFSET values. Remove unused nonstandard error code that was perhaps intended for this but never used.	2009-03-04 10:55:00 +00:00
Heikki Linnakangas	d657843a9a	Remove the placeholder LWLockId in place of the removed FreeSpaceLock. As pointed out by ITAGAKI Takahiro, we split SInvalLock into two in 8.4, so to keep the numbers of the rest of the locks unchanged from 8.3, we don't need a placeholder.	2009-03-03 08:11:24 +00:00
Tom Lane	fd9e2accef	When we are in error recursion trouble, arrange to suppress translation and encoding conversion of any elog/ereport message being sent to the frontend. This generalizes a patch that I put in last October, which suppressed translation of only specific messages known to be associated with recursive can't-translate-the-message behavior. As shown in bug #4680, we need a more general answer in order to have some hope of coping with broken encoding conversion setups. This approach seems a good deal less klugy anyway. Patch in all supported branches.	2009-03-02 21:18:43 +00:00
Peter Eisentraut	9de59fd191	Add a -w/--no-password option that prevents all password prompts to all programs that have a -W/--password option. In passing, remove the ancient PSQL_ALWAYS_GET_PASSWORDS compile option.	2009-02-26 16:02:39 +00:00
Tom Lane	e549722a8b	Get rid of the rather fuzzily defined FlattenedSubLink node type in favor of making pull_up_sublinks() construct a full-blown JoinExpr tree representation of IN/EXISTS SubLinks that it is able to convert to semi or anti joins. This makes pull_up_sublinks() a shade more complex, but the gain in semantic clarity is worth it. I still have more to do in this area to address the previously-discussed problems, but this commit in itself fixes at least one bug in HEAD, as shown by added regression test case.	2009-02-25 03:30:38 +00:00
Peter Eisentraut	7babccb915	Add the possibility to specify an explicit validator function for foreign-data wrappers (similar to procedural languages). This way we don't need to retain the nearly empty libraries, and we are more free in how to implement the wrapper API in the future.	2009-02-24 10:06:36 +00:00
Tom Lane	f73bed308a	Repair a longstanding bug in CLUSTER and the rewriting variants of ALTER TABLE: if the command is executed by someone other than the table owner (eg, a superuser) and the table has a toast table, the toast table's pg_type row ends up with the wrong typowner, ie, the command issuer not the table owner. This is quite harmless for most purposes, since no interesting permissions checks consult the pg_type row. However, it could lead to unexpected failures if one later tries to drop the role that issued the command (in 8.1 or 8.2), or strange warnings from pg_dump afterwards (in 8.3 and up, which will allow the DROP ROLE because we don't create a "redundant" owner dependency for table rowtypes). Problem identified by Cott Lang. Back-patch to 8.1. The problem is actually far older --- the CLUSTER variant can be demonstrated in 7.0 --- but it's mostly cosmetic before 8.1 because we didn't track ownership dependencies before 8.1. Also, fixing it before 8.1 would require changing the call signature of heap_create_with_catalog(), which seems to carry a nontrivial risk of breaking add-on modules.	2009-02-24 01:38:10 +00:00
Heikki Linnakangas	bc134d7a51	Change the signaling of end-of-recovery. Startup process now indicates end of recovery by exiting with exit code 0, like in previous releases. Per Tom's suggestion.	2009-02-23 09:28:50 +00:00
Heikki Linnakangas	6ebc6d9089	Increase NUM_AUXILIARY_PROCS, now that the startup process can co-exist with other auxiliary processes for a short period. As witnessed by buildfarm member dungbeetle.	2009-02-19 08:02:32 +00:00
Heikki Linnakangas	cdd46c7654	Start background writer during archive recovery. Background writer now performs its usual buffer cleaning duties during archive recovery, and it's responsible for performing restartpoints. This requires some changes in postmaster. When the startup process has done all the initialization and is ready to start WAL redo, it signals the postmaster to launch the background writer. The postmaster is signaled again when the point in recovery is reached where we know that the database is in consistent state. Postmaster isn't interested in that at the moment, but that's the point where we could let other backends in to perform read-only queries. The postmaster is signaled third time when the recovery has ended, so that postmaster knows that it's safe to start accepting connections. The startup process now traps SIGTERM, and performs a "clean" shutdown. If you do a fast shutdown during recovery, a shutdown restartpoint is performed, like a shutdown checkpoint, and postmaster kills the processes cleanly. You still have to continue the recovery at next startup, though. Currently, the background writer is only launched during archive recovery. We could launch it during crash recovery as well, but it seems better to keep that codepath as simple as possible, for the sake of robustness. And it couldn't do any restartpoints during crash recovery anyway, so it wouldn't be that useful. log_restartpoints is gone. Use log_checkpoints instead. This is yet to be documented. This whole operation is a pre-requisite for Hot Standby, but has some value of its own whether the hot standby patch makes 8.4 or not. Simon Riggs, with lots of modifications by me.	2009-02-18 15:58:41 +00:00
Tom Lane	6d1e361852	Change ALTER TABLE SET WITHOUT OIDS to rewrite the whole table to physically get rid of the OID column. This eliminates the problem discovered by Heikki back in November that 8.4's suppression of "unnecessary" junk filtering in INSERT/SELECT could lead to an Assert failure, or storing of oids into a table that shouldn't have them if Asserts are off. While that particular problem could have been solved in other ways, it seems likely to be just a forerunner of things to come if we continue to allow tables to contain rows that disagree with the pg_class.relhasoids setting. It's better to make this operation slow than to sacrifice performance or risk bugs in more common code paths. Also, add ALTER TABLE SET WITH OIDS to rewrite the table to add oids. This was a bit more controversial, but in view of the very small amount of extra code needed given the current ALTER TABLE infrastructure, it seems best to eliminate the asymmetry in features.	2009-02-11 21:11:16 +00:00
Peter Eisentraut	8b9dd6b5fd	Support for KOI8U encoding	2009-02-10 19:29:39 +00:00
Tom Lane	8205258fa6	Adopt Bob Jenkins' improved hash function for hash_any(). This changes the contents of hash indexes (again), so bump catversion. Kenneth Marshall	2009-02-09 21:18:28 +00:00
Alvaro Herrera	834a6da4f7	Update autovacuum to use reloptions instead of a system catalog, for per-table overrides of parameters. This removes a whole class of problems related to misusing the catalog, and perhaps more importantly, gives us pg_dump support for the parameters. Based on a patch by Euler Taveira de Oliveira, heavily reworked by me.	2009-02-09 20:57:59 +00:00
Tom Lane	c473d92351	Fix cost_mergejoin's failure to adjust for rescanning of non-unique merge join keys when considering a semi or anti join. This requires estimating the selectivity of the merge qual as though it were a regular inner join condition. To allow caching both that and the real outer-join-aware selectivity, split RestrictInfo.this_selec into two fields. This fixes one of the problems reported by Kevin Grittner.	2009-02-06 23:43:24 +00:00
Tom Lane	7449427a1e	Clean up some loose ends from the column privileges patch: add has_column_privilege and has_any_column_privilege SQL functions; fix the information_schema views that are supposed to pay attention to column privileges; adjust pg_stats to show stats for any column you have select privilege on; and fix COPY to allow copying a subset of columns if the user has suitable per-column privileges for all the columns. To improve efficiency of some of the information_schema views, extend the has_xxx_privilege functions to allow inquiring about the OR of a set of privileges in just one call. This is just exposing capability that already existed in the underlying aclcheck routines. In passing, make the information_schema views report the owner's own privileges as being grantable, since Postgres assumes this even when the grant option bit is not set in the ACL. This is a longstanding oversight. Also, make the new has_xxx_privilege functions for foreign data objects follow the same coding conventions used by the older ones. Stephen Frost and Tom Lane	2009-02-06 21:15:12 +00:00
Alvaro Herrera	3a5b773715	Allow reloption names to have qualifiers, initially supporting a TOAST qualifier, and add support for this in pg_dump. This allows TOAST tables to have user-defined fillfactor, and will also enable us to move the autovacuum parameters to reloptions without taking away the possibility of setting values for TOAST tables.	2009-02-02 19:31:40 +00:00
Tom Lane	0d65eea3da	Replace argument-checking Asserts with regular test-and-elog checks in all encoding conversion functions. These are not can't-happen cases because it's possible to create a conversion with the wrong conversion function for the specified encoding pair. That would lead to an Assert crash in an Assert-enabled build, or incorrect conversion otherwise, neither of which is desirable. This would be a DOS issue if production databases were customarily built with asserts enabled, but fortunately that's not so. Per an observation by Heikki. Back-patch to all supported branches.	2009-01-29 19:23:42 +00:00
Peter Eisentraut	5fe3da927b	Revert updatable views	2009-01-27 12:40:15 +00:00
Alvaro Herrera	c0f92b57dc	Allow extracting and parsing of reloptions from a bare pg_class tuple, and refactor the relcache code that used to do that. This allows other callers (particularly autovacuum) to do the same without necessarily having to open and lock a table.	2009-01-26 19:41:06 +00:00
Tom Lane	3cb5d6580a	Support column-level privileges, as required by SQL standard. Stephen Frost, with help from KaiGai Kohei and others	2009-01-22 20:16:10 +00:00
Heikki Linnakangas	c079090bbc	Update comments to reflect that tgenabled is not a boolean anymore. Jonah Harris, with minor tinkering by me.	2009-01-22 19:16:31 +00:00
Peter Eisentraut	dd7e54a17f	Automatic view update rules Bernd Helmle	2009-01-22 17:27:55 +00:00
Heikki Linnakangas	94136d5a18	Add new SPI_OK_REWRITTEN return code to SPI_execute and friends, for the case that the command is rewritten into another type of command. The old behavior to return the command tag of the last executed command was pretty surprising. In PL/pgSQL, for example, it meant that if a command was rewritten to a utility statement, FOUND wasn't set at all.	2009-01-21 11:02:40 +00:00
Magnus Hagander	0154345078	Make win32 builds always do SetEnvironmentVariable() when doing putenv(). Also, if linked against other versions than the default MSVCRT library (for example the MSVC build which links against MSVCRT80), also update the cache in the default MSVCRT at the same time. This should fix the issues with setting LC_MESSAGES on the MSVC build. Original patch from Hiroshi Inoue and Hiroshi Saito, much rewritten by me.	2009-01-21 10:30:02 +00:00
Heikki Linnakangas	b2a667b9ee	Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.	2009-01-20 18:59:37 +00:00
Heikki Linnakangas	6587818542	Add vacuum_freeze_table_age GUC option, to control when VACUUM should ignore the visibility map and scan the whole table, to advance relfrozenxid.	2009-01-16 13:27:24 +00:00
Teodor Sigaev	41d17e042b	Fix URL generation in headline. Only tag lexeme will be replaced by space. Per http://archives.postgresql.org/pgsql-bugs/2008-12/msg00013.php	2009-01-15 16:33:59 +00:00
Alvaro Herrera	8ebe1e356c	Simplify the writing of amoptions routines by introducing a convenience fillRelOptions routine that stores the parsed values in the struct using a table-based approach. Per Tom suggestion. Also remove the "continue" in HANDLE_*_RELOPTION macros, which were useless and in spirit they were assuming too much of how the macros were going to be used. (Note that these macros are now unused, but the intention is to introduce some usage in a future autovacuum patch, which is why they weren't completely removed.) Also, do not call the string validation routine when not validating. It seems less error-prone this way, per commentary on the amoptions SGML docs.	2009-01-12 21:02:15 +00:00
Tom Lane	b7b8f0b609	Implement prefetching via posix_fadvise() for bitmap index scans. A new GUC variable effective_io_concurrency controls how many concurrent block prefetch requests will be issued. (The best way to handle this for plain index scans is still under debate, so that part is not applied yet --- tgl) Greg Stark	2009-01-12 05:10:45 +00:00
Tom Lane	1a37056a74	Re-enable the old code in xlog.c that tried to use posix_fadvise(), so that we can get some buildfarm feedback about whether that function is still problematic. (Note that the planned async-preread patch will not really prove anything one way or the other in buildfarm testing, since it will be inactive with default GUC settings.)	2009-01-11 18:02:17 +00:00
Tom Lane	43a57cf365	Revise the TIDBitmap API to support multiple concurrent iterations over a bitmap. This is extracted from Greg Stark's posix_fadvise patch; it seems worth committing separately, since it's potentially useful independently of posix_fadvise.	2009-01-10 21:08:36 +00:00
Tom Lane	d04db37072	Arrange for function default arguments to be processed properly in expressions that are set up for execution with ExecPrepareExpr rather than going through the full planner process. By introducing an explicit notion of "expression planning", this patch also lays a bit of groundwork for maybe someday allowing sub-selects in standalone expressions.	2009-01-09 15:46:11 +00:00
Alvaro Herrera	b813c8daca	A couple further reloptions improvements, per KaiGai Kohei: add a validation function to the string type and add a couple of macros for string handling. In passing, fix an off-by-one bug of mine.	2009-01-08 19:34:41 +00:00
Tom Lane	445ce15702	Create a third option named "partition" for constraint_exclusion, and make it the default. This setting enables constraint exclusion checks only for appendrel members (ie, inheritance children and UNION ALL arms), which are the cases in which constraint exclusion is most likely to be useful. Avoiding the overhead for simple queries that are unlikely to benefit should bring the cost down to the point where this is a reasonable default setting. Per today's discussion.	2009-01-07 22:40:49 +00:00
Tom Lane	deac9488d3	Insert conditional SPI_push/SPI_pop calls into InputFunctionCall, OutputFunctionCall, and friends. This allows SPI-using functions to invoke datatype I/O without concern for the possibility that a SPI-using function will be called (which could be either the I/O function itself, or a function used in a domain check constraint). It's a tad ugly, but not nearly as ugly as what'd be needed to make this work via retail insertion of push/pop operations in all the PLs. This reverts my patch of 2007-01-30 that inserted some retail SPI_push/pop calls into plpgsql; that approach only fixed plpgsql, and not any other PLs. But the other PLs have the issue too, as illustrated by a recent gripe from Christian Schröder. Back-patch to 8.2, which is as far back as this solution will work. It's also as far back as we need to worry about the domain-constraint case, since earlier versions did not attempt to check domain constraints within datatype input. I'm not aware of any old I/O functions that use SPI themselves, so this should be sufficient for a back-patch.	2009-01-07 20:38:56 +00:00
Andrew Dunstan	678e597ee3	define HAVE_FSEEKO for MSVC	2009-01-07 13:51:04 +00:00
Tom Lane	1cfd9e8834	Fix executor/spi.h to follow our usual conventions for include files, ie, not include postgres.h nor anything else it doesn't directly need. Add #includes to calling files as needed to compensate. Per my proposal of yesterday. This should be noted as a source code change in the 8.4 release notes, since it's likely to require changes in add-on modules.	2009-01-07 13:44:37 +00:00
Magnus Hagander	b09f930d2e	Add hba parameter include_realm to krb5, gss and sspi authentication, used to pass the full username@realm string to the authentication instead of just the username. This makes it possible to use pg_ident.conf to authenticate users from multiple realms as different database users.	2009-01-07 13:09:21 +00:00
Magnus Hagander	32c469d7b1	Allow krb_realm (krb5, gssapi and sspi) and krb_server_hostname (krb5 only) authentication options to be set in pg_hba.conf on a per-line basis, to override the defaults set in postgresql.conf.	2009-01-07 12:38:11 +00:00
Bruce Momjian	d00a3472cf	Update MinGW so it handles fseeko() similar to Unix.	2009-01-07 03:39:33 +00:00
Tom Lane	7c63d0c72e	Change a couple of ill-advised uses of INFO elog level to WARNINGs; in particular this allows EmitWarningsOnPlaceholders messages to show up in the postmaster log by default. Update elog.h comment to make it clearer what INFO is for, and fix one example in the SGML docs that was misusing it. Per my gripe of yesterday.	2009-01-06 16:39:52 +00:00

... 30 31 32 33 34 ...

7870 commits