Commit graph

360 commits

Author SHA1 Message Date
Tom Lane
ae93e5fd6e Make the world very nearly safe for composite-type columns in tables.
1. Solve the problem of not having TOAST references hiding inside composite
values by establishing the rule that toasting only goes one level deep:
a tuple can contain toasted fields, but a composite-type datum that is
to be inserted into a tuple cannot.  Enforcing this in heap_formtuple
is relatively cheap and it avoids a large increase in the cost of running
the tuptoaster during final storage of a row.
2. Fix some interesting problems in expansion of inherited queries that
reference whole-row variables.  We never really did this correctly before,
but it's now relatively painless to solve by expanding the parent's
whole-row Var into a RowExpr() selecting the proper columns from the
child.
If you dike out the preventive check in CheckAttributeType(),
composite-type columns now seem to actually work.  However, we surely
cannot ship them like this --- without I/O for composite types, you
can't get pg_dump to dump tables containing them.  So a little more
work still to do.
2004-06-05 01:55:05 +00:00
Tom Lane
8f2ea8b7b5 Resurrect heap_deformtuple(), this time implemented as a singly nested
loop over the fields instead of a loop around heap_getattr.  This is
considerably faster (O(N) instead of O(N^2)) when there are nulls or
varlena fields, since those prevent use of attcacheoff.  Replace loops
over heap_getattr with heap_deformtuple in situations where all or most
of the fields have to be fetched, such as printtup and tuptoaster.
Profiling done more than a year ago shows that this should be a nice
win for situations involving many-column tables.
2004-06-04 20:35:21 +00:00
Tom Lane
2095206de1 Adjust btree index build to not use shared buffers, thereby avoiding the
locking conflict against concurrent CHECKPOINT that was discussed a few
weeks ago.  Also, if not using WAL archiving (which is always true ATM
but won't be if PITR makes it into this release), there's no need to
WAL-log the index build process; it's sufficient to force-fsync the
completed index before commit.  This seems to gain about a factor of 2
in my tests, which is consistent with writing half as much data.  I did
not try it with WAL on a separate drive though --- probably the gain would
be a lot less in that scenario.
2004-06-02 17:28:18 +00:00
Tom Lane
9b178555fc Per previous discussions, get rid of use of sync(2) in favor of
explicitly fsync'ing every (non-temp) file we have written since the
last checkpoint.  In the vast majority of cases, the burden of the
fsyncs should fall on the bgwriter process not on backends.  (To this
end, we assume that an fsync issued by the bgwriter will force out
blocks written to the same file by other processes using other file
descriptors.  Anyone have a problem with that?)  This makes the world
safe for WIN32, which ain't even got sync(2), and really makes the world
safe for Unixen as well, because sync(2) never had the semantics we need:
it offers no way to wait for the requested I/O to finish.

Along the way, fix a bug I recently introduced in xlog recovery:
file truncation replay failed to clear bufmgr buffers for the dropped
blocks, which could result in 'PANIC:  heap_delete_redo: no block'
later on in xlog replay.
2004-05-31 03:48:10 +00:00
Tom Lane
076a055acf Separate out bgwriter code into a logically separate module, rather
than being random pieces of other files.  Give bgwriter responsibility
for all checkpoint activity (other than a post-recovery checkpoint);
so this child process absorbs the functionality of the former transient
checkpoint and shutdown subprocesses.  While at it, create an actual
include file for postmaster.c, which for some reason never had its own
file before.
2004-05-29 22:48:23 +00:00
Tom Lane
1a321f26d8 Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs by
about a third, make it work on non-Windows platforms again.  (But perhaps
I broke the WIN32 code, since I have no way to test that.)  Fold all the
paths that fork postmaster child processes to go through the single
routine SubPostmasterMain, which takes care of resurrecting the state that
would normally be inherited from the postmaster (including GUC variables).
Clean up some places where there's no particularly good reason for the
EXEC and non-EXEC cases to work differently.  Take care of one or two
FIXMEs that remained in the code.
2004-05-28 05:13:32 +00:00
Tom Lane
16974ee910 Get rid of the former rather baroque mechanism for propagating the values
of ThisStartUpID and RedoRecPtr into new backends.  It's a lot easier just
to make them all grab the values out of shared memory during startup.
This helps to decouple the postmaster from checkpoint execution, which I
need since I'm intending to let the bgwriter do it instead, and it also
fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at
all AFAICS.  (Doesn't give me a lot of faith in the amount of testing that
port has gotten.)
2004-05-27 17:12:57 +00:00
Tom Lane
4d86ae4260 For multi-table ANALYZE, use per-table transactions when possible
(ie, when not inside a transaction block), so that we can avoid holding
locks longer than necessary.  Per trouble report from Philip Warner.
2004-05-22 23:14:38 +00:00
Tom Lane
4af3421161 Get rid of rd_nblocks field in relcache entries. Turns out this was
costing us lots more to maintain than it was worth.  On shared tables
it was of exactly zero benefit because we couldn't trust it to be
up to date.  On temp tables it sometimes saved an lseek, but not often
enough to be worth getting excited about.  And the real problem was that
we forced an lseek on every relcache flush in order to update the field.
So all in all it seems best to lose the complexity.
2004-05-08 19:09:25 +00:00
Tom Lane
37fa3b6c89 Tweak indexscan and seqscan code to arrange that steps from one page to
the next are handled by ReleaseAndReadBuffer rather than separate
ReleaseBuffer and ReadBuffer calls.  This cuts the number of acquisitions
of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens
to pull successive rows from the same heap page).  Unfortunately this
doesn't seem enough to get us out of the recently discussed context-switch
storm problem, but it's surely worth doing anyway.
2004-04-21 18:24:26 +00:00
Bruce Momjian
823ac7c2b4 This is a cleanup patch for access/transam/xact.c. It only removes some
#ifdef NOT_USED code, and adds a new TBLOCK state which signals the fact
that StartTransaction() has been executed.

Alvaro Herrera
2004-04-05 03:11:39 +00:00
Tom Lane
375369acd1 Replace TupleTableSlot convention for whole-row variables and function
results with tuples as ordinary varlena Datums.  This commit does not
in itself do much for us, except eliminate the horrid memory leak
associated with evaluation of whole-row variables.  However, it lays the
groundwork for allowing composite types as table columns, and perhaps
some other useful features as well.  Per my proposal of a few days ago.
2004-04-01 21:28:47 +00:00
Teodor Sigaev
f2c064afcb Cleanup vectors of GISTENTRY and eliminate problem with 64-bit strict-aligned
boxes. Change interface to user-defined GiST support methods union and
picksplit. Now instead of bytea struct it used special GistEntryVector
structure.
2004-03-30 15:45:33 +00:00
Tatsuo Ishii
0b86ade1c2 Add NOWAIT option to LOCK command 2004-03-11 01:47:41 +00:00
Tom Lane
c3c09be34b Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, to
wit: Add a header record to each WAL segment file so that it can be reliably
identified.  Avoid splitting WAL records across segment files (this is not
strictly necessary, but makes it simpler to incorporate the header records).
Make WAL entries for file creation, deletion, and truncation (as foreseen but
never implemented by Vadim).  Also, add support for making XLOG_SEG_SIZE
configurable at compile time, similarly to BLCKSZ.  Fix a couple bugs I
introduced in WAL replay during recent smgr API changes.  initdb is forced
due to changes in pg_control contents.
2004-02-11 22:55:26 +00:00
Tom Lane
391c3811a2 Rename SortMem and VacuumMem to work_mem and maintenance_work_mem.
Make btree index creation and initial validation of foreign-key constraints
use maintenance_work_mem rather than work_mem as their memory limit.
Add some code to guc.c to allow these variables to be referenced by their
old names in SHOW and SET commands, for backwards compatibility.
2004-02-03 17:34:04 +00:00
Bruce Momjian
f4921e5ca3 Attached is a patch that fixes some trivial typos and alignment. Please
apply.

Alvaro Herrera
2004-01-26 22:51:56 +00:00
Tom Lane
9bd681a522 Repair problem identified by Olivier Prenant: ALTER DATABASE SET search_path
should not be too eager to reject paths involving unknown schemas, since
it can't really tell whether the schemas exist in the target database.
(Also, when reading pg_dumpall output, it could be that the schemas
don't exist yet, but eventually will.)  ALTER USER SET has a similar issue.
So, reduce the normal ERROR to a NOTICE when checking search_path values
for these commands.  Supporting this requires changing the API for GUC
assign_hook functions, which causes the patch to touch a lot of places,
but the changes are conceptually trivial.
2004-01-19 19:04:40 +00:00
Tom Lane
0966516b75 Tighten short-circuit tests for deciding whether we need to invoke
tuptoaster.c --- fields that are compressed in-line are not a reason
to invoke the toaster.  Along the way, add a couple more htup.h macros
to eliminate confusing negated tests, and get rid of the already
vestigial TUPLE_TOASTER_ACTIVE symbol.
2004-01-16 20:51:30 +00:00
Neil Conway
bc028beb16 Make the 'wal_debug' GUC variable a boolean (rather than an integer), and
hide it behind #ifdef WAL_DEBUG blocks.
2004-01-06 17:26:23 +00:00
Tom Lane
569659ae16 Improve btree's initial-positioning-strategy code so that we never need
to step more than one entry after descending the search tree to arrive at
the correct place to start the scan.  This can improve the behavior
substantially when there are many entries equal to the chosen boundary
value.  Per suggestion from Dmitry Tkach, 14-Jul-03.
2003-12-21 01:23:06 +00:00
Bruce Momjian
d75b2ec4eb This patch is the next step towards (re)allowing fork/exec.
Claudio Natoli
2003-12-20 17:31:21 +00:00
Peter Eisentraut
2afacfc403 This patch properly sets the prototype for the on_shmem_exit and
on_proc_exit functions, and adjust all other related code to use
the proper types too.

by Kurt Roeckx
2003-12-12 18:45:10 +00:00
PostgreSQL Daemon
55b113257c make sure the $Id tags are converted to $PostgreSQL as well ... 2003-11-29 22:41:33 +00:00
Tom Lane
fa5c8a055a Cross-data-type comparisons are now indexable by btrees, pursuant to my
pghackers proposal of 8-Nov.  All the existing cross-type comparison
operators (int2/int4/int8 and float4/float8) have appropriate support.
The original proposal of storing the right-hand-side datatype as part of
the primary key for pg_amop and pg_amproc got modified a bit in the event;
it is easier to store zero as the 'default' case and only store a nonzero
when the operator is actually cross-type.  Along the way, remove the
long-since-defunct bigbox_ops operator class.
2003-11-12 21:15:59 +00:00
Tom Lane
c1d62bfd00 Add operator strategy and comparison-value datatype fields to ScanKey.
Remove the 'strategy map' code, which was a large amount of mechanism
that no longer had any use except reverse-mapping from procedure OID to
strategy number.  Passing the strategy number to the index AM in the
first place is simpler and faster.
This is a preliminary step in planned support for cross-datatype index
operations.  I'm committing it now since the ScanKeyEntryInitialize()
API change touches quite a lot of files, and I want to commit those
changes before the tree drifts under me.
2003-11-09 21:30:38 +00:00
Peter Eisentraut
96889392e9 Implement isolation levels read uncommitted and repeatable read as acting
like the next higher one.
2003-11-06 22:08:15 +00:00
Tom Lane
90b2202975 Fix bad interaction between NOTIFY processing and V3 extended query
protocol, per report from Igor Shevchenko.  NOTIFY thought it could
do its thing if transaction blockState is TBLOCK_DEFAULT, but in
reality it had better check the low-level transaction state is
TRANS_DEFAULT as well.  Formerly it was not possible to wait for the
client in a state where the first is true and the second is not ...
but now we can have such a state.  Minor cleanup in StartTransaction()
as well.
2003-10-16 16:50:41 +00:00
Tom Lane
55d85f42a8 Repair RI trigger visibility problems (this time for sure ;-)) per recent
discussion on pgsql-hackers: in READ COMMITTED mode we just have to force
a QuerySnapshot update in the trigger, but in SERIALIZABLE mode we have
to run the scan under a current snapshot and then complain if any rows
would be updated/deleted that are not visible in the transaction snapshot.
2003-10-01 21:30:53 +00:00
Tom Lane
e33f205a94 Adjust btree index build procedure so that the btree metapage looks
invalid (has the wrong magic number) until the build is entirely
complete.  This turns out to cost no additional writes in the normal
case, since we were rewriting the metapage at the end of the process
anyway.  In normal scenarios there's no real gain in security, because
a failed index build would roll back the transaction leaving an unused
index file, but for rebuilding shared system indexes this seems to add
some useful protection.
2003-09-29 23:40:26 +00:00
Tom Lane
8934790052 Add a mechanism to let dynamically loaded modules register post-commit/
post-abort cleanup hooks.  I'm surprised that we have not needed this
already, but I need it now to fix a plpgsql problem, and the usefulness
for other dynamically loaded modules seems obvious.
2003-09-28 23:26:20 +00:00
Tom Lane
c63a5452d8 Get rid of ReferentialIntegritySnapshotOverride by extending Executor API
to allow es_snapshot to be set to SnapshotNow rather than a query snapshot.
This solves a bug reported by Wade Klaver, wherein triggers fired as a
result of RI cascade updates could misbehave.
2003-09-25 18:58:36 +00:00
Tom Lane
db18703b5a Fix LISTEN/NOTIFY race condition reported by Gavin Sherry. While a
really general fix might be difficult, I believe the only case where
AtCommit_Notify could see an uncommitted tuple is where the other guy
has just unlistened and not yet committed.  The best solution seems to
be to just skip updating that tuple, on the assumption that the other
guy does not want to hear about the notification anyway.  This is not
perfect --- if the other guy rolls back his unlisten instead of committing,
then he really should have gotten this notify.  But to do that, we'd have
to wait to see if he commits or not, or make UNLISTEN hold exclusive lock
on pg_listener until commit.  Either of these answers is deadlock-prone,
not to mention horrible for interactive performance.  Do it this way
for now.  (What happened to that project to do LISTEN/NOTIFY in memory
with no table, anyway?)
2003-09-15 23:33:43 +00:00
Tom Lane
7a3693716d Reimplement hash index locking algorithms, per my recent proposal to
pghackers.  This fixes the problem recently reported by Markus KrÌutner
(hash bucket split corrupts the state of scans being done concurrently),
and I believe it also fixes all the known problems with deadlocks in
hash index operations.  Hash indexes are still not really ready for prime
time (since they aren't WAL-logged), but this is a step forward.
2003-09-04 22:06:27 +00:00
Tom Lane
d70610c4ee Several fixes for hash indexes that involve changing the on-disk index
layout; therefore, this change forces REINDEX of hash indexes (though
not a full initdb).  Widen hashm_ntuples to double so that hash space
management doesn't get confused by more than 4G entries; enlarge the
allowed number of free-space-bitmap pages; replace the useless bshift
field with a useful bmshift field; eliminate 4 bytes of wasted space
in the per-page special area.
2003-09-02 18:13:32 +00:00
Tom Lane
39673ca47b Rewrite hashbulkdelete() to make it amenable to new bucket locking
scheme.  A pleasant side effect is that it is *much* faster when deleting
a large fraction of the indexed tuples, because of elimination of
redundant hash_step activity induced by hash_adjscans.  Various other
continuing code cleanup.
2003-09-02 02:18:38 +00:00
Tom Lane
65c2d427fb Preliminary cleanup for hash index code (doesn't attack the locking problem
yet).  Fix a couple of bugs that would only appear if multiple bitmap pages
are used, including a buffer reference leak and incorrect computation of bit
indexes.  Get rid of 'overflow address' concept, which accomplished nothing
except obfuscating the code and creating a risk of failure due to limited
range of offset field.  Rename some misleadingly-named fields and routines,
and improve documentation.
2003-09-01 20:26:34 +00:00
Tom Lane
302f1a86dc Rewriter and planner should use only resno, not resname, to identify
target columns in INSERT and UPDATE targetlists.  Don't rely on resname
to be accurate in ruleutils, either.  This fixes bug reported by
Donald Fraser, in which renaming a column referenced in a rule did not
work very well.
2003-08-11 23:04:50 +00:00
Bruce Momjian
46785776c4 Another pgindent run with updated typedefs. 2003-08-08 21:42:59 +00:00
Tom Lane
2f9c859ea1 Fix some copyright notices that weren't updated. Improve copyright tool
so it won't miss 'em again.
2003-08-04 23:59:41 +00:00
Bruce Momjian
f3c3deb7d0 Update copyrights to 2003. 2003-08-04 02:40:20 +00:00
Bruce Momjian
089003fb46 pgindent run. 2003-08-04 00:43:34 +00:00
Tom Lane
e8db9b26d0 elog mop-up. 2003-07-27 17:10:07 +00:00
Tom Lane
bff0422b6c Revise hash join and hash aggregation code to use the same datatype-
specific hash functions used by hash indexes, rather than the old
not-datatype-aware ComputeHashFunc routine.  This makes it safe to do
hash joining on several datatypes that previously couldn't use hashing.
The sets of datatypes that are hash indexable and hash joinable are now
exactly the same, whereas before each had some that weren't in the other.
2003-06-22 22:04:55 +00:00
Bruce Momjian
0abe7431c6 This patch extracts page buffer pooling and the simple
least-recently-used strategy from clog.c into slru.c.  It doesn't
change any visible behaviour and passes all regression tests plus a
TruncateCLOG test done manually.

Apart from refactoring I made a little change to SlruRecentlyUsed,
formerly ClogRecentlyUsed:  It now skips incrementing lru_counts, if
slotno is already the LRU slot, thus saving a few CPU cycles.  To make
this work, lru_counts are initialised to 1 in SimpleLruInit.

SimpleLru will be used by pg_subtrans (part of the nested transactions
project), so the main purpose of this patch is to avoid future code
duplication.

Manfred Koizar
2003-06-11 22:37:46 +00:00
Tom Lane
f85f43dfb5 Backend support for autocommit removed, per recent discussions. The
only remnant of this failed experiment is that the server will take
SET AUTOCOMMIT TO ON.  Still TODO: provide some client-side autocommit
logic in libpq.
2003-05-14 03:26:03 +00:00
Tom Lane
30f609484d Add binary I/O routines for a bunch more datatypes. Still a few to go,
but that was enough tedium for one day.  Along the way, move the few
support routines for types xid and cid into a more logical place.
2003-05-12 23:08:52 +00:00
Tom Lane
c0a8c3ac13 Update 3.0 protocol support to match recent agreements about how to
handle multiple 'formats' for data I/O.  Restructure CommandDest and
DestReceiver stuff one more time (it's finally starting to look a bit
clean though).  Code now matches latest 3.0 protocol document as far
as message formats go --- but there is no support for binary I/O yet.
2003-05-08 18:16:37 +00:00
Tom Lane
79913910d4 Restructure command destination handling so that we pass around
DestReceiver pointers instead of just CommandDest values.  The DestReceiver
is made at the point where the destination is selected, rather than
deep inside the executor.  This cleans up the original kluge implementation
of tstoreReceiver.c, and makes it easy to support retrieving results
from utility statements inside portals.  Thus, you can now do fun things
like Bind and Execute a FETCH or EXPLAIN command, and it'll all work
as expected (e.g., you can Describe the portal, or use Execute's count
parameter to suspend the output partway through).  Implementation involves
stuffing the utility command's output into a Tuplestore, which would be
kind of annoying for huge output sets, but should be quite acceptable
for typical uses of utility commands.
2003-05-06 20:26:28 +00:00
Tom Lane
2cf57c8f8d Implement feature of new FE/BE protocol whereby RowDescription identifies
the column by table OID and column number, if it's a simple column
reference.  Along the way, get rid of reskey/reskeyop fields in Resdoms.
Turns out that representation was not convenient for either the planner
or the executor; we can make the planner deliver exactly what the
executor wants with no more effort.
initdb forced due to change in stored rule representation.
2003-05-06 00:20:33 +00:00