and standard_conforming_strings. The encoding changes are needed for proper
escaping in multibyte encodings, as per the SQL-injection vulnerabilities
noted in CVE-2006-2313 and CVE-2006-2314. Concurrent fixes are being applied
to the server to ensure that it rejects queries that may have been corrupted
by attempted SQL injection, but this merely guarantees that unpatched clients
will fail rather than allow injection. An actual fix requires changing the
client-side code. While at it we have also fixed these routines to understand
about standard_conforming_strings, so that the upcoming changeover to SQL-spec
string syntax can be somewhat transparent to client code.
Since the existing API of PQescapeString and PQescapeBytea provides no way to
inform them which settings are in use, these functions are now deprecated in
favor of new functions PQescapeStringConn and PQescapeByteaConn. The new
functions take the PGconn to which the string will be sent as an additional
parameter, and look inside the connection structure to determine what to do.
So as to provide some functionality for clients using the old functions,
libpq stores the latest encoding and standard_conforming_strings values
received from the backend in static variables, and the old functions consult
these variables. This will work reliably in clients using only one Postgres
connection at a time, or even multiple connections if they all use the same
encoding and string syntax settings; which should cover many practical
scenarios.
Clients that use homebrew escaping methods, such as PHP's addslashes()
function or even hardwired regexp substitution, will require extra effort
to fix :-(. It is strongly recommended that such code be replaced by use of
PQescapeStringConn/PQescapeByteaConn if at all feasible.
parser will allow "\'" to be used to represent a literal quote mark. The
"\'" representation has been deprecated for some time in favor of the
SQL-standard representation "''" (two single quote marks), but it has been
used often enough that just disallowing it immediately won't do. Hence
backslash_quote allows the settings "on", "off", and "safe_encoding",
the last meaning to allow "\'" only if client_encoding is a valid server
encoding. That is now the default, and the reason is that in encodings
such as SJIS that allow 0x5c (ASCII backslash) to be the last byte of a
multibyte character, accepting "\'" allows SQL-injection attacks as per
CVE-2006-2314 (further details will be published after release). The
"on" setting is available for backward compatibility, but it must not be
used with clients that are exposed to untrusted input.
Thanks to Akio Ishida and Yasuo Ohgaki for identifying this security issue.
characters in all cases. Formerly we mostly just threw warnings for invalid
input, and failed to detect it at all if no encoding conversion was required.
The tighter check is needed to defend against SQL-injection attacks as per
CVE-2006-2313 (further details will be published after release). Embedded
zero (null) bytes will be rejected as well. The checks are applied during
input to the backend (receipt from client or COPY IN), so it no longer seems
necessary to check in textin() and related routines; any string arriving at
those functions will already have been validated. Conversion failure
reporting (for characters with no equivalent in the destination encoding)
has been cleaned up and made consistent while at it.
Also, fix a few longstanding errors in little-used encoding conversion
routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
mic_to_euc_tw were all broken to varying extents.
Patches by Tatsuo Ishii and Tom Lane. Thanks to Akio Ishida and Yasuo Ohgaki
for identifying the security issues.
deciding whether a potential additional indexscan is redundant or not. As now
coded, any use of a partial index that was already used in a previous AND arm
will be rejected as redundant. This might be overly restrictive, but not
considering the point at all is definitely bad, as per example in bug #2441
from Arjen van der Meijden. In particular, a clauseless scan of a partial
index was *never* considered redundant by the previous coding, and that's
surely wrong. Being more flexible would also require some consideration
of how not to double-count the index predicate's selectivity.
the partial index predicate in the scan's "recheck condition". Otherwise,
if the scan becomes lossy for lack of bitmap memory, we would fail to enforce
that returned rows satisfy the predicate. Noted while studying bug #2441
from Arjen van der Meijden.
initPlan sets a parameter for another. This could not (I think) happen before
8.1, but it's possible now because the initPlans generated by MIN/MAX
optimization might themselves use initPlans. We attach those initPlans as
siblings of the MIN/MAX ones, not children, to avoid duplicate computation
when multiple MIN/MAX aggregates are present; so this leads to the case of an
initPlan needing the result of a sibling initPlan, which is not possible with
ordinary query nesting. Hadn't been noticed because in most contexts having
too much stuff listed in extParam is fairly harmless. Fixes "plan should not
reference subplan's variable" bug reported by Catalin Pitis.
the union of its child relations as well. This might have been a good idea
when it was originally coded, but it's a fatally bad idea when inheritance is
being used for partitioning. It's better to have no stats at all than
completely misleading stats. Per report from Mark Liberman.
The bug arguably exists all the way back, but I've only patched HEAD and 8.1
because we weren't particularly trying to support partitioning before 8.1.
Eventually we ought to look at deriving union statistics instead of just
punting, but for now the drop kick looks good.
MIN/MAX not be converted to use an index if the query WHERE clause contains
any volatile functions or subplans.
I had originally feared that the conversion might alter the behavior of such a
query with respect to a volatile function. Well, so it might, but only in the
sense that the function would get evaluated at a subset of the table rows
rather than all of them --- and we have never made any such guarantee anyway.
(For instance, we don't refuse to use an index for an ordinary non-aggregate
query when one of the non-indexable filter conditions contains a volatile
function.)
The prohibition against subplans was because of worry that that case wasn't
adequately tested, which it wasn't, but it turns out to be possible to make
8.1 fail anyway:
regression=# select o.ten, (select max(unique2) from tenk1 i where ten = o.ten
or ten = (select f1 from int4_tbl limit 1)) from tenk1 o;
ERROR: direct correlated subquery unsupported as initplan
This is due to bogus code in SS_make_initplan_from_plan (it's an initplan,
ergo it can't have any parParams). Having fixed that, we might as well allow
subplans as well as initplans.
set to the large object context ("fscxt"), as this is inevitably a source of
transaction-duration memory leaks. Not sure why we'd not noticed it before;
maybe people weren't touching a whole lot of LOs in the same transaction
before the 8.1 pg_dump changes. Per report from Wayne Conrad.
Backpatched as far as 8.1, but the problem doubtless goes all the way back.
I'm disinclined to spend the time to try to verify that the older branches
would still work if patched, seeing that this code was significantly modified
for 8.0 and again for 8.1, and that we don't have any trouble reports before
8.1. (Maybe the leaks were smaller before?)
implied by the predicate of a partial index being used to scan a table.
However, this optimization is unsafe in an UPDATE, DELETE, or SELECT FOR
UPDATE query, because the quals need to be rechecked by EvalPlanQual if
there's an update conflict. Per example from Jean-Samuel Reynaud.
8.0), and add as suggestion to use log_min_error_statement for this
purpose. I also fixed the code so the first EXECUTE has it's prepare,
rather than the last which is what was in the current code. Also remove
"protocol" prefix for SQL EXECUTE output because it is not accurate.
Backpatch to 8.1.X.
alternatives ("|" symbol). The original coding allowed the added ^ and $
constraints to be absorbed into the first and last alternatives, producing
a pattern that would match more than it should. Per report from Eric Noriega.
I also changed the pattern to add an ARE director ("***:"), ensuring that
SIMILAR TO patterns do not change behavior if regex_flavor is changed. This
is necessary to make the non-capturing parentheses work, and seems like a
good idea on general principles.
Back-patched as far as 7.4. 7.3 also has the bug, but a fix seems impractical
because that version's regex engine doesn't have non-capturing parens.
had a bad side-effect: it stopped finding plans that involved BitmapAnd
combinations of indexscans using both join and non-join conditions. Instead,
make choose_bitmap_and more aggressive about detecting redundancies between
BitmapOr subplans.
at least one join condition as an indexqual. Before bitmap indexscans, this
oversight didn't really cost much except for redundantly considering the
same join paths twice; but as of 8.1 it could result in silly bitmap scans
that would do the same BitmapOr twice and then BitmapAnd these together :-(
identically named user and group: we merge these into a single entity
with LOGIN permission. Also, add ORDER BY commands to ensure consistent
dump ordering, for ease of comparing outputs from different installations.
output, ie, no OR immediately below an OR. Otherwise we get Asserts or
wrong answers for cases such as
select * from tenk1 a, tenk1 b
where (a.ten = b.ten and (a.unique1 = 100 or a.unique1 = 101))
or (a.hundred = b.hundred and a.unique1 = 42);
Per report from Rafael Martinez Guerrero.
startup or recovery process. Since such a process isn't a real backend,
pgstat.c gets confused. This accounts for recent reports of strange
"invalid server process ID -1" log messages during crash recovery.
There isn't any point in attempting to make the report, since we'll discard
stats in such scenarios anyhow.
have symlinks (ie, Windows). Although it'll never be called on to do anything
useful during normal operation on such a platform, it's still needed to
re-create dropped directories during WAL replay.
failures even when the hardware and OS did nothing wrong. Per recent analysis
of a problem report from Alex Bahdushka.
For the moment I've just diked out the test of the parameter, rather than
removing the GUC infrastructure and documentation, in case we conclude that
there's something salvageable there. There seems no chance of it being
resurrected in the 8.1 branch though.
passed extend = true whenever we are reading a page we intend to reinitialize
completely, even if we think the page "should exist". This is because it
might indeed not exist, if the relation got truncated sometime after the
current xlog record was made and before the crash we're trying to recover
from. These two thinkos appear to explain both of the old bug reports
discussed here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php
tuples as needed "to keep VACUUM from complaining", but actually there is
a more compelling reason to do it: failure to do so violates MVCC semantics.
This is because a pre-existing serializable transaction might try to use
the index after we finish (re)building it, and it might fail to find tuples
it should be able to see. We got this mostly right, but not in the case
of partial indexes: the code mistakenly discarded recently-dead tuples for
partial indexes. Fix that, and adjust the comments.
command or expression, rather than one copy for each textual occurrence as
it did before. This might result in some small performance improvement,
but the compelling reason to do it is that not doing so can result in
unexpected grouping failures because the main SQL parser won't see different
parameter numbers as equivalent. Add a regression test for the failure case.
Per report from Robert Davidson.
the logic it contained to switch to insertion sort for near-sorted input was
in fact a big loss, because it could fairly easily be fooled into applying
insertion sort to large subfiles that weren't all that well ordered. Remove
that, and instead add a simple check for already-perfectly-sorted input, as
per suggestion from Dann Corbit. This adds at worst O(N*lgN) overhead, and
usually far less, while sometimes allowing a subfile sort to finish in O(N)
time. Preliminary testing says this is an improvement over the basic
Bentley & McIlroy code for many nonrandom inputs, and it costs almost
nothing when the input is random.
byte-swapping on the port number which causes the call to fail on Intel
Macs.
This patch uses htons() instead of htonl() and fixes this bug.
Ashley Clark
2005-05-13. When we find that a new inner tuple can't possibly match any
outer tuple (because it contains a NULL), we can't immediately skip the
tuple when we are in NEXTINNER state. Doing so can lead to emitting
multiple copies of the tuple in FillInner mode, because we may rescan the
tuple after returning to a previous marked tuple. Instead, proceed to
NEXTOUTER state the same as we used to do. After we've found that there's
no need to return to the marked position, we can go to SKIPINNER_ADVANCE
state instead of SKIP_TEST when the inner tuple is unmatchable; this
preserves the performance improvement. Per bug report from Bruce.
I also made a couple of cosmetic code rearrangements and added a regression
test for the problem.
> I've now tested this patch at home w/ 8.2HEAD and it seems to fix the
> bug. I plan on testing it under 8.1.2 at work tommorow with
> mod_auth_krb5, etc, and expect it'll work there. Assuming all goes
> well and unless someone objects I'll forward the patch to -patches.
> It'd be great to have this fixed as it'll allow us to use Kerberos to
> authenticate to phppgadmin and other web-based tools which use
> Postgres.
While playing with this patch under 8.1.2 at home I discovered a
mistake in how I manually applied one of the hunks to fe-auth.c.
Basically, the base code had changed and so the patch needed to be
modified slightly. This is because the code no longer either has a
freeable pointer under 'name' or has 'name' as NULL.
The attached patch correctly frees the string from pg_krb5_authname
(where it had been strdup'd) if and only if pg_krb5_authname returned
a string (as opposed to falling through and having name be set using
name = pw->name;). Also added a comment to this effect.
Backpatch to 8.1.X.
Stephen Frost
/dev/tty, but it isn't a device file and doesn't work as expected.
This fixes a known bug where psql does not prompt for a password on some
Win32 systems.
Backpatch to 8.1.X.
Robert Kinberg
---------------------------------------------------------------------------
> True, but they're not being used where you'd expect. This seems to be
> something to do with the fact that it's not pg_authid which is being
> accessed, but rather the view pg_roles.
I looked into this and it seems the problem is that the view doesn't
get flattened into the main query because of the has_nullable_targetlist
limitation in prepjointree.c. That's triggered because pg_roles has
'********'::text AS rolpassword
which isn't nullable, meaning it would produce wrong behavior if
referenced above the outer join.
Ultimately, the reason this is a problem is that the planner deals only
in simple Vars while processing joins; it doesn't want to think about
expressions. I'm starting to think that it may be time to fix this,
because I've run into several related restrictions lately, but it seems
like a nontrivial project.
In the meantime, reducing the LEFT JOIN to pg_roles to a JOIN as per
Peter's suggestion seems like the best short-term workaround.
then modified within the same transaction. The code was using a linked list
of active PLpgSQL_expr structs, which was OK when it was written because
plpgsql never released any parse data structures for the life of the backend.
But since Neil fixed plpgsql's memory management, elements of the linked list
could be freed, leading to crash when the list is chased. Per report and test
case from Kris Jurka.
by decompiling the typdefaultbin expression, not just printing the typdefault
text which may be out-of-date or assume the wrong schema search path. (It's
the same hazard as for adbin vs adsrc in column defaults.) The catalogs.sgml
spec for pg_type implies that the correct procedure is to look to
typdefaultbin first and consider typdefault only if typdefaultbin is NULL.
I made dumping of both domains and base types do that, even though in the
current backend code typdefaultbin is always correct for domains and
typdefault for base types --- might as well try to future-proof it a little.
Per bug report from Alexander Galler.
in leaking memory when invoking a PL/Python procedure that raises an
exception. Unfortunately this still leaks memory, but at least the
largest leak has been plugged.
This patch also fixes a reference counting mistake in PLy_modify_tuple()
for 8.0, 8.1 and HEAD: we don't actually own a reference to `platt', so
we shouldn't Py_DECREF() it.
we are not holding a buffer content lock; where it was, InterruptHoldoffCount
is positive and so we'd not respond to cancel signals as intended. Also
add missing vacuum_delay_point() call in btvacuumcleanup. This should fix
complaint from Evgeny Gridasov about failure to respond to SIGINT/SIGTERM
in a timely fashion (bug #2257).
Var referencing the subselect output. While this case could possibly be made
to work, it seems not worth expending effort on. Per report from Magnus
Naeslund(f).
id (CVE-2006-0553). Also fix related bug in SET SESSION AUTHORIZATION that
allows unprivileged users to crash the server, if it has been compiled with
Asserts enabled. The escalation-of-privilege risk exists only in 8.1.0-8.1.2.
However, the Assert-crash risk exists in all releases back to 7.3.
Thanks to Akio Ishida for reporting this problem.
regardless of the current schema search path. Since CREATE OPERATOR CLASS
only allows one default opclass per datatype regardless of schemas, this
should have minimal impact, and it fixes problems with failure to find a
desired opclass while restoring dump files. Per discussion at
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00284.php.
Remove now-redundant-or-unused code in typcache.c and namespace.c,
and backpatch as far as 8.0.
it later. This fixes a problem where EXEC_BACKEND didn't have progname
set, causing a segfault if log_min_messages was set below debug2 and our
own snprintf.c was being used.
Also alway strdup() progname.
Backpatch to 8.1.X and 8.0.X.
to avoid sharing substructure with the lower-level indexquals. This is
currently only an issue if there are SubPlans in the indexquals, which is
uncommon but not impossible --- see bug #2218 reported by Nicholas Vinen.
We use the same kluge for indexqual vs indexqualorig in the index scans
themselves ... would be nice to clean this up someday.
requested sort order. It was assuming that build_index_pathkeys always
generates a pathkey per index column, which was not true if implied equality
deduction had determined that two index columns were effectively equated to
each other. Simplest fix seems to be to install an option that causes
build_index_pathkeys to support this behavior as well as the original one.
Per report from Brian Hirt.
memory in the executor's per-query memory context. It also inefficient:
it invokes get_call_result_type() and TupleDescGetAttInMetadata() for
every call to return_next, rather than invoking them once (per PL/Perl
function call) and memoizing the result.
This patch makes the following changes:
- refactor the code to include all the "per PL/Perl function call" data
inside a single struct, "current_call_data". This means we don't need to
save and restore N pointers for every recursive call into PL/Perl, we
can just save and restore one.
- lookup the return type metadata needed by plperl_return_next() once,
and then stash it in "current_call_data", so as to avoid doing the
lookup for every call to return_next.
- create a temporary memory context in which to evaluate the return
type's input functions. This memory context is reset for each call to
return_next.
The patch appears to fix the memory leak, and substantially reduces
the overhead imposed by return_next.
While we normally prefer the notation "foo.*" for a whole-row Var, that does
not work at SELECT top level, because in that context the parser will assume
that what is wanted is to expand the "*" into a list of separate target
columns, yielding behavior different from a whole-row Var. We have to emit
just "foo" instead in that context. Per report from Sokolov Yura.
because pqSendSome will absorb input data anytime it'd be forced to block.
Avoiding a kernel call per PQputCopyData call helps COPY speed materially.
Alon Goldshuv
to try to create a log segment file concurrently, but the code erroneously
specified O_EXCL to open(), resulting in a needless failure. Before 7.4,
it was even a PANIC condition :-(. Correct code is actually simpler than
what we had, because we can just say O_CREAT to start with and not need a
second open() call. I believe this accounts for several recent reports of
hard-to-reproduce "could not create file ...: File exists" errors in both
pg_clog and pg_subtrans.
temp table not only our own process' tables. It's not real important
since vacuum.c will skip temp tables anyway, but might as well make the
code do what it claims to do.
index's support-function cache (in index_getprocinfo). Since none of that
data can change for an index that's in active use, it seems sufficient to
treat all open indexes the same way we were treating "nailed" system indexes
--- that is, just re-read the pg_class row and leave the rest of the relcache
entry strictly alone. The pg_class re-read might not be strictly necessary
either, but since the reltablespace and relfilenode can change in normal
operation it seems safest to do it. (We don't support changing any of the
other info about an index at all, at the moment.)
Back-patch as far as 8.0. It might be possible to adapt the patch to 7.4,
but it would take more work than I care to expend for such a low-probability
problem. 7.3 is out of luck for sure.
occurs when it tries to heap_open pg_tablespace. When control returns to
smgrcreate, that routine will be holding a dangling pointer to a closed
SMgrRelation, resulting in mayhem. This is of course a consequence of
the violation of proper module layering inherent in having smgr.c call
a tablespace command routine, but the simplest fix seems to be to change
the locking mechanism. There's no real need for TablespaceCreateDbspace
to touch pg_tablespace at all --- it's only opening it as a way of locking
against a parallel DROP TABLESPACE command. A much better answer is to
create a special-purpose LWLock to interlock these two operations.
This drops TablespaceCreateDbspace quite a few layers down the food chain
and makes it something reasonably safe for smgr to call.
This is utterly insignificant in normal operation, but it becomes a
problem during cache inval stress testing. The original coding in fact
had no leak --- the 8.0 List rewrite created the issue. I wonder whether
list_concat should pfree the discarded header?
files: avoid creating stats hashtable entries for tables that aren't being
touched except by vacuum/analyze, ensure that entries for dropped tables are
removed promptly, and tweak the data layout to avoid storing useless struct
padding. Also improve the performance of pgstat_vacuum_tabstat(), and make
sure that autovacuum invokes it exactly once per autovac cycle rather than
multiple times or not at all. This should cure recent complaints about 8.1
showing much higher stats I/O volume than was seen in 8.0. It'd still be a
good idea to revisit the design with an eye to not re-writing the entire
stats dataset every half second ... but that would be too much to backpatch,
I fear.