This patch converts the input functions for bool, int2, int4, int8,
float4, float8, numeric, and contrib/cube to the new soft-error style.
array_in and record_in are also converted. There's lots more to do,
but this is enough to provide proof-of-concept that the soft-error
API is usable, as well as reference examples for how to convert
input functions.
This patch is mostly by me, but it owes very substantial debt to
earlier work by Nikita Glukhov, Andrew Dunstan, and Amul Sul.
Thanks to Andres Freund for review.
Discussion: https://postgr.es/m/3bbbb0df-7382-bf87-9737-340ba096e034@postgrespro.ru
This makes the choice of result scale of numeric power() for integer
exponents consistent with the choice for non-integer exponents, and
with the result scale of other numeric functions. Specifically, the
result scale will be at least as large as the scale of either input,
and sufficient to ensure that the result has at least 16 significant
digits.
Formerly, the result scale was based only on the scale of the first
input, without taking into account the weight of the result. For
results with negative weight, that could lead to results with very few
or even no non-zero significant digits (e.g., 10.0 ^ (-18) produced
0.0000000000000000).
Fix this by moving responsibility for the choice of result scale into
power_var_int(), which already has code to estimate the result weight.
Per report by Adrian Klaver and suggested fix by Tom Lane.
No back-patch -- arguably this is a bug fix, but one which is easy to
work around, so it doesn't seem worth the risk of changing query
results in stable branches.
Discussion: https://postgr.es/m/12a40226-70ac-3a3b-3d3a-fdaf9e32d312%40aklaver.com
This fixes a loss of precision that occurs when the first input is
very close to 1, so that its logarithm is very small.
Formerly, during the initial low-precision calculation to estimate the
result weight, the logarithm was computed to a local rscale that was
capped to NUMERIC_MAX_DISPLAY_SCALE (1000). However, the base may be
as close as 1e-16383 to 1, hence its logarithm may be as small as
1e-16383, and so the local rscale needs to be allowed to exceed 16383,
otherwise all precision is lost, leading to a poor choice of rscale
for the full-precision calculation.
Fix this by removing the cap on the local rscale during the initial
low-precision calculation, as we already do in the full-precision
calculation. This doesn't change the fact that the initial calculation
is a low-precision approximation, computing the logarithm to around 8
significant digits, which is very fast, especially when the base is
very close to 1.
Patch by me, reviewed by Alvaro Herrera.
Discussion: https://postgr.es/m/CAEZATCV-Ceu%2BHpRMf416yUe4KKFv%3DtdgXQAe5-7S9tD%3D5E-T1g%40mail.gmail.com
Formerly, the numeric code tested whether an integer value of a larger
type would fit in a smaller type by casting it to the smaller type and
then testing if the reverse conversion produced the original value.
That's perfectly fine, except that it caused a test failure on
buildfarm animal castoroides, most likely due to a compiler bug.
Instead, do these tests by comparing against PG_INT16/32_MIN/MAX. That
matches existing code in other places, such as int84(), which is more
widely tested, and so is less likely to go wrong.
While at it, add regression tests covering the numeric-to-int8/4/2
conversions, and adjust the recently added tests to the style of
434ddfb79a (on the v11 branch) to make failures easier to diagnose.
Per buildfarm via Tom Lane, reviewed by Tom Lane.
Discussion: https://postgr.es/m/2394813.1628179479%40sss.pgh.pa.us
This fixes a long-standing bug when using to_char() to format a
numeric value in scientific notation -- if the value's exponent is
less than -NUMERIC_MAX_DISPLAY_SCALE-1 (-1001), it produced a
division-by-zero error.
The reason for this error was that get_str_from_var_sci() divides its
input by 10^exp, which it produced using power_var_int(). However, the
underflow test in power_var_int() causes it to return zero if the
result scale is too small. That's not a problem for power_var_int()'s
only other caller, power_var(), since that limits the rscale to 1000,
but in get_str_from_var_sci() the exponent can be much smaller,
requiring a much larger rscale. Fix by introducing a new function to
compute 10^exp directly, with no rscale limit. This also allows 10^exp
to be computed more efficiently, without any numeric multiplication,
division or rounding.
Discussion: https://postgr.es/m/CAEZATCWhojfH4whaqgUKBe8D5jNHB8ytzemL-PnRx+KCTyMXmg@mail.gmail.com
This fixes a couple of related problems that arise when raising
numbers to very large powers.
Firstly, when raising a negative number to a very large integer power,
the result should be well-defined, but the previous code would only
cope if the exponent was small enough to go through power_var_int().
Otherwise it would throw an internal error, attempting to take the
logarithm of a negative number. Fix this by adding suitable handling
to the general case in power_var() to cope with negative bases,
checking for integer powers there.
Next, when raising a (positive or negative) number whose absolute
value is slightly less than 1 to a very large power, the result should
approach zero as the power is increased. However, in some cases, for
sufficiently large powers, this would lose all precision and return 1
instead of 0. This was due to the way that the local_rscale was being
calculated for the final full-precision calculation:
local_rscale = rscale + (int) val - ln_dweight + 8
The first two terms on the right hand side are meant to give the
number of significant digits required in the result ("val" being the
estimated result weight). However, this failed to account for the fact
that rscale is clipped to a maximum of NUMERIC_MAX_DISPLAY_SCALE
(1000), and the result weight might be less then -1000, causing their
sum to be negative, leading to a loss of precision. Fix this by
forcing the number of significant digits calculated to be nonnegative.
It's OK for it to be zero (when the result weight is less than -1000),
since the local_rscale value then includes a few extra digits to
ensure an accurate result.
Finally, add additional underflow checks to exp_var() and power_var(),
so that they consistently return zero for cases like this where the
result is indistinguishable from zero. Some paths through this code
already returned zero in such cases, but others were throwing overflow
errors.
Dean Rasheed, reviewed by Yugo Nagata.
Discussion: http://postgr.es/m/CAEZATCW6Dvq7+3wN3tt5jLj-FyOcUgT5xNoOqce5=6Su0bCR0w@mail.gmail.com
Formerly, when specifying NUMERIC(precision, scale), the scale had to
be in the range [0, precision], which was per SQL spec. This commit
extends the range of allowed scales to [-1000, 1000], independent of
the precision (whose valid range remains [1, 1000]).
A negative scale implies rounding before the decimal point. For
example, a column might be declared with a scale of -3 to round values
to the nearest thousand. Note that the display scale remains
non-negative, so in this case the display scale will be zero, and all
digits before the decimal point will be displayed.
A scale greater than the precision supports fractional values with
zeros immediately after the decimal point.
Take the opportunity to tidy up the code that packs, unpacks and
validates the contents of a typmod integer, encapsulating it in a
small set of new inline functions.
Bump the catversion because the allowed contents of atttypmod have
changed for numeric columns. This isn't a change that requires a
re-initdb, but negative scale values in the typmod would confuse old
backends.
Dean Rasheed, with additional improvements by Tom Lane. Reviewed by
Tom Lane.
Discussion: https://postgr.es/m/CAEZATCWdNLgpKihmURF8nfofP0RFtAKJ7ktY6GcZOPnMfUoRqA@mail.gmail.com
This fixes an overflow error when using the numeric * operator if the
result has more than 16383 digits after the decimal point by rounding
the result. Overflow errors should only occur if the result has too
many digits *before* the decimal point.
Discussion: https://postgr.es/m/CAEZATCUmeFWCrq2dNzZpRj5+6LfN85jYiDoqm+ucSXhb9U2TbA@mail.gmail.com
Formerly various numeric aggregate functions supported parallel
aggregation by having each worker convert partial aggregate values to
Numeric and use numeric_send() as part of serializing their state.
That's problematic, since the range of Numeric is smaller than that of
NumericVar, so it's possible for it to overflow (on either side of the
decimal point) in cases that would succeed in non-parallel mode.
Fix by serializing NumericVars instead, to avoid the overflow risk and
ensure that parallel and non-parallel modes work the same.
A side benefit is that this improves the efficiency of the
serialization/deserialization code, which can make a noticeable
difference to performance with large numbers of parallel workers.
No back-patch due to risk from changing the binary format of the
aggregate serialization states, as well as lack of prior field
complaints and low probability of such overflows in practice.
Patch by me. Thanks to David Rowley for review and performance
testing, and Ranier Vilela for an additional suggestion.
Discussion: https://postgr.es/m/CAEZATCUmeFWCrq2dNzZpRj5+6LfN85jYiDoqm+ucSXhb9U2TbA@mail.gmail.com
In power_var_int(), the computation of the number of significant
digits to use in the computation used log(Abs(exp)), which isn't safe
because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs()
instead of Abs(), so that the exponent is cast to a double before the
absolute value is taken.
Back-patch to 9.6, where this was introduced (by 7d9a4737c2).
Discussion: https://postgr.es/m/CAEZATCVd6pMkz=BrZEgBKyqqJrt2xghr=fNc8+Z=5xC6cgWrWA@mail.gmail.com
Many older tests where written in a style like
SELECT '' AS two, i.* FROM INT2_TBL
where the first column indicated the number of expected result rows.
This has gotten increasingly out of date, as the test data fixtures
have expanded, so a lot of these were wrong and misleading. Moreover,
this style isn't really necessary, since the psql output already shows
the number of result rows.
To clean this up, remove all those extra columns.
Discussion: https://www.postgresql.org/message-id/flat/1a25312b-2686-380d-3c67-7a69094a999f%40enterprisedb.com
Multiply before dividing, not the reverse, so that cases that should
produce exact results do produce exact results. (width_bucket_float8
got this right already.) Even when the result is inexact, this avoids
making it more inexact, since only the division step introduces any
imprecision.
While at it, fix compute_bucket() to not uselessly repeat the sign
check already done by its caller, and avoid duplicating the
multiply/divide steps by adjusting variable usage.
Per complaint from Martin Visser. Although this seems like a bug fix,
I'm hesitant to risk changing width_bucket()'s results in stable
branches, so no back-patch.
Discussion: https://postgr.es/m/6FA5117D-6AED-4656-8FEF-B74AC18FAD85@brytlyt.com
While the calculation is not well-defined if the bounds arguments are
infinite, there is a perfectly sane outcome if the test operand is
infinite: it's just like any other value that's before the first bucket
or after the last one. width_bucket_float8() got this right, but
I was too hasty about the case when adding infinities to numerics
(commit a57d312a7), so that width_bucket_numeric() just rejected it.
Fix that, and sync the relevant error message strings.
No back-patch needed, since infinities-in-numeric haven't shipped yet.
Discussion: https://postgr.es/m/2465409.1602170063@sss.pgh.pa.us
The "!" operator is our only built-in postfix operator. Remove it,
on the way to removal of grammar support for postfix operators.
There is also a "!!" prefix operator, but since it's been marked
deprecated for most of its existence, we might as well remove it too.
Also zap the SQL alias function numeric_fac(), which seems to have
equally little reason to live.
Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor
Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
Add infinities that behave the same as they do in the floating-point
data types. Aside from any intrinsic usefulness these may have,
this closes an important gap in our ability to convert floating
values to numeric and/or replace float-based APIs with numeric.
The new values are represented by bit patterns that were formerly
not used (although old code probably would take them for NaNs).
So there shouldn't be any pg_upgrade hazard.
Patch by me, reviewed by Dean Rasheed and Andrew Gierth
Discussion: https://postgr.es/m/606717.1591924582@sss.pgh.pa.us
Instead of using Newton's method to compute numeric square roots, use
the Karatsuba square root algorithm, which performs better for numbers
of all sizes. In practice, this is 3-5 times faster for inputs with
just a few digits and up to around 10 times faster for larger inputs.
Also, the new algorithm guarantees that the final digit of the result
is correctly rounded, since it computes an integer square root with
truncation, containing at least 1 extra decimal digit before rounding.
The former algorithm would occasionally round the wrong way because
it rounded both the intermediate and final results.
In addition, arrange for sqrt_var() to explicitly support negative
rscale values (rounding before the decimal point). This allows the
argument reduction phase of ln_var() to be optimised for large inputs,
since it only needs to compute square roots with a few more digits
than the final ln() result, rather than computing all the digits
before the decimal point. For very large inputs, this can be many
thousands of times faster.
In passing, optimise div_var_fast() in a couple of places where it was
doing unnecessary work.
Patch be me, reviewed by Tom Lane and Tels.
Discussion: https://postgr.es/m/CAEZATCV1A7+jD3P30Zu31KjaxeSEyOn3v9d6tYegpxcq3cQu-g@mail.gmail.com
In commit 6bdf1303b, we ensured that power()/^ for float8 would honor
the NaN behaviors specified by POSIX standards released in this century,
ie NaN ^ 0 = 1 and 1 ^ NaN = 1. However, numeric_power() was not
touched and continued to follow the once-common behavior that every
case involving NaN input produces NaN. For consistency, let's switch
the numeric behavior to the modern spec in the same release that ensures
that behavior for float8.
(Note that while 6bdf1303b was initially back-patched, we later undid
that, concluding that any behavioral change should appear only in v11.)
Discussion: https://postgr.es/m/10898.1526421338@sss.pgh.pa.us
This code evidently intended to treat backslash as an escape character
within double-quoted substrings, but it was sufficiently confused that
cases like ..."foo\\"... did not work right: the second backslash
managed to quote the double-quote after it, despite being quoted itself.
Rewrite to get that right, while preserving the existing behavior
outside double-quoted substrings, which is that backslash isn't special
except in the combination \".
Comparing to Oracle, it seems that their version of to_char() for
timestamps allows literal alphanumerics only within double quotes, while
non-alphanumerics are allowed outside quotes; backslashes aren't special
anywhere; there is no way at all to emit a literal double quote.
(Bizarrely, their to_char() for numbers is different; it doesn't allow
literal text at all AFAICT.) The fact that they don't treat backslash
as special justifies our existing behavior for backslash outside double
quotes. I considered making backslash inside double quotes act the same
way (ie, special only if before "), which in a green field would be a
more consistent behavior. But that would likely break more existing SQL
code than what this patch does.
Add some test cases illustrating this behavior. (Only the last new
case actually changes behavior in this commit.)
Little of this behavior was documented, either, so fix that.
Discussion: https://postgr.es/m/3626.1510949486@sss.pgh.pa.us
Non-data template patterns would consume characters whether or not those
characters were what the pattern expected, for example
SELECT TO_NUMBER('1234', '9,999');
produced 134 because the '2' got eaten by the comma pattern. This seems
undesirable, not least because it doesn't happen in Oracle. For the ','
and 'G' template patterns, we can fix this by consuming characters only
if they match what the pattern would output. For non-data patterns such
as 'L' and 'TH', it seems impractical to tighten things up to the point of
consuming only exact matches to what the pattern would output; but we can
improve matters quite a lot by redefining the behavior as "consume only
characters that aren't digits, signs, decimal point, or comma".
Also, fix it so that the behavior is to consume the number of *characters*
the pattern would output, not the number of *bytes*. The old coding would
do surprising things with non-ASCII currency symbols, for example. (It
would be good to apply that rule for literal text as well, but this commit
only fixes it for non-data patterns.)
Oliver Ford, reviewed by Thomas Munro and Nathan Wagner, and whacked around
a bit more by me
Discussion: https://postgr.es/m/CAGMVOdvpbMqPf9XWNzOwBpzJfErkydr_fEGhmuDGa015z97mwg@mail.gmail.com
float8_numeric() and float4_numeric() failed to consider the possibility
that the input is an IEEE infinity. The results depended on the
platform-specific behavior of sprintf(): on most platforms you'd get
something like
ERROR: invalid input syntax for type numeric: "inf"
but at least on Windows it's possible for the conversion to succeed and
deliver a finite value (typically 1), due to a nonstandard output format
from sprintf and lack of syntax error checking in these functions.
Since our numeric type lacks the concept of infinity, a suitable conversion
is impossible; the best thing to do is throw an explicit error before
letting sprintf do its thing.
While at it, let's use snprintf not sprintf. Overrunning the buffer
should be impossible if sprintf does what it's supposed to, but this
is cheap insurance against a stack smash if it doesn't.
Problem reported by Taiki Kondo. Patch by me based on fix suggestion
from KaiGai Kohei. Back-patch to all supported branches.
Discussion: https://postgr.es/m/12A9442FBAE80D4E8953883E0B84E088C8C7A2@BPXM01GP.gisp.nec.co.jp
This introduces a numeric sum accumulator, which performs better than
repeatedly calling add_var(). The performance comes from using wider digits
and delaying carry propagation, tallying positive and negative values
separately, and avoiding a round of palloc/pfree on every value. This
speeds up SUM(), as well as other standard aggregates like AVG() and
STDDEV() that also calculate a sum internally.
Reviewed-by: Andrey Borodin
Discussion: <c0545351-a467-5b76-6d46-4840d1ea8aa4@iki.fi>
Set the "rscales" for intermediate-result calculations to ensure that
suitable numbers of significant digits are maintained throughout. The
previous coding hadn't thought this through in any detail, and as a result
could deliver results with many inaccurate digits, or in the worst cases
even fail with divide-by-zero errors as a result of losing all nonzero
digits of intermediate results.
In exp_var(), get rid entirely of the logic that separated the calculation
into integer and fractional parts: that was neither accurate nor
particularly fast. The existing range-reduction method of dividing by 2^n
can be applied across the full input range instead of only 0..1, as long as
we are careful to set an appropriate rscale for each step.
Also fix the logic in mul_var() for shortening the calculation when the
caller asks for fewer output digits than an exact calculation would
require. This bug doesn't affect simple multiplications since that code
path asks for an exact result, but it does contribute to accuracy issues
in the transcendental math functions.
In passing, improve performance of mul_var() a bit by forcing the shorter
input to be on the left, thus reducing the number of iterations of the
outer loop and probably also reducing the number of carry-propagation
steps needed.
This is arguably a bug fix, but in view of the lack of field complaints,
it does not seem worth the risk of back-patching.
Dean Rasheed
mul_var() postpones propagating carries until it risks overflow in its
internal digit array. However, the logic failed to account for the
possibility of overflow in the carry propagation step, allowing wrong
results to be generated in corner cases. We must slightly reduce the
when-to-propagate-carries threshold to avoid that.
Discovered and fixed by Dean Rasheed, with small adjustments by me.
This has been wrong since commit d72f6c7503,
so back-patch to all supported branches.
Revert "to_char(float4/8): zero pad to specified length". There are
too many platform-specific problems, and the proper rounding is missing.
Also revert companion patch 9d61b9953c.
Previously, zero padding was limited to the internal length, rather than
the specified length. This allows it to match to_char(int/numeric), which
always padded to the specified length.
Regression tests added.
BACKWARD INCOMPATIBILITY
In generate_series_step_numeric(), the variables "start_num"
and "stop_num" may be potentially freed until the next call.
So they should be put in the location which can survive across calls.
But previously they were not, and which could cause incorrect
behavior of generate_series(numeric, numeric). This commit fixes
this problem by copying them on multi_call_memory_ctx.
Andrew Gierth
The code for raising a NUMERIC value to an integer power wasn't very
careful about large powers. It got an outright wrong answer for an
exponent of INT_MIN, due to failure to consider overflow of the Abs(exp)
operation; which is fixable by using an unsigned rather than signed
exponent value after that point. Also, even though the number of
iterations of the power-computation loop is pretty limited, it's easy for
the repeated squarings to result in ridiculously enormous intermediate
values, which can take unreasonable amounts of time/memory to process,
or even overflow the internal "weight" field and so produce a wrong answer.
We can forestall misbehaviors of that sort by bailing out as soon as the
weight value exceeds what will fit in int16, since then the final answer
must overflow (if exp > 0) or underflow (if exp < 0) the packed numeric
format.
Per off-list report from Pavel Stehule. Back-patch to all supported
branches.
Trailing-zero stripping applied by the FM specifier could strip zeroes
to the left of the decimal point, for a format with no digit positions
after the decimal point (such as "FM999.").
Reported and diagnosed by Marti Raudsepp, though I didn't use his patch.
algorithm. This is a good deal slower than our old roundoff-error-prone
code for long inputs, so we keep the old code for use in the transcendental
functions, where everything is approximate anyway. Also create a
user-accessible function div(numeric, numeric) to provide access to the
exact result of trunc(x/y) --- since the regular numeric / operator will
round off its result, simply computing that expression in SQL doesn't
reliably give the desired answer. This fixes bug #3387 and various related
corner cases, and improves the usefulness of PG for high-precision integer
arithmetic.
The implementation is somewhat ugly logic-wise, but I don't see an
easy way to make it more concise.
When writing this, I noticed that my previous implementation of
width_bucket() doesn't handle NaN correctly:
postgres=# select width_bucket('NaN', 1, 5, 5);
width_bucket
--------------
6
(1 row)
AFAICS SQL:2003 does not define a NaN value, so it doesn't address how
width_bucket() should behave here. The patch changes width_bucket() so
that ereport(ERROR) is raised if NaN is specified for the operand or the
lower or upper bounds to width_bucket(). For float8, NaN is disallowed
for any of the floating-point inputs, and +/- infinity is disallowed
for the histogram bounds (but allowed for the operand).
Update docs and regression tests, bump the catversion.
literally.
Add GUC variables:
"escape_string_warning" - warn about backslashes in non-E strings
"escape_string_syntax" - supports E'' syntax?
"standard_compliant_strings" - treats backslashes literally in ''
Update code to use E'' when escapes are used.
a variant of the function for the 'numeric' datatype; it would be possible
to add additional variants for other datatypes, but I haven't done so yet.
This commit includes regression tests and minimal documentation; if we
want developers to actually use this function in applications, we'll
probably need to document what it does more fully.
Regression tests and documentation have both been updated.
SQL2003 requires that both ceiling() and ceil() be present, so I have
documented both spellings. SQL2003 doesn't mention pow() as far as I
can see, so I decided to replace pow() with power() in the documentation:
there is little reason to encourage the continued usage of a function
that isn't compliant with the standard, given a standard-compliant
alternative.
RELEASE NOTES: should state that pow() is considered deprecated
(although I don't see the need to ever remove it.)
any amount of leading or trailing whitespace (where "whitespace"
is defined by isspace()). This is for SQL conformance, as well
as consistency with other numeric types (e.g. oid, numeric).
Also refactor pg_atoi() to avoid looking at errno where not
necessary, and add a bunch of regression tests for the input
to these types.
Implement TIME WITH TIME ZONE type (timetz internal type).
Remap length() for character strings to CHAR_LENGTH() for SQL92
and to remove the ambiguity with geometric length() functions.
Keep length() for character strings for backward compatibility.
Shrink stored views by removing internal column name list from visible rte.
Implement min(), max() for time and timetz data types.
Implement conversion of TIME to INTERVAL.
Implement abs(), mod(), fac() for the int8 data type.
Rename some math functions to generic names:
round(), sqrt(), cbrt(), pow(), etc.
Rename NUMERIC power() function to pow().
Fix int2 factorial to calculate result in int4.
Enhance the Oracle compatibility function translate() to work with string
arguments (from Edwin Ramirez).
Modify pg_proc system table to remove OID holes.
the to_char() source code is large, here are regression tests for
numeric/timestamp/int8 part. It is probably enough test for formatting
code in the formatting.c module. The others (float4/float8/int4) types
share this formatting code and eventual bugs for these types aren't
few probable.
Patch fix timestamp_to_char() for infinity/invalid timestamp too.
Karel