postgresql/src/include
Tom Lane 6498287696 Handle constant inputs to corr() and related aggregates more precisely.
The SQL standard says that corr() and friends should return NULL in
the mathematically-undefined case where all the inputs in one of
the columns have the same value.  We were checking that by seeing
if the sums Sxx and Syy were zero, but that approach is very
vulnerable to roundoff error: if a sum is close to zero but not
exactly that, we'd come out with a pretty silly non-NULL result.

Instead, directly track whether the inputs are all equal by
remembering the common value in each column.  Once we detect
that a new input is different from before, represent that by
storing NaN for the common value.  (An objection to this scheme
is that if the inputs are all NaN, we will consider that they
were not all equal.  But under IEEE float arithmetic rules,
one NaN is never equal to another, so this behavior is arguably
correct.  Moreover it matches what we did before in such cases.)
Then, leave the sums at their exact value of zero for as long
as we haven't detected different input values.
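
The scheme described above can be sketched roughly as follows. This is an illustrative Python model, not the PostgreSQL C code: the names RegrState, accum, corr_is_undefined, common_x and common_y are hypothetical, and the Youngs-Cramer style update is an assumption about how the sums are maintained.

```python
import math
from dataclasses import dataclass


@dataclass
class RegrState:
    """Hypothetical 8-value transition state; field names are assumptions."""
    n: float = 0.0
    sx: float = 0.0
    sxx: float = 0.0
    sy: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0
    common_x: float = 0.0  # common X value so far, or NaN once inputs differ
    common_y: float = 0.0  # common Y value so far, or NaN once inputs differ


def accum(st: RegrState, x: float, y: float) -> None:
    """Fold one (x, y) pair into the state."""
    st.n += 1.0
    st.sx += x
    st.sy += y
    if st.n == 1.0:
        # First row: remember the values as the candidate common values.
        st.common_x, st.common_y = x, y
        return
    # NaN never compares equal to anything, so once the sentinel is
    # stored these tests stay true and the sentinel is never cleared.
    if x != st.common_x:
        st.common_x = math.nan
    if y != st.common_y:
        st.common_y = math.nan
    x_varies = math.isnan(st.common_x)
    y_varies = math.isnan(st.common_y)
    scale = 1.0 / (st.n * (st.n - 1.0))
    tmpx = x * st.n - st.sx
    tmpy = y * st.n - st.sy
    # Only touch a sum once its column is known to vary; until then it
    # keeps its exact value of zero and cannot pick up roundoff error.
    if x_varies:
        st.sxx += tmpx * tmpx * scale
    if y_varies:
        st.syy += tmpy * tmpy * scale
    if x_varies and y_varies:
        st.sxy += tmpx * tmpy * scale


def corr_is_undefined(st: RegrState) -> bool:
    """True when corr() should return NULL: no rows or a constant column."""
    return (st.n < 1.0
            or not math.isnan(st.common_x)
            or not math.isnan(st.common_y))
```

In this sketch, a column whose first input is NaN is immediately treated as "not all equal", which mirrors the objection discussed in the parenthetical above.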

This solution requires the aggregate transition state to contain
8 float values rather than 6, which is not problematic, and it appears
to add less than 1% to the aggregates' runtime, which is acceptable.

While we're here, improve corr()'s final function to cope with
overflow/underflow in the final calculation, and to clamp its
result to [-1, 1] in case of roundoff error.
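
One way to picture the final-function hardening is the sketch below. It is a Python illustration under stated assumptions, not the code from this commit: corr_final is a hypothetical name, and dividing by the product of two square roots (rather than the square root of Sxx * Syy) is one plausible way to keep the intermediate value from overflowing or underflowing.

```python
import math
from typing import Optional


def corr_final(sxx: float, syy: float, sxy: float) -> Optional[float]:
    """Hypothetical corr() final step for non-negative sxx and syy."""
    if sxx == 0.0 or syy == 0.0:
        # A constant column: correlation is undefined (NULL in SQL).
        return None
    # sqrt(sxx) * sqrt(syy) stays in range in cases where sxx * syy
    # would overflow to infinity or underflow to zero.
    r = sxy / (math.sqrt(sxx) * math.sqrt(syy))
    if math.isnan(r):
        return r
    # Roundoff can push the quotient slightly outside [-1, 1]; clamp it.
    return max(-1.0, min(1.0, r))
```

For example, with sxx = syy = sxy = 1e200 the product sxx * syy overflows a double, while the form above still yields a result very close to 1.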

Although this is arguably a bug fix, it requires a catversion bump
due to the change in aggregates' initial states, so it can't be
back-patched.

Patch written by me, but many of the ideas are due to Dean Rasheed,
who also did a good deal of testing.

Bug: #19340
Reported-by: Oleg Ivanov <o15611@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19340-6fb9f6637f562092@postgresql.org
2025-12-06 18:31:26 -05:00
access Set next multixid's offset when creating a new multixid 2025-12-03 19:15:08 +02:00
archive Update copyright for 2025 2025-01-01 11:21:55 -05:00
backup Add backup_type column to pg_stat_progress_basebackup. 2025-08-05 10:50:45 -07:00
bootstrap Allow redeclaration of typedef yyscan_t 2025-09-12 08:16:00 +02:00
catalog Handle constant inputs to corr() and related aggregates more precisely. 2025-12-06 18:31:26 -05:00
commands Move WAL sequence code into its own file 2025-12-01 16:21:41 +09:00
common Add pg_add_size_overflow() and friends 2025-11-24 09:59:38 -08:00
datatype Avoid using timezone Asia/Manila in regression tests. 2025-01-20 15:47:53 -05:00
executor Add parallelism support for TID Range Scans 2025-11-27 14:05:04 +13:00
fe_utils Add \pset options for boolean value display 2025-11-03 17:40:39 +01:00
foreign Improve ExplainState type handling in header files 2025-09-15 11:04:10 +02:00
jit jit: Fix type used for Datum values in LLVM IR. 2025-09-17 13:38:35 +12:00
lib Add pairingheap_initialize() for shared memory usage 2025-11-05 11:44:13 +02:00
libpq Fix pg_isblank() 2025-11-28 08:33:07 +01:00
mb Use C11 char16_t and char32_t for Unicode code points. 2025-10-29 14:17:13 -07:00
nodes Fix stray references to SubscriptRef 2025-12-03 14:44:14 +01:00
optimizer Add parallelism support for TID Range Scans 2025-11-27 14:05:04 +13:00
parser Improve detection of implicitly-temporary views. 2025-11-24 17:00:16 -05:00
partitioning Mark function arguments of type "Datum *" as "const Datum *" where possible 2025-10-31 10:47:25 +01:00
pch meson: Increase minimum version to 0.57.2 2025-07-02 11:14:53 +02:00
port Add pg_atomic_unlocked_write_u64 2025-12-03 18:38:20 -05:00
portability Update copyright for 2025 2025-01-01 11:21:55 -05:00
postmaster Add log_autoanalyze_min_duration 2025-10-15 14:31:12 +02:00
regex pg_regc_locale.c: rename some static functions. 2025-10-14 11:04:04 -07:00
replication Add slotsync_skip_reason column to pg_replication_slots view. 2025-11-28 05:21:35 +00:00
rewrite Update various forward declarations to use typedef 2025-09-15 11:04:10 +02:00
snowball Update to latest Snowball sources. 2025-02-18 21:13:54 -05:00
statistics Rework output format of pg_dependencies 2025-11-17 10:44:26 +09:00
storage bufmgr: Turn BUFFER_LOCK_* into an enum 2025-12-03 18:38:20 -05:00
tcop Implement WAIT FOR command 2025-11-05 11:44:13 +02:00
tsearch Update copyright for 2025 2025-01-01 11:21:55 -05:00
utils Rename BUFFERPIN wait event class to BUFFER 2025-12-03 18:38:20 -05:00
.gitignore Use <stdint.h> and <inttypes.h> for c.h integers. 2024-12-04 15:05:38 +13:00
c.h Change Pointer to void * 2025-12-03 10:22:17 +01:00
fmgr.h Remove no longer needed casts to Pointer 2025-12-04 19:40:08 +01:00
funcapi.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
getopt_long.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
Makefile Clean up newly added guc_tables.inc.c 2025-09-04 17:25:43 +02:00
meson.build meson: add and use stamp files for generated headers 2025-08-11 15:18:23 -04:00
miscadmin.h Avoid mixing void and integer in a conditional expression. 2025-11-02 12:30:44 -05:00
pg_config.h.in Re-run autoheader 2025-11-06 07:37:22 +01:00
pg_config_manual.h Move SLRU_PAGES_PER_SEGMENT to pg_config_manual.h 2025-11-10 16:11:41 +02:00
pg_getopt.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
pg_trace.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
pgstat.h Rename column slotsync_skip_at to slotsync_last_skip. 2025-12-05 04:12:55 +00:00
pgtar.h Update copyright for 2025 2025-01-01 11:21:55 -05:00
pgtime.h Seek zone abbreviations in the IANA data before timezone_abbreviations. 2025-01-16 14:11:19 -05:00
port.h Inline pg_ascii_tolower() and pg_ascii_toupper(). 2025-11-26 10:04:32 -08:00
postgres.h Grab the low-hanging fruit from forcing USE_FLOAT8_BYVAL to true. 2025-08-13 17:18:22 -04:00
postgres_ext.h Move pg_int64 back to postgres_ext.h 2025-09-16 10:48:56 +02:00
postgres_fe.h IWYU widely useful pragmas 2025-01-15 18:57:53 +01:00
varatt.h Convert varatt.h access macros to static inline functions. 2025-08-05 17:01:25 +02:00
windowapi.h Add IGNORE NULLS/RESPECT NULLS option to Window functions. 2025-10-03 09:47:36 +09:00