postgresql/src/backend
Andrew Gierth c8ea87e4bd Avoid quadratic slowdown in regexp match/split functions.
regexp_matches, regexp_split_to_table and regexp_split_to_array all
work by compiling a list of match positions as character offsets (NOT
byte positions) in the source string.

Formerly, they then used text_substr to extract the matched text; but
in a multi-byte encoding, that counts the characters in the string,
and the characters needed to reach the starting byte position, on
every call. Accordingly, the performance degraded as the product of
the input string length and the number of match positions, such that
splitting a string of a few hundred kbytes could take many minutes.
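
To illustrate where the quadratic cost comes from, here is a minimal sketch, not the PostgreSQL code. It assumes well-formed UTF-8 input, and the helper names `utf8_seq_len` and `char_to_byte_offset` are hypothetical. Turning a character offset into a byte offset means walking the string from its start, so each lookup costs time proportional to the offset, and repeating that for every match position multiplies the string length by the number of matches.

```c
#include <stddef.h>

/* Byte length of the UTF-8 sequence introduced by lead byte c.
 * Assumption: the input is well-formed UTF-8. */
static size_t
utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/* Convert a character offset into a byte offset by scanning from the
 * start of the string.  The cost is O(char_off) on every call, which
 * is what makes repeated extraction quadratic overall. */
static size_t
char_to_byte_offset(const char *s, size_t char_off)
{
    size_t byte_off = 0;

    while (char_off > 0 && s[byte_off] != '\0')
    {
        byte_off += utf8_seq_len((unsigned char) s[byte_off]);
        char_off--;
    }
    return byte_off;
}
```

With m match positions spread over a string of n characters, calling `char_to_byte_offset` once per match performs on the order of n * m steps in total.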

Repair by keeping the wide-character copy of the input string
available (only in the case where encoding_max_length is not 1) after
performing the match operation, and extracting substrings from that
instead. This reduces the complexity to being linear in the number of
result bytes, discounting the actual regexp match itself (which is not
affected by this patch).
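
The idea of extracting from a wide-character copy can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch itself: it handles only well-formed UTF-8, uses `malloc` rather than PostgreSQL memory contexts, and the names `decode_utf8` and `encode_range` are hypothetical. The string is decoded once into an array with one element per character, so a match position given as a character offset indexes the array directly, and re-encoding a range costs time proportional to the bytes produced.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Decode well-formed UTF-8 into one code point per array element.
 * Returns the character count; *out is malloc'd and owned by the
 * caller (set to NULL on allocation failure). */
static size_t
decode_utf8(const char *s, uint32_t **out)
{
    const unsigned char *p = (const unsigned char *) s;
    uint32_t   *w = malloc((strlen(s) + 1) * sizeof(uint32_t));
    size_t      n = 0;

    *out = w;
    if (w == NULL)
        return 0;
    while (*p)
    {
        uint32_t cp;
        int      extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else                          { cp = *p & 0x07; extra = 3; }
        p++;
        while (extra-- > 0 && *p)
            cp = (cp << 6) | (*p++ & 0x3F);
        w[n++] = cp;
    }
    return n;
}

/* Encode characters w[start..end) as UTF-8 into buf and return the
 * number of bytes written (excluding the terminating NUL).
 * buf must hold at least 4 * (end - start) + 1 bytes. */
static size_t
encode_range(const uint32_t *w, size_t start, size_t end, char *buf)
{
    size_t len = 0;

    for (size_t i = start; i < end; i++)
    {
        uint32_t cp = w[i];

        if (cp < 0x80)
            buf[len++] = (char) cp;
        else if (cp < 0x800)
        {
            buf[len++] = (char) (0xC0 | (cp >> 6));
            buf[len++] = (char) (0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            buf[len++] = (char) (0xE0 | (cp >> 12));
            buf[len++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            buf[len++] = (char) (0x80 | (cp & 0x3F));
        }
        else
        {
            buf[len++] = (char) (0xF0 | (cp >> 18));
            buf[len++] = (char) (0x80 | ((cp >> 12) & 0x3F));
            buf[len++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            buf[len++] = (char) (0x80 | (cp & 0x3F));
        }
    }
    buf[len] = '\0';
    return len;
}
```

Because `encode_range` never looks at anything before `start`, extracting every match costs time linear in the total result size, at the price of keeping the decoded array in memory for the duration of the call.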

In passing, remove cleanup using retail pfree() which was obsoleted by
commit ff428cded (Feb 2008) which made cleanup of SRF multi-call
contexts automatic. Also increase (to ~134 million) the maximum number
of matches and provide an error message when it is reached.

Backpatch all the way because this has been wrong forever.

Analysis and patch by me; review by Kaiting Chen.

Discussion: https://postgr.es/m/87pnyn55qh.fsf@news-spur.riddles.org.uk

see also https://postgr.es/m/87lg996g4r.fsf@news-spur.riddles.org.uk
2018-08-28 12:17:33 +01:00
access Deduplicate code between slot_getallattrs() and slot_getsomeattrs(). 2018-08-23 16:58:53 -07:00
bootstrap Use a ResourceOwner to track buffer pins in all cases. 2018-07-18 12:15:16 -04:00
catalog Clarify comment about assignment and reset of temp namespace ID in MyProc 2018-08-21 08:32:18 +09:00
commands Improve VACUUM and ANALYZE by avoiding early lock queue 2018-08-27 09:11:12 +09:00
executor Set scan direction appropriately for SubPlans (bug #15336) 2018-08-17 15:44:13 +01:00
foreign Remove bogus "extern" annotations on function definitions. 2018-02-19 12:07:44 -05:00
jit LLVMJIT: LLVMGetHostCPUFeatures now is upstream, use LLMV version if available. 2018-08-24 10:21:38 -07:00
lib doc: Update redirecting links 2018-07-16 10:48:05 +02:00
libpq Suppress uninitialized-variable warning in new SCRAM code. 2018-08-24 10:51:10 -04:00
main Update copyright for 2018 2018-01-02 23:30:12 -05:00
nodes Fix run-time partition pruning for appends with multiple source rels. 2018-08-01 19:42:52 -04:00
optimizer Fix wrong order of operations in inheritance_planner. 2018-08-11 15:53:20 -04:00
parser Fix lexing of standard multi-character operators in edge cases. 2018-08-23 21:42:40 +01:00
partitioning Fix typos. 2018-08-27 09:32:59 +12:00
po Translation updates 2018-06-25 12:37:18 +02:00
port Remove obsolete netbsd dynloader code 2018-08-13 23:21:01 +02:00
postmaster Make syslogger more robust against failures in opening CSV log files. 2018-08-26 14:21:55 -04:00
regex Clean up warnings from -Wimplicit-fallthrough. 2018-05-01 19:35:08 -04:00
replication Reconsider new file extension in commit 91f26d5f. 2018-08-25 22:52:46 -07:00
rewrite Fix set of NLS translation issues 2018-08-21 15:17:13 +09:00
snowball Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers. 2018-01-26 18:25:14 -05:00
statistics Fix typos. 2018-08-27 09:32:59 +12:00
storage Introduce minimal C99 usage to verify compiler support. 2018-08-23 18:36:07 -07:00
tcop Introduce minimal C99 usage to verify compiler support. 2018-08-23 18:36:07 -07:00
tsearch Hand code string to integer conversion for performance. 2018-07-22 14:58:23 -07:00
utils Avoid quadratic slowdown in regexp match/split functions. 2018-08-28 12:17:33 +01:00
.gitignore Add .gitignore entries for AIX-specific intermediate build artifacts. 2015-07-08 20:44:22 -04:00
common.mk Remove PARTIAL_LINKING build mode. 2018-03-30 17:33:04 -07:00
Makefile Rearrange makefile rules for running Gen_fmgrtab.pl. 2018-05-03 17:54:18 -04:00
nls.mk Translation updates 2018-06-25 12:37:18 +02:00