opnsense-src

mirror of https://github.com/opnsense/src.git synced 2026-03-06 15:20:53 -05:00

History

Kyle Evans adeebf4cd4 regex(3): Interpret many escaped ordinary characters as EESCAPE In IEEE 1003.1-2008 [1] and earlier revisions, BRE/ERE grammar allows for any character to be escaped, but "ORD_CHAR preceded by an unescaped <backslash> character [gives undefined results]". Historically, we've interpreted an escaped ordinary character as the ordinary character itself. This becomes problematic when some extensions give special meanings to an otherwise ordinary character (e.g. GNU's \b, \s, \w), meaning we may have two different valid interpretations of the same sequence. To make this easier to deal with and given that the standard calls this undefined, we should throw an error (EESCAPE) if we run into this scenario to ease transition into a state where some escaped ordinaries are blessed with a special meaning -- it will either error out or have extended behavior, rather than have two entirely different versions of undefined behavior that leave the consumer of regex(3) guessing as to what behavior will be used or leaving them with false impressions. This change bumps the symbol version of regcomp to FBSD_1.6 and provides the old escape semantics for legacy applications, just in case one has an older application that would immediately turn into a pumpkin because of an extraneous escape that's embedded or otherwise critical to its operation. This is the final piece needed before enhancing libregex with GNU extensions and flipping the switch on bsdgrep. [1] http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/ PR: 229925 (exp-run, courtesy of antoine) Differential Revision: https://reviews.freebsd.org/D10510		2020-07-29 23:21:56 +00:00
..
data	regex(3): Interpret many escaped ordinary characters as EESCAPE	2020-07-29 23:21:56 +00:00
debug.c	Merge content currently under test from ^/vendor/NetBSD/tests/dist/@r312123	2017-01-14 06:49:17 +00:00
main.c
README
split.c
t_exhaust.c	Only skip problematic test in CI env.	2019-09-11 18:40:05 +00:00
t_regex.sh
t_regex_att.c	Pull in ^/vendor/NetBSD/tests/dist@r312219	2017-01-15 10:04:20 +00:00
test_regex.h

README

regular expression test set
Lines are at least three fields, separated by one or more tabs.  "" stands
for an empty field.  First field is an RE.  Second field is flags.  If
C flag given, regcomp() is expected to fail, and the third field is the
error name (minus the leading REG_).

Otherwise it is expected to succeed, and the third field is the string to
try matching it against.  If there is no fourth field, the match is
expected to fail.  If there is a fourth field, it is the substring that
the RE is expected to match.  If there is a fifth field, it is a comma-
separated list of what the subexpressions should match, with - indicating
no match for that one.  In both the fourth and fifth fields, a (sub)field
starting with @ indicates that the (sub)expression is expected to match
a null string followed by the stuff after the @; this provides a way to
test where null strings match.  The character `N' in REs and strings
is newline, `S' is space, `T' is tab, `Z' is NUL.

The full list of flags:
  -	placeholder, does nothing
  b	RE is a BRE, not an ERE
  &	try it as both an ERE and a BRE
  C	regcomp() error expected, third field is error name
  i	REG_ICASE
  m	("mundane") REG_NOSPEC
  s	REG_NOSUB (not really testable)
  n	REG_NEWLINE
  ^	REG_NOTBOL
  $	REG_NOTEOL
  #	REG_STARTEND (see below)
  p	REG_PEND

For REG_STARTEND, the start/end offsets are those of the substring
enclosed in ().