Commit graph

42 commits

Author SHA1 Message Date
Kyle Evans
26e1c38fbb Update copyright e-mail address to @FreeBSD.org address
Approved by:	emaste (mentor)
Differential Revision:	https://reviews.freebsd.org/D11508
2017-07-06 19:53:30 +00:00
Ed Maste
b05c7cdeb1 bsdgrep: bump version number and add Kyle Evans copyright
The following changes have been made over the last couple of months:

Features:

 - With bsdgrep -r, the working directory is implied if no directory is
   specified
 - bsdgrep will now behave as bsdgrep -r does when it's named rgrep
 - bsdgrep now understands -z/--null-data to use \0 as EOL
 - GNU regex compatibility is now indicated with a "GNU compatible" in
   the version string

Fixes:

 - --mmap no longer hangs when coming across an EOF without an
   accompanying EOL
 - -o/--color matching generally improved, now produces earliest /
   longest matches
 - Context output now more closely aligns with GNU grep
 - Zero-length matches no longer exhibit broken behavior
 - Every output line now honors -b/-H/-n flags

Tests have been added for previous regressions as well as other
previously untested behaviors.

Various other fixes have been commited, and refactoring for further /
later improvements has taken place.

(The original submission changed the version string to 2.5.2, but I
decided to use 2.6.0 to reflect the addition of new features.)

Submitted by:	Kyle Evans <kevans91@ksu.edu>
Differential Revision:	https://reviews.freebsd.org/D10982
2017-05-29 13:10:01 +00:00
Ed Maste
8bf4606408 bsdgrep: correct assumptions to prepare for chunking
Correct a couple of minor BSD grep assumptions that are valid for line
processing but not future chunk-based processing.

Submitted by:	Kyle Evans <kevans91@ksu.edu>
Reviewed by:	bapt, cem
Differential Revision:	https://reviews.freebsd.org/D10824
2017-05-26 02:30:26 +00:00
Ed Maste
6d635d3b32 bsdgrep: Correct per-line line metadata printing
Metadata printing with -b, -H, or -n flags suffered from a few flaws:

1) -b/offset printing was broken when used in conjunction with -o

2) With -o, bsdgrep did not print metadata for every match/line, just
   the first match of a line

3) There were no tests for this

Address these issues by outputting this data per-match if the -o flag is
specified, and prior to outputting any matches if -o but not --color,
since --color alone will not generate a new line of output for every
iteration over the matches.

To correct -b output, fudge the line offset as we're printing matches.

While here, make sure we're using grep_printline in -A context.  Context
printing should *never* look at the parsing context, just the line.

The tests included do not pass with gnugrep in base due to it exhibiting
similar quirky behavior that bsdgrep previously exhibited.

Submitted by:	Kyle Evans <kevans91@ksu.edu>
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D10580
2017-05-20 11:20:03 +00:00
Ed Maste
fe8c9d5bf1 bsdgrep: emit more than MAX_LINE_MATCHES per line
We should not set an arbitrary cap on the number of matches on a line,
and in any case MAX_LINE_MATCHES of 32 is much too low.  Instead, if we
match more than MAX_LINE_MATCHES, keep processing and matching from the
last match until all are found.

For the regression test, we produce 4096 matches (larger than we expect
we'll ever set MAX_LINE_MATCHES) and make sure we actually get 4096
lines of output with the -o flag.

We'll also make sure that every distinct line is getting its own line
number to detect line metadata not being printed as appropriate along
the way.

PR:		218811
Submitted by:	Kyle Evans <kevans91@ksu.edu>
Reported by:	jbeich
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D10577
2017-05-20 03:51:31 +00:00
Ed Maste
b5fc583c27 bsdgrep: don't allow negative -A / -B / -C
Previously, when given a negative -A/-B/-C argument bsdgrep would
overflow the respective context flag(s) and exhibited surprising
behavior.

Fix this by removing unsignedness of Aflag/Bflag and erroring out if
we're given a value < 0.  Also adjust the type used to track 'tail'
context in procfile() so that it accurately reflects the Aflag value
rather than overflowing and losing trailing context.

This also fixes an inconsistency previously existing between -n and
-C "n" behavior.  They are now both limited to LLONG_MAX, to be
consistent.

Add some test cases to make sure grep errors out properly for both
negative context values as well as non-numeric context values rather
than giving bogus matches.

Submitted by:	Kyle Evans <kevans91@ksu.edu>
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D10675
2017-05-15 17:51:01 +00:00
Ed Maste
e2127de812 bsdgrep: don't ouptut matches with -c, -l, -L
Refactoring done in r317703 broke -c, -l, and -L flags implying
suppression of match printing.  Fortunately this is just a matter of not
doing any printing of the resulting matches and context printing was not
broken in this refactoring.

Add some regression tests since this area may still see further
refactoring, include different context flags as well even though they
were not broken in this case.

PR:		219077
Submitted by:	Kyle kevans91@ksu.edu
Reported by:	markj
Reviewed by:	cem, ngie
Differential Revision:	https://reviews.freebsd.org/D10607
2017-05-05 17:35:05 +00:00
Ed Maste
83fd8885c4 bsdgrep: correct uninitialized variable introduced in r317703
CID:		1374747
Submitted by:	Kyle Evans <kevans91@ksu.edu>
2017-05-03 13:47:02 +00:00
Ed Maste
a4f3f02be6 bsdgrep: fix -w flag matching with an empty pattern
-w flag matching with an empty pattern was generally 'broken', allowing
matches to occur on any line whether or not it actually matches -w
criteria.

This fix required a good amount of refactoring to address.  procline()
is altered to *only* process the line and return whether it was a match
or not, necessary to be able to short-circuit the whole function in case
of this matchall flag. -m flag handling is moved out as well because it
suffers from the same fate as context handling if we bypass any actual
pattern matching.

The matching context (matches, mostly) didn't previously exist outside
of procline(), so we go ahead and create context object for file
processing bits to pass around.  grep_printline() was created due to
this, for the scenarios where the matches don't actually matter and we
just want to print a line or two, a la flushing the context queue and
no -o or --color specified.

Damage from this broken behavior would have been mitigated by the fact
that it is unlikely users would invoke grep -w with an empty pattern.

This was identified while checking PR 105221 for problems it this may
cause in BSD grep, but PR 105221 is *not* a report of this behavior.

Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Differential Revision:	https://reviews.freebsd.org/D10433
2017-05-02 20:39:33 +00:00
Ed Maste
945fc991b2 bsdgrep: fix -w -v matching improperly with certain patterns
-w and -v flag matching was mostly functional but had some minor
problems:

1. -w flag processing only allowed one iteration through pattern
   matching on a line. This was problematic if one pattern could match
   more than once, or if there were multiple patterns and the earliest/
   longest match was not the most ideal, and

2. Previous work "fixed" things to not further process a line if the
   first iteration through patterns produced no matches. This is clearly
   wrong if we're dealing with the more restrictive -w matching.

#2 breakage could have also occurred before recent broad rewrites, but
it would be more arbitrary based on input patterns as to whether or not
it actually affected things.

Fix both of these by forcing a retry of the patterns after advancing
just past the start of the first match if we're doing more restrictive
-w matching and we didn't get any hits to start with. Also move -v flag
processing outside of the loop so that we have a greater change to match
in the more restrictive cases. This wasn't strictly wrong, but it could
be a little more error prone.

While here, introduce some regressions tests for this behavior and fix
some excessive wrapping nearby that hindered readability. GNU grep
passes these new tests.

PR:		218467, 218811
Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reviewed by:	cem, ngie
Differential Revision:	https://reviews.freebsd.org/D10329
2017-05-02 02:32:10 +00:00
Ed Maste
3f39ffc893 bsdgrep: add BSD_GREP_FASTMATCH knob for built-in fastmatch
Bugs have been found in the fastmatch implementation as used in bsdgrep.
Some have been fixed (r316495) while fixes for others are in review
(D10098).

In comparison with the fastmatch implementation, Kyle Evans found that:

- regex(3)'s performance with literal expressions offers a speed
  improvement over fastmatch

- regex(3)'s performance, both with simple BREs and EREs, seems to be
  comparable

The regex implementation was imported in r226035, and the commit message
reports:

    This is a temporary solution until the whole regex library is
    not replaced so that BSD grep development can continue and the
    backported code gets some review and testing. This change only
    improves scalability slightly, there is no big performance boost
    yet but several minor bugs have been found and fixed.

Introduce a WITH_/WITHOUT_BSD_GREP_FASTMATCH knob to support testing
of both approaches.

PR:		175314, 194823
Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reviewed by:	bdrewery (in part)
Differential Revision:	https://reviews.freebsd.org/D10282
2017-04-21 14:36:09 +00:00
Ed Maste
e06ffa3230 bsdgrep: fix zero-length matches without the -o flag
r316477 broke zero-length matches when not using the -o flag, by
skipping over them entirely.

Add a regression test so that it doesn't break again in the future.

Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reviewed by:	cem emaste ngie
Differential Revision:	https://reviews.freebsd.org/D10333
2017-04-17 14:59:55 +00:00
Ed Maste
22130a21ba bsdgrep: remove output separators between overlapping segments
Make bsdgrep more sensitive to context overlaps.  If it's printing
context that either overlaps or is immediately adjacent to another bit
of context, don't print a separator.

- Non-overlapping segments no longer have two separators between them

- Overlapping segments no longer have separators between them with
  overlapping sections repeated

Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D10105
2017-04-17 13:36:30 +00:00
Ed Maste
a461896a2f bsdgrep: for -r, use the working directory if none specified
This is more sensible than the previous behaviour of grepping stdin,
and matches newer GNU grep behaviour.

PR:		216307
Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reviewed by:	cem, emaste, ngie
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/
2017-04-17 13:22:39 +00:00
Ed Maste
5ee1ea02fd bsdgrep: add -z/--null-data support
-z treats input and output data as sequences of lines terminated by a
zero byte instead of a newline. This brings it more in line with GNU grep
and brings us closer to passing the current tests with BSD grep.

Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reviewed by:	cem
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D10101
2017-04-17 13:14:18 +00:00
Ed Maste
a5ed868511 bsdgrep: revert color changes from r316477
r316477 changed the color output to match exactly the in-tree GNU grep,
but introduces unnecessary escape sequences.

Submitted by:	Kyle Evans <kevans91 at ksu.edu>
Reported by:	ache
MFC after:	1 month
MFC with:	r316477
2017-04-04 14:17:50 +00:00
Ed Maste
a734ae9c14 bsdgrep: Initialize vars to avoid a false positive GCC warning
Reported by:	lwhsu
MFC after:	1 month
MFC with:	r316477
2017-04-04 13:34:19 +00:00
Ed Maste
87c485cfb5 bsdgrep: fix matching behaviour
- Set REG_NOTBOL if we've already matched beginning of line and we're
  examining later parts

- For each pattern we examine, apply it to the remaining bits of the
  line rather than (potentially) smaller subsets

- Check for REG_NOSUB after we've looked at all patterns initially
  matching the line

- Keep track of the last match we made to later determine if we're
  simply not matching any longer or if we need to proceed another byte
  because we hit a zero-length match

- Match the earliest and longest bit of each line before moving the
  beginning of what we match to further in the line, past the end of the
  longest match; this generally matches how gnugrep(1) seems to behave,
  and seems like pretty good behavior to me

- Finally, bail out of printing any matches if we were set to print all
  (empty pattern) but -o (output matches) was set

PR:		195763, 180990, 197555, 197531, 181263, 209116
Submitted by:	"Kyle Evans" <kevans91@ksu.edu>
Reviewed by:	cem
MFC after:	1 month
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D10104
2017-04-03 23:16:51 +00:00
Ed Schouten
33f5799a81 Call basename() in a portable way.
Pull a copy of the filename string before calling basename(). Change the
loop to not return on its own, so we can put a free() statement at the
bottom.
2016-07-28 15:19:47 +00:00
Gabor Kovesdan
2ac5b1c763 - Do not look for more matching lines if -L is specified
Submitted by:   eadler (based on)
MFC after:	2 weeks
2014-08-18 12:29:28 +00:00
Pedro F. Giffuni
78bef01fe4 grep: Fix type.
Obtained from:	NetBSD (CVS rev. 1.17)
MFC after:	3 days
2014-07-17 14:51:50 +00:00
Glen Barber
66edec08b0 Fix a bug in bsdgrep(1) where patterns are not correctly
detected.

Certain criteria must be met for this bug to show up:

 * the -w flag is specified, and
 * neither -o or --color are specified, and
 * the pattern is part of another word in the line, and
 * the other word that contains the pattern occurs first

PR:		181973
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2014-06-20 21:53:50 +00:00
Eitan Adler
924500b74d Make bsdgrep behave as gnugrep and as documented: -m should only stop
reading the specific file, not any file.

Tested by:	frogs (irc)
Reviewed by:	gabor
Approved by:	cperciva (implicit)
MFC after:	1 week
2012-12-20 17:38:14 +00:00
Gabor Kovesdan
6f4cbf7cb3 - Match GNU behavior of exit code
- Rename variable that has a different meaning now

PR:		bin/162930
Submitted by:	Jan Beich <jbeich@tormail.net>
MFC after:	1 week
2011-12-07 12:25:28 +00:00
Gabor Kovesdan
ede01be2f2 - Call warnx() instead of errx() if a directory is not readable when using
a recursive search.  This is the expected behavior instead of aborting.

PR:		bin/162907
Submitted by:	Jan Beich <jbeich@tormail.net>
MFC after:	3 days
2011-11-28 20:04:26 +00:00
Gabor Kovesdan
f0c94259d6 - Fix behavior of --null to match GNU grep
PR:		bin/162906
Submitted by:	Jan Beich <jbeich@tormail.net>
MFC after:	3 days
2011-11-28 20:00:31 +00:00
Gabor Kovesdan
bbf9339dbb - Fix counting of match limit (-m)
Reported by:	Nali Toja <nalitoja@gmail.com>
Approved by:	delphij (mentor)
2011-10-12 01:09:57 +00:00
Gabor Kovesdan
f20f6f3fdf Update BSD grep to the latest development version. It has some code
backported that was written for the TRE integration project in Google
Summer of Code 2011.  This is a temporary solution until the whole
regex library is not replaced so that BSD grep development can continue
and the backported code gets some review and testing.  This change only
improves scalability slightly, there is no big performance boost yet
but several minor bugs have been found and fixed.

Approved by:	delphij (mentor)
Sposored by:	Google Summer of Code 2011
MFC after:	1 week
2011-10-05 09:56:43 +00:00
Gabor Kovesdan
ad92276ece - Fix exclusion of directories from a recursive search
- Use FTS_SKIP for exclusion instead of custom code

Submitted by:	ttsestt@gmail.com
Approved by:	re (kib), delphij (mentor)
2011-08-17 13:58:39 +00:00
Gabor Kovesdan
69a6d1985d - Use REG_NOSUB to bypass submatch counting when not necessary. This may
yield in somewhat better performance in a few cases.

Approved by:	delphij (mentor)
2011-06-12 12:51:58 +00:00
Gabor Kovesdan
cbe6b9e50f - Fix -w behavior
- Make -F and -w work together
- Fix --color to colorize all of the matches

PR:		bin/156826
Submitted by:	Yuri Pankov <yuri.pankov@gmail.com>
Approved by:	delphij (mentor)
2011-06-12 12:44:02 +00:00
Gabor Kovesdan
b66a823be8 - Adjust a comment to actual behaviour
- Makefile nit
- Add more CVS/SVN keywords to make it easier to track changes from NetBSD
  in case they add further improvements

Approved by:	delphij (mentor)
Obtained from:	The NetBSD Project
2011-04-07 13:03:35 +00:00
Gabor Kovesdan
d841ecb30d - Simplify the fixed string pattern preprocessing code
- Improve readability

Approved by:	delphij (mentor)
Obtained from:	The NetBSD Project
2011-04-07 13:01:03 +00:00
Dag-Erling Smørgrav
a0ef9ad699 UTFize my name. 2010-08-19 09:28:59 +00:00
Gabor Kovesdan
3ed1008b89 - Refactor file reading code to use pure syscalls and an internal buffer
instead of stdio.  This gives BSD grep a very big performance boost,
  its speed is now almost comparable to GNU grep.

Submitted by:	Dimitry Andric <dimitry@andric.com>
Approved by:	delphij (mentor)
2010-08-18 17:40:10 +00:00
Gabor Kovesdan
59218eb770 - Revert strlcpy() changes to memcpy() because it's more efficient and
former may be safer but in this case it doesn't add extra
  safety [1]
- Fix -w option [2]
- Fix handling of GREP_OPTIONS [3]
- Fix --line-buffered
- Make stdin input imply --line-buffered so that tail -f can be piped
  to grep [4]
- Imply -h if single file is grepped, this is the GNU behaviour
- Reduce locking overhead to gain some more performance [5]
- Inline some functions to help the compiler better optimize the code
- Use shortcut for empty files [6]

PR:		bin/149425 [6]
Prodded by:	jilles [1]
Reported by:	Alex Kozlov <spam@rm-rf.kiev.ua> [2] [3],
		swell.k@gmail.com [2],
		poyopoyo@puripuri.plala.or.jp [4]
Submitted by:	scf [5],
		Shuichi KITAGUCHI <ki@hh.iij4u.or.jp> [6]
Approved by:	delphij (mentor)
2010-08-15 22:15:04 +00:00
Gabor Kovesdan
97a012f24a - Some minor changes to the messages to increase usefulness of error msgs
Reviewed by:	hrs (Japanese catalogs),
		pluknet <pluknet at gmail dot com> (Russian catalog)
Approved by:	delphij (mentor)
2010-07-29 18:02:57 +00:00
Gabor Kovesdan
55e44f511d - Use the traditional behaviour for filename and directory name inclusion
and exclusion patterns [1]
- Some improvements on the exiting code, like replacing memcpy with
  strlcpy/strcpy

Approved by:	delphij (mentor)
Pointed out by:	bf [1], des [1]
2010-07-29 00:11:14 +00:00
Gabor Kovesdan
36bcf7c1fb - Fix -l and -L by really surpressing output and just showing filenames
Submitted by:	swell.k@gmail.com
Approved by:	delphij (mentor)
2010-07-25 18:57:48 +00:00
Gabor Kovesdan
2711628675 - Fix --color behaviour to only output color sequences if stdout is a tty
or if forced mode is specified [1]
- While here, add some alternative names for the options and make then
  case-insensitive
- Fix -q and -l behaviour [2]
- Some small changes to make the code easier to review

Submitted by:   swell.k@gmail.com [1],
                dougb [2]
Approved by:    delphij (mentor)
2010-07-25 08:42:18 +00:00
Xin LI
0c41ffb3d4 Fix crashes when using grep -R:
- Explicitly pre-zero memory for fts_open parameters.
 - Don't test against directory patterns when we are testing direct
   leaf of current directory.

While I'm there plug a few of memory leaks.
2010-07-23 19:36:11 +00:00
Gabor Kovesdan
4dc88ebedf Add BSD grep to the base system and make it our default grep.
Deliverables: Small and clean code (1,4 KSLOC vs GNU's 8,5 KSLOC),
              lower memory usage than GNU grep, GNU compatibility,
              BSD license.

TODO:         Performance is somewhat behind GNU grep but it is only
              significant for bigger searches.  The reason is complex, the
              most important factor is that GNU grep uses lots of
              optimizations to improve the speed of the regex library.
              First, we need a modern regex library (practically by adopting
              TRE), add support for GNU-style non-standard regexes and then
              reevalute the performance issues and look for bottlenecks.  In
              the meantime, for those, who need better performance, it is
              possible to build GNU grep by setting WITH_GNU_GREP.

Approved by:            delphij (mentor)
Obtained from:          OpenBSD (http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/),
                        freegrep (http://github.com/howardjp/freegrep)
Sponsored by:           Google SoC 2008
Portbuild tests run by: kris, pav, erwin
Acknowledgements to:    fjoe (as SoC 2008 mentor),
                        everyone who helped in reviewing and testing
2010-07-22 19:11:57 +00:00