mirror of
https://github.com/monitoring-plugins/monitoring-plugins.git
synced 2026-02-03 18:49:29 -05:00
git-svn-id: https://nagiosplug.svn.sourceforge.net/svnroot/nagiosplug/nagiosplug/trunk@38 f882894a-f735-0410-b71e-b25c423dba1c
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<book>
<title>Nagios Plug-in Developer Guidelines</title>

<bookinfo>
<authorgroup>
<author>
<firstname>Karl</firstname>
<surname>DeBisschop</surname>
<affiliation>
<address><email>karl@debisschop.net</email></address>
</affiliation>
</author>

<author>
<firstname>Ethan</firstname>
<surname>Galstad</surname>
<authorblurb>
<para>Author of Nagios</para>
<para><ulink url="http://www.nagios.org"></ulink></para>
</authorblurb>
<affiliation>
<address><email>netsaint@linuxbox.com</email></address>
</affiliation>
</author>

<author>
<firstname>Hugo</firstname>
<surname>Gayosso</surname>
<affiliation>
<address><email>hgayosso@gnu.org</email></address>
</affiliation>
</author>

<author>
<firstname>Subhendu</firstname>
<surname>Ghosh</surname>
<affiliation>
<address><email>sghosh@sourceforge.net</email></address>
</affiliation>
</author>

<author>
<firstname>Stanley</firstname>
<surname>Hopcroft</surname>
<affiliation>
<address><email>stanleyhopcroft@sourceforge.net</email></address>
</affiliation>
</author>

</authorgroup>

<pubdate>2002</pubdate>
<title>Nagios plug-in development guidelines</title>

<revhistory>
<revision>
<revnumber>0.4</revnumber>
<date>2 May 2002</date>
</revision>
</revhistory>

<copyright>
<year>2000 2001 2002</year>
<holder>Karl DeBisschop, Ethan Galstad,
Hugo Gayosso, Stanley Hopcroft, Subhendu Ghosh</holder>
</copyright>

</bookinfo>

<preface id=preface>
<title>About the guidelines</title>

<para>The purpose of these guidelines is to provide a reference for
plug-in developers and to encourage the standardization of the
different kinds of plug-ins: C, shell, Perl, Python, etc.</para>

<section> <title>Copyright</title>

<para>Nagios Plug-in Development Guidelines Copyright (C) 2000 2001
2002
Karl DeBisschop, Ethan Galstad, Hugo Gayosso, Stanley Hopcroft,
Subhendu Ghosh</para>

<para>Permission is granted to make and distribute verbatim
copies of this manual provided the copyright notice and this
permission notice are preserved on all copies.</para>

<para>The plugins themselves are copyrighted by their respective
authors.</para>

</section>
</preface>

<article>
<section id="PlugOutput"><title>Plugin Output for Nagios</title>

<para>You should always print something to STDOUT that tells whether the
service is working or why it is failing. Try to keep the output short -
probably fewer than 80 characters. Remember that you ideally would like
the entire output to appear in a pager message, which will get chopped
off after a certain length.</para>

<section><title>Print only one line of text</title>
<para>Nagios will only grab the first line of text from STDOUT
when it notifies contacts about potential problems. If you print
multiple lines, you're out of luck. Remember, keep it short and
to the point.</para>
</section>

<section><title>Screen Output</title>
<para>The plug-in should print the diagnostic and only the
synopsis part of the help message. A well-written plugin
then provides --help as the way to get the verbose help.</para>
<para>Code and output should try to respect the 80x25 size of a
CRT (remember this when fixing things in the server room!).</para>
</section>

<section><title>Return the proper status code</title>
<para>See <xref linkend="ReturnCodes"> below
for the numeric values of status codes and their
descriptions. Remember to return an UNKNOWN state if bogus or
invalid command line arguments are supplied or if you are unable
to check the service.</para>
</section>

<section><title>Plugin Return Codes</title>
<para>The return codes below are based on the POSIX spec of returning
a positive value. Netsaint prior to v0.0.7 supported a non-POSIX-compliant
return code of "-1" for unknown. Nagios supports POSIX return
codes by default.</para>

<para>Note: Some plugins will on occasion print on STDOUT that an error
occurred and that the error code is 138 or 255 or some such number. These
are usually caused by plugins that use system commands without
enough checks to catch unexpected output. Developers should include a
default catch-all for system command output that returns an UNKNOWN
return code.</para>

<table id="ReturnCodes"><title>Plugin Return Codes</title>
<tgroup cols="3">
<thead>
<row>
<entry><para>Numeric Value</para></entry>
<entry><para>Service Status</para></entry>
<entry><para>Status Description</para></entry>
</row>
</thead>
<tbody>
<row>
<entry align=center><para>0</para></entry>
<entry valign=middle><para>OK</para></entry>
<entry><para>The plugin was able to check the service and it
appeared to be functioning properly</para></entry>
</row>
<row>
<entry align=center><para>1</para></entry>
<entry valign=middle><para>Warning</para></entry>
<entry><para>The plugin was able to check the service, but it
appeared to be above some "warning" threshold or did not appear
to be working properly</para></entry>
</row>
<row>
<entry align=center><para>2</para></entry>
<entry valign=middle><para>Critical</para></entry>
<entry><para>The plugin detected that either the service was not
running or it was above some "critical" threshold</para></entry>
</row>
<row>
<entry align=center><para>3</para></entry>
<entry valign=middle><para>Unknown</para></entry>
<entry><para>Invalid command line arguments were supplied to the
plugin or the plugin was unable to check the status of the given
host or service</para></entry>
</row>
</tbody>
</tgroup>
</table>
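
<para>As a minimal sketch of the table above combined with the one-line
output rule, the following C fragment formats a single status line and
yields the matching return code. The enum, state_name() and
format_status() are illustrative names chosen for this example, not
functions documented in this guide; any out-of-range state is mapped to
UNKNOWN, in the spirit of the catch-all recommended above.</para>

<programlisting><![CDATA[
```c
#include <stdio.h>
#include <string.h>

/* Numeric values copied from the return code table. */
enum plugin_state {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = 3
};

/* Text label for a state; anything unexpected is reported as UNKNOWN. */
const char *state_name(enum plugin_state s)
{
    switch (s) {
    case STATE_OK:       return "OK";
    case STATE_WARNING:  return "WARNING";
    case STATE_CRITICAL: return "CRITICAL";
    default:             return "UNKNOWN";
    }
}

/* Write "<STATE> - <detail>" into buf as one line. Newlines in detail
 * become spaces and snprintf() truncates the text to fit size.
 * Returns the value the plugin should exit with. */
int format_status(char *buf, size_t size, enum plugin_state s,
                  const char *detail)
{
    size_t i;

    if (s != STATE_OK && s != STATE_WARNING && s != STATE_CRITICAL)
        s = STATE_UNKNOWN;       /* catch-all for odd values such as 138 */

    snprintf(buf, size, "%s - %s", state_name(s), detail);
    for (i = 0; i < size && buf[i] != '\0'; i++) {
        if (buf[i] == '\n' || buf[i] == '\r')
            buf[i] = ' ';
    }
    return (int)s;
}
```
]]></programlisting>

<para>A plugin could call format_status() with an 80-byte buffer, print
the buffer followed by a single newline, and pass the returned value to
exit().</para>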

</section>

</section>

<section id="SysCmdAuxFiles"><title>System Commands and Auxiliary Files</title>

<section><title>Don't execute system commands without specifying their
full path</title>
<para>Don't use exec(), popen(), etc. to execute external
commands without explicitly using the full path of the external
program.</para>

<para>Doing otherwise makes the plugin vulnerable to hijacking
by a trojan horse earlier in the search path. See the main
plugin distribution for examples of how this is done.</para>
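
<para>The following is one possible sketch of the rule, and is not taken
from the plugin distribution. The helper run_absolute() is a
hypothetical name; it refuses any program name that does not start with
a slash and uses execv(), which never consults the search path.</para>

<programlisting><![CDATA[
```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program given by absolute path and return its exit status.
 * Returns -1 if the path is not absolute, if fork() fails, or if the
 * child did not exit normally. A child that cannot exec exits with 127. */
int run_absolute(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    if (path == NULL || path[0] != '/')
        return -1;               /* refuse bare names such as "df" */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);       /* returns only on failure */
        _exit(127);
    }

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```
]]></programlisting>

<para>A caller that gets -1 or 127 back has not checked anything and
should report UNKNOWN rather than guess at the state of the
service.</para>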

</section>

<section><title>Use spopen() if external commands must be executed</title>

<para>If you have to execute external commands from within your
plugin and you're writing it in C, use the spopen() function
that Karl DeBisschop has written.</para>

<para>The code for spopen() and spclose() is included with the
core plugin distribution.</para>
</section>

<section><title>Don't make temp files unless absolutely required</title>

<para>If temp files are needed, make sure that the plugin will
fail cleanly if the file can't be written (e.g., too few file
handles, out of disk space, incorrect permissions, etc.) and
delete the temp file when processing is complete.</para>
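
<para>A minimal sketch of that advice, assuming a POSIX system with
mkstemp(). The function name scratch_roundtrip() and the choice to
report failures as UNKNOWN (3) are assumptions made for this example.
The caller supplies a writable template ending in XXXXXX, which
mkstemp() overwrites with the name of the file it created.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a temp file from tmpl, write data to it, then remove it.
 * Returns 0 on success and 3 (UNKNOWN) if the file cannot be created,
 * written, or closed. The file is removed on every path after creation. */
int scratch_roundtrip(char *tmpl, const char *data)
{
    size_t len = strlen(data);
    int state = 0;
    int fd = mkstemp(tmpl);      /* exclusive create, no name guessing */

    if (fd < 0) {
        printf("UNKNOWN - cannot create temp file\n");
        return 3;
    }
    if (write(fd, data, len) != (ssize_t)len) {
        printf("UNKNOWN - cannot write temp file\n");
        state = 3;
    }

    /* ... the real processing of the file would happen here ... */

    if (close(fd) != 0 && state == 0) {
        printf("UNKNOWN - cannot close temp file\n");
        state = 3;
    }
    unlink(tmpl);                /* delete the file when done */
    return state;
}
```
]]></programlisting>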

</section>

<section><title>Don't be tricked into following symlinks</title>

<para>If your plugin opens any files, take steps to ensure that
you are not following a symlink to another location on the
system.</para>
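
<para>One way to take such a step, sketched here under the assumption
that the platform supports the POSIX O_NOFOLLOW flag, is to let open()
itself reject a symlink. The helper name open_regular_nofollow() is
hypothetical. Note that O_NOFOLLOW only guards the final component of
the path; symlinks in parent directories are still followed.</para>

<programlisting><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path read-only, refusing a symlink as the last path component
 * and anything that is not a regular file. Returns a file descriptor,
 * or -1 on any failure. */
int open_regular_nofollow(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY | O_NOFOLLOW);

    if (fd < 0)
        return -1;               /* open() fails when path is a symlink */

    /* Check the object that was actually opened, not the name. */
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```
]]></programlisting>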

</section>

<section><title>Validate all input</title>

<para>Use the routines in utils.c or utils.pm, and write more as needed.</para>
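
<para>As an illustration of the kind of check meant here, the sketch
below validates a port number taken from the command line. The
function parse_port() is a hypothetical helper written for this example
and is not one of the routines in utils.c.</para>

<programlisting><![CDATA[
```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a port number. Returns the port (1 to 65535), or -1 if the
 * text is empty, does not start with a digit, has trailing junk, or is
 * out of range. */
int parse_port(const char *text)
{
    char *end;
    long value;

    if (text == NULL || !isdigit((unsigned char)text[0]))
        return -1;               /* rejects "", "-1", " 80", "+80" */

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;               /* overflow, or junk such as "80x" */
    if (value < 1 || value > 65535)
        return -1;
    return (int)value;
}
```
]]></programlisting>

<para>A plugin that receives -1 here should print its usage synopsis and
return UNKNOWN.</para>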

</section>

</section>

<section id="PerlPlugin"><title>Perl Plugins</title>

<para>Perl plugins are coded a little more defensively than other
plugins because of embedded Perl. When configured as such, embedded
Perl Nagios (ePN) requires stricter use of some of Perl's features.
This section outlines some of the steps needed to use ePN
effectively.</para>

<orderedlist>

<listitem><para>Do not use BEGIN and END blocks, since under embedded
Perl (ePN) they are called only the first time the plugin is run and
when Nagios shuts down. In particular, do not use BEGIN blocks to
initialize variables.</para>
</listitem>

<listitem><para>To use utils.pm, you need to provide a full path to the
module in order for it to work with ePN.</para>

<literallayout>
e.g.
use lib "/usr/local/nagios/libexec";
use utils qw(...);
</literallayout>
</listitem>

<listitem><para>Perl scripts should be called with "-w".</para>
</listitem>

<listitem><para>All Perl plugins must compile cleanly under "use strict" - i.e., at
least use explicit package names as in "$main::x" or predeclare every
variable.</para>

<para>Explicitly initialize each variable in use. Otherwise, with
caching enabled, the plugin will not be recompiled each time, and
therefore Perl will not reinitialize all the variables. All old
variable values will still be in effect.</para>
</listitem>

<listitem><para>Do not use &lt;DATA&gt; handles (these simply do not compile under ePN).</para>
</listitem>

<listitem><para>Do not use named subroutines.</para>
</listitem>

<listitem><para>If writing to a file (perhaps recording
performance data), explicitly close it. The plugin never
calls <emphasis role=strong>exit</emphasis>; that is caught by
p1.pl, so output streams are never closed.</para>
</listitem>

<listitem><para>As in <xref linkend="runtime">, all plugins need
to monitor their runtime, especially if they are using network
resources. Use of <emphasis>alarm</emphasis> is recommended.
Plugins may import a default timeout ($TIMEOUT) from utils.pm.
</para>
</listitem>

<listitem><para>Perl plugins should import %ERRORS from utils.pm
and then "exit $ERRORS{'OK'}" rather than "exit 0".
</para>
</listitem>

</orderedlist>

</section>

<section id="runtime"><title>Runtime Timeouts</title>

<para>Plugins have a very limited runtime - typically 10 seconds.
As a result, it is very important for plugins to maintain internal
code to exit if the runtime exceeds a threshold.</para>

<para>All plugins should time out gracefully, not just networking
plugins. For instance, df may lock up if you have automounted
drives and your network fails - but at first glance, who'd think
df could lock up like that? Besides, a plugin that can time out is
simply more error resistant than one that keeps consuming
resources.</para>

<section><title>Use DEFAULT_SOCKET_TIMEOUT</title>

<para>All network plugins should use DEFAULT_SOCKET_TIMEOUT as their timeout.</para>

</section>

<section><title>Add alarms to network plugins</title>

<para>If you write a plugin which communicates with another
networked host, you should make sure to set an alarm() in your
code that prevents the plugin from hanging due to abnormal
socket closures, etc. Nagios takes steps to protect itself
against unruly plugins that time out, but any plugins you create
should be well behaved on their own.</para>

</section>

</section>

<section id="PlugOptions"><title>Plugin Options</title>

<para>A well-written plugin should have --help as a way to get
verbose help. Code and output should try to respect the 80x25 size of a
CRT (remember this when fixing things in the server room!).</para>

<section><title>Option Processing</title>

<para>For plugins written in C, we recommend the C standard
getopt library for short options. If using getopt_long, check to
be sure that HAVE_GETOPT_H is defined (configure checks this and
sets the #define in common/config.h).</para>

<para>For plugins written in Perl, we recommend the Getopt::Long module.</para>

<para>Positional arguments are strongly discouraged.</para>

<para>There are a few reserved options that should not be used
for other purposes:</para>

<literallayout>
-V version (--version)
-h help (--help)
-t timeout (--timeout)
-w warning threshold (--warning)
-c critical threshold (--critical)
-H hostname (--hostname)
</literallayout>

<para>In addition to the reserved options above, some other standard options are:</para>

<literallayout>
-C SNMP community (--community)
-a authentication password (--authentication)
-l login name (--logname)
-p port or password (--port or --passwd/--password)
-u url or username (--url or --username)
</literallayout>

<para>Look at check_pgsql and check_procs to see how I currently
think this can work. The standard options behave as follows:</para>

<para>The option -V or --version should be present in all
plugins. For C plugins it should result in a call to print_revision, a
function in utils.c which takes two character-string arguments, the
command name and the plugin revision.</para>

<para>The -? option, or any other unparsable set of options,
should print out a short usage statement. The output should be at
most 80 characters wide, and no more than 23 lines should be printed (it
should display cleanly on a dumb terminal in a server
room).</para>

<para>The option -h or --help should be present in all plugins.
In C plugins, it should result in a call to print_help (or
equivalent). The function print_help should call print_revision,
then print_usage, and then provide detailed
help. Help text should fit on an 80-character width display, but
may run as many lines as needed.</para>
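
<para>The reserved options above can be handled along the following
lines. This is only a sketch: it assumes getopt_long() is available
(the HAVE_GETOPT_H case), and struct plugin_options and parse_options()
are hypothetical names introduced for the example. It records what was
asked for and leaves the calls to print_revision, print_usage and
print_help to the caller.</para>

<programlisting><![CDATA[
```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the reserved options. */
struct plugin_options {
    const char *hostname;        /* -H, --hostname */
    const char *warning;         /* -w, --warning  */
    const char *critical;        /* -c, --critical */
    int timeout;                 /* -t, --timeout, in seconds */
    int show_version;            /* -V, --version  */
    int show_help;               /* -h, --help     */
};

/* Fill opts from argv. Returns 0 on success, or -1 for an unparsable
 * command line (including -?), in which case the caller should print
 * the short usage statement and return UNKNOWN. */
int parse_options(int argc, char **argv, struct plugin_options *opts)
{
    static const struct option long_opts[] = {
        { "version",  no_argument,       NULL, 'V' },
        { "help",     no_argument,       NULL, 'h' },
        { "timeout",  required_argument, NULL, 't' },
        { "warning",  required_argument, NULL, 'w' },
        { "critical", required_argument, NULL, 'c' },
        { "hostname", required_argument, NULL, 'H' },
        { NULL, 0, NULL, 0 }
    };
    char *end;
    long value;
    int c;

    optind = 1;                  /* start scanning at argv[1] */
    opterr = 0;                  /* the caller prints its own usage text */
    while ((c = getopt_long(argc, argv, "Vht:w:c:H:", long_opts, NULL)) != -1) {
        switch (c) {
        case 'V': opts->show_version = 1; break;
        case 'h': opts->show_help = 1;    break;
        case 'w': opts->warning = optarg;  break;
        case 'c': opts->critical = optarg; break;
        case 'H': opts->hostname = optarg; break;
        case 't':
            value = strtol(optarg, &end, 10);
            if (end == optarg || *end != '\0' || value < 1 || value > 3600)
                return -1;       /* 3600 is an arbitrary upper bound */
            opts->timeout = (int)value;
            break;
        default:
            return -1;           /* unknown option or missing argument */
        }
    }
    if (optind < argc)
        return -1;               /* positional arguments are discouraged */
    return 0;
}
```
]]></programlisting>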

</section>

<section>
<title>Plugins with more than one type of threshold, or with
threshold ranges</title>

<para>The old style was to do things like -ct for critical time and
-cv for critical value. That goes out the window with POSIX
getopt. The allowable alternatives are:</para>

<orderedlist>
<listitem>
<para>long options like --critical-time (or -ct and -cv, I
suppose).</para>
</listitem>

<listitem>
<para>repeated options like `check_load -w 10 -w 6 -w 4 -c
16 -c 10 -c 10`</para>
</listitem>

<listitem>
<para>for brevity, the above can be expressed as `check_load
-w 10,6,4 -c 16,10,10`</para>
</listitem>

<listitem>
<para>ranges are expressed with colons as in `check_procs -C
httpd -w 1:20 -c 1:30`, which will warn above 20 instances
and go critical at 0 or above 30 instances</para>
</listitem>

<listitem>
<para>lists are expressed with commas, so Jacob's check_nmap
uses constructs like '-p 1000,1010,1050:1060,2000'</para>
</listitem>

<listitem>
<para>If possible when writing lists, use tokens to make the
list easy to remember and independent of order - so
check_disk uses '-c 10000,10%' so that it is clear which is
the percentage and which is the KB value (note that due to
my own lack of foresight, that used to be '-c 10000:10%', but
such constructs should all be changed for consistency,
though providing backward compatibility is fairly
easy).</para>
</listitem>

</orderedlist>
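
<para>The colon range form in the list above can be sketched as follows.
The sketch covers only the plain LOW:HIGH form shown in the check_procs
example; struct range, parse_range(), outside_range() and evaluate()
are hypothetical names, and the mapping of results to 0, 1 and 2
follows the return code table.</para>

<programlisting><![CDATA[
```c
#include <stdlib.h>

/* Inclusive range of acceptable values, as in "1:20". */
struct range {
    long low;
    long high;
};

/* Parse "LOW:HIGH". Returns 0 on success, -1 if the text is malformed
 * or LOW is greater than HIGH. */
int parse_range(const char *text, struct range *r)
{
    char *end;

    r->low = strtol(text, &end, 10);
    if (end == text || *end != ':')
        return -1;
    text = end + 1;
    r->high = strtol(text, &end, 10);
    if (end == text || *end != '\0' || r->low > r->high)
        return -1;
    return 0;
}

/* Nonzero when value falls outside the range, i.e. the threshold trips. */
int outside_range(const struct range *r, long value)
{
    return value < r->low || value > r->high;
}

/* Critical is tested first so that it wins when both thresholds trip.
 * Returns 2 (critical), 1 (warning) or 0 (OK). */
int evaluate(long value, const struct range *warn, const struct range *crit)
{
    if (outside_range(crit, value))
        return 2;
    if (outside_range(warn, value))
        return 1;
    return 0;
}
```
]]></programlisting>

<para>With the thresholds from the check_procs example, -w 1:20 and
-c 1:30, a count of 25 evaluates to warning, while counts of 0 and 31
both evaluate to critical.</para>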

<para>As always, comments are welcome - making this consistent
without a host of long options was quite a hassle, and I would
suspect that there are flaws in this strategy. Perhaps clear
long options are the most important of the above choices, but not
all POSIX systems have C libraries for long options, so the
short forms must exist as well.</para>
</section>
</section>

<section id="SubmittingChanges"><title>New submissions and patches</title>

<para>If you would like others to use your plugin and have it included in
the standard distribution, please include patches for the relevant
configuration files, in particular "configure.in". Otherwise submitted
plugins will be included in the contrib directory.</para>

<para>Plugins in the contrib directory are going to be migrated to the
standard plugins/plugin-scripts directory as time permits and per user
requests.</para>

<para>Patches should be submitted via SourceForge and be announced to
the mailing list.</para>

<para>For new plugins, provide a diff to add to the EXTRAS list (configure.in)
unless you are fairly sure that the plugin will work for all platforms with
no non-standard software added.</para>

<para>If possible, please submit a test harness. Documentation on sample
tests is coming soon.</para>

</section>
</article>

</book>