Mirror of https://github.com/opnsense/src.git (synced 2026-03-09 09:41:05 -04:00)

Add two new manual pages related to general firewall and tuning issues

Reviewed by: hackers

Parent: 58f43c087f
Commit: fc32c80215

3 changed files with 854 additions and 1 deletion
@@ -3,7 +3,7 @@
 #MISSING: eqnchar.7 ms.7 term.7
 MAN= ascii.7 build.7 clocks.7 environ.7 hier.7 hostname.7 intro.7 mailaddr.7 \
-operator.7 ports.7 security.7 \
+operator.7 ports.7 security.7 tuning.7 firewall.7 \
 style.perl.7
 MLINKS= intro.7 miscellaneous.7
share/man/man7/firewall.7 (Normal file, +375)
@@ -0,0 +1,375 @@
.\" Copyright (c) 2001, Matthew Dillon. Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD$
.\"
.Dd May 26, 2001
.Dt FIREWALL 7
.Os FreeBSD
.Sh NAME
.Nm firewall
.Nd simple firewalls under FreeBSD
.Sh FIREWALL BASICS
A firewall is most commonly used to protect an internal network
from an outside network by preventing the outside network from
making arbitrary connections into the internal network. Firewalls
are also used to prevent outside entities from spoofing internal
IP addresses and to isolate services such as NFS or SMBFS (Windows
file sharing) within LAN segments.
.Pp
The
.Fx
firewalling system also has the capability to limit bandwidth using
.Xr dummynet 4 .
This feature can be useful when you need to guarantee a certain
amount of bandwidth for a critical purpose. For example, if you
are doing video conferencing over the internet via your
office T1 (1.5 Mbit/s), you may wish to bandwidth-limit all other
T1 traffic to 1 Mbit/s in order to reserve at least 0.5 Mbit/s
for your video conferencing connections. Similarly, if you are
running a popular web or ftp site from a colocation facility
you might want to limit bandwidth to prevent excessive bandwidth
charges from your provider.
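.Pp
The arithmetic behind the T1 example is simple but worth making explicit: to keep R kbit/s free on a link of L kbit/s, cap everything else at L minus R. A minimal Python sketch; the function names are hypothetical, and 1500 kbit/s is the rounded T1 rate used above.

```python
def cap_for_reservation(link_kbit: int, reserve_kbit: int) -> int:
    """Cap (kbit/s) to put on all other traffic so reserve_kbit stays free."""
    if not 0 <= reserve_kbit <= link_kbit:
        raise ValueError("the reservation must fit on the link")
    return link_kbit - reserve_kbit


def reserved(link_kbit: int, cap_kbit: int) -> int:
    """Bandwidth (kbit/s) left for the uncapped traffic."""
    if not 0 <= cap_kbit <= link_kbit:
        raise ValueError("the cap must lie between 0 and the link speed")
    return link_kbit - cap_kbit


T1_KBIT = 1500  # the rounded 1.5 Mbit/s T1 figure used above
print(cap_for_reservation(T1_KBIT, 500))  # 1000
print(reserved(T1_KBIT, 1000))            # 500
```

With dummynet the cap itself would be a pipe configured with a bandwidth parameter (for example, ipfw pipe 1 config bw 1Mbit/s) plus a rule that sends the other traffic through that pipe; check
.Xr ipfw 8
for the exact syntax on your release.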
.Pp
Finally,
.Fx
firewalls may be used to divert packets or change the next-hop
address for packets to help route them to the correct destination.
Packet diversion is most often used to support NAT (network
address translation), which allows an internal network using
a private IP space to make connections to the outside for browsing
or other purposes.
.Pp
Constructing a firewall may appear to be trivial, but most people
get them wrong. The most common mistake is to create an exclusive
firewall rather than an inclusive firewall. An exclusive firewall
allows all packets through except for those matching a set of rules.
An inclusive firewall allows only packets matching the ruleset
through. Inclusive firewalls are much, much safer than exclusive
firewalls but a tad more difficult to build properly. The
second most common mistake is to blackhole everything except the
particular port you want to let through. TCP/IP needs to be able
to get certain types of ICMP errors to function properly; for
example, MTU discovery depends on them. Also, a number of common
system daemons make reverse connections to the
.Sy auth
service in an attempt to authenticate the user making a connection.
Auth is rather dangerous, but the proper way to handle it is to return
a TCP reset for the connection attempt rather than simply blackholing
the packet. We cover these and other quirks involved with constructing
a firewall in the sample firewall section below.
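.Pp
The difference between the two styles comes down to what happens to a packet that no rule mentions. The sketch below is a simplified Python model of first-match rule evaluation, not ipfw's implementation: the Rule class, evaluate() and the packet dictionaries are hypothetical, and real rules match on far more than protocol and port.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, object]  # simplified packet, e.g. {"proto": "tcp", "dst_port": 80}


@dataclass
class Rule:
    action: str                      # "allow" or "deny"
    match: Callable[[Packet], bool]  # predicate deciding whether the rule applies


def evaluate(rules: List[Rule], default: str, pkt: Packet) -> str:
    """First match wins; packets that no rule matches get the default action."""
    for rule in rules:
        if rule.match(pkt):
            return rule.action
    return default


# Exclusive: deny a few known-bad things, pass everything else.
exclusive = [Rule("deny", lambda p: p["dst_port"] == 2049)]  # e.g. NFS
# Inclusive: pass a few known-good things, deny everything else.
inclusive = [Rule("allow", lambda p: p["proto"] == "tcp" and p["dst_port"] == 80)]

probe = {"proto": "tcp", "dst_port": 6000}  # a service nobody thought about
print(evaluate(exclusive, "allow", probe))  # allow
print(evaluate(inclusive, "deny", probe))   # deny
```

Under the inclusive policy the forgotten port falls through to the default deny; under the exclusive policy it sails straight through.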
.Sh IPFW KERNEL CONFIGURATION
To use the IP firewall features of
.Fx
you must create a custom kernel with the
.Sy IPFIREWALL
option set. By default, the kernel firewall denies all
packets, which means that if you do not load in
a permissive ruleset via
.Em /etc/rc.conf ,
rebooting into your new kernel will take the network offline
and will prevent you from being able to access the machine if you
are not sitting at the console. It is also quite common to
update a kernel to a new release and reboot before updating
the binaries. This can result in an incompatibility between
the
.Xr ipfw 8
program and the kernel which prevents it from running in the
boot sequence, also resulting in an inaccessible machine.
Because of these problems the
.Sy IPFIREWALL_DEFAULT_TO_ACCEPT
kernel option is also available which changes the default firewall
to pass through all packets. Note, however, that this is a very
dangerous option to set because it means your firewall is disabled
during booting. You should use this option while getting up to
speed with
.Fx
firewalling, but get rid of it once you understand how it all works
to close the loophole. There is a third option called
.Sy IPDIVERT
which allows you to use the firewall to divert packets to a user program
and is necessary if you wish to use
.Xr natd 8
to give private internal networks access to the outside world.
If you want to be able to limit the bandwidth used by certain types of
traffic, the
.Sy DUMMYNET
option must be used to enable
.Em ipfw pipe
rules.
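.Pp
As a checklist, the options named in this section map onto features as follows. This is a hypothetical Python helper, not part of
.Xr config 8 :
it emits lines in the usual "options NAME" kernel configuration form, always includes IPFIREWALL because divert and pipe rules are themselves ipfw rules, and only adds IPFIREWALL_DEFAULT_TO_ACCEPT when explicitly asked.

```python
# Feature -> kernel option, as described in this section.
OPTIONS = {
    "firewall": "IPFIREWALL",  # ipfw(8) itself
    "nat": "IPDIVERT",         # divert rules, needed by natd(8)
    "shaping": "DUMMYNET",     # ipfw pipe rules, dummynet(4)
}


def kernel_options(features, default_accept=False):
    """Return 'options NAME' lines for a hypothetical kernel config file."""
    wanted = ["IPFIREWALL"]  # divert and pipe rules are ipfw rules too
    for feature in features:
        option = OPTIONS[feature]
        if option not in wanted:
            wanted.append(option)
    if default_accept:
        # Leaves the firewall open while booting: use only while learning.
        wanted.append("IPFIREWALL_DEFAULT_TO_ACCEPT")
    return ["options " + name for name in wanted]


print("\n".join(kernel_options(["nat", "shaping"])))
```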
.Sh SAMPLE IPFW-BASED FIREWALL
Here is an example ipfw-based firewall taken from a machine with three
interface cards. fxp0 is connected to the 'exposed' LAN. Machines
on this LAN are dual-homed with both internal 10. IP addresses and
internet-routed IP addresses. In our example, 192.100.5.x represents
the internet-routed IP block while 10.x.x.x represents the internal
networks. In this example, 10.0.0.x is
assigned as the internal address block for the LAN on fxp0, 10.0.1.x
for the LAN on fxp1, and 10.0.2.x for the LAN on fxp2.
.Pp
In this example we want to isolate all three LANs from the internet
as well as isolate them from each other, and we want to give all
internal addresses access to the internet through a NAT gateway running
on this machine. To make the NAT gateway work, the firewall machine
is given two internet-exposed addresses on fxp0 in addition to an
internal 10. address on fxp0: one exposed address (not shown)
represents the machine's official address, and the second exposed
address (192.100.5.5 in our example) represents the NAT gateway
rendezvous IP. We make the example more complex by giving the machines
on the exposed LAN internal 10.0.0.x addresses as well as exposed
addresses. The idea here is that you can bind internal services
to internal addresses even on exposed machines and still protect
those services from the internet. The only services you run on
exposed IP addresses would be the ones you wish to expose to the
internet.
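.Pp
The three NAT-related rules in the sample ruleset below (00290, 00300 and 00301) amount to a simple classification of each packet by source and destination address. The Python sketch models that decision with the standard ipaddress module; nat_path() is hypothetical, it ignores interfaces and the rest of the ruleset, and 198.51.100.7 is an arbitrary documentation address standing in for a host on the internet.

```python
import ipaddress

INTERNAL = ipaddress.ip_network("10.0.0.0/8")
EXPOSED = ipaddress.ip_network("192.100.5.0/24")
NAT_IP = ipaddress.ip_address("192.100.5.5")  # the NAT rendezvous address


def nat_path(src: str, dst: str) -> str:
    """Mimic rules 00290, 00300 and 00301, checked in that order."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if s in INTERNAL and d in EXPOSED:
        return "skip natd"       # 00290: exposed hosts know our 10. network
    if s in INTERNAL and d not in INTERNAL:
        return "divert to natd"  # 00300: outbound traffic gets translated
    if s not in INTERNAL and d == NAT_IP:
        return "divert to natd"  # 00301: replies coming back to the NAT address
    return "no NAT"


print(nat_path("10.0.1.20", "198.51.100.7"))    # divert to natd
print(nat_path("10.0.1.20", "192.100.5.9"))     # skip natd
print(nat_path("198.51.100.7", "192.100.5.5"))  # divert to natd
print(nat_path("10.0.1.20", "10.0.2.30"))       # no NAT
```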
.Pp
It is important to note that the 10.0.0.x network in our example
is not protected by our firewall. You must make sure that your
internet router protects this network from outside spoofing.
Also, in our example, we pretty much give the exposed hosts free
rein on our internal network when operating services through
internal IP addresses (10.0.0.x). This is somewhat of a security
risk: what if an exposed host is compromised? To remove the
risk and force everything coming in via LAN0 to go through
the firewall, remove (or leave commented out) rules 01010 and 01011.
.Pp
Finally, note that the use of internal addresses represents a
big piece of our firewall protection mechanism. With proper
spoofing safeguards in place, nothing outside can directly
access an internal (LAN1 or LAN2) host.
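.Pp
The spoofing safeguards referred to here are rules 01500 and 01501 in the sample ruleset below. Their logic can be restated as a small Python model using the ipaddress module. spoofed() and the lookup tables are hypothetical; the interface names and networks come from the example.

```python
import ipaddress

LAN1 = ipaddress.ip_network("10.0.1.0/24")
LAN2 = ipaddress.ip_network("10.0.2.0/24")

# Rule 01500: packets arriving on these interfaces must come from this network.
MUST_COME_FROM = {"fxp1": LAN1, "fxp2": LAN2}
# Rule 01501: packets arriving on the exposed interface must not claim these.
FORBIDDEN_ON_FXP0 = (LAN1, LAN2)


def spoofed(iface: str, src: str) -> bool:
    """True if a packet from src arriving on iface should be dropped."""
    addr = ipaddress.ip_address(src)
    if iface in MUST_COME_FROM:
        return addr not in MUST_COME_FROM[iface]
    if iface == "fxp0":
        return any(addr in net for net in FORBIDDEN_ON_FXP0)
    return False


print(spoofed("fxp1", "10.0.2.9"))  # True: a LAN2 address on the LAN1 wire
print(spoofed("fxp0", "10.0.1.9"))  # True: a LAN1 address arriving from outside
print(spoofed("fxp0", "10.0.0.9"))  # False: 10.0.0.x is the router's job
```

The last case is the reason the internet router has to filter the 10.0.0.x block: nothing on the firewall can tell a forged 10.0.0.x source arriving on fxp0 from a real one.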
.Bd -literal
# /etc/rc.conf
#
firewall_enable="YES"
firewall_type="/etc/ipfw.conf"

# temporary port binding range let
# through the firewall.
#
# NOTE: heavily loaded services running through the firewall may require
# a larger port range for local-side binding. 4000-10000 or 4000-30000
# might be a better choice.
ip_portrange_first=4000
ip_portrange_last=5000
...
.Ed
.Pp
.Bd -literal
# /etc/ipfw.conf
#
# FIREWALL: the firewall machine / nat gateway
# LAN0 10.0.0.X and 192.100.5.X (dual homed)
# LAN1 10.0.1.X
# LAN2 10.0.2.X
# sw: ethernet switch (unmanaged)
#
# 192.100.5.x represents IP addresses exposed to the internet
# (i.e. internet routable). 10.x.x.x represent internal IPs
# (not exposed)
#
#      [LAN1]
#        ^
#        |
#     FIREWALL -->[LAN2]
#        |
#      [LAN0]
#        |
#        +--> exposed host A
#        +--> exposed host B
#        +--> exposed host C
#        |
#     INTERNET (secondary firewall)
#      ROUTER
#        |
#    [internet]
#
# NOT SHOWN: The INTERNET ROUTER must contain rules to disallow
# all packets with source IP addresses in the 10. block in order
# to protect the dual-homed 10.0.0.x block. Exposed hosts are
# not otherwise protected in this example; they should only bind
# exposed services to exposed IPs but can safely bind internal
# services to internal IPs.
#
# The NAT gateway works by taking packets sent from internal
# IP addresses to external IP addresses and routing them to natd, which
# is listening on port 8668. This is handled by rule 00300. Data coming
# back to natd from the outside world must also be routed to natd using
# rule 00301. To make the example interesting, we note that we do
# NOT have to run internal requests to exposed hosts through natd
# (rule 00290) because those exposed hosts know about our
# 10. network. This can reduce the load on natd. Also note that we
# of course do not have to route internal<->internal traffic through
# natd since those hosts know how to route our 10. internal network.
# The natd command we run from /etc/rc.local is shown below. See
# also the in-kernel version of natd, ipnat.
#
# natd -s -u -a 192.100.5.5
#
#
add 00290 skipto 1000 ip from 10.0.0.0/8 to 192.100.5.0/24
add 00300 divert 8668 ip from 10.0.0.0/8 to not 10.0.0.0/8
add 00301 divert 8668 ip from not 10.0.0.0/8 to 192.100.5.5

# Shortcut the rules to avoid running high bandwidths through
# the entire rule set. Allow established tcp connections through,
# and shortcut all outgoing packets under the assumption that
# we need only firewall incoming packets.
#
# Allowing established tcp connections through creates a small
# hole but may be necessary to avoid overloading your firewall.
# If you are worried, you can move the rule to after the spoof
# checks.
#
add 01000 allow tcp from any to any established
add 01001 allow all from any to any out via fxp0
add 01001 allow all from any to any out via fxp1
add 01001 allow all from any to any out via fxp2

# Spoof protection. This depends on how well you trust your
# internal networks. Packets received via fxp1 MUST come from
# 10.0.1.x. Packets received via fxp2 MUST come from 10.0.2.x.
# Packets received via fxp0 cannot come from the LAN1 or LAN2
# blocks. We can't protect 10.0.0.x here; the internet router
# must do that for us.
#
add 01500 deny all from not 10.0.1.0/24 in via fxp1
add 01500 deny all from not 10.0.2.0/24 in via fxp2
add 01501 deny all from 10.0.1.0/24 in via fxp0
add 01501 deny all from 10.0.2.0/24 in via fxp0

# In this example rule set there are no restrictions between
# internal hosts, even those on the exposed LAN (as long as
# they use an internal IP address). This represents a
# potential security hole (what if an exposed host is
# compromised?). If you want full restrictions to apply
# between the three LANs, firewalling them off from each
# other for added security, remove these two rules.
#
# If you want to isolate LAN1 and LAN2, but still want
# to give exposed hosts free rein with each other, get
# rid of rule 01010 and keep rule 01011.
#
# (commented out, uncomment for less restrictive firewall)
#add 01010 allow all from 10.0.0.0/8 to 10.0.0.0/8
#add 01011 allow all from 192.100.5.0/24 to 192.100.5.0/24
#

# SPECIFIC SERVICES ALLOWED FROM SPECIFIC LANS
#
# If using a more restrictive firewall, allow specific LANs
# access to specific services running on the firewall itself.
# In this case we assume LAN1 needs access to filesharing running
# on the firewall. If using a less restrictive firewall
# (allowing rule 01010), you don't need these rules.
#
add 01012 allow tcp from 10.0.1.0/24 to 10.0.1.1 139
add 01012 allow udp from 10.0.1.0/24 to 10.0.1.1 137,138

# GENERAL SERVICES ALLOWED TO CROSS INTERNAL AND EXPOSED LANS
#
# We allow specific UDP services through: DNS lookups, ntalk, and ntp.
# Note that internal services are protected by virtue of having
# spoof-proof internal IP addresses (10. net), so these rules
# really only apply to services bound to exposed IPs. We have
# to allow UDP fragments or larger fragmented UDP packets will
# not survive the firewall.
#
# If we want to expose high-numbered temporary service ports
# for things like DNS lookup responses we can use a port range,
# in this example 4000-65535, and we set two /etc/rc.conf variables
# on all exposed machines to make sure they bind temporary ports
# to the exposed port range (see rc.conf example above).
#
add 02000 allow udp from any to any 4000-65535,domain,ntalk,ntp
add 02500 allow udp from any to any frag

# Allow similar services for TCP. Again, these only apply to
# services bound to exposed addresses. NOTE: we allow 'auth'
# through but do not actually run an identd server on any exposed
# port. This allows the machine being authed to respond with a
# TCP RESET. Throwing the packet away would result in delays
# when connecting to remote services that do reverse ident lookups.
#
# Note that we do not allow tcp fragments through, and that we do
# not allow fragments in general (except for UDP fragments). We
# expect the TCP MTU discovery protocol to work properly so there
# should be no TCP fragments.
#
add 03000 allow tcp from any to any http,https
add 03000 allow tcp from any to any 4000-65535,ssh,smtp,domain,ntalk
add 03000 allow tcp from any to any auth,pop3,ftp,ftp-data

# It is important to allow certain ICMP types through:
#
# 0 Echo Reply
# 3 Destination Unreachable
# 4 Source Quench (typically not allowed)
# 5 Redirect (typically not allowed, can be dangerous!)
# 8 Echo
# 11 Time Exceeded
# 12 Parameter Problem
# 13 Timestamp
# 14 Timestamp Reply
#
# Sometimes people need to allow ICMP REDIRECT packets, which is
# type 5, but if you allow it make sure that your internet router
# disallows it.

add 04000 allow icmp from any to any icmptypes 0,3,8,11,12,13,14

# log any remaining fragments that get through. Might be useful,
# otherwise don't bother. Have a final deny rule as a safety to
# guarantee that your firewall is inclusive no matter how the kernel
# is configured.
#
add 05000 deny log ip from any to any frag
add 06000 deny all from any to any
.Ed
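.Pp
The icmptypes list on rule 04000 should follow from the comment table above it: allow the types marked as important and leave out the ones marked as typically not allowed. A short Python sketch that derives the list from such a table. ICMP_POLICY and icmptypes_arg() are hypothetical. Type 3 is kept because, as noted earlier, TCP needs those errors for MTU discovery.

```python
# ICMP types from the ruleset comments: (name, let it through?)
ICMP_POLICY = {
    0: ("Echo Reply", True),
    3: ("Destination Unreachable", True),  # needed for MTU discovery
    4: ("Source Quench", False),           # typically not allowed
    5: ("Redirect", False),                # typically not allowed, can be dangerous
    8: ("Echo", True),
    11: ("Time Exceeded", True),
    12: ("Parameter Problem", True),
    13: ("Timestamp", True),
    14: ("Timestamp Reply", True),
}


def icmptypes_arg(policy=ICMP_POLICY) -> str:
    """Build the comma-separated list for an ipfw 'icmptypes' option."""
    return ",".join(str(t) for t in sorted(policy) if policy[t][1])


print("add 04000 allow icmp from any to any icmptypes " + icmptypes_arg())
# add 04000 allow icmp from any to any icmptypes 0,3,8,11,12,13,14
```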
.Sh PORT BINDING INTERNAL AND EXTERNAL SERVICES
We've mentioned multi-homing hosts and binding services to internal or
external addresses but we haven't really explained it. When you have a
host with multiple IP addresses assigned to it, you can bind services run
on that host to specific IPs or interfaces rather than all IPs. Take
the firewall machine for example: With three interfaces
and two exposed IP addresses
on one of those interfaces, the firewall machine is known by 5 different
IP addresses (10.0.0.1, 10.0.1.1, 10.0.2.1, 192.100.5.5, and say
192.100.5.1). If the firewall is providing file sharing services to the
Windows LAN segment (say it is LAN1), you can use samba's 'bind interfaces only'
directive to specifically bind it to just the LAN1 IP address. That
way the file sharing services will not be made available to other LAN
segments. The same goes for NFS. If LAN2 has your UNIX engineering
workstations, you can tell nfsd to bind specifically to 10.0.2.1. You
can specify how to bind virtually every service on the machine and you
can use a light
.Xr jail 8
to indirectly bind services that do not otherwise give you the option.
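.Pp
At the socket level, binding to one address rather than to all of them comes down to which address the service passes to bind(2). A minimal Python sketch: bound_listener() is hypothetical, and it uses the loopback address so it runs on any machine, where on the example firewall you would pass 10.0.1.1 or 10.0.2.1 instead.

```python
import socket


def bound_listener(addr: str, port: int = 0) -> socket.socket:
    """Open a TCP listener on one specific local address.

    Binding to "0.0.0.0" would accept connections on every address the
    host owns; binding to a single address keeps the service off the others.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((addr, port))  # port 0 asks the kernel for any free port
    sock.listen()
    return sock


srv = bound_listener("127.0.0.1")  # e.g. "10.0.1.1" on the example firewall
print(srv.getsockname())
srv.close()
```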
.Sh SEE ALSO
.Pp
.Xr config 8 ,
.Xr dummynet 4 ,
.Xr ipfw 8 ,
.Xr ipnat 1 ,
.Xr ipnat 5 ,
.Xr jail 8 ,
.Xr natd 8 ,
.Xr nfsd 8 ,
.Xr rc.conf 5 ,
.Xr samba 7 [ /usr/ports/net/samba ] ,
.Xr smb.conf 5 [ /usr/ports/net/samba ]
.Sh ADDITIONAL READING
.Pp
.Xr ipf 5 ,
.Xr ipf 8 ,
.Xr ipfstat 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.
share/man/man7/tuning.7 (Normal file, +478)
@@ -0,0 +1,478 @@
.\" Copyright (c) 2001, Matthew Dillon. Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD$
.\"
.Dd May 25, 2001
.Dt TUNING 7
.Os FreeBSD
.Sh NAME
.Nm tuning
.Nd performance tuning under FreeBSD
.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
.Pp
When using
.Xr disklabel 8
to lay out your filesystems on a hard disk it is important to remember
that hard drives can transfer data much more quickly from outer tracks
than they can from inner tracks. To take advantage of this you should
try to pack your smaller filesystems and swap closer to the outer tracks,
follow with the larger filesystems, and end with the largest filesystems.
It is also important to size system standard filesystems such that you
will not be forced to resize them later as you scale the machine up.
I usually create, in order, a 128M root, 1G swap, 128M /var, 128M /var/tmp,
3G /usr, and use any remaining space for /home.
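.Pp
The arithmetic behind that layout, as a small Python sketch. It assumes 1G means 1024M and uses a hypothetical 20 GB disk; plan() is an illustrative helper, not anything
.Xr disklabel 8
provides.

```python
# Sizes in MB, in the order given above (1G = 1024M).
LAYOUT = [("/", 128), ("swap", 1024), ("/var", 128),
          ("/var/tmp", 128), ("/usr", 3 * 1024)]


def plan(disk_mb: int):
    """Append a /home partition that takes whatever the fixed layout leaves."""
    fixed = sum(size for _, size in LAYOUT)
    if disk_mb <= fixed:
        raise ValueError(f"this layout needs more than {fixed} MB")
    return LAYOUT + [("/home", disk_mb - fixed)]


for mount, size in plan(20 * 1024):  # hypothetical 20 GB disk
    print(f"{mount:<10}{size:>7} MB")
```

The fixed partitions take 4480 MB, so on this disk /home ends up with 16000 MB.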
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap. It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory. Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine. Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive
(up to four drives). The swap partitions on the drives should be
approximately the same size. The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition. Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks. Don't worry about overdoing it a
little; swap space is the saving grace of
.Ux
and even if you don't normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
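.Pp
The swap rules of thumb above can be written down as a short Python function. This is a sketch under stated assumptions: swap_plan() is hypothetical, the advice to configure even more swap on small-memory machines is only captured by the 256 MB floor, and the per-drive size is rounded up so the drives together never fall short of the total.

```python
def swap_plan(ram_mb: int, drives: int, future_ram_mb: int = 0):
    """Return (total_swap_mb, per_drive_mb) following the rules of thumb above.

    - about 2x main memory, sized for planned memory upgrades too
    - never less than 256 MB in total
    - spread evenly over at most four drives
    """
    basis = max(ram_mb, future_ram_mb)
    total = max(2 * basis, 256)
    used = max(1, min(drives, 4))
    per_drive = -(-total // used)  # ceiling division: round up, never short
    return total, per_drive


print(swap_plan(512, 2))   # (1024, 512)
print(swap_plan(1024, 6))  # (2048, 512): only four drives get swap
```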
.Pp
How you size your
.Em /var
partition depends heavily on what you intend to use the machine for. This
partition is primarily used to hold mailboxes, the print spool, and log
files. Some people even make
.Em /var/log
its own partition (but except for extreme cases it isn't worth the waste
of a partition id). If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition, perhaps a gigabyte or more. It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Em /var/tmp
depends on the kind of temporary file usage you think you will need. 128M is
the minimum we recommend. Also note that you usually want to make
.Em /tmp
a softlink to
.Em /var/tmp .
Dedicating a partition for temporary file storage is important for
two reasons: First, it reduces the possibility of filesystem corruption
in a crash, and second it reduces the chance of a runaway process that
fills up [/var]/tmp from blowing up more critical subsystems (mail,
logging, etc). Filling up [/var]/tmp is a very common problem to have.
.Pp
In the old days there were differences between /tmp and /var/tmp,
but the introduction of /var (and /var/tmp) led to massive confusion
by program writers so today programs haphazardly use one or the
other and thus no real distinction can be made between the two. So
it makes sense to have just one temporary directory. You can do the
softlink either way. The one thing you do not want to do is leave /tmp
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
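.Pp
Making /tmp a softlink to /var/tmp is normally a single ln -s command. For illustration, here is the same step in Python, run inside a scratch directory rather than on the live root filesystem. link_tmp() is hypothetical; on a real system the root would be / and the helper refuses to remove a tmp directory that still has files in it.

```python
import os
import tempfile


def link_tmp(root: str) -> str:
    """Make <root>/tmp a symbolic link to <root>/var/tmp; return where it resolves."""
    var_tmp = os.path.join(root, "var", "tmp")
    tmp = os.path.join(root, "tmp")
    os.makedirs(var_tmp, exist_ok=True)
    if os.path.isdir(tmp) and not os.path.islink(tmp):
        os.rmdir(tmp)  # only succeeds if the old directory is empty
    os.symlink(os.path.join("var", "tmp"), tmp)  # relative, like "ln -s var/tmp tmp"
    return os.path.realpath(tmp)


# Try it in a scratch tree instead of the real root filesystem.
with tempfile.TemporaryDirectory() as scratch:
    resolved = link_tmp(scratch)
    print(resolved == os.path.realpath(os.path.join(scratch, "var", "tmp")))  # True
```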
.Pp
The
.Em /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Em /usr/local
holds the bulk of the files installed from the
.Xr ports 7
hierarchy. If you do not use ports all that much and do not intend to keep
system source (/usr/src) on the machine, you can get away with
a 1 gigabyte /usr partition. However, if you install a lot of ports
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte /usr and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte /usr. Do not underestimate the
amount of space you will need in this partition; it can creep up and
surprise you!
.Pp
The
.Em /home
partition is typically used to hold user-specific data. I usually size it
to the remainder of the disk.
.Pp
Why partition at all? Why not create one big
.Em /
partition and be done with it? Then I don't have to worry about undersizing
things! Well, there are several reasons this isn't a good idea. First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics. For example,
the root and /usr partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in /var and /var/tmp. By properly
partitioning your system, fragmentation introduced in the smaller more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions. Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e. before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most. Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving /var to the edge can have a huge impact.
Finally, there are safety concerns. Having a small neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8
and
.Xr tunefs 8
parameters. Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance. There are three parameters that are relatively safe to
tune:
.Em blocksize ,
.Em bytes/inode ,
and
.Em cylinders/group .
.Pp
.Fx
performs best when using 8K or 16K filesystem block sizes. The default
filesystem block size is 8K. For larger partitions it is usually a good
idea to use a 16K block size. This also requires you to specify a larger
fragment size. We recommend always using a fragment size that is 1/8
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Em newfs -f 2048 -b 16384 ...
Using a block size larger than 16K can cause fragmentation of the buffer
cache and lead to lower performance.
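.Pp
The 1/8 rule for fragment sizes translates directly into newfs arguments. A minimal Python sketch: newfs_block_args() is hypothetical, it only accepts the 8K and 16K block sizes recommended here, and it leaves out the remaining arguments elided as ... above.

```python
def newfs_block_args(block_size: int) -> list:
    """Return newfs(8) -f/-b arguments using a fragment size of block/8."""
    if block_size not in (8192, 16384):
        # The text above only recommends 8K or 16K blocks.
        raise ValueError("use an 8K or 16K block size")
    frag_size = block_size // 8
    return ["-f", str(frag_size), "-b", str(block_size)]


print(" ".join(["newfs"] + newfs_block_args(16384)))  # newfs -f 2048 -b 16384
```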
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/inode
ratio which reduces the number of inodes (maximum number of files and
directories that can be created) for that partition. Decreasing the number
of inodes in a filesystem can greatly reduce
.Xr fsck 8
recovery times after a crash. Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a filesystem that has lots of free
space remaining but cannot accommodate any more files. Using
32768, 65536, or 262144 bytes/inode is recommended. You can go higher but
it will have only incremental effects on fsck recovery times. For
example,
.Em newfs -i 32768 ...
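.Pp
To see what the bytes/inode setting buys, divide the partition size by it. The Python sketch below gives rough estimates only: inode_estimate() and inodes_run_out_first() are hypothetical, the real count depends on how newfs lays out cylinder groups, and the 10 GB partition is an arbitrary example. The second function captures the overcompensation warning: if your average file is smaller than the bytes/inode value, you run out of inodes before you run out of space.

```python
GIB = 1024 ** 3


def inode_estimate(partition_bytes: int, bytes_per_inode: int) -> int:
    """Rough inode count: one inode per bytes_per_inode of partition space."""
    return partition_bytes // bytes_per_inode


def inodes_run_out_first(avg_file_bytes: int, bytes_per_inode: int) -> bool:
    """True if files of this average size exhaust inodes before data space."""
    return avg_file_bytes < bytes_per_inode


for density in (32768, 65536, 262144):
    print(f"-i {density}: about {inode_estimate(10 * GIB, density):,} inodes")
# about 327,680 / 163,840 / 40,960 inodes on a 10 GB partition
```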
.Pp
Finally, increasing the
.Em cylinders/group
ratio has the effect of packing the inodes closer together. This can increase
directory performance and also decrease fsck times. If you use this option
at all, we recommend maxing it out. Use
.Em newfs -c 999
and newfs will error out and tell you what the maximum is, then use that.
.Pp
.Xr tunefs 8
may be used to further tune a filesystem. This command can be run in
single-user mode without having to reformat the filesystem. However, this
is possibly the most abused program in the system. Many people attempt to
increase available filesystem space by setting the min-free percentage to 0.
This can lead to severe filesystem fragmentation and we do not recommend
that you do this. Really the only tunefs option worthwhile here is turning on
.Em softupdates
with
.Em tunefs -n enable /filesystem .
(Note: in 5.x softupdates can be turned on using the -U option to newfs.)
Softupdates drastically improves meta-data performance, mainly file
creation and deletion. We recommend enabling softupdates on all of your
filesystems. There are two downsides to softupdates that you should be
aware of: First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind updating the physical disk. If you crash you may lose more work
than otherwise. Secondly, softupdates delays the freeing of filesystem
blocks. If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.
.Em make installworld ,
can run it out of space and cause the update to fail.
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition. Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks. The
.Xr vinum 8
and
.Xr ccd 4
utilities may be used to create simple striped filesystems. Generally
speaking, striping smaller partitions such as the root and /var/tmp,
or essentially read-only partitions such as /usr is a complete waste of
time. You should only stripe partitions that require serious I/O performance,
typically /var, /home, or custom partitions used to hold databases and web
pages. Choosing the proper stripe size is also
important. Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking. This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk. If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Fx
supported controllers.
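.Pp
A toy model shows why an off-center stripe size helps. It assumes plain striping, where consecutive chunks of stripe_sectors rotate across the disks, with metadata placed at power-of-2 intervals. The 65536-sector (32 MB) spacing is a hypothetical stand-in, not the actual UFS layout, and the helpers are illustrative.

```python
SECTOR = 512  # bytes


def disk_for(offset_sectors: int, stripe_sectors: int, ndisks: int) -> int:
    """Which disk of a simple striped set holds the given sector."""
    return (offset_sectors // stripe_sectors) % ndisks


def metadata_disks(stripe_sectors: int, spacing_sectors: int,
                   count: int = 8, ndisks: int = 2) -> set:
    """Disks touched by metadata placed every spacing_sectors sectors."""
    return {disk_for(i * spacing_sectors, stripe_sectors, ndisks)
            for i in range(count)}


SPACING = 65536  # hypothetical: metadata every 32 MB, a power of 2
print(metadata_disks(512, SPACING))    # {0}     power-of-2 stripe
print(metadata_disks(1152, SPACING))   # {0, 1}  off-center stripe
print(1152 * SECTOR // 1024, "KB per stripe")  # 576 KB per stripe
```

With a 512-sector (power-of-2) stripe every metadata block in this model lands on disk 0; with 1152 sectors they alternate between both disks.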
.Sh SYSCTL TUNING
.Pp
There are several hundred
.Xr sysctl 8
variables in the system, including many that appear to be candidates for
tuning but actually aren't. In this document we will only cover the ones
that have the greatest effect on the system.
.Pp
The
.Em kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on). Setting
this parameter to 1 will cause all SysV shared memory segments to be
mapped to unpageable physical RAM. This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes. This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Em vfs.vmiodirenable
sysctl defaults to 0 (off) (though soon it will default to 1) and may be
set to 0 (off) or 1 (on). This parameter controls how directories are cached
by the system. Most directories are small and use only a single fragment
(typically 1K) in the filesystem and even less (typically 512 bytes) in
the buffer cache. However, when operating in the default mode the buffer
cache will only cache a fixed number of directories even if you have a huge
amount of memory. Turning on this sysctl allows the buffer cache to use
the VM Page Cache to cache the directories. The advantage is that all of
memory is now available for caching directories. The disadvantage is that
the minimum in-core memory used to cache a directory is the physical page
size (typically 4K) rather than 512 bytes. We recommend turning this option
on if you are running any services which manipulate large numbers of files.
Such services can include web caches, large mail systems, and news systems.
Turning on this option will generally not reduce performance even with the
wasted memory but you should experiment to find out.
|
||||
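.Pp
For example, the option can be enabled on a running system with the
.Fl w
flag of
.Xr sysctl 8 .
A plain query then confirms the new value:
.Bd -literal -offset indent
sysctl -w vfs.vmiodirenable=1
sysctl vfs.vmiodirenable
.Ed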
.Pp
There are various buffer-cache and VM page cache related sysctls. We do
not recommend messing around with these at all. As of
.Fx 4.3 ,
the VM system does an extremely good job of tuning itself.
.Pp
The
.Em net.inet.tcp.sendspace
and
.Em net.inet.tcp.recvspace
sysctls are of particular interest if you are running network-intensive
applications. They control the amount of send and receive buffer space
allowed for any given TCP connection. The default is 16K. You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection. We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up. But if you need
high bandwidth over a smaller number of connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the sendspace
without eating too much kernel memory. Note that the route table, see
.Xr route 8 ,
can be used to introduce route-specific send and receive buffer size
defaults. As an additional management tool you can use pipes in your
firewall rules, see
.Xr ipfw 8 ,
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use. Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer-term stability. Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
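.Pp
A sketch of both techniques follows.
The buffer sizes are illustrative values for a web server, not recommendations.
The pipe rules assume:
.Bl -bullet -compact
.It
a kernel built with the
.Em IPFIREWALL
and
.Em DUMMYNET
options;
.It
a web server on port 80;
.It
a limit of roughly 70% of a 1.544 Mbit/s T1.
.El
.Pp
The pipe and rule numbers are arbitrary:
.Bd -literal -offset indent
sysctl -w net.inet.tcp.sendspace=65536
sysctl -w net.inet.tcp.recvspace=8192
ipfw pipe 1 config bw 1080Kbit/s
ipfw add 1000 pipe 1 tcp from any 80 to any out
.Ed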
.Pp
We recommend that you turn on (set to 1) and leave on the
.Em net.inet.tcp.always_keepalive
control. The default is usually off. This introduces a small amount of
additional network traffic but guarantees that dead TCP connections
will eventually be recognized and cleared. Dead TCP connections are a
particular problem on systems accessed by users over dialup lines,
because users often disconnect their modems without properly closing active
connections.
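.Pp
For example, run as root, the first command below turns the control on immediately.
The second appends the matching entry to
.Pa /etc/sysctl.conf
so that it stays on across reboots:
.Bd -literal -offset indent
sysctl -w net.inet.tcp.always_keepalive=1
echo 'net.inet.tcp.always_keepalive=1' >> /etc/sysctl.conf
.Ed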
.Pp
The
.Em kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment. For such environments,
we recommend increasing this value to 1024 or higher. The service daemon
may itself limit the listen queue size (e.g.\& sendmail, apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
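.Pp
For example, the limit can be raised to 1024 on a running system as
shown below.
As far as we understand, the limit is applied when a daemon calls
.Xr listen 2 .
Daemons that are already listening therefore probably need to be restarted
before they see the larger queue:
.Bd -literal -offset indent
sysctl -w kern.ipc.somaxconn=1024
.Ed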
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system. In order to change these options you need to be
able to compile a new kernel from source. The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this. Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you don't use. Removing
things like
.Em INET6
and drivers you don't have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
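.Pp
A rough sketch of the traditional procedure on i386 follows.
It assumes the source tree is installed in
.Pa /usr/src
and uses the hypothetical configuration name
.Pa MYKERNEL :
.Bd -literal -offset indent
cd /usr/src/sys/i386/conf
cp GENERIC MYKERNEL
# edit MYKERNEL here, e.g. comment out "options INET6"
config MYKERNEL
cd ../../compile/MYKERNEL
make depend && make && make install
.Ed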
.Pp
The
.Em maxusers
kernel option defaults to an incredibly low value. For most modern machines,
you probably want to increase this value to 64, 128, or 256. We do not
recommend going above 256 unless you need a huge number of file descriptors.
Network buffers are also affected but can be controlled with a separate
kernel option,
.Em NMBCLUSTERS ,
described below. Do not increase maxusers just to get more network mbufs.
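.Pp
In the kernel configuration file this is a single line.
The value 128 is simply one of the values suggested above:
.Bd -literal -offset indent
maxusers        128
.Ed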
.Pp
.Em NMBCLUSTERS
may be adjusted to increase the number of network mbufs the system is
willing to allocate. Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers. You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and a 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it. A good rule of
thumb is to multiply by 2, giving 64MB, and then divide by the 2K cluster
size: 64MB / 2K = 32768. So for this case
you would want to set NMBCLUSTERS to 32768. We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory. Under no circumstances
should you specify an arbitrarily high value for this parameter; it could
lead to a boot-time crash. The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
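.Pp
Laid out step by step, the calculation above looks like this.
The 1000-connection peak is the example's assumption:
.Bd -literal -offset indent
1000 connections x (16K + 16K) = 32000K, roughly 32MB
32MB x 2 (rule of thumb)       = 64MB
64MB / 2K per cluster          = 32768 clusters
.Ed
.Pp
The result would go into the kernel configuration file as
.Bd -literal -offset indent
options NMBCLUSTERS=32768
.Ed
.Pp
Once the new kernel is running,
.Ic netstat -m
reports how many clusters are actually in use.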
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network. The
.Em NSFBUFS
kernel parameter controls the number of filesystem buffers
.Fn sendfile
is allowed to use to perform its work. This parameter nominally scales
with
.Em maxusers ,
so you should not need to mess with this parameter except under extreme
circumstances.
.Pp
.Em SCSI_DELAY
and
.Em IDE_DELAY
may be used to reduce system boot times. The defaults are fairly high and
can be responsible for 15 or more seconds of delay in the boot process.
Reducing SCSI_DELAY to 5 seconds usually works (especially with modern drives).
Reducing IDE_DELAY also works, but you have to be a little more careful.
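.Pp
For example, the
.Em SCSI_DELAY
value in the stock
.Pa GENERIC
configuration is given in milliseconds.
Assuming the same units apply, a 5-second delay would be written as:
.Bd -literal -offset indent
options SCSI_DELAY=5000
.Ed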
.Pp
There are a number of
.Em XXX_CPU
options that can be commented out. If you only want the kernel to run
on a Pentium-class CPU, you can easily remove
.Em I386_CPU
and
.Em I486_CPU ,
but only remove
.Em I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options. If it works, great! The operating system
will be able to make better use of higher-end CPU features for MMU, task
switching, timebase, and even device operations. Additionally, higher-end
CPUs support 4MB MMU pages, which the kernel uses to map the kernel itself
into memory, increasing its efficiency under heavy syscall loads.
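.Pp
In the kernel configuration file these appear as
.Cm cpu
lines.
For a machine known to be recognized as a Pentium II or better, a sketch
might comment out everything except
.Em I686_CPU :
.Bd -literal -offset indent
#cpu            I386_CPU
#cpu            I486_CPU
#cpu            I586_CPU
cpu             I686_CPU
.Ed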
.Sh IDE WRITE CACHING
As of
.Fx 4.3 ,
IDE write caching is turned off by default. This will reduce write bandwidth
to IDE disks but is considered necessary due to serious data consistency
issues introduced by hard drive vendors. Basically the problem is that
IDE drives lie about when a write completes. With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, but will
sometimes delay some of the blocks indefinitely when under heavy disk
loads. A crash or power failure can result in serious filesystem
corruption. So our default is to be safe. If you are willing to risk
filesystem corruption, you can return to the old behavior by setting the
.Em hw.ata.wc
kernel variable back to 1. This must be done from the boot loader at boot
time. Please see
.Xr ata 4
and
.Xr loader 8 .
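.Pp
One way to do this is an entry in
.Pa /boot/loader.conf
(see
.Xr loader.conf 5 ) .
Alternatively, the same variable can be set with
.Ic set hw.ata.wc=1
at the loader prompt before booting:
.Bd -literal -offset indent
hw.ata.wc="1"
.Ed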
.Pp
There is a new experimental feature for IDE hard drives called
.Em hw.ata.tags
(you also set this in the boot loader), which allows write caching to be safely
turned on. This brings SCSI tagging features to IDE drives. As of this
writing, only IBM DPTA and DTLA drives support the feature.
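.Pp
A corresponding
.Pa /boot/loader.conf
sketch follows.
It assumes the drive is one of the supported models and enables tagged
queueing together with write caching:
.Bd -literal -offset indent
hw.ata.tags="1"
hw.ata.wc="1"
.Ed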
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases. If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them. If your system
is paging to swap a lot, you need to consider adding more memory. If your
system is saturating the disk, you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this. There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth. If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal. While modern
IDE drives compare well with SCSI in raw sequential bandwidth, SCSI drives
usually win the moment you start seeking around the disk.
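.Pp
For example, consider the
.Fl vmstat
display of
.Xr systat 1 ,
refreshed every second.
Among other things, it shows CPU idle time, paging activity and per-disk
busy percentages on one screen, which covers all three cases above:
.Bd -literal -offset indent
systat -vmstat 1
.Ed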
.Pp
Finally, you might run out of network suds. The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially now that switches are almost as cheap. Hubs
have severe problems under heavy loads due to collision backoff, and one bad
host can severely degrade the entire LAN. Second, optimize the network path
as much as possible. For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it. Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\& modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use ipfw's
.Sy DUMMYNET
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa. In home installations this could
be used to give interactive traffic (your browser, ssh logins) priority
over services you export from your box (web services, email).
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr ccd 4 ,
.Xr login.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr ports 7 ,
.Xr boot 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.