opnsense-src/sys/dev/mpr
Kenneth D. Merry 175ad3d003 Fix mpr(4) and mps(4) state transitions and a use-after-free panic.
When the mpr(4) and mps(4) drivers probe a SATA device, they issue an
ATA Identify command (via mp{s,r}sas_get_sata_identify()) before the
target is fully set up in the driver.  The drivers wait for completion of
the identify command, with a 5-second timeout.  If the timeout
fires, the command is marked with the SATA_ID_TIMEOUT flag so it can be
freed later.
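
The sketch below is a rough user-space analogue of that wait. It assumes POSIX threads in place of the kernel's sleep/wakeup primitives. The waiter blocks on a condition variable with an absolute deadline. If the deadline passes first, the request is flagged rather than released. The names id_req, id_req_wait() and id_req_complete() are hypothetical, and the id_timeout field only stands in for the SATA_ID_TIMEOUT flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space stand-in for an outstanding ATA Identify. */
struct id_req {
	pthread_mutex_t lock;
	pthread_cond_t done_cv;
	bool done;		/* set by the completion path */
	bool id_timeout;	/* stands in for SATA_ID_TIMEOUT */
};

/* Completion path: mark the request done and wake the waiter. */
static void
id_req_complete(struct id_req *r)
{
	pthread_mutex_lock(&r->lock);
	r->done = true;
	pthread_cond_signal(&r->done_cv);
	pthread_mutex_unlock(&r->lock);
}

/*
 * Wait up to timeout_sec seconds for completion.  Returns 0 if the
 * request completed; otherwise flags it and returns ETIMEDOUT.  The
 * request is not released here: it is only marked for later cleanup.
 */
static int
id_req_wait(struct id_req *r, int timeout_sec)
{
	struct timespec deadline;
	int error = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&r->lock);
	while (!r->done && error == 0)
		error = pthread_cond_timedwait(&r->done_cv, &r->lock,
		    &deadline);
	if (r->done) {
		error = 0;
	} else {
		r->id_timeout = true;
		error = ETIMEDOUT;
	}
	pthread_mutex_unlock(&r->lock);
	return (error);
}
```

To match the timeout described above, a caller would pass 5 as timeout_sec.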

That is where the use-after-free problem comes in.  Once the ATA
Identify times out, the driver sends a target reset, and then frees any
identify commands that have timed out.  But once the target reset
completes, commands that were queued to the drive are returned to the
driver by the controller.

At that point, the driver (in mp{s,r}_intr_locked()) looks up the
command descriptor for that particular SMID, marks it CM_STATE_BUSY and
sends it on for completion handling.

The problem at this stage is that the command has already been freed,
and put on the free queue, so its state is CM_STATE_FREE.  With INVARIANTS
enabled, we get a panic as soon as this command is allocated again,
because its state is no longer CM_STATE_FREE, but rather CM_STATE_BUSY.
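
The sequence above can be modelled with a small command pool. Only the CM_STATE_* names come from this commit. struct cmd, cmd_alloc(), cmd_free() and old_intr_reply() are hypothetical. Where an INVARIANTS kernel would panic, this allocator's state check prints the state it found and returns NULL.

```c
#include <stddef.h>
#include <stdio.h>

/* State names from the commit; everything else is illustrative. */
enum cm_state { CM_STATE_FREE, CM_STATE_BUSY, CM_STATE_INQUEUE };

struct cmd {
	int smid;		/* index the "controller" uses to return it */
	enum cm_state state;
	struct cmd *next_free;
};

#define NCMDS 4
static struct cmd cmds[NCMDS];
static struct cmd *free_head;

/* Put a command back on the free list. */
static void
cmd_free(struct cmd *cm)
{
	cm->state = CM_STATE_FREE;
	cm->next_free = free_head;
	free_head = cm;
}

/* Build the pool so that cmds[0] ends up at the head of the list. */
static void
pool_init(void)
{
	free_head = NULL;
	for (int i = NCMDS - 1; i >= 0; i--) {
		cmds[i].smid = i;
		cmd_free(&cmds[i]);
	}
}

/* Allocation checks the state, as an INVARIANTS kernel would assert. */
static struct cmd *
cmd_alloc(void)
{
	struct cmd *cm = free_head;

	if (cm == NULL)
		return (NULL);
	if (cm->state != CM_STATE_FREE) {
		fprintf(stderr, "cmd %d: expected FREE, found state %d\n",
		    cm->smid, (int)cm->state);
		return (NULL);	/* the kernel would panic here */
	}
	free_head = cm->next_free;
	cm->state = CM_STATE_BUSY;
	return (cm);
}

/* Old reply path: look up the SMID and mark it BUSY, no state check. */
static void
old_intr_reply(int smid)
{
	cmds[smid].state = CM_STATE_BUSY;
}
```

Consider this sequence: allocate a command, mark it INQUEUE, free it on timeout, then deliver the late reply. The result is a BUSY command at the head of the free list, so the next cmd_alloc() call fails its state check.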

So, the solution is to not free ATA Identify commands that get stuck
until they actually return from the controller.  Hopefully this works
correctly on older firmware versions.  If not, it could result in
commands hanging around indefinitely.  But the alternative is a
use-after-free panic or assertion (in the INVARIANTS case).

This also tightens up the state transitions between CM_STATE_FREE,
CM_STATE_BUSY and CM_STATE_INQUEUE, so that the state transitions happen
once, and we have assertions to make sure that commands are in the
correct state before transitioning to the next state.  Also, for each
state assertion, we print out the current state of the command if it is
incorrect.
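
One way to express a checked transition that prints the state it actually found is sketched below. The following are assumptions for illustration, not the driver's code:

- the helper name and the message format
- the lifecycle: FREE -> BUSY on allocation, BUSY -> INQUEUE on submission, INQUEUE -> BUSY on completion, and BUSY -> FREE on release

A kernel build would assert instead of returning false.

```c
#include <stdbool.h>
#include <stdio.h>

enum cm_state { CM_STATE_FREE, CM_STATE_BUSY, CM_STATE_INQUEUE };

struct cmd {
	int smid;
	enum cm_state state;
};

static const char *
cm_state_name(enum cm_state s)
{
	switch (s) {
	case CM_STATE_FREE:
		return ("FREE");
	case CM_STATE_BUSY:
		return ("BUSY");
	case CM_STATE_INQUEUE:
		return ("INQUEUE");
	}
	return ("UNKNOWN");
}

/*
 * Hypothetical helper: move cm from "from" to "to" only if it is
 * currently in "from".  On a mismatch, report the state actually
 * found and leave the command untouched.
 */
static bool
cm_transition(struct cmd *cm, enum cm_state from, enum cm_state to)
{
	if (cm->state != from) {
		fprintf(stderr, "cmd %d: expected %s, but state is %s\n",
		    cm->smid, cm_state_name(from), cm_state_name(cm->state));
		return (false);
	}
	cm->state = to;
	return (true);
}
```

With this helper, a second reply for the same command fails the INQUEUE -> BUSY check and names the state it found. The command is not silently re-marked BUSY.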

mp{s,r}.c:      Add a new sysctl variable, dump_reqs_alltypes,
                that controls the behavior of the dump_reqs sysctl.
                If dump_reqs_alltypes is non-zero, it will dump
                all commands, not just the commands that are in the
                CM_STATE_INQUEUE state.  (You can see the commands
                that are in the queue by using mp{s,r}util debug
                dumpreqs.)

                Make sure that the INQUEUE -> BUSY state transition
                happens in one place, the mp{s,r}_complete_command
                routine.

mp{s,r}_sas.c:  Make sure we print the current command type in
                command state assertions.

mp{s,r}_sas_lsi.c:
                Add a new completion handler,
                mp{s,r}sas_ata_id_complete.  This completion
                handler will free data allocated for an ATA
                Identify command and free the command structure.

                In mp{s,r}_ata_id_timeout, do not set the command
state to CM_STATE_BUSY.  The command is still
queued in the controller.  Since we were blocking
                waiting for this command to complete, there was
                no completion handler previously.  Set the
                completion handler, so that whenever the command
                does come back, it will get freed properly.

                Do not free ATA Identify commands that have timed
                out in mp{s,r}sas_add_device().  Wait for them
                to actually come back from the controller.

mp{s,r}var.h:   Add a dump_reqs_alltypes variable for the new
                dump_reqs_alltypes sysctl.

                Make sure we print the current state for state
                transition asserts.
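
The dump_reqs_alltypes behaviour described for mp{s,r}.c reduces to a single predicate. This sketch assumes a plain global in place of the sysctl and uses hypothetical function names. It only shows which commands a dump pass would select.

```c
#include <stdbool.h>
#include <stddef.h>

enum cm_state { CM_STATE_FREE, CM_STATE_BUSY, CM_STATE_INQUEUE };

/* Hypothetical stand-in for the dump_reqs_alltypes sysctl value. */
static int dump_reqs_alltypes;

/* Non-zero knob: dump everything.  Zero: only queued commands. */
static bool
dump_reqs_wanted(enum cm_state state)
{
	return (dump_reqs_alltypes != 0 || state == CM_STATE_INQUEUE);
}

/* Count how many of n commands a dump pass would emit. */
static size_t
dump_reqs_count(const enum cm_state *states, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (dump_reqs_wanted(states[i]))
			count++;
	return (count);
}
```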
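
The revised ATA Identify flow in mp{s,r}_sas_lsi.c can be sketched as follows, under simplifying assumptions:

- A single command carries a hypothetical completion pointer.
- id_timeout stands in for SATA_ID_TIMEOUT.
- ata_id_timeout(), ata_id_complete() and complete_command() loosely mirror the roles of the driver's routines without reproducing them.

The timeout path leaves the command INQUEUE and arms the handler. The eventual reply performs the only INQUEUE -> BUSY transition, and the handler then frees the buffer and the command. Nothing on the add-device side frees it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

enum cm_state { CM_STATE_FREE, CM_STATE_BUSY, CM_STATE_INQUEUE };

struct cmd;
typedef void cm_done_t(struct cmd *);

struct cmd {
	enum cm_state state;
	bool id_timeout;	/* stands in for SATA_ID_TIMEOUT */
	void *data;		/* identify buffer */
	cm_done_t *complete;	/* NULL while a thread waits on it */
};

static int ncmds_freed;		/* bookkeeping for this example only */

static void
cmd_free(struct cmd *cm)
{
	cm->state = CM_STATE_FREE;
	cm->complete = NULL;
	ncmds_freed++;
}

/* Completion handler: release the identify buffer, then the command. */
static void
ata_id_complete(struct cmd *cm)
{
	free(cm->data);
	cm->data = NULL;
	cmd_free(cm);
}

/* Timeout: do not touch the state (still queued), just arm the handler. */
static void
ata_id_timeout(struct cmd *cm)
{
	cm->id_timeout = true;
	cm->complete = ata_id_complete;
}

/* The single INQUEUE -> BUSY transition, then hand off to the handler. */
static bool
complete_command(struct cmd *cm)
{
	if (cm->state != CM_STATE_INQUEUE) {
		fprintf(stderr, "reply for command in state %d\n",
		    (int)cm->state);
		return (false);
	}
	cm->state = CM_STATE_BUSY;
	if (cm->complete != NULL)
		cm->complete(cm);
	/* Otherwise a blocked waiter would be woken here (not modelled). */
	return (true);
}
```

In this model, a timed-out command stays INQUEUE until its reply arrives. The reply then frees it exactly once, and the state check rejects a second reply for the same command.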

This was tested in the Spectra Logic test bed (as described in the
review), as well as Netflix's Open Connect fleet (where panics dropped from
a dozen or two a month to zero).

Reviewed by:		imp@ (who is handling the commit with ken's OK)
Sponsored by:		Spectra Logic
Differential Revision:	https://reviews.freebsd.org/D25476
2021-06-03 13:46:11 -06:00
mpi mpr: clean up empty lines in .c and .h files 2020-09-01 22:07:12 +00:00
mpr.c Fix mpr(4) and mps(4) state transitions and a use-after-free panic. 2021-06-03 13:46:11 -06:00
mpr_config.c mpr: big-endian support 2021-03-02 22:21:42 -03:00
mpr_ioctl.h mpr: clean up empty lines in .c and .h files 2020-09-01 22:07:12 +00:00
mpr_mapping.c mpr/mps(4): Make device mapping some more robust. 2021-04-23 23:36:51 -04:00
mpr_mapping.h Update copyright information 2018-12-26 10:43:31 +00:00
mpr_pci.c Refine the busdma template interface. Provide tools for filling in fields 2020-09-14 05:58:12 +00:00
mpr_sas.c Fix mpr(4) and mps(4) state transitions and a use-after-free panic. 2021-06-03 13:46:11 -06:00
mpr_sas.h Before issuing the REMOVE_DEVICE command to the firmware, make sure that all 2020-02-25 04:27:23 +00:00
mpr_sas_lsi.c Fix mpr(4) and mps(4) state transitions and a use-after-free panic. 2021-06-03 13:46:11 -06:00
mpr_table.c mpr: big-endian support 2021-03-02 22:21:42 -03:00
mpr_table.h Convert some in-line printing of diagnostic into tables. 2017-09-09 22:02:36 +00:00
mpr_user.c mpr, mps: Fix an off-by-one bug in the BTDH_MAPPING ioctl 2021-01-08 13:32:05 -05:00
mprvar.h Fix mpr(4) and mps(4) state transitions and a use-after-free panic. 2021-06-03 13:46:11 -06:00