... and thus retire the debug.kdb.stop_cpus tunable/sysctl.
The knob was a workaround for CPU stopping issues, which have since been
either fixed or greatly reduced. kdb should really operate in a special
environment with scheduler stopped and interrupts disabled to provide
deterministic debugging.
Discussed with: attilio, rwatson
X-MFC after: 2 months or never
... and also increase the timeout.
It's better to try to proceed somehow despite stuck CPUs than to hang
indefinitely. Especially so during shutdown and when entering kdb or panic.
The timeout is still an arbitrary value.
The timeout diagnostic is just a printf; work on something more
debuggable is planned by attilio. We need to be careful here, as
stop_cpus_hard is called very early while entering kdb, and soon(-ish)
it may also be called very early when entering panic.
Reviewed by: attilio
MFC after: 2 months
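A minimal user-space sketch of the bounded wait described above, assuming a 32-bit mask of CPU ids and a fixed spin budget. The function name and the budget are hypothetical and this does not reproduce stop_cpus_hard() itself; the point is only that the loop gives up and prints a diagnostic instead of spinning forever.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical sketch: wait until every CPU in `want` has acknowledged a
 * stop request, but give up after `max_spins` iterations instead of
 * hanging. Returns true if all CPUs stopped, false on timeout.
 */
static bool
wait_for_cpus_stopped(_Atomic uint32_t *stopped, uint32_t want,
    unsigned long max_spins)
{
	unsigned long i;

	for (i = 0; i < max_spins; i++) {
		if ((atomic_load(stopped) & want) == want)
			return (true);
		/* A kernel would issue a CPU spin/pause hint here. */
	}
	/* The diagnostic is just a printf; the caller proceeds anyway. */
	printf("timeout stopping cpus: wanted 0x%x, stopped 0x%x\n",
	    (unsigned int)want, (unsigned int)atomic_load(stopped));
	return (false);
}
```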
o Set td_intr_frame to the XIV's trap frame because it's referenced
by the ET event handler.
o Signal EOI to the CPU before calling the registered XIV handlers.
This prevents lost ITC interrupts, which cause starvation in one-shot
mode.
o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters.
o Have the APs call cpu_initclocks() so as to limit the scattering of
clock-related initialization. cpu_initclocks() calls the <self>_bsp()
or <self>_ap() version accordingly.
o Uncomment the ET clock handling in cpu_idle().
o Update the DDB 'show pcpu' output for the new MD fields.
o Rewrite ia64_ih_clock() entirely. Note that we don't create as many
clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale.
We can only have 240 XIVs and we can have more CPUs than that. There's
a single intrcnt index for the cumulative clock ticks and we keep per
CPU counts in the PCPU stats structure.
o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order).
Open issues:
o Clock interrupts can still be lost. Some tweaking is still necessary.
Thanks to: mav@ for his support, feedback and explanations.
ET stats while committing:
eris% sysctl machdep.cpu | grep nclks
machdep.cpu.0.nclks: 24007
machdep.cpu.1.nclks: 22895
machdep.cpu.2.nclks: 13523
machdep.cpu.3.nclks: 9342
machdep.cpu.4.nclks: 9103
machdep.cpu.5.nclks: 9298
machdep.cpu.6.nclks: 10039
machdep.cpu.7.nclks: 9479
eris% vmstat -i | grep clock
clock 108599 50
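The clock accounting described above (a single clock XIV and one cumulative intrcnt slot, with per-CPU tick counts kept in per-CPU stats) can be sketched as follows. The structure, array and function names are hypothetical, the CPU count is fixed for illustration, and the sketch is single-threaded; a kernel would update each per-CPU field only from its own CPU.

```c
#include <stdint.h>

#define MAXCPU_SKETCH	8	/* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for the per-CPU (PCPU) stats structure. */
struct pcpu_stats_sketch {
	uint64_t	nclks;	/* clock interrupts taken on this CPU */
};

static struct pcpu_stats_sketch pcpu_stats[MAXCPU_SKETCH];
static uint64_t intrcnt_clock;	/* the single, cumulative "clock" counter */

/* Called from the (single) clock XIV handler on CPU `cpu`. */
static void
clock_tick(unsigned int cpu)
{
	intrcnt_clock++;		/* what vmstat -i reports as "clock" */
	pcpu_stats[cpu].nclks++;	/* what machdep.cpu.N.nclks reports */
}
```

Because the number of XIVs does not grow with the number of CPUs, this layout keeps working on machines with more CPUs than available vectors.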
dynamically loaded device drivers get a chance to run their event hooks.
- Decouple the USB suspend and resume lock from witness. It produces some
false warnings due to reusing the lock name among multiple devices.
MFC after: 3 days
is now required by bus_autoconf.
- Allow interface class matching even if device class is vendor specific.
- Update bus_autoconf tool to not generate system and subsystem match lines
for the nomatch event.
PR: misc/157903
MFC after: 14 days
sorted according to the mode which they support:
host, device or dual mode
- Add generic tool to extract these data:
tools/bus_autoconf
Discussed with: imp
Suggested by: Robert Millan <rmh@debian.org>
PR: misc/157903
MFC after: 14 days
instead of a PCPU field for curthread. This averts a race on SMP systems
with a high interrupt rate where the thread looking up the value of
curthread could be preempted and migrated between obtaining the PCPU
pointer and reading the value of pc_curthread, resulting in curthread being
observed to be the current thread on the thread's original CPU. This played
merry havoc with the system, in particular with mutexes. Many thanks to
jhb for helping me work this one out.
Note that Book-E is in principle susceptible to the same problem, but has
not been modified yet due to lack of Book-E hardware.
MFC after: 2 weeks
some interesting bugs (mostly on SMP systems) with atomic operations
silently failing in interrupt heavy situations, especially when using
overflow pages.
switch the region registers. pmap_switch() returns the pmap for
which the region registers are currently programmed; that pmap needs
to be re-programmed on the CPU where the outgoing thread is next switched
in. This change does not noticeably change anything or fix known
bugs, but does give me a warm fuzzy feeling by being more
correct.
unhappy (probably they don't handle crossing the 64k boundary, etc.).
Fix this by changing zfsldr to use a loop reading from the disk one
sector at a time. To avoid trashing the saved copy of the MBR which is
used for disk I/O, read zfsboot2 at address 0x9000. This has the
advantage that BTX no longer needs to be relocated as it is read into
the correct location. However, the loop to relocate zfsboot2.bin can
now cross a 64k boundary, so change it to use relative segments instead.
(This will need further work if zfsboot2.bin ever exceeds 64k.)
While here, stop storing a relocated copy of zfsldr at 0x700. This was
only used by the xread hack which has recently been removed (and even
that use was dubious). Also, include the BIOS error code as hex when
reporting read errors to aid in debugging.
Much thanks to Henri Hennebert for patiently testing various iterations
of the patch as well as fixing the zfsboot2.bin relocation to use
relative segments.
MFC after: 1 week
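The address arithmetic behind the one-sector-at-a-time loop can be sketched in C, assuming 512-byte sectors and a 16-byte-aligned destination below 1 MB; this is an illustration, not the zfsldr assembly, and the names are hypothetical. Advancing the segment by 512 / 16 = 32 per sector keeps the offset at 0, and because 0x9000 is 512-byte aligned and 64 KB is a multiple of 512, no single 512-byte transfer straddles a 64 KB physical boundary.

```c
#include <stdint.h>

#define SECTOR_SIZE	512u

/* Real-mode segment:offset pair. */
struct rm_addr {
	uint16_t	seg;
	uint16_t	off;
};

/*
 * Destination of sector `i` when reading one sector at a time into a
 * buffer starting at physical address `base`. The segment carries the
 * address; the offset is only the low 4 bits (0 for aligned buffers).
 */
static struct rm_addr
sector_dest(uint32_t base, uint32_t i)
{
	uint32_t phys = base + i * SECTOR_SIZE;
	struct rm_addr a;

	a.seg = (uint16_t)(phys >> 4);
	a.off = (uint16_t)(phys & 0xf);
	return (a);
}

/* Convert segment:offset back to a physical address. */
static uint32_t
rm_phys(struct rm_addr a)
{
	return (((uint32_t)a.seg << 4) + a.off);
}
```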
to do about the few cases where the HAL state isn't available (regdomain)
or isn't yet set up (probe/attach).
The global ath_hal_debug now affects all instances of the HAL.
This also restores the ability for probe/attach debugging to work, as
the sysctl tree may not be attached at that point. Users can just set
the global "hw.ath.hal.debug" to a suitable value to enable probe/attach
related debugging.
(obvious in retrospect) in which interrupts on one CPU that are temporarily
masked can end up permanently masked when a handler on another CPU clobbers
the interrupt mask register with an old copy.
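The commit text does not show the fix, so the following is only a generic illustration of the race, assuming the mask can be modeled as a word that supports atomic OR/AND (a real interrupt controller register may not). A set bit means "masked"; all names are hypothetical. Writing back a stale saved copy re-masks an interrupt that another CPU has since unmasked, while updating only the bits a handler owns avoids that.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared interrupt mask; a set bit means "masked". */
static _Atomic uint32_t intr_mask;

/* Racy pattern, part 1: take a copy of the whole mask. */
static uint32_t
mask_save(void)
{
	return (atomic_load(&intr_mask));
}

/* Racy pattern, part 2: write the whole copy back, clobbering others. */
static void
mask_restore(uint32_t saved)
{
	atomic_store(&intr_mask, saved);
}

/* Safer pattern: only touch the bit this handler owns. */
static void
mask_irq(uint32_t bit)
{
	atomic_fetch_or(&intr_mask, bit);
}

static void
unmask_irq(uint32_t bit)
{
	atomic_fetch_and(&intr_mask, ~bit);
}
```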
rather than global variables.
This specifically allows for debugging to be enabled per-NIC, rather
than globally.
Since the ath driver doesn't know about AH_DEBUG, and to keep the ABI
consistent regardless of whether AH_DEBUG is enabled or not, enable the
debug parameter always but only conditionally compile in the debug
methods if needed.
The ALQ support is currently still global pending some brainstorming.
Submitted by: ssgriffonuser@gmail.com
Reviewed by: adrian, bschmidt
This was previously done only for SCSI XPT in r223081, on which the change
in r223089 depended in order to respond to serial number requests. As a
result of r223089, da(4) and ada(4) devices register a d_getattr for geom to
use to obtain the information.
Reported by: ache
Reviewed by: ken
server replied NFS3ERR_JUKEBOX/NFS4ERR_DELAY to an rpc.
This affected both NFSv3 and NFSv4. Found during testing
at the recent NFSv4 interoperability Bakeathon.
MFC after: 2 weeks
alias address needs to be specified.
Add inbound handler to the alias_ftp module. It helps handle active
FTP transfer mode for the case with external clients and FTP server behind
NAT. Fix the passive FTP transfer case for a server behind NAT using a redirect
with an external IP address different from the NAT IP address.
PR: kern/157957
Submitted by: Alexander V. Chernikov
was used for doing a mount when performing system operations
on AUTH_SYS mounts. This resolved an issue when mounting
a Linux server. Found during testing at the recent
NFSv4 interoperability Bakeathon.
MFC after: 2 weeks
processors unless the invariant TSC bit of CPUID is set. Intel processors
may stop incrementing the TSC when the DPSLP# pin is asserted, according to
Intel processor manuals, i.e., the TSC timecounter is useless if the processor
can enter a deep sleep state (C3/C4). This problem was accidentally uncovered by
r222869, which increased timecounter quality of P-state invariant TSC, e.g.,
for Core2 Duo T5870 (Family 6, Model f) and Atom N270 (Family 6, Model 1c).
Reported by: Fabian Keil (freebsd-listen at fabiankeil dot de)
Ian FREISLICH (ianf at clue dot co dot za)
Tested by: Fabian Keil (freebsd-listen at fabiankeil dot de)
- Core2 Duo T5870 (C3 state available/enabled)
jkim - Xeon X5150 (C3 state unavailable)
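The decision described above hinges on the invariant TSC flag, which is bit 8 of EDX returned by CPUID leaf 0x80000007. A minimal sketch of the check, assuming the EDX value and the "deep C-states possible" flag are obtained elsewhere; the macro and function names and the exact policy are hypothetical, not the committed code.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x80000007, EDX bit 8: invariant TSC. */
#define CPUID_APM_INVARIANT_TSC	(1u << 8)

/*
 * Hypothetical policy: the TSC is usable as a timecounter if it is
 * invariant, or if the CPU can never enter a deep C-state (C3/C4) in
 * which the TSC may stop counting.
 */
static bool
tsc_usable_sketch(uint32_t cpuid_80000007_edx, bool deep_cstates_possible)
{
	if (cpuid_80000007_edx & CPUID_APM_INVARIANT_TSC)
		return (true);
	return (!deep_cstates_possible);
}
```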
resource allocation from an x86 Host-PCI bridge driver so that it can be
reused by the ACPI Host-PCI bridge driver (and eventually the MPTable
Host-PCI bridge driver) instead of duplicating the same logic. Note that
this means that hw.acpi.host_mem_start is now replaced with the
hw.pci.host_mem_start tunable that was already used in the non-ACPI case.
This also removes hw.acpi.host_mem_start on ia64 where it was not
applicable (the implementation was very x86-specific).
While here, adjust the logic to apply the new start address on any
"wildcard" allocation even if that allocation comes from a subset of
the allowable address range.
Reviewed by: imp (1)
register both status change and link state change callbacks.
Implement checking valid link in state change callback and poll
active link state in vr_tick(). This allows immediate detection of
lost link as well as protecting the driver from frequent link flips during
link renegotiation. The taskq implementation was removed because the driver
now needs to poll the link state in vr_tick().
While I'm here, do not report the current link state if the interface is
not running.
Tested by: n_hibma
MFC after: 1 week
o Consider No CIS a normal event and stop whining about it so much
(too many cards are like this, especially usb/firewire cards).
o Add comments to the cis reading code.
o Made the read from config space a smidge easier to read and eliminate
a loop that can be done mathematically.
present. Only call the bus to check if we actually do timeout so we
don't affect the normal case (since this case needn't be optimized and
this guards against all races).
child is still present. If not, return 'handled' and don't print
anything (this is expected behavior). We expect an interrupt on eject,
power-down and/or shutdown.
particular flow control and DMA coalesce. Also improve the
sysctl operation on those.
Add IPv6 detection in the ioctl code; this was done for
ixgbe first and is carried over here.
Add the ability to disable a particular adapter via a resource.
Add HW TSO capability so vlans can make use of TSO
operations while traversing non-exported file systems. This is
required for some non-FreeBSD clients to do NFSv4 mounts. Found during
the recent NFSv4 interoperability Bakeathon.
MFC after: 2 weeks
Since the only parameter that we check is the size of the bootcode,
allow only two sizes: the size of boot1 and the size of /boot/boot.
This partially protects users from losing the ability to boot if incorrect
bootcode is specified.
Requested by: ru
TX for the given TID needs to be paused during ADDBA requests (and unpaused
once the session is established.) Since net80211 currently doesn't implement
software aggregation, if this pause/unpause is done in the driver (as it
is in my development branch) then it will need to be unpaused both on
ADDBA response and on ADDBA timeout.
This callback allows the driver to unpause TX for the relevant TID.
Reviewed by: bschmidt
IPsec being compiled in and used. Improve reporting by adding the length
fields to the panic message, so that we would have some immediate debugging
hints.
Discussed with: jhb
the future, but presents a set of simple block devices for now. With
(forthcoming) boot loader support or vfs.root.mountfrom, allows booting
PS3s from disk.
Submitted by: glevand <geoffrey.levand@mail.ru>
lock the mutex when manipulating rc_flag in the DRC cache.
This is believed to fix a hung server that was reported
to the freebsd-fs@ list on June 9 under the subject heading
"New NFS server stress test hang", where all the threads
were waiting for the RC_LOCKED flag to clear.
Tested by: jwd at slowblink.com
MFC after: 2 weeks
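The rule the fix enforces (rc_flag is only read or modified with the entry's mutex held, and waiters sleep until RC_LOCKED clears) can be sketched in user space, with pthreads standing in for the kernel mutex and sleep/wakeup. The structure and function names are hypothetical; only RC_LOCKED comes from the message above.

```c
#include <pthread.h>

#define RC_LOCKED	0x01	/* flag name from the commit message */

/* Hypothetical cache entry: rc_flag is protected by rc_mtx. */
struct rc_entry_sketch {
	pthread_mutex_t	rc_mtx;
	pthread_cond_t	rc_cv;
	int		rc_flag;
};

static void
rc_lock_entry(struct rc_entry_sketch *rp)
{
	pthread_mutex_lock(&rp->rc_mtx);
	while (rp->rc_flag & RC_LOCKED)
		pthread_cond_wait(&rp->rc_cv, &rp->rc_mtx);
	rp->rc_flag |= RC_LOCKED;
	pthread_mutex_unlock(&rp->rc_mtx);
}

static void
rc_unlock_entry(struct rc_entry_sketch *rp)
{
	pthread_mutex_lock(&rp->rc_mtx);
	/* Done without the mutex, this read-modify-write can lose updates. */
	rp->rc_flag &= ~RC_LOCKED;
	pthread_cond_broadcast(&rp->rc_cv);
	pthread_mutex_unlock(&rp->rc_mtx);
}
```

An unlocked `rc_flag &= ~RC_LOCKED` racing with another thread's `rc_flag |= ...` can silently drop one of the two updates, which matches the symptom of threads waiting forever for RC_LOCKED to clear.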
the NFS subsystems use five of the rpcsec_gss/kgssapi entry points,
but since it was not obvious which others might be useful, all
nineteen were included. Basically the nineteen entry points are
set in a structure called rpc_gss_entries and inline functions
defined in sys/rpc/rpcsec_gss.h check for the entry points being
non-NULL and then call them. A default value is returned otherwise.
Requested by: rwatson
Reviewed by: jhb
MFC after: 2 weeks
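The entry-point table pattern described above (a structure of function pointers that the module fills in, plus inline wrappers that call through when the pointer is non-NULL and return a default otherwise) can be sketched with a single entry. The structure, field and function names here are hypothetical; the real rpc_gss_entries has nineteen entries with their own signatures.

```c
#include <stddef.h>

/* Hypothetical one-entry version of the entry-point table. */
struct rpc_gss_entries_sketch {
	int	(*max_data_length)(int max_tp_unit_len);
};

/* All NULL until the (hypothetical) kgssapi module registers itself. */
static struct rpc_gss_entries_sketch rpc_gss_entries_s;

/* Inline wrapper: call through if present, otherwise return a default. */
static inline int
rpc_gss_max_data_length_sketch(int max_tp_unit_len)
{
	if (rpc_gss_entries_s.max_data_length == NULL)
		return (0);	/* default when the module is not loaded */
	return (rpc_gss_entries_s.max_data_length(max_tp_unit_len));
}

/* Stand-ins for the module's implementation and its load-time hook. */
static int
kgssapi_max_data_length(int max_tp_unit_len)
{
	return (max_tp_unit_len - 100);	/* arbitrary overhead, illustration */
}

static void
kgssapi_load_sketch(void)
{
	rpc_gss_entries_s.max_data_length = kgssapi_max_data_length;
}
```

Callers such as the NFS code only ever use the wrappers, so they link without a hard dependency on the kgssapi module.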
(Saying that the lock on the object that the page belongs to must be held
only represents one aspect of the rules.)
Eliminate the use of the page queues lock for atomically performing read-
modify-write operations on the dirty field when the underlying architecture
supports atomic operations on char and short types.
Document the fact that 32KB pages aren't really supported.
Reviewed by: attilio, kib
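A user-space sketch of a lock-free read-modify-write on a byte-sized dirty field, using C11 atomics in place of the kernel's atomic(9) primitives. It assumes 4 KB pages and 512-byte blocks, so one bit per block fits in 8 bits; the structure and function names are hypothetical. With 32 KB pages the same bitmap would need 64 bits, which no longer fits in a char or short.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical page: one dirty bit per 512-byte block of a 4 KB page. */
struct page_sketch {
	_Atomic uint8_t	dirty;
};

/* Atomic OR: no global lock needed to mark blocks dirty. */
static void
page_set_dirty_bits(struct page_sketch *m, uint8_t bits)
{
	atomic_fetch_or(&m->dirty, bits);
}

/* Atomic AND with the complement: clear only the given blocks. */
static void
page_clear_dirty_bits(struct page_sketch *m, uint8_t bits)
{
	atomic_fetch_and(&m->dirty, (uint8_t)~bits);
}
```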
Some of the loader filesystems are very ill equipped to handle seeking
backwards within the file. Namely, tftp requires the transfer to be
restarted from the start of the file every time we go backwards.
cloned from the old NFS client, plus additions for NFSv4. A
review of this code is in progress, however it was felt by the
reviewer that it could go in now, before code slush. Any changes
required by the review can be committed as bug fixes later.
interrupts. Bringup on additional machine models repeatedly reveals
firmware that enables interrupts behind our back, causing the console
to be flooded otherwise.
- As with the regular interrupt counters using uint16_t instead of
u_long for counting the stray vector interrupts should be more than
sufficient.
- Cache the interrupt vector in intr_stray_vector().
On MP systems this is not a usable solution anymore and could easily
lead to false positives triggering so much logging that even the
console was no longer usable (multiple parallel ping -f can do this).
Switch to the suggested solution of using mbuf tags to carry per
packet state between gre_output() invocations. Unlike the
proposed solution modelled after gif(4), only allocate one mbuf tag
per packet rather than one per packet per gre_output() pass.
As the sysctl to control the possible valid (gre in gre) nestings does
no sanity checks, make sure to always allocate space in the mbuf tag
for at least one, and at most 255 possible gre interfaces to detect
loops in addition to the counter.
Submitted by: Cristian KLEIN (cristi net.utcluj.ro) (original version)
PR: kern/114714
Reviewed by: Cristian KLEIN (cristi net.utcluj.ro)
Reviewed by: Wooseog Choi (ben_choi hotmail.com)
Sponsored by: Sandvine Incorporated
MFC after: 1 week
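The loop detection described above can be sketched as a per-packet tag holding a counter plus the list of gre interfaces already traversed, with the unchecked sysctl clamped to between 1 and 255. This is a user-space illustration: the tag is a plain struct rather than an mbuf tag, interfaces are identified by an opaque uintptr_t, the array is always sized for 255 entries, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GRE_MAX_NESTING_LIMIT	255	/* upper bound from the commit text */

/* Hypothetical per-packet tag: gre interfaces this packet passed. */
struct gre_tag_sketch {
	uint8_t		count;
	uint8_t		max;	/* clamped nesting limit, 1..255 */
	uintptr_t	seen[GRE_MAX_NESTING_LIMIT];
};

static void
gre_tag_init(struct gre_tag_sketch *t, int sysctl_max_nesting)
{
	/* The sysctl is not sanity checked, so clamp it here. */
	if (sysctl_max_nesting < 1)
		sysctl_max_nesting = 1;
	if (sysctl_max_nesting > GRE_MAX_NESTING_LIMIT)
		sysctl_max_nesting = GRE_MAX_NESTING_LIMIT;
	t->count = 0;
	t->max = (uint8_t)sysctl_max_nesting;
}

/* Returns false (drop) on a loop or when nesting is too deep. */
static bool
gre_tag_pass(struct gre_tag_sketch *t, uintptr_t ifp)
{
	uint8_t i;

	for (i = 0; i < t->count; i++)
		if (t->seen[i] == ifp)
			return (false);	/* same interface twice: a loop */
	if (t->count >= t->max)
		return (false);		/* nesting limit exceeded */
	t->seen[t->count++] = ifp;
	return (true);
}
```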
proceeding.
Some laptops with certain cards in them sometimes fail on boot, but
work if the card is inserted after boot. Experiments show that small
delays here make things more reliable. It is
believed that some combinations need a little more time before the
power on the card is really stable enough to be reliable once the
power is stable in the bridge.
sometimes the compiler inserts redundant instructions to preserve unused upper
32 bits even when the value is cast to a 32-bit type. Unfortunately, the
problem seems to become more serious when the value is shifted, especially on amd64.
ACPI Device() objects that do not have any device IDs available via the
_HID or _CID methods. Without a device ID a device driver cannot attach
to the device anyway. Namespace objects that are devices but not of
type ACPI_TYPE_DEVICE are not affected.
A few BIOSes have also attached a _CRS method to a PCI device to
allocate resources that are not managed via a BAR. With the previous
code those resources are allocated from acpi0 directly which can interfere
with the new PCI-PCI bridge driver (since the PCI device in question may
be behind a bridge and its resources should be allocated from that
bridge's windows instead). The resources were also orphaned and
would end up associated with some other random device whose device_t
reused the pointer of the original ACPI-enumerated device (after it was
freed by the ACPI PCI bus driver) in devinfo output, which was confusing.
If we want to handle _CRS on PCI devices we can adjust the ACPI PCI bus
driver to do that in the future and associate the resources with the
proper device object respecting PCI-PCI bridges, etc.
Note that with this change the ACPI PCI bus driver no longer has to
delete ACPI-enumerated device_t devices that mirror PCI devices since
they should in general not exist. There are rare cases when a BIOS
will give a PCI device a _HID (e.g. I've seen a PCI-ISA bridge given
a _HID for a system resource device). In that case we leave both the
ACPI and PCI-enumerated device_t objects around just as in the previous
code.
method instead of reusing the existing per-queue interrupt task.
Reusing the per-queue interrupt task could result in both an interrupt
thread and the taskqueue thread trying to handle received packets on a
single queue resulting in out-of-order packet processing.
- Don't define igb_start() at all on 8.0 and where if_transmit is used.
Replace last remaining call to igb_start() with a loop to kick off
transmit on each queue instead.
- Call ether_ifdetach() earlier in igb_detach().
- Drain tasks and free taskqueues during igb_detach().
Reviewed by: jfv
MFC after: 1 week
space. This is consistent with the behavior in Linux.
PR: kern/157871
Reported by: Petr Salinger <Petr Salinger att seznam cz>
Verified on: GNU/kFreeBSD debian 8.2-1-amd64 (by reporter)
Reviewed by: kib (some time ago)
MFC after: 2 weeks
stream of the local processor. Also explicitly invalidate
the ALAT. This is done on the other CPUs in the coherence
domain by virtue of the ptc.ga instruction, but does not
apply to the local CPU.
and usr.sbin/makefs/ffs/ffs_subr.c as they have no need of anything in that
file. No other programs or libraries include <ufs/ffs/ffs_extern.h> (nor
should they as it is totally in-kernel interfaces). For added protection
I enclosed the entire contents of <ufs/ffs/ffs_extern.h> in ifdef _KERNEL.
Feedback from: Bruce Evans and Tai-hwa Liang
have to ignore it when sending the IPI anyway. Actually I can't think of
a good reason why this ever was done that way in the first place as it's
not even useful for debugging.
While at it, replace the use of pc_other_cpus as it's slated for deorbit.
messages for a filesystem being out of space need to be moved so that
they do not print out until after a failed cleanup attempt.
Suggested by: Jeff Roberson
Modify the "alternate break sequence" detecting state
machine so that only a contiguous invocation of the
break sequence is accepted. The old implementation
did not reset the state machine when detecting an
unexpected character.
While here, use an enum for the states of the machine
instead of magic numbers.
Submitted by:
Sponsored by: Spectra Logic Corporation
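A sketch of an enum-based detector that accepts only a contiguous break sequence, assuming the sequence is CR, '~', Ctrl-B. The state names, key macros and function name are hypothetical and this is not the committed kdb code. Any unexpected character resets the machine, except that a CR restarts it at the "saw CR" state.

```c
#include <stdbool.h>

#define KEY_CR		'\r'
#define KEY_TILDE	'~'
#define KEY_CTRLB	0x02

enum alt_break_state {
	ABS_IDLE,	/* waiting for CR */
	ABS_SAW_CR,	/* saw CR, waiting for '~' */
	ABS_SAW_TILDE	/* saw CR '~', waiting for Ctrl-B */
};

/* Feed one character; returns true when the full sequence arrived. */
static bool
alt_break_feed(enum alt_break_state *s, int c)
{
	switch (*s) {
	case ABS_SAW_CR:
		if (c == KEY_TILDE) {
			*s = ABS_SAW_TILDE;
			return (false);
		}
		break;
	case ABS_SAW_TILDE:
		if (c == KEY_CTRLB) {
			*s = ABS_IDLE;
			return (true);
		}
		break;
	case ABS_IDLE:
		break;
	}
	/* Unexpected character: reset, but let a CR start a new attempt. */
	*s = (c == KEY_CR) ? ABS_SAW_CR : ABS_IDLE;
	return (false);
}
```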
announced during boot and contains the port number. The pnpinfo string
lists the port type (PUC_TYPE_* constants).
Tested by: Boris Samorodov bsam ipt ru
MFC after: 1 week
DEVFS, and make it accessible via the diskinfo utility.
Extend GEOM's generic attribute query mechanism into generic disk consumers.
sys/geom/geom_disk.c:
sys/geom/geom_disk.h:
sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
- Allow disk providers to implement a new method which can override
the default BIO_GETATTR response, d_getattr(struct bio *). This
function returns -1 if not handled, otherwise it returns 0 or an
errno to be passed to g_io_deliver().
sys/cam/scsi/scsi_da.c:
sys/cam/ata/ata_da.c:
- Don't copy the serial number to dp->d_ident anymore, as the CAM XPT
is now responsible for returning this information via
d_getattr()->(a)dagetattr()->xpt_getattr().
sys/geom/geom_dev.c:
- Implement a new ioctl, DIOCGPHYSPATH, which returns the GEOM
attribute "GEOM::physpath", if possible. If the attribute request
returns a zero-length string, ENOENT is returned.
usr.sbin/diskinfo/diskinfo.c:
- If the DIOCGPHYSPATH ioctl is successful, report physical path
data when diskinfo is executed with the '-v' option.
Submitted by: will
Reviewed by: gibbs
Sponsored by: Spectra Logic Corporation
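The d_getattr contract above (-1 means "not handled", otherwise 0 or an errno passed on to g_io_deliver()) and the DIOCGPHYSPATH rule (a zero-length attribute yields ENOENT) can be sketched in user space. struct bio_sketch, the default handler returning EOPNOTSUPP, the example physpath string and all function names are hypothetical stand-ins for the GEOM and driver code.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a struct bio carrying BIO_GETATTR. */
struct bio_sketch {
	const char	*attribute;	/* e.g. "GEOM::physpath" */
	char		data[64];
	int		error;
};

/* Driver hook: -1 = not handled, otherwise 0 or an errno. */
typedef int d_getattr_sketch_t(struct bio_sketch *);

/* Hypothetical generic fallback used when the driver declines. */
static int
default_getattr(struct bio_sketch *bp)
{
	(void)bp;
	return (EOPNOTSUPP);
}

static void
disk_getattr(d_getattr_sketch_t *d_getattr, struct bio_sketch *bp)
{
	int ret = -1;

	if (d_getattr != NULL)
		ret = d_getattr(bp);
	if (ret == -1)
		ret = default_getattr(bp);
	bp->error = ret;	/* the kernel would pass this to g_io_deliver() */
}

/* DIOCGPHYSPATH-style result: an empty attribute string maps to ENOENT. */
static int
physpath_ioctl_result(const struct bio_sketch *bp)
{
	if (bp->error != 0)
		return (bp->error);
	return (bp->data[0] == '\0' ? ENOENT : 0);
}

/* Example driver hook that only answers "GEOM::physpath". */
static int
example_getattr(struct bio_sketch *bp)
{
	if (strcmp(bp->attribute, "GEOM::physpath") != 0)
		return (-1);
	strcpy(bp->data, "id1,example/slot@1");	/* made-up path */
	return (0);
}
```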
Add generic attribute change notification support to GEOM.
sys/sys/geom/geom.h:
Add a new attrchanged method field to both g_class
and g_geom.
sys/sys/geom/geom.h:
sys/geom/geom_event.c:
- Provide the g_attr_changed() function that providers
can use to advertise attribute changes.
- Perform delivery of attribute change notifications
from a thread context via the standard GEOM event
mechanism.
sys/geom/geom_subr.c:
Inherit the attrchanged method from class to geom (class instance).
sys/geom/geom_disk.c:
Provide disk_attr_changed() to give consumers of the disk API
access to g_attr_changed().
sys/cam/scsi/scsi_pass.c:
sys/cam/scsi/scsi_da.c:
sys/geom/geom_dev.c:
sys/geom/geom_disk.c:
Use attribute changed events to track updates to physical path
information.
sys/cam/scsi/scsi_da.c:
Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
events for this driver. When this event occurs, and
the updated buffer type references our physical path
attribute, emit a GEOM attribute changed event via the
disk_attr_changed() API.
sys/cam/scsi/scsi_pass.c:
Add AC_ADVINFO_CHANGED to the registered asynchronous CAM
events for this driver. When this event occurs, update
the physical path devfs alias for this pass instance.
Submitted by: gibbs
Sponsored by: Spectra Logic Corporation
sys/kern/kern_conf.c:
Add make_dev_physpath_alias(). This interface takes
the parent cdev of the alias, an old alias cdev (if any)
to replace with the newly created alias, and the physical
path string. The alias is visible as a symlink to the
parent, with the same name as the parent, rooted at
physpath in devfs.
Note: make_dev_physpath_alias() has hard coded knowledge of the
Solaris style prefix convention for physical path data,
"id1,". In the future, I expect the convention to change
to allow "physical path quality" to be reported in the
prefix. For example, a physical path based on NewBus
topology would be of "lower quality" than a physical path
reported by a device enclosure.
Sponsored by: Spectra Logic Corporation
- Only attempt the closing synchronize cache on a disk
if it is still there.
- When a device is lost, report the number of outstanding
I/Os as they are drained.
- When a device is lost, return any unprocessed bios with
ENXIO instead of EIO.
- Filter asynchronous events, but always allow cam_periph_async()
to see them too.
Sponsored by: Spectra Logic Corporation
other device attributes stored in the CAM Existing Device Table (EDT).
This includes some infrastructure required by the enclosure services
driver to export physical path information.
Make the CAM device advanced info interface accept store requests.
sys/cam/scsi/scsi_all.c:
sys/cam/scsi/scsi_all.h:
- Replace scsi_get_sas_addr() with a scsi_get_devid() which takes
a callback that decides whether to accept a particular descriptor.
Provide callbacks for NAA IEEE Registered addresses and for SAS
addresses, replacing the old function. This is needed because
the old function doesn't work for an enclosure address for a SAS
device, which is not flagged as a SAS address, but is NAA IEEE
Registered. It may be worthwhile merging this interface with the
devid match interface.
- Add a few more defines for some device ID fields.
sbin/camcontrol/camcontrol.c:
- Update for the CCB_DEV_ADVINFO interface change.
cam/cam_xpt_internal.h:
- Add the new fields for the physical path string to the CAM EDT.
cam/cam_ccb.h:
- Rename CCB_GDEV_ADVINFO to simply CCB_DEV_ADVINFO, and the ccb
structure to ccb_dev_advinfo.
- Add a flag that changes this CCB's action to store, rather than
the default, retrieve.
- Add a new buffer type, CDAI_TYPE_PHYS_PATH, for the new CAM EDT
physpath field.
- Remove the never-implemented transport & proto flags.
cam/cam_xpt.c:
cam/cam_xpt.h:
- Add xpt_getattr(), which provides a wrapper for fetching a device's
attribute using the GEOM strings as key. This method currently
supports "GEOM::ident" and "GEOM::physpath".
Submitted by: will
Reviewed by: gibbs
Extend the XPT_DEV_MATCH api to allow a device search by device ID.
As far as the API is concerned, device ID is a binary blob to be
interpreted by the transport layer. The SCSI implementation assumes
it is an array of VPD device ID descriptors.
sys/cam/cam_ccb.h:
Create a new structure, device_id_match_pattern, and
update the XPT_DEV_MATCH datastructures and flags so
that this pattern type can be used.
sys/cam/cam_xpt.c:
- A single pattern matching on both inquiry data and device
ID is invalid. Report any violators.
- Pass device ID match requests through to the new routine
scsi_devid_match(). The direct call of a SCSI routine is
a layering violation, but no worse than the one a few
lines up that checks inquiry data. Defer cleaning this
up until our future, larger, rototilling of CAM.
- Zero out cam_ed and cam_et nodes on allocation. Prior to
this change, device_id_len and device_id were not initialized,
preventing proper detection of the presence of this
information.
sys/cam/scsi/scsi_all.c:
sys/cam/scsi/scsi_all.h:
Add the scsi_match_devid() routine.
Add a helper function for extracting peripheral driver names
sys/cam/cam_periph.c:
sys/cam/cam_periph.h:
Add the cam_periph_list() method which fills an sbuf
with a comma delimited list of the peripheral instances
associated with a given CAM path.
Add helper functions for SCSI commands used by the SES driver.
sys/cam/scsi/scsi_all.c:
sys/cam/scsi/scsi_all.h:
Add structure definitions and csio filling functions for
the receive diagnostic results and send diagnostic commands.
Misc CAM XPT cleanups.
sys/cam/cam_xpt.c:
Broadcast AC_FOUND_DEVICE and AC_PATH_REGISTERED
events at the time async event handlers are attached
even when registering just for events on a particular
SIM. Previously, you had to register for these
events on all SIMs in the system in order to get
the initial broadcast even though subsequent device
and path arrivals would be delivered.
sys/cam/cam_xpt.c:
Remove SIM mutex held asserts from path accessors.
CAM paths are reference counted and it is this
reference count, not the sim mutex, that guarantees
they are stable.
Sponsored by: Spectra Logic Corporation
"globalport" option for multiple NAT instances.
If ipfw rule contains "global" keyword instead of nat_number, then
for each outgoing packet ipfw_nat looks up translation state in all
configured nat instances. If an entry is found, the packet is aliased
according to that entry; otherwise the packet is passed unchanged.
User can specify "skip_global" option in NAT configuration to exclude
an instance from the lookup in global mode.
PR: kern/157867
Submitted by: Alexander V. Chernikov (previous version)
Tested by: Eugene Grosbein
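The "global" lookup described above can be sketched as a scan over all NAT instances that are not marked skip_global, returning the first instance holding matching state, or NULL so that the packet is passed unchanged. The structure, its tiny fixed table and the integer key are hypothetical simplifications of libalias state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical NAT instance with a tiny fixed translation table. */
struct nat_instance_sketch {
	int		id;
	bool		skip_global;	/* excluded from "global" lookups */
	size_t		nentries;
	unsigned int	entries[4];	/* keys of live translation state */
};

/* Return the instance holding state for `key`, or NULL (pass as is). */
static struct nat_instance_sketch *
nat_global_lookup(struct nat_instance_sketch *inst, size_t ninst,
    unsigned int key)
{
	size_t i, j;

	for (i = 0; i < ninst; i++) {
		if (inst[i].skip_global)
			continue;
		for (j = 0; j < inst[i].nentries; j++)
			if (inst[i].entries[j] == key)
				return (&inst[i]);
	}
	return (NULL);
}
```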
Document the fact that we might want an IFCAP_CANTCHANGE mask,
even though the value is not yet used in sys/net/if.c
(asked on -current a week ago; no feedback, so I assume no objection).
to the check_uidgid() function, since it contains all needed arguments
and also a pointer to the mbuf, so it is now possible to use the
in_pcblookup_mbuf() function.
Since I cannot test it for the non-FreeBSD case, I keep this ifdef
unchanged.
Tested by: Alexander V. Chernikov
MFC after: 3 weeks
device node has been created, pass MAKEDEV_CHECKNAME in so that the devfs
code will do the check.
Use a regular static variable as before, that's good enough to keep us from
calling into devfs most of the time.
Suggested by: kib
MFC after: 1 week
Sponsored by: Spectra Logic Corporation
In devstat_new_entry(), there is no need to initialize the queue
and the mutex in this function. There are ways to do static
initialization on both, so use STAILQ_HEAD_INITIALIZER and
MTX_SYSINIT to initialize the queue and the mutex.
In devstat_alloc(), use an atomic test and set routine to guard
making our entry in /dev. Using just a plain static variable
creates a race condition on multiprocessor machines. If you
attempt to create a second entry in devfs, the kernel will panic.
Submitted by: kdm
Reviewed by: gibbs
Sponsored by: Spectra Logic Corporation
MFC after: 1 week.
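The atomic test-and-set guard described in this entry can be sketched with C11's atomic_flag: only the first caller, on any CPU, sees the flag clear and creates the /dev entry. The function names and the call counter are hypothetical. Note that the MAKEDEV_CHECKNAME entry earlier in this log reports that the guard was later changed back to a plain static variable, with devfs performing the duplicate-name check.

```c
#include <stdatomic.h>

static atomic_flag devstat_dev_created = ATOMIC_FLAG_INIT;
static int make_dev_calls;	/* how often we would have called make_dev() */

/* Hypothetical stand-in for creating the single /dev entry. */
static void
create_devstat_node(void)
{
	make_dev_calls++;
}

/* Only the first caller wins the test-and-set and creates the node. */
static void
devstat_alloc_sketch(void)
{
	if (!atomic_flag_test_and_set(&devstat_dev_created))
		create_devstat_node();
}
```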
sys/dev/xen/blkback/blkback.c:
o Implement front-end request coalescing. This greatly improves the
performance of front-end clients that are unaware of the dynamic
request-size/number of requests negotiation available in the
FreeBSD backend driver. This required a large restructuring
in how this driver records in-flight transactions and how those
transactions are mapped into kernel KVA. For example, the driver
now includes a mini "KVA manager" that allocates ranges of
contiguous KVA to batches of requests that are physically
contiguous in the backing store so that a single bio or UIO
segment can be used to represent the I/O.
o Refuse to open any backend files or devices if the system
has yet to mount root. This avoids a panic.
o Properly handle "onlined" devices. An "onlined" backend
device stays attached to its backing store across front-end
disconnections. This feature is intended to reduce latency
when a front-end does a hand-off to another driver (e.g.
PV aware bootloader to OS kernel) or during a VM reboot.
o Harden the driver against a pathological/buggy front-end
by carefully vetting front-end XenStore data such as the
front-end state.
o Add sysctls that report the negotiated number of
segments per-request and the number of requests that
can be concurrently in flight.
Submitted by: kdm
Reviewed by: gibbs
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
(rcv_nxt) if we are advertising a zero window. This can be true when ACK'ing
a window probe whose one byte payload was accepted rather than dropped
because the socket's receive buffer was not completely full, but the
remaining space was smaller than the window scale.
This ensures that window probe ACKs satisfy the assumption made in r221346
and closes a window where rcv_nxt could be greater than rcv_adv.
Tested by: trasz, pho, trociny
Reviewed by: silby
MFC after: 1 week
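The invariant mentioned above (rcv_adv must never fall behind rcv_nxt, even after a one-byte window probe is accepted while a zero window is advertised) can be sketched with modular sequence-number comparison. This is a generic illustration of the invariant, not the committed change, whose first line is cut off; the function name is hypothetical and SEQ_GT mirrors the usual BSD definition.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular sequence comparison, as in the BSD TCP headers. */
#define SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

/*
 * After choosing the window `win` to advertise, move the advertised
 * right edge forward if needed, so rcv_adv >= rcv_nxt always holds,
 * including the win == 0 case.
 */
static tcp_seq
update_rcv_adv(tcp_seq rcv_adv, tcp_seq rcv_nxt, uint32_t win)
{
	if (SEQ_GT(rcv_nxt + win, rcv_adv))
		rcv_adv = rcv_nxt + win;
	return (rcv_adv);
}
```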
vm_page_undirty(). The assert is not precise because the VPO_BUSY owner
is not tracked, so the assertion does not catch the case when VPO_BUSY is
owned by another thread.
Reviewed by: alc
The generic sound driver has been added, along with enough
device-specific drivers to support the most common audio
chipsets.
We've discussed enabling it from time to time over the years
and we've received numerous requests from users, so we decided
that shipping 9.0 with working audio by default would be the
best thing to do.
Bug reports should be sent to the multimedia@ mailing list, as
usual.
Approved by: mav
No objection: re
of the devices we manage. These changes can be due to writes
we make ourselves or due to changes made by the control domain.
The goal of these changes is to ensure that all state transitions
can be detected regardless of their source and to allow common
device policies (e.g. "onlined" backend devices) to be centralized
in the XenBus bus code.
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbus_if.m:
Add a new method for XenBus drivers "localend_changed".
This method is invoked whenever a write is detected to
a device's XenBus tree. The default implementation of
this method is a no-op.
sys/xen/xenbus/xenbus_if.m:
sys/dev/xen/netfront/netfront.c:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkback/blkback.c:
Change the signature of the "otherend_changed" method.
This notification cannot fail, so it should return void.
sys/xen/xenbus/xenbusb_back.c:
Add "online" device handling to the XenBus Back Bus
support code. An online backend device remains active
after a front-end detaches as a reconnect is expected
to occur in the near future.
sys/xen/interface/io/xenbus.h:
Add comment block further explaining the meaning and
driver responsibilities associated with the XenBus
Closed state.
sys/xen/xenbus/xenbusb.c:
sys/xen/xenbus/xenbusb.h:
sys/xen/xenbus/xenbusb_back.c:
sys/xen/xenbus/xenbusb_front.c:
sys/xen/xenbus/xenbusb_if.m:
o Register a XenStore watch against the local XenBus tree
for all devices.
o Cache the string length of the path to our local tree.
o Allow the xenbus front and back drivers to hook/filter both
local and otherend watch processing.
o Update the device ivar version of "state" when we detect
a XenStore update of that node.
sys/dev/xen/control/control.c:
sys/xen/xenbus/xenbus.c:
sys/xen/xenbus/xenbusb.c:
sys/xen/xenbus/xenbusb.h:
sys/xen/xenbus/xenbusvar.h:
sys/xen/xenstore/xenstorevar.h:
Allow clients of the XenStore watch mechanism to attach
a single uintptr_t worth of client data to the watch.
This removes the need to carefully place client watch
data within enclosing objects so that a cast or offsetof
calculation can be used to convert from watch to enclosing
object.
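The two styles can be contrasted in plain C. The structure and field names below (xs_watch_sketch, callback_data, client_softc, WATCH_TO_SOFTC) are assumptions modelled on the description, not the actual xenstorevar.h definitions. The macro shows the old offsetof-based recovery of the enclosing object; the callback shows the new style, where the owner pointer is simply stored in the watch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical watch structure modelled on the description above. */
struct xs_watch_sketch {
	const char *node;
	void	  (*callback)(struct xs_watch_sketch *, const char **,
		    unsigned int);
	uintptr_t   callback_data;	/* one uintptr_t of client data */
};

/* Hypothetical client object that embeds a watch. */
struct client_softc {
	int			unit;
	int			events;
	struct xs_watch_sketch	watch;
};

/* Old style: the watch must be embedded so offsetof can find the owner. */
#define	WATCH_TO_SOFTC(w) \
	((struct client_softc *)((char *)(w) - \
	    offsetof(struct client_softc, watch)))

/* New style: the owner comes straight from callback_data. */
static void
client_watch_cb(struct xs_watch_sketch *w, const char **vec, unsigned int len)
{
	struct client_softc *sc = (struct client_softc *)w->callback_data;

	(void)vec;
	(void)len;
	sc->events++;
}
```

With client data, the watch no longer has to live at a known offset inside its owner; it could even be allocated separately.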
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
to resolve errors which can cause corruption on recovery with the old
synchronous mechanism.
- Append partial truncation freework structures to indirdeps while
truncation is proceeding. These prevent new block pointers from
becoming valid until truncation completes and serialize truncations.
- On completion of a partial truncate, journal work waits for zeroed
pointers to hit indirects.
- softdep_journal_freeblocks() handles last frag allocation and last
block zeroing.
- vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it
is only implemented in one place.
- Block allocation failure handling moved up one level so it does not
proceed with buf locks held. This permits us to do more extensive
reclaims when filesystem space is exhausted.
- softdep_sync_metadata() is broken into two parts: the first executes
once at the start of ffs_syncvnode() and flushes truncations and
inode dependencies; the second is called on each locked buf. This
eliminates excessive looping and rollbacks.
- Improve the mechanism in process_worklist_item() that handles
acquiring vnode locks for handle_workitem_remove() so that it works
more generally and does not loop excessively over the same worklist
items on each call.
- Don't corrupt directories by zeroing the tail in fsck. Tail zeroing
is now done only for regular files.
- Push an fsync complete record for files that need it so the checker
knows a truncation in the journal is no longer valid.
Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts)
Tested by: pho
partially added a name. Allow ufs_direnter() to continue in the
hopes that it is a transient error. If it is not, the directory
is corrupted already from IO errors and writing this new block
is not likely to make things worse.
- Fix races on setting AAC_AIFFLAGS_ALLOCFIBS
- Remove some unused AAC_IFFLAGS_* bits.
Please note that the kthread still distinguishes between the
total mask and AAC_AIFFLAGS_ALLOCFIBS because more flags may be
added to aifflags in the future.
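The distinction between the total mask and the individual bit can be sketched as follows. This uses C11 atomics as a swapped-in technique to show a race-free set/clear; the driver itself may well use its mutex instead. Only the ALLOCFIBS name comes from the commit; the values and helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical values; only the ALLOCFIBS name comes from the commit. */
#define	AIFFLAGS_ALLOCFIBS	(1u << 0)
/*
 * Everything the kthread acts on.  Today that is a single bit, but a
 * separate mask means future flags only need to be added here.
 */
#define	AIFFLAGS_MASK		(AIFFLAGS_ALLOCFIBS)

static atomic_uint aifflags;	/* static storage: starts at zero */

/*
 * Request more FIBs.  fetch_or cannot lose a concurrent update, unlike a
 * plain "aifflags |= bit" read-modify-write.
 */
static void
request_allocfibs(void)
{
	atomic_fetch_or(&aifflags, AIFFLAGS_ALLOCFIBS);
}

/* Kthread: "is there any work at all?" is answered with the mask ... */
static bool
kthread_has_work(void)
{
	return ((atomic_load(&aifflags) & AIFFLAGS_MASK) != 0);
}

/* ... while each bit is consumed on its own.  True if it was pending. */
static bool
kthread_take_allocfibs(void)
{
	unsigned int old = atomic_fetch_and(&aifflags, ~AIFFLAGS_ALLOCFIBS);

	return ((old & AIFFLAGS_ALLOCFIBS) != 0);
}
```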
Sponsored by: Sandvine Incorporated
Reported and reviewed by: emaste
MFC after: 2 weeks
OpenSolaris and ZFS header files. These changes are sufficient
to allow a C++ program to use the libzfs library.
Note: The majority of these files already included 'extern "C"'
declarations, so the intention of providing C++ compatibility
already existed even if it wasn't provided.
cddl/compat/opensolaris/include/assert.h:
Wrap our compatibility assert implementation in
'extern "C"'. Since this is a compatibility header
I matched the Solaris style of doing this explicitly
rather than relying on FreeBSD's __BEGIN_DECLS/__END_DECLS macros.
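The explicit Solaris-style guard looks like the sketch below; FreeBSD's <sys/cdefs.h> macros __BEGIN_DECLS and __END_DECLS expand to the same extern "C" braces under C++ and to nothing under C. The function compat_clamp is hypothetical and stands in for whatever the header declares.

```c
#include <assert.h>

/*
 * Hypothetical compat header fragment.  Under C++, the guard gives the
 * enclosed declarations C linkage, so a C++ program resolves them to
 * the unmangled symbols exported by the C library.  Under C the
 * #ifdef blocks vanish.
 */
#ifdef __cplusplus
extern "C" {
#endif

int	compat_clamp(int v, int lo, int hi);

#ifdef __cplusplus
}
#endif

/* Definition; in practice this lives in the C library itself. */
int
compat_clamp(int v, int lo, int hi)
{
	return (v < lo ? lo : (v > hi ? hi : v));
}
```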
sys/cddl/compat/opensolaris/sys/kstat.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_pool.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/ddt.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h:
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h:
Rename parameters in function declarations that conflict
with C++ keywords. This was the solution preferred by
members of the Illumos community.
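A minimal illustration of the kind of conflict being fixed, using a hypothetical node_replace() rather than any actual ZFS prototype. A parameter named "new" is legal C but a syntax error once the header is compiled as C++. Because prototype parameter names are not part of a function's type, renaming them changes neither the ABI nor existing C callers.

```c
#include <assert.h>

struct node {
	int	val;
};

/*
 * Before (valid C, rejected by a C++ compiler):
 *
 *	void node_replace(struct node *old, struct node *new);
 *
 * After: only the parameter names change.
 */
void	node_replace(struct node *oldnode, struct node *newnode);

void
node_replace(struct node *oldnode, struct node *newnode)
{
	oldnode->val = newnode->val;
}
```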
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h:
In C, nested structures are visible in the global namespace,
but in C++, they take on the namespace of the structure in
which they are contained. Flatten nested structure
definitions within struct zfs_cmd so these structures are
visible in the global namespace when compiled in both
languages.
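The flattening can be sketched with hypothetical structures (outer_cmd and inner_stats are not the real zfs_cmd members). In the "before" form, C gives the inner tag file scope, but C++ scopes it as outer_cmd::inner_stats, so code outside the outer structure can no longer name it as plain struct inner_stats. Declaring the inner structure first and embedding it by name works in both languages and leaves the layout unchanged.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Before (hypothetical):
 *
 *	struct outer_cmd {
 *		struct inner_stats { uint64_t reads, writes; } oc_stats;
 *		char oc_name[32];
 *	};
 */

/* After: the inner structure is declared at file scope ... */
struct inner_stats {
	uint64_t	reads;
	uint64_t	writes;
};

/* ... and embedded by name, with the same member order and types. */
struct outer_cmd {
	struct inner_stats	oc_stats;
	char			oc_name[32];
};

/* Code naming the inner type by its bare tag now compiles as C or C++. */
static uint64_t
total_io(const struct inner_stats *s)
{
	return (s->reads + s->writes);
}
```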
Sponsored by: Spectra Logic Corporation
but has only 2 SATA ports instead of 4. The worst part is that SStatus and
SError registers for missing ports are not implemented and return wrong
values (0xffffffff), which caused an infinite reset loop.
Just ignore that SError value, since I have found no better way to identify them.
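The workaround boils down to treating an all-ones read as "not an error" before deciding to reset. A minimal sketch with a hypothetical helper and a deliberately simplified reset policy (any other nonzero SError triggers a reset); it is not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What an unimplemented register reads back as on this controller. */
#define	REG_UNIMPLEMENTED	0xffffffffu

/*
 * Hypothetical helper: should this SError value trigger a port reset?
 * Ignoring the all-ones value breaks the reset loop on missing ports.
 */
static bool
serror_needs_reset(uint32_t serror)
{
	if (serror == REG_UNIMPLEMENTED)
		return (false);		/* bogus value from a missing port */
	return (serror != 0);
}
```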
- Re-add an accidentally removed atomic op for the sysctl(9) handler.
- Remove a period (`.') at the end of a debugging message.
- Consistently spell "low" for "TSC-low" timecounter throughout.
Pointed out by: bde
increase robustness (no more calls to panic(9)) and simplify
code.
- Allocate RX/TX data structures as a single buffer rather than
an array of 4KB pages to simplify code.
- Fixed LRO (aka TPA) code. Removed kernel module parameter and added
support for enabling/disabling LRO through the ifconfig(8) command line.
LRO is still disabled by default but should be enabled for best
performance on an endpoint device.
- Fixed statistics code and removed kernel module parameter (stats
should just work).
- Added many software counters to help identify the cause of some
performance issues.
- Streamlined adapter internal init/stop code paths.
- Fiddled with debug code (adding some here, removing some there).
- Continued style(9) adjustments.
invariant. For the SMP case (TSC-low), it also has to pass the SMP synchronization
test, and the CPU vendor/model has to be explicitly whitelisted. Currently,
all Intel CPUs and single-socket AMD Family 15h processors are listed here.
Discussed with: hackers
TSC timecounter if the TSC frequency is higher than ~4.29 GHz (or 2^32-1 Hz) or
multiple CPUs are present. The "TSC-low" frequency is always lower than a
preset maximum value and is derived from the TSC frequency by halving it until
it becomes lower than the maximum. Note that the maximum value for the SMP case is
significantly lower than for the UP case because we want to reduce the (rare but
known) "temporal anomalies" caused by the non-serialized RDTSC instruction.
Normally, it is still higher than the "ACPI-fast" timecounter frequency (which
was the default timecounter hardware for a long time until r222222), so it
remains useful.
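For reference, 2^32-1 Hz is the largest rate a 32-bit counter can run at without wrapping more than once per second. Halving the frequency is equivalent to shifting the raw TSC value right by one bit, so the derivation reduces to finding a shift count. The sketch below is hypothetical (tsc_low_shift is not the kernel's function name) and only illustrates the "halve until lower than the maximum" rule.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: smallest right shift that makes the TSC
 * frequency strictly lower than max_freq.  A counter read would then
 * be (uint32_t)(rdtsc() >> shift), ticking at tsc_freq >> shift Hz.
 */
static unsigned int
tsc_low_shift(uint64_t tsc_freq, uint64_t max_freq)
{
	unsigned int shift = 0;

	assert(max_freq > 0);
	/* The shift < 63 cap keeps the shift count well defined. */
	while (shift < 63 && (tsc_freq >> shift) >= max_freq)
		shift++;
	return (shift);
}
```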
interleaving.
Signal that dumping should happen only for the first panic, which should be
the most important one.
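"Only the first panic dumps" is a claim-once pattern. A minimal sketch using a C11 atomic_flag; the flag and function names are hypothetical, and the kernel's actual mechanism may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag, claimed by whichever panic gets here first. */
static atomic_flag dump_claimed = ATOMIC_FLAG_INIT;

/*
 * True only for the first caller, i.e. the panic that should produce
 * the crash dump.  Nested or concurrent panics get false and skip it.
 */
static bool
panic_should_dump(void)
{
	return (!atomic_flag_test_and_set(&dump_claimed));
}
```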
Sponsored by: Sandvine Incorporated
Submitted by: Nima Misaghian (nmisaghian AT sandvine DOT com)
MFC after: 2 weeks
- Add retry loops in the i2c read/write functions.
- Combined the ADC channel selection and readout of the value into
one iicbus_transfer to avoid possible races.
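A bounded retry loop around an I2C transfer might look like the sketch below. The transfer is abstracted as a hypothetical callback (in the driver it would wrap iicbus_transfer()), and the retry budget is an assumption. A driver might also pause between attempts; that is omitted here.

```c
#include <assert.h>
#include <errno.h>

#define	I2C_MAX_RETRIES	5	/* hypothetical retry budget */

/* Hypothetical transfer callback: 0 on success, an errno value otherwise. */
typedef int (*xfer_fn)(void *ctx);

/* Retry up to I2C_MAX_RETRIES times; return the last error on failure. */
static int
i2c_xfer_retry(xfer_fn xfer, void *ctx)
{
	int error = EIO;

	for (int attempt = 0; attempt < I2C_MAX_RETRIES; attempt++) {
		error = xfer(ctx);
		if (error == 0)
			break;
	}
	return (error);
}

/* Test double: fails a configurable number of times, then succeeds. */
struct flaky {
	int	failures_left;
	int	calls;
};

static int
flaky_xfer(void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return (EIO);
	}
	return (0);
}
```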
Reviewed by: nwhitehorn