/*-
 * SPDX-License-Identifier: BSD-3-Clause
 *
 * Copyright (c) 2005 John Bicket
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer,
 *    without modification.
 * 2. Redistributions in binary form must reproduce at minimum a disclaimer
 *    similar to the "NO WARRANTY" disclaimer below ("Disclaimer") and any
 *    redistribution must be conditioned upon including a substantially
 *    similar Disclaimer requirement for further binary redistribution.
 * 3. Neither the names of the above-listed copyright holders nor the names
 *    of any contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * Alternatively, this software may be distributed under the terms of the
 * GNU General Public License ("GPL") version 2 as published by the Free
 * Software Foundation.
 *
 * NO WARRANTY
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTIBILITY
 * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
 * THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY,
 * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
 * IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 * THE POSSIBILITY OF SUCH DAMAGES.
 *
 * $FreeBSD$
 */

/*
 * Definitions for the Atheros Wireless LAN controller driver.
 */
#ifndef _DEV_ATH_RATE_SAMPLE_H
#define _DEV_ATH_RATE_SAMPLE_H

/* per-device state */
struct sample_softc {
	struct ath_ratectrl arc; /* base class */
	int smoothing_rate; /* ewma percentage [0..99] */
	int smoothing_minpackets;
	int sample_rate; /* %time to try different tx rates */
	int max_successive_failures;
	int stale_failure_timeout; /* how long to honor max_successive_failures */
	int min_switch; /* min time between rate changes */
	int min_good_pct; /* min good percentage for a rate to be considered */
};
#define ATH_SOFTC_SAMPLE(sc) ((struct sample_softc *)(sc)->sc_rc)

struct rate_stats {
	unsigned average_tx_time;
	int successive_failures;
	uint64_t tries;
	uint64_t total_packets; /* pkts total since assoc */
	uint64_t packets_acked; /* pkts acked since assoc */
	int ewma_pct; /* EWMA percentage */
	unsigned perfect_tx_time; /* transmit time for 0 retries */
	int last_tx;
};
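
/*
 * Illustrative sketch only, not part of the original header: one way an
 * exponentially weighted moving average such as average_tx_time could be
 * updated. It assumes smoothing_rate is the weight, in percent [0..99],
 * given to the old average, with the remainder given to the new sample.
 * The helper name sample_example_ewma() is hypothetical.
 */
static inline unsigned
sample_example_ewma(unsigned old_avg, unsigned sample, int smoothing_rate)
{
	/* Convert the percentage once; assumed to lie in [0..99]. */
	unsigned w = (unsigned)smoothing_rate;

	/* Old average weighted by w%, new sample by (100 - w)%. */
	return ((old_avg * w) + (sample * (100 - w))) / 100;
}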

struct txschedule {
	uint8_t t0, r0; /* series 0: tries, rate code */
	uint8_t t1, r1; /* series 1: tries, rate code */
	uint8_t t2, r2; /* series 2: tries, rate code */
	uint8_t t3, r3; /* series 3: tries, rate code */
};

/*
 * We track performance for seven different packet size buckets.
 */
#define NUM_PACKET_SIZE_BINS 7

static const int packet_size_bins[NUM_PACKET_SIZE_BINS] =
    { 250, 1600, 4096, 8192, 16384, 32768, 65536 };

static inline int
bin_to_size(int index)
{
	return packet_size_bins[index];
}
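
/*
 * Illustrative sketch only, not part of the original header: the reverse
 * lookup, mapping a frame length to a bucket. It assumes each table entry
 * is the inclusive upper bound of its bucket and that oversized frames
 * fall into the last bucket. The helper name and the table-as-parameter
 * signature are hypothetical; the table must be sorted in ascending order.
 */
static inline int
sample_example_size_to_bin(const int *bins, int nbins, int size)
{
	int i;

	/* Pick the first bucket whose upper bound covers the length. */
	for (i = 0; i < nbins - 1; i++) {
		if (size <= bins[i])
			return i;
	}
	/* Anything larger is clamped to the last bucket. */
	return nbins - 1;
}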

/* per-node state */
struct sample_node {
	int static_rix; /* rate index of fixed tx rate */
#define SAMPLE_MAXRATES 64 /* NB: corresponds to hal info[32] */
	uint64_t ratemask; /* bit mask of valid rate indices */
	const struct txschedule *sched; /* tx schedule table */

	const HAL_RATE_TABLE *currates;

	struct rate_stats stats[NUM_PACKET_SIZE_BINS][SAMPLE_MAXRATES];
	int last_sample_rix[NUM_PACKET_SIZE_BINS];

	int current_sample_rix[NUM_PACKET_SIZE_BINS];
	int packets_sent[NUM_PACKET_SIZE_BINS];

	int current_rix[NUM_PACKET_SIZE_BINS];
	int packets_since_switch[NUM_PACKET_SIZE_BINS];
	int ticks_since_switch[NUM_PACKET_SIZE_BINS];

	int packets_since_sample[NUM_PACKET_SIZE_BINS];
	unsigned sample_tt[NUM_PACKET_SIZE_BINS];
};

#ifdef _KERNEL

#define ATH_NODE_SAMPLE(an) ((struct sample_node *)&(an)[1])
#define IS_RATE_DEFINED(sn, rix) (((uint64_t) (sn)->ratemask & (1ULL<<((uint64_t) (rix)))) != 0)

#ifndef MIN
#define MIN(a,b) ((a) < (b) ? (a) : (b))
#endif
#ifndef MAX
#define MAX(a,b) ((a) > (b) ? (a) : (b))
#endif

#define WIFI_CW_MIN 31
#define WIFI_CW_MAX 1023

/*
 * Calculate the transmit duration of a frame.
 */
|
|
|
|
|
static unsigned calc_usecs_unicast_packet(struct ath_softc *sc,
|
2011-02-08 13:31:28 -05:00
|
|
|
int length,
|
2011-02-17 00:16:59 -05:00
|
|
|
int rix, int short_retries,
|
|
|
|
|
int long_retries, int is_ht40)
|
|
|
|
|
{
|
2005-03-10 20:39:57 -05:00
|
|
|
const HAL_RATE_TABLE *rt = sc->sc_currates;
|
Replay r286410. Change KPI of how device drivers that provide wireless
connectivity interact with the net80211 stack.
Historical background: originally wireless devices created an interface,
just like Ethernet devices do. Name of an interface matched the name of
the driver that created. Later, wlan(4) layer was introduced, and the
wlanX interfaces become the actual interface, leaving original ones as
"a parent interface" of wlanX. Kernelwise, the KPI between net80211 layer
and a driver became a mix of methods that pass a pointer to struct ifnet
as identifier and methods that pass pointer to struct ieee80211com. From
user point of view, the parent interface just hangs on in the ifconfig
list, and user can't do anything useful with it.
Now, the struct ifnet goes away. The struct ieee80211com is the only
KPI between a device driver and net80211. Details:
- The struct ieee80211com is embedded into drivers softc.
- Packets are sent via new ic_transmit method, which is very much like
the previous if_transmit.
- Bringing parent up/down is done via new ic_parent method, which notifies
driver about any changes: number of wlan(4) interfaces, number of them
in promisc or allmulti state.
- Device specific ioctls (if any) are received on new ic_ioctl method.
- Packets/errors accounting are done by the stack. In certain cases, when
driver experiences errors and can not attribute them to any specific
interface, driver updates ic_oerrors or ic_ierrors counters.
Details on interface configuration with new world order:
- A sequence of commands needed to bring up wireless DOESN'T change.
- /etc/rc.conf parameters DON'T change.
- List of devices that can be used to create wlan(4) interfaces is
now provided by net.wlan.devices sysctl.
Most drivers in this change were converted by me, except for wpi(4),
which was done by Andriy Voskoboinyk. Big thanks to Kevin Lo for testing
changes to at least 8 drivers. Thanks to pluknet@, Oliver Hartmann,
Olivier Cochard, gjb@, mmoll@, op@ and lev@, who also participated in
testing.
Reviewed by: adrian
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
2015-08-27 04:56:39 -04:00
|
|
|
struct ieee80211com *ic = &sc->sc_ic;
|
2006-02-09 15:40:28 -05:00
|
|
|
int rts, cts;
|
2005-03-10 20:39:57 -05:00
|
|
|
|
|
|
|
|
unsigned t_slot = 20;
|
|
|
|
|
unsigned t_difs = 50;
|
|
|
|
|
unsigned t_sifs = 10;
|
|
|
|
|
int tt = 0;
|
|
|
|
|
int x = 0;
|
|
|
|
|
int cw = WIFI_CW_MIN;
|
2006-12-13 14:34:35 -05:00
|
|
|
int cix;
|
2005-03-10 20:39:57 -05:00
|
|
|
|
2005-03-19 20:27:33 -05:00
|
|
|
KASSERT(rt != NULL, ("no rate table, mode %u", sc->sc_curmode));
|
|
|
|
|
|
2006-12-13 14:34:35 -05:00
|
|
|
if (rix >= rt->rateCount) {
|
|
|
|
|
printf("bogus rix %d, max %u, mode %u\n",
|
|
|
|
|
rix, rt->rateCount, sc->sc_curmode);
|
2006-02-09 15:40:28 -05:00
|
|
|
return 0;
|
|
|
|
|
}
|
2006-12-13 14:34:35 -05:00
|
|
|
cix = rt->info[rix].controlRate;
|
2006-02-09 15:40:28 -05:00
|
|
|
/*
|
|
|
|
|
* XXX getting mac/phy level timings should be fixed for turbo
|
|
|
|
|
* rates, and there is probably a way to get this from the
|
|
|
|
|
* hal...
|
|
|
|
|
*/
|
|
|
|
|
switch (rt->info[rix].phy) {
|
|
|
|
|
case IEEE80211_T_OFDM:
|
|
|
|
|
t_slot = 9;
|
|
|
|
|
t_sifs = 16;
|
|
|
|
|
t_difs = 28;
|
|
|
|
|
/* fall through: the IEEE80211_T_TURBO assignments below overwrite the values set above */
|
|
|
|
|
case IEEE80211_T_TURBO:
|
2005-03-10 20:39:57 -05:00
|
|
|
t_slot = 9;
|
2006-02-09 15:40:28 -05:00
|
|
|
t_sifs = 8;
|
2005-03-10 20:39:57 -05:00
|
|
|
t_difs = 28;
|
2006-02-09 15:40:28 -05:00
|
|
|
break;
|
2011-02-08 13:31:28 -05:00
|
|
|
case IEEE80211_T_HT:
|
|
|
|
|
t_slot = 9;
|
|
|
|
|
t_sifs = 8;
|
|
|
|
|
t_difs = 28;
|
|
|
|
|
break;
|
2006-02-09 15:40:28 -05:00
|
|
|
case IEEE80211_T_DS:
|
|
|
|
|
/* fall through to default */
|
|
|
|
|
default:
|
|
|
|
|
/* pg 205 ieee.802.11.pdf */
|
|
|
|
|
t_slot = 20;
|
|
|
|
|
t_difs = 50;
|
|
|
|
|
t_sifs = 10;
|
2005-03-10 20:39:57 -05:00
|
|
|
}
|
2005-03-19 16:04:53 -05:00
|
|
|
|
2006-02-09 15:40:28 -05:00
|
|
|
rts = cts = 0;
|
|
|
|
|
|
2005-03-19 16:04:53 -05:00
|
|
|
if ((ic->ic_flags & IEEE80211_F_USEPROT) &&
|
|
|
|
|
rt->info[rix].phy == IEEE80211_T_OFDM) {
|
|
|
|
|
if (ic->ic_protmode == IEEE80211_PROT_RTSCTS)
|
|
|
|
|
rts = 1;
|
|
|
|
|
else if (ic->ic_protmode == IEEE80211_PROT_CTSONLY)
|
|
|
|
|
cts = 1;
|
|
|
|
|
|
|
|
|
|
cix = rt->info[sc->sc_protrix].controlRate;
|
|
|
|
|
}
|
|
|
|
|
|
2006-02-09 15:40:28 -05:00
|
|
|
if (0 /*length > ic->ic_rtsthreshold */) {
|
2005-03-19 16:04:53 -05:00
|
|
|
rts = 1;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (rts || cts) {
|
2006-12-13 14:34:35 -05:00
|
|
|
int ctsrate;
|
2005-03-19 16:04:53 -05:00
|
|
|
int ctsduration = 0;
|
2006-02-09 15:40:28 -05:00
|
|
|
|
2006-12-13 14:34:35 -05:00
|
|
|
/* NB: this is intentionally not a runtime check */
|
|
|
|
|
KASSERT(cix < rt->rateCount,
|
|
|
|
|
("bogus cix %d, max %u, mode %u\n", cix, rt->rateCount,
|
|
|
|
|
sc->sc_curmode));
|
2006-02-09 15:40:28 -05:00
|
|
|
|
2006-12-13 14:34:35 -05:00
|
|
|
ctsrate = rt->info[cix].rateCode | rt->info[cix].shortPreamble;
|
2005-03-19 16:04:53 -05:00
|
|
|
if (rts) /* SIFS + CTS */
|
|
|
|
|
ctsduration += rt->info[cix].spAckDuration;
|
|
|
|
|
|
[ath] [ath_hal] break out the duration calculation to optionally include SIFS.
The pre-11n calculations include SIFS, but the 11n ones don't.
The reason is that (mostly) the 11n hardware is doing the SIFS calculation
for us but the pre-11n hardware isn't. This means that we're over-shooting
the times in the duration field for non-11n frames on 11n hardware, which
is OK, if not a little inefficient.
Now, this is all fine for what the hardware needs for doing duration math
for ACK, RTS/CTS, frame length, etc, but it isn't useful for doing PHY
duration calculations. Ie, given a frame to TX and its timestamp, what
would the end of the actual transmission time be; and similar for an
RX timestamp and figuring out its original length.
So, this adds a new field to the duration routines which requests
SIFS or no SIFS to be included. All the callers currently will call
it requesting SIFS, so this /should/ be a glorious no-op. I'm however
planning some future work around airtime fairness and positioning which
requires these routines to have SIFS be optional.
Notably though, the 11n version doesn't do any SIFS addition at the moment.
I'll go and tweak and verify all of the packet durations before I go and
flip that part on.
Tested:
* AR9330, STA mode
* AR9330, AP mode
* AR9380, STA mode
2016-07-15 02:39:35 -04:00
|
|
|
/* XXX assumes short preamble, include SIFS */
|
2011-09-11 05:43:13 -04:00
|
|
|
ctsduration += ath_hal_pkt_txtime(sc->sc_ah, rt, length, rix,
|
2016-07-15 02:39:35 -04:00
|
|
|
is_ht40, 0, 1);
|
2005-03-19 16:04:53 -05:00
|
|
|
|
|
|
|
|
if (cts) /* SIFS + ACK */
|
|
|
|
|
ctsduration += rt->info[cix].spAckDuration;
|
|
|
|
|
|
|
|
|
|
tt += (short_retries + 1) * ctsduration;
|
|
|
|
|
}
|
2005-03-10 20:39:57 -05:00
|
|
|
tt += t_difs;
|
2011-02-08 13:31:28 -05:00
|
|
|
|
2016-07-15 02:39:35 -04:00
|
|
|
/* XXX assumes short preamble, include SIFS */
|
2011-09-11 05:43:13 -04:00
|
|
|
tt += (long_retries+1)*ath_hal_pkt_txtime(sc->sc_ah, rt, length, rix,
|
2016-07-15 02:39:35 -04:00
|
|
|
is_ht40, 0, 1);
|
2011-09-11 05:43:13 -04:00
|
|
|
|
2006-02-09 15:40:28 -05:00
|
|
|
tt += (long_retries+1)*(t_sifs + rt->info[rix].spAckDuration);
|
2011-02-08 13:31:28 -05:00
|
|
|
|
2005-03-19 16:04:53 -05:00
|
|
|
for (x = 0; x <= short_retries + long_retries; x++) {
|
2005-03-10 20:39:57 -05:00
|
|
|
cw = MIN(WIFI_CW_MAX, (cw + 1) * 2);
|
|
|
|
|
tt += (t_slot * cw/2);
|
|
|
|
|
}
|
|
|
|
|
return tt;
|
|
|
|
|
}
|
2012-07-19 20:47:23 -04:00
|
|
|
|
|
|
|
|
#endif /* _KERNEL */
|
|
|
|
|
|
2005-03-10 20:39:57 -05:00
|
|
|
#endif /* _DEV_ATH_RATE_SAMPLE_H */
|
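The backoff term at the end of calc_usecs_unicast_packet() can be looked at in isolation. The following is a minimal userland sketch, not part of the driver: backoff_usecs() is a hypothetical helper name, and it only reproduces the contention-window loop from the function above (same WIFI_CW_MIN, WIFI_CW_MAX and MIN definitions, same integer arithmetic), leaving out the DIFS, frame, ACK and RTS/CTS terms.

```c
#include <assert.h>

#define WIFI_CW_MIN	31
#define WIFI_CW_MAX	1023
#define MIN(a,b)	((a) < (b) ? (a) : (b))

/*
 * Hypothetical helper (not in the driver): accumulate the backoff
 * estimate exactly as the loop in calc_usecs_unicast_packet() does.
 * One pass is made per transmission attempt, i.e. the first attempt
 * plus every short and long retry.
 */
static unsigned
backoff_usecs(unsigned t_slot, int short_retries, int long_retries)
{
	unsigned tt = 0;
	int cw = WIFI_CW_MIN;
	int x;

	assert(short_retries >= 0 && long_retries >= 0);

	for (x = 0; x <= short_retries + long_retries; x++) {
		/* Grow the window before use, capped at WIFI_CW_MAX. */
		cw = MIN(WIFI_CW_MAX, (cw + 1) * 2);
		/* Charge half the window, in slots, as the expected wait. */
		tt += (t_slot * cw / 2);
	}
	return tt;
}
```

With t_slot = 9 and no retries the loop runs once with cw = 64, so the helper returns 9 * 64 / 2 = 288 microseconds. Note that the window is grown before its first use, so the sequence of windows is 64, 130, 262, 526, 1023 rather than starting at WIFI_CW_MIN itself.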