tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited

[ Upstream commit f4ce91ce12a7c6ead19b128ffa8cff6e3ded2a14 ] This commit fixes a bug in the tracking of max_packets_out and is_cwnd_limited. This bug can cause the connection to fail to remember that is_cwnd_limited is true, causing the connection to fail to grow cwnd when it should, causing throughput to be lower than it should be. The following event sequence is an example that triggers the bug: (a) The connection is cwnd_limited, but packets_out is not at its peak due to TSO deferral deciding not to send another skb yet. In such cases the connection can advance max_packets_seq and set tp->is_cwnd_limited to true and max_packets_out to a small number. (b) Then later in the round trip the connection is pacing-limited (not cwnd-limited), and packets_out is larger. In such cases the connection would raise max_packets_out to a bigger number but (unexpectedly) flip tp->is_cwnd_limited from true to false. This commit fixes that bug. One straightforward fix would be to separately track (a) the next window after max_packets_out reaches a maximum, and (b) the next window after tp->is_cwnd_limited is set to true. But this would require consuming an extra u32 sequence number. Instead, to save space we track only the most important information. Specifically, we track the strongest available signal of the degree to which the cwnd is fully utilized: (1) If the connection is cwnd-limited then we remember that fact for the current window. (2) If the connection not cwnd-limited then we track the maximum number of outstanding packets in the current window. In particular, note that the new logic cannot trigger the buggy (a)/(b) sequence above because with the new logic a condition where tp->packets_out > tp->max_packets_out can only trigger an update of tp->is_cwnd_limited if tp->is_cwnd_limited is false. This first showed up in a testing of a BBRv2 dev branch, but this buggy behavior highlighted a general issue with the tcp_cwnd_validate() logic that can cause cwnd to fail to increase at the proper rate for any TCP congestion control, including Reno or CUBIC. Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Kevin(Yudong) Yang <yyd@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
author: Neal Cardwell <ncardwell@google.com> 2022-09-28 16:03:31 -0400
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2022-10-26 12:34:48 +0200
commit: 49d429760df7fd662dd24d54c7325d5b2fd78697 (patch)
tree: 23ffd357a1fd6ca70832b9ae050a1646806c377d /include/net
parent: 19d636b663e0e92951bba5fced929ca7fd25c552 (diff)
1 files changed, 4 insertions, 1 deletions
diff --git a/include/net/tcp.h b/include/net/tcp.h
index d3646645cb9e..d2de3b7788a9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1281,11 +1281,14 @@ static inline bool tcp_is_cwnd_limited(const struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
 
+	if (tp->is_cwnd_limited)
+		return true;
+
 	/* If in slow start, ensure cwnd grows to twice what was ACKed. */
 	if (tcp_in_slow_start(tp))
 		return tcp_snd_cwnd(tp) < 2 * tp->max_packets_out;
 
-	return tp->is_cwnd_limited;
+	return false;
 }
 
 /* BBR congestion control needs pacing.
author	Neal Cardwell <ncardwell@google.com>	2022-09-28 16:03:31 -0400
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-10-26 12:34:48 +0200
commit	49d429760df7fd662dd24d54c7325d5b2fd78697 (patch)
tree	23ffd357a1fd6ca70832b9ae050a1646806c377d /include/net
parent	19d636b663e0e92951bba5fced929ca7fd25c552 (diff)