mirror of
https://github.com/redis/redis.git
synced 2026-02-11 14:53:19 -05:00
This PR is based on: https://github.com/redis/redis/pull/12109 https://github.com/valkey-io/valkey/pull/60 Closes: https://github.com/redis/redis/issues/11678 **Motivation** During a full sync, when master is delivering RDB to the replica, incoming write commands are kept in a replication buffer in order to be sent to the replica once RDB delivery is completed. If RDB delivery takes a long time, it might create memory pressure on master. Also, once a replica connection accumulates replication data which is larger than output buffer limits, master will kill replica connection. This may cause a replication failure. The main benefit of the rdb channel replication is streaming incoming commands in parallel to the RDB delivery. This approach shifts replication stream buffering to the replica and reduces load on master. We do this by opening another connection for RDB delivery. The main channel on replica will be receiving replication stream while rdb channel is receiving the RDB. This feature also helps to reduce master's main process CPU load. By opening a dedicated connection for the RDB transfer, the bgsave process has access to the new connection and it will stream RDB directly to the replicas. Before this change, due to TLS connection restriction, the bgsave process was writing RDB bytes to a pipe and the main process was forwarding it to the replica. This is no longer necessary, the main process can avoid these expensive socket read/write syscalls. It also means RDB delivery to replica will be faster as it avoids this step. In summary, replication will be faster and master's performance during full syncs will improve. **Implementation steps** 1. When replica connects to the master, it sends 'rdb-channel-repl' as part of capability exchange to let master to know replica supports rdb channel. 2. When replica lacks sufficient data for PSYNC, master sends +RDBCHANNELSYNC reply with replica's client id. As the next step, the replica opens a new connection (rdb-channel) and configures it against the master with the appropriate capabilities and requirements. It also sends given client id back to master over rdbchannel, so that master can associate these channels. (initial replica connection will be referred as main-channel) Then, replica requests fullsync using the RDB channel. 3. Prior to forking, master attaches the replica's main channel to the replication backlog to deliver replication stream starting at the snapshot end offset. 4. The master main process sends replication stream via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. Replica accumulates replication stream in a local buffer, while the RDB is being loaded into the memory. 5. Once the replica completes loading the rdb, it drops the rdb channel and streams the accumulated replication stream into the db. Sync is completed. **Some details** - Currently, rdbchannel replication is supported only if `repl-diskless-sync` is enabled on master. Otherwise, replication will happen over a single connection as in before. - On replica, there is a limit to replication stream buffering. Replica uses a new config `replica-full-sync-buffer-limit` to limit number of bytes to accumulate. If it is not set, replica inherits `client-output-buffer-limit <replica>` hard limit config. If we reach this limit, replica stops accumulating. This is not a failure scenario though. Further accumulation will happen on master side. Depending on the configured limits on master, master may kill the replica connection. **API changes in INFO output:** 1. New replica state: `send_bulk_and_stream`. Indicates full sync is still in progress for this replica. It is receiving replication stream and rdb in parallel. ``` slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0 ``` Replica state changes in steps: - First, replica sends psync and receives +RDBCHANNELSYNC :`state=wait_bgsave` - After replica connects with rdbchannel and delivery starts: `state=send_bulk_and_stream` - After full sync: `state=online` 2. On replica side, replication stream buffering metrics: - replica_full_sync_buffer_size: Currently accumulated replication stream data in bytes. - replica_full_sync_buffer_peak: Peak number of bytes that this instance accumulated in the lifetime of the process. ``` replica_full_sync_buffer_size:20485 replica_full_sync_buffer_peak:1048560 ``` **API changes in CLIENT LIST** In `client list` output, rdbchannel clients will have 'C' flag in addition to 'S' replica flag: ``` id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 ``` **Config changes:** - `replica-full-sync-buffer-limit`: Controls how much replication data replica can accumulate during rdbchannel replication. If it is not set, a value of 0 means replica will inherit `client-output-buffer-limit <replica>` hard limit config to limit accumulated data. - `repl-rdb-channel` config is added as a hidden config. This is mostly for testing as we need to support both rdbchannel replication and the older single connection replication (to keep compatibility with older versions and rdbchannel replication will not be enabled if repl-diskless-sync is not enabled). it affects both the master (not to respond to rdb channel requests), and the replica (not to declare capability) **Internal API changes:** Changes that were introduced to Redis replication: - New replication capability is added to replconf command: `capa rdb-channel-repl`. Indicates replica is capable of rdb channel replication. Replica sends it when it connects to master along with other capabilities. - If replica needs fullsync, master replies `+RDBCHANNELSYNC <client-id>` to the replica's PSYNC request. - When replica opens rdbchannel connection, as part of replconf command, it sends `rdb-channel 1` to let master know this is rdb channel. Also, it sends `main-ch-client-id <client-id>` as part of replconf command so master can associate channels. **Testing:** As rdbchannel replication is enabled by default, we run whole test suite with it. Though, as we need to support both rdbchannel and single connection replication, we'll be running some tests twice with `repl-rdb-channel yes/no` config. **Replica state diagram** ``` * * Replica state machine * * * Main channel state * ┌───────────────────┐ * │RECEIVE_PING_REPLY │ * └────────┬──────────┘ * │ +PONG * ┌────────▼──────────┐ * │SEND_HANDSHAKE │ RDB channel state * └────────┬──────────┘ ┌───────────────────────────────┐ * │+OK ┌───► RDB_CH_SEND_HANDSHAKE │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_AUTH_REPLY │ │ REPLCONF main-ch-client-id <clientid> * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_AUTH_REPLY │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_PORT_REPLY │ │ │ +OK * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_REPLCONF_REPLY│ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_IP_REPLY │ │ │ +OK * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_FULLRESYNC │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_CAPA_REPLY │ │ │+FULLRESYNC * └────────┬──────────┘ │ │Rdb delivery * │ │ ┌──────────────▼────────────────┐ * ┌────────▼──────────┐ │ │ RDB_CH_RDB_LOADING │ * │SEND_PSYNC │ │ └──────────────┬────────────────┘ * └─┬─────────────────┘ │ │ Done loading * │PSYNC (use cached-master) │ │ * ┌─▼─────────────────┐ │ │ * │RECEIVE_PSYNC_REPLY│ │ ┌────────────►│ Replica streams replication * └─┬─────────────────┘ │ │ │ buffer into memory * │ │ │ │ * │+RDBCHANNELSYNC client-id │ │ │ * ├──────┬───────────────────┘ │ │ * │ │ Main channel │ │ * │ │ accumulates repl data │ │ * │ ┌──▼────────────────┐ │ ┌───────▼───────────┐ * │ │ REPL_TRANSFER ├───────┘ │ CONNECTED │ * │ └───────────────────┘ └────▲───▲──────────┘ * │ │ │ * │ │ │ * │ +FULLRESYNC ┌───────────────────┐ │ │ * ├────────────────► REPL_TRANSFER ├────┘ │ * │ └───────────────────┘ │ * │ +CONTINUE │ * └──────────────────────────────────────────────┘ */ ``` ----- This PR also contains changes and ideas from: https://github.com/valkey-io/valkey/pull/837 https://github.com/valkey-io/valkey/pull/1173 https://github.com/valkey-io/valkey/pull/804 https://github.com/valkey-io/valkey/pull/945 https://github.com/valkey-io/valkey/pull/989 --------- Co-authored-by: Yuan Wang <wangyuancode@163.com> Co-authored-by: debing.sun <debing.sun@redis.com> Co-authored-by: Moti Cohen <moticless@gmail.com> Co-authored-by: naglera <anagler123@gmail.com> Co-authored-by: Amit Nagler <58042354+naglera@users.noreply.github.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: Ping Xie <pingxie@outlook.com> Co-authored-by: Ran Shidlansik <ranshid@amazon.com> Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com> Co-authored-by: xbasel <103044017+xbasel@users.noreply.github.com>
165 lines
6.5 KiB
Tcl
165 lines
6.5 KiB
Tcl
#
|
|
# Copyright (c) 2009-Present, Redis Ltd.
|
|
# All rights reserved.
|
|
#
|
|
# Copyright (c) 2024-present, Valkey contributors.
|
|
# All rights reserved.
|
|
#
|
|
# Licensed under your choice of the Redis Source Available License 2.0
|
|
# (RSALv2) or the Server Side Public License v1 (SSPLv1).
|
|
#
|
|
# Portions of this file are available under BSD3 terms; see REDISCONTRIBUTIONS for more information.
|
|
#
|
|
|
|
# Creates a master-slave pair and breaks the link continuously to force
|
|
# partial resyncs attempts, all this while flooding the master with
|
|
# write queries.
|
|
#
|
|
# You can specify backlog size, ttl, delay before reconnection, test duration
|
|
# in seconds, and an additional condition to verify at the end.
|
|
#
|
|
# If reconnect is > 0, the test actually try to break the connection and
|
|
# reconnect with the master, otherwise just the initial synchronization is
|
|
# checked for consistency.
|
|
proc test_psync {descr duration backlog_size backlog_ttl delay cond mdl sdl reconnect rdbchannel} {
|
|
start_server {tags {"repl"} overrides {save {}}} {
|
|
start_server {overrides {save {}}} {
|
|
|
|
set master [srv -1 client]
|
|
set master_host [srv -1 host]
|
|
set master_port [srv -1 port]
|
|
set slave [srv 0 client]
|
|
|
|
$master config set repl-backlog-size $backlog_size
|
|
$master config set repl-backlog-ttl $backlog_ttl
|
|
$master config set repl-diskless-sync $mdl
|
|
$master config set repl-diskless-sync-delay 1
|
|
$master config set repl-rdb-channel $rdbchannel
|
|
$slave config set repl-diskless-load $sdl
|
|
$slave config set repl-rdb-channel $rdbchannel
|
|
|
|
set load_handle0 [start_bg_complex_data $master_host $master_port 9 100000]
|
|
set load_handle1 [start_bg_complex_data $master_host $master_port 11 100000]
|
|
set load_handle2 [start_bg_complex_data $master_host $master_port 12 100000]
|
|
|
|
test {Slave should be able to synchronize with the master} {
|
|
$slave slaveof $master_host $master_port
|
|
wait_for_condition 50 100 {
|
|
[lindex [r role] 0] eq {slave} &&
|
|
[lindex [r role] 3] eq {connected}
|
|
} else {
|
|
fail "Replication not started."
|
|
}
|
|
}
|
|
|
|
# Check that the background clients are actually writing.
|
|
test {Detect write load to master} {
|
|
wait_for_condition 50 1000 {
|
|
[$master dbsize] > 100
|
|
} else {
|
|
fail "Can't detect write load from background clients."
|
|
}
|
|
}
|
|
|
|
test "Test replication partial resync: $descr (diskless: $mdl, $sdl, reconnect: $reconnect, rdbchannel: $rdbchannel)" {
|
|
# Now while the clients are writing data, break the maste-slave
|
|
# link multiple times.
|
|
if ($reconnect) {
|
|
for {set j 0} {$j < $duration*10} {incr j} {
|
|
after 100
|
|
# catch {puts "MASTER [$master dbsize] keys, REPLICA [$slave dbsize] keys"}
|
|
|
|
if {($j % 20) == 0} {
|
|
catch {
|
|
if {$delay} {
|
|
$slave multi
|
|
$slave client kill $master_host:$master_port
|
|
$slave debug sleep $delay
|
|
$slave exec
|
|
} else {
|
|
$slave client kill $master_host:$master_port
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
stop_bg_complex_data $load_handle0
|
|
stop_bg_complex_data $load_handle1
|
|
stop_bg_complex_data $load_handle2
|
|
|
|
# Wait for the slave to reach the "online"
|
|
# state from the POV of the master.
|
|
set retry 5000
|
|
while {$retry} {
|
|
set info [$master info]
|
|
if {[string match {*slave0:*state=online*} $info]} {
|
|
break
|
|
} else {
|
|
incr retry -1
|
|
after 100
|
|
}
|
|
}
|
|
if {$retry == 0} {
|
|
error "assertion:Slave not correctly synchronized"
|
|
}
|
|
|
|
# Wait that slave acknowledge it is online so
|
|
# we are sure that DBSIZE and DEBUG DIGEST will not
|
|
# fail because of timing issues. (-LOADING error)
|
|
wait_for_condition 5000 100 {
|
|
[lindex [$slave role] 3] eq {connected}
|
|
} else {
|
|
fail "Slave still not connected after some time"
|
|
}
|
|
|
|
wait_for_condition 100 100 {
|
|
[$master debug digest] == [$slave debug digest]
|
|
} else {
|
|
set csv1 [csvdump r]
|
|
set csv2 [csvdump {r -1}]
|
|
set fd [open /tmp/repldump1.txt w]
|
|
puts -nonewline $fd $csv1
|
|
close $fd
|
|
set fd [open /tmp/repldump2.txt w]
|
|
puts -nonewline $fd $csv2
|
|
close $fd
|
|
fail "Master - Replica inconsistency, Run diff -u against /tmp/repldump*.txt for more info"
|
|
}
|
|
assert {[$master dbsize] > 0}
|
|
eval $cond
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
tags {"external:skip"} {
|
|
foreach mdl {no yes} {
|
|
foreach sdl {disabled swapdb} {
|
|
foreach rdbchannel {yes no} {
|
|
if {$rdbchannel == "yes" && $mdl == "no"} {
|
|
# rdbchannel replication requires repl-diskless-sync enabled
|
|
continue
|
|
}
|
|
|
|
test_psync {no reconnection, just sync} 6 1000000 3600 0 {
|
|
} $mdl $sdl 0 $rdbchannel
|
|
|
|
test_psync {ok psync} 6 100000000 3600 0 {
|
|
assert {[s -1 sync_partial_ok] > 0}
|
|
} $mdl $sdl 1 $rdbchannel
|
|
|
|
test_psync {no backlog} 6 100 3600 0.5 {
|
|
assert {[s -1 sync_partial_err] > 0}
|
|
} $mdl $sdl 1 $rdbchannel
|
|
|
|
test_psync {ok after delay} 3 100000000 3600 3 {
|
|
assert {[s -1 sync_partial_ok] > 0}
|
|
} $mdl $sdl 1 $rdbchannel
|
|
|
|
test_psync {backlog expired} 3 100000000 1 3 {
|
|
assert {[s -1 sync_partial_err] > 0}
|
|
} $mdl $sdl 1 $rdbchannel
|
|
}
|
|
}
|
|
}
|
|
}
|