Class: Gitlab::Database::LoadBalancing::Host
- Inherits:
- Object
- Inheritance chain: Object → Gitlab::Database::LoadBalancing::Host
- Defined in:
- lib/gitlab/database/load_balancing/host.rb
Overview
A single database host used for load balancing.
Constant Summary
- CONNECTION_ERRORS =
[ ActionView::Template::Error, ActiveRecord::StatementInvalid, ActiveRecord::ConnectionNotEstablished, PG::Error ].freeze
- CAN_TRACK_LOGICAL_LSN_QUERY =
This query checks that the current user has the required permissions before we try to query logical replication status. We also require PostgreSQL 14 or later, because before PG14 these views are accessible only to superusers, even if has_table_privilege says otherwise.
<<~SQL.squish.freeze
  SELECT has_table_privilege('pg_replication_origin_status', 'select')
    AND has_function_privilege('pg_show_replication_origin_status()', 'execute')
    AND current_setting('server_version_num', true)::int >= 140000 AS allowed
SQL
- LATEST_LSN_WITH_LOGICAL_QUERY =
The following is necessary to handle a mix of logical and physical replicas. We assume that a host with a row in pg_replication_origin_status is a logical replica. In a logical replica we need to use `remote_lsn` rather than `pg_last_wal_replay_lsn` so that our LSN is comparable to the source cluster. This logic would break if we had two logical subscriptions, or a logical subscription in the source primary cluster. Read more at gitlab.com/gitlab-org/gitlab/-/merge_requests/121621
<<~SQL.squish.freeze
  CASE
    WHEN (SELECT TRUE FROM pg_replication_origin_status)
      THEN (SELECT remote_lsn FROM pg_replication_origin_status)
    WHEN pg_is_in_recovery()
      THEN pg_last_wal_replay_lsn()
    ELSE
      pg_current_wal_insert_lsn()
  END
SQL
- LATEST_LSN_WITHOUT_LOGICAL_QUERY =
<<~SQL.squish.freeze
  CASE
    WHEN pg_is_in_recovery()
      THEN pg_last_wal_replay_lsn()
    ELSE
      pg_current_wal_insert_lsn()
  END
SQL
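The decision logic encoded in these SQL constants can be mirrored in plain Ruby to make the branches easier to follow. This is only an illustrative sketch: the method names and keyword arguments are hypothetical, and `logical_replica` stands in for "a row exists in pg_replication_origin_status".

```ruby
# Mirrors CAN_TRACK_LOGICAL_LSN_QUERY: all three conditions must hold.
# server_version_num is PostgreSQL's integer version (140000 is 14.0).
def can_track_logical_lsn?(table_privilege:, function_privilege:, server_version_num:)
  table_privilege && function_privilege && server_version_num >= 140_000
end

# Mirrors the CASE expression in LATEST_LSN_WITH_LOGICAL_QUERY and
# returns the name of the LSN source that would be used.
def latest_lsn_source(logical_replica:, in_recovery:)
  if logical_replica
    :remote_lsn
  elsif in_recovery
    :pg_last_wal_replay_lsn
  else
    :pg_current_wal_insert_lsn
  end
end
```

With `logical_replica: false`, the second method follows the same two branches as LATEST_LSN_WITHOUT_LOGICAL_QUERY.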
Instance Attribute Summary
-
#host ⇒ Object
readonly
Returns the value of attribute host.
-
#intervals ⇒ Object
readonly
Returns the value of attribute intervals.
-
#last_checked_at ⇒ Object
readonly
Returns the value of attribute last_checked_at.
-
#load_balancer ⇒ Object
readonly
Returns the value of attribute load_balancer.
-
#pool ⇒ Object
readonly
Returns the value of attribute pool.
-
#port ⇒ Object
readonly
Returns the value of attribute port.
Instance Method Summary
-
#caught_up?(location) ⇒ Boolean
Returns true if this host has caught up to the given transaction write location.
- #check_replica_status? ⇒ Boolean
-
#data_is_recent_enough? ⇒ Boolean
Returns true if the replica has replicated enough data to be useful.
- #database_replica_location ⇒ Object
-
#disconnect!(timeout: 120) ⇒ Object
Disconnects the pool, once all connections are no longer in use.
- #force_disconnect! ⇒ Object
-
#initialize(host, load_balancer, port: nil) ⇒ Host
constructor
host - The address of the database.
- #offline! ⇒ Object
-
#online? ⇒ Boolean
Returns true if the host is online.
- #primary_write_location ⇒ Object
- #query_and_release(sql) ⇒ Object
- #refresh_status ⇒ Object
- #replica_is_up_to_date? ⇒ Boolean
- #replication_lag_below_threshold? ⇒ Boolean
-
#replication_lag_size(location = primary_write_location) ⇒ Object
Returns the number of bytes this secondary is lagging behind the primary.
-
#replication_lag_time ⇒ Object
Returns the replication lag time of this secondary in seconds as a float.
-
#try_disconnect ⇒ Object
Attempt to disconnect the pool if all connections are no longer in use.
Constructor Details
#initialize(host, load_balancer, port: nil) ⇒ Host
host - The address of the database.
load_balancer - The LoadBalancer that manages this Host.
# File 'lib/gitlab/database/load_balancing/host.rb', line 58
def initialize(host, load_balancer, port: nil)
  @host = host
  @port = port
  @load_balancer = load_balancer
  @pool = load_balancer.create_replica_connection_pool(
    load_balancer.configuration.pool_size,
    host,
    port
  )
  @online = true
  @last_checked_at = Time.zone.now

  # Randomly somewhere in between interval and 2*interval we'll refresh the status of the host
  interval = load_balancer.configuration.replica_check_interval
  @intervals = (interval..(interval * 2)).step(0.5).to_a
end
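To make the refresh-interval comment concrete, the following sketch builds the same kind of array in plain Ruby. The method name `build_intervals` is hypothetical, and 60 is an assumed example value for `replica_check_interval`, not necessarily the configured one.

```ruby
# Builds the list of candidate delays, from interval to 2 * interval,
# in half-second steps. A float step yields float elements.
def build_intervals(interval)
  (interval..(interval * 2)).step(0.5).to_a
end

# Assumed example value; the real one comes from
# load_balancer.configuration.replica_check_interval.
intervals = build_intervals(60)

puts intervals.first(3).inspect
puts intervals.last
puts intervals.size
```

Because each status check samples one element from this array, hosts created at the same moment are less likely to refresh their status in lockstep.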
Instance Attribute Details
#host ⇒ Object (readonly)
Returns the value of attribute host.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8
def host
  @host
end
#intervals ⇒ Object (readonly)
Returns the value of attribute intervals.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8
def intervals
  @intervals
end
#last_checked_at ⇒ Object (readonly)
Returns the value of attribute last_checked_at.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8
def last_checked_at
  @last_checked_at
end
#load_balancer ⇒ Object (readonly)
Returns the value of attribute load_balancer.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8
def load_balancer
  @load_balancer
end
#pool ⇒ Object (readonly)
Returns the value of attribute pool.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8
def pool
  @pool
end
#port ⇒ Object (readonly)
Returns the value of attribute port.
# File 'lib/gitlab/database/load_balancing/host.rb', line 8
def port
  @port
end
Instance Method Details
#caught_up?(location) ⇒ Boolean
Returns true if this host has caught up to the given transaction write location.
location - The transaction write location as reported by a primary.
# File 'lib/gitlab/database/load_balancing/host.rb', line 232
def caught_up?(location)
  lag = replication_lag_size(location)
  lag.present? && lag.to_i <= 0
end
#check_replica_status? ⇒ Boolean
# File 'lib/gitlab/database/load_balancing/host.rb', line 155
def check_replica_status?
  (Time.zone.now - last_checked_at) >= intervals.sample
end
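The check above compares the elapsed time against one randomly sampled interval. A minimal standalone sketch of that comparison follows. It uses plain `Time` instead of `Time.zone`, and the method name and the injectable `rng` parameter are additions for illustration, not part of the class.

```ruby
# Returns true when at least one randomly chosen interval (in seconds)
# has elapsed since the last check. Subtracting two Time objects gives
# the difference in seconds as a Float.
def check_due?(now:, last_checked_at:, intervals:, rng: Random.new)
  (now - last_checked_at) >= intervals.sample(random: rng)
end

intervals = (60..120).step(0.5).to_a # assumed example intervals
last = Time.at(0)

# Below the smallest interval: never due, whatever value is sampled.
puts check_due?(now: last + 59, last_checked_at: last, intervals: intervals)

# At the largest interval: always due, whatever value is sampled.
puts check_due?(now: last + 120, last_checked_at: last, intervals: intervals)
```

Between those two bounds the outcome depends on the sampled value, which is what spreads the checks out over time.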
#data_is_recent_enough? ⇒ Boolean
Returns true if the replica has replicated enough data to be useful.
# File 'lib/gitlab/database/load_balancing/host.rb', line 172
def data_is_recent_enough?
  # It's possible for a replica to not replay WAL data for a while,
  # despite being up to date. This can happen when a primary does not
  # receive any writes for a while.
  #
  # To prevent this from happening we check if the lag size (in bytes)
  # of the replica is small enough for the replica to be useful. We
  # only do this if we haven't replicated in a while so we only need
  # to connect to the primary when truly necessary.
  if (lag_size = replication_lag_size)
    lag_size <= load_balancer.configuration.max_replication_difference
  else
    false
  end
end
#database_replica_location ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 218
def database_replica_location
  row = query_and_release(<<-SQL.squish)
    SELECT pg_last_wal_replay_lsn()::text AS location
  SQL

  row['location'] if row.any?
rescue *CONNECTION_ERRORS
  nil
end
#disconnect!(timeout: 120) ⇒ Object
Disconnects the pool, once all connections are no longer in use.
timeout - The time, in seconds, after which the pool should be forcefully disconnected.
# File 'lib/gitlab/database/load_balancing/host.rb', line 79
def disconnect!(timeout: 120)
  start_time = ::Gitlab::Metrics::System.monotonic_time

  while (::Gitlab::Metrics::System.monotonic_time - start_time) <= timeout
    return if try_disconnect

    sleep(2)
  end

  force_disconnect!
end
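The poll-then-force pattern used here can be tried outside Rails with a stand-in pool. The sketch below is an approximation under stated assumptions: `FakePool`, `monotonic_time`, and `disconnect_with_timeout` are hypothetical names, the return symbols are added for illustration, and the poll interval is a parameter so the example runs quickly.

```ruby
# Hypothetical stand-in for a connection pool: it reports busy
# connections for a fixed number of polls, then becomes idle.
class FakePool
  attr_reader :disconnected

  def initialize(busy_polls)
    @busy_polls = busy_polls
    @disconnected = false
  end

  def idle?
    return true if @busy_polls <= 0

    @busy_polls -= 1
    false
  end

  def disconnect!
    @disconnected = true
  end
end

# Monotonic clock reading in seconds, unaffected by wall-clock changes.
def monotonic_time
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Polls until the pool is idle and then disconnects it. Once `timeout`
# seconds have passed, the pool is disconnected regardless.
def disconnect_with_timeout(pool, timeout:, poll_interval:)
  start_time = monotonic_time

  while (monotonic_time - start_time) <= timeout
    if pool.idle?
      pool.disconnect!
      return :graceful
    end

    sleep(poll_interval)
  end

  pool.disconnect!
  :forced
end

puts disconnect_with_timeout(FakePool.new(2), timeout: 1, poll_interval: 0.01)
```

A monotonic clock is the safer choice for measuring a timeout, since a wall-clock adjustment during the loop cannot shorten or extend the wait.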
#force_disconnect! ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 102
def force_disconnect!
  pool.disconnect!
end
#offline! ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 106
def offline!
  ::Gitlab::Database::LoadBalancing::Logger.warn(
    event: :host_offline,
    message: 'Marking host as offline',
    db_host: @host,
    db_port: @port
  )

  @online = false
  @pool.disconnect!
end
#online? ⇒ Boolean
Returns true if the host is online.
# File 'lib/gitlab/database/load_balancing/host.rb', line 119
def online?
  return @online unless check_replica_status?

  was_online = @online
  refresh_status

  # Log that the host came back online if it was previously offline
  if @online && !was_online
    ::Gitlab::Database::LoadBalancing::Logger.info(
      event: :host_online,
      message: 'Host is online after replica status check',
      db_host: @host,
      db_port: @port
    )
  # Always log if the host goes offline
  elsif !@online
    ::Gitlab::Database::LoadBalancing::Logger.warn(
      event: :host_offline,
      message: 'Host is offline after replica status check',
      db_host: @host,
      db_port: @port
    )
  end

  @online
rescue *CONNECTION_ERRORS
  offline!
  false
end
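The logging rule in `online?` depends on both the previous and the refreshed status. As a rough illustration, the branches can be reduced to a small pure function; `status_log_event` is a hypothetical name and the returned symbols simply reuse the event names from the source.

```ruby
# Returns the log event to emit after a status refresh, or nil when
# nothing needs to be logged.
def status_log_event(was_online:, online:)
  if online && !was_online
    :host_online  # the host recovered; logged at info level
  elsif !online
    :host_offline # logged at warn level after every failed check
  end
end

puts status_log_event(was_online: false, online: true).inspect
puts status_log_event(was_online: true, online: true).inspect
puts status_log_event(was_online: true, online: false).inspect
```

A host that stays online produces no log entry, while a host that stays offline is reported after each status check.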
#primary_write_location ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 214
def primary_write_location
  load_balancer.primary_write_location
end
#query_and_release(sql) ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 237
def query_and_release(sql)
  connection.select_all(sql).first || {}
rescue StandardError
  {}
ensure
  release_connection
end
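The `rescue`/`ensure` combination means the connection is released on every path and callers always receive a hash. The sketch below demonstrates this with a stand-in connection; `FakeConnection` and `first_row_and_release` are hypothetical names used only for this example.

```ruby
# Hypothetical connection that records whether it was released.
class FakeConnection
  attr_reader :released

  def initialize(rows: [], error: nil)
    @rows = rows
    @error = error
    @released = false
  end

  def select_all(_sql)
    raise @error if @error

    @rows
  end

  def release
    @released = true
  end
end

# Returns the first row, or an empty hash when there are no rows or the
# query raises. The ensure clause releases the connection either way.
def first_row_and_release(connection, sql)
  connection.select_all(sql).first || {}
rescue StandardError
  {}
ensure
  connection.release
end

failing = FakeConnection.new(error: RuntimeError.new('connection lost'))
puts first_row_and_release(failing, 'SELECT 1').inspect
puts failing.released
```

This is why the callers in this class guard with `row.any?`: an empty hash signals that no value could be read.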
#refresh_status ⇒ Object
# File 'lib/gitlab/database/load_balancing/host.rb', line 149
def refresh_status
  @latest_lsn_query = nil # Periodically clear the cached @latest_lsn_query value in case permissions change
  @online = replica_is_up_to_date?
  @last_checked_at = Time.zone.now
end
#replica_is_up_to_date? ⇒ Boolean
# File 'lib/gitlab/database/load_balancing/host.rb', line 159
def replica_is_up_to_date?
  replication_lag_below_threshold? || data_is_recent_enough?
end
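Because `||` short-circuits, the byte-lag check (which needs the primary's write location) runs only when the time-lag check fails. The following sketch illustrates that ordering with plain values. The method name, the `lag_size_fetcher` callable, and the default thresholds are assumptions for illustration, not the configured values.

```ruby
# lag_time is in seconds and may be nil when it could not be measured.
# lag_size_fetcher is a callable returning the lag in bytes (or nil);
# it is invoked only when the time-based check does not pass.
def up_to_date?(lag_time:, lag_size_fetcher:, max_lag_time: 60.0, max_lag_bytes: 8 * 1024 * 1024)
  return true if lag_time && lag_time <= max_lag_time

  lag_size = lag_size_fetcher.call
  !lag_size.nil? && lag_size <= max_lag_bytes
end

# Time lag is small: the fetcher is never called.
puts up_to_date?(lag_time: 1.0, lag_size_fetcher: -> { raise 'not reached' })

# Time lag is large but no bytes are missing, for example after a quiet
# period on the primary.
puts up_to_date?(lag_time: 600.0, lag_size_fetcher: -> { 0 })
```

A nil measurement never counts as passing, which matches the `else false` branches in the two threshold methods.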
#replication_lag_below_threshold? ⇒ Boolean
# File 'lib/gitlab/database/load_balancing/host.rb', line 163
def replication_lag_below_threshold?
  if (lag_time = replication_lag_time)
    lag_time <= load_balancer.configuration.max_replication_lag_time
  else
    false
  end
end
#replication_lag_size(location = primary_write_location) ⇒ Object
Returns the number of bytes this secondary is lagging behind the primary.
This method will return nil if no lag size could be calculated.
# File 'lib/gitlab/database/load_balancing/host.rb', line 202
def replication_lag_size(location = primary_write_location)
  location = connection.quote(location)

  row = query_and_release(<<-SQL.squish)
    SELECT pg_wal_lsn_diff(#{location}, (#{latest_lsn_query}))::float AS diff
  SQL

  row['diff'].to_i if row.any?
rescue *CONNECTION_ERRORS
  nil
end
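The byte difference that `pg_wal_lsn_diff` returns can be approximated in plain Ruby, which may help when reasoning about `replication_lag_size` and `caught_up?`. This sketch assumes the usual textual LSN form of two hexadecimal numbers separated by a slash; the helper names are hypothetical, and the real computation happens in PostgreSQL.

```ruby
# Converts a textual LSN such as '16/B374D848' into an integer. The part
# before the slash holds the high 32 bits, the part after it the low 32.
def lsn_to_i(lsn)
  high, low = lsn.split('/')
  (high.to_i(16) << 32) | low.to_i(16)
end

# Number of bytes by which the replica trails the given primary location.
def lag_bytes(primary_lsn, replica_lsn)
  lsn_to_i(primary_lsn) - lsn_to_i(replica_lsn)
end

# Zero or negative lag means the replica has reached or passed the location.
def caught_up?(primary_lsn, replica_lsn)
  lag_bytes(primary_lsn, replica_lsn) <= 0
end

puts lag_bytes('0/20', '0/10')
puts caught_up?('0/10', '0/10')
```

The real `caught_up?` additionally treats a missing lag value as "not caught up", which this sketch does not model.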
#replication_lag_time ⇒ Object
Returns the replication lag time of this secondary in seconds as a float.
This method will return nil if no lag time could be calculated.
# File 'lib/gitlab/database/load_balancing/host.rb', line 192
def replication_lag_time
  row = query_and_release('SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::float as lag')

  row['lag'].to_f if row.any?
end
#try_disconnect ⇒ Object
Attempt to disconnect the pool if all connections are no longer in use. Returns true if the pool was disconnected, false if not.
# File 'lib/gitlab/database/load_balancing/host.rb', line 93
def try_disconnect
  if pool.connections.none?(&:in_use?)
    pool.disconnect!
    return true
  end

  false
end