Class: RedisFailover::NodeManager

Inherits:
Object
Includes:
Util
Defined in:
lib/redis_failover/node_manager.rb

Overview

NodeManager manages a list of redis nodes. On startup, the NodeManager discovers the current redis master and slaves. Each redis node is monitored by a NodeWatcher instance, which periodically reports the current state of the node it is watching to the NodeManager. The NodeManager processes these state reports and reacts by handling stale or dead nodes and, when necessary, promoting a new redis master.

Constant Summary

TIMEOUT =

Number of seconds to wait before retrying bootstrap process.

5
CHECK_INTERVAL =

Interval, in seconds, between checks of node snapshots.

5
MAX_PROMOTION_ATTEMPTS =

Maximum number of attempts to promote a master before releasing the master lock.

3
LATENCY_THRESHOLD =

Latency threshold for recording node state.

0.5
NODE_DISCOVERY_ERRORS =

Errors that can happen during the node discovery process.

[
  InvalidNodeRoleError,
  NodeUnavailableError,
  NoMasterError,
  MultipleMastersError
].freeze
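
A frozen array of exception classes like this is typically used with a splat in a `rescue` clause, the same way `#start` rescues `*ZK_ERRORS`. The following is a minimal sketch of that pattern only. The error classes here are local stand-ins that inherit from StandardError, and `attempt_discovery` is a hypothetical helper, not part of RedisFailover.

```ruby
# Stand-in error classes (assumption: the real ones live in RedisFailover).
class InvalidNodeRoleError < StandardError; end
class NodeUnavailableError < StandardError; end
class NoMasterError < StandardError; end
class MultipleMastersError < StandardError; end

DISCOVERY_ERRORS = [
  InvalidNodeRoleError,
  NodeUnavailableError,
  NoMasterError,
  MultipleMastersError
].freeze

# Hypothetical helper: runs the block and reports whether it raised one of
# the listed errors. Any other exception propagates to the caller.
def attempt_discovery
  yield
  [:ok, nil]
rescue *DISCOVERY_ERRORS => ex
  [:failed, ex.class]
end

ok_result     = attempt_discovery { :found_master }
failed_result = attempt_discovery { raise NoMasterError, 'no master found' }
```

Because the list is splatted, adding a class to the array extends every `rescue` that uses it, and errors outside the list are not swallowed.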

Constants included from Util

Util::CONNECTIVITY_ERRORS, Util::DEFAULT_ROOT_ZNODE_PATH, Util::REDIS_ERRORS, Util::REDIS_READ_OPS, Util::UNSUPPORTED_OPS, Util::ZK_ERRORS

Instance Method Summary

Methods included from Util

#decode, #different?, #encode, logger, #logger, logger=, #symbolize_keys

Constructor Details

#initialize(options) ⇒ NodeManager

Creates a new instance.

Options Hash (options):

  • :zkservers (String)

    comma-separated ZK host:port pairs

  • :znode_path (String)

    znode path override for redis nodes

  • :password (String)

    password for redis nodes

  • :nodes (Array<String>)

    the nodes to manage

  • :max_failures (String)

    the max failures for a node



# File 'lib/redis_failover/node_manager.rb', line 37

def initialize(options)
  logger.info("Redis Node Manager v#{VERSION} starting (#{RUBY_DESCRIPTION})")
  @options = options
  @required_node_managers = options.fetch(:required_node_managers, 1)
  @root_znode = options.fetch(:znode_path, Util::DEFAULT_ROOT_ZNODE_PATH)
  @node_strategy = NodeStrategy.for(options.fetch(:node_strategy, :majority))
  @failover_strategy = FailoverStrategy.for(options.fetch(:failover_strategy, :latency))
  @nodes = Array(@options[:nodes]).map { |opts| Node.new(opts) }.uniq
  @master_manager = false
  @master_promotion_attempts = 0
  @sufficient_node_managers = false
  @lock = Monitor.new
  @shutdown = false
end
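
The constructor relies on `Hash#fetch` with a default for optional settings and on `Array(...)` to tolerate a missing `:nodes` entry. The sketch below isolates that option handling so it can be run without the library. `resolve_options` and `EXAMPLE_ZNODE_PATH` are hypothetical names, the path value is a placeholder rather than the real `Util::DEFAULT_ROOT_ZNODE_PATH`, and plain strings stand in for `Node` objects.

```ruby
# Placeholder value (assumption), not the library's actual default path.
EXAMPLE_ZNODE_PATH = '/example_root'

# Hypothetical helper mirroring the defaults used in #initialize.
def resolve_options(options)
  {
    required_node_managers: options.fetch(:required_node_managers, 1),
    znode_path:             options.fetch(:znode_path, EXAMPLE_ZNODE_PATH),
    node_strategy:          options.fetch(:node_strategy, :majority),
    failover_strategy:      options.fetch(:failover_strategy, :latency),
    # Array(nil) is [], so a missing :nodes key yields an empty list;
    # uniq drops duplicate entries.
    nodes:                  Array(options[:nodes]).uniq
  }
end

defaults   = resolve_options({})
overridden = resolve_options(node_strategy: :consensus, nodes: %w[a a b])
```

In the real constructor, `uniq` is applied to `Node` instances, so deduplication there depends on how `Node` defines equality.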

Instance Method Details

#notify_state(node, state, latency = nil) ⇒ Object

Notifies the manager of a state change. Used primarily by RedisFailover::NodeWatcher to inform the manager of watched node states.



# File 'lib/redis_failover/node_manager.rb', line 78

def notify_state(node, state, latency = nil)
  @lock.synchronize do
    if running?
      update_current_state(node, state, latency)
    end
  end
rescue => ex
  logger.error("Error handling state report #{[node, state].inspect}: #{ex.inspect}")
  logger.error(ex.backtrace.join("\n"))
end
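
The method above combines three ideas: reports are serialized through a `Monitor`, they are ignored once the manager is no longer running, and any error raised while handling a report is logged rather than propagated to the reporting watcher. A minimal, self-contained sketch of that shape follows. `TinyManager`, its state names, and its in-memory `errors` list are assumptions made for illustration, and `running?` is assumed here to mean "not shut down".

```ruby
require 'monitor'

# Hypothetical stand-in for the manager; not the library's implementation.
class TinyManager
  attr_reader :states, :errors

  def initialize
    @lock = Monitor.new
    @shutdown = false
    @states = {}
    @errors = []
  end

  def running?
    !@shutdown
  end

  def shutdown
    @lock.synchronize { @shutdown = true }
  end

  # Same shape as the documented method: lock, guard, rescue and record.
  def notify_state(node, state, latency = nil)
    @lock.synchronize do
      update_current_state(node, state, latency) if running?
    end
  rescue => ex
    @lock.synchronize { @errors << "#{[node, state].inspect}: #{ex.inspect}" }
  end

  private

  # Illustrative validation; the accepted states are assumptions.
  def update_current_state(node, state, latency)
    unless %i[available unavailable].include?(state)
      raise ArgumentError, "unknown state #{state.inspect}"
    end
    @states[node] = { state: state, latency: latency }
  end
end

manager = TinyManager.new
# Two watcher-like threads report concurrently.
threads = %w[redis-a redis-b].map do |name|
  Thread.new { manager.notify_state(name, :available, 0.01) }
end
threads.each(&:join)

manager.notify_state('redis-c', :bogus)     # recorded as an error, not raised
manager.shutdown
manager.notify_state('redis-d', :available) # ignored after shutdown
```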

#reset ⇒ Object

Performs a reset of the manager.



# File 'lib/redis_failover/node_manager.rb', line 90

def reset
  @master_manager = false
  @master_promotion_attempts = 0
  @watchers.each(&:shutdown) if @watchers
end

#shutdown ⇒ Object

Initiates a graceful shutdown.



# File 'lib/redis_failover/node_manager.rb', line 97

def shutdown
  logger.info('Shutting down ...')
  @lock.synchronize do
    @shutdown = true
  end

  reset
  exit
end
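
The shutdown path sets a flag under the lock so that other threads can observe it and stop. The sketch below shows one way such a flag can stop a worker loop. `StoppableLoop` is a hypothetical class, `running?` is assumed to be the negation of the flag, and the call to `exit` from the original is deliberately left out so the example can finish normally.

```ruby
require 'monitor'

# Hypothetical worker that polls a shutdown flag; not part of RedisFailover.
class StoppableLoop
  def initialize
    @lock = Monitor.new
    @shutdown = false
  end

  def running?
    @lock.synchronize { !@shutdown }
  end

  # Flip the flag under the lock, as the documented #shutdown does.
  def shutdown
    @lock.synchronize { @shutdown = true }
  end

  # Keep working until another thread requests shutdown.
  def run
    sleep(0.01) while running?
  end
end

runner = StoppableLoop.new
worker = Thread.new { runner.run }
sleep(0.05)      # let the worker spin briefly
runner.shutdown  # request a stop from the main thread
worker.join      # returns once the loop notices the flag
```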

#start ⇒ Object

Note:

This method does not return until the manager terminates.

Starts the node manager.



# File 'lib/redis_failover/node_manager.rb', line 55

def start
  return unless running?
  setup_zk
  spawn_watchers
  wait_until_master
rescue *ZK_ERRORS => ex
  logger.error("ZK error while attempting to manage nodes: #{ex.inspect}")
  reset
  sleep(TIMEOUT)
  retry
rescue NoMasterError
  logger.error("Failed to promote a new master after #{MAX_PROMOTION_ATTEMPTS} attempts.")
  reset
  sleep(TIMEOUT)
  retry
end
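
Both `rescue` branches above end in `retry`, which in a method-level `rescue` re-runs the method body from the top. The following sketch shows that control flow in isolation. `TransientError`, `start_with_retry`, and the attempt log are hypothetical, and the delay is set to zero instead of the documented five-second TIMEOUT so the example finishes immediately.

```ruby
# Stand-in for the ZK and promotion errors handled by #start (assumption).
class TransientError < StandardError; end

RETRY_DELAY = 0 # seconds; shortened from the documented TIMEOUT of 5

# Hypothetical method: fails the first `fail_times` calls, then succeeds.
def start_with_retry(attempts_log, fail_times)
  attempts_log << :attempt
  raise TransientError, 'simulated failure' if attempts_log.size <= fail_times
  :started
rescue TransientError
  sleep(RETRY_DELAY)
  retry # restarts the method body, so another attempt is logged
end

log = []
result = start_with_retry(log, 2)
```

As in the original, this retries without an upper bound, so a failure that never clears keeps the method looping until something else stops it.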