Class: ScoutAgent::Lifeline

Inherits:
Object
Includes:
Tracked
Defined in:
lib/scout_agent/lifeline.rb

Overview

This class is a monitor for an Agent subprocess of the platform. It launches the Agent code and makes sure it continues to check in at regular intervals, restarting the subprocess when it fails to do so.

Constant Summary

NO_CONTACT_TIMEOUT =

The number of seconds allowed to pass before the Agent subprocess is considered unresponsive.

10
CHECK_IN_FREQUENCY =

The frequency, in seconds, with which the subprocess is expected to check in. This is purposely set to a little under a second to give one more check-in possibility before the NO_CONTACT_TIMEOUT is reached.

0.99
TERM_TO_KILL_PAUSE =

The number of seconds the monitor will wait for a process to exit cleanly before forcing a stop.

5
RELAUNCH_FREQUENCIES =

The sequence of delays, in seconds, this monitor will wait between restarts of the subprocess. The initial values are short, to get running again as soon as possible. The delay then grows, up to a maximum, to reduce strain on a server experiencing long-term problems. The sequence resets after a successful relaunch that runs for at least as long as the next number in the sequence (or the max).

[0, 1, 1, 2, 3, 5, 8, 13]
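
The reset rule described for RELAUNCH_FREQUENCIES can be read as a small state machine. The following is only a sketch of one plausible reading, not the library's code: the RelaunchSchedule class and its method names are hypothetical, and only the constant's values come from this page.

```ruby
# Values copied from the constant documented above.
RELAUNCH_FREQUENCIES = [0, 1, 1, 2, 3, 5, 8, 13]

# Hypothetical helper (not part of ScoutAgent) that tracks where we are
# in the relaunch sequence.
class RelaunchSchedule
  attr_reader :index

  def initialize(frequencies = RELAUNCH_FREQUENCIES)
    @frequencies = frequencies
    @index       = 0
  end

  # Seconds to wait before the next launch attempt.
  def current_wait
    @frequencies[@index]
  end

  # Call when a run ends; ran_for is how many seconds the subprocess survived.
  # Returns the wait to apply before the next launch.
  def record_run(ran_for)
    next_index = [@index + 1, @frequencies.size - 1].min
    if ran_for >= @frequencies[next_index]
      @index = 0           # healthy run: go back to the short waits
    else
      @index = next_index  # quick failure: back off further, capped at the max
    end
    current_wait
  end
end
```

Under this reading, a subprocess that keeps dying immediately waits 1, 1, 2, 3, 5, 8, and then 13 seconds between attempts, while a single run that outlasts the next delay puts the schedule back at 0.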

Instance Attribute Summary

Instance Method Summary

Methods included from Tracked

#clear_status, #force_status_database_reload, #status_database, #status_log

Constructor Details

#initialize(agent, log = WireTap.new(nil)) ⇒ Lifeline

Prepares a monitor for the code specified by agent. You may also set the log that messages will be appended to.



# File 'lib/scout_agent/lifeline.rb', line 45

def initialize(agent, log = WireTap.new(nil))
  @agent                       = agent
  @log                         = log
  @parent_pid                  = Process.pid
  @child_pid                   = nil
  @reader                      = nil
  @writer                      = nil
  @launch_and_monitor_thread   = nil
  @termination_thread          = nil
  @check_in_with_parent_thread = nil
  @code                        = nil
  @last_launch                 = nil
  @relaunch_index              = 0
  
  at_my_exit do
    clear_status
  end
end

Instance Attribute Details

#log ⇒ Object (readonly)

The log file this monitor writes tracking information to.



# File 'lib/scout_agent/lifeline.rb', line 67

def log
  @log
end

Instance Method Details

#join ⇒ Object

Waits for the monitor Thread to be stopped by a natural termination before returning. If terminate() is called to start the shutdown, this method will also wait on the Thread spawned by that method to ensure everything gets the signal to stop.



# File 'lib/scout_agent/lifeline.rb', line 115

def join
  if Process.pid == @parent_pid and @launch_and_monitor_thread
    @launch_and_monitor_thread.join  # wait on the monitor to stop
    if @termination_thread
      @termination_thread.join       # wait on us to stop the subprocess
    end
  end
end
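
To illustrate how join() and terminate() cooperate, here is a minimal stand-alone sketch of the same two-thread pattern. MiniMonitor and its events queue are hypothetical names used only for illustration; the real class also guards on the parent process ID and signals a subprocess, which this sketch leaves out.

```ruby
# Hypothetical stand-in for the monitor; it runs no subprocess at all.
class MiniMonitor
  attr_reader :events

  def initialize
    @monitor_thread     = nil
    @termination_thread = nil
    @events             = Queue.new  # records what happened, for inspection
  end

  # Start a loop that would normally launch and watch a subprocess.
  def start
    @monitor_thread = Thread.new { loop { sleep 0.01 } }
  end

  # Stop the monitor loop from a separate thread, as terminate() does.
  def terminate
    @termination_thread = Thread.new do
      @monitor_thread.exit if @monitor_thread
      @events << :stopped
    end
  end

  # Wait for the monitor loop, then for any shutdown still in progress.
  def join
    return unless @monitor_thread
    @monitor_thread.join
    @termination_thread.join if @termination_thread
  end
end

monitor = MiniMonitor.new
monitor.start
monitor.terminate
monitor.join  # returns only once both threads have finished
```

Joining the termination thread as well means the caller cannot exit before the stop request has actually been issued.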

#launch_and_monitor ⇒ Object

This method outlines the process used to monitor an Agent. It is roughly: launch, monitor, kill as needed, and restart the process.



# File 'lib/scout_agent/lifeline.rb', line 73

def launch_and_monitor
  @launch_and_monitor_thread = Thread.new do
    Thread.current.abort_on_exception = true
    loop do
      wait_for_launch
      prepare_pipe
      launch_child
      close_writer
      monitor_child
      restart_child
    end
  end
end
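
The helper methods called in this loop are not shown on this page. As a rough illustration of the general technique (a pipe shared with a forked child, watched with a timeout), here is a self-contained sketch. The monitor and run_child methods are hypothetical, the constants are scaled down so the demo finishes quickly, and it assumes a platform where fork is available.

```ruby
# Assumed demo values, shorter than the documented constants.
NO_CONTACT_TIMEOUT = 1.0  # seconds of silence before the child is given up on
CHECK_IN_FREQUENCY = 0.1  # seconds between child check-ins

# Returns :unresponsive if nothing arrives within timeout seconds, or
# :exited if the child closes its end of the pipe (for example, by exiting).
def monitor(reader, timeout)
  loop do
    ready = IO.select([reader], nil, nil, timeout)
    return :unresponsive if ready.nil?
    begin
      reader.read_nonblock(1024)  # consume whatever check-ins arrived
    rescue EOFError
      return :exited
    rescue IO::WaitReadable
      next                        # spurious wakeup; keep waiting
    end
  end
end

# Forks a child that checks in a few times, then either hangs or exits.
def run_child(check_ins, hang)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    check_ins.times do
      writer.write(".")
      sleep CHECK_IN_FREQUENCY
    end
    sleep if hang  # simulate a stuck process: alive but silent
    exit!(0)       # skip at_exit handlers inherited from the parent
  end
  writer.close     # the parent only reads, mirroring close_writer above
  result = monitor(reader, NO_CONTACT_TIMEOUT)
  begin
    Process.kill("KILL", pid)
  rescue Errno::ESRCH
    # the child is already gone
  end
  Process.wait(pid)
  reader.close
  result
end
```

Closing the writer in the parent matters: otherwise the parent would hold the pipe open itself and never see end-of-file when the child exits.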

#terminate ⇒ Object

Begins a termination of the Agent subprocess in a separate Thread. This monitor’s join() method will also wait on this termination Thread to ensure everything gets the order to shut down before we exit.



# File 'lib/scout_agent/lifeline.rb', line 92

def terminate
  @termination_thread = Thread.new do
    if Process.pid == @parent_pid
      # stop monitoring
      log.info("Stopping the monitoring for '#{@agent}'.")
      @launch_and_monitor_thread.exit if @launch_and_monitor_thread
      # ask child process to exit
      log.info("Asking '#{@agent}' to stop.")
      begin
        IDCard.new(@agent).signal("TERM")
      rescue Errno::ESRCH  # no such process
        # the process has already exited, so we are fine
      end
    end
  end
end
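
terminate() only sends TERM. The TERM_TO_KILL_PAUSE constant suggests that elsewhere the monitor escalates to a forced stop, but that code is not shown on this page. The following is a generic sketch of the TERM-then-KILL pattern for a child process, assuming a Unix-like platform; stop_process is a hypothetical helper, not a ScoutAgent method.

```ruby
TERM_TO_KILL_PAUSE = 5  # seconds, as documented in the constant summary

# Hypothetical helper: ask a child process to stop, forcing it if needed.
# Returns :exited, :killed, or :already_gone.
def stop_process(pid, pause = TERM_TO_KILL_PAUSE)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + pause
  # Poll without blocking until the child has been reaped.
  until Process.wait(pid, Process::WNOHANG)
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill("KILL", pid)  # the polite request was ignored
      Process.wait(pid)
      return :killed
    end
    sleep 0.05
  end
  :exited
rescue Errno::ESRCH, Errno::ECHILD
  :already_gone  # no such process, or not a child we can wait on
end
```

Polling with Process::WNOHANG keeps the wait bounded, so a child that ignores TERM cannot stall the caller beyond the pause.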