Class: Karafka::Instrumentation::Vendors::Kubernetes::LivenessListener

Inherits:
BaseListener
  • Object
Defined in:
lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb

Overview

Note:

This listener binds itself only when Karafka actually attempts to start and moves from initializing to running. Before that, the TCP server will NOT be active. This is done on purpose to mitigate the case where users subscribe this listener in `karafka.rb` without following the recommendations on conditional assignment.

Note:

When used in embedded mode alongside Puma, you need to select a different port than the one used by Puma itself.

Note:

Please use `Kubernetes::SwarmLivenessListener` when operating in swarm mode.

Kubernetes HTTP listener that not only replies when the process is not fully hanging, but also allows defining a max time for processing and polling.

Processes like the Karafka server can hang while still being reachable. For example, if something hangs inside user code, Karafka could stop polling and no new data would be processed, while the process itself would remain active. This listener allows defining a TTL that gets bumped on each poll loop as well as before and after processing of a given messages batch.
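
A minimal subscription sketch, assuming the recommended conditional assignment in `karafka.rb` so the listener is registered only in the consumer server process (the `KARAFKA_LIVENESS` environment flag below is purely illustrative):

# karafka.rb
require 'karafka/instrumentation/vendors/kubernetes/liveness_listener'

if ENV['KARAFKA_LIVENESS'] == 'true'
  listener = ::Karafka::Instrumentation::Vendors::Kubernetes::LivenessListener.new(
    port: 3000
  )

  # Subscribing here is safe: the TCP server starts only once Karafka moves to running
  Karafka.monitor.subscribe(listener)
end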

Instance Method Summary

Constructor Details

#initialize(hostname: nil, port: 3000, consuming_ttl: 5 * 60 * 1_000, polling_ttl: 5 * 60 * 1_000) ⇒ LivenessListener

Note:

The default TTL matches the default `max.poll.interval.ms`.

Returns a new instance of LivenessListener.

Parameters:

  • hostname (String, nil) (defaults to: nil)

    hostname or nil to bind on all interfaces

  • port (Integer) (defaults to: 3000)

    TCP port on which we want to run our HTTP status server

  • consuming_ttl (Integer) (defaults to: 5 * 60 * 1_000)

    time in ms after which we consider consumption to be hanging. It allows us to define a max consumption time after which k8s should consider the given process as hanging

  • polling_ttl (Integer) (defaults to: 5 * 60 * 1_000)

    max time in ms between polls. If no polling (of any kind) happens within that time, the process should be considered dead.



# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 37

def initialize(
  hostname: nil,
  port: 3000,
  consuming_ttl: 5 * 60 * 1_000,
  polling_ttl: 5 * 60 * 1_000
)
  # If this is set to a symbol, it indicates an unrecoverable error like fencing.
  # While fencing can be partial (for one of the SGs), we should still consider this
  # an undesired state for the whole process because it halts processing in a
  # non-recoverable manner forever
  @unrecoverable = false
  @polling_ttl = polling_ttl
  @consuming_ttl = consuming_ttl
  @mutex = Mutex.new
  @pollings = {}
  @consumptions = {}
  super(hostname: hostname, port: port)
end
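
A constructor sketch with TTLs aligned to a raised `max.poll.interval.ms` (the 10-minute value is an arbitrary example, not a recommendation):

# If max.poll.interval.ms is raised to 10 minutes, keep the TTLs in sync so that
# long but legitimate batch processing is not reported as hanging
max_poll_interval_ms = 10 * 60 * 1_000

listener = ::Karafka::Instrumentation::Vendors::Kubernetes::LivenessListener.new(
  hostname: nil,                       # bind on all interfaces
  port: 9005,                          # must differ from Puma's port when embedding
  consuming_ttl: max_poll_interval_ms,
  polling_ttl: max_poll_interval_ms
)

Karafka.monitor.subscribe(listener)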

Instance Method Details

#healthy? ⇒ Boolean

Did we exceed any of the TTLs?

Returns:

  • (Boolean)

    true if the process is healthy (the status endpoint replies with 204), false if any TTL was exceeded or an unrecoverable error occurred (the endpoint replies with 500)



# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 132

def healthy?
  return false if @unrecoverable
  return false if polling_ttl_exceeded?
  return false if consuming_ttl_exceeded?

  true
end
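
For reference, a small probe sketch (an assumption layered on top of this endpoint, not part of the listener itself): it checks the status server the way a Kubernetes HTTP liveness probe would, where 204 corresponds to #healthy? returning true and 500 to a TTL being exceeded or an unrecoverable error:

require 'net/http'

# The root path is used here for illustration; the port must match the listener's
response = Net::HTTP.get_response(URI('http://127.0.0.1:3000/'))

if response.is_a?(Net::HTTPNoContent) # 204 => healthy
  puts 'healthy'
else                                  # 500 => hanging or unrecoverable
  puts "unhealthy (HTTP #{response.code})"
end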

#on_app_running(_event) ⇒ Object

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 57

def on_app_running(_event)
  start
end

#on_app_stopped(_event) ⇒ Object

Stop the HTTP server when we stop the process

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 63

def on_app_stopped(_event)
  stop
end

#on_connection_listener_fetch_loop(_event) ⇒ Object

Tick on each fetch

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 69

def on_connection_listener_fetch_loop(_event)
  mark_polling_tick
end

#on_connection_listener_stopped(_event) ⇒ Object

Deregister the polling tracker for a given listener

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 124

def on_connection_listener_stopped(_event)
  return if Karafka::App.done?

  clear_polling_tick
end

#on_connection_listener_stopping(_event) ⇒ Object

Deregister the polling tracker for a given listener

Parameters:

  • _event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 112

def on_connection_listener_stopping(_event)
  # We are interested in disabling tracking for a given listener only if it was requested
  # while Karafka was running. If we always cleared, we would not catch the shutdown
  # polling requirements. The "running" listener shutdown operations happen only when
  # the manager requests them for downscaling.
  return if Karafka::App.done?

  clear_polling_tick
end

#on_error_occurred(event) ⇒ Object

Parameters:

  • event (Karafka::Core::Monitoring::Event)


# File 'lib/karafka/instrumentation/vendors/kubernetes/liveness_listener.rb', line 95

def on_error_occurred(event)
  clear_consumption_tick
  clear_polling_tick

  error = event[:error]

  # We are only interested in the rdkafka errors
  return unless error.is_a?(Rdkafka::RdkafkaError)
  # When any of those occurs, it means something went wrong in a way that cannot be
  # recovered. In such cases we should report that the consumer process is not healthy.
  return unless error.fatal?

  @unrecoverable = error.code
end