Class: Artifactory::Cleaner::Controller

Inherits:
Object
Defined in:
lib/artifactory/cleaner/controller.rb

Overview

Artifactory Cleaner Logic Controller

The Artifactory::Cleaner::Controller class provides logic central to Artifactory Cleaner. Artifactory::Cleaner::Controller manages the Artifactory API client, performs searches, discovers artifacts, and more. It is capable of executing tasks in a multi-threaded fashion, making multiple requests to the Artifactory server in parallel.

Defined Under Namespace

Classes: ProcessingQueues

Constructor Details

#initialize(artifactory_config) ⇒ Controller

Initialize and configure a new Artifactory::Cleaner::Controller

Params:

artifactory_config

Hash of configuration for the Artifactory client. Used as a splat for a call to Artifactory::Client.new
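As a rough illustration of the splat, the sketch below passes a configuration Hash as keyword arguments. `FakeClient` is a hypothetical stand-in for Artifactory::Client, and the `endpoint`, `username` and `password` keys and their values are assumptions chosen for the example.

```ruby
# Stand-in for Artifactory::Client (hypothetical), used only to show how the
# configuration hash is splatted into keyword arguments.
class FakeClient
  attr_reader :options

  def initialize(**options)
    @options = options
  end
end

# Symbol keys become keyword arguments when the hash is splatted with **.
artifactory_config = {
  endpoint: 'https://artifactory.example.com/artifactory', # assumed URL
  username: 'cleaner',                                     # assumed credentials
  password: 'secret'
}

client = FakeClient.new(**artifactory_config)
puts client.options[:endpoint]
```

The controller would be built the same way, handing the same Hash to `Artifactory::Cleaner::Controller.new`.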



# File 'lib/artifactory/cleaner/controller.rb', line 26

def initialize(artifactory_config)
  @artifactory_client = Artifactory::Client.new(**artifactory_config)
  @verbose = false
  initialize_queues
  @workers = []
  @num_workers = 6
end

Instance Method Details

#archive_artifact(artifact, path) ⇒ Object

Download a copy of an artifact to the local filesystem prior to deletion

Given an Artifactory::Resource::Artifact `artifact`, download the artifact to the local filesystem directory specified by the `path` param

Note: Downloading an artifact will update the artifact's last_downloaded date, so it may no longer match the search criteria it originally did (if last_downloaded was used to discover this artifact)

This method is meant to be used prior to calling `delete_artifact`
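The local directory is derived from the artifact's download URL. A minimal sketch of that derivation, using plain strings in place of an Artifactory::Resource::Artifact; the helper name `archive_directory` and all URLs and paths are made up for illustration:

```ruby
require 'uri'

# Mirror of the path computation in archive_artifact, with plain strings
# standing in for the artifact's download_uri and repo attributes.
def archive_directory(base_path, download_uri, repo)
  # Everything after the repo key in the URL path is the artifact's path within the repo
  path_in_repo = URI.parse(download_uri).path.split(repo)[1]
  # Keep only the directory part of the joined path
  File.dirname(File.join(base_path, path_in_repo))
end

dir = archive_directory(
  '/var/archive',
  'https://artifactory.example.com/artifactory/libs-release-local/com/acme/app/1.0/app-1.0.jar',
  'libs-release-local'
)
puts dir
```

Note that in this sketch the repo key itself does not become part of the resulting directory, only the path below it.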



# File 'lib/artifactory/cleaner/controller.rb', line 351

def archive_artifact(artifact, path)
  path = File.dirname(File.join(path, URI.parse(artifact.download_uri).path.split( artifact.repo )[1]))

  debuglog "[DEBUG] downloading #{artifact} (#{artifact.uri}) to #{path}"
  archived_file = nil
  timing = Benchmark.measure do
    archived_file = artifact.download(path)
  end

  debuglog "[DEBUG] #{artifact.uri} #{Util.filesize artifact.size} downloaded in #{timing.real.round(2)} seconds (#{Util.filesize(artifact.size/timing.real)}/s)"

  raise ArchiveFileNotWritten, "Failed to write to #{archived_file}" unless File.exist? archived_file
  raise ArchiveFileNotWritten, "Archive file is empty: #{archived_file}" unless File.size? archived_file
  raise ArchiveFileSizeMismatch, "#{path} size mismatch (#{File.size(archived_file)} != #{artifact.size})" unless File.size(archived_file) == artifact.size
end

#artifact_usage_search(from: nil, to: nil, repos: nil, threads: 4) ⇒ Array<Resource::Artifact>

Search for an artifact by its usage

Examples:

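Search all repositories over a date range (matched against the created, lastModified and lastDownloaded fields)

controller.artifact_usage_search(
  from: Time.utc(2014, 1, 1),
  to: Time.utc(2014, 2, 1),
)

Search for artifacts in that date range within a single repo

controller.artifact_usage_search(
  from: Time.utc(2014, 1, 1),
  to: Time.utc(2014, 2, 1),
  repos: ['libs-release-local'],
)

Parameters:

  • from (Time, Integer)

    start of the date range; a Time is converted to epoch milliseconds, any other value is passed through to_i

  • to (Time, Integer)

    end of the date range; defaults to Time.now

  • repos (Array<String>)

    optional list of repository keys to search within; nil entries are dropped

  • threads (Integer)

    number of threads used to fetch the matching artifacts; defaults to 4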

Returns:

  • (Array<Resource::Artifact>)

    a list of artifacts that match the query
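The request parameters are built from the keyword arguments before the API call is made. The sketch below reproduces only that step so the conversions are easy to see; `search_params` is a hypothetical helper name and the repository keys are examples.

```ruby
# Build the query parameters the way artifact_usage_search does:
# Time values become epoch milliseconds, anything else goes through to_i.
def search_params(from: nil, to: nil, repos: nil)
  to = Time.now if to.nil?
  params = {
    dateFields: 'created,lastModified,lastDownloaded',
    from: from.is_a?(Time) ? from.to_i * 1000 : from.to_i,
    to: to.is_a?(Time) ? to.to_i * 1000 : to.to_i
  }
  # repos is expected to be an Array; nil entries are dropped before joining
  params[:repos] = repos.compact.join(',') unless repos.nil?
  params
end

params = search_params(
  from: Time.utc(2014, 1, 1),
  to: Time.utc(2014, 2, 1),
  repos: ['libs-release-local', nil, 'libs-snapshot-local']
)
puts params.inspect
```

Because `repos` goes through `compact.join`, passing a bare String instead of an Array would raise a NoMethodError.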



# File 'lib/artifactory/cleaner/controller.rb', line 178

def artifact_usage_search(from: nil, to: nil, repos: nil, threads: 4)
  to = Time.now if to.nil?

  params = {
    dateFields: 'created,lastModified,lastDownloaded',
    from: from.is_a?(Time) ? from.to_i * 1000 : from.to_i,
    to: to.is_a?(Time) ? to.to_i * 1000 : to.to_i
  }
  repos = repos.compact.join(",") unless repos.nil?
  params[:repos] = repos unless repos.nil?

  result = nil

  debuglog("[DEBUG] Making Artifactory request /api/search/dates for #{params.inspect}")
  timing = {}
  timing[:search] = Benchmark.measure do
    begin
      result = @artifactory_client.get("/api/search/dates", params)
    rescue Artifactory::Error::HTTPError => err
      if err.code == 404
        debuglog "  HTTP 404 Not Found fetching: /api/search/dates -- assuming no assets for this date range"
        result = []
        #Pry::rescued(err) if defined?(Pry::rescue)
      else
        STDERR.puts "HTTP Error while performing an artifact usage search: #{err}"
        STDERR.puts err.full_message
        STDERR.puts "Parameters were: #{params.inspect}"
        STDERR.puts "Caused by #{err.cause.full_message}" if err.cause
        Pry::rescued(err) if defined?(Pry::rescue)
      end
    end
  end
  result = [] if result.nil? # a non-404 HTTP error leaves result unset; treat that as no results
  debuglog("[DEBUG] Got #{result["results"].length} results from search in #{timing[:search].real} seconds") unless result.empty?
  timing[:fetch] = Benchmark.measure do
    if threads > 1
      unless result.nil? or result.empty?
        result = discover_artifacts_from_search(result["results"], threads: threads)
      end
    else
      unless result.nil? or result.empty?
        result = result["results"].map do |artifact|
          a = nil
          retries = 10
          while a.nil? and retries > 0
            begin
              retries -= 1
              a = Artifactory::Cleaner::DiscoveredArtifact.from_url(artifact["uri"], client: @artifactory_client)
              a.last_downloaded = Time.parse(artifact["lastDownloaded"]) unless artifact["lastDownloaded"].to_s.empty?
            rescue Net::OpenTimeout, Artifactory::Error::ConnectionError => err
              STDERR.puts "[WARN] Connection Failure attempting to reach Artifactory API: #{err}"
              debuglog "  Parameters were: #{params.inspect}"
              debuglog "  Caused by #{err.cause.full_message}" if err.cause
              STDERR.puts "  Retrying in 10 seconds" if retries > 0
              sleep 10
            rescue Artifactory::Error::HTTPError => err
              if err.code == 404
                STDERR.puts "[WARN] HTTP 404 Not Found fetching: #{artifact["uri"]}"
                retries = 0
              else
                retries = [retries, 1].min
                STDERR.puts "[ERROR] HTTP Error while fetching an artifact from a usage search: #{err}"
                debuglog err.full_message
                debuglog "  Artifact was: #{artifact.inspect}"
                debuglog "  Parameters were: #{params.inspect}"
                debuglog "  Caused by #{err.cause.full_message}" if err.cause
                Pry::rescued(err) if defined?(Pry::rescue)
                STDERR.puts "  Will retry download once" if retries > 0
              end
            end
          end
          a
        end
      end
      result.compact!
    end
  end
  debuglog("[DEBUG][Perfdata] Artifactory request /api/search/dates timing: #{timing[:search]}")
  debuglog("[DEBUG][Perfdata] Fetching artifacts timing: #{timing[:fetch]}")
  total_time = timing.values.reduce(0) {|s,t| s + t.real}
  debuglog("[DEBUG] #{result.length} artifacts fetched in #{total_time.round 2} seconds")
  result
end

#bucketize_artifacts(from: nil, to: nil, increment: 30 * 24 * 3600, repos: nil, buckets: nil, threads: 4) ⇒ Object



# File 'lib/artifactory/cleaner/controller.rb', line 294

def bucketize_artifacts(from: nil, to: nil, increment: 30 * 24 * 3600, repos: nil, buckets: nil, threads: 4)
  buckets = ArtifactBucketCollection.new unless buckets.is_a? ArtifactBucketCollection
  with_discovered_artifacts(from: from, to: to, repos: repos, increment: increment, threads: threads) do |artifact|
    buckets << artifact
  end
  buckets
end

#bucketized_artifact_report(buckets) ⇒ Object

Given an Artifactory::Cleaner::ArtifactBucketCollection, return a String summarizing the contents

TODO: This really should be a method on Artifactory::Cleaner::ArtifactBucketCollection



# File 'lib/artifactory/cleaner/controller.rb', line 306

def bucketized_artifact_report(buckets)
  total_size = 0
  total_count = 0
  lines = buckets.map do |bucket|
    total_size += bucket.filesize
    total_count += bucket.length
    "#{bucket.length} artifacts between #{bucket.min} and #{bucket.max} days, totaling #{Artifactory::Cleaner::Util::filesize bucket.filesize}"
  end
  lines << "Total: #{Artifactory::Cleaner::Util::filesize total_size} across #{total_count} artifacts"
end

#catagorize_old_assets(days) ⇒ Object

Deprecated, do not use



# File 'lib/artifactory/cleaner/controller.rb', line 383

def catagorize_old_assets(days)
  buckets = {
      730 => {count: 0, size: 0},
      365 => {count: 0, size: 0},
      180 => {count: 0, size: 0},
      90  => {count: 0, size: 0},
      30  => {count: 0, size: 0},
  }
  discover_repos
  @repos[:local].each_pair do |id,repo|
    begin
      pkgs = 0
      purgable = 0
      timings = Benchmark.bm(12) do |bm|
        debuglog "Searching Repo #{id}:"
        old_packages = nil
        bm.report('api call') {
          old_packages = @artifactory_client.artifact_usage_search(
              notUsedSince: (Time.now.to_i - 24 * 3600 * days) * 1000,
              createdBefore: (Time.now.to_i - 24 * 3600 * days) * 1000,
              repos: id
          )
        }
        debuglog "  Artifactory search returned #{old_packages.length} assets older than #{days}..."
        bm.report('loop') { old_packages.each_with_index do |pkg,i|
          pkgs += 1
          uri = URI(pkg.uri)
          purgable += pkg.size
          # Calculate the age of this package in days and increment the bucket it belongs in
          age = (Time.now - pkg.last_modified)/(3600*24)
          if (bucket = buckets.keys.find {|v| age >= v })
            buckets[bucket][:count] += 1
            buckets[bucket][:size] += pkg.size
          end
          debuglog "  ##{i}: #{File.basename(uri.path)} #{Util.filesize pkg.size} Created #{pkg.created} Modified #{pkg.last_modified}"
        end }
      end
      debuglog "Found #{pkgs} assets from #{id} older than #{days} days totaling #{Util.filesize purgable} in #{timings.reduce {|sum, t| sum + t.real}} seconds"
    rescue => ex
      STDERR.puts "Caught an exception trying to handle repo #{id}: #{ex}"
      STDERR.puts ex.full_message
      STDERR.puts "Caused by #{ex.cause.full_message}" if ex.cause
    end
  end

  buckets.each_pair do |age,bucket|
    debuglog "#{bucket[:count]} packages older than #{age} days, totaling #{Util.filesize bucket[:size]}"
  end
  
  buckets
end

#delete_artifact(artifact) ⇒ Object

Delete an artifact from the Artifactory server

Given an Artifactory::Resource::Artifact `artifact`, delete it from the Artifactory server. **This is a destructive operation – use with caution!**

Consider using `archive_artifact` first to save artifacts locally

This function writes to the remote Artifactory server (specifically it makes a delete call)



# File 'lib/artifactory/cleaner/controller.rb', line 376

def delete_artifact(artifact)
  debuglog "[DEBUG] DELETE Artifact #{artifact} at #{artifact.uri}!"
  artifact.delete
end

#discover_artifacts_from_search(artifact_list, threads: 4) ⇒ Object

Given a list of Artifacts, fetch information about them and return a list of Artifactory::Cleaner::DiscoveredArtifact instances

This is a helper function for #artifact_usage_search

TODO: Document format of the `artifact_list` parameter

This method may raise network errors from the underlying Artifactory client

This method is multi-threaded and will spawn workers in order to make multiple concurrent HTTP connections to the Artifactory API. The number of threads can be tuned with the `threads` parameter. Be careful not to cause excessive load on the Artifactory API!
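The general shape of this worker pattern can be sketched with Ruby's standard Thread and Queue classes. This is not the controller's own implementation (which relies on its ProcessingQueues and worker classes); `fetch_artifact` and `discover_in_parallel` are hypothetical names, and the fetch is a stand-in for the HTTP request.

```ruby
# A minimal sketch of an incoming/outgoing queue pattern using stdlib
# Thread and Queue. fetch_artifact stands in for the real HTTP request.
def fetch_artifact(uri)
  "artifact for #{uri}"
end

def discover_in_parallel(uris, threads: 4)
  incoming = Queue.new
  outgoing = Queue.new
  uris.each { |uri| incoming << uri }
  incoming.close # once drained, pop returns nil and the workers stop

  workers = Array.new(threads) do
    Thread.new do
      while (uri = incoming.pop)
        begin
          outgoing << fetch_artifact(uri)
        rescue StandardError => e
          outgoing << e # hand errors back to the caller instead of losing them
        end
      end
    end
  end

  workers.each(&:join)
  results = []
  results << outgoing.pop until outgoing.empty?
  results
end

found = discover_in_parallel(%w[a b c d e], threads: 2)
puts found.length
```

The `threads` argument caps how many requests are in flight at once, which is the knob to lower if the Artifactory server shows signs of load.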



# File 'lib/artifactory/cleaner/controller.rb', line 96

def discover_artifacts_from_search(artifact_list, threads: 4)
  result = []
  timing = {}
  #kill_threads
  @num_workers = threads
  timing[:enqueue] = Benchmark.measure do
    artifact_list.each {|a| queue_discovery_of_artifact a}
  end

  timing[:dequeue] = Benchmark.measure do
    until @discovery_queues.incoming.empty? and @discovery_queues.outgoing.empty? and @workers.none?(&:working?)
      begin
        #debuglog("[DEBUG] Pop from outgoing queue; incoming.len=#{@discovery_queues.incoming.length}, outgoing.len=#{@discovery_queues.outgoing.length}")
        item = @discovery_queues.outgoing.pop
        if item.kind_of? Artifactory::Resource::Artifact
          result << item
          #debuglog "[DEBUG] Discovered #{item} from a child thread"
        elsif item.kind_of? Artifactory::Error::ArtifactoryError
          STDERR.puts "[ERROR] Artifactory Error from artifact fetch: #{item}"
          STDERR.puts item.full_message
          STDERR.puts "Caused by #{item.cause.full_message}" if item.cause
        elsif item.kind_of? Error
          STDERR.puts "[ERROR] Error from artifact fetch: #{item}"
          STDERR.puts item.full_message
          STDERR.puts "Caused by #{item.cause.full_message}" if item.cause
        elsif !item.nil?
          STDERR.puts "[ERROR] Got #{item} back from the discovery queue, expected an Artifactory::Resource::Artifact"
        end
      rescue => processing_ex
        STDERR.puts "[ERROR] Caught an exception when processing from the outgoing discovery queue: #{processing_ex}"
        STDERR.puts processing_ex.full_message
        STDERR.puts "Caused by #{processing_ex.cause.full_message}" if processing_ex.cause
      end
    end
  end

  begin
    kill_threads
  rescue => ex
    STDERR.puts "[ERROR] Caught an exception when killing threads: #{ex}"
    STDERR.puts ex.full_message
    STDERR.puts "Caused by #{ex.cause.full_message}" if ex.cause
  end

  debuglog("[DEBUG][Perfdata] Enqueue URLs for workers to discover: #{timing[:enqueue]}")
  debuglog("[DEBUG][Perfdata] Dequeue found Artifacts from workers: #{timing[:dequeue]}")
  total_time = timing.values.reduce(0) {|s,t| s + t.real}
  debuglog("[DEBUG] #{result.length} artifacts fetched in #{total_time.round 2} seconds")
  result
end

#discover_repos ⇒ Object

Return an ordered structure of repositories from the Artifactory server.

This method will query Artifactory and fetch information about all available repositories. The result returned is a Hash with three keys, one for each repo type: `:local`, `:remote` and `:virtual`. Under each of these keys is a hash mapping repo keys to their Artifactory::Resource::Repository objects

This method may raise network errors from the underlying Artifactory client

This method is not multi-threaded
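To make the documented return shape concrete, here is a small sketch with a hypothetical `Repo` Struct standing in for Artifactory::Resource::Repository. The repository keys and package types are invented for the example.

```ruby
# Hypothetical stand-in for Artifactory::Resource::Repository
Repo = Struct.new(:key, :rclass, :package_type)

# Shape described above: one Hash per repo type, keyed by repo key
repos = {
  local:   { 'libs-release-local' => Repo.new('libs-release-local', 'local', 'maven') },
  remote:  { 'maven-central' => Repo.new('maven-central', 'remote', 'maven') },
  virtual: { 'libs-release' => Repo.new('libs-release', 'virtual', 'maven') }
}

# Iterate over the local repos only, as catagorize_old_assets does
local_keys = repos[:local].keys
puts local_keys.inspect
```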



# File 'lib/artifactory/cleaner/controller.rb', line 57

def discover_repos
  timing = {}
  @repos = {
      local: {},
      remote: {},
      virtual: {},
  }
  i = 0
  timing[:loop] = Benchmark.measure do
    @artifactory_client.repository_all.each do |repo|
      debuglog "[DEBUG] Found #{repo.package_type} repo: #{repo.key}"
      if repo.rclass == 'remote' && repo.url
        debuglog " +-> repo #{repo.key} is a mirror of remote at #{repo.url}"
        @repos[:remote][repo.key] = repo
      elsif repo.rclass == 'virtual' && repo.repositories
        debuglog " +-> repo #{repo.key} is a virtual repo containing: #{repo.repositories.join ', '}"
        @repos[:virtual][repo.key] = repo
      else
        @repos[:local][repo.key] = repo
      end
      i += 1
    end
  end
  debuglog("[DEBUG][Perfdata] Fetched #{i} repos; timing: #{timing[:loop]}")
  @repos
end

#verbose=(val) ⇒ Object

Enable or disable verbose mode (see Controller#verbose?). When verbose mode is enabled, the controller will print debugging and status information to STDERR



# File 'lib/artifactory/cleaner/controller.rb', line 43

def verbose=(val)
  @verbose = !!val
end

#verbose? ⇒ Boolean

Is verbose output enabled? If so, the controller will print debugging and status information to STDERR

Returns:

  • (Boolean)


# File 'lib/artifactory/cleaner/controller.rb', line 36

def verbose?
  @verbose
end

#with_discovered_artifacts(from: nil, to: nil, repos: nil, increment: 30 * 24 * 3600, threads: 4) ⇒ Object

Iterator method for an artifact search

The `with_discovered_artifacts` method is used to iterate over artifacts from a search which potentially covers a large period of time. This method will break the period up into small chunks of time defined by the `increment` argument (defaulting to 30 days) and will perform multiple searches to avoid large searches which may time out or overload the Artifactory server.

Pass a block and the block will be called with every Artifactory::Cleaner::DiscoveredArtifact that is found

This method is not multi-threaded; however, it calls artifact_usage_search, which is multi-threaded. The number of threads is controlled by the `threads` argument

This method calls artifact_usage_search which may raise network exceptions

Params:

from

Time instance for the start date of the search

to

Time instance for the end date of the search; defaults to Time.now

repos

Optional array of repository names to search within; searches all repositories if omitted

increment

Integer number of seconds to chunk the search period into, defaults to 30 days

threads

Number of threads to use to fetch artifacts; default is 4 (passed to artifact_usage_search)
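The chunking can be illustrated without touching the API. The sketch below walks backwards from `to` to `from` in `increment`-sized windows, the same way the method's loop does; `each_search_window` is a hypothetical helper name and the dates are arbitrary.

```ruby
# Yield [chunk_start, chunk_end] pairs walking backwards from `to` to `from`,
# mirroring the loop in with_discovered_artifacts without calling the API.
def each_search_window(from:, to: Time.now, increment: 30 * 24 * 3600)
  chunk_end = to
  while chunk_end > from
    # Never reach back past the requested start of the search
    chunk_start = [chunk_end - increment, from].max
    yield chunk_start, chunk_end
    chunk_end = chunk_start
  end
end

windows = []
each_search_window(from: Time.utc(2024, 1, 1), to: Time.utc(2024, 3, 11)) do |chunk_start, chunk_end|
  windows << [chunk_start, chunk_end]
end
windows.each { |chunk_start, chunk_end| puts "#{chunk_start} .. #{chunk_end}" }
```

The newest window is searched first, and the last window is shortened so it ends exactly at `from`. A `from` value is required in practice, since the loop compares against it.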



# File 'lib/artifactory/cleaner/controller.rb', line 282

def with_discovered_artifacts(from: nil, to: nil, repos: nil, increment: 30 * 24 * 3600, threads: 4)
  chunk_end = to || Time.now
  while chunk_end > from
    chunk_start = chunk_end - increment
    chunk_start = from if chunk_start < from
    artifact_usage_search(from: chunk_start, to: chunk_end, repos: repos, threads: threads).each do |pkg|
      yield pkg
    end
    chunk_end = chunk_start
  end
end

#yaml_format(artifact, indent = 0) ⇒ Object

Return a YAML representation of an Artifactory::Cleaner::DiscoveredArtifact

Provide an Artifactory::Cleaner::DiscoveredArtifact and this method will return a String containing a YAML representation of the properties of the DiscoveredArtifact. If the `indent` parameter is provided, then a YAML fragment will be returned, indented by `indent` spaces. This allows for “streaming” a list of Artifact YAML to an IO stream
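A rough sketch of the fragment technique, using a plain Hash rather than a DiscoveredArtifact. `yaml_fragment` is a hypothetical helper, and writing a "- " list prefix before each fragment is an assumption about how a caller might stream the output.

```ruby
require 'yaml'

# Turn a YAML document into an indentable fragment, the way yaml_format does:
# drop the leading "---" line, keep the first key as-is, indent the rest.
def yaml_fragment(hash, indent)
  YAML.dump(hash).each_line.with_index.reduce('') do |str, (line, i)|
    if i.zero?
      str                         # skip the "---" document marker
    elsif i == 1
      str + line                  # first key is left unindented
    else
      str + (' ' * indent) + line # remaining keys are indented
    end
  end
end

fragment = yaml_fragment({ 'uri' => 'http://example.com/a.jar', 'size' => 42 }, 2)

# Assumed usage: prefix each fragment with "- " to build a YAML list incrementally
puts "- #{fragment}"
```

With an indent of 2 and a "- " prefix, each fragment lines up as one item of a YAML sequence, so fragments can be written one at a time without holding the whole list in memory.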



# File 'lib/artifactory/cleaner/controller.rb', line 324

def yaml_format(artifact, indent = 0)
  properties = [:uri, :last_downloaded, :repo, :created, :last_modified, :last_updated, :download_uri, :mime_type, :size, :checksums ]
  result = YAML.dump(properties.each_with_object({}) {|prop,export| export[prop] = artifact.send(prop) })
  if indent
    i = 0
    result.each_line.reduce('') do |str,line|
      if (i += 1) > 2
        str + (' ' * indent) + line
      elsif i == 2
          str + line
      else
        str
      end
    end
  end
end