Module: Maid::Tools

Includes: Deprecated
Defined in: lib/maid/tools.rb

Overview

These "tools" are methods available in the Maid DSL.

In general, methods are expected to:

  • Automatically expand paths (that is, '~/Downloads/foo.zip' becomes '/home/username/Downloads/foo.zip')
  • Respect the noop (dry-run) option if it is set

Some methods are not available on all platforms. An ArgumentError is raised when a command is not available. See tags such as: [Mac OS X]
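
The two conventions listed above can be illustrated with plain Ruby. This is a minimal sketch, not Maid's implementation: `expand_sketch` and `dry_run_move` are hypothetical names, and it assumes the dry-run behavior maps onto the `noop` option of Ruby's FileUtils.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for path expansion: resolves '~' and relative segments.
def expand_sketch(path)
  File.expand_path(path)
end

# Hypothetical helper: attempts a move with noop set, then reports whether
# the source file is still in place (it should be, since nothing is executed).
def dry_run_move(source, destination)
  FileUtils.mv(expand_sketch(source), expand_sketch(destination), noop: true)
  File.exist?(expand_sketch(source))
end
```

With `noop` set, FileUtils performs no file system changes, which is what lets a dry run report what would happen without doing it.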

Instance Method Details

#accessed_at(path) ⇒ Object

Get the time that a file was last accessed.

In Unix speak, atime.

Examples

accessed_at('foo.zip') # => Sat Apr 09 10:50:01 -0400 2011


# File 'lib/maid/tools.rb', line 519

def accessed_at(path)
  File.atime(expand(path))
end

#checksum_of(path) ⇒ Object

Get a checksum for a file.

Examples

checksum_of('foo.zip') # => "67258d750ca654d5d3c7b06bd2a1c792ced2003e"


# File 'lib/maid/tools.rb', line 557

def checksum_of(path)
  Digest::SHA1.hexdigest(File.read(path))
end
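
For large files, reading the whole file into memory before hashing can be wasteful. The following is a sketch of an alternative, assuming the standard `digest` library; `checksum_sketch` is a hypothetical name and is not part of Maid. It also expands the path, which the method above does not.

```ruby
require 'digest'

# Hypothetical variant of checksum_of: Digest::SHA1.file reads the file in
# chunks instead of loading it fully into memory.
def checksum_sketch(path)
  Digest::SHA1.file(File.expand_path(path)).hexdigest
end
```

The result should be the same hex string that hashing the full file contents would give.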

#content_types(path) ⇒ Object

Get the content types of a path.

Content types can be MIME types, Internet media types or Spotlight content types (OS X only).

Examples

content_types('foo.zip') # => ["public.zip-archive", "com.pkware.zip-archive", "public.archive", "application/zip", "application"]
content_types('bar.jpg') # => ["public.jpeg", "public.image", "image/jpeg", "image"]


# File 'lib/maid/tools.rb', line 647

def content_types(path)
  [spotlight_content_types(path), mime_type(path), media_type(path)].flatten
end

#copy(sources, destination) ⇒ Object

Copy from sources to destination.

The path is not copied if a file with the same name already exists at the destination; a warning is logged instead. Note: similar functionality is provided by the sync tool, but that requires the rsync binary to be installed.

Examples

Single path:

copy('~/Downloads/foo.zip', '~/Archive/Software/Mac OS X/')

Multiple paths:

copy(['~/Downloads/foo.zip', '~/Downloads/bar.zip'], '~/Archive/Software/Mac OS X/')
copy(dir('~/Downloads/*.zip'), '~/Archive/Software/Mac OS X/')


# File 'lib/maid/tools.rb', line 163

def copy(sources, destination)
  destination = expand(destination)

  expand_all(sources).each do |source|
    target = File.join(destination, File.basename(source))

    unless File.exist?(target)
      log("cp #{ sh_escape(source) } #{ sh_escape(destination) }")
      FileUtils.cp(source, destination, @file_options)
    else
      warn("skipping copy of #{ sh_escape(source) } because #{ sh_escape(target) } already exists")
    end
    end
  end
end
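
The skip-if-present rule can be tried in isolation. This is a minimal sketch under the assumption that only plain FileUtils is available; `copy_unless_exists` is a hypothetical helper, and it returns a symbol where Maid would log.

```ruby
require 'fileutils'

# Hypothetical helper: copies source into destination_dir unless a file with
# the same basename is already there. Returns :copied or :skipped.
def copy_unless_exists(source, destination_dir)
  target = File.join(destination_dir, File.basename(source))
  return :skipped if File.exist?(target)

  FileUtils.cp(source, destination_dir)
  :copied
end
```

Because the check is on the basename inside the destination directory, an existing file is never overwritten, even when the source has changed since the first copy.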

#created_at(path) ⇒ Object

Get the creation time of a file.

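In Unix speak, ctime. Note that on Unix this is strictly the time the file's metadata (inode) last changed, which is not always the time the file was created.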

Examples

created_at('foo.zip') # => Sat Apr 09 10:50:01 -0400 2011


# File 'lib/maid/tools.rb', line 508

def created_at(path)
  File.ctime(expand(path))
end

#dimensions_px(path) ⇒ Object

Determine the dimensions of GIF, PNG, JPEG, or TIFF images.

Value returned is [width, height].

Examples

dimensions_px('image.jpg') # => [1024, 768]
width, height = dimensions_px('image.jpg')
dimensions_px('image.jpg').join('x') # => "1024x768"


# File 'lib/maid/tools.rb', line 442

def dimensions_px(path)
  Dimensions.dimensions(path)
end

#dir(globs) ⇒ Object

Give all files matching the given glob.

Note that the globs are not regexps (they're closer to shell globs). However, some regexp-like notation can be used, e.g. ?, [a-z], {tgz,zip}. For more details, see Ruby's documentation on Dir.glob.

The matches are sorted lexically to aid in readability when using --dry-run.

Examples

Single glob:

dir('~/Downloads/*.zip')

Specifying multiple extensions succinctly:

dir('~/Downloads/*.{exe,deb,dmg,pkg,rpm}')

Multiple globs (all are equivalent):

dir(['~/Downloads/*.zip', '~/Dropbox/*.zip'])
dir(%w(~/Downloads/*.zip ~/Dropbox/*.zip))
dir('~/{Downloads,Dropbox}/*.zip')

Recursing into subdirectories (see also: find):

dir('~/Music/**/*.m4a')


# File 'lib/maid/tools.rb', line 240

def dir(globs)
  expand_all(globs).
    map { |glob| Dir.glob(glob) }.
    flatten.
    sort
end
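
The glob behavior described above (brace alternatives, recursion with `**`, sorted results) can be reproduced with Ruby's Dir.glob alone. A minimal sketch follows; `dir_sketch` is a hypothetical stand-in, with File.expand_path assumed in place of Maid's own expansion helper.

```ruby
# Hypothetical stand-in for dir: expands each glob, collects matches, sorts them.
def dir_sketch(globs)
  Array(globs)
    .map { |glob| File.expand_path(glob) }
    .flat_map { |glob| Dir.glob(glob) }
    .sort
end
```

Wrapping the argument in `Array()` is what allows a single string and an array of strings to be handled the same way.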

#dir_safe(globs) ⇒ Object

Same as dir, but excludes files that are (possibly) being downloaded.

Example

Move Debian/Ubuntu packages that are finished downloading into a software directory.

move dir_safe('~/Downloads/*.deb'), '~/Archive/Software'


# File 'lib/maid/tools.rb', line 256

def dir_safe(globs)
  dir(globs).
    reject { |path| downloading?(path) }
end

#disk_usage(path) ⇒ Object

Calculate disk usage of a given path in kilobytes.

See also: Maid::NumericExtensions::SizeToKb.

Examples

disk_usage('foo.zip') # => 136


# File 'lib/maid/tools.rb', line 489

def disk_usage(path)
  raw = cmd("du -s #{ sh_escape(path) }")
  # FIXME: This reports in kilobytes, but should probably report in bytes.
  usage_kb = raw.split(/\s+/).first.to_i

  if usage_kb.zero?
    raise "Stopping pessimistically because of unexpected value from du (#{ raw.inspect })"
  else
    usage_kb
  end
end

#downloaded_from(path) ⇒ Object

[Mac OS X] Use Spotlight metadata to determine the site from which a file was downloaded.

Examples

downloaded_from('foo.zip') # => ['http://www.site.com/foo.zip', 'http://www.site.com/']


# File 'lib/maid/tools.rb', line 355

def downloaded_from(path)
  mdls_to_array(path, 'kMDItemWhereFroms')
end

#downloading?(path) ⇒ Boolean

Detect whether the path is currently being downloaded in Chrome, Firefox or Safari.

See also: dir_safe



# File 'lib/maid/tools.rb', line 362

def downloading?(path)
  !!(chrome_downloading?(path) || firefox_downloading?(path) || safari_downloading?(path))
end

#dupes_in(globs) ⇒ Object

Find all duplicate files in the given globs.

More often than not, you'll want to use newest_dupes_in or verbose_dupes_in instead of using this method directly.

Globs are expanded as in dir, then all non-files are filtered out. The remaining files are compared by size, and non-dupes are filtered out. The remaining candidates are then compared by checksum. Dupes are returned as an array of arrays.

Examples

dupes_in('~/{Downloads,Desktop}/*') # => [
                                    #      ['~/Downloads/foo.zip', '~/Downloads/foo (1).zip'],
                                    #      ['~/Desktop/bar.txt', '~/Desktop/bar copy.txt']
                                    #    ]

Keep the newest dupe:

dupes_in(['~/Desktop/*', '~/Downloads/*']).each do |dupes|
  trash dupes.sort_by { |p| File.mtime(p) }[0..-2]
end


# File 'lib/maid/tools.rb', line 389

def dupes_in(globs)
  dupes = []
  files(globs).                           # Start by filtering out non-files
    group_by { |f| size_of(f) }.          # ... then grouping by size, since that's fast
    reject { |s, p| p.length < 2 }.       # ... and filter out any non-dupes
    map do |size, candidates|
      dupes += candidates.
        group_by { |p| checksum_of(p) }.  # Now group our candidates by a slower checksum calculation
        reject { |c, p| p.length < 2 }.   # ... and filter out any non-dupes
        values
    end
  dupes
end
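
The size-then-checksum strategy described above can be written without side effects inside the iteration. This is a sketch under stated assumptions, not the author's code: `dupes_sketch` is a hypothetical name, it takes already-expanded paths rather than globs, and it uses Digest::SHA1.file for the checksum step.

```ruby
require 'digest'

# Hypothetical helper: groups regular files by size (cheap), then groups each
# same-size bucket by SHA1 checksum (slower), keeping only groups of 2 or more.
def dupes_sketch(paths)
  paths
    .select { |path| File.file?(path) }
    .group_by { |path| File.size(path) }
    .values
    .select { |same_size| same_size.length > 1 }
    .flat_map do |candidates|
      candidates
        .group_by { |path| Digest::SHA1.file(path).hexdigest }
        .values
        .select { |same_sum| same_sum.length > 1 }
    end
end
```

Grouping by size first means checksums are only computed for files that could plausibly be duplicates.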

#duration_s(path) ⇒ Object

[Mac OS X] Use Spotlight metadata to determine audio length.

Examples

duration_s('foo.mp3') # => 235.705


# File 'lib/maid/tools.rb', line 466

def duration_s(path)
  cmd("mdls -raw -name kMDItemDurationSeconds #{ sh_escape(path) }").to_f
end

#escape_glob(glob) ⇒ Object

Escape characters that have special meaning as part of path glob patterns.

Useful when using dir with file names that may contain { } [ ] characters.

Example

escape_glob('test [tmp]') # => 'test \\[tmp\\]'


# File 'lib/maid/tools.rb', line 277

def escape_glob(glob)
  glob.gsub(/[\{\}\[\]]/) { |s| '\\' + s }
end
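
To see why escaping matters, consider a file whose name contains brackets: unescaped, `[tmp]` is read as a character class and the file is not matched. The sketch below assumes a Unix-like system (where backslash escapes glob characters); `escape_glob_sketch` repeats the substitution above and `bracket_demo` is a hypothetical helper.

```ruby
require 'tmpdir'
require 'fileutils'

# Same substitution as escape_glob above, repeated so the sketch is self-contained.
def escape_glob_sketch(glob)
  glob.gsub(/[\{\}\[\]]/) { |s| '\\' + s }
end

# Returns [names matched without escaping, names matched with escaping]
# for a file whose name contains brackets.
def bracket_demo
  Dir.mktmpdir do |tmp|
    name = 'test [tmp].txt'
    FileUtils.touch(File.join(tmp, name))
    unescaped = Dir.glob(File.join(tmp, name)).map { |p| File.basename(p) }
    escaped = Dir.glob(File.join(tmp, escape_glob_sketch(name))).map { |p| File.basename(p) }
    [unescaped, escaped]
  end
end
```

Only the file name is escaped here; the directory part is left alone so that any intended glob syntax elsewhere in the pattern keeps working.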

#files(globs) ⇒ Object

Give only files matching the given glob.

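This is the same as dir but only includes actual files (no directories). Note that the underlying File.file? check follows symlinks, so a symlink that points at a regular file is included.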



# File 'lib/maid/tools.rb', line 265

def files(globs)
  dir(globs).
    select { |f| File.file?(f) }
end

#find(path, &block) ⇒ Object

Find matching files, akin to the Unix utility find.

If no block is given, it will return an array. Otherwise, it acts like Find.find.

Examples

Without a block:

find('~/Downloads/') # => [...]

Recursing and filtering using a regular expression:

find('~/Downloads/').grep(/\.pdf$/)

(Note: It's just Ruby, so any methods in Array and Enumerable can be used.)

Recursing with a block:

find('~/Downloads/') do |path|
  # ...
end


# File 'lib/maid/tools.rb', line 329

def find(path, &block)
  expanded_path = expand(path)

  if block.nil?
    Find.find(expanded_path).to_a
  else
    Find.find(expanded_path, &block)
  end
end
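
The recurse-and-filter pattern from the examples can be tried with the standard `find` library directly. A minimal sketch, where `find_matching` is a hypothetical helper and File.expand_path is assumed in place of Maid's expansion:

```ruby
require 'find'

# Hypothetical helper: walks root recursively and keeps the paths that match
# the given regular expression, sorted for stable output.
def find_matching(root, pattern)
  Find.find(File.expand_path(root)).grep(pattern).sort
end
```

Called without a block, Find.find returns an enumerator, which is why Enumerable methods such as `grep` can be chained onto it.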

#git_piston(path) ⇒ Object

Deprecated.

Pull and push the git repository at the given path.

Since this is deprecated, you might also be interested in SparkleShare, a great git-based file synchronization project.

Examples

git_piston('~/code/projectname')


# File 'lib/maid/tools.rb', line 571

def git_piston(path)
  full_path = expand(path)
  stdout = cmd("cd #{ sh_escape(full_path) } && git pull && git push 2>&1")
  log("Fired git piston on #{ sh_escape(full_path) }.  STDOUT:\n\n#{ stdout }")
end

#ignore_child_dirs(arr) ⇒ Object

Given an array of directories, return a new array without any child directories whose parent is already present in that array.

Example

ignore_child_dirs(["foo", "foo/a", "foo/b", "bar"]) # => ["foo", "bar"]


# File 'lib/maid/tools.rb', line 749

def ignore_child_dirs(arr)
  arr.sort { |x, y|
    y.count('/') - x.count('/')
  }.select { |d|
    !arr.include?(File.dirname(d))
  }
end

#last_accessed(path) ⇒ Object

Deprecated.

Alias of accessed_at.



# File 'lib/maid/tools.rb', line 526

def last_accessed(path)
  # Not a normal `alias` so the deprecation notice shows in the docs.
  accessed_at(path)
end

#locate(name) ⇒ Object

[Mac OS X] Use Spotlight to locate all files matching the given filename.

[Ubuntu] Use locate to locate all files matching the given filename.

Examples

locate('foo.zip') # => ['/a/foo.zip', '/b/foo.zip']


# File 'lib/maid/tools.rb', line 346

def locate(name)
  cmd("#{Maid::Platform::Commands.locate} #{ sh_escape(name) }").split("\n")
end

#location_city(path) ⇒ Object

Determine the city of the given JPEG image.

Examples

location_city('old_capitol.jpg') # => "Iowa City, IA, US"


# File 'lib/maid/tools.rb', line 451

def location_city(path)
  case mime_type(path)
  when 'image/jpeg'
    gps = EXIFR::JPEG.new(path).gps
    coordinates_string = [gps.latitude, gps.longitude]
    location = Geocoder.search(coordinates_string).first
    [location.city, location.province, location.country_code].join(', ')
  end
end

#media_type(path) ⇒ Object

Get the Internet media type of the file.

In other words, the first part of mime_type.

Examples

media_type('bar.jpg') # => "image"


# File 'lib/maid/tools.rb', line 671

def media_type(path)
  type = MIME::Types.type_for(path)[0]

  if type
    type.media_type
  end
end

#mime_type(path) ⇒ Object

Get the MIME type of the file.

Examples

mime_type('bar.jpg') # => "image/jpeg"


# File 'lib/maid/tools.rb', line 656

def mime_type(path)
  type = MIME::Types.type_for(path)[0]

  if type
    [type.media_type, type.sub_type].join('/')
  end
end

#mkdir(path, options = {}) ⇒ Object

Create a directory and all of its parent directories.

The path of the created directory is returned, which allows for chaining (see examples).

Options

:mode

The symbolic and absolute mode can both be used, for example: 0700, 'u=wr,go=rr'

Examples

Creating a directory with a specific mode:

mkdir('~/Music/Pink Floyd/', :mode => 0644)

Ensuring a directory exists when moving:

move(dir('~/Downloads/Pink Floyd*.mp3'), mkdir('~/Music/Pink Floyd/'))


# File 'lib/maid/tools.rb', line 300

def mkdir(path, options = {})
  path = expand(path)
  log("mkdir -p #{ sh_escape(path) }")
  FileUtils.mkdir_p(path, @file_options.merge(options))
  path
end

#modified_at(path) ⇒ Object

Get the modification time of a file.

In Unix speak, mtime.

Examples

modified_at('foo.zip') # => Sat Apr 09 10:50:01 -0400 2011


# File 'lib/maid/tools.rb', line 539

def modified_at(path)
  File.mtime(expand(path))
end

#move(sources, destination) ⇒ Object

Move sources to a destination directory.

Movement is only allowed to directories that already exist. If your intention is to rename, see the rename method.

Examples

Single path:

move('~/Downloads/foo.zip', '~/Archive/Software/Mac OS X/')

Multiple paths:

move(['~/Downloads/foo.zip', '~/Downloads/bar.zip'], '~/Archive/Software/Mac OS X/')
move(dir('~/Downloads/*.zip'), '~/Archive/Software/Mac OS X/')


# File 'lib/maid/tools.rb', line 39

def move(sources, destination)
  destination = expand(destination)

  if File.directory?(destination)
    expand_all(sources).each do |source|
      log("move #{ sh_escape(source) } #{ sh_escape(destination) }")
      FileUtils.mv(source, destination, @file_options)
    end
  else
    # Unix `mv` warns about the target not being a directory with multiple sources.  Maid checks the same.
    warn("skipping move because #{ sh_escape(destination) } is not a directory (use 'mkdir' to create first, or use 'rename')")
  end
end

#newest_dupes_in(globs) ⇒ Object

Convenience method that is like dupes_in but excludes the oldest dupe.

Example

Keep the oldest dupe (trash the others):

trash newest_dupes_in('~/Downloads/*')


# File 'lib/maid/tools.rb', line 411

def newest_dupes_in(globs)
  dupes_in(globs).
    map { |dupes| dupes.sort_by { |p| File.mtime(p) }[1..-1] }.
    flatten
end

#remove(paths, options = {}) ⇒ Object

Delete the files at the given path recursively.

NOTE: In most cases, trash is a safer choice, since the files will be recoverable by retrieving them from the trash. Once you delete a file using remove, it's gone! Please use trash whenever possible and only use remove when necessary.

Options

:force => boolean

Force deletion (no error is raised if the file does not exist).

:secure => boolean

Infrequently needed. See FileUtils.remove_entry_secure

Examples

Single path:

remove('~/Downloads/foo.zip')

Multiple paths:

remove(['~/Downloads/foo.zip', '~/Downloads/bar.zip'])
remove(dir('~/Downloads/*.zip'))


# File 'lib/maid/tools.rb', line 204

def remove(paths, options = {})
  expand_all(paths).each do |path|
    options = @file_options.merge(options)

    log("Removing #{ sh_escape(path) }")
    FileUtils.rm_r(path, options)
  end
end

#rename(source, destination) ⇒ Object

Rename a single file.

Any directories needed in order to complete the rename are made automatically.

Overwriting is not allowed; it logs a warning. If overwriting is desired, use remove to delete the file first, then use rename.

Examples

Simple rename:

rename('foo.zip', 'baz.zip') # "foo.zip" becomes "baz.zip"

Rename needing directories:

rename('foo.zip', 'bar/baz.zip') # "bar" is created, "foo.zip" becomes "baz.zip" within "bar"

Attempting to overwrite:

rename('foo.zip', 'existing.zip') # "skipping rename of..."


# File 'lib/maid/tools.rb', line 72

def rename(source, destination)
  source = expand(source)
  destination = expand(destination)

  mkdir(File.dirname(destination))

  if File.exist?(destination)
    warn("skipping rename of #{ sh_escape(source) } to #{ sh_escape(destination) } because it would overwrite")
  else
    log("rename #{ sh_escape(source) } #{ sh_escape(destination) }")
    FileUtils.mv(source, destination, @file_options)
  end
end
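
The three documented behaviors (create missing directories, rename, refuse to overwrite) can be exercised with FileUtils alone. This is a sketch, not Maid's code: `rename_sketch` is a hypothetical helper that returns a symbol where Maid would log or warn.

```ruby
require 'fileutils'

# Hypothetical helper mirroring the documented behavior: creates missing parent
# directories, refuses to overwrite, and returns :renamed or :skipped.
def rename_sketch(source, destination)
  FileUtils.mkdir_p(File.dirname(destination))
  return :skipped if File.exist?(destination)

  FileUtils.mv(source, destination)
  :renamed
end
```

As in the method above, parent directories are created before the overwrite check, so they exist afterwards even when the rename itself is skipped.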

#size_of(path) ⇒ Object

Get the size of a file.

Examples

size_of('foo.zip') # => 2193


# File 'lib/maid/tools.rb', line 548

def size_of(path)
  File.size(path)
end

#spotlight_content_types(path) ⇒ Object

[Mac OS X] Use Spotlight metadata to determine which content types a file has.

Examples

spotlight_content_types('foo.zip') # => ['public.zip-archive', 'public.archive']


# File 'lib/maid/tools.rb', line 635

def spotlight_content_types(path)
  mdls_to_array(path, 'kMDItemContentTypeTree')
end

#sync(from, to, options = {}) ⇒ Object

Simple sync of two files/folders using rsync.

The host OS must provide rsync. See the rsync man page for a detailed description.

man rsync

Options

:delete => boolean
:verbose => boolean
:archive => boolean (default true)
:update => boolean (default true)
:exclude => string or array of strings
:prune_empty => boolean

Examples

Syncing a directory to a backup:

sync('~/music', '/backup/music')

Excluding a path:

sync('~/code', '/backup/code', :exclude => '.git')

Excluding multiple paths:

sync('~/code', '/backup/code', :exclude => ['.git', '.rvmrc'])


# File 'lib/maid/tools.rb', line 607

def sync(from, to, options = {})
  # expand removes trailing slash
  # cannot use str[-1] due to ruby 1.8.7 restriction
  from = expand(from) + (from.end_with?('/') ? '/' : '')
  to = expand(to) + (to.end_with?('/') ? '/' : '')
  # default options
  options = { :archive => true, :update => true }.merge(options)
  ops = []
  ops << '-a' if options[:archive]
  ops << '-v' if options[:verbose]
  ops << '-u' if options[:update]
  ops << '-m' if options[:prune_empty]
  ops << '-n' if @file_options[:noop]

  Array(options[:exclude]).each do |path|
    ops << "--exclude=#{ sh_escape(path) }"
  end

  ops << '--delete' if options[:delete]
  stdout = cmd("rsync #{ ops.join(' ') } #{ sh_escape(from) } #{ sh_escape(to) } 2>&1")
  log("Fired sync from #{ sh_escape(from) } to #{ sh_escape(to) }.  STDOUT:\n\n#{ stdout }")
end
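
How the options map onto rsync flags can be checked without running rsync. The sketch below isolates that mapping; `rsync_flags` is a hypothetical helper, Shellwords.escape is assumed as a stand-in for Maid's `sh_escape`, and the dry-run flag (`-n`) is left out because it depends on Maid's internal state.

```ruby
require 'shellwords'

# Hypothetical helper: turns the documented options into a list of rsync flags.
# Shellwords.escape stands in for Maid's own sh_escape helper here.
def rsync_flags(options = {})
  options = { archive: true, update: true }.merge(options)
  flags = []
  flags << '-a' if options[:archive]
  flags << '-v' if options[:verbose]
  flags << '-u' if options[:update]
  flags << '-m' if options[:prune_empty]
  Array(options[:exclude]).each { |path| flags << "--exclude=#{Shellwords.escape(path)}" }
  flags << '--delete' if options[:delete]
  flags
end
```

Because defaults are merged first, passing `archive: false` or `update: false` is the only way to drop `-a` or `-u`.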

#trash(paths, options = {}) ⇒ Object

Move the given paths to the user's trash.

The path is still moved if a file already exists in the trash with the same name. However, the current date and time is appended to the filename.

Note: the OS-native "restore" or "put back" functionality for trashed files is not currently supported. (See issue #63.) However, files can be restored manually, and the Maid log can help with this.

Options

:remove_over => Fixnum (e.g. 1.gigabyte, 1024.megabytes)

Delete files over the given size rather than moving to the trash.

See also Maid::NumericExtensions::SizeToKb

Examples

Single path:

trash('~/Downloads/foo.zip')

Multiple paths:

trash(['~/Downloads/foo.zip', '~/Downloads/bar.zip'])
trash(dir('~/Downloads/*.zip'))


# File 'lib/maid/tools.rb', line 113

def trash(paths, options = {})
  # ## Implementation Notes
  #
  # Trashing files correctly is surprisingly hard.  What Maid ends up doing is one of the easiest, most foolproof
  # solutions:  moving the file.
  #
  # Unfortunately, that means it's not possible to restore files automatically in OSX or Ubuntu.  The previous location
  # of the file is lost.
  #
  # OSX support depends on AppleScript or would require a not-yet-written C extension to interface with the OS.  The
  # AppleScript solution is less than ideal: the user has to be logged in, Finder has to be running, and it makes the
  # "trash can sound" every time a file is moved.
  #
  # Ubuntu makes it easy to implement, and there's a Python library for doing so (see `trash-cli`).  However, there's
  # not a Ruby equivalent yet.

  expand_all(paths).each do |path|
    target = File.join(@trash_path, File.basename(path))
    safe_trash_path = File.join(@trash_path, "#{ File.basename(path) } #{ Time.now.strftime('%Y-%m-%d-%H-%M-%S') }")

    if options[:remove_over] &&
        File.exist?(path) &&
        disk_usage(path) > options[:remove_over]
      remove(path)
    end

    if File.exist?(path)
      if File.exist?(target)
        rename(path, safe_trash_path)
      else
        move(path, @trash_path)
      end
    end
  end
end
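
The collision rule (append the current date and time when the name is already taken in the trash) can be isolated as a pure naming decision. A minimal sketch, where `trash_target` is a hypothetical helper and the clock is injectable so the result is predictable:

```ruby
# Hypothetical helper: picks the path a file would get inside trash_dir,
# appending a timestamp when the plain name is already taken.
def trash_target(trash_dir, path, now = Time.now)
  target = File.join(trash_dir, File.basename(path))
  return target unless File.exist?(target)

  File.join(trash_dir, "#{File.basename(path)} #{now.strftime('%Y-%m-%d-%H-%M-%S')}")
end
```

Note that the timestamp is appended after the extension, matching the format string used in the method above.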

#tree_empty?(root) ⇒ Boolean

Test whether a directory is either empty, or contains only empty directories/subdirectories.

Example

if tree_empty?(dir('~/Downloads/foo').first)
  trash('~/Downloads/foo')
end


# File 'lib/maid/tools.rb', line 714

def tree_empty?(root)
  return nil if File.file?(root)
  return true if Dir.glob(root + '/*').length == 0

  ignore = []

  # Look for files.
  return false if Dir.glob(root + '/*').select { |f| File.file?(f) }.length > 0

  empty_dirs = Dir.glob(root + '/**/*').select { |d|
    File.directory?(d)
  }.reverse.select { |d|
    # `.reverse` sorts deeper directories first.

    # If the directory is empty, its parent should ignore it.
    should_ignore = Dir.glob(d + '/*').select { |n|
      !ignore.include?(n)
    }.length == 0

    ignore << d if should_ignore

    should_ignore
  }

  Dir.glob(root + '/*').select { |n|
    !empty_dirs.include?(n)
  }.length == 0
end
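
The same question can be phrased more compactly: a tree is empty when nothing beneath it is a non-directory. The following is a sketch of that formulation, offered as an assumption about equivalent behavior rather than a drop-in replacement; `tree_empty_sketch` is a hypothetical name.

```ruby
# Hypothetical compact variant: a directory tree counts as empty when nothing
# below it is a non-directory. Like the method above, it returns nil for a
# plain file and does not look at dotfiles.
def tree_empty_sketch(root)
  return nil if File.file?(root)

  Dir.glob(File.join(root, '**', '*')).all? { |entry| File.directory?(entry) }
end
```

A directory with no entries at all yields an empty glob result, for which `all?` is true.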

#verbose_dupes_in(globs) ⇒ Object

Convenience method for dupes_in that excludes the dupe with the shortest name.

This is ideal for dupes like foo.zip, foo (1).zip, foo copy.zip.

Example

Keep the dupe with the shortest name (trash the others):

trash verbose_dupes_in('~/Downloads/*')


# File 'lib/maid/tools.rb', line 427

def verbose_dupes_in(globs)
  dupes_in(globs).
    map { |dupes| dupes.sort_by { |p| File.basename(p).length }[1..-1] }.
    flatten
end
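
The selection rule for a single group of duplicates needs no file system access, so it is easy to check by hand. A minimal sketch, with `all_but_shortest_name` as a hypothetical helper and made-up paths:

```ruby
# Hypothetical helper: given one group of duplicate paths, returns every path
# except the one with the shortest basename.
def all_but_shortest_name(dupes)
  dupes.sort_by { |path| File.basename(path).length }[1..-1]
end

group = ['/d/foo copy.zip', '/d/foo.zip', '/d/foo (1).zip']
extras = all_but_shortest_name(group)
# extras holds '/d/foo (1).zip' and '/d/foo copy.zip'; '/d/foo.zip' is kept.
```

Only the basename length is compared, so the directory a file lives in does not affect which copy is kept.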

#where_content_type(paths, filter_types) ⇒ Object

Filter an array by content types.

Content types can be MIME types, Internet media types or Spotlight content types (OS X only).

If you need your rules to work on multiple platforms, it's recommended to avoid using Spotlight content types.

Examples

Using media types

where_content_type(dir('~/Downloads/*'), 'video')
where_content_type(dir('~/Downloads/*'), ['image', 'audio'])

Using MIME types

where_content_type(dir('~/Downloads/*'), 'image/jpeg')

Using Spotlight content types

Less portable, but richer data in some cases.

where_content_type(dir('~/Downloads/*'), 'public.image')


# File 'lib/maid/tools.rb', line 701

def where_content_type(paths, filter_types)
  filter_types = Array(filter_types)
  Array(paths).select { |p| !(filter_types & content_types(p)).empty? }
end
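
The filter works by array intersection: a path is kept when its content types share at least one entry with the requested types. The sketch below shows this with a hypothetical lookup table (`sample_types`) standing in for `content_types`, since the real lookup depends on external libraries and the platform.

```ruby
# Hypothetical lookup table standing in for content_types(path).
def sample_types
  {
    'foo.zip' => ['application/zip', 'application'],
    'bar.jpg' => ['image/jpeg', 'image'],
    'baz.mp3' => ['audio/mpeg', 'audio']
  }
end

# Keeps the paths whose content types share at least one entry with filter_types.
def where_type_sketch(paths, filter_types)
  wanted = Array(filter_types)
  Array(paths).select { |path| !(wanted & sample_types.fetch(path, [])).empty? }
end
```

Because both a full MIME type and its media type appear in the list for each path, the same call style works for `'image'` and `'image/jpeg'`.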

#zipfile_contents(path) ⇒ Object

List the contents of a zip file.

Examples

zipfile_contents('foo.zip') # => ['foo.exe', 'README.txt', 'subdir/anything.txt']


# File 'lib/maid/tools.rb', line 475

def zipfile_contents(path)
  # It might be nice to use `glob` from `Zip::FileSystem`, but it seems buggy.  (Subdirectories aren't included.)
  Zip::File.open(path) do |zip_file|
    zip_file.entries.map { |entry| entry.name }.sort
  end
end