Class: Traject::LineWriter

Inherits:
Object
Defined in:
lib/traject/line_writer.rb

Overview

A writer for Traject::Indexer that writes all output as serialized text, using #puts.

Should be thread-safe (i.e., multiple worker threads can call #put concurrently): the write to the actual output file is wrapped in a mutex synchronize. This does not seem to affect performance much, as far as I could tell from benchmarking.
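The following is a minimal sketch of the mutex-around-write pattern described above, not Traject's own code. The class name SynchronizedLineSink is hypothetical, and a StringIO stands in for the output file so the result can be inspected.

```ruby
require "stringio"

# Hypothetical stand-in, not Traject's own class: it mirrors only the
# mutex-around-puts pattern described in the overview.
class SynchronizedLineSink
  def initialize(io)
    @io = io
    @write_mutex = Mutex.new
  end

  # Only the write itself is serialized; callers can prepare data in parallel.
  def put(line)
    @write_mutex.synchronize { @io.puts(line) }
  end
end

buffer = StringIO.new
sink = SynchronizedLineSink.new(buffer)

# Four worker threads each write 250 lines concurrently.
threads = 4.times.map do |worker|
  Thread.new do
    250.times { |i| sink.put("worker=#{worker} line=#{i}") }
  end
end
threads.each(&:join)

lines = buffer.string.lines
puts "lines written: #{lines.size}"
```

Because each puts call runs inside the mutex, every line should arrive whole, although the order of lines across workers is not defined.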

This class can be subclassed to write out different serialized representations; subclasses just override the #serialize method. For an example, see JsonWriter.
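As an illustration of the subclassing approach, here is a sketch that runs without the traject gem. BaseLineWriter, FakeContext, and TabSeparatedWriter are hypothetical stand-ins: the base class imitates only the #put and #serialize behavior shown on this page, and with the real library you would presumably inherit from Traject::LineWriter instead.

```ruby
require "stringio"

# Hypothetical stand-in for the context object; it mimics only the
# #output_hash reader that #serialize uses.
FakeContext = Struct.new(:output_hash)

# Hypothetical stand-in for the base writer, reduced to #serialize and #put.
class BaseLineWriter
  def initialize(settings)
    @output = settings["output_stream"] || $stdout
    @write_mutex = Mutex.new
  end

  def serialize(context)
    context.output_hash
  end

  def put(context)
    serialized = serialize(context)
    @write_mutex.synchronize { @output.puts(serialized) }
  end
end

# A subclass changes only the serialized representation.
class TabSeparatedWriter < BaseLineWriter
  def serialize(context)
    context.output_hash.map do |field, values|
      "#{field}=#{Array(values).join('|')}"
    end.join("\t")
  end
end

out = StringIO.new
writer = TabSeparatedWriter.new("output_stream" => out)
writer.put(FakeContext.new({ "id" => ["123"], "title" => ["A", "B"] }))
```

The subclass inherits output selection and locking unchanged, and supplies only the string that ends up on each line.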

Output

The main functionality this class provides is the logic for choosing, based on settings, which file or byte stream to send output to.

You can supply settings["output_file"] with a file path, and LineWriter will open a File to write to.

Or you can supply settings["output_stream"] with any Ruby IO object, such as an open File object.

If neither is supplied, output is written to $stdout.
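The selection order can be sketched as a standalone helper. The method name choose_output is hypothetical. The precedence, with "output_file" checked before "output_stream", follows the #open_output_file source listed further down this page.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper mirroring the documented precedence:
# "output_file" first, then "output_stream", then $stdout.
def choose_output(settings)
  if settings["output_file"]
    File.open(settings["output_file"], "w:UTF-8")
  elsif settings["output_stream"]
    settings["output_stream"]
  else
    $stdout
  end
end

stream = StringIO.new
tmp = Tempfile.new("line_writer_sketch")

# When both settings are present, the file path wins.
file_out    = choose_output({ "output_file" => tmp.path, "output_stream" => stream })
stream_out  = choose_output({ "output_stream" => stream })
default_out = choose_output({})

file_out.close
tmp.close! # close and delete the temporary file
```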

Closing the output stream

The LineWriter tries to guess whether it should call close on the output stream it is writing to when the LineWriter instance is closed. For instance, if you passed in settings["output_file"] with a path, and the LineWriter opened a File object for you, it should close that File for you.

But for historical reasons, LineWriter doesn't rely on that signal alone; it tries to guess more generally when to call close. If it gets this wrong, set settings["close_output_on_close"] to true or false. (The strings "true" and "false" are also accepted, for convenience when setting options on the command line.)
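To show how boolean and string values are treated alike, here is a small sketch of the explicit-setting branch only. The helper name explicit_close_setting is hypothetical. It returns nil when the setting is absent, which is the case where the real class falls back to guessing.

```ruby
# Hypothetical helper: mirrors only the explicit-setting comparison
# (value.to_s == "true") from the #should_close_stream? source on this page.
def explicit_close_setting(settings)
  value = settings["close_output_on_close"]
  return nil if value.nil? # no explicit setting; the writer would guess

  value.to_s == "true"
end

from_boolean = explicit_close_setting({ "close_output_on_close" => true })
from_string  = explicit_close_setting({ "close_output_on_close" => "true" })
from_false   = explicit_close_setting({ "close_output_on_close" => "false" })
unset        = explicit_close_setting({})
```

Under this comparison, any value other than true or "true" (for example "yes") counts as false.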

Direct Known Subclasses

DebugWriter, DelimitedWriter, JsonWriter, YamlWriter

Constructor Details

#initialize(argSettings) ⇒ LineWriter

Returns a new instance of LineWriter.



# File 'lib/traject/line_writer.rb', line 44

def initialize(argSettings)
  @settings     = argSettings
  @write_mutex  = Mutex.new

  # trigger lazy loading now for thread-safety
  @output_file = open_output_file
end

Instance Attribute Details

#output_file ⇒ Object (readonly)

Returns the value of attribute output_file.



# File 'lib/traject/line_writer.rb', line 42

def output_file
  @output_file
end

#settings ⇒ Object (readonly)

Returns the value of attribute settings.



# File 'lib/traject/line_writer.rb', line 41

def settings
  @settings
end

#write_mutex ⇒ Object (readonly)

Returns the value of attribute write_mutex.



# File 'lib/traject/line_writer.rb', line 42

def write_mutex
  @write_mutex
end

Instance Method Details

#_write(data) ⇒ Object



# File 'lib/traject/line_writer.rb', line 52

def _write(data)
  output_file.puts(data)
end

#close ⇒ Object



# File 'lib/traject/line_writer.rb', line 82

def close
  @output_file.close if should_close_stream?
end

#open_output_file ⇒ Object



# File 'lib/traject/line_writer.rb', line 68

def open_output_file
  unless defined? @output_file
    of =
      if settings["output_file"]
        File.open(settings["output_file"], 'w:UTF-8')
      elsif settings["output_stream"]
        settings["output_stream"]
      else
        $stdout
      end
  end
  return of
end

#put(context) ⇒ Object



# File 'lib/traject/line_writer.rb', line 61

def put(context)
  serialized = serialize(context)
  write_mutex.synchronize do
    _write(serialized)
  end
end

#serialize(context) ⇒ Object



# File 'lib/traject/line_writer.rb', line 57

def serialize(context)
  context.output_hash
end

#should_close_stream? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/traject/line_writer.rb', line 86

def should_close_stream?
  if settings["close_output_on_close"].nil?
    # No explicit setting: guess. Never close a missing stream, a TTY, $stdout, or $stderr.
    !(@output_file.nil? || @output_file.tty? || @output_file == $stdout || @output_file == $stderr)
  else
    settings["close_output_on_close"].to_s == "true"
  end
end