Method: Elasticsearch::API::Actions#termvectors

Defined in:
lib/elasticsearch/api/actions/termvectors.rb

#termvectors(arguments = {}) ⇒ Object

Get term vector information. Get information and statistics about terms in the fields of a particular document. You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request. You can specify the fields you are interested in through the `fields` parameter or by adding the fields to the request body. For example:

    GET /my-index-000001/_termvectors/1?fields=message

Fields can be specified using wildcards, similar to the multi match query. Term vectors are real-time by default, not near real-time. This can be changed by setting the `realtime` parameter to `false`.

You can request three types of values: _term information_, _term statistics_, and _field statistics_. By default, all term information and field statistics are returned for all fields, but term statistics are excluded.

**Term information**

  • term frequency in the field (always returned)

  • term positions (`positions: true`)

  • start and end offsets (`offsets: true`)

  • term payloads (`payloads: true`), as base64 encoded bytes

If the requested information wasn’t stored in the index, it is computed on the fly if possible. Term vectors can also be computed for documents that do not exist in the index but are instead provided by the user.
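As a rough illustration of the two cases described above (a stored document versus an artificial one), the sketch below builds the arguments hash that would be passed to `#termvectors`. The helper name `termvectors_args` is hypothetical and not part of the client; the `doc` body key is how the term vectors API accepts an artificial document. The final client call is shown only as a comment because it assumes an already configured `client`.

```ruby
# Hypothetical helper (not part of the library): assembles the arguments
# hash for #termvectors from a few named inputs.
def termvectors_args(index:, id: nil, doc: nil, fields: nil, **flags)
  args = { index: index }
  args[:id] = id if id
  args[:fields] = fields if fields
  # An artificial document is sent in the request body under the `doc` key.
  args[:body] = { doc: doc } if doc
  # Remaining flags (positions, offsets, term_statistics, ...) pass through.
  args.merge(flags)
end

# Term vectors for a stored document, also asking for term statistics.
stored = termvectors_args(
  index: 'my-index-000001', id: '1', fields: 'message', term_statistics: true
)

# Term vectors for an artificial document that is not in the index.
artificial = termvectors_args(
  index: 'my-index-000001', doc: { message: 'hello world' }
)

# With a configured client (assumed), the call would look like:
#   client.termvectors(**stored)
```

Because the artificial request carries a body, the method shown further down would send it as a POST; the stored-document request has no body and would be sent as a GET.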

Parameters:

  • arguments (Hash) (defaults to: {})

    a customizable set of options

Options Hash (arguments):

  • :index (String)

    The name of the index that contains the document. (Required)

  • :id (String)

    A unique identifier for the document.

  • :fields (String, Array<String>)

    A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the `completion_fields` or `fielddata_fields` parameters.

  • :field_statistics (Boolean)

    If `true`, the response includes:

    • The document count (how many documents contain this field).

    • The sum of document frequencies (the sum of document frequencies for all terms in this field).

    • The sum of total term frequencies (the sum of total term frequencies of each term in this field).

    Server default: true.

  • :offsets (Boolean)

    If `true`, the response includes term offsets. Server default: true.

  • :payloads (Boolean)

    If `true`, the response includes term payloads. Server default: true.

  • :positions (Boolean)

    If `true`, the response includes term positions. Server default: true.

  • :preference (String)

    The node or shard the operation should be performed on. It is random by default.

  • :realtime (Boolean)

    If `true`, the request is real-time as opposed to near-real-time. Server default: true.

  • :routing (String)

    A custom value that is used to route operations to a specific shard.

  • :term_statistics (Boolean)

    If `true`, the response includes:

    • The total term frequency (how often a term occurs in all documents).

    • The document frequency (the number of documents containing the current term).

    By default, these values are not returned because term statistics can have a serious performance impact.

  • :version (Integer)

    If `true`, returns the document version as part of a hit.

  • :version_type (String)

    The version type.

  • :error_trace (Boolean)

    When set to `true`, Elasticsearch includes the full stack trace of errors when they occur.

  • :filter_path (String, Array<String>)

    A comma-separated list of filters in dot notation that reduce the response returned by Elasticsearch.

  • :human (Boolean)

    When set to `true`, statistics are returned in a format suitable for humans. For example, `"exists_time": "1h"` for humans and `"exists_time_in_millis": 3600000` for computers. When disabled, the human-readable values are omitted, which makes sense for responses consumed only by machines.

  • :pretty (Boolean)

    If set to `true`, the returned JSON is "pretty-formatted". Use this option for debugging only.

  • :headers (Hash)

    Custom HTTP headers

  • :body (Hash)

    The request body.

Raises:

  • (ArgumentError)

# File 'lib/elasticsearch/api/actions/termvectors.rb', line 84

def termvectors(arguments = {})
  request_opts = { endpoint: arguments[:endpoint] || 'termvectors' }

  defined_params = [:index, :id].each_with_object({}) do |variable, set_variables|
    set_variables[variable] = arguments[variable] if arguments.key?(variable)
  end
  request_opts[:defined_params] = defined_params unless defined_params.empty?

  raise ArgumentError, "Required argument 'index' missing" unless arguments[:index]

  arguments = arguments.clone
  headers = arguments.delete(:headers) || {}

  body = arguments.delete(:body)

  _index = arguments.delete(:index)

  _id = arguments.delete(:id)

  method = if body
             Elasticsearch::API::HTTP_POST
           else
             Elasticsearch::API::HTTP_GET
           end

  arguments.delete(:endpoint)
  path = if _index && _id
           "#{Utils.listify(_index)}/_termvectors/#{Utils.listify(_id)}"
         else
           "#{Utils.listify(_index)}/_termvectors"
         end
  params = Utils.process_params(arguments)

  Elasticsearch::API::Response.new(
    perform_request(method, path, params, body, headers, request_opts)
  )
end
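
The routing decisions in the method above (POST when a body is present, GET otherwise; the document ID appended to the path only when given) can be sketched in isolation. This is a simplified stand-in, not the library code: `termvectors_route` is a hypothetical name, and `ERB::Util.url_encode` is used here as an assumed substitute for `Utils.listify`, whose exact escaping behaviour may differ.

```ruby
require 'erb'

# Simplified, hypothetical stand-in for the method/path selection
# performed by #termvectors. Returns [http_method, path].
def termvectors_route(arguments)
  raise ArgumentError, "Required argument 'index' missing" unless arguments[:index]

  # A request body switches the verb from GET to POST.
  http_method = arguments[:body] ? 'POST' : 'GET'

  index = ERB::Util.url_encode(arguments[:index].to_s)
  path = if arguments[:id]
           "#{index}/_termvectors/#{ERB::Util.url_encode(arguments[:id].to_s)}"
         else
           "#{index}/_termvectors"
         end

  [http_method, path]
end

# Stored document, no body: GET with the ID in the path.
get_route = termvectors_route(index: 'my-index-000001', id: '1')

# Artificial document in the body, no ID: POST to the index-level path.
post_route = termvectors_route(
  index: 'my-index-000001', body: { doc: { message: 'hello world' } }
)
```

Omitting `:index` raises `ArgumentError`, mirroring the guard in the original method.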