Module: Janeway::Functions

Included in:
Parser
Defined in:
lib/janeway/functions.rb,
lib/janeway/functions/count.rb,
lib/janeway/functions/match.rb,
lib/janeway/functions/value.rb,
lib/janeway/functions/length.rb,
lib/janeway/functions/search.rb

Overview

Parses JSONPath function calls and defines the code for the built-in JSONPath functions.

Instance Method Details

#parse_function_count ⇒ Object

The count() function extension provides a way to obtain the number of nodes in a nodelist and make that available for further processing in the filter expression.

Its only argument is a nodelist. The result is a value (an unsigned integer) that gives the number of nodes in the nodelist.

Notes:

* There is no deduplication of the nodelist.
* The number of nodes in the nodelist is counted independent of
  their values or any children they may have, e.g., the count of a
  non-empty singular nodelist such as count(@) is always 1.

Examples:

$[?count(@.*.author) >= 5]
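
The counting rule in the notes above can be illustrated with a minimal standalone sketch. The method name `count_nodes` is hypothetical and is not part of Janeway; it only mirrors the stated behaviour (no deduplication, and a single node counts as 1 regardless of its contents).

```ruby
# Hypothetical stand-in for the count() function body; not Janeway's API.
def count_nodes(node_list)
  node_list.is_a?(Array) ? node_list.size : 1
end

authors = ['Nigel Rees', 'Nigel Rees', 'Evelyn Waugh']
puts count_nodes(authors)           # duplicates are counted, not merged
puts count_nodes({ 'a' => [1, 2] }) # a single node counts once, whatever it contains
```

With this rule, the filter in the example above would select members whose children contribute at least five `author` nodes in total, repeated values included.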

Raises:



# File 'lib/janeway/functions/count.rb', line 20

def parse_function_count
  consume # function
  raise "expect group_start token, found #{current}" unless current.type == :group_start

  consume # (

  # Read parameter
  arg = parse_function_parameter
  parameters = [arg]
  raise Error, "Invalid parameter - count() expects node list, got #{arg.value.inspect}" if arg.literal?
  raise Error, 'Too many parameters for count() function call' unless current.type == :group_end

  # Define function body
  AST::Function.new('count', parameters) do |node_list|
    if node_list.is_a?(Array)
      node_list.size
    else
      1 # the count of a non-empty singular nodelist such as count(@) is always 1.
    end
  end
end

#parse_function_length ⇒ Object

The length() function extension provides a way to compute the length of a value and make that available for further processing in the filter expression.

JSONPath return type: ValueType

Raises:



# File 'lib/janeway/functions/length.rb', line 10

def parse_function_length
  consume # function
  raise "expect group_start token, found #{current}" unless current.type == :group_start

  consume # (

  # Read parameter
  arg = parse_function_parameter
  parameters = [arg]
  unless arg.singular_query? || arg.literal?
    raise Error, "Invalid parameter - length() expects literal value or singular query, got #{arg.value.inspect}"
  end
  raise Error, 'Too many parameters for length() function call' unless current.type == :group_end

  # Meaning of return value depends on the JSON type:
  #   * string - number of Unicode scalar values in the string.
  #   * array -  number of elements in the array.
  #   * object - number of members in the object.
  # For any other argument value, the result is the special result Nothing.
  AST::Function.new('length', parameters) do |value|
    if [Array, Hash, String].include?(value.class)
      value.size
    else
      :nothing
    end
  end
end
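
The return-value rules listed in the comments above can be tried outside the parser with a small sketch. The method name `json_length` is hypothetical; it repeats the same class check as the listing and assumes the input has already been parsed into plain Ruby strings, arrays, and hashes.

```ruby
# Hypothetical stand-in for the length() function body; not Janeway's API.
def json_length(value)
  if [Array, Hash, String].include?(value.class)
    value.size
  else
    :nothing # numbers, booleans and null have no length
  end
end

puts json_length("h\u00e9llo")           # characters, not bytes
puts json_length([1, 2, 3])              # array elements
puts json_length({ 'a' => 1, 'b' => 2 }) # object members
puts json_length(42).inspect             # anything else yields :nothing
```

For a valid UTF-8 string, Ruby's String#size counts characters rather than bytes, which is why it is a reasonable fit for "number of Unicode scalar values".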

#parse_function_match ⇒ Object

The match() function extension provides a way to check whether (the entirety of; see Section 2.4.7) a given string matches a given regular expression, which is in the form described in [RFC9485].

Its arguments are instances of ValueType (possibly taken from a singular query, as for the first argument in the example below). If the first argument is not a string or the second argument is not a string conforming to [RFC9485], the result is LogicalFalse. Otherwise, the string that is the first argument is matched against the I-Regexp contained in the string that is the second argument; the result is LogicalTrue if the string matches the I-Regexp and is LogicalFalse otherwise.

The regexp dialect is called “I-Regexp” and is defined in RFC9485.

Fortunately, a shortcut is available: that RFC contains instructions for converting an I-Regexp to Ruby’s regexp format. The instructions are:

* For any unescaped dots (.) outside character classes (first
  alternative of charClass production), replace the dot with [^\n\r].
* Enclose the regexp in \A(?: and )\z.
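
As a worked illustration of the two steps above, the sketch below converts the pattern `1974-05-..` by hand. The constant names are hypothetical and exist only for this example; the dot replacement is written out manually rather than computed.

```ruby
# Step 1 (done by hand for the I-Regexp '1974-05-..'): each unescaped dot
# outside a character class becomes [^\n\r].
DATE_BODY = '1974-05-[^\n\r][^\n\r]'

# Step 2: enclose the result in \A(?: and )\z so it must match the whole string.
DATE_REGEXP = Regexp.new(format('\A(?:%s)\z', DATE_BODY))

puts DATE_REGEXP.match?('1974-05-11')       # whole string fits the pattern
puts DATE_REGEXP.match?('1974-05-11T00:00') # trailing text is rejected by \z
puts DATE_REGEXP.match?("1974-05-1\n")      # a newline cannot stand in for a dot
```

The anchors `\A` and `\z` are used instead of `^` and `$` because the latter match at line boundaries in Ruby, not only at the start and end of the string.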

tl;dr: How is this different from the search function? “match” must match the entire string, “search” matches a substring.

Examples:

$[?match(@.date, "1974-05-..")]

Raises:

See Also:



# File 'lib/janeway/functions/match.rb', line 35

def parse_function_match
  consume # function
  raise "expect group_start token, found #{current}" unless current.type == :group_start

  consume # (

  # Read first parameter
  parameters = []
  parameters << parse_function_parameter
  raise Error, 'Not enough parameters for match() function call' unless current.type == :union

  consume # ,

  # Read second parameter (the regexp)
  # This could be a string, in which case it is available now.
  # Otherwise it is an expression that takes the regexp from the input document,
  # and the iregexp will not be available until interpretation.
  parameters << parse_function_parameter
  raise Error, 'Too many parameters for match() function call' unless current.type == :group_end

  AST::Function.new('match', parameters) do |str, str_iregexp|
    if str.is_a?(String) && str_iregexp.is_a?(String)
      regexp = translate_iregex_to_ruby_regex(str_iregexp)
      regexp.match?(str)
    else
      false # result defined by RFC9535
    end
  end
end

#parse_function_parameter ⇒ String, ...

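Parses a single function parameter and returns the result. Every JSONPath function parameter must be one of the accepted types: a string literal, a query starting at the current node (@), or a query starting at the root ($). Any other token is parsed as a general expression rather than rejected, so that the function returns an empty result instead of the parser crashing.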

Returns:



# File 'lib/janeway/functions.rb', line 42

def parse_function_parameter
  result =
    case current.type
    when :string then parse_string
    when :current_node then parse_current_node
    when :root then parse_root
    when :group_end then raise Error, 'Function call is missing parameter'
    else
      # Invalid, no function uses this.
      # Instead of crashing here, accept it and let the function return an empty result.
      parse_expr
    end
  consume
  result
end

#parse_function_search ⇒ Object

2.4.7. search() Function Extension

Parameters:

1. ValueType (string)
2. ValueType (string conforming to [RFC9485])

Result:

LogicalType

The search() function extension provides a way to check whether a given string contains a substring that matches a given regular expression, which is in the form described in [RFC9485].

$[?search(@.author, "[BR]ob")]

Its arguments are instances of ValueType (possibly taken from a singular query, as for the first argument in the example above). If the first argument is not a string or the second argument is not a string conforming to [RFC9485], the result is LogicalFalse. Otherwise, the string that is the first argument is searched for a substring that matches the I-Regexp contained in the string that is the second argument; the result is LogicalTrue if at least one such substring exists and is LogicalFalse otherwise.

tl;dr: How is this different from the match function? “match” must match the entire string, “search” matches a substring.
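
The difference can be seen by compiling the same pattern with and without anchors. This is a minimal sketch using plain Ruby Regexp objects; the constant names are hypothetical, and the pattern contains no dots, so anchoring is the only conversion step that applies.

```ruby
AUTHOR_PATTERN = '[BR]ob'

# search(): the pattern may match anywhere inside the string.
SEARCH_REGEXP = Regexp.new(AUTHOR_PATTERN)

# match(): the pattern must cover the entire string.
MATCH_REGEXP = Regexp.new(format('\A(?:%s)\z', AUTHOR_PATTERN))

puts SEARCH_REGEXP.match?('Bobby Tables') # substring "Bob" is enough
puts MATCH_REGEXP.match?('Bobby Tables')  # extra characters make this fail
puts MATCH_REGEXP.match?('Rob')           # the whole string is the match
```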

Raises:



# File 'lib/janeway/functions/search.rb', line 30

def parse_function_search
  consume # function
  raise "expect group_start token, found #{current}" unless current.type == :group_start

  consume # (

  # Read first parameter
  parameters = []
  parameters << parse_function_parameter
  raise Error, 'Insufficient parameters for search() function call' unless current.type == :union

  consume # ,

  # Read second parameter (the regexp)
  # This could be a string, in which case it is available now.
  # Otherwise it is an expression that takes the regexp from the input document,
  # and the iregexp will not be available until interpretation.
  parameters << parse_function_parameter
  raise Error, 'Too many parameters for search() function call' unless current.type == :group_end

  AST::Function.new('search', parameters) do |str, str_iregexp|
    if str.is_a?(String) && str_iregexp.is_a?(String)
      regexp = translate_iregex_to_ruby_regex(str_iregexp, anchor: false)
      regexp.match?(str)
    else
      false # result defined by RFC9535
    end
  end
end

#parse_function_value ⇒ Object

Parameters:

1. NodesType

Result:

ValueType

The value() function extension provides a way to convert an instance of NodesType to a value and make that available for further processing in the filter expression.

Its only argument is an instance of NodesType (possibly taken from a filter-query, as in the example below). The result is an instance of ValueType.

If the argument contains a single node, the result is the value of the node.

If the argument is the empty nodelist or contains multiple nodes, the result is Nothing.

Note: A singular query may be used anywhere where a ValueType is expected, so there is no need to use the value() function extension with a singular query.

Examples:

$[?value(@..color) == "red"]
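
The three cases described above (one node, no nodes, several nodes) can be sketched as a standalone method. The name `single_value` is hypothetical and not part of Janeway; the nodelist is assumed to arrive as a plain Ruby array.

```ruby
# Hypothetical stand-in for the value() function body; not Janeway's API.
def single_value(nodes)
  if nodes.is_a?(Array) && nodes.size == 1
    nodes.first
  else
    :nothing # empty nodelist or more than one node
  end
end

puts single_value(['red']).inspect
puts single_value([]).inspect
puts single_value(%w[red blue]).inspect
```

Under this rule, the filter in the example above would only compare against "red" when exactly one `color` node is found among the descendants; otherwise the comparison sees Nothing.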

Raises:



# File 'lib/janeway/functions/value.rb', line 29

def parse_function_value
  consume # function
  raise "expect group_start token, found #{current}" unless current.type == :group_start

  consume # (

  # Read parameter
  parameters = [parse_function_parameter]
  raise Error, 'Too many parameters for value() function call' unless current.type == :group_end

  AST::Function.new('value', parameters) do |nodes|
    if nodes.is_a?(Array) && nodes.size == 1
      nodes.first
    else
      :nothing
    end
  end
end

#translate_iregex_to_ruby_regex(iregex, anchor: true) ⇒ Regexp

Converts an I-Regexp to the equivalent Ruby Regexp, following the instructions in RFC9485.

Parameters:

  • iregex (String)
  • anchor (Boolean) (defaults to: true)

    add anchors to match the string start and end

Returns:

  • (Regexp)

See Also:



# File 'lib/janeway/functions.rb', line 13

def translate_iregex_to_ruby_regex(iregex, anchor: true)
  # * For any unescaped dots (.) outside character classes (first
  #   alternative of charClass production), replace the dot with [^\n\r].
  chars = iregex.chars
  in_char_class = false
  indexes = []
  chars.each_with_index do |char, i|
    case char
    when '[' then in_char_class = true
    when ']'
      in_char_class = false unless chars[i - 1] == '\\' # escaped ] does not close char class
    when '.'
      next if in_char_class || chars[i - 1] == '\\' # escaped dot

      indexes << i # replace this dot
    end
  end
  indexes.reverse_each do |i|
    chars[i] = '[^\n\r]'
  end

  # * Enclose the regexp in \A(?: and )\z.
  regex_str = anchor ? format('\A(?:%s)\z', chars.join) : chars.join
  Regexp.new(regex_str)
end
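
The listing above decides whether a dot or closing bracket is escaped by looking only at the single preceding character. As an illustration of an alternative, the sketch below tracks escape state explicitly while scanning. The method name `translate_iregexp_sketch` is hypothetical, this is not Janeway's implementation, and it assumes the input is a well-formed I-Regexp.

```ruby
# Hypothetical escape-aware variant for illustration; not part of Janeway.
def translate_iregexp_sketch(iregexp, anchor: true)
  in_char_class = false
  escaped = false
  pieces = iregexp.each_char.map do |char|
    if escaped
      escaped = false # this character was consumed by the preceding backslash
      next char
    end

    case char
    when '\\' then escaped = true
    when '[' then in_char_class = true
    when ']' then in_char_class = false
    when '.' then next '[^\n\r]' unless in_char_class
    end
    char
  end
  translated = pieces.join

  # Enclose the regexp in \A(?: and )\z when a whole-string match is wanted.
  Regexp.new(anchor ? "\\A(?:#{translated})\\z" : translated)
end

puts translate_iregexp_sketch('a.c').match?('abc')                 # dot matches one character
puts translate_iregexp_sketch('a.c').match?("a\nc")                # but never a newline
puts translate_iregexp_sketch('a\.c').match?('abc')                # escaped dot stays literal
puts translate_iregexp_sketch('b.', anchor: false).match?('abcd')  # unanchored, as for search()
```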