Module: PlainText::Split

Included in:
String
Defined in:
lib/plain_text/split.rb

Overview

Contains a method that splits a String in a reversible way

String#split is a powerful method. One caveat is that the split is not guaranteed to be reversible when an arbitrary Regexp is given (as opposed to a String, or a Regexp whose exact form the caller knows or fully controls): the delimiters are discarded unless the Regexp contains groups, in which case the resultant Array contains every grouped String as an extra element.

This module provides a method that makes a reversible split possible. Requiring this file includes the method in the String class.
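The caveat can be reproduced with the core String#split alone. The following is a minimal sketch that does not load this module; roundtrip? is a hypothetical helper introduced only for illustration, and wrapping the whole pattern in one capture group is shown merely as the manual workaround available when the Regexp is fully under the caller's control.

```ruby
# Hypothetical helper for illustration: true if splitting and re-joining restores str.
def roundtrip?(str, re)
  str.split(re, -1).join == str
end

s = "XQabXXcXQ"
p roundtrip?(s, /X+Q?/)    # false: the delimiters are discarded
p roundtrip?(s, /X+(Q?)/)  # false: only the captured "Q" (or "") is kept
p roundtrip?(s, /(X+Q?)/)  # true: the single outer group keeps each delimiter
```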

Examples:

Reversible (the method is assumed to be included in String)

my_str.split_with_delimiter(/MyRegexp/).join == my_str  # => true

Author:

  • Masa Sakano (Wise Babel Ltd)

Class Method Details

.count_lines(instr, linebreak: $/) ⇒ Integer

The class-method version of the instance method of the same name.

It takes one additional parameter, the input String.

Parameters:

  • instr (String)

    String that is examined.

  • linebreak (String) (defaults to: $/)

    The line-break String, e.g. "\n" (Default: $/).

Returns:

  • (Integer)

    never negative (0 for an empty String)

# File 'lib/plain_text/split.rb', line 78

def self.count_lines(instr, linebreak: $/)
  return 0 if instr.empty?
  ar = instr.split(linebreak, -1)  # -1 is specified to preserve the last linebreak(s).
  ar.pop if "" == ar[-1]
  ar.size
end
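The edge cases are easier to see when run. The sketch below is a standalone copy of the logic listed above under a hypothetical name, count_lines_sketch, so that it runs without the gem; as an assumption for illustration, the default line break is written out as "\n" rather than read from $/.

```ruby
# Standalone copy of the listed logic under a hypothetical name.
def count_lines_sketch(instr, linebreak: "\n")
  return 0 if instr.empty?
  ar = instr.split(linebreak, -1)  # -1 keeps trailing empty fields
  ar.pop if ar.last == ""          # a final line break does not start a new line
  ar.size
end

p count_lines_sketch("")        # 0
p count_lines_sketch("a\nb")    # 2
p count_lines_sketch("a\nb\n")  # 2
p count_lines_sketch("a\n\n")   # 2 (the second line is empty)
```

A trailing line break therefore terminates the last line instead of opening a new one, and only the empty String yields 0.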

.count_regexp(instr, re_in, like_linenum: false, with_if_end: false) ⇒ Integer, Array<Integer, Boolean>

The class-method version of the instance method of the same name.

It takes one additional parameter, the input String.

Parameters:

  • instr (String)

    String that is examined.

  • re_in (Regexp, String)

    If String, it is interpreted literally as in String#split.

  • like_linenum (Boolean) (defaults to: false)

    If true (Default: false), counts in the manner of line numbers.

  • with_if_end (Boolean) (defaults to: false)

    A special case; see the description of the instance method #count_regexp.

Returns:

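  • (Integer, Array<Integer, Boolean>)

    The count is never negative (0 for an empty String). An Array is returned only when with_if_end is true.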

# File 'lib/plain_text/split.rb', line 59

def self.count_regexp(instr, re_in, like_linenum: false, with_if_end: false)
  like_linenum = true if with_if_end
  return (with_if_end ? [0, true] : 0) if instr.empty?
  allsize = split_with_delimiter(instr, re_in).size

  n_normal = allsize.div(2)
  return n_normal if !like_linenum
  n_lines = (allsize.even? ? allsize : allsize+1).div 2
  with_if_end ? [n_normal, (n_normal ==  n_lines)] : n_lines
end

.split_with_delimiter(instr, re_in) ⇒ Array

The class-method version of the instance method of the same name.

It takes one additional parameter, the input String.

Parameters:

  • instr (String)

    String that is examined.

  • re_in (Regexp, String)

    If String, it is interpreted literally as in String#split.

Returns:

  • (Array)

# File 'lib/plain_text/split.rb', line 31

def self.split_with_delimiter(instr, re_in)
  re_in = Regexp.new(Regexp.quote(re_in)) if re_in.class.method_defined? :to_str
  re_grp = add_grouping(re_in)  # Ensure grouping.

  arspl = instr.split re_grp, -1
  return arspl if arspl.size <= 1  # n.b., Size is 0 for an empty string (only?).

  n_grouping = re_grp.match(instr).size  # The number of grouping - should be at least 2, including $&.
  return adjust_last_element(arspl) if n_grouping <= 2

  # Takes only the split main contents and the delimiters
  arret = []
  arspl.each_with_index do |ec, ei|
    arret << ec if (1..2).include?( (ei + 1) % n_grouping )
  end
  adjust_last_element(arret) # => Array
end
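The element selection in the loop can be traced by hand. The sketch below is a standalone illustration under two assumptions, since neither helper is listed on this page: that add_grouping wraps the whole pattern in an outer capture group (so the grouped Regexp is written out by hand here), and that adjust_last_element drops a trailing empty element. The name select_text_and_delimiter is hypothetical.

```ruby
# Hypothetical standalone trace of the selection step.
# re_grp is assumed to be what add_grouping would produce, e.g. /(X+(Q?))/.
def select_text_and_delimiter(str, re_grp)
  arspl = str.split(re_grp, -1)
  return arspl if arspl.size <= 1      # empty input or no match at all

  n_grouping = re_grp.match(str).size  # $& plus every group: 3 for /(X+(Q?))/
  if n_grouping > 2
    # Each cycle of n_grouping elements is [text, whole delimiter, inner groups...];
    # keep only the first two elements of each cycle.
    arspl = arspl.select.with_index { |_, i| (1..2).include?((i + 1) % n_grouping) }
  end
  arspl.pop if arspl.last == ""        # assumed stand-in for adjust_last_element
  arspl
end

parts = select_text_and_delimiter("XQabXXcXQ", /(X+(Q?))/)
p parts       # ["", "XQ", "ab", "XX", "c", "XQ"]
p parts.join  # "XQabXXcXQ"
```

With three groupings, the raw split yields ten elements for this input; positions 3, 6, and 9 (the inner "Q?" captures) are skipped, which is what makes the joined result equal to the original String.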

Instance Method Details

#count_lines(**kwd) ⇒ Integer

Returns the number of lines.

Parameters:

  • kwd (Hash<linebreak: String>)

    linebreak: the line-break String, e.g. "\n" (Default: $/).

Returns:

  • (Integer)

    never negative (0 for an empty String)

# File 'lib/plain_text/split.rb', line 168

def count_lines(**kwd)
  PlainText::Split.public_send(__method__, self, **kwd)
end
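The instance method simply forwards to the class method of the same name through __method__. A minimal sketch of that delegation pattern follows; the module Shout, its method, and the Greeting subclass are hypothetical names used only for illustration, and a String subclass is used so that String itself is left untouched.

```ruby
# Hypothetical module illustrating the class-method / instance-method pairing.
module Shout
  def self.shout(str, suffix: "!")
    str.upcase + suffix
  end

  # __method__ is :shout here, so this calls Shout.shout(self, **kwd).
  def shout(**kwd)
    Shout.public_send(__method__, self, **kwd)
  end
end

class Greeting < String
  include Shout
end

p Greeting.new("hi").shout               # "HI!"
p Greeting.new("hi").shout(suffix: "?")  # "HI?"
```

Because the instance method names no target explicitly, the same one-line body can be reused for every forwarded method.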

#count_regexp(*rest, **kwd) ⇒ Integer, Array<Integer, Boolean>

Counts the number of matches of the given Regexp in self.

If the like_linenum option is true, the matches are counted in the manner of line numbers: the returned value is the number of matches plus 1, unless the last match falls at the very end of the String. For example, if no matches are found, this still returns 1.

Note that if the String (self) is empty, the count is always 0 (with with_if_end: true, the return value is [0, true]).

The special option is with_if_end. If true is given, this returns Array<Integer, Boolean> instead of a simple Integer. The first element is the count as with the default like_linenum: false. The second element is true if the count would be the same with like_linenum: true, namely if the end of the String coincides with the last match, and false otherwise. (This option exists only to save the overhead of calling this routine twice or of users making their own check.)
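The counting rules can be sketched without the gem. In the sketch below, split_keeping_delimiter is a hypothetical stand-in for split_with_delimiter that assumes the Regexp has no capture groups of its own, and count_regexp_sketch is likewise a hypothetical name; the arithmetic follows the class-method listing on this page.

```ruby
# Hypothetical stand-in for split_with_delimiter; assumes re has no capture
# groups of its own. /(#{re})/ embeds re (with its options) in one outer group.
def split_keeping_delimiter(str, re)
  parts = str.split(/(#{re})/, -1)
  parts.pop if parts.last == ""
  parts
end

def count_regexp_sketch(str, re, like_linenum: false, with_if_end: false)
  return (with_if_end ? [0, true] : 0) if str.empty?
  size = split_keeping_delimiter(str, re).size
  n_matches = size / 2      # elements alternate text, delimiter, text, ...
  n_lines = (size + 1) / 2  # one more if text follows the last match
  return [n_matches, n_matches == n_lines] if with_if_end
  like_linenum ? n_lines : n_matches
end

p count_regexp_sketch("a,b,c", /,/)                      # 2
p count_regexp_sketch("a,b,c", /,/, like_linenum: true)  # 3
p count_regexp_sketch("a,b,",  /,/, like_linenum: true)  # 2
p count_regexp_sketch("abc",   /,/, like_linenum: true)  # 1
p count_regexp_sketch("a,b,",  /,/, with_if_end: true)   # [2, true]
```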

Parameters:

  • rest (Regexp, String)

    re_in: If String, it is interpreted literally as in String#split.

  • kwd (Hash<like_linenum: Boolean, with_if_end: Boolean>)

    like_linenum: if true (Default: false), counts in the manner of line numbers. with_if_end: a special case (see the description).

Returns:

  • (Integer, Array<Integer, Boolean>)

    the count is never negative (0 for an empty String)

# File 'lib/plain_text/split.rb', line 159

def count_regexp(*rest, **kwd)
  PlainText::Split.public_send(__method__, self, *rest, **kwd)
end

#split_with_delimiter(*rest) ⇒ Array

Splits the String while keeping the delimiters, whether a Regexp or a String is given.

Note that the last empty component, if it exists, is deleted from the returned Array. If the input String is empty, the returned Array is also empty, as in String#split.

Examples:

Standard split (without grouping) : s="XQabXXcXQ"

s.split(/X+Q?/)         #=> ["", "ab", "c"]
s.split(/X+Q?/, -1)     #=> ["", "ab", "c", ""]

Standard split (with grouping) : s="XQabXXcXQ"

s.split(/X+(Q?)/, -1)   #=> ["", "Q", "ab", "", "c", "Q", ""]
s.split(/(X+(Q?))/, -1) #=> ["", "XQ", "Q", "ab", "XX", "", "c", "XQ", "Q", ""]

This method (when included in String (as Default)) : s="XQabXXcXQ"

s.split_with_delimiter(/X+(Q?)/)
                        #=> ["", "XQ", "ab", "XX", "c", "XQ"]

Parameters:

  • rest (Regexp, String)

    If String, it is interpreted literally as in String#split.
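The literal interpretation of a String delimiter comes from quoting it before it is compiled, as in the first line of the class-method listing. The sketch below mirrors that step with a hypothetical helper, literal_regexp, which uses respond_to?(:to_str) as a simplification of the check in the listing.

```ruby
# Hypothetical helper: a String delimiter is quoted so that Regexp
# metacharacters in it match literally; a Regexp is passed through unchanged.
def literal_regexp(delimiter)
  delimiter.respond_to?(:to_str) ? Regexp.new(Regexp.quote(delimiter)) : delimiter
end

p literal_regexp(".")                     # /\./
p "a.b.c".split(literal_regexp("."), -1)  # ["a", "b", "c"]
p "a.b.c".split(/./, -1).size             # 6 (every character acts as a delimiter)
```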

Returns:

  • (Array)


# File 'lib/plain_text/split.rb', line 129

def split_with_delimiter(*rest)
  PlainText::Split.public_send(__method__, self, *rest)
end