Class: String

Inherits:
Object
Defined in:
lib/rbot/irc.rb,
lib/rbot/botuser.rb,
lib/rbot/ircsocket.rb,
lib/rbot/core/utils/extends.rb

Overview

Extensions to the String class

TODO make riphtml() just call ircify_html() with stronger purify options.


Instance Method Details

#get_html_title ⇒ Object

This method tries to find an HTML title in the string, and returns it if found



# File 'lib/rbot/core/utils/extends.rb', line 338

def get_html_title
  if defined? ::Hpricot
    Hpricot(self).at("title").inner_html
  else
    return unless Irc::Utils::TITLE_REGEX.match(self)
    $1
  end
end
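
The regex fallback can be sketched in isolation as follows. The pattern is only a stand-in, since the real Irc::Utils::TITLE_REGEX is not shown on this page, and `html_title` is a hypothetical helper name rather than part of rbot.

```ruby
# Hypothetical stand-in for Irc::Utils::TITLE_REGEX (the real pattern is not shown here).
TITLE_PATTERN = /<title[^>]*>(.*?)<\/title>/im

# Returns the contents of the first <title> element, or nil if there is none.
def html_title(html)
  html[TITLE_PATTERN, 1]
end

p html_title("<html><head><title>rbot docs</title></head></html>")  # "rbot docs"
p html_title("<p>no title here</p>")                                # nil
```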

#has_irc_glob? ⇒ Boolean

This method checks if the receiver contains IRC glob characters

IRC has a very primitive concept of globs: a * stands for “any number of arbitrary characters”, a ? stands for “exactly one arbitrary character”. These characters can be escaped by prefixing them with a backslash (\).

A known limitation of this glob syntax is that there is no way to escape the escape character itself, so it’s not possible to build a glob pattern where the escape character precedes a glob.

Returns:

  • (Boolean)


# File 'lib/rbot/irc.rb', line 332

def has_irc_glob?
  self =~ /^[*?]|[^\\][*?]/
end
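
A few sample inputs may make the escaping rule, and its known limitation, easier to see. This sketch reuses the pattern above in a hypothetical standalone helper (`irc_glob?` is not an rbot method) and coerces the result to true or false.

```ruby
# Same pattern as has_irc_glob?, wrapped in a hypothetical standalone helper.
def irc_glob?(str)
  !!(str =~ /^[*?]|[^\\][*?]/)
end

p irc_glob?('*!*@*.example.com')  # true: unescaped globs
p irc_glob?('nick')               # false: no glob characters at all
p irc_glob?('what\?')             # false: the only ? is escaped
# Limitation: a literal backslash followed by a glob cannot be expressed.
# The string below is b, a, c, k, \, \, * and is still reported as glob-free.
p irc_glob?('back\\\\*')          # false
```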

#irc_downcase(casemap = 'rfc1459') ⇒ Object

This method returns a string which is the downcased version of the receiver, according to the given casemap



# File 'lib/rbot/irc.rb', line 289

def irc_downcase(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr(cmap.upper, cmap.lower)
end
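
The effect of the casemap is easiest to see on the punctuation characters. The sketch below is standalone and rests on an assumption: the character ranges follow the usual IRC CASEMAPPING definitions, because the actual Irc::Casemap table is not shown on this page. `CASEMAPS` and the free function `irc_downcase` are illustrative names.

```ruby
# Assumed casemap ranges (upper, lower), following the usual CASEMAPPING
# definitions; the actual Irc::Casemap table is not shown on this page.
CASEMAPS = {
  'rfc1459'        => ["\x41-\x5e", "\x61-\x7e"], # A-Z [ \ ] ^  ->  a-z { | } ~
  'strict-rfc1459' => ["\x41-\x5d", "\x61-\x7d"], # A-Z [ \ ]    ->  a-z { | }
  'ascii'          => ["\x41-\x5a", "\x61-\x7a"]  # A-Z          ->  a-z
}.freeze

# Downcases str under the named casemap, falling back to rfc1459 for unknown names.
def irc_downcase(str, casemap = 'rfc1459')
  upper, lower = CASEMAPS.fetch(casemap) { CASEMAPS['rfc1459'] }
  str.tr(upper, lower)
end

p irc_downcase('Nick[Away]^')                    # "nick{away}~"
p irc_downcase('Nick[Away]^', 'strict-rfc1459')  # "nick{away}^"
p irc_downcase('Nick[Away]^', 'ascii')           # "nick[away]^"
```

Two nicks that differ only in `[` versus `{` therefore compare equal under rfc1459 but not under ascii.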

#irc_downcase!(casemap = 'rfc1459') ⇒ Object

In-place version of #irc_downcase: the receiver itself is altered. Because it delegates to String#tr!, it returns nil when no character was changed.

See also the discussion about irc_downcase



# File 'lib/rbot/irc.rb', line 298

def irc_downcase!(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr!(cmap.upper, cmap.lower)
end

#irc_send_penalty ⇒ Object

Calculate the penalty which will be assigned to this message by the IRCd



# File 'lib/rbot/ircsocket.rb', line 14

def irc_send_penalty
  # According to eggdrop, the initial penalty is
  penalty = 1 + self.size/100
  # on everything but UnderNET where it's
  # penalty = 2 + self.size/120

  cmd, pars = self.split($;,2)
  debug "cmd: #{cmd}, pars: #{pars.inspect}"
  case cmd.to_sym
  when :KICK
    chan, nick, msg = pars.split
    chan = chan.split(',')
    nick = nick.split(',')
    penalty += nick.size
    penalty *= chan.size
  when :MODE
    chan, modes, argument = pars.split
    extra = 0
    if modes
      extra = 1
      if argument
        extra += modes.split(/\+|-/).size
      else
        extra += 3 * modes.split(/\+|-/).size
      end
    end
    if argument
      extra += 2 * argument.split.size
    end
    penalty += extra * chan.split.size
  when :TOPIC
    penalty += 1
    penalty += 2 unless pars.split.size < 2
  when :PRIVMSG, :NOTICE
    dests = pars.split($;,2).first
    penalty += dests.split(',').size
  when :WHO
    args = pars.split
    if args.length > 0
      penalty += args.inject(0){ |sum,x| sum += ((x.length > 4) ? 3 : 5) }
    else
      penalty += 10
    end
  when :PART
    penalty += 4
  when :AWAY, :JOIN, :VERSION, :TIME, :TRACE, :WHOIS, :DNS
    penalty += 2
  when :INVITE, :NICK
    penalty += 3
  when :ISON
    penalty += 1
  else # Unknown messages
    penalty += 1
  end
  if penalty > 99
    debug "Wow, more than 99 secs of penalty!"
    penalty = 99
  end
  if penalty < 2
    debug "Wow, less than 2 secs of penalty!"
    penalty = 2
  end
  debug "penalty: #{penalty}"
  return penalty
end
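
The arithmetic can be followed on a reduced version of the method. This sketch is a standalone function (`send_penalty` is a hypothetical name) covering only a few of the commands handled above, with the debug calls dropped and the two range checks replaced by `clamp`.

```ruby
# Reduced sketch of the penalty calculation: base cost from the message
# length, a per-command surcharge, and a final clamp to the 2..99 range.
def send_penalty(line)
  penalty = 1 + line.size / 100          # integer division: +1 per 100 bytes
  cmd, pars = line.split(' ', 2)
  case cmd.to_sym
  when :PRIVMSG, :NOTICE
    dests = pars.split(' ', 2).first     # comma-separated list of targets
    penalty += dests.split(',').size     # +1 per target
  when :PART
    penalty += 4
  when :INVITE, :NICK
    penalty += 3
  else                                   # everything else in this sketch
    penalty += 1
  end
  penalty.clamp(2, 99)
end

p send_penalty('PRIVMSG #a,#b :hello')        # 1 + 0 + 2 targets = 3
p send_penalty('NICK newnick')                # 1 + 0 + 3 = 4
p send_penalty('PRIVMSG #a :' + 'x' * 250)    # 1 + 262/100 + 1 target = 4
```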

#irc_upcase(casemap = 'rfc1459') ⇒ Object

Returns the upcased version of the receiver, according to the given casemap.

See also the discussion about irc_downcase



# File 'lib/rbot/irc.rb', line 307

def irc_upcase(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr(cmap.lower, cmap.upper)
end

#irc_upcase!(casemap = 'rfc1459') ⇒ Object

In-place version of #irc_upcase. Because it delegates to String#tr!, it returns nil when no character was changed.

See also the discussion about irc_downcase



# File 'lib/rbot/irc.rb', line 316

def irc_upcase!(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr!(cmap.lower, cmap.upper)
end

#ircify_html(opts = {}) ⇒ Object

This method will return a purified version of the receiver, with all HTML stripped off and some of it converted to IRC formatting



# File 'lib/rbot/core/utils/extends.rb', line 214

def ircify_html(opts={})
  txt = self.dup

  # remove scripts
  txt.gsub!(/<script(?:\s+[^>]*)?>.*?<\/script>/im, "")

  # remove styles
  txt.gsub!(/<style(?:\s+[^>]*)?>.*?<\/style>/im, "")

  # bold and strong -> bold
  txt.gsub!(/<\/?(?:b|strong)(?:\s+[^>]*)?>/im, "#{Bold}")

  # italic, emphasis and underline -> underline
  txt.gsub!(/<\/?(?:i|em|u)(?:\s+[^>]*)?>/im, "#{Underline}")

  ## This would be a nice addition, but the results are horrible
  ## Maybe make it configurable?
  # txt.gsub!(/<\/?a( [^>]*)?>/, "#{Reverse}")
  case val = opts[:a_href]
  when Reverse, Bold, Underline
    txt.gsub!(/<(?:\/a\s*|a (?:[^>]*\s+)?href\s*=\s*(?:[^>]*\s*)?)>/, val)
  when :link_out
    # Not good for nested links, but the best we can do without something like hpricot
    txt.gsub!(/<a (?:[^>]*\s+)?href\s*=\s*(?:([^"'>][^\s>]*)\s+|"((?:[^"]|\\")*)"|'((?:[^']|\\')*)')(?:[^>]*\s+)?>(.*?)<\/a>/) { |match|
      debug match
      debug [$1, $2, $3, $4].inspect
      link = $1 || $2 || $3
      str = $4
      str + ": " + link
    }
  else
    warning "unknown :a_href option #{val} passed to ircify_html" if val
  end

  # If opts[:img] is defined, it should be a String. Each image
  # will be replaced by the string itself, replacing occurrences of
  # %{alt} %{dimensions} and %{src} with the alt text, image dimensions
  # and URL
  if val = opts[:img]
    if val.kind_of? String
      txt.gsub!(/<img\s+(.*?)\s*\/?>/) do |imgtag|
        attrs = Hash.new
        imgtag.scan(/([[:alpha:]]+)\s*=\s*(['"])?(.*?)\2/) do |key, quote, value|
          k = key.downcase.intern rescue 'junk'
          attrs[k] = value
        end
        attrs[:alt] ||= attrs[:title]
        attrs[:width] ||= '...'
        attrs[:height] ||= '...'
        attrs[:dimensions] ||= "#{attrs[:width]}x#{attrs[:height]}"
        val % attrs
      end
    else
      warning ":img option is not a string"
    end
  end

  # Paragraph and br tags are converted to whitespace
  txt.gsub!(/<\/?(p|br)(?:\s+[^>]*)?\s*\/?\s*>/i, ' ')
  txt.gsub!("\n", ' ')
  txt.gsub!("\r", ' ')

  # Superscripts and subscripts are turned into ^{...} and _{...}
  # where the {} are omitted for single characters
  txt.gsub!(/<sup>(.*?)<\/sup>/, '^{\1}')
  txt.gsub!(/<sub>(.*?)<\/sub>/, '_{\1}')
  txt.gsub!(/(^|_)\{(.)\}/, '\1\2')

  # List items are converted to *). We don't have special support for
  # nested or ordered lists.
  txt.gsub!(/<li>/, ' *) ')

  # All other tags are just removed
  txt.gsub!(/<[^>]+>/, '')

  # Convert HTML entities. We do it now to be able to handle stuff
  # such as &nbsp;
  txt = Utils.decode_html_entities(txt)

  # Keep unbreakable spaces or convert them to plain spaces?
  case val = opts[:nbsp]
  when :space, ' '
    txt.gsub!([160].pack('U'), ' ')
  else
    warning "unknown :nbsp option #{val} passed to ircify_html" if val
  end

  # Remove double formatting options, since they only waste bytes
  txt.gsub!(/#{Bold}(\s*)#{Bold}/, '\1')
  txt.gsub!(/#{Underline}(\s*)#{Underline}/, '\1')

  # Simplify whitespace that appears on both sides of a formatting option
  txt.gsub!(/\s+(#{Bold}|#{Underline})\s+/, ' \1')
  txt.sub!(/\s+(#{Bold}|#{Underline})\z/, '\1')
  txt.sub!(/\A(#{Bold}|#{Underline})\s+/, '\1')

  # And finally whitespace is squeezed
  txt.gsub!(/\s+/, ' ')
  txt.strip!

  if opts[:limit] && txt.size > opts[:limit]
    txt = txt.slice(0, opts[:limit]) + "#{Reverse}...#{Reverse}"
  end

  # Decode entities and strip whitespace
  return txt
end
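
A stripped-down sketch of the core substitutions, written as a standalone function rather than a String method. The `ircify` helper and the constant names are illustrative. The control codes are the conventional IRC bold (0x02) and underline (0x1F) bytes, assumed here to match rbot's Bold and Underline constants. Links, images, entity decoding, and the :limit option are left out.

```ruby
BOLD      = "\x02" # conventional IRC bold toggle (assumed equal to rbot's Bold)
UNDERLINE = "\x1F" # conventional IRC underline toggle (assumed equal to rbot's Underline)

def ircify(html)
  txt = html.dup
  txt.gsub!(/<\/?(?:b|strong)(?:\s+[^>]*)?>/im, BOLD)       # bold, strong -> bold
  txt.gsub!(/<\/?(?:i|em|u)(?:\s+[^>]*)?>/im, UNDERLINE)    # i, em, u -> underline
  txt.gsub!(/<\/?(?:p|br)(?:\s+[^>]*)?\s*\/?\s*>/i, ' ')    # p, br -> whitespace
  txt.gsub!(/<[^>]+>/, '')                                  # drop all other tags
  txt.gsub!(/\s+/, ' ')                                     # squeeze whitespace
  txt.strip
end

p ircify("<p>Hello <b>world</b></p>")  # "Hello \u0002world\u0002"
p ircify("<div>a<br/>b <em>c</em></div>")
```

Opening and closing tags map to the same control byte because IRC formatting codes are toggles.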

#ircify_html!(opts = {}) ⇒ Object

Same as #ircify_html, but modifies the receiver in place. Returns the receiver if it was changed, nil otherwise.



# File 'lib/rbot/core/utils/extends.rb', line 324

def ircify_html!(opts={})
  old_hash = self.hash
  replace self.ircify_html(opts)
  return self unless self.hash == old_hash
end

#ircify_html_title ⇒ Object

This method returns the IRC-formatted version of an HTML title found in the string, or nil if no title can be extracted.



# File 'lib/rbot/core/utils/extends.rb', line 349

def ircify_html_title
  self.get_html_title.ircify_html rescue nil
end

#riphtml ⇒ Object

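This method strips all HTML tags from the receiver, decodes a handful of common entities, and removes newlines. It returns a new String and leaves the receiver untouched.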



# File 'lib/rbot/core/utils/extends.rb', line 332

def riphtml
  self.gsub(/<[^>]+>/, '').gsub(/&amp;/,'&').gsub(/&quot;/,'"').gsub(/&lt;/,'<').gsub(/&gt;/,'>').gsub(/&ellip;/,'...').gsub(/&apos;/, "'").gsub("\n",'')
end
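
For comparison, here is a sketch of the same idea that decodes entities in a single pass through a lookup table instead of a chain of gsub calls. This is a swapped-in technique, not the method above: `strip_html` and `ENTITIES` are hypothetical names, and the table is illustrative rather than exhaustive.

```ruby
# Entities handled in this sketch; the table is illustrative, not exhaustive.
ENTITIES = { 'amp' => '&', 'quot' => '"', 'lt' => '<', 'gt' => '>', 'apos' => "'" }.freeze

def strip_html(str)
  no_tags = str.gsub(/<[^>]+>/, '')                                  # drop every tag
  decoded = no_tags.gsub(/&(amp|quot|lt|gt|apos);/) { ENTITIES[$1] } # decode once
  decoded.delete("\n")                                               # remove newlines
end

puts strip_html("<p>Tom &amp; Jerry say &quot;hi&quot;</p>\n")  # Tom & Jerry say "hi"
```

With a single pass, an input such as `&amp;lt;` decodes to `&lt;` and is not decoded a second time, whereas a chain that handles `&amp;` before `&lt;` would turn it into `<`.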

#to_irc_auth_command ⇒ Object

Returns an Irc::Bot::Auth::Command built from the receiver



# File 'lib/rbot/botuser.rb', line 119

def to_irc_auth_command
  Irc::Bot::Auth::Command.new(self)
end

#to_irc_casemap ⇒ Object

This method returns the Irc::Casemap whose name is the receiver. If the lookup fails, an error is logged and the rfc1459 casemap is returned instead.



# File 'lib/rbot/irc.rb', line 275

def to_irc_casemap
  begin
    Irc::Casemap.get(self)
  rescue
    # raise TypeError, "Unkown Irc::Casemap #{self.inspect}"
    error "Unkown Irc::Casemap #{self.inspect} requested, defaulting to rfc1459"
    Irc::Casemap.get('rfc1459')
  end
end

#to_irc_channel(opts = {}) ⇒ Object

Converts the receiver into an Irc::Channel object; opts is passed through to Irc::Channel.new.



# File 'lib/rbot/irc.rb', line 1513

def to_irc_channel(opts={})
  Irc::Channel.new(self, opts)
end

#to_irc_channel_topic ⇒ Object

Returns an Irc::Channel::Topic with self as text



# File 'lib/rbot/irc.rb', line 1318

def to_irc_channel_topic
  Irc::Channel::Topic.new(self)
end

#to_irc_netmask(opts = {}) ⇒ Object

Converts the receiver into an Irc::Netmask object; opts is passed through to Irc::Netmask.new.



# File 'lib/rbot/irc.rb', line 915

def to_irc_netmask(opts={})
  Irc::Netmask.new(self, opts)
end

#to_irc_regexp ⇒ Object

This method is used to convert the receiver into a Regular Expression that matches according to the IRC glob syntax



# File 'lib/rbot/irc.rb', line 339

def to_irc_regexp
  regmask = Regexp.escape(self)
  regmask.gsub!(/(\\\\)?\\[*?]/) { |m|
    case m
    when /\\(\\[*?])/
      $1
    when /\\\*/
      '.*'
    when /\\\?/
      '.'
    else
      raise "Unexpected match #{m} when converting #{self}"
    end
  }
  Regexp.new("^#{regmask}$")
end
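
The conversion can be tried out with a standalone sketch of the same approach: escape everything, then turn the escaped glob characters back into their regexp equivalents. `irc_glob_to_regexp` is a hypothetical free function, and the block uses string checks instead of the nested case/when above, which is a stylistic substitution only.

```ruby
def irc_glob_to_regexp(glob)
  escaped = Regexp.escape(glob)            # "*" becomes "\*", "\" becomes "\\"
  pattern = escaped.gsub(/(\\\\)?\\[*?]/) do |m|
    if m.start_with?('\\\\')               # glob char was escaped in the input
      m[2..-1]                             # keep it as a literal: "\*" or "\?"
    elsif m.end_with?('*')
      '.*'                                 # any number of arbitrary characters
    else
      '.'                                  # exactly one arbitrary character
    end
  end
  Regexp.new("^#{pattern}$")
end

mask = irc_glob_to_regexp('*!*@*.example.com')
p mask.match?('nick!user@host.example.com')            # true
p mask.match?('nick!user@example.org')                 # false
p irc_glob_to_regexp('foo\*').match?('foo*')           # true: escaped * is literal
p irc_glob_to_regexp('foo\*').match?('foobar')         # false
```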

#to_irc_user(opts = {}) ⇒ Object

Converts the receiver into an Irc::User object; opts is passed through to Irc::User.new.



# File 'lib/rbot/irc.rb', line 1108

def to_irc_user(opts={})
  Irc::User.new(self, opts)
end

#wrap_nonempty(pre, post, opts = {}) ⇒ Object

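This method wraps a nonempty String with the given prefix and postfix. An empty receiver yields a new empty String, with neither prefix nor postfix. The opts parameter is not used by the current implementation.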



# File 'lib/rbot/core/utils/extends.rb', line 355

def wrap_nonempty(pre, post, opts={})
  if self.empty?
    String.new
  else
    "#{pre}#{self}#{post}"
  end
end
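
A typical use is decorating an optional fragment so that nothing is printed when it is absent. The sketch below uses a standalone function with the receiver passed as the first argument; the surrounding strings are made up for illustration.

```ruby
# Standalone equivalent of the method above (receiver passed explicitly).
def wrap_nonempty(str, pre, post)
  str.empty? ? String.new : "#{pre}#{str}#{post}"
end

puts "Topic set" + wrap_nonempty("alice", " (by ", ")")  # Topic set (by alice)
puts "Topic set" + wrap_nonempty("", " (by ", ")")       # Topic set
```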