Class: Bbcode::Tokenizer

Inherits:
Object
Defined in:
lib/bbcode/tokenizer.rb

Overview

Scans a string and converts it to a stream of bbcode tokens.

Constant Summary

BBCODE_TAG_PATTERN =
/\[(\/?)([a-z0-9_-]*)(\s*=?(?:(?:\s*(?:(?:[a-z0-9_-]+)|(?<=\=))\s*[:=]\s*)?(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*'|[^\]\s,]+|(?<=,)(?=\s*,))\s*,?\s*)*)\]/i
ATTRIBUTE_PATTERN =
/(?:\s*(?:([a-z0-9_-]+)|^)\s*[:=]\s*)?("[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*'|[^\]\s,]+|(?<=,)(?=\s*,))\s*,?/i
UNESCAPE_PATTERN =
/\\(.)/

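Instance Method Summary

  • #parse_attributes_string(attributes_string) ⇒ Object

  • #tokenize(document, handler) ⇒ Object Parses the document as BBCode-formatted text and reports BBCode events to the handler.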

Instance Method Details

#parse_attributes_string(attributes_string) ⇒ Object



# File 'lib/bbcode/tokenizer.rb', line 8

def parse_attributes_string(attributes_string)
	attrs = HashWithIndifferentAccess.new
	return attrs if attributes_string.nil?

	# Attributes without a name get consecutive integer keys: 0, 1, 2, ...
	next_anonymous_key = -1
	attributes_string.scan(ATTRIBUTE_PATTERN) do |key, value|
		# An empty slot between two commas is skipped but still uses up an index.
		skip_value = key.blank? && value.blank?
		key = (next_anonymous_key += 1) if key.blank?
		next if skip_value

		# Strip matching surrounding quotes and unescape backslash sequences.
		if value[0] == value[-1] && ["'", '"'].include?(value[0])
			value = value[1...-1].gsub(UNESCAPE_PATTERN, "\\1")
		end
		attrs[key] = value
	end

	attrs
end
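
The listing above depends on ActiveSupport (HashWithIndifferentAccess and blank?). To try the attribute rules in isolation, the following is a minimal standalone sketch, not the library's own code. It assumes a plain Hash in place of HashWithIndifferentAccess, and the names parse_attributes_sketch and blank_value? are hypothetical helpers introduced only for illustration. The two patterns are copied from the Constant Summary.

```ruby
# Standalone sketch of the attribute parsing rules, without ActiveSupport.
# Assumptions: a plain Hash replaces HashWithIndifferentAccess, and
# blank_value? is a hypothetical stand-in for ActiveSupport's blank?.
ATTRIBUTE_PATTERN = /(?:\s*(?:([a-z0-9_-]+)|^)\s*[:=]\s*)?("[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*'|[^\]\s,]+|(?<=,)(?=\s*,))\s*,?/i
UNESCAPE_PATTERN = /\\(.)/

def blank_value?(value)
  value.nil? || value.strip.empty?
end

def parse_attributes_sketch(attributes_string)
  attrs = {}
  return attrs if attributes_string.nil?

  next_anonymous_key = -1
  attributes_string.scan(ATTRIBUTE_PATTERN) do |key, value|
    # An empty slot between commas is skipped but still consumes an index.
    skip_value = blank_value?(key) && blank_value?(value)
    key = (next_anonymous_key += 1) if blank_value?(key)
    next if skip_value

    # Strip matching quotes, then turn \x into x.
    if value[0] == value[-1] && ["'", '"'].include?(value[0])
      value = value[1...-1].gsub(UNESCAPE_PATTERN, "\\1")
    end
    attrs[key] = value
  end
  attrs
end

# Named attributes keep their names; quoted values are unquoted and unescaped.
named = parse_attributes_sketch(' author="Jane \"JD\" Doe", id=42')

# Anonymous values are numbered from 0; the empty slot still counts as index 1.
positional = parse_attributes_sketch('=a,,b')

# The attribute string of a tag such as [url=http://example.com].
single = parse_attributes_sketch('=http://example.com')

p named, positional, single
```

In this sketch, named ends up with the keys "author" and "id", positional has the integer keys 0 and 2 (index 1 belongs to the skipped empty slot), and single has the one key 0.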

#tokenize(document, handler) ⇒ Object

Parses the document as BBCode-formatted text and reports each BBCode event by calling a method on the handler.

The handler will have the following methods called:

  • .text text A text event. The single argument is the text itself.

  • .start_element element_name, element_attributes, element_source An element event marking the start of an element. The arguments are the element name as a symbol, the element attributes as a hash, and the source text of the tag as it appeared in the document.

  • .end_element element_name, element_source An element event marking the end of an element. element_name is nil for a bare closing tag ([/]), in which case the last started element is assumed. element_source is the source text of the tag.

Note that :start_element and :end_element are not guaranteed to be balanced or correctly nested. You must match start and end tags yourself to build the elements.

Also note that a :text event is not guaranteed to cover a whole run of text. A run may be split across several consecutive :text events even though no element lies between them; for example, a bracketed sequence that is not a valid tag is reported as its own :text event.
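
As an illustration of the two notes above, here is a minimal handler sketch. RecordingHandler is a hypothetical class, not part of the library. It merges consecutive text events and matches end tags against a stack of open elements, falling back to plain text for an end tag that does not close the innermost open element. The events are fed in by hand here, using the argument order seen in the tokenize source, so the example runs without the library.

```ruby
# Hypothetical handler: records events, merges adjacent text,
# and matches start and end tags with a stack.
class RecordingHandler
  attr_reader :events

  def initialize
    @events = []
    @open = [] # names of the currently open elements, innermost last
  end

  def text(text)
    if @events.last && @events.last[0] == :text
      @events.last[1] << text # extend the previous text event
    else
      @events << [:text, text.dup]
    end
  end

  def start_element(name, attributes, source = nil)
    @open.push(name)
    @events << [:start, name, attributes]
  end

  def end_element(name = nil, source = nil)
    name ||= @open.last # a bare [/] closes the last started element
    if name && @open.last == name
      @open.pop
      @events << [:end, name]
    else
      text(source.to_s) # unmatched end tag: keep it as literal text
    end
  end
end

handler = RecordingHandler.new
handler.text "Hello "
handler.start_element :b, {}, "[b]"
handler.text "wor"
handler.text "ld"                  # split text, merged by the handler
handler.end_element :b, "[/b]"
handler.end_element :i, "[/i]"     # never opened, recorded as text

handler.events.each { |event| p event }
```

With the library loaded, an object like this would be passed as the handler argument of tokenize. Treating an unmatched end tag as text is one possible policy chosen for this sketch; a stricter handler could raise an error or close intermediate elements instead.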



# File 'lib/bbcode/tokenizer.rb', line 47

def tokenize(document, handler)
	while (match = BBCODE_TAG_PATTERN.match(document))
		offset = match.begin(0)
		elem_source = match[0]

		# Emit any plain text that precedes the tag.
		handler.text document[0...offset] unless offset.zero?

		elem_is_closing_tag = match[1] == '/'
		elem_name = match[2].empty? ? nil : match[2].to_sym
		elem_attr_string = match[3].empty? ? nil : match[3]

		if elem_is_closing_tag && !elem_attr_string
			# Closing tag; elem_name is nil for a bare [/].
			handler.end_element elem_name, elem_source
		elsif !elem_is_closing_tag && elem_name
			handler.start_element elem_name, parse_attributes_string(elem_attr_string), elem_source
		else
			# Closing tag with attributes, or opening tag without a name: report it as text.
			handler.text elem_source
		end

		# Continue scanning after the tag.
		document = document[(offset + elem_source.length)..-1]
	end

	# Emit whatever text remains after the last tag.
	handler.text document unless document.empty?
end