Module: Text::Latex::Util::Macronconversions
Defined in:
- lib/macronconversions/macronconversions.rb
- lib/macronconversions/version.rb
- lib/macronconversions/conversion_structure.rb
Overview
Synopsis
Text::Latex::Util::Macronconversions: a module providing class methods that convert LaTeX-style macron markup into macron-bearing characters, and back again.
Usage
require 'macronconversions'
Description
The module provides two class methods: convert and deconvert. To transform LaTeX-style markup into macron-bearing characters or entities of some sort, use the former. To down-sample macron characters into LaTeX-style markup, use the latter.
Example Code
# Basic conversion and advanced conversion
puts Text::Latex::Util::Macronconversions.convert("mon\\={e}re", 'mc') #=> monēre
# Complex de-conversion
puts MacronConversions::MacronDeConverter.new("laudāre") #=> "laud\={a}re"
# Coup de grace
puts MacronConversions::MacronDeConverter.new(
MacronConversions::MacronConverter.new('to bring up, educate: \={e}duc\={o}, \={e}duc\={a}re, \={e}duc\={a}v\={\i}, \={e}ducatus; education, educator, educable', 'mc').to_s)
Author
Steven G. Harms, www.stevengharms.com
Constant Summary
- VERSION =
  "1.0.0"
- CONVERSION_TABLE =
  Chart used for ASCII LaTeX lookup against the formats
  {
    "\\={a}"  => { :mc => "ā", :utf8 => "\\xc4\\x81", :html => "ā" },
    "\\={e}"  => { :mc => "ē", :utf8 => "\\xc4\\x93", :html => "ē" },
    "\\={\\i}" => { :mc => "ī", :utf8 => "\\xc4\\xab", :html => "ī" },
    "\\={o}"  => { :mc => "ō", :utf8 => "\\xc5\\x8d", :html => "ō" },
    "\\={u}"  => { :mc => "ū", :utf8 => "\\xc5\\xab", :html => "ū" },
    "\\={A}"  => { :mc => "Ā", :utf8 => "\\xc4\\x80", :html => "Ā" },
    "\\={E}"  => { :mc => "Ē", :utf8 => "\\xc4\\x92", :html => "Ē" },
    "\\={\\I}" => { :mc => "Ī", :utf8 => "\\xc4\\xaa", :html => "Ī" },
    "\\={O}"  => { :mc => "Ō", :utf8 => "\\xc5\\x8c", :html => "Ō" },
    "\\={U}"  => { :mc => "Ū", :utf8 => "\\xc5\\xaa", :html => "Ū" }
  }
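The :utf8 column stores the escaped text form of each character's UTF-8 bytes (a literal backslash, an x, and two hex digits per byte) rather than the raw bytes. A minimal sketch, assuming a UTF-8 source file, that derives this form from a macron character; utf8_escape is a hypothetical helper and is not part of the module:

```ruby
# Hypothetical helper (not part of Macronconversions): render each UTF-8 byte
# of a string as a literal "\xNN" escape, the format used by the :utf8 column.
def utf8_escape(str)
  str.encode("UTF-8").bytes.map { |byte| format("\\x%02x", byte) }.join
end

puts utf8_escape("ā") # prints \xc4\x81
puts utf8_escape("Ō") # prints \xc5\x8c
```

Comparing the helper's output with the table is a quick way to check a new entry before adding it.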
Class Method Summary
- ._convert_char(c, mode) ⇒ Object
  “Private” method (still available for unit testing, but you probably shouldn’t mess with it).
- ._deconvert_char(c, chart) ⇒ Object
  “Private” method (still available for unit testing, but you probably shouldn’t mess with it).
- .convert(word, mode = :mc, &b) ⇒ Object
  Macronconversions::convert is the routine that scans a token for LaTeX macron codes, recursively.
- .deconvert(word, *arg) ⇒ Object
  Deconverts a string that has macron-bearing vowels from the format to the ASCII representation used by LaTeX.
Class Method Details
._convert_char(c, mode) ⇒ Object
“Private” method (still available for unit testing, but you probably shouldn’t mess with it).
Looks up a LaTeX ASCII code and returns the macron-bearing character in the requested format.
# File 'lib/macronconversions/macronconversions.rb', line 229
def _convert_char(c,mode)
  begin
    r = Text::Latex::Util::Macronconversions::CONVERSION_TABLE[c][mode]
    raise if r.nil?
  rescue
    puts "_convert_char failed to find a match for [#{c}]"
    raise
  end
  r
end
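The lookup above signals a miss by printing a message and re-raising. As a hedged alternative, the same two-level lookup can be written with Hash#fetch so that a miss carries its own description. This is a sketch only: SAMPLE_TABLE is a two-entry excerpt of CONVERSION_TABLE and lookup_char is a hypothetical name, neither of which exists in the module.

```ruby
# Two-entry excerpt of CONVERSION_TABLE, for illustration only.
SAMPLE_TABLE = {
  "\\={a}" => { :mc => "ā", :utf8 => "\\xc4\\x81" },
  "\\={e}" => { :mc => "ē", :utf8 => "\\xc4\\x93" }
}.freeze

# Hypothetical stand-in for _convert_char: Hash#fetch raises KeyError when
# either the LaTeX code or the format is missing, and the rescue turns that
# into an ArgumentError naming both.
def lookup_char(code, mode, table = SAMPLE_TABLE)
  table.fetch(code).fetch(mode)
rescue KeyError
  raise ArgumentError, "no #{mode.inspect} entry for [#{code}]"
end

puts lookup_char("\\={a}", :mc) # prints ā
```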
._deconvert_char(c, chart) ⇒ Object
“Private” method (still available for unit testing, but you probably shouldn’t mess with it).
Looks up a macron-bearing character in the given chart and returns its LaTeX ASCII code.
# File 'lib/macronconversions/macronconversions.rb', line 211
def _deconvert_char(c, chart)
  begin
    r = chart[c]
    raise if r.nil?
  rescue
    puts "_deconvert_char failed to find a match for [#{c}]"
    pp chart
    raise
  end
  r
end
.convert(word, mode = :mc, &b) ⇒ Object
Macronconversions::convert is the routine that scans a token for LaTeX macron codes, recursively. Upon the identification of a macron-ized character, it passes that character to the “private” method Macronconversions._convert_char.
Params:
word: A string that uses the LaTeX standard for macron denotation
mode: How the resultant string should be formatted (mc|utf8|html)
The resultant string may be operated upon by passing an optional block.
# File 'lib/macronconversions/macronconversions.rb', line 178
def convert(word, mode=:mc, &b)
  # Ends the recurse
  return "" if word.empty?

  # String to which the recurse's outputs will be appended
  #
  # All LaTeX Macron codes begin with an '\\={' token and end with
  # '}' Scan for that using a RegEx thus creating a match and rest.
  # The match is passed to _convert_char and the rest is recursed to
  # this method.
  return_string = if word.slice(0) == "\\"
    word =~ /(\\.*?})(.*)/
    _convert_char($1,mode.to_sym) +
      convert(word[($1.length)..-1], mode.to_sym)
  else
    word.slice(0) + convert(word[1..-1],mode)
  end

  # Allow a block to be given to mutate the string after having been fabricated
  if block_given?
    return_string = (yield return_string )
  end

  return_string
end
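The scan described above (find a backslash, capture up to the closing brace, look the code up, continue with the rest) can also be expressed without recursion. The following is a minimal sketch of that idea using String#gsub; it is not the library's implementation. EXCERPT is a two-entry excerpt of CONVERSION_TABLE and convert_iteratively is a hypothetical name.

```ruby
# Two-entry excerpt of CONVERSION_TABLE, for illustration only.
EXCERPT = {
  "\\={e}" => { :mc => "ē" },
  "\\={o}" => { :mc => "ō" }
}.freeze

# Hypothetical iterative counterpart to the recursive convert.
def convert_iteratively(word, mode = :mc, table = EXCERPT)
  # Match a backslash, then lazily everything up to the first closing brace,
  # and replace each match with its table entry for the requested format.
  result = word.gsub(/\\.*?\}/) do |code|
    table.fetch(code).fetch(mode.to_sym)
  end
  # As in convert, an optional block may post-process the finished string.
  block_given? ? yield(result) : result
end

puts convert_iteratively("mon\\={e}re")                    # prints monēre
puts convert_iteratively("mon\\={e}re") { |s| "[#{s}]" }   # prints [monēre]
```

Because the block is applied once to the finished string, it behaves like the top-level call of convert, whose recursive calls do not forward the block.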
.deconvert(word, *arg) ⇒ Object
Deconverts a string that has macron-bearing vowels from the format to the ASCII representation used by LaTeX.
The method is recursive; the two optional arguments are supplied only by its own recursive calls, not by the initial caller.
Params:
word: A string to convert
from_format: Never directly called. Which format of macron should be expected? See Macronconversions documentation
conversion_chart: Never directly called. Which lookup table should the characters of word be tested against?
# File 'lib/macronconversions/macronconversions.rb', line 85
def deconvert(word, *arg)
  return "" if word.empty?

  # If the target has already been set, then we should respect that
  # fact. This makes recurses over longer strings faster
  #
  # If it has not already been set, we derive the type heuristically
  mode = ((! arg[0].nil?) or (! arg[0]==:skip)) ? arg[0] :
    if word =~ /\&\#/
      :html
    elsif word =~ /[āēīōūĀĒĪŌŪ]/
      :mc
    elsif word =~ /\\x/
      :utf8
    end

  # If the mode has not been set, we should have a plain old letter
  # otherwise you want to die since we won't be able to build a
  # chart for a non-existent format.
  raise ArgumentError if (mode.nil? and word.slice(0) !~ /^[a-z]/)

  # Mutate the chart, but use the one given, if it was given (i.e.
  # we are in a recursive call)
  mutated_chart = {}

  if arg[1].nil?
    Text::Latex::Util::Macronconversions::CONVERSION_TABLE.each do |k,v|
      mutated_chart[v[mode]]=k
    end
  else
    mutated_chart = arg[1]
  end

  # String to which the recurse's outputs will be appended
  #
  # All LaTeX Macron codes begin with an '=' token. Scan for that
  # using a RegEx. The value is set to firstSlash.
  #
  # This is just ugly, but is nothing to be afraid of.
  #
  # You look to see if the character is an ampersand. That means
  # you've got HTML entities. Take the ending token of the entity
  # and hold it, and then recursively send the tail to this method
  # to be processed again. A cheap serialization is established by
  # sending the logic-requiring results on to recursive invocations
  #
  # The same logic applies to the second if state, we're dealing
  # with the representation of utf-8 characters
  #
  # The third case varies slightly, we have a multibyte *single*
  # character. This character can be slice!d off and the tail
  # recursively sent onward.
  #
  # Lastly, if you have a plain character, follow the same model as
  # the preceding.
  return_string = if word.slice(0) == "&"
    word =~ /(&.*?;)(.*)/
    _deconvert_char($1, mutated_chart) +
      deconvert(word[($1.length)..-1], mode.to_sym, mutated_chart)
  elsif word.slice(0) == "\\"
    word =~ /(^\\x..\\x..)(.*)/
    _deconvert_char($1, mutated_chart) +
      deconvert(word[($1.length)..-1], mode.to_sym, mutated_chart)
  elsif word.slice(0) =~ /[āēīōūĀĒĪŌŪ]/
    _deconvert_char(word.slice!(0), mutated_chart) +
      deconvert(word, mode.to_sym, mutated_chart)
  else
    # This is kinda ugly. Particularly arg1.
    word.slice!(0) + deconvert(word, :skip, mutated_chart)
  end

  # Allow a block to be given to mutate the string after having been fabricated
  if block_given?
    return_string = (yield return_string )
  end

  # debugger if word == ""
  return_string
end
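Two ideas carry most of the method above: guessing the input format from the string itself, and flipping the conversion table so that each formatted character maps back to its LaTeX code. The sketch below illustrates both under stated assumptions and is not the library's implementation: MACRON_EXCERPT is a two-entry excerpt of CONVERSION_TABLE limited to the :mc and :utf8 columns, and detect_format, inverted_chart, and deconvert_iteratively are hypothetical names.

```ruby
# Two-entry excerpt of CONVERSION_TABLE (:mc and :utf8 only), for illustration.
MACRON_EXCERPT = {
  "\\={a}" => { :mc => "ā", :utf8 => "\\xc4\\x81" },
  "\\={e}" => { :mc => "ē", :utf8 => "\\xc4\\x93" }
}.freeze

# Guess the format heuristically, in the spirit of deconvert's checks.
# Returns nil when no macron representation is recognised.
def detect_format(word)
  if word =~ /[āēīōūĀĒĪŌŪ]/
    :mc
  elsif word =~ /\\x/
    :utf8
  end
end

# Flip code => { format => char } into char => code for a single format.
def inverted_chart(mode, table = MACRON_EXCERPT)
  table.each_with_object({}) do |(code, forms), chart|
    chart[forms.fetch(mode)] = code
  end
end

# Hypothetical non-recursive counterpart to deconvert.
def deconvert_iteratively(word, mode = nil)
  mode ||= detect_format(word)
  return word.dup if mode.nil? # nothing to deconvert

  chart = inverted_chart(mode)
  # Regexp.union escapes each key, so "\xc4\x81" is matched as literal text.
  word.gsub(Regexp.union(chart.keys)) { |match| chart.fetch(match) }
end

puts deconvert_iteratively("laudāre")          # prints laud\={a}re
puts deconvert_iteratively("mon\\xc4\\x93re")  # prints mon\={e}re
```

String#gsub returns a new string, so this variant leaves its argument untouched, whereas the slice! calls in the listing above consume the string they are given.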