Module: Text::Latex::Util::Macronconversions

Defined in:
lib/macronconversions/macronconversions.rb,
lib/macronconversions/version.rb,
lib/macronconversions/conversion_structure.rb

Overview

Synopsis

Text::Latex::Util::Macronconversions: a module providing class methods that convert between LaTeX-style macron markup and macron-bearing characters, in either direction.

Usage

require 'macronconversions'

Description

The module provides two class methods: convert and deconvert. Use convert to transform LaTeX-style markup (e.g. \={a}) into macron-bearing characters or entities; use deconvert to turn macron-bearing characters back into LaTeX-style markup.

Example Code

# Basic conversion
puts Text::Latex::Util::Macronconversions.convert("mon\\={e}re", 'mc') #=> monēre

# De-conversion
puts Text::Latex::Util::Macronconversions.deconvert("laudāre") #=> laud\={a}re

# Coup de grace: a round trip
puts Text::Latex::Util::Macronconversions.deconvert(
  Text::Latex::Util::Macronconversions.convert('to bring up, educate: \={e}duc\={o}, \={e}duc\={a}re, \={e}duc\={a}v\={\i}, \={e}ducatus; education, educator, educable', 'mc'))

Author

Steven G. Harms, www.stevengharms.com

Constant Summary

VERSION =
"1.0.0"
CONVERSION_TABLE =

Lookup table mapping each ASCII LaTeX macron code to its form in each output format (:mc, :utf8, :html).

{
  "\\={a}"   => 
                {
                  :mc   => "ā",
                  :utf8 => "\\xc4\\x81",
                  :html => "ā"
                },
  "\\={e}"   => 
                {
                  :mc   => "ē",
                  :utf8 => "\\xc4\\x93",
                  :html => "ē"
                },
  "\\={\\i}" => 
                {
                  :mc   => "ī",
                  :utf8 => "\\xc4\\xab",
                  :html => "ī"
                },
  "\\={o}"   => 
                {
                  :mc   => "ō",
                  :utf8 => "\\xc5\\x8d",
                  :html => "ō"
                },
  "\\={u}"   => 
                {
                  :mc   => "ū",
                  :utf8 => "\\xc5\\xab",
                  :html => "ū"
                },
  "\\={A}"   => 
                {
                  :mc   => "Ā",
                  :utf8 => "\\xc4\\x80",
                  :html => "Ā"
                },
  "\\={E}"   => 
                {
                  :mc   => "Ē",
                  :utf8 => "\\xc4\\x92",
                  :html => "Ē"
                },
  "\\={\\I}" => 
                {
                  :mc   => "Ī",
                  :utf8 => "\\xc4\\xaa",
                  :html => "Ī"
                },
  "\\={O}"   => 
                {
                  :mc   => "Ō",
                  :utf8 => "\\xc5\\x8c",
                  :html => "Ō"
                },
  "\\={U}"   => 
                {
                  :mc   => "Ū",
                  :utf8 => "\\xc5\\xaa",
                  :html => "Ū"
                }
}
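deconvert does not read this table directly: it first inverts it for a single format, so that each macron form maps back to its LaTeX code. A minimal Ruby sketch of that inversion, assuming a two-entry subset of the table (TABLE_SUBSET and reverse_chart are illustrative names, not part of the gem). Unlike the listed code, which stores entries under a nil key when the mode is unknown, this version uses Hash#fetch and raises KeyError.

```ruby
# Two entries reproduced from CONVERSION_TABLE (the full table has ten).
TABLE_SUBSET = {
  "\\={a}" => { mc: "ā", utf8: "\\xc4\\x81" },
  "\\={e}" => { mc: "ē", utf8: "\\xc4\\x93" }
}.freeze

# Invert the table for one format: macron form => LaTeX code.
# This mirrors how deconvert fills its mutated_chart.
def reverse_chart(table, mode)
  table.each_with_object({}) do |(latex, formats), chart|
    chart[formats.fetch(mode)] = latex
  end
end

chart = reverse_chart(TABLE_SUBSET, :mc)
puts chart["ē"]  # prints \={e}
```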

Class Method Summary

Class Method Details

._convert_char(c, mode) ⇒ Object

“Private” method (still available for unit testing, but you probably shouldn’t mess with it)

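Looks up a LaTeX ASCII macron code in CONVERSION_TABLE and returns its macron-bearing form in the requested mode. If no match is found, it prints a message and re-raises the error.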



# File 'lib/macronconversions/macronconversions.rb', line 229

def _convert_char(c,mode)             
  begin
    r = Text::Latex::Util::Macronconversions::CONVERSION_TABLE[c][mode]
    raise if r.nil?
  rescue
    puts "_convert_char failed to find a match for [#{c}]"
    raise
  end
  r
end
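_convert_char reports a missing entry by printing to standard output and re-raising whatever error occurred: a NoMethodError when c is not a table key (the lookup calls [] on nil), or a bare RuntimeError when the mode is unknown. If you want the failure without console output, Hash#fetch gives the same two-level lookup with a KeyError that names the missing key. A sketch under that assumption, with LATEX_TO_FORMATS and lookup_macron as hypothetical stand-ins for the gem's table and method:

```ruby
# Hypothetical stand-in for CONVERSION_TABLE (two entries only).
LATEX_TO_FORMATS = {
  "\\={a}" => { mc: "ā", utf8: "\\xc4\\x81" },
  "\\={o}" => { mc: "ō", utf8: "\\xc5\\x8d" }
}.freeze

# Same lookup as _convert_char, but each missing level raises KeyError
# with a descriptive message instead of printing and re-raising.
def lookup_macron(code, mode)
  formats = LATEX_TO_FORMATS.fetch(code) do
    raise KeyError, "no macron entry for #{code.inspect}"
  end
  formats.fetch(mode) do
    raise KeyError, "no #{mode.inspect} form for #{code.inspect}"
  end
end

puts lookup_macron("\\={o}", :mc)  # prints ō
```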

._deconvert_char(c, chart) ⇒ Object

“Private” method (still available for unit testing, but you probably shouldn’t mess with it)

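Looks up a macron-bearing character (in any supported format) in the given reverse chart and returns its LaTeX ASCII form. If no match is found, it prints a message and the chart, then re-raises the error.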



# File 'lib/macronconversions/macronconversions.rb', line 211

def _deconvert_char(c, chart)
  begin
    r = chart[c] 
    raise if r.nil?
  rescue
    puts "_deconvert_char failed to find a match for [#{c}]"
    pp chart
    raise 
  end
  r
end

.convert(word, mode = :mc, &b) ⇒ Object

Macronconversions.convert recursively scans a string for LaTeX macron codes. When it identifies a macron code, it passes that code to the “private” method Macronconversions._convert_char.

Params:

word

A string that uses the LaTeX standard for macron denotation

mode

How the resultant string should be formatted (mc|utf8|html)

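An optional block may be given: it receives the fully converted string, and its return value becomes the method's return value. The block is not forwarded to the recursive calls, so it runs only once.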



# File 'lib/macronconversions/macronconversions.rb', line 178

def convert(word, mode=:mc, &b)
  # Ends the recurse
  return "" if word.empty?
  
  # String to which the recurse's outputs will be appended
  #
  # All LaTeX Macron codes begin with an '\\={' token and end with
  # '}' Scan for that using a RegEx thus creating a match and rest. 
  # The match is passed to _convert_char and the rest is recursed to
  # this method.
  return_string = 
    if word.slice(0) == "\\"
      word =~ /(\\.*?})(.*)/
       _convert_char($1,mode.to_sym) + 
             convert(word[($1.length)..-1], mode.to_sym)
    else
       word.slice(0) + convert(word[1..-1],mode)
    end

  # Allow a block to be given to mutate the string after having been fabricated              
  if block_given?
    return_string = (yield return_string )
  end

  return_string
end
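The recursion above handles one character or one macron code per call, so recursion depth grows with the length of the input. A non-recursive alternative is a single String#gsub over the macron codes. This is an illustrative sketch under assumptions, not the gem's implementation: it covers only the lowercase :mc entries, its regular expression matches only \={x} and \={\i} codes (other backslash sequences pass through unchanged, whereas the listed version hands any \...} sequence to _convert_char), and MACRON_FORMS and convert_macrons are hypothetical names.

```ruby
# Lowercase :mc entries copied from CONVERSION_TABLE.
MACRON_FORMS = {
  "\\={a}" => { mc: "ā" }, "\\={e}" => { mc: "ē" }, "\\={\\i}" => { mc: "ī" },
  "\\={o}" => { mc: "ō" }, "\\={u}" => { mc: "ū" }
}.freeze

# Matches \={x} and the dotless-i form \={\i} (note the optional backslash).
MACRON_CODE = /\\=\{\\?[A-Za-z]\}/

def convert_macrons(word, mode = :mc)
  # Replace every macron code in one pass; unknown codes raise KeyError.
  result = word.gsub(MACRON_CODE) do |code|
    MACRON_FORMS.fetch(code).fetch(mode.to_sym)
  end
  # As in convert, an optional block post-processes the finished string.
  block_given? ? yield(result) : result
end

puts convert_macrons("mon\\={e}re")                     # prints monēre
puts convert_macrons("\\={e}duc\\={o}") { |s| s.length } # prints 5
```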

.deconvert(word, *arg) ⇒ Object

Deconverts a string containing macron-bearing vowels, in any of the supported formats (mc, utf8, html), into the ASCII representation used by LaTeX.

The method is recursive; its two optional positional arguments are supplied only by its own recursive calls, so callers normally pass just word.

Params:

word

A string to convert

arg[0] (from_format)

Never passed directly: which macron format to expect (mc|utf8|html). When omitted, the format is detected from word.

arg[1] (conversion_chart)

Never passed directly: the reverse lookup table that the characters of word are tested against.

Raises:

  • (ArgumentError) if no macron format can be detected in word and word does not begin with a lowercase ASCII letter
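The ArgumentError comes from the format heuristic in the method body: when no mode is passed, deconvert checks for HTML numeric entities, then literal macron characters, then escaped \x byte text, and raises if none is found and word does not start with a lowercase ASCII letter. The detection step on its own, as a hedged sketch that follows the same order of checks (detect_macron_format is a hypothetical name):

```ruby
# Same order of checks as deconvert: HTML entities first, then literal
# macron characters, then escaped UTF-8 byte text. Returns nil otherwise.
def detect_macron_format(word)
  if word =~ /&#/
    :html
  elsif word =~ /[āēīōūĀĒĪŌŪ]/
    :mc
  elsif word =~ /\\x/
    :utf8
  end
end

p detect_macron_format("laudāre")          # prints :mc
p detect_macron_format("laud\\xc4\\x81re") # prints :utf8
p detect_macron_format("laudare")          # prints nil
```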


# File 'lib/macronconversions/macronconversions.rb', line 85

def deconvert(word, *arg)
  return "" if word.empty?

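  # If the target has already been set, then we should respect that
  # fact.  This makes recurses over longer strings faster.
  #
  # If it has not already been set, we derive the type heuristically.
  #
  # Note: `! arg[0]==:skip` parses as `(!arg[0]) == :skip`, which is
  # always false, so any non-nil arg[0] (including :skip) is used as
  # the mode as-is.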
  mode =  ((! arg[0].nil?) or 
           (! arg[0]==:skip)) ?
        arg[0] 
      :
        if word =~ /\&\#/
          :html
        elsif word =~ /[āēīōūĀĒĪŌŪ]/
          :mc 
        elsif word =~ /\\x/
          :utf8
        end
  
  # If the mode has not been set, we should have a plain old letter;
  # otherwise we raise, since we won't be able to build a chart for a
  # non-existent format.
  raise ArgumentError if (mode.nil? and word.slice(0) !~ /^[a-z]/)
  
  # Mutate the chart, but use the one given, if it was given (i.e.
  # we are in a recursive call)
  mutated_chart = {}
  if arg[1].nil?
    Text::Latex::Util::Macronconversions::CONVERSION_TABLE.each do |k,v|
      mutated_chart[v[mode]]=k
    end
  else
    mutated_chart = arg[1]
  end
  
  # String to which the recursive calls' outputs will be appended.
  #
  # Each branch splits off the leading token, converts it, and sends
  # the tail to a recursive call along with the mode and chart.
  #
  # This is just ugly, but is nothing to be afraid of.
  #
  # If the first character is an ampersand, you've got an HTML entity.
  # Take the whole entity (up to its closing ';'), look it up, and
  # recursively send the tail to this method to be processed again.
  # A cheap serialization is established by sending the
  # logic-requiring results on to recursive invocations.
  #
  # The same logic applies to the second branch: we're dealing with
  # the escaped-text representation of UTF-8 bytes (\xNN\xNN).
  #
  # The third case varies slightly: we have a multibyte *single*
  # character.  This character can be slice!d off and the tail
  # recursively sent onward.
  #
  # Lastly, if you have a plain character, follow the same model as
  # the preceding.

  return_string = 
    if word.slice(0) == "&"
      word =~ /(&.*?;)(.*)/
      _deconvert_char($1, mutated_chart) + 
            deconvert(word[($1.length)..-1], mode.to_sym, mutated_chart)                
    elsif word.slice(0) == "\\"
      word =~ /(^\\x..\\x..)(.*)/
      _deconvert_char($1, mutated_chart) + 
            deconvert(word[($1.length)..-1], mode.to_sym, mutated_chart)
    elsif word.slice(0) =~ /[āēīōūĀĒĪŌŪ]/
      _deconvert_char(word.slice!(0),  mutated_chart) + 
            deconvert(word, mode.to_sym, mutated_chart)
    else
      # This is kinda ugly.  Particularly arg1.  
      word.slice!(0) + deconvert(word, :skip, mutated_chart)
    end

  # Allow a block to be given to mutate the string after having been fabricated              
  if block_given?
    return_string = (yield return_string )
  end

  # debugger if word == "" 
  return_string
end
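Note that the listed implementation calls slice! on word in its last two branches, so for input such as "laudāre" the caller's string is modified in place; pass a copy (for example with dup) if you need to keep it. A non-recursive sketch that avoids the mutation, assuming an already-inverted :mc chart like the one deconvert builds (MC_TO_LATEX and deconvert_macrons are hypothetical names; only the lowercase vowels are included):

```ruby
# Subset of CONVERSION_TABLE, already inverted for the :mc format.
MC_TO_LATEX = {
  "ā" => "\\={a}", "ē" => "\\={e}", "ī" => "\\={\\i}",
  "ō" => "\\={o}", "ū" => "\\={u}"
}.freeze

# String#gsub with a hash replaces each matched character by its hash
# value and returns a new string, leaving the argument untouched.
def deconvert_macrons(word, chart = MC_TO_LATEX)
  word.gsub(Regexp.union(chart.keys), chart)
end

input = "laudāre"
puts deconvert_macrons(input)  # prints laud\={a}re
puts input                     # prints laudāre
```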