Class: CBETA

Inherits:
Object
Defined in:
lib/cbeta.rb

Defined Under Namespace

Classes: BMToText, Canon, CharCount, CharFrequency, Gaiji, HTMLToPDF, HTMLToText, P5aChecker, P5aToHTML, P5aToHTMLForEveryEdition, P5aToHTMLForPDF, P5aToSimpleHTML, P5aToText, P5aValidator, UnicodeService, XMLDocument

Constant Summary

CANON =
'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]'
SORT_ORDER =
%w(T X A K S F C D U P J L G M N ZS I ZW B GA GB Y LC TX CC)
VOL3 =
%w[A CC C G GA GB L M P U]
DATA =
File.join(File.dirname(__FILE__), 'data')
PUNCS =
',.()[] 。‧.,、;?!:︰/()「」『』《》<>〈〉〔〕[]【】〖〗〃…—─ ~│┬▆△*+-='
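WORK_PART =

Work number (without the canon ID). The examples below are shown with the canon ID prefix:

four digits: T0001
four digits + a letter: T0150A, T0128a
a letter + three digits: JA041, ZWa073
'\d{4}[a-zA-Z]?|[ABa]\d{3}'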
BASENAME =

XML file base name, e.g. GA010n0009

"(?:#{CANON})\\d{2,3}n(?:#{WORK_PART})"

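The three string constants above are regex fragments meant to be interpolated into larger patterns. The sketch below copies them so it runs without the gem and anchors BASENAME on both ends; `BASENAME_RE` and `basename?` are hypothetical names used only for illustration, not part of the library.

```ruby
# Constants copied from the summary above so the sketch is self-contained.
CANON     = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]'
WORK_PART = '\d{4}[a-zA-Z]?|[ABa]\d{3}'
BASENAME  = "(?:#{CANON})\\d{2,3}n(?:#{WORK_PART})"

# Hypothetical helper: BASENAME anchored at both ends of the string.
BASENAME_RE = /\A#{BASENAME}\z/

def basename?(s)
  BASENAME_RE.match?(s)
end

puts basename?('GA010n0009')  # true
puts basename?('T25n1510a')   # true
puts basename?('J36nB348')    # true
puts basename?('T1n0001')     # false: the volume number needs 2 or 3 digits
```

Because the two-letter IDs come before `[A-Z]` in CANON, a name such as GA010n0009 is read as canon GA rather than canon G.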
Constructor Details

#initialize ⇒ CBETA

Loads the canon data.

# File 'lib/cbeta.rb', line 229

def initialize()
  @canon_abbr = {}
  @canon_nickname = {}
  fn = File.join(File.dirname(__FILE__), 'data/canons.csv')
  CSV.foreach(fn, :headers => true, encoding: 'utf-8') do |row|
    id = row['id']
    unless row['nickname'].nil?
      @canon_nickname[id] = row['nickname']
    end
    next if row['abbreviation'].nil?
    next if row['abbreviation'].empty?
    @canon_abbr[id] = row['abbreviation']
  end
  
  fn = File.join(File.dirname(__FILE__), 'data/categories.json')
  s = File.read(fn)
  @categories = JSON.parse(s)
end

Class Method Details

.get_canon_from_vol(vol) ⇒ String

Gets the canon ID from a volume number.

Parameters:

  • vol (String)

    Volume number, e.g. “T01” or “GA009”

Returns:

  • (String)

    Canon ID, e.g. “T” or “GA”

# File 'lib/cbeta.rb', line 42

def self.get_canon_from_vol(vol)
  vol.sub(/^(#{CANON}).*$/, '\1')
end

.get_canon_id_from_linehead(linehead) ⇒ String

Gets the canon ID from a linehead.

Parameters:

  • linehead (String)

    Linehead, e.g. “T01n0001_p0001a01” or “GA009n0008_p0003a01”

Returns:

  • (String)

    Canon ID, e.g. “T” or “GA”

# File 'lib/cbeta.rb', line 28

def self.get_canon_id_from_linehead(linehead)
  linehead.sub(/^(#{CANON}).*$/, '\1')
end

.get_canon_id_from_work_id(work) ⇒ String

Gets the canon ID from a work ID.

Parameters:

  • work (String)

    Work ID, e.g. “T0001” or “ZW0001”

Returns:

  • (String)

    Canon ID, e.g. “T” or “ZW”

# File 'lib/cbeta.rb', line 35

def self.get_canon_id_from_work_id(work)
  work.sub(/^(#{CANON}).*$/, '\1')
end
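
The three canon-extraction methods above apply the same substitution to different kinds of input. A minimal sketch of that shared step, with CANON copied from the constant summary and `canon_prefix` as a hypothetical stand-in name:

```ruby
CANON = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]'

# Same substitution the three methods above perform: keep only the leading canon ID.
def canon_prefix(s)
  s.sub(/^(#{CANON}).*$/, '\1')
end

puts canon_prefix('T01')                  # T   (volume number)
puts canon_prefix('GA009n0008_p0003a01')  # GA  (linehead)
puts canon_prefix('ZW0001')               # ZW  (work ID)
puts canon_prefix('G052')                 # G   (single letter, not GA or GB)
```

Input that does not start with a canon ID is returned unchanged, because `String#sub` leaves the string alone when the pattern does not match.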

.get_linehead(file_basename, lb) ⇒ String

Returns the CBETA linehead, e.g. “T01n0001_p0001a01” or “T25n1510ap0757b29”.

Parameters:

  • file_basename (String)

    XML file base name, e.g. “T01n0001” or “T25n1510a”

  • lb (String)

    e.g. “0001a01” or “0757b29”

Returns:

  • (String)

    CBETA linehead, e.g. “T01n0001_p0001a01” or “T25n1510ap0757b29”

# File 'lib/cbeta.rb', line 49

def self.get_linehead(file_basename, lb)
  return nil if file_basename.nil?
  
  if file_basename.match(/^(T\d\dn0220)/)
    r = $1
  else
    r = file_basename
  end
  r += '_' if r.match(/\d$/)
  r += 'p' + lb
  r
end
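
The listing above has two details that are easy to miss: the underscore is added only when the base name ends in a digit, and any T..n0220 base name is cut back to its first eight characters. The sketch below restates the same logic under a hypothetical name, `linehead_for`, so it can be run without the gem; the input 'T05n0220a' is an illustrative value assumed to follow the base name format.

```ruby
# Restatement of the get_linehead logic shown above (hypothetical name).
def linehead_for(file_basename, lb)
  return nil if file_basename.nil?

  # Any "Txxn0220..." base name is reduced to "Txxn0220".
  m = file_basename.match(/^(T\d\dn0220)/)
  r = m ? m[1] : file_basename

  # Separator only when the base name ends in a digit.
  r += '_' if r.match?(/\d$/)
  r + 'p' + lb
end

puts linehead_for('T01n0001', '0001a01')   # T01n0001_p0001a01
puts linehead_for('T25n1510a', '0757b29')  # T25n1510ap0757b29
puts linehead_for('T05n0220a', '0001a01')  # T05n0220_p0001a01
```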

.get_sort_order_from_canon_id(canon) ⇒ String

Gets the sort key for a canon ID; for example, “T” returns “A” and “X” returns “B”.

Parameters:

  • canon (String)

    Canon ID

Returns:

  • (String)

    Sort key

# File 'lib/cbeta.rb', line 155

def self.get_sort_order_from_canon_id(canon)
  # Full-text search ordering provided by CBETA, as finalized by Ven. Huimin, 2016-06-03
  i = SORT_ORDER.index(canon)
  if i.nil?
    puts "unknown canon id: #{canon}" 
    return nil
  end
  
  (i + 'A'.ord).chr
end
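
The sort key is the position of the canon ID in SORT_ORDER mapped onto the letters A, B, C, and so on. A small sketch of that mapping and one way it could be used to order canon IDs; `sort_key` is a hypothetical name, and unlike the method above it does not print a warning for unknown IDs.

```ruby
SORT_ORDER = %w(T X A K S F C D U P J L G M N ZS I ZW B GA GB Y LC TX CC)

# Index in SORT_ORDER mapped to a letter: 0 -> "A", 1 -> "B", ...
def sort_key(canon)
  i = SORT_ORDER.index(canon)
  return nil if i.nil?
  (i + 'A'.ord).chr
end

puts sort_key('T')   # A
puts sort_key('X')   # B
puts sort_key('CC')  # Y
p %w(GA T ZW X).sort_by { |c| sort_key(c) }  # ["T", "X", "ZW", "GA"]
```

With 25 entries in SORT_ORDER the keys run from A to Y, so each key stays a single letter.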

.get_work_id_from_file_basename(fn) ⇒ String

Gets the work ID from an XML file base name.

Parameters:

  • fn (String)

    File base name, e.g. “T01n0001” or “GA009n0008”

Returns:

  • (String)

    Work ID, e.g. “T0001” or “GA0008”

# File 'lib/cbeta.rb', line 65

def self.get_work_id_from_file_basename(fn)
  r = fn.sub(/^(#{CANON})\d{2,3}n(.*)$/, '\1\2')
  r = 'T0220' if r.start_with? 'T0220'
  r
end

.get_work_id_from_linehead(linehead) ⇒ String

Gets the work ID from a linehead.

Parameters:

  • linehead (String)

    CBETA linehead, e.g. “T01n0001_p0001a01” or “T25n1510ap0757b29”

Returns:

  • (String)

    Work ID, e.g. “T0001” or “T1510a”

# File 'lib/cbeta.rb', line 74

def self.get_work_id_from_linehead(linehead)
  linehead.sub(/^(#{CANON})\d{2,3}n(#{WORK_PART}).*$/, '\1\2')
end
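
The two work-ID methods above differ in how they find the end of the work part: the base name version takes everything after the "n" and collapses T0220 variants, while the linehead version relies on WORK_PART. The sketch below restates both under hypothetical names so the difference can be seen side by side; 'T05n0220a' is an illustrative input assumed to follow the base name format.

```ruby
CANON     = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]'
WORK_PART = '\d{4}[a-zA-Z]?|[ABa]\d{3}'

# Mirrors get_work_id_from_file_basename: everything after "n" is the work
# part, and any T0220 variant collapses to plain "T0220".
def work_id_from_basename(fn)
  r = fn.sub(/^(#{CANON})\d{2,3}n(.*)$/, '\1\2')
  r.start_with?('T0220') ? 'T0220' : r
end

# Mirrors get_work_id_from_linehead: WORK_PART decides where the work part ends.
def work_id_from_linehead(linehead)
  linehead.sub(/^(#{CANON})\d{2,3}n(#{WORK_PART}).*$/, '\1\2')
end

puts work_id_from_basename('GA009n0008')         # GA0008
puts work_id_from_basename('T05n0220a')          # T0220
puts work_id_from_linehead('T25n1510ap0757b29')  # T1510a
puts work_id_from_linehead('T01n0001_p0001a01')  # T0001
```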

.get_xml_file_from_vol_and_work(vol, work) ⇒ String

Gets the XML file base name from a volume number and a work ID.

Parameters:

  • vol (String)

    Volume number, e.g. “T01” or “GA009”

  • work (String)

    Work ID, e.g. “T0001” or “GA0008”

Returns:

  • (String)

    XML file base name, e.g. “T01n0001” or “GA009n0008”

# File 'lib/cbeta.rb', line 82

def self.get_xml_file_from_vol_and_work(vol, work)
  vol + 'n' + work.sub(/^(#{CANON})(.*)$/, '\2')
end

.juan_across_vol(vol, work, juan = nil) ⇒ Numeric

Identifies which half of a juan (卷) that spans two volumes is held in the given volume.

Returns:

  • (Numeric)

    1: first half of a volume-spanning juan; 2: second half

# File 'lib/cbeta.rb', line 104

def self.juan_across_vol(vol, work, juan=nil)
  case work
  when 'GA0037'
    case vol
    when 'GA036'
      1 if juan == 2
    when 'GA037'
      2 if juan.nil? or juan == 2
    end
  when 'L1557'
    case vol
    when 'L130'
      1 if juan == 17 # first half of the juan
    when 'L131'
      case juan
      when 17, nil then 2
      when 34      then 1
      end
    when 'L132'
      case juan
      when 34, nil then 2
      when 51      then 1
      end
    when 'L133'
      2 if juan.nil? or juan == 51
    end
  when 'X0714'
    case vol
    when 'X39'
      1 if juan == 3
    when 'X40'
      2 if juan.nil? or juan == 3
    end
  end
end
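
The nested case statements above encode a small table: for each work, which juan is split and which two volumes hold its halves. The sketch below is an alternative, table-driven formulation of the same lookup, not the library's implementation; `SPANS` and `juan_half` are hypothetical names, and the rows are copied from the listing.

```ruby
# work => [[juan, volume holding the first half, volume holding the second half], ...]
SPANS = {
  'GA0037' => [[2, 'GA036', 'GA037']],
  'L1557'  => [[17, 'L130', 'L131'], [34, 'L131', 'L132'], [51, 'L132', 'L133']],
  'X0714'  => [[3, 'X39', 'X40']]
}.freeze

# Returns 1 for the first half, 2 for the second half, nil otherwise.
# With juan omitted, only a volume that opens with a second half answers 2.
def juan_half(vol, work, juan = nil)
  SPANS.fetch(work, []).each do |j, first_vol, second_vol|
    return 2 if vol == second_vol && (juan.nil? || juan == j)
    return 1 if vol == first_vol && juan == j
  end
  nil
end

p juan_half('L130', 'L1557', 17)  # 1
p juan_half('L131', 'L1557', 17)  # 2
p juan_half('L131', 'L1557', 34)  # 1
p juan_half('L131', 'L1557')      # 2
p juan_half('T01', 'T0001', 1)    # nil
```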

.linehead_to_s(linehead) ⇒ String

Converts a linehead to citation format.

Examples:

CBETA.linehead_to_s('T85n2838_p1291a03')
# returns "T85, no. 2838, p. 1291, a03"

Parameters:

  • linehead (String)

    Linehead, e.g. T85n2838_p1291a03

Returns:

  • (String)

    Source reference in citation format, e.g. T85, no. 2838, p. 1291, a03

# File 'lib/cbeta.rb', line 174

def self.linehead_to_s(linehead)
  linehead.match(/^((?:#{CANON})\d+)n(.*)_p(\d+)([a-z]\d+)$/) {
    return "#{$1}, no. #{$2}, p. #{$3}, #{$4}"
  }
  nil
end

.linehead_to_xml_file_path(linehead) ⇒ String

Gets the relative path of the XML file from a linehead.

Parameters:

  • linehead (String)

    Linehead, e.g. “GA009n0008_p0003a01” or “J36nB348_p0284c01”

Returns:

  • (String)

    Relative path of the XML file, e.g. “GA/GA009/GA009n0008.xml”

# File 'lib/cbeta.rb', line 144

def self.linehead_to_xml_file_path(linehead)
  if m = linehead.match(/^(?<work>(?<vol>(?<canon>#{CANON})\d+)n(?:#{WORK_PART})).*$/)
    File.join(m[:canon], m[:vol], m[:work]+'.xml')
  else
    nil
  end
end
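
The nested named groups in the pattern above pull three path components out of one match: the canon ID, the volume, and the file base name (captured under the name `work`). A runnable restatement, with the constants copied from the summary and `xml_path_for` as a hypothetical name:

```ruby
CANON     = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]'
WORK_PART = '\d{4}[a-zA-Z]?|[ABa]\d{3}'

# canon -> directory, vol -> subdirectory, work -> file base name.
def xml_path_for(linehead)
  m = linehead.match(/^(?<work>(?<vol>(?<canon>#{CANON})\d+)n(?:#{WORK_PART})).*$/)
  return nil if m.nil?
  File.join(m[:canon], m[:vol], "#{m[:work]}.xml")
end

puts xml_path_for('GA009n0008_p0003a01')  # GA/GA009/GA009n0008.xml
puts xml_path_for('J36nB348_p0284c01')    # J/J36/J36nB348.xml
p xml_path_for('not a linehead')          # nil
```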

.normalize_vol(vol) ⇒ Object



# File 'lib/cbeta.rb', line 181

def self.normalize_vol(vol)
  if vol.match(/^(#{CANON})(.*)$/)
    canon = $1
    vol = $2
  
    if VOL3.include? canon
      # these canons use three-digit volume numbers
      vol_len = 3
    else
      vol_len = 2      
    end
    canon + vol.rjust(vol_len, '0')
  else
    abort "unknown vol format: #{vol}"
  end
end
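
normalize_vol has no description in the source, so the following is a reading of the listing rather than documented behavior: it zero-pads the numeric part of a volume number to three digits for the canons in VOL3 and to two digits otherwise. The sketch uses a hypothetical name, `pad_vol`, and raises ArgumentError where the method above calls `abort`.

```ruby
CANON = 'CC|DA|GA|GB|LC|TX|ZS|ZW|[A-Z]'
VOL3  = %w[A CC C G GA GB L M P U]

# Zero-pads the volume part: 3 digits for canons listed in VOL3, else 2.
def pad_vol(vol)
  m = vol.match(/^(#{CANON})(.*)$/)
  raise ArgumentError, "unknown vol format: #{vol}" if m.nil?

  width = VOL3.include?(m[1]) ? 3 : 2
  m[1] + m[2].rjust(width, '0')
end

puts pad_vol('T1')    # T01
puts pad_vol('GA9')   # GA009
puts pad_vol('L130')  # L130
puts pad_vol('X40')   # X40
```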

.open_xml(fn) ⇒ Object



# File 'lib/cbeta.rb', line 198

def self.open_xml(fn)
  s = File.read(fn)
  doc = Nokogiri::XML(s)
  doc.remove_namespaces!()
  doc
end

.pua(gid) ⇒ Object

Takes a gaiji (missing-character) code and returns the Unicode PUA character.

# File 'lib/cbeta.rb', line 206

def self.pua(gid)
  if gid.start_with? 'SD'
    siddham_pua(gid)
  elsif gid.start_with? 'RJ'
    ranjana_pua(gid)
  else
    [0xf0000 + gid[2..-1].to_i].pack 'U'
  end
end

.ranjana_pua(gid) ⇒ Object

Takes a Ranjana-script gaiji code and returns the Unicode PUA character.

# File 'lib/cbeta.rb', line 217

def self.ranjana_pua(gid)
  i = 0x100000 + gid[-4..-1].hex
  [i].pack("U")
end

.siddham_pua(gid) ⇒ Object

Takes a Siddham gaiji code and returns the Unicode PUA character.

# File 'lib/cbeta.rb', line 223

def self.siddham_pua(gid)
  i = 0xFA000 + gid[-4..-1].hex
  [i].pack("U")
end
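
The three PUA methods above differ only in the base code point and in how the numeric part of the code is read: decimal after the two-letter prefix by default, hexadecimal from the last four characters for SD and RJ codes. The sketch below computes the code points under a hypothetical name, `pua_codepoint`; the three sample codes are illustrative values assumed to follow those formats, not entries verified against the gaiji data.

```ruby
# Code point arithmetic from pua, siddham_pua and ranjana_pua above.
def pua_codepoint(gid)
  if gid.start_with?('SD')
    0xFA000 + gid[-4..-1].hex     # Siddham: last four characters as hex
  elsif gid.start_with?('RJ')
    0x100000 + gid[-4..-1].hex    # Ranjana: last four characters as hex
  else
    0xF0000 + gid[2..-1].to_i     # default: decimal digits after the prefix
  end
end

%w[CB00006 SD-A440 RJ-C9E3].each do |gid|
  cp = pua_codepoint(gid)
  char = [cp].pack('U')           # same packing step the library uses
  puts format('%s -> U+%X', gid, char.ord)
end
# CB00006 -> U+F0006
# SD-A440 -> U+104440
# RJ-C9E3 -> U+10C9E3
```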

.work_juan_vol_range(work, juan) ⇒ Object

Returns the range of volume numbers if the juan spans volumes.

# File 'lib/cbeta.rb', line 87

def self.work_juan_vol_range(work, juan)
  case work
  when 'GA0037'
    (36..37) if juan == 2
  when 'L1557'
    case juan
    when 17 then (130..131)
    when 34 then (131..132)
    when 51 then (132..133)
    end
  when 'X0714'
    (39..40) if juan == 3
  end
end

Instance Method Details

#get_canon_abbr(id) ⇒ String

Gets the abbreviated name of a canon.

Examples:

cbeta = CBETA.new
cbeta.get_canon_abbr('T') # returns "大"

Parameters:

  • id (String)

    Canon ID, e.g. the ID of the Taishō canon (大正藏) is “T”

Returns:

  • (String)

    Short name of the canon, e.g. “大”

# File 'lib/cbeta.rb', line 276

def get_canon_abbr(id)
   r = get_canon_symbol(id)
   return nil if r.nil?
   r.sub(/^【(.*?)】$/, '\1')
end

#get_canon_nickname(id) ⇒ String

Returns the nickname of a canon, e.g. “大正藏”.

Parameters:

  • id (String)

    Canon ID, e.g. the ID of the Taishō canon (大正藏) is “T”

Returns:

  • (String)

    Nickname of the canon, e.g. “大正藏”

# File 'lib/cbeta.rb', line 250

def get_canon_nickname(id)
  return nil unless @canon_nickname.key? id
  @canon_nickname[id]
end

#get_canon_symbol(id) ⇒ String

Gets the abbreviation symbol of a canon.

Examples:

cbeta = CBETA.new
cbeta.get_canon_symbol('T') # returns "【大】"

Parameters:

  • id (String)

    Canon ID, e.g. the ID of the Taishō canon (大正藏) is “T”

Returns:

  • (String)

    Canon symbol, e.g. “【大】”

# File 'lib/cbeta.rb', line 263

def get_canon_symbol(id)
  return nil unless @canon_abbr.key? id
  @canon_abbr[id]
end
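
#get_canon_abbr is derived from #get_canon_symbol by stripping the surrounding 【 】 brackets. A minimal sketch of that relationship, using a one-entry hash as a stand-in for the table that #initialize loads from data/canons.csv; `CANON_SYMBOLS`, `canon_symbol` and `canon_abbr` are hypothetical names, and the single row comes from the examples on this page.

```ruby
# Stand-in for the abbreviation table loaded from data/canons.csv.
CANON_SYMBOLS = { 'T' => '【大】' }.freeze

def canon_symbol(id)
  CANON_SYMBOLS[id]
end

# Strips the enclosing brackets from the symbol; nil for unknown IDs.
def canon_abbr(id)
  symbol = canon_symbol(id)
  return nil if symbol.nil?
  symbol.sub(/^【(.*?)】$/, '\1')
end

puts canon_symbol('T')   # 【大】
puts canon_abbr('T')     # 大
p canon_abbr('unknown')  # nil
```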

#get_category(book_id) ⇒ String

Takes a work number and returns its category.

Examples:

cbeta = CBETA.new
cbeta.get_category('T0220') # returns '般若部類'

Parameters:

  • book_id (String)

    Book ID (work number), e.g. “T0220”

Returns:

  • (String)

    Category name, e.g. “阿含部類”

# File 'lib/cbeta.rb', line 289

def get_category(book_id)
  @categories[book_id]
end