Class: Swot

Inherits:
Object
Extended by:
SwotCollectionMethods
Includes:
NaughtyOrNice
Defined in:
lib/swot.rb,
lib/swot/academic_tlds.rb

Constant Summary

VERSION =
"0.4.2"
BLACKLIST =

These domains snuck into the edu registry but don't pass the education sniff test. Note: a validated domain must not end with a blacklisted string.

%w(
  si.edu
  america.edu
  californiacolleges.edu
  australia.edu
  cet.edu
).freeze
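
To illustrate the suffix rule in the note above, here is a minimal standalone sketch. It reuses the regular expression that `valid?` applies to the blacklist, but `SAMPLE_BLACKLIST` and `blacklisted?` are hypothetical names introduced only for this example and are not part of Swot.

```ruby
# Illustrative only: a two-entry stand-in for the real BLACKLIST.
SAMPLE_BLACKLIST = %w(si.edu america.edu).freeze

# True when host equals a blacklisted domain or is a subdomain of one.
# The (\A|\.) anchor requires the match to start at a label boundary.
def blacklisted?(host)
  SAMPLE_BLACKLIST.any? { |d| host =~ /(\A|\.)#{Regexp.escape(d)}\z/ }
end

blacklisted?("si.edu")         # exact match
blacklisted?("museum.si.edu")  # subdomain of a blacklisted domain
blacklisted?("tsi.edu")        # no match: "tsi" is a different label
```

Because the pattern is anchored at a label boundary, a domain that merely shares trailing characters with a blacklisted entry is not rejected.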
ACADEMIC_TLDS =

Domains under these top-level domains are guaranteed to belong to academic institutions.

%w(
  ac.ae
  ac.at
  ac.bd
  ac.be
  ac.cn
  ac.cr
  ac.cy
  ac.fj
  ac.gg
  ac.gn
  ac.id
  ac.il
  ac.in
  ac.ir
  ac.jp
  ac.ke
  ac.kr
  ac.ma
  ac.me
  ac.mu
  ac.mw
  ac.mz
  ac.ni
  ac.nz
  ac.om
  ac.pa
  ac.pg
  ac.pr
  ac.rs
  ac.ru
  ac.rw
  ac.sz
  ac.th
  ac.tz
  ac.ug
  ac.uk
  ac.yu
  ac.za
  ac.zm
  ac.zw
  cc.al.us
  cc.ar.us
  cc.az.us
  cc.ca.us
  cc.co.us
  cc.fl.us
  cc.ga.us
  cc.hi.us
  cc.ia.us
  cc.id.us
  cc.il.us
  cc.in.us
  cc.ks.us
  cc.ky.us
  cc.la.us
  cc.md.us
  cc.me.us
  cc.mi.us
  cc.mn.us
  cc.mo.us
  cc.ms.us
  cc.mt.us
  cc.nc.us
  cc.nd.us
  cc.ne.us
  cc.nj.us
  cc.nm.us
  cc.nv.us
  cc.ny.us
  cc.oh.us
  cc.ok.us
  cc.or.us
  cc.pa.us
  cc.ri.us
  cc.sc.us
  cc.sd.us
  cc.tx.us
  cc.va.us
  cc.vi.us
  cc.wa.us
  cc.wi.us
  cc.wv.us
  cc.wy.us
  ed.ao
  ed.cr
  ed.jp
  edu
  edu.af
  edu.al
  edu.ar
  edu.au
  edu.az
  edu.ba
  edu.bb
  edu.bd
  edu.bh
  edu.bi
  edu.bn
  edu.bo
  edu.br
  edu.bs
  edu.bt
  edu.bz
  edu.ck
  edu.cn
  edu.co
  edu.cu
  edu.do
  edu.dz
  edu.ec
  edu.ee
  edu.eg
  edu.er
  edu.es
  edu.et
  edu.ge
  edu.gh
  edu.gr
  edu.gt
  edu.hk
  edu.hn
  edu.ht
  edu.in
  edu.iq
  edu.jm
  edu.jo
  edu.kg
  edu.kh
  edu.kn
  edu.kw
  edu.ky
  edu.kz
  edu.la
  edu.lb
  edu.lr
  edu.lv
  edu.ly
  edu.me
  edu.mg
  edu.mk
  edu.ml
  edu.mm
  edu.mn
  edu.mo
  edu.mt
  edu.mv
  edu.mw
  edu.mx
  edu.my
  edu.ni
  edu.np
  edu.om
  edu.pa
  edu.pe
  edu.ph
  edu.pk
  edu.pl
  edu.pr
  edu.ps
  edu.pt
  edu.pw
  edu.py
  edu.qa
  edu.rs
  edu.ru
  edu.sa
  edu.sc
  edu.sd
  edu.sg
  edu.sh
  edu.sl
  edu.sv
  edu.sy
  edu.tr
  edu.tt
  edu.tw
  edu.ua
  edu.uy
  edu.ve
  edu.vn
  edu.ws
  edu.ye
  edu.zm
  es.kr
  g12.br
  hs.kr
  ms.kr
  sc.kr
  sc.ug
  sch.ae
  sch.gg
  sch.id
  sch.ir
  sch.je
  sch.jo
  sch.lk
  sch.ly
  sch.my
  sch.om
  sch.ps
  sch.sa
  sch.uk
  school.nz
  school.za
  tec.ar.us
  tec.az.us
  tec.co.us
  tec.fl.us
  tec.ga.us
  tec.ia.us
  tec.id.us
  tec.il.us
  tec.in.us
  tec.ks.us
  tec.ky.us
  tec.la.us
  tec.ma.us
  tec.md.us
  tec.me.us
  tec.mi.us
  tec.mn.us
  tec.mo.us
  tec.ms.us
  tec.mt.us
  tec.nc.us
  tec.nd.us
  tec.nh.us
  tec.nm.us
  tec.nv.us
  tec.ny.us
  tec.oh.us
  tec.ok.us
  tec.pa.us
  tec.sc.us
  tec.sd.us
  tec.tx.us
  tec.ut.us
  tec.vi.us
  tec.wa.us
  tec.wi.us
  tec.wv.us
  vic.edu.au
).to_set.freeze

Class Method Summary

Instance Method Summary

Methods included from SwotCollectionMethods

all_domains, each_domain

Class Method Details

.academic? ⇒ Object



# File 'lib/swot.rb', line 25

alias_method :academic?, :valid?

.domains_path ⇒ Object



# File 'lib/swot.rb', line 32

def domains_path
  @domains_path ||= File.expand_path "domains", File.dirname(__FILE__)
end

.from_path(path_string_or_path) ⇒ Object

Returns a new Swot instance for the domain file at the given path.

Note that the path must be absolute.

Returns a Swot instance, or false if no domain file exists at the given path.



# File 'lib/swot.rb', line 40

def from_path(path_string_or_path)
  path = Pathname.new(path_string_or_path)
  return false unless path.exist?
  path_dir, file = path.relative_path_from(Pathname.new(domains_path)).split
  backwards_path = path_dir.to_s.split('/').push(file.basename('.txt').to_s)
  domain = backwards_path.reverse.join('.')
  Swot.new(domain)
end
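
The path-to-domain conversion in `from_path` can be tried in isolation. The sketch below repeats the same `Pathname` steps in a hypothetical helper, `domain_from_path`, which takes the root directory as an argument instead of calling `domains_path`. The example paths are invented, and the existence check is omitted so the snippet runs without any files on disk.

```ruby
require "pathname"

# Hypothetical helper (not part of Swot): turns a domain-file path into a
# domain name by reversing the directory components below the root.
def domain_from_path(path, root)
  relative = Pathname.new(path).relative_path_from(Pathname.new(root))
  dir, file = relative.split                       # e.g. "uk/ac" and "ox.txt"
  labels = dir.to_s.split("/").push(file.basename(".txt").to_s)
  labels.reverse.join(".")                         # ["uk", "ac", "ox"] becomes "ox.ac.uk"
end

domain_from_path("/data/domains/edu/stanford.txt", "/data/domains")
domain_from_path("/data/domains/uk/ac/ox.txt", "/data/domains")
```

The directory layout is the domain written backwards, so each path component below the root becomes one label of the resulting domain.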

.get_institution_name(text) ⇒ Object Also known as: school_name



# File 'lib/swot.rb', line 27

def get_institution_name(text)
  Swot.new(text).institution_name
end

.is_academic? ⇒ Object



# File 'lib/swot.rb', line 24

alias_method :is_academic?, :valid?

Instance Method Details

#academic_domain? ⇒ Boolean

Figure out if a domain name belongs to a known academic institution.

Returns true if the domain name belongs to a known academic institution;

false otherwise.

Returns:

  • (Boolean)


# File 'lib/swot.rb', line 83

def academic_domain?
  @academic_domain ||= File.exist?(file_path)
end
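
A side note on the memoization used here: in Ruby, `||=` re-evaluates its right-hand side whenever the stored value is nil or false, so a negative result is not cached and the file check runs again on the next call. The sketch below uses a hypothetical `Probe` class (not part of Swot) to contrast `||=` with a `defined?` guard. It is only an illustration of the language behavior, not a suggested change to the library.

```ruby
# Hypothetical class used only to count how often the check runs.
class Probe
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Stand-in for a lookup that comes back negative.
  def expensive_check
    @calls += 1
    false
  end

  # ||= stores false, then evaluates the right-hand side again next time.
  def with_or_equals
    @result_a ||= expensive_check
  end

  # A defined? guard caches false as well as true.
  def with_defined_guard
    return @result_b if defined?(@result_b)
    @result_b = expensive_check
  end
end

a = Probe.new
2.times { a.with_or_equals }      # check runs on both calls

b = Probe.new
2.times { b.with_defined_guard }  # check runs only on the first call
```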

#institution_name ⇒ Object Also known as: school_name, name

Figure out the institution name based on the email address/domain.

Returns a string with the institution name; nil if nothing is found.



# File 'lib/swot.rb', line 71

def institution_name
  @institution_name ||= File.read(file_path, :mode => "rb", :external_encoding => "UTF-8").strip
rescue
  nil
end
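
The read-or-nil behavior can be reproduced outside the class. In the sketch below, `read_name` is a hypothetical helper, the temporary file stands in for a domain file, and the institution name is made up. Unlike the bare `rescue` above, it rescues only `SystemCallError` (for example `Errno::ENOENT`) to keep the illustration narrow.

```ruby
require "tempfile"

# Hypothetical helper (not part of Swot): returns the stripped file contents,
# or nil when the file cannot be read.
def read_name(path)
  File.read(path, mode: "rb", external_encoding: "UTF-8").strip
rescue SystemCallError
  nil
end

# A temporary file standing in for a domain file.
sample = Tempfile.new(["example", ".txt"])
sample.write("Example University\n")
sample.close

found   = read_name(sample.path)                    # trailing newline stripped
missing = read_name(sample.path + ".does-not-exist") # nil, file is absent
```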

#valid? ⇒ Boolean

Figure out if an email or domain belongs to an academic institution.

Returns true if the domain name belongs to an academic institution;

false otherwise.

Returns:

  • (Boolean)


# File 'lib/swot.rb', line 54

def valid?
  if domain.nil?
    false
  elsif BLACKLIST.any? { |d| to_s =~ /(\A|\.)#{Regexp.escape(d)}\z/ }
    false
  elsif ACADEMIC_TLDS.include?(domain.tld)
    true
  elsif academic_domain?
    true
  else
    false
  end
end
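
The order of checks in `valid?` (nil guard, blacklist, academic TLD, known-domain lookup) can be sketched as a standalone function. Everything below is an assumption made for illustration: the `SAMPLE_*` constants are tiny stand-ins, `SAMPLE_KNOWN` replaces the file lookup done by `academic_domain?`, and `trailing_suffixes` is a simplified substitute for `domain.tld`, which in Swot comes from the parsed domain object.

```ruby
require "set"

# Illustrative stand-ins; none of these names exist in Swot.
SAMPLE_BLACKLIST = %w(si.edu).freeze
SAMPLE_TLDS      = %w(edu ac.uk).to_set.freeze
SAMPLE_KNOWN     = %w(example-college.org).to_set.freeze

# Every proper trailing suffix of host: "ox.ac.uk" gives "ac.uk" and "uk".
def trailing_suffixes(host)
  labels = host.split(".")
  (1...labels.size).map { |i| labels[i..-1].join(".") }
end

# Mirrors the order of checks in valid?; the first rule that applies wins.
def sketch_valid?(host)
  return false if host.nil? || host.empty?
  return false if SAMPLE_BLACKLIST.any? { |d| host =~ /(\A|\.)#{Regexp.escape(d)}\z/ }
  return true  if trailing_suffixes(host).any? { |s| SAMPLE_TLDS.include?(s) }
  SAMPLE_KNOWN.include?(host)
end

sketch_valid?("stanford.edu")         # accepted by the TLD rule
sketch_valid?("museum.si.edu")        # rejected by the blacklist first
sketch_valid?("example-college.org")  # accepted by the known-domain lookup
sketch_valid?("example.com")          # no rule applies
```

The blacklist is consulted before the TLD set, which is why a blacklisted `.edu` domain is rejected even though `edu` is an academic TLD.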