Class: BrowserWebData::EntitySumarization::NIFLineParser

Inherits:
Object
Includes:
BrowserWebData::EntitySumarizationConfig
Defined in:
lib/browser_web_data_entity_sumarization/entity_sumarization_nif_parser.rb

Overview

This class includes helpers that extract structured NIF data from NIF dataset lines.

Constant Summary

Constants included from BrowserWebData::EntitySumarizationConfig

BrowserWebData::EntitySumarizationConfig::COMMON_PROPERTIES, BrowserWebData::EntitySumarizationConfig::IDENTICAL_PROPERTY_LIMIT, BrowserWebData::EntitySumarizationConfig::IMPORTANCE_TO_IDENTIFY_MAX_COUNT, BrowserWebData::EntitySumarizationConfig::NO_SENSE_PROPERTIES, BrowserWebData::EntitySumarizationConfig::SCAN_REGEXP

Class Method Summary

Class Method Details

.parse_line_group(lines_group) ⇒ Hash

Applies regular-expression scans to a group of 7 NIF dataset lines and extracts the link, anchor, indexes (begin and end index), and section.

Examples:

nif_data:

{
  link: "http://dbpedia.org/resource/Science_fiction_film",
  anchor: "science fiction film",
  indexes: ["33", "53"],
  section: "paragraph_0_419"
}

Parameters:

  • lines_group (Array<String>)

Returns:

  • (Hash)

    nif_data



# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_nif_parser.rb', line 40

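def self.parse_line_group(lines_group)
  # Each field is read from a fixed position in the 7-line group;
  # lines_group[0] and lines_group[1] are not used.
  # scan(...)[0] is nil when a pattern does not match, so a malformed
  # group raises NoMethodError below.
  begin_index = lines_group[2].scan(SCAN_REGEXP[:begin_index])[0]
  end_index = lines_group[3].scan(SCAN_REGEXP[:end_index])[0]
  section = lines_group[4].scan(SCAN_REGEXP[:section])[0]
  target_resource_link = lines_group[5].scan(SCAN_REGEXP[:target_resource_link])[0]
  anchor = lines_group[6].scan(SCAN_REGEXP[:anchor])[0]

  {
    # force_encoding re-tags the captured bytes as UTF-8 (no transcoding).
    link: target_resource_link[1].force_encoding('utf-8'),
    anchor: anchor[1].force_encoding('utf-8'),
    indexes: [begin_index[1], end_index[1]],
    # Keep the segment that follows the first '=' in the section match.
    section: section[0].split('=')[1]
  }
end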
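parse_line_group reads every field from a fixed position in the seven-line group: lines 2 to 6 supply the begin index, end index, section, link, and anchor, while lines 0 and 1 are skipped. The following is a minimal, self-contained sketch of that fixed-position extraction under stated assumptions: SKETCH_PATTERNS, SAMPLE_GROUP, and parse_line_group_sketch are hypothetical names, the real patterns are SCAN_REGEXP from BrowserWebData::EntitySumarizationConfig (not shown here), and the sample lines only imitate the shape of DBpedia NIF text-link triples. It uses String#[](regexp, 1) to pull capture group 1 instead of scan, and it omits the force_encoding step.

```ruby
# Hypothetical stand-ins for SCAN_REGEXP; the real patterns may differ.
NIF_CORE = 'http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core'

SKETCH_PATTERNS = {
  begin_index: /nif-core#beginIndex> "(\d+)"/,
  end_index: /nif-core#endIndex> "(\d+)"/,
  section: /nif-core#superString> <[^>]*nif=(\w+)>/,
  target_resource_link: /rdf#taIdentRef> <([^>]+)>/,
  anchor: /nif-core#anchorOf> "([^"]*)"/
}.freeze

# Reads each field from the same line positions parse_line_group uses:
# [2] begin index, [3] end index, [4] section, [5] link, [6] anchor.
def parse_line_group_sketch(lines_group)
  raise ArgumentError, 'expected a group of 7 lines' unless lines_group.size == 7

  # String#[](regexp, 1) returns capture group 1, or nil when there is no match.
  capture = ->(index, key) { lines_group[index][SKETCH_PATTERNS[key], 1] }

  {
    link: capture.call(5, :target_resource_link),
    anchor: capture.call(6, :anchor),
    indexes: [capture.call(2, :begin_index), capture.call(3, :end_index)],
    section: capture.call(4, :section)
  }
end

# Illustrative sample group (hypothetical data in an N-Triples-like shape).
PHRASE = '<http://dbpedia.org/resource/Captain_EO?dbpv=2016-10&nif=phrase_33_53>'
XSD_INT = '<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>'

SAMPLE_GROUP = [
  "#{PHRASE} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <#{NIF_CORE}#Phrase> .",
  "#{PHRASE} <#{NIF_CORE}#referenceContext> <http://dbpedia.org/resource/Captain_EO?dbpv=2016-10&nif=context> .",
  "#{PHRASE} <#{NIF_CORE}#beginIndex> \"33\"^^#{XSD_INT} .",
  "#{PHRASE} <#{NIF_CORE}#endIndex> \"53\"^^#{XSD_INT} .",
  "#{PHRASE} <#{NIF_CORE}#superString> <http://dbpedia.org/resource/Captain_EO?dbpv=2016-10&nif=paragraph_0_419> .",
  "#{PHRASE} <http://www.w3.org/2005/11/its/rdf#taIdentRef> <http://dbpedia.org/resource/Science_fiction_film> .",
  "#{PHRASE} <#{NIF_CORE}#anchorOf> \"science fiction film\" ."
].freeze

p parse_line_group_sketch(SAMPLE_GROUP)
```

With these assumed patterns and sample lines, the call returns the same four values as the nif_data example above. Reading by fixed position keeps the parser simple, but it relies on every group in the dataset listing its triples in the same order.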

.parse_resource_uri(line) ⇒ String

Applies a regular-expression scan to a single NIF dataset line and returns the resource URI with any query string (everything from the first '?') removed.

Examples:

resource_uri: "dbpedia.org/resource/Captain_EO"

Parameters:

  • line (String)

Returns:

  • (String)

    resource_uri



# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_nif_parser.rb', line 23

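def self.parse_resource_uri(line)
  # First captured group of the first match, cut at the first '?'.
  # Raises NoMethodError if the pattern does not match the line.
  line.scan(SCAN_REGEXP[:scan_resource])[0][0].split('?').first
end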
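parse_resource_uri keeps only the first captured group of the first match and cuts it at the first '?'. The sketch below illustrates that flow under stated assumptions: RESOURCE_PATTERN and parse_resource_uri_sketch are hypothetical (the real pattern is SCAN_REGEXP[:scan_resource], whose definition is not shown here), and the pattern skips an optional http(s):// scheme only so that its output has the same shape as the example value above. Unlike the documented method, it returns nil for a non-matching line instead of raising NoMethodError.

```ruby
# Hypothetical stand-in for SCAN_REGEXP[:scan_resource]: captures the subject
# IRI at the start of an N-Triples-like line, skipping an optional scheme.
RESOURCE_PATTERN = %r{\A<(?:https?://)?([^>]+)>}

# Returns the captured subject without its query string, or nil when the
# line does not match the pattern.
def parse_resource_uri_sketch(line)
  match = line.scan(RESOURCE_PATTERN).first # first match => [captured_group]
  return nil if match.nil?

  match.first.split('?').first
end

SAMPLE_LINE = '<http://dbpedia.org/resource/Captain_EO?dbpv=2016-10&nif=context> ' \
              '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ' \
              '<http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#Context> .'

puts parse_resource_uri_sketch(SAMPLE_LINE) # prints dbpedia.org/resource/Captain_EO
```

Returning nil instead of raising lets a caller skip malformed lines while streaming a large dataset; whether that suits the library's callers is a design choice the sketch does not settle.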