Class: BrowserWebData::EntitySumarization::NIFLineParser
- Inherits: Object
  - Object
  - BrowserWebData::EntitySumarization::NIFLineParser
- Includes: BrowserWebData::EntitySumarizationConfig
- Defined in: lib/browser_web_data_entity_sumarization/entity_sumarization_nif_parser.rb
Overview
This class provides helpers for extracting structured NIF data from lines of a NIF dataset.
Constant Summary
Constants included from BrowserWebData::EntitySumarizationConfig
BrowserWebData::EntitySumarizationConfig::COMMON_PROPERTIES, BrowserWebData::EntitySumarizationConfig::IDENTICAL_PROPERTY_LIMIT, BrowserWebData::EntitySumarizationConfig::IMPORTANCE_TO_IDENTIFY_MAX_COUNT, BrowserWebData::EntitySumarizationConfig::NO_SENSE_PROPERTIES, BrowserWebData::EntitySumarizationConfig::SCAN_REGEXP
Class Method Summary
- .parse_line_group(lines_group) ⇒ Hash
  Scans a group of 7 NIF dataset lines to extract the link, anchor, indexes and section.
- .parse_resource_uri(line) ⇒ String
  Scans a single NIF dataset line to extract the resource URI.
Class Method Details
.parse_line_group(lines_group) ⇒ Hash
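Scans a group of 7 NIF dataset lines to extract the link, anchor, indexes and section. The begin and end indexes are read from the third and fourth lines of the group, the section from the fifth, the link from the sixth and the anchor from the seventh. Example return value:
{
  link: "http://dbpedia.org/resource/Science_fiction_film",
  anchor: "science fiction film",
  indexes: ["33", "53"],
  section: "paragraph_0_419"
}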
# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_nif_parser.rb', line 40
def self.parse_line_group(lines_group)
  begin_index = lines_group[2].scan(SCAN_REGEXP[:begin_index])[0]
  end_index = lines_group[3].scan(SCAN_REGEXP[:end_index])[0]
  target_resource_link = lines_group[5].scan(SCAN_REGEXP[:target_resource_link])[0]
  section = lines_group[4].scan(SCAN_REGEXP[:section])[0]
  anchor = lines_group[6].scan(SCAN_REGEXP[:anchor])[0]

  {
    link: target_resource_link[1].force_encoding('utf-8'),
    anchor: anchor[1].force_encoding('utf-8'),
    indexes: [begin_index[1], end_index[1]],
    section: section[0].split('=')[1]
  }
end
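To make the capture-group indexing above easier to follow, here is a minimal, self-contained sketch of the same flow. It does not use the library: the real patterns in SCAN_REGEXP are not shown on this page, so SKETCH_REGEXP, SAMPLE_LINES_GROUP and sketch_parse_line_group are hypothetical stand-ins, and the sample lines are invented to reproduce the example values documented above. Real NIF dataset lines may be shaped differently.

```ruby
# Hypothetical patterns (assumption): chosen only so that the capture
# indexes match the ones used by parse_line_group. They are NOT the
# patterns defined in BrowserWebData::EntitySumarizationConfig::SCAN_REGEXP.
SKETCH_REGEXP = {
  begin_index: /(beginIndex>\s+)"(\d+)"/,
  end_index: /(endIndex>\s+)"(\d+)"/,
  target_resource_link: /(taIdentRef>\s+)<([^>]+)>/,
  section: /superString>\s+<[^>]*[?&](nif=[^&>]+)/,
  anchor: /(anchorOf>\s+)"([^"]*)"/
}.freeze

# Invented sample data (assumption): one group of 7 triple-like lines.
SAMPLE_SUBJECT = '<http://dbpedia.org/resource/Example?nif=phrase&char=33,53>'
SAMPLE_NIF_NS = 'http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#'
SAMPLE_LINES_GROUP = [
  "#{SAMPLE_SUBJECT} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <#{SAMPLE_NIF_NS}Phrase> .",
  "#{SAMPLE_SUBJECT} <#{SAMPLE_NIF_NS}referenceContext> <http://dbpedia.org/resource/Example?nif=context> .",
  "#{SAMPLE_SUBJECT} <#{SAMPLE_NIF_NS}beginIndex> \"33\" .",
  "#{SAMPLE_SUBJECT} <#{SAMPLE_NIF_NS}endIndex> \"53\" .",
  "#{SAMPLE_SUBJECT} <#{SAMPLE_NIF_NS}superString> <http://dbpedia.org/resource/Example?nif=paragraph_0_419> .",
  "#{SAMPLE_SUBJECT} <http://www.w3.org/2005/11/its/rdf#taIdentRef> <http://dbpedia.org/resource/Science_fiction_film> .",
  "#{SAMPLE_SUBJECT} <#{SAMPLE_NIF_NS}anchorOf> \"science fiction film\" ."
].freeze

# Mirrors the documented method: String#scan with a grouped pattern
# returns one array of captures per match, so [0] picks the first match
# and [1] (or [0] for the section) picks a capture inside it.
def sketch_parse_line_group(lines_group)
  begin_index = lines_group[2].scan(SKETCH_REGEXP[:begin_index])[0]
  end_index = lines_group[3].scan(SKETCH_REGEXP[:end_index])[0]
  target_resource_link = lines_group[5].scan(SKETCH_REGEXP[:target_resource_link])[0]
  section = lines_group[4].scan(SKETCH_REGEXP[:section])[0]
  anchor = lines_group[6].scan(SKETCH_REGEXP[:anchor])[0]

  {
    link: target_resource_link[1].force_encoding('utf-8'),
    anchor: anchor[1].force_encoding('utf-8'),
    indexes: [begin_index[1], end_index[1]],
    section: section[0].split('=')[1] # "nif=paragraph_0_419" -> "paragraph_0_419"
  }
end

p sketch_parse_line_group(SAMPLE_LINES_GROUP)
```

Note that `scan(...)[0]` is `nil` when a line does not match its pattern, so in this sketch, as in the documented method, a malformed group raises NoMethodError on the following index access. Callers that cannot guarantee well-formed groups may want to rescue that error or validate the group first.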
.parse_resource_uri(line) ⇒ String
Scans a single NIF dataset line to extract the resource URI, dropping any query string (everything from the first "?" onward).
# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_nif_parser.rb', line 23
def self.parse_resource_uri(line)
  (line.scan(SCAN_REGEXP[:scan_resource])[0])[0].split('?').first
end
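The scan-then-split step above can be illustrated in isolation. The following is only a sketch: SKETCH_SCAN_RESOURCE is a hypothetical pattern standing in for SCAN_REGEXP[:scan_resource], whose real definition is not shown on this page, and SAMPLE_RESOURCE_LINE is an invented input line.

```ruby
# Hypothetical pattern (assumption): captures the first DBpedia resource
# URI found between angle brackets. Not the library's actual pattern.
SKETCH_SCAN_RESOURCE = %r{<(http://dbpedia\.org/resource/[^>\s]+)>}

# Invented sample line (assumption) whose subject URI has a query string.
SAMPLE_RESOURCE_LINE =
  '<http://dbpedia.org/resource/Example?nif=phrase&char=33,53> ' \
  '<http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#beginIndex> "33" .'

# Same shape as the documented method: take the first match, take its
# first capture, then keep only the part before the first "?".
def sketch_parse_resource_uri(line)
  line.scan(SKETCH_SCAN_RESOURCE)[0][0].split('?').first
end

puts sketch_parse_resource_uri(SAMPLE_RESOURCE_LINE)
```

With this sample input the query string `?nif=phrase&char=33,53` is discarded and only the bare resource URI remains. A line without a matching URI makes `scan(...)[0]` return `nil`, so the next `[0]` raises NoMethodError, both here and in the documented method.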