Class: Wikipedia::VandalismDetection::Instances

Inherits:
Object
Defined in:
lib/wikipedia/vandalism_detection/instances.rb

Constant Summary

REGULAR_CLASS_INDEX =
0
VANDALISM_CLASS_INDEX =
1
NOT_KNOWN_INDEX =
2
CLASS =
'class'.freeze
VANDALISM =
'vandalism'.freeze
REGULAR =
'regular'.freeze
NOT_KNOWN =
'?'.freeze
OUTLIER =
Weka::Classifiers::Meta::OneClassClassifier::OUTLIER_LABEL
VANDALISM_SHORT =
'V'.freeze
REGULAR_SHORT =
'R'.freeze
OLD_REVISION_ID =
'oldrevisionid'.freeze
NEW_REVISION_ID =
'newrevisionid'.freeze
CLASSES =
{
  REGULAR_CLASS_INDEX => REGULAR,
  VANDALISM_CLASS_INDEX => VANDALISM,
  NOT_KNOWN_INDEX => NOT_KNOWN
}.freeze
CLASSES_SHORT =
{
  REGULAR_CLASS_INDEX => REGULAR_SHORT,
  VANDALISM_CLASS_INDEX => VANDALISM_SHORT,
  NOT_KNOWN_INDEX => NOT_KNOWN
}.freeze
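
CLASSES and CLASSES_SHORT share the same integer keys, so a class index can be resolved to either its long or its short label. The following is a minimal sketch of that lookup, assuming the constant values listed above. It re-declares them in a hypothetical module named LabelSketch so that it runs without JRuby or Weka; LabelSketch and the helper label_for are illustrative only and are not part of the library.

```ruby
# Hypothetical stand-in module: re-declares the documented constants so the
# sketch runs without JRuby or Weka. It is not part of the library.
module LabelSketch
  REGULAR_CLASS_INDEX = 0
  VANDALISM_CLASS_INDEX = 1
  NOT_KNOWN_INDEX = 2

  CLASSES = {
    REGULAR_CLASS_INDEX => 'regular',
    VANDALISM_CLASS_INDEX => 'vandalism',
    NOT_KNOWN_INDEX => '?'
  }.freeze

  CLASSES_SHORT = {
    REGULAR_CLASS_INDEX => 'R',
    VANDALISM_CLASS_INDEX => 'V',
    NOT_KNOWN_INDEX => '?'
  }.freeze

  # Hypothetical helper: returns the long or short label for a class index.
  # Hash#fetch raises KeyError for an index that is not in the table.
  def self.label_for(index, short: false)
    (short ? CLASSES_SHORT : CLASSES).fetch(index)
  end
end

puts LabelSketch.label_for(LabelSketch::VANDALISM_CLASS_INDEX)            # prints "vandalism"
puts LabelSketch.label_for(LabelSketch::REGULAR_CLASS_INDEX, short: true) # prints "R"
```

Using fetch instead of [] in the sketch makes an unexpected index fail loudly rather than return nil.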

Class Method Details

.empty ⇒ Object

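Returns an empty instances dataset of type Java::WekaCore::Instances. The dataset has one numeric attribute per configured feature (spaces in feature names are replaced by underscores) and a nominal class attribute. It is used for feature computation and classification during training.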

Examples:

dataset = Wikipedia::VandalismDetection::Instances.empty
=> #<Java::WekaCore::Instances:0xf0f9a00
   @positions=[
     #<Java::WekaCore::Attribute:0x17207a76>,
     #<Java::WekaCore::Attribute:0x5547e4d6>,
     #<Java::WekaCore::Attribute:0x6300c957>,
     ...,
     #<Java::WekaCore::Attribute:0x5a74fae4>]>


# File 'lib/wikipedia/vandalism_detection/instances.rb', line 50

def empty
  features = Wikipedia::VandalismDetection.config.features
  classes = dataset_classes

  dataset = Weka::Core::Instances.new.with_attributes do
    features.each do |name|
      numeric name.tr(' ', '_')
    end

    nominal :class, values: classes, class_attribute: true
  end

  dataset
end
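
The attribute names of the resulting dataset follow directly from the configured feature names. The sketch below reproduces only that naming step in plain Ruby, without Weka. The feature list is hypothetical; in the library it comes from Wikipedia::VandalismDetection.config.features.

```ruby
# Hypothetical feature list, used here for illustration only. The real list
# is read from Wikipedia::VandalismDetection.config.features.
features = ['anonymity', 'character sequence', 'upper case ratio']

# Same normalization as in .empty: every space becomes an underscore.
attribute_names = features.map { |name| name.tr(' ', '_') }

# The nominal class attribute is declared after all feature attributes.
attribute_names << 'class'

puts attribute_names.inspect
# prints ["anonymity", "character_sequence", "upper_case_ratio", "class"]
```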

.empty_for_feature(name) ⇒ Object

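Returns an empty instances dataset of type Java::WekaCore::Instances for a single feature. The dataset has one numeric attribute for the given feature name (spaces replaced by underscores) and a nominal class attribute. It is used for computing and classifying that feature during training.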

Examples:

dataset = Wikipedia::VandalismDetection::Instances.empty_for_feature(name)
=> #<Java::WekaCore::Instances:0xf0f9a00
   @positions=[
     #<Java::WekaCore::Attribute:0x17207a76>


# File 'lib/wikipedia/vandalism_detection/instances.rb', line 74

def empty_for_feature(name)
  classes = dataset_classes

  Weka::Core::Instances.new.with_attributes do
    numeric name.tr(' ', '_')
    nominal :class, values: classes, class_attribute: true
  end
end

.empty_for_test_class ⇒ Object

Returns an empty instances dataset of type Java::WekaCore::Instances that contains only the nominal class attribute. This dataset is used for creating the ground truth classification.



# File 'lib/wikipedia/vandalism_detection/instances.rb', line 102

def empty_for_test_class
  classes = dataset_classes

  Weka::Core::Instances.new.with_attributes do
    nominal :class, values: classes
  end
end

.empty_for_test_feature(name) ⇒ Object

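Returns an empty instances dataset of type Java::WekaCore::Instances for a single feature. The dataset has one numeric attribute for the given feature name (spaces replaced by underscores), followed by numeric attributes for the old and the new revision ID. It has no class attribute and is used for feature computation and classification during testing.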

Examples:

dataset = Wikipedia::VandalismDetection::Instances.empty_for_test_feature(name)
=> #<Java::WekaCore::Instances:0xf0f9a00
   @positions=[
     #<Java::WekaCore::Attribute:0x17207a76>]>


# File 'lib/wikipedia/vandalism_detection/instances.rb', line 92

def empty_for_test_feature(name)
  Weka::Core::Instances.new.with_attributes do
    numeric name.tr(' ', '_')
    numeric OLD_REVISION_ID
    numeric NEW_REVISION_ID
  end
end
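
The single-feature training and test datasets differ only in what follows the feature attribute: the training dataset ends with the class attribute, while the test dataset carries the two revision IDs instead. The sketch below contrasts the two attribute orders in plain Ruby. The helpers training_layout and test_layout are hypothetical and only mirror the declarations in .empty_for_feature and .empty_for_test_feature; they do not build Weka datasets.

```ruby
# Constants re-declared with the values documented above, so the sketch
# runs without JRuby or Weka.
OLD_REVISION_ID = 'oldrevisionid'.freeze
NEW_REVISION_ID = 'newrevisionid'.freeze

# Hypothetical helper mirroring the attribute order of .empty_for_feature.
def training_layout(name)
  [name.tr(' ', '_'), 'class']
end

# Hypothetical helper mirroring the attribute order of .empty_for_test_feature.
def test_layout(name)
  [name.tr(' ', '_'), OLD_REVISION_ID, NEW_REVISION_ID]
end

puts training_layout('upper case ratio').inspect
# prints ["upper_case_ratio", "class"]
puts test_layout('upper case ratio').inspect
# prints ["upper_case_ratio", "oldrevisionid", "newrevisionid"]
```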