Class: RubberbandFlamethrower::DataGenerator

Inherits:
Object
Defined in:
lib/rubberband_flamethrower/data_generator.rb

Constant Summary

WORD_FILES =

An array of the bundled word files that together form the pool of random words used for data generation. Comment or uncomment individual entries to shrink or grow the pool. See the README file in the words folder for more information about the lists.

[
# "/words/american-words.95",
# "/words/american-words.80",
# "/words/american-words.70",
# "/words/american-words.60",
# "/words/american-words.55",
# "/words/american-words.50",
# "/words/american-words.40",
"/words/american-words.35",
"/words/american-words.20",
"/words/american-words.10"
]

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize ⇒ DataGenerator

Initializes word_list with an array of all the words read from the files listed in WORD_FILES.



# File 'lib/rubberband_flamethrower/data_generator.rb', line 31

def initialize
  self.word_list = []
  WORD_FILES.each do |word_file|
    contents = File.read(File.dirname(__FILE__)+word_file)
    self.word_list = word_list + contents.split(/\n/)
  end
end
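
The loading step above can be sketched in isolation. This is a minimal sketch, not the library's code: `load_words`, the temporary directory, and the two tiny word files are hypothetical stand-ins for the bundled lists, assuming one word per line.

```ruby
require "tmpdir"

# Hypothetical helper mirroring the constructor's loop: read every listed
# file and collect its lines into one flat array of words.
def load_words(base_dir, word_files)
  word_files.flat_map do |word_file|
    File.read(File.join(base_dir, word_file)).split(/\n/)
  end
end

# Stand-in word files (assumed format: one word per line).
words = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "small.txt"), "apple\nbanana\n")
  File.write(File.join(dir, "large.txt"), "cherry\ndamson\nelder\n")
  load_words(dir, ["small.txt", "large.txt"])
end

puts words.size
```

Dropping an entry from the file list shrinks the pool, which is the effect of commenting out a line in WORD_FILES.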

Instance Attribute Details

#word_list ⇒ Object

Returns the value of attribute word_list.



# File 'lib/rubberband_flamethrower/data_generator.rb', line 11

def word_list
  @word_list
end

Instance Method Details

#current_timestamp ⇒ String

Creates an Elasticsearch-friendly timestamp for the current local time, formatted as YYYYMMDDTHH:MM:SS with no time zone designator.

Returns:

  • (String)


# File 'lib/rubberband_flamethrower/data_generator.rb', line 55

def current_timestamp
  Time.now.strftime "%Y%m%dT%H:%M:%S"
end
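
To make the format string concrete, the sketch below applies it to a fixed time instead of `Time.now`. The date chosen is arbitrary and only for illustration.

```ruby
# A fixed, arbitrary time so the result is reproducible.
fixed_time = Time.utc(2013, 1, 2, 3, 4, 5)

# Same format string as current_timestamp: compact date, "T" separator,
# colon-separated time, no zone offset.
stamp = fixed_time.strftime("%Y%m%dT%H:%M:%S")

puts stamp # 20130102T03:04:05
```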

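#generate_dataset(batch_size) ⇒ Object

Writes batch_size lines of random insert data to a file named dataset in the current working directory, one JSON document per line. Any existing file with that name is overwritten.
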
# File 'lib/rubberband_flamethrower/data_generator.rb', line 66

def generate_dataset(batch_size)
  File.open("dataset", 'w') do |file|
    batch_size.to_i.times do |i|
      file.write(generate_random_insert_data+"\n") 
    end
  end
end
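
The resulting file holds one JSON document per line, so it can be read back line by line. The sketch below imitates that layout under stated assumptions: `make_record` is a hypothetical stand-in for generate_random_insert_data, and the file is written to a temporary directory rather than the working directory.

```ruby
require "json"
require "tmpdir"

# Hypothetical stand-in for generate_random_insert_data.
make_record = lambda do |i|
  { message: "message #{i}.", username: "user#{i}", post_date: "20130102T03:04:05" }.to_json
end

records = Dir.mktmpdir do |dir|
  path = File.join(dir, "dataset")

  # Same shape as generate_dataset: one JSON document per line.
  File.open(path, "w") do |file|
    3.times { |i| file.write(make_record.call(i) + "\n") }
  end

  # Read the file back, parsing each line independently.
  File.readlines(path).map { |line| JSON.parse(line) }
end

puts records.length
puts records.first["username"]
```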

#generate_random_insert_data ⇒ JSON

Generates a JSON string that contains a message, username, and post_date, intended to be passed as insert data to an Elasticsearch server.

Returns:

  • (JSON)


# File 'lib/rubberband_flamethrower/data_generator.rb', line 62

def generate_random_insert_data
  {message: "#{random_tweet}", username: "#{random_username}", post_date: "#{current_timestamp}"}.to_json
end
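
For reference, the shape of the generated document can be shown with fixed values in place of the random ones. The field values below are made up for illustration; only the three keys come from the method above.

```ruby
require "json"

# Fixed, made-up values standing in for random_tweet, random_username,
# and current_timestamp.
payload = {
  message: "quick brown fox.",
  username: "fox42",
  post_date: "20130102T03:04:05"
}.to_json

puts payload
# {"message":"quick brown fox.","username":"fox42","post_date":"20130102T03:04:05"}

# to_json returns a String; parsing it yields a Hash with string keys.
parsed = JSON.parse(payload)
puts parsed["username"]
```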

#random_tweet ⇒ String

Creates a message of 6 to 15 random words (6 + rand(10)), truncated to at most 139 characters before a period is appended, so the result never exceeds 140 characters.

Returns:

  • (String)


# File 'lib/rubberband_flamethrower/data_generator.rb', line 41

def random_tweet
  number_of_words = 6 + rand(10)
  ((number_of_words.times.map{word_list.sample}.join(" "))[0,139])+"."
end
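
The word count and length cap can be checked with a small experiment. This is a sketch under assumptions: `sample_tweet` is a hypothetical free-standing copy of the logic above, and the short word list is invented for the example.

```ruby
# Hypothetical copy of the random_tweet logic that takes the word list
# as an argument instead of reading an instance attribute.
def sample_tweet(word_list)
  number_of_words = 6 + rand(10) # rand(10) yields 0..9, so 6..15 words
  number_of_words.times.map { word_list.sample }.join(" ")[0, 139] + "."
end

# Invented word list; the words are short enough that no tweet is truncated.
word_list = %w[rubber band flame thrower elastic search cluster]
tweets = Array.new(1000) { sample_tweet(word_list) }

# With very long words the 139-character cut always applies.
truncated = sample_tweet(["a" * 30])

puts tweets.map(&:length).max
puts truncated.length # 140
```

Because the cut is made by character position, a truncated message can end in the middle of a word or with a space before the period.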

#random_username ⇒ String

Creates a random value to be used as a username. The return value is one random word with every character other than letters and digits removed.

Returns:

  • (String)


# File 'lib/rubberband_flamethrower/data_generator.rb', line 49

def random_username
  word_list.sample.gsub(/[^0-9a-z]/i, '')
end
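
The effect of the substitution is easiest to see on a few fixed inputs. The sample words below are invented for the example and are not taken from the bundled lists.

```ruby
# Invented sample words containing punctuation and digits.
samples = ["don't", "co-op", "O'Brien", "route66"]

# Same pattern as random_username: drop anything that is not 0-9 or a-z,
# matched case-insensitively.
cleaned = samples.map { |word| word.gsub(/[^0-9a-z]/i, "") }

puts cleaned.inspect # ["dont", "coop", "OBrien", "route66"]
```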