Class: RubberbandFlamethrower::DataGenerator
- Inherits: Object
  - Object
  - RubberbandFlamethrower::DataGenerator
- Defined in: lib/rubberband_flamethrower/data_generator.rb
Constant Summary
- WORD_FILES =
The WORD_FILES constant is an array of the bundled word files used to build the pool of random words for data generation. Comment or uncomment individual files to change the size of the pool. See the README file in the words folder for more information about the lists.
[
  # "/words/american-words.95",
  # "/words/american-words.80",
  # "/words/american-words.70",
  # "/words/american-words.60",
  # "/words/american-words.55",
  # "/words/american-words.50",
  # "/words/american-words.40",
  "/words/american-words.35",
  "/words/american-words.20",
  "/words/american-words.10"
]
Instance Attribute Summary
- #word_list ⇒ Object
Returns the value of attribute word_list.
Instance Method Summary
- #current_timestamp ⇒ String
Creates an Elasticsearch-friendly timestamp for the current time.
- #generate_dataset(batch_size) ⇒ Object
- #generate_random_insert_data ⇒ JSON
Generates a JSON object containing a message, username, and post_date, intended to be passed as insert data to an Elasticsearch server.
- #initialize ⇒ DataGenerator
constructor
Initializes the word_list variable with an array of all the words contained in the files listed in WORD_FILES.
- #random_tweet ⇒ String
Creates a message of 6 to 15 random words, capped at 140 characters and ending with a period.
- #random_username ⇒ String
Creates a random value to be used as a username. The return value is one random word with everything except letters and digits removed.
Constructor Details
#initialize ⇒ DataGenerator
Initializes the word_list variable with an array of all the words contained in the files listed in WORD_FILES.
# File 'lib/rubberband_flamethrower/data_generator.rb', line 31
def initialize
  self.word_list = []
  WORD_FILES.each do |word_file|
    contents = File.read(File.dirname(__FILE__)+word_file)
    self.word_list = word_list + contents.split(/\n/)
  end
end
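The word pool is built by reading each listed file and splitting it on newlines. The following is a minimal sketch of that idea, not the gem's own code: `build_word_list` is a hypothetical helper, and the file names and contents are made up so the example can run against a temporary directory.

```ruby
require "tmpdir"

# Hypothetical helper: reads every file in word_files (relative to base_dir)
# and returns all of their lines as one flat array of words.
def build_word_list(base_dir, word_files)
  word_files.flat_map do |word_file|
    File.read(File.join(base_dir, word_file)).split(/\n/)
  end
end

# Made-up word files, created in a temporary directory for illustration only.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "sample-words.10"), "alpha\nbeta\n")
  File.write(File.join(dir, "sample-words.20"), "gamma\n")
  p build_word_list(dir, ["sample-words.10", "sample-words.20"])
  # prints ["alpha", "beta", "gamma"]
end
```

Listing more files yields a larger pool, which is the effect of uncommenting entries in WORD_FILES.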
Instance Attribute Details
#word_list ⇒ Object
Returns the value of attribute word_list.
# File 'lib/rubberband_flamethrower/data_generator.rb', line 11
def word_list
  @word_list
end
Instance Method Details
#current_timestamp ⇒ String
Creates an Elasticsearch-friendly timestamp for the current time.
# File 'lib/rubberband_flamethrower/data_generator.rb', line 55
def current_timestamp
  Time.now.strftime "%Y%m%dT%H:%M:%S"
end
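To see what this format string produces, the sketch below applies it to a fixed time instead of Time.now. `format_timestamp` is a hypothetical helper introduced only for this illustration.

```ruby
# Hypothetical helper: applies the same strftime pattern to any Time value.
def format_timestamp(time)
  time.strftime("%Y%m%dT%H:%M:%S")
end

# A fixed time makes the result reproducible.
puts format_timestamp(Time.new(2013, 4, 5, 14, 30, 9))
# prints 20130405T14:30:09
```

The date part is written without separators and the time part with colons, joined by a literal "T".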
#generate_dataset(batch_size) ⇒ Object
# File 'lib/rubberband_flamethrower/data_generator.rb', line 66
def generate_dataset(batch_size)
  # Write batch_size JSON documents, one per line, to a file named "dataset".
  File.open("dataset", 'w') do |file|
    batch_size.to_i.times do |i|
      file.write(generate_random_insert_data+"\n")
    end
  end
end
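The output file holds one JSON document per line, so it can be read back line by line. The sketch below shows that round trip under stated assumptions: `write_dataset` and `read_dataset` are hypothetical helpers, and the documents are fixed placeholders rather than the generator's random data.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper: writes count placeholder documents, one JSON object per line.
# count.to_i mirrors the original, so a numeric string such as "3" also works.
def write_dataset(path, count)
  File.open(path, "w") do |file|
    count.to_i.times do |i|
      file.write({ message: "sample #{i}." }.to_json + "\n")
    end
  end
end

# Hypothetical helper: parses each line of the file back into a Hash.
def read_dataset(path)
  File.foreach(path).map { |line| JSON.parse(line) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "dataset")
  write_dataset(path, "3")
  puts read_dataset(path).length
  # prints 3
end
```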
#generate_random_insert_data ⇒ JSON
Generates a JSON object containing a message, username, and post_date, intended to be passed as insert data to an Elasticsearch server.
# File 'lib/rubberband_flamethrower/data_generator.rb', line 62
def generate_random_insert_data
  {message: "#{random_tweet}", username: "#{random_username}", post_date: "#{current_timestamp}"}.to_json
end
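The shape of the resulting document can be shown with fixed values in place of the random ones. This is an illustrative sketch only: `build_insert_data` is a hypothetical helper and the sample values are made up.

```ruby
require "json"

# Hypothetical helper: assembles the three-field document from given values
# and serializes it to a JSON string.
def build_insert_data(message, username, post_date)
  { message: message, username: username, post_date: post_date }.to_json
end

puts build_insert_data("lorem ipsum.", "dolor", "20130405T14:30:09")
# prints {"message":"lorem ipsum.","username":"dolor","post_date":"20130405T14:30:09"}
```

Note that the return value is a String holding JSON text, not a parsed object.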
#random_tweet ⇒ String
Creates a message of 6 to 15 random words, capped at 140 characters and ending with a period.
# File 'lib/rubberband_flamethrower/data_generator.rb', line 41
def random_tweet
  # rand(10) returns 0..9, so the message uses 6 to 15 words.
  number_of_words = 6 + rand(10)
  # Keep at most 139 characters, then append the period (140 total).
  ((number_of_words.times.map{word_list.sample}.join(" "))[0,139])+"."
end
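The word count and the length cap can be checked with a standalone sketch of the same logic. `sketch_random_tweet` is a hypothetical free-standing version that takes the word list as an argument, and the word lists are made up for the check.

```ruby
# Hypothetical standalone version of the tweet logic, taking the pool as an argument.
def sketch_random_tweet(word_list)
  number_of_words = 6 + rand(10) # rand(10) is 0..9, so 6..15 words
  number_of_words.times.map { word_list.sample }.join(" ")[0, 139] + "."
end

# Short words never hit the cap, so the word count is visible directly.
tweet = sketch_random_tweet(%w[a bb ccc])
word_count = tweet.chomp(".").split(" ").length
puts (6..15).cover?(word_count)
# prints true

# Very long words always exceed 139 characters, so the result is truncated
# to 139 characters plus the trailing period.
puts sketch_random_tweet(["x" * 50]).length
# prints 140
```

Because the slice happens before the period is appended, a truncated message can end in the middle of a word.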
#random_username ⇒ String
Creates a random value to be used as a username. The return value is one random word with everything except letters and digits removed.
# File 'lib/rubberband_flamethrower/data_generator.rb', line 49
def random_username
  word_list.sample.gsub(/[^0-9a-z]/i, '')
end
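The effect of the substitution is easiest to see on words that contain punctuation. In this sketch, `sanitize_username` is a hypothetical helper that applies the same pattern to a given word, and the sample words are made up.

```ruby
# Hypothetical helper: removes every character that is not an ASCII letter or digit.
def sanitize_username(word)
  word.gsub(/[^0-9a-z]/i, "")
end

puts sanitize_username("o'clock")
# prints oclock
puts sanitize_username("mother-in-law's")
# prints motherinlaws
puts sanitize_username("Route66")
# prints Route66
```

Case is preserved because the pattern is case-insensitive and only deletes characters.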