Module: TextStat::ReadabilityFormulas
- Included in: Main
- Defined in: lib/textstat/readability_formulas.rb
Overview
Readability formulas and text difficulty calculations
This module implements various readability formulas used to determine the reading level and complexity of text. Each formula uses different metrics and is suitable for different types of content and audiences.
Instance Method Summary
- #automated_readability_index(text) ⇒ Float
  Calculate Automated Readability Index (ARI).
- #coleman_liau_index(text) ⇒ Float
  Calculate Coleman-Liau Index.
- #dale_chall_readability_score(text, language = 'en_us') ⇒ Float
  Calculate Dale-Chall Readability Score.
- #flesch_kincaid_grade(text, language = 'en_us') ⇒ Float
  Calculate Flesch-Kincaid Grade Level.
- #flesch_reading_ease(text, language = 'en_us') ⇒ Float
  Calculate Flesch Reading Ease score.
- #forcast(text, language = 'en_us') ⇒ Integer
  Calculate FORCAST Readability Formula.
- #gunning_fog(text, language = 'en_us') ⇒ Float
  Calculate Gunning Fog Index.
- #linsear_write_formula(text, language = 'en_us') ⇒ Float
  Calculate Linsear Write Formula.
- #lix(text) ⇒ Float
  Calculate LIX Readability Formula.
- #powers_sumner_kearl(text, language = 'en_us') ⇒ Float
  Calculate Powers-Sumner-Kearl Readability Formula.
- #smog_index(text, language = 'en_us') ⇒ Float
  Calculate SMOG Index (Simple Measure of Gobbledygook).
- #spache(text, language = 'en_us') ⇒ Float
  Calculate SPACHE Readability Formula.
- #text_standard(text, float_output = nil) ⇒ String, Float
  Calculate consensus text standard from multiple formulas.
Instance Method Details
#automated_readability_index(text) ⇒ Float
Calculate Automated Readability Index (ARI)
ARI estimates readability from the average number of characters per word and the average number of words per sentence. Because it needs no syllable counting, it is easy to compute programmatically.
    # File 'lib/textstat/readability_formulas.rb', line 117
    def automated_readability_index(text)
      chars = char_count(text)
      words = lexicon_count(text)
      sentences = sentence_count(text)

      a = chars.to_f / words
      b = words.to_f / sentences
      readability = (4.71 * a) + (0.5 * b) - 21.43
      readability.round(1)
    rescue ZeroDivisionError
      0.0
    end
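As a rough illustration of the arithmetic in the listing above, the sketch below computes ARI from raw counts. The helper name `ari_from_counts` is hypothetical and not part of TextStat. The explicit zero guard is a choice made for this sketch: in Ruby, dividing a Float by zero yields Infinity or NaN rather than raising ZeroDivisionError, so a guard on the counts is one way to get a 0.0 fallback for empty input.

```ruby
# Hypothetical helper (not part of TextStat): ARI from raw counts.
def ari_from_counts(chars, words, sentences)
  # Guard explicitly; Float division by zero does not raise in Ruby.
  return 0.0 if words.zero? || sentences.zero?

  chars_per_word = chars.to_f / words
  words_per_sentence = words.to_f / sentences
  ((4.71 * chars_per_word) + (0.5 * words_per_sentence) - 21.43).round(1)
end

puts ari_from_counts(450, 100, 5) # 4.5 chars/word, 20 words/sentence => 9.8
puts ari_from_counts(0, 0, 0)     # empty input => 0.0
```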
#coleman_liau_index(text) ⇒ Float
Calculate Coleman-Liau Index
This formula relies on character counts instead of syllable counts, making it more suitable for automated analysis. It estimates the U.S. grade level required to understand the text.
    # File 'lib/textstat/readability_formulas.rb', line 101
    def coleman_liau_index(text)
      letters = (avg_letter_per_word(text) * 100).round(2)
      sentences = (avg_sentence_per_word(text) * 100).round(2)
      coleman = (0.0588 * letters) - (0.296 * sentences) - 15.8
      coleman.round(2)
    end
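The two inputs in the listing above are letters per 100 words and sentences per 100 words. A minimal sketch of the same calculation from raw counts follows; `coleman_liau_from_counts` is a hypothetical name used only for illustration.

```ruby
# Hypothetical helper (not part of TextStat): Coleman-Liau from raw counts.
def coleman_liau_from_counts(letters, words, sentences)
  return 0.0 if words.zero?

  letters_per_100 = (letters.to_f / words * 100).round(2)
  sentences_per_100 = (sentences.to_f / words * 100).round(2)
  ((0.0588 * letters_per_100) - (0.296 * sentences_per_100) - 15.8).round(2)
end

# 450 letters and 5 sentences in 100 words
puts coleman_liau_from_counts(450, 100, 5) # => 9.18
```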
#dale_chall_readability_score(text, language = 'en_us') ⇒ Float
Calculate Dale-Chall Readability Score
This formula uses a list of 3000 familiar words to determine text difficulty. It’s particularly effective for elementary and middle school texts.
    # File 'lib/textstat/readability_formulas.rb', line 169
    def dale_chall_readability_score(text, language = 'en_us')
      word_count = lexicon_count(text)
      count = word_count - difficult_words(text, language)

      per = (100.0 * count) / word_count
      difficult_words_percentage = 100 - per

      score = (0.1579 * difficult_words_percentage) + (0.0496 * avg_sentence_length(text))
      score += 3.6365 if difficult_words_percentage > 5
      score.round(2)
    rescue ZeroDivisionError
      0.0
    end
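The listing above adds 3.6365 to the score only when more than 5% of the words are off the familiar-word list. The sketch below shows that threshold with raw counts; `dale_chall_from_counts` is a hypothetical helper, and the counts are made up for the example.

```ruby
# Hypothetical helper (not part of TextStat): Dale-Chall from raw counts.
def dale_chall_from_counts(words, difficult, sentences)
  return 0.0 if words.zero? || sentences.zero?

  difficult_pct = 100.0 * difficult / words
  score = (0.1579 * difficult_pct) + (0.0496 * (words.to_f / sentences))
  score += 3.6365 if difficult_pct > 5 # adjustment applies above 5% only
  score.round(2)
end

puts dale_chall_from_counts(100, 10, 5) # 10% difficult, adjusted   => 6.21
puts dale_chall_from_counts(100, 4, 5)  # 4% difficult, unadjusted  => 1.62
```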
#flesch_kincaid_grade(text, language = 'en_us') ⇒ Float
Calculate Flesch-Kincaid Grade Level
This formula uses the same inputs as Flesch Reading Ease (average sentence length and average syllables per word) but weights them to yield a U.S. grade level, which makes the required education level easier to interpret.
    # File 'lib/textstat/readability_formulas.rb', line 59
    def flesch_kincaid_grade(text, language = 'en_us')
      sentence_length = avg_sentence_length(text)
      syllables_per_word = avg_syllables_per_word(text, language)
      flesch = (0.39 * sentence_length) + (11.8 * syllables_per_word) - 15.59
      flesch.round(1)
    end
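To see how the two averages drive the grade, here is a small sketch that takes them as arguments directly. The name `flesch_kincaid_from_averages` is hypothetical; in the library the averages come from `avg_sentence_length` and `avg_syllables_per_word`.

```ruby
# Hypothetical helper (not part of TextStat): grade from precomputed averages.
def flesch_kincaid_from_averages(sentence_length, syllables_per_word)
  ((0.39 * sentence_length) + (11.8 * syllables_per_word) - 15.59).round(1)
end

# 20 words per sentence, 1.5 syllables per word
puts flesch_kincaid_from_averages(20, 1.5) # => 9.9
```

Longer sentences and more syllables per word both push the grade up, with the syllable term carrying the larger weight.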
#flesch_reading_ease(text, language = 'en_us') ⇒ Float
Calculate Flesch Reading Ease score
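The Flesch Reading Ease formula produces a score that typically falls between 0 and 100, with higher scores indicating easier reading. The result is not clamped, so unusually simple or unusually dense text can score outside that range.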
Score ranges:
- 90-100: Very Easy
- 80-89: Easy
- 70-79: Fairly Easy
- 60-69: Standard
- 50-59: Fairly Difficult
- 30-49: Difficult
- 0-29: Very Difficult
    # File 'lib/textstat/readability_formulas.rb', line 41
    def flesch_reading_ease(text, language = 'en_us')
      sentence_length = avg_sentence_length(text)
      syllables_per_word = avg_syllables_per_word(text, language)
      flesch = 206.835 - (1.015 * sentence_length) - (84.6 * syllables_per_word)
      flesch.round(2)
    end
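The score ranges listed above can be turned into a lookup. The sketch below is one possible mapping, not something TextStat provides: `FLESCH_BANDS`, `flesch_band`, and `flesch_reading_ease_from_averages` are hypothetical names, and treating each band as "at or above its lower bound" is an assumption made so that fractional scores such as 89.5 still land in a band.

```ruby
# Hypothetical lookup table: lower bound of each band, highest first.
FLESCH_BANDS = [
  [90, 'Very Easy'],
  [80, 'Easy'],
  [70, 'Fairly Easy'],
  [60, 'Standard'],
  [50, 'Fairly Difficult'],
  [30, 'Difficult']
].freeze

# Hypothetical helper: same arithmetic as the listing, from precomputed averages.
def flesch_reading_ease_from_averages(sentence_length, syllables_per_word)
  (206.835 - (1.015 * sentence_length) - (84.6 * syllables_per_word)).round(2)
end

# Anything below 30 (including negative scores) is reported as 'Very Difficult'.
def flesch_band(score)
  band = FLESCH_BANDS.find { |lower_bound, _label| score >= lower_bound }
  band ? band.last : 'Very Difficult'
end

score = flesch_reading_ease_from_averages(15, 1.4)
puts "#{score}: #{flesch_band(score)}" # 73.17: Fairly Easy
```

With very long sentences and many syllables per word (for example 30 and 2.5), the same arithmetic goes negative, which is why the fallback label covers everything below 30.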
#forcast(text, language = 'en_us') ⇒ Integer
Calculate FORCAST Readability Formula
FORCAST (named after its authors Ford, Caylor, and Sticht) is designed for technical materials. It counts single-syllable words in a sample of up to 150 words to determine readability.
    # File 'lib/textstat/readability_formulas.rb', line 231
    def forcast(text, language = 'en_us')
      words = text.split[0..149]
      words_with_one_syllabe = words.count do |word|
        syllable_count(word, language) == 1
      end
      20 - (words_with_one_syllabe / 10)
    end
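Because the count of one-syllable words is an Integer, the division by 10 in the listing above truncates, which is consistent with the Integer return type. The sketch below contrasts that with a floating-point variant; both helper names are hypothetical and exist only to show the difference.

```ruby
# Hypothetical helpers (not part of TextStat).

# Mirrors the listing: Integer division truncates toward zero for positive counts.
def forcast_grade_truncated(one_syllable_words)
  20 - (one_syllable_words / 10)
end

# Variant that keeps the fractional part.
def forcast_grade_fractional(one_syllable_words)
  20 - (one_syllable_words / 10.0)
end

puts forcast_grade_truncated(105)  # => 10
puts forcast_grade_fractional(105) # => 9.5
```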
#gunning_fog(text, language = 'en_us') ⇒ Float
Calculate Gunning Fog Index
The Fog Index estimates the years of formal education needed to understand the text. It combines average sentence length with the percentage of difficult words (classically, words of three or more syllables).
    # File 'lib/textstat/readability_formulas.rb', line 193
    def gunning_fog(text, language = 'en_us')
      per_diff_words = ((100.0 * difficult_words(text, language)) / lexicon_count(text)) + 5
      grade = 0.4 * (avg_sentence_length(text) + per_diff_words)
      grade.round(2)
    rescue ZeroDivisionError
      0.0
    end
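A sketch of the same arithmetic from raw counts may make the two terms easier to follow. `gunning_fog_from_counts` is a hypothetical helper. The `+ 5` offset is copied from the listing above; the classic Fog formula is usually stated without it, so treat it as a property of this implementation.

```ruby
# Hypothetical helper (not part of TextStat): Fog index from raw counts.
def gunning_fog_from_counts(words, difficult, sentences)
  return 0.0 if words.zero? || sentences.zero?

  difficult_pct = (100.0 * difficult / words) + 5 # offset as in the listing
  average_sentence_length = words.to_f / sentences
  (0.4 * (average_sentence_length + difficult_pct)).round(2)
end

# 100 words, 10 difficult, 5 sentences: 0.4 * (20 + 15)
puts gunning_fog_from_counts(100, 10, 5) # => 14.0
```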
#linsear_write_formula(text, language = 'en_us') ⇒ Float
Calculate Linsear Write Formula
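This formula is designed for technical writing. It scores a sample from the start of the text (the first 101 words, per text.split[0..100]), counting each word of three or more syllables three times and every other word once, then divides by the number of sentences in the sample.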
    # File 'lib/textstat/readability_formulas.rb', line 140
    def linsear_write_formula(text, language = 'en_us')
      easy_word = 0
      difficult_word = 0
      text_list = text.split[0..100]

      text_list.each do |word|
        if syllable_count(word, language) < 3
          easy_word += 1
        else
          difficult_word += 1
        end
      end

      text = text_list.join(' ')
      number = ((easy_word * 1) + (difficult_word * 3)).to_f / sentence_count(text)
      number -= 2 if number <= 20
      number / 2
    end
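The final two steps of the listing above branch on whether the provisional score exceeds 20. The following sketch isolates that logic; `linsear_from_syllables` is a hypothetical helper that takes per-word syllable counts instead of text, and the zero-sentence guard is an addition for this sketch.

```ruby
# Hypothetical helper (not part of TextStat).
# syllable_counts: one Integer per word in the sample.
def linsear_from_syllables(syllable_counts, sentences)
  return 0.0 if sentences.zero?

  easy = syllable_counts.count { |syllables| syllables < 3 }
  hard = syllable_counts.length - easy
  number = (easy + (hard * 3)).to_f / sentences
  number -= 2 if number <= 20 # low scores lose 2 before halving
  number / 2
end

# 80 easy + 20 hard words in 5 sentences: (80 + 60) / 5 = 28, above 20
puts linsear_from_syllables(([1] * 80) + ([3] * 20), 5)  # => 14.0
# 90 easy + 10 hard words in 10 sentences: (90 + 30) / 10 = 12, so (12 - 2) / 2
puts linsear_from_syllables(([1] * 90) + ([4] * 10), 10) # => 5.0
```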
#lix(text) ⇒ Float
Calculate LIX Readability Formula
LIX (Läsbarhetsindex) is a Swedish readability formula that works well across multiple languages. It adds the average sentence length to the percentage of long words, where a long word is one with more than six characters.
    # File 'lib/textstat/readability_formulas.rb', line 210
    def lix(text)
      words = text.split
      words_length = words.length
      long_words = words.count { |word| word.length > 6 }

      per_long_words = (100.0 * long_words) / words_length
      asl = avg_sentence_length(text)
      lix = asl + per_long_words
      lix.round(2)
    end
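Since LIX needs no syllable data, it can be sketched end to end without the library. The version below is an approximation under stated assumptions: `lix_sketch` is a hypothetical name, and sentences are counted naively as runs of `.`, `!` or `?`, which may differ from how TextStat's `avg_sentence_length` segments text. As in the listing above, `split` leaves punctuation attached, so it counts toward word length.

```ruby
# Hypothetical standalone sketch (not part of TextStat).
def lix_sketch(text)
  words = text.split
  return 0.0 if words.empty?

  # Naive sentence count (assumption): runs of terminal punctuation.
  sentences = [text.scan(/[.!?]+/).length, 1].max
  long_words = words.count { |word| word.length > 6 }

  long_word_pct = (100.0 * long_words) / words.length
  average_sentence_length = words.length.to_f / sentences
  (average_sentence_length + long_word_pct).round(2)
end

# 7 words, 2 sentences, 3 long words ("extraordinary", "elephant", "wandered.")
puts lix_sketch('The cat sat. The extraordinary elephant wandered.') # => 46.36
```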
#powers_sumner_kearl(text, language = 'en_us') ⇒ Float
Calculate Powers-Sumner-Kearl Readability Formula
This formula was developed for primary-grade reading materials and uses sentence length and syllable count to determine grade level.
    # File 'lib/textstat/readability_formulas.rb', line 249
    def powers_sumner_kearl(text, language = 'en_us')
      grade = (0.0778 * avg_sentence_length(text)) + (0.0455 * syllable_count(text, language)) - 2.2029
      grade.round(2)
    end
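The listing above multiplies 0.0455 by the total syllable count of the text. The formula as commonly published uses syllables per 100 words, so the two agree only for samples of about 100 words. The sketch below shows the normalized form as an assumption about the intended formula, not as the library's behavior; `psk_from_counts` is a hypothetical name.

```ruby
# Hypothetical helper (not part of TextStat): normalizes syllables to a
# 100-word basis, which is an assumption about the published formula.
def psk_from_counts(words, syllables, sentences)
  return 0.0 if words.zero? || sentences.zero?

  average_sentence_length = words.to_f / sentences
  syllables_per_100_words = 100.0 * syllables / words
  ((0.0778 * average_sentence_length) + (0.0455 * syllables_per_100_words) - 2.2029).round(2)
end

# 100 words, 130 syllables, 10 sentences
puts psk_from_counts(100, 130, 10) # => 4.49
# Same density over 200 words gives the same grade after normalization
puts psk_from_counts(200, 260, 20) # => 4.49
```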
#smog_index(text, language = 'en_us') ⇒ Float
Calculate SMOG Index (Simple Measure of Gobbledygook)
SMOG estimates the years of education needed to understand a text. It focuses on polysyllabic words and is particularly useful for health and educational materials. This implementation returns 0.0 for texts with fewer than three sentences.
    # File 'lib/textstat/readability_formulas.rb', line 77
    def smog_index(text, language = 'en_us')
      sentences = sentence_count(text)

      if sentences >= 3
        polysyllab = polysyllab_count(text, language)
        smog = (1.043 * Math.sqrt((30.0 * polysyllab) / sentences)) + 3.1291
        smog.round(1)
      else
        0.0
      end
    rescue ZeroDivisionError
      0.0
    end
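The core of the listing above scales the polysyllable count to a 30-sentence basis before taking the square root. A minimal sketch from raw counts, with the hypothetical name `smog_from_counts`, keeps the same three-sentence minimum:

```ruby
# Hypothetical helper (not part of TextStat): SMOG from raw counts.
def smog_from_counts(polysyllables, sentences)
  return 0.0 if sentences < 3 # same minimum as the listing

  ((1.043 * Math.sqrt((30.0 * polysyllables) / sentences)) + 3.1291).round(1)
end

puts smog_from_counts(30, 30) # 30 polysyllables in 30 sentences => 8.8
puts smog_from_counts(12, 10) # scaled to 36 per 30 sentences    => 9.4
puts smog_from_counts(5, 2)   # too few sentences                => 0.0
```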
#spache(text, language = 'en_us') ⇒ Float
Calculate SPACHE Readability Formula
The SPACHE formula is designed for primary-grade reading materials (grades 1-4) and uses a list of familiar words for analysis.
    # File 'lib/textstat/readability_formulas.rb', line 264
    def spache(text, language = 'en_us')
      words = text.split.count
      unfamiliar_words = difficult_words(text, language) / words
      grade = (0.141 * avg_sentence_length(text)) + (0.086 * unfamiliar_words) + 0.839
      grade.round(2)
    end
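One detail of the listing above deserves care. If `difficult_words` returns an Integer, as its use in the other listings suggests, then `difficult_words(text, language) / words` is Integer division and evaluates to 0 whenever some words are familiar. The Spache formula as usually stated uses the percentage of unfamiliar words. The sketch below contrasts the two; both helper names are hypothetical, and the percentage variant is an assumption about the intended formula rather than the library's behavior.

```ruby
# Hypothetical helpers (not part of TextStat).

# Mirrors the listing: Integer / Integer truncates, so the term is usually 0.
def spache_truncating(words, unfamiliar, sentences)
  unfamiliar_ratio = unfamiliar / words
  ((0.141 * (words.to_f / sentences)) + (0.086 * unfamiliar_ratio) + 0.839).round(2)
end

# Variant using the percentage of unfamiliar words (assumed intent).
def spache_percentage(words, unfamiliar, sentences)
  unfamiliar_pct = 100.0 * unfamiliar / words
  ((0.141 * (words.to_f / sentences)) + (0.086 * unfamiliar_pct) + 0.839).round(2)
end

puts 12 / 100                       # => 0
puts spache_truncating(100, 12, 10) # => 2.25
puts spache_percentage(100, 12, 10) # => 3.28
```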
#text_standard(text, float_output = nil) ⇒ String, Float
Calculate consensus text standard from multiple formulas
This method combines results from multiple readability formulas into a consensus grade level. A consensus is usually more robust than any single formula, because each formula weights different features of the text.
    # File 'lib/textstat/readability_formulas.rb', line 283
    def text_standard(text, float_output = nil)
      grade = []

      # Collect grades from all formulas
      add_flesch_kincaid_grades(text, grade)
      add_flesch_reading_ease_grade(text, grade)
      add_other_readability_grades(text, grade)

      # Find consensus grade
      final_grade = calculate_consensus_grade(grade)

      format_grade_output(final_grade, float_output)
    end
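The helper methods called in the listing above are not shown on this page, so how the consensus is actually computed is not documented here. As a purely illustrative sketch, one plausible approach is to round each formula's grade and pick the most frequent value. The name `consensus_grade_sketch`, the rounding step, and the tie-break toward the lower grade are all assumptions.

```ruby
# Hypothetical sketch (not TextStat's implementation): most common rounded grade.
def consensus_grade_sketch(grades)
  return nil if grades.empty?

  votes = grades.map(&:round).tally
  # Most votes wins; ties go to the lower grade (an arbitrary choice).
  votes.max_by { |grade, count| [count, -grade] }.first
end

# Grades from five formulas for the same text (made-up values)
puts consensus_grade_sketch([9.8, 9.18, 9.9, 10.4, 12.0]) # => 10
```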