Module: TextStat::ReadabilityFormulas

Included in:
Main
Defined in:
lib/textstat/readability_formulas.rb

Overview

Readability formulas and text difficulty calculations

This module implements various readability formulas used to determine the reading level and complexity of text. Each formula uses different metrics and is suitable for different types of content and audiences.

Examples:

Basic readability analysis

text = "This is a sample text for readability analysis."
TextStat.flesch_reading_ease(text)      # => 83.32
TextStat.flesch_kincaid_grade(text)     # => 3.7
TextStat.text_standard(text)            # => "3rd and 4th grade"

Multi-language support

TextStat.flesch_reading_ease(spanish_text, 'es')
TextStat.smog_index(french_text, 'fr')
TextStat.gunning_fog(german_text, 'de')

Author:

  • Jakub Polak

Since:

  • 1.0.0

Instance Method Details

#automated_readability_index(text) ⇒ Float

Calculate Automated Readability Index (ARI)

ARI estimates readability from the average number of characters per word and the average number of words per sentence. Because it needs no syllable counting, it is easy to compute programmatically.

Examples:

TextStat.automated_readability_index("This text is easy to read.")  # => 2.9

Parameters:

  • text (String)

    the text to analyze

Returns:

  • (Float)

    ARI grade level

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 117

def automated_readability_index(text)
  chars = char_count(text)
  words = lexicon_count(text)
  sentences = sentence_count(text)

  a = chars.to_f / words
  b = words.to_f / sentences
  readability = (4.71 * a) + (0.5 * b) - 21.43
  readability.round(1)
rescue ZeroDivisionError
  0.0
end
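
A minimal sketch of the same arithmetic, working from raw counts instead of text. `ari_from_counts` is a hypothetical helper, not part of TextStat. It guards against empty input explicitly, because Float division by zero in Ruby returns `Infinity` or `NaN` rather than raising `ZeroDivisionError`.

```ruby
# Hypothetical helper (not part of TextStat): ARI from raw counts.
def ari_from_counts(chars, words, sentences)
  # Guard explicitly: Float division by zero does not raise in Ruby.
  return 0.0 if words.zero? || sentences.zero?

  chars_per_word = chars.to_f / words
  words_per_sentence = words.to_f / sentences
  ((4.71 * chars_per_word) + (0.5 * words_per_sentence) - 21.43).round(1)
end

# 500 characters, 100 words, 5 sentences:
# 5.0 characters per word and 20.0 words per sentence.
puts ari_from_counts(500, 100, 5) # prints 12.1
puts ari_from_counts(0, 0, 0)     # prints 0.0
```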

#coleman_liau_index(text) ⇒ Float

Calculate Coleman-Liau Index

This formula relies on character counts instead of syllable counts, making it more suitable for automated analysis. It estimates the U.S. grade level required to understand the text.

Examples:

TextStat.coleman_liau_index("Short words are easy to read.")  # => 4.71

Parameters:

  • text (String)

    the text to analyze

Returns:

  • (Float)

    Coleman-Liau grade level

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 101

def coleman_liau_index(text)
  letters = (avg_letter_per_word(text) * 100).round(2)
  sentences = (avg_sentence_per_word(text) * 100).round(2)
  coleman = (0.0588 * letters) - (0.296 * sentences) - 15.8
  coleman.round(2)
end
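
The formula can be sketched from raw counts as follows. `coleman_liau_from_counts` is a hypothetical helper, and it assumes the two averages in the listing mean letters per word and sentences per word, each scaled to 100 words.

```ruby
# Hypothetical helper (not part of TextStat): Coleman-Liau from raw counts.
def coleman_liau_from_counts(letters, words, sentences)
  return 0.0 if words.zero?

  letters_per_100 = (letters.to_f / words) * 100     # L in the formula
  sentences_per_100 = (sentences.to_f / words) * 100 # S in the formula
  ((0.0588 * letters_per_100) - (0.296 * sentences_per_100) - 15.8).round(2)
end

# 450 letters and 5 sentences in 100 words:
# 0.0588 * 450 - 0.296 * 5 - 15.8
puts coleman_liau_from_counts(450, 100, 5) # prints 9.18
```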

#dale_chall_readability_score(text, language = 'en_us') ⇒ Float

Calculate Dale-Chall Readability Score

This formula uses a list of 3000 familiar words to determine text difficulty. It’s particularly effective for elementary and middle school texts.

Examples:

TextStat.dale_chall_readability_score("Simple story for children.")  # => 5.12

Parameters:

  • text (String)

    the text to analyze

  • language (String) (defaults to: 'en_us')

    language code for dictionary selection

Returns:

  • (Float)

    Dale-Chall readability score

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 169

def dale_chall_readability_score(text, language = 'en_us')
  word_count = lexicon_count(text)
  count = word_count - difficult_words(text, language)

  per = (100.0 * count) / word_count
  difficult_words_percentage = 100 - per
  score = (0.1579 * difficult_words_percentage) + (0.0496 * avg_sentence_length(text))
  score += 3.6365 if difficult_words_percentage > 5

  score.round(2)
rescue ZeroDivisionError
  0.0
end
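
A sketch of the scoring step from raw counts, to show the effect of the 5% threshold. `dale_chall_from_counts` is a hypothetical helper; in the library the difficult-word count comes from the dictionary lookup, which is not reproduced here.

```ruby
# Hypothetical helper (not part of TextStat): Dale-Chall from raw counts.
def dale_chall_from_counts(words, difficult, sentences)
  return 0.0 if words.zero? || sentences.zero?

  pct_difficult = 100.0 * difficult / words
  score = (0.1579 * pct_difficult) + (0.0496 * (words.to_f / sentences))
  # The adjustment constant applies only above 5% difficult words.
  score += 3.6365 if pct_difficult > 5
  score.round(2)
end

puts dale_chall_from_counts(100, 4, 4) # 4% difficult, no adjustment: prints 1.87
puts dale_chall_from_counts(100, 8, 4) # 8% difficult, adjusted: prints 6.14
```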

#flesch_kincaid_grade(text, language = 'en_us') ⇒ Float

Calculate Flesch-Kincaid Grade Level

This formula uses the same inputs as Flesch Reading Ease (average sentence length and average syllables per word) but weights them to yield a U.S. grade level directly, making it easier to see the education level required to comprehend the text.

Examples:

TextStat.flesch_kincaid_grade("Simple text.")      # => 2.1
TextStat.flesch_kincaid_grade("Complex analysis.") # => 5.8

Parameters:

  • text (String)

    the text to analyze

  • language (String) (defaults to: 'en_us')

    language code for syllable counting

Returns:

  • (Float)

    grade level (e.g., 8.5 = 8th to 9th grade)

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 59

def flesch_kincaid_grade(text, language = 'en_us')
  sentence_length = avg_sentence_length(text)
  syllables_per_word = avg_syllables_per_word(text, language)
  flesch = (0.39 * sentence_length) + (11.8 * syllables_per_word) - 15.59
  flesch.round(1)
end
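
The sketch below applies both Flesch formulas to the same two inputs, which shows that they move in opposite directions as text gets harder. `flesch_pair` is a hypothetical helper that takes the averages directly instead of computing them from text.

```ruby
# Hypothetical helper (not part of TextStat): both Flesch scores
# from the same pair of averages.
def flesch_pair(avg_sentence_length, avg_syllables_per_word)
  ease = 206.835 - (1.015 * avg_sentence_length) - (84.6 * avg_syllables_per_word)
  grade = (0.39 * avg_sentence_length) + (11.8 * avg_syllables_per_word) - 15.59
  { ease: ease.round(2), grade: grade.round(1) }
end

plain = flesch_pair(15.0, 1.5) # ease 64.71, grade 8.0
dense = flesch_pair(25.0, 2.0) # lower ease, higher grade
puts plain[:ease], plain[:grade]
puts dense[:ease], dense[:grade]
```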

#flesch_reading_ease(text, language = 'en_us') ⇒ Float

Calculate Flesch Reading Ease score

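The Flesch Reading Ease formula typically produces a score between 0 and 100, with higher scores indicating easier text. The result is not clamped, so very simple text can score above 100 and very dense text can score below 0.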

Score ranges:

  • 90-100: Very Easy

  • 80-89: Easy

  • 70-79: Fairly Easy

  • 60-69: Standard

  • 50-59: Fairly Difficult

  • 30-49: Difficult

  • 0-29: Very Difficult

Examples:

TextStat.flesch_reading_ease("The cat sat on the mat.")  # => 116.15
TextStat.flesch_reading_ease("Comprehensive analysis.")  # => 43.73

Parameters:

  • text (String)

    the text to analyze

  • language (String) (defaults to: 'en_us')

    language code for syllable counting

Returns:

  • (Float)

    Flesch Reading Ease score

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 41

def flesch_reading_ease(text, language = 'en_us')
  sentence_length = avg_sentence_length(text)
  syllables_per_word = avg_syllables_per_word(text, language)
  flesch = 206.835 - (1.015 * sentence_length) - (84.6 * syllables_per_word)
  flesch.round(2)
end
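
The score ranges listed above can be sketched as a lookup. `FLESCH_BANDS` and `flesch_band` are hypothetical names, not part of TextStat. The lookup uses lower bounds only, so fractional scores such as 89.5 and out-of-range scores still receive a label.

```ruby
# Hypothetical lookup (not part of TextStat): lower bound of each band,
# ordered from easiest to hardest.
FLESCH_BANDS = [
  [90, 'Very Easy'],
  [80, 'Easy'],
  [70, 'Fairly Easy'],
  [60, 'Standard'],
  [50, 'Fairly Difficult'],
  [30, 'Difficult']
].freeze

def flesch_band(score)
  band = FLESCH_BANDS.find { |minimum, _label| score >= minimum }
  band ? band.last : 'Very Difficult'
end

puts flesch_band(116.15) # prints Very Easy
puts flesch_band(89.5)   # prints Easy
puts flesch_band(43.73)  # prints Difficult
puts flesch_band(-5)     # prints Very Difficult
```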

#forcast(text, language = 'en_us') ⇒ Integer

Calculate FORCAST Readability Formula

FORCAST, named after its authors (FORd, CAylor, and STicht), is designed for technical materials. It counts the single-syllable words in a 150-word sample to determine readability.

Examples:

TextStat.forcast("Technical manual instructions.")  # => 20

Parameters:

  • text (String)

    the text to analyze (uses first 150 words)

  • language (String) (defaults to: 'en_us')

    language code for syllable counting

Returns:

  • (Integer)

    FORCAST grade level

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 231

def forcast(text, language = 'en_us')
  words = text.split[0..149]
  words_with_one_syllabe = words.count do |word|
    syllable_count(word, language) == 1
  end
  20 - (words_with_one_syllabe / 10)
end
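
The last line of the listing divides one Integer by another, so Ruby truncates the quotient. The sketch below isolates that step. `forcast_from_count` is a hypothetical helper, and the Float variant is shown only for comparison.

```ruby
# Hypothetical helper (not part of TextStat): the final FORCAST step,
# given the number of single-syllable words in the sample.
def forcast_from_count(single_syllable_words)
  20 - (single_syllable_words / 10) # Integer division truncates
end

puts forcast_from_count(94) # 94 / 10 is 9, prints 11
puts forcast_from_count(3)  # 3 / 10 is 0, prints 20
puts 20 - (94 / 10.0)       # Float division for comparison, prints 10.6
```

Any sample with fewer than 10 single-syllable words therefore yields 20.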

#gunning_fog(text, language = 'en_us') ⇒ Float

Calculate Gunning Fog Index

The Fog Index estimates the years of formal education needed to understand the text. It focuses on sentence length and polysyllabic words.

Examples:

TextStat.gunning_fog("Business communication analysis.")  # => 12.3

Parameters:

  • text (String)

    the text to analyze

  • language (String) (defaults to: 'en_us')

    language code for syllable counting

Returns:

  • (Float)

    Gunning Fog grade level

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 193

def gunning_fog(text, language = 'en_us')
  per_diff_words = ((100.0 * difficult_words(text, language)) / lexicon_count(text)) + 5
  grade = 0.4 * (avg_sentence_length(text) + per_diff_words)
  grade.round(2)
rescue ZeroDivisionError
  0.0
end
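
A sketch of the same calculation from raw counts. `gunning_fog_from_counts` is a hypothetical helper. It mirrors the listing, including the constant 5 added to the difficult-word percentage, which the commonly published form of the index (0.4 times the sum of average sentence length and percentage of complex words) does not include.

```ruby
# Hypothetical helper (not part of TextStat): mirrors the listing's
# arithmetic, including its +5 offset.
def gunning_fog_from_counts(words, difficult, sentences)
  return 0.0 if words.zero? || sentences.zero?

  pct_difficult = (100.0 * difficult / words) + 5
  (0.4 * ((words.to_f / sentences) + pct_difficult)).round(2)
end

# 100 words, 10 difficult, 5 sentences: 0.4 * (20.0 + 15.0)
puts gunning_fog_from_counts(100, 10, 5) # prints 14.0
```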

#linsear_write_formula(text, language = 'en_us') ⇒ Float

Calculate Linsear Write Formula

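This formula is designed for technical writing. It scores each word in a sample taken from the start of the text (1 point for a word with fewer than three syllables, 3 points otherwise), divides the total by the number of sentences, and halves the result after a small adjustment for low scores.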

Examples:

TextStat.linsear_write_formula("Technical documentation analysis.")  # => 6.5

Parameters:

  • text (String)

    the text to analyze

  • language (String) (defaults to: 'en_us')

    language code for syllable counting

Returns:

  • (Float)

    Linsear Write grade level

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 140

def linsear_write_formula(text, language = 'en_us')
  easy_word = 0
  difficult_word = 0
  text_list = text.split[0..100]

  text_list.each do |word|
    if syllable_count(word, language) < 3
      easy_word += 1
    else
      difficult_word += 1
    end
  end

  text = text_list.join(' ')
  number = ((easy_word * 1) + (difficult_word * 3)).to_f / sentence_count(text)
  number -= 2 if number <= 20
  number / 2
end
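
The scoring step can be sketched from raw counts as follows. `linsear_from_counts` is a hypothetical helper that takes the easy and difficult word counts directly instead of classifying words by syllable count.

```ruby
# Hypothetical helper (not part of TextStat): Linsear Write from counts.
def linsear_from_counts(easy_words, difficult_words, sentences)
  return 0.0 if sentences.zero?

  # 1 point per easy word, 3 points per difficult word.
  number = (easy_words + (difficult_words * 3)).to_f / sentences
  number -= 2 if number <= 20
  number / 2
end

puts linsear_from_counts(80, 20, 5)  # 140 / 5 = 28.0, no adjustment, prints 14.0
puts linsear_from_counts(90, 10, 10) # 120 / 10 = 12.0, minus 2, prints 5.0
```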

#lix(text) ⇒ Float

Calculate LIX Readability Formula

LIX (Läsbarhetsindex) is a Swedish readability formula that carries over to other languages because it needs no syllable counting. It adds the average sentence length to the percentage of long words (more than six characters).

Examples:

TextStat.lix("International readability measurement.")  # => 103.0

Parameters:

  • text (String)

    the text to analyze

Returns:

  • (Float)

    LIX readability score

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 210

def lix(text)
  words = text.split
  words_length = words.length
  long_words = words.count { |word| word.length > 6 }

  per_long_words = (100.0 * long_words) / words_length
  asl = avg_sentence_length(text)
  lix = asl + per_long_words
  lix.round(2)
end
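
A self-contained sketch of the same calculation. `lix_sketch` is a hypothetical function, and its sentence counter (runs of `.`, `!`, or `?`) is a simplifying assumption that may differ from the library's `avg_sentence_length`. As in the listing, words are split on whitespace, so trailing punctuation counts toward a word's length.

```ruby
# Hypothetical sketch (not part of TextStat): LIX with a naive
# sentence counter.
def lix_sketch(text)
  words = text.split
  return 0.0 if words.empty?

  # Assumption: each run of terminal punctuation ends one sentence.
  sentences = [text.scan(/[.!?]+/).size, 1].max
  long_words = words.count { |word| word.length > 6 }
  ((words.size.to_f / sentences) + (100.0 * long_words / words.size)).round(2)
end

puts lix_sketch('The cat sat. The dog ran.') # 3.0 + 0.0, prints 3.0
puts lix_sketch('Short letter.')             # "letter." has 7 characters, prints 52.0
```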

#powers_sumner_kearl(text, language = 'en_us') ⇒ Float

Calculate Powers-Sumner-Kearl Readability Formula

This formula was developed for primary-grade reading materials and uses sentence length and syllable count to determine grade level.

Examples:

TextStat.powers_sumner_kearl("Elementary school reading material.")  # => 4.2

Parameters:

  • text (String)

    the text to analyze

  • language (String) (defaults to: 'en_us')

    language code for syllable counting

Returns:

  • (Float)

    Powers-Sumner-Kearl grade level

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 249

def powers_sumner_kearl(text, language = 'en_us')
  grade = (0.0778 * avg_sentence_length(text)) + (0.0455 * syllable_count(text, language)) - 2.2029
  grade.round(2)
end
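
A sketch of the arithmetic from precomputed values. `psk_from_values` is a hypothetical helper. The formula is usually stated with the number of syllables per 100 words, whereas the listing passes the total syllable count of the text, so the two should agree only for samples of about 100 words.

```ruby
# Hypothetical helper (not part of TextStat): Powers-Sumner-Kearl
# from an average sentence length and a syllable count.
def psk_from_values(avg_sentence_length, syllables)
  ((0.0778 * avg_sentence_length) + (0.0455 * syllables) - 2.2029).round(2)
end

# 10 words per sentence and 130 syllables: 0.778 + 5.915 - 2.2029
puts psk_from_values(10.0, 130) # prints 4.49
```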

#smog_index(text, language = 'en_us') ⇒ Float

Calculate SMOG Index (Simple Measure of Gobbledygook)

SMOG estimates the years of education needed to understand a text. It focuses on polysyllabic words and is particularly useful for health and educational materials.

Examples:

TextStat.smog_index("The quick brown fox jumps. It is fast. Very agile.")  # => 8.2

Parameters:

  • text (String)

    the text to analyze (returns 0.0 for fewer than 3 sentences)

  • language (String) (defaults to: 'en_us')

    language code for syllable counting

Returns:

  • (Float)

    SMOG grade level

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 77

def smog_index(text, language = 'en_us')
  sentences = sentence_count(text)

  if sentences >= 3
    polysyllab = polysyllab_count(text, language)
    smog = (1.043 * Math.sqrt((30.0 * polysyllab) / sentences)) + 3.1291
    smog.round(1)
  else
    0.0
  end
rescue ZeroDivisionError
  0.0
end
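
The calculation can be sketched from raw counts as follows. `smog_from_counts` is a hypothetical helper; counting polysyllabic words is left to the caller.

```ruby
# Hypothetical helper (not part of TextStat): SMOG from raw counts.
def smog_from_counts(polysyllables, sentences)
  # Like the listing, return 0.0 for samples shorter than 3 sentences.
  return 0.0 if sentences < 3

  # Scale the polysyllable count to a 30-sentence sample.
  scaled = 30.0 * polysyllables / sentences
  ((1.043 * Math.sqrt(scaled)) + 3.1291).round(1)
end

puts smog_from_counts(12, 30) # 1.043 * sqrt(12) + 3.1291, prints 6.7
puts smog_from_counts(0, 3)   # no polysyllables, prints 3.1
puts smog_from_counts(5, 2)   # too few sentences, prints 0.0
```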

#spache(text, language = 'en_us') ⇒ Float

Calculate SPACHE Readability Formula

The SPACHE formula is designed for primary-grade reading materials (grades 1-4) and uses a list of familiar words for analysis.

Examples:

TextStat.spache("Primary school reading text.")  # => 2.8

Parameters:

  • text (String)

    the text to analyze

  • language (String) (defaults to: 'en_us')

    language code for dictionary selection

Returns:

  • (Float)

    SPACHE grade level

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 264

def spache(text, language = 'en_us')
  words = text.split.count
  unfamiliar_words = difficult_words(text, language) / words
  grade = (0.141 * avg_sentence_length(text)) + (0.086 * unfamiliar_words) + 0.839
  grade.round(2)
end
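
For comparison, here is a sketch of the formula in its commonly published form, which uses the percentage of unfamiliar words. `spache_from_counts` is a hypothetical helper and deliberately differs from the listing, which divides the two counts directly. If both operands there are Integers, Ruby truncates the quotient (for example, `3 / 100` is `0`).

```ruby
# Hypothetical helper (not part of TextStat): Spache using a Float
# percentage of unfamiliar words. This is an assumption about the
# intended formula, not the library's behavior.
def spache_from_counts(words, unfamiliar, sentences)
  return 0.0 if words.zero? || sentences.zero?

  pct_unfamiliar = 100.0 * unfamiliar / words
  avg_sentence_length = words.to_f / sentences
  ((0.141 * avg_sentence_length) + (0.086 * pct_unfamiliar) + 0.839).round(2)
end

# 100 words, 10 unfamiliar, 10 sentences: 1.41 + 0.86 + 0.839
puts spache_from_counts(100, 10, 10) # prints 3.11
puts 3 / 100                         # Integer division, prints 0
```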

#text_standard(text, float_output = nil) ⇒ String, Float

Calculate consensus text standard from multiple formulas

This method combines results from multiple readability formulas to provide a consensus grade level recommendation. Because it aggregates several estimates, the result is less sensitive to the quirks of any single formula.

Examples:

TextStat.text_standard("Sample text for analysis.")        # => "5th and 6th grade"
TextStat.text_standard("Sample text for analysis.", true)  # => 5.0

Parameters:

  • text (String)

    the text to analyze

  • float_output (Boolean) (defaults to: nil)

    whether to return the numeric grade (true) instead of the text description

Returns:

  • (String, Float)

    grade level description or numeric value

Since:

  • 1.0.0



# File 'lib/textstat/readability_formulas.rb', line 283

def text_standard(text, float_output = nil)
  grade = []

  # Collect grades from all formulas
  add_flesch_kincaid_grades(text, grade)
  add_flesch_reading_ease_grade(text, grade)
  add_other_readability_grades(text, grade)

  # Find consensus grade
  final_grade = calculate_consensus_grade(grade)

  format_grade_output(final_grade, float_output)
end
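
The helper methods called in the listing are not shown on this page. The sketch below illustrates one plausible way to reach a consensus, assuming it means the most frequent whole-number grade. `consensus_grade`, `ordinal`, and `describe_grade` are hypothetical names, and the library's actual helpers may work differently. `Enumerable#tally` requires Ruby 2.7 or later.

```ruby
# Hypothetical sketch (not the library's helpers): consensus as the
# most frequent grade among the collected estimates.
def consensus_grade(grades)
  return nil if grades.empty?

  # tally counts occurrences; on a tie the first grade seen wins.
  grades.tally.max_by { |_grade, count| count }.first
end

def ordinal(number)
  return "#{number}th" if (11..13).cover?(number % 100)

  suffix = { 1 => 'st', 2 => 'nd', 3 => 'rd' }.fetch(number % 10, 'th')
  "#{number}#{suffix}"
end

def describe_grade(grade)
  "#{ordinal(grade)} and #{ordinal(grade + 1)} grade"
end

grade = consensus_grade([5, 6, 5, 7, 5, 6])
puts grade                 # prints 5
puts describe_grade(grade) # prints 5th and 6th grade
```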