Module: TextStat::ReadabilityFormulas

Included in: Main

Defined in: lib/textstat/readability_formulas.rb

Overview

Readability formulas and text difficulty calculations

This module implements various readability formulas used to determine the reading level and complexity of text. Each formula uses different metrics and is suitable for different types of content and audiences.

Examples:

Basic readability analysis

text = "This is a sample text for readability analysis."
TextStat.flesch_reading_ease(text)      # => 83.32
TextStat.flesch_kincaid_grade(text)     # => 3.7
TextStat.text_standard(text)            # => "3rd and 4th grade"

Multi-language support

TextStat.flesch_reading_ease(spanish_text, 'es')
TextStat.smog_index(french_text, 'fr')
TextStat.gunning_fog(german_text, 'de')

Author:

Jakub Polak

Since:

1.0.0

Instance Method Summary

#automated_readability_index(text) ⇒ Float

Calculate Automated Readability Index (ARI).

#coleman_liau_index(text) ⇒ Float

Calculate Coleman-Liau Index.

#dale_chall_readability_score(text, language = 'en_us') ⇒ Float

Calculate Dale-Chall Readability Score.

#flesch_kincaid_grade(text, language = 'en_us') ⇒ Float

Calculate Flesch-Kincaid Grade Level.

#flesch_reading_ease(text, language = 'en_us') ⇒ Float

Calculate Flesch Reading Ease score.

#forcast(text, language = 'en_us') ⇒ Integer

Calculate FORCAST Readability Formula.

#gunning_fog(text, language = 'en_us') ⇒ Float

Calculate Gunning Fog Index.

#linsear_write_formula(text, language = 'en_us') ⇒ Float

Calculate Linsear Write Formula.

#lix(text) ⇒ Float

Calculate LIX Readability Formula.

#powers_sumner_kearl(text, language = 'en_us') ⇒ Float

Calculate Powers-Sumner-Kearl Readability Formula.

#smog_index(text, language = 'en_us') ⇒ Float

Calculate SMOG Index (Simple Measure of Gobbledygook).

#spache(text, language = 'en_us') ⇒ Float

Calculate SPACHE Readability Formula.

#text_standard(text, float_output = nil) ⇒ String, Float

Calculate consensus text standard from multiple formulas.

Instance Method Details

#automated_readability_index(text) ⇒ `Float`

Calculate Automated Readability Index (ARI)

ARI estimates readability from the average number of characters per word and the average number of words per sentence. Because it needs no syllable counts, it’s easy to compute programmatically.

Examples:

TextStat.automated_readability_index("This text is easy to read.")  # => 2.9

Parameters:

text (String) —

the text to analyze

Returns:

(Float) —

ARI grade level

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 117

def automated_readability_index(text)
  chars = char_count(text)
  words = lexicon_count(text)
  sentences = sentence_count(text)

  a = chars.to_f / words
  b = words.to_f / sentences
  readability = (4.71 * a) + (0.5 * b) - 21.43
  readability.round(1)
rescue ZeroDivisionError
  0.0
end
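The arithmetic can be tried in isolation. The sketch below is a minimal illustration that assumes the three counts are already known. `ari_from_counts` is a hypothetical helper, not part of TextStat, and it guards against empty input explicitly.

```ruby
# Hypothetical standalone helper (not part of TextStat): ARI from raw counts.
def ari_from_counts(chars, words, sentences)
  # Guard explicitly so empty input yields 0.0
  return 0.0 if words.zero? || sentences.zero?

  chars_per_word = chars.to_f / words
  words_per_sentence = words.to_f / sentences
  ((4.71 * chars_per_word) + (0.5 * words_per_sentence) - 21.43).round(1)
end

# 120 characters, 24 words, 2 sentences:
# (4.71 * 5.0) + (0.5 * 12.0) - 21.43 = 8.12
puts ari_from_counts(120, 24, 2)  # prints 8.1
```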

#coleman_liau_index(text) ⇒ `Float`

Calculate Coleman-Liau Index

This formula relies on character counts instead of syllable counts, making it more suitable for automated analysis. It estimates the U.S. grade level required to understand the text.

Examples:

TextStat.coleman_liau_index("Short words are easy to read.")  # => 4.71

Parameters:

text (String) —

the text to analyze

Returns:

(Float) —

Coleman-Liau grade level

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 101

def coleman_liau_index(text)
  letters = (avg_letter_per_word(text) * 100).round(2)
  sentences = (avg_sentence_per_word(text) * 100).round(2)
  coleman = (0.0588 * letters) - (0.296 * sentences) - 15.8
  coleman.round(2)
end
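As a rough illustration, the same calculation can be written against plain counts. The sketch assumes letters, words, and sentences have already been counted; `coleman_liau_from_counts` is a hypothetical name and does not exist in TextStat.

```ruby
# Hypothetical standalone helper (not part of TextStat).
def coleman_liau_from_counts(letters, words, sentences)
  return 0.0 if words.zero?

  letters_per_100 = 100.0 * letters / words      # letters per 100 words
  sentences_per_100 = 100.0 * sentences / words  # sentences per 100 words
  ((0.0588 * letters_per_100) - (0.296 * sentences_per_100) - 15.8).round(2)
end

# 450 letters, 100 words, 5 sentences:
# (0.0588 * 450) - (0.296 * 5) - 15.8 = 9.18
puts coleman_liau_from_counts(450, 100, 5)  # prints 9.18
```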

#dale_chall_readability_score(text, language = 'en_us') ⇒ `Float`

Calculate Dale-Chall Readability Score

This formula uses a list of 3000 familiar words to determine text difficulty. It’s particularly effective for elementary and middle school texts.

Examples:

TextStat.dale_chall_readability_score("Simple story for children.")  # => 5.12

Parameters:

text (String) —

the text to analyze
language (String) (defaults to: 'en_us') —

language code for dictionary selection

Returns:

(Float) —

Dale-Chall readability score

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 169

def dale_chall_readability_score(text, language = 'en_us')
  word_count = lexicon_count(text)
  count = word_count - difficult_words(text, language)

  per = (100.0 * count) / word_count
  difficult_words_percentage = 100 - per
  score = (0.1579 * difficult_words_percentage) + (0.0496 * avg_sentence_length(text))
  score += 3.6365 if difficult_words_percentage > 5

  score.round(2)
rescue ZeroDivisionError
  0.0
end
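The 5% threshold is the part of this formula that is easiest to overlook. The following sketch, which assumes the word counts and average sentence length are supplied by the caller, shows the score with and without the adjustment. `dale_chall_from_counts` is a hypothetical helper, not a TextStat method.

```ruby
# Hypothetical standalone helper (not part of TextStat).
def dale_chall_from_counts(words, difficult, avg_sentence_length)
  return 0.0 if words.zero?

  difficult_pct = 100.0 * difficult / words
  score = (0.1579 * difficult_pct) + (0.0496 * avg_sentence_length)
  # The adjustment applies only when more than 5% of the words are unfamiliar
  score += 3.6365 if difficult_pct > 5
  score.round(2)
end

puts dale_chall_from_counts(100, 4, 20)   # prints 1.62 (no adjustment)
puts dale_chall_from_counts(100, 10, 20)  # prints 6.21 (adjustment applied)
```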

#flesch_kincaid_grade(text, language = 'en_us') ⇒ `Float`

Calculate Flesch-Kincaid Grade Level

This formula uses the same inputs as Flesch Reading Ease (average sentence length and average syllables per word) but weights them to produce a U.S. grade level, making it easier to see the education level required to comprehend the text.

Examples:

TextStat.flesch_kincaid_grade("Simple text.")      # => 2.1
TextStat.flesch_kincaid_grade("Complex analysis.") # => 5.8

Parameters:

text (String) —

the text to analyze
language (String) (defaults to: 'en_us') —

language code for syllable counting

Returns:

(Float) —

grade level (e.g., 8.5 = 8th to 9th grade)

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 59

def flesch_kincaid_grade(text, language = 'en_us')
  sentence_length = avg_sentence_length(text)
  syllables_per_word = avg_syllables_per_word(text, language)
  flesch = (0.39 * sentence_length) + (11.8 * syllables_per_word) - 15.59
  flesch.round(1)
end
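Both Flesch scores in this module are computed from the same two averages, only with different weights. The sketch below applies both sets of weights to precomputed averages so the relationship is visible side by side. `flesch_pair` is a hypothetical helper introduced for illustration.

```ruby
# Hypothetical standalone helper (not part of TextStat).
def flesch_pair(avg_sentence_length, syllables_per_word)
  ease = 206.835 - (1.015 * avg_sentence_length) - (84.6 * syllables_per_word)
  grade = (0.39 * avg_sentence_length) + (11.8 * syllables_per_word) - 15.59
  { ease: ease.round(2), grade: grade.round(1) }
end

# 15 words per sentence, 1.5 syllables per word
result = flesch_pair(15, 1.5)
puts result[:ease]   # prints 64.71
puts result[:grade]  # prints 8.0
```

Longer sentences or more syllables per word lower the ease score and raise the grade.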

#flesch_reading_ease(text, language = 'en_us') ⇒ `Float`

Calculate Flesch Reading Ease score

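The Flesch Reading Ease formula produces a score that usually falls between 0 and 100, with higher scores indicating easier readability. The result is not clamped, so very simple text can score above 100 and very dense text below 0.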

Score ranges:

90-100: Very Easy
80-89: Easy
70-79: Fairly Easy
60-69: Standard
50-59: Fairly Difficult
30-49: Difficult
0-29: Very Difficult
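The ranges above can be turned into a small lookup. This is only a sketch: `flesch_label` is a hypothetical helper, not part of TextStat, and it assumes that fractional scores belong to the band whose lower bound they reach and that scores outside 0 to 100 fall into the nearest band.

```ruby
# Hypothetical standalone helper (not part of TextStat).
def flesch_label(score)
  # Lower bound of each band, highest first
  bands = [
    [90, 'Very Easy'],
    [80, 'Easy'],
    [70, 'Fairly Easy'],
    [60, 'Standard'],
    [50, 'Fairly Difficult'],
    [30, 'Difficult']
  ]
  match = bands.find { |floor, _label| score >= floor }
  match ? match.last : 'Very Difficult'
end

puts flesch_label(116.15)  # prints Very Easy
puts flesch_label(43.73)   # prints Difficult
```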

Examples:

TextStat.flesch_reading_ease("The cat sat on the mat.")  # => 116.15
TextStat.flesch_reading_ease("Comprehensive analysis.")  # => 43.73

Parameters:

text (String) —

the text to analyze
language (String) (defaults to: 'en_us') —

language code for syllable counting

Returns:

(Float) —

Flesch Reading Ease score

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 41

def flesch_reading_ease(text, language = 'en_us')
  sentence_length = avg_sentence_length(text)
  syllables_per_word = avg_syllables_per_word(text, language)
  flesch = 206.835 - (1.015 * sentence_length) - (84.6 * syllables_per_word)
  flesch.round(2)
end

#forcast(text, language = 'en_us') ⇒ `Integer`

Calculate FORCAST Readability Formula

FORCAST, named after its developers Ford, Caylor, and Sticht, is designed for technical materials. It counts the single-syllable words in a 150-word sample to determine readability.

Examples:

TextStat.forcast("Technical manual instructions.")  # => 11

Parameters:

text (String) —

the text to analyze (uses first 150 words)
language (String) (defaults to: 'en_us') —

language code for syllable counting

Returns:

(Integer) —

FORCAST grade level

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 231

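def forcast(text, language = 'en_us')
  # FORCAST is defined on a 150-word sample
  words = text.split[0..149]
  words_with_one_syllable = words.count do |word|
    syllable_count(word, language) == 1
  end
  # Integer division keeps the result an Integer, matching the documented return type
  20 - (words_with_one_syllable / 10)
end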
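Because the division is between integers, the count of one-syllable words is effectively rounded down to a multiple of ten. The sketch below illustrates this with precomputed per-word syllable counts instead of a real syllable counter. `forcast_from_syllables` is a hypothetical helper.

```ruby
# Hypothetical standalone helper (not part of TextStat).
# Takes one syllable count per word instead of raw text.
def forcast_from_syllables(syllable_counts)
  sample = syllable_counts.first(150)         # only the first 150 words count
  one_syllable = sample.count { |n| n == 1 }
  20 - (one_syllable / 10)                    # Integer division truncates
end

# 105 one-syllable words and 45 two-syllable words: 105 / 10 => 10
counts = ([1] * 105) + ([2] * 45)
puts forcast_from_syllables(counts)  # prints 10
```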

#gunning_fog(text, language = 'en_us') ⇒ `Float`

Calculate Gunning Fog Index

The Fog Index estimates the years of formal education needed to understand the text. It focuses on sentence length and polysyllabic words.

Examples:

TextStat.gunning_fog("Business communication analysis.")  # => 12.3

Parameters:

text (String) —

the text to analyze
language (String) (defaults to: 'en_us') —

language code for syllable counting

Returns:

(Float) —

Gunning Fog grade level

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 193

def gunning_fog(text, language = 'en_us')
  per_diff_words = ((100.0 * difficult_words(text, language)) / lexicon_count(text)) + 5
  grade = 0.4 * (avg_sentence_length(text) + per_diff_words)
  grade.round(2)
rescue ZeroDivisionError
  0.0
end

#linsear_write_formula(text, language = 'en_us') ⇒ `Float`

Calculate Linsear Write Formula

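This formula is designed for technical writing. It samples the opening words of the text, scores each word with fewer than three syllables as 1 and every other word as 3, then divides the total by the number of sentences in the sample.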

Examples:

TextStat.linsear_write_formula("Technical documentation analysis.")  # => 6.5

Parameters:

text (String) —

the text to analyze
language (String) (defaults to: 'en_us') —

language code for syllable counting

Returns:

(Float) —

Linsear Write grade level

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 140

def linsear_write_formula(text, language = 'en_us')
  easy_word = 0
  difficult_word = 0
  # The formula is defined on a 100-word sample (indices 0..99)
  text_list = text.split[0..99]

  text_list.each do |word|
    if syllable_count(word, language) < 3
      easy_word += 1
    else
      difficult_word += 1
    end
  end

  # Count sentences in the sample only, not in the full text
  sample = text_list.join(' ')
  number = ((easy_word * 1) + (difficult_word * 3)).to_f / sentence_count(sample)
  # Scores of 20 or less are reduced by 2 before halving
  number -= 2 if number <= 20
  number / 2
end
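The two branches of the final step are easier to follow with concrete numbers. The sketch below assumes the easy and hard word counts for the sample are already known. `linsear_from_counts` is a hypothetical helper, not part of TextStat.

```ruby
# Hypothetical standalone helper (not part of TextStat).
def linsear_from_counts(easy_words, hard_words, sentences)
  return 0.0 if sentences.zero?

  raw = (easy_words + (3 * hard_words)).to_f / sentences
  raw -= 2 if raw <= 20   # lower scores are reduced before halving
  raw / 2
end

puts linsear_from_counts(90, 10, 8)  # (120 / 8 = 15, then (15 - 2) / 2) prints 6.5
puts linsear_from_counts(80, 20, 5)  # (140 / 5 = 28, then 28 / 2) prints 14.0
```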

#lix(text) ⇒ `Float`

Calculate LIX Readability Formula

LIX (Läsbarhetsindex) is a Swedish readability formula that works well for multiple languages. It adds the average sentence length to the percentage of long words (words longer than six characters).

Examples:

TextStat.lix("International readability measurement.")  # => 45.2

Parameters:

text (String) —

the text to analyze

Returns:

(Float) —

LIX readability score

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 210

def lix(text)
  words = text.split
  words_length = words.length
  long_words = words.count { |word| word.length > 6 }

  per_long_words = (100.0 * long_words) / words_length
  asl = avg_sentence_length(text)
  lix = asl + per_long_words
  lix.round(2)
end
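LIX needs no dictionary or syllable counter, so it can be approximated in a few self-contained lines. The sketch below is an assumption-laden stand-in, not the library's code: `lix_sketch` is a hypothetical name, sentences are counted naively as runs of `.`, `!`, or `?`, and punctuation is stripped before measuring word length.

```ruby
# Hypothetical standalone helper (not part of TextStat).
def lix_sketch(text)
  words = text.split
  return 0.0 if words.empty?

  # Naive sentence count: runs of terminal punctuation, at least 1
  sentences = [text.scan(/[.!?]+/).length, 1].max
  # Long words have more than six letters once punctuation is removed
  long_words = words.count { |w| w.gsub(/[^[:alpha:]]/, '').length > 6 }

  avg_sentence_length = words.length.to_f / sentences
  long_word_pct = 100.0 * long_words / words.length
  (avg_sentence_length + long_word_pct).round(2)
end

# 7 words, 2 sentences, 5 long words: 3.5 + 71.43 = 74.93
puts lix_sketch('Reading is fun. Readability formulas estimate difficulty.')  # prints 74.93
```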

#powers_sumner_kearl(text, language = 'en_us') ⇒ `Float`

Calculate Powers-Sumner-Kearl Readability Formula

This formula was developed for primary-grade reading materials and uses sentence length and syllable count to determine grade level.

Examples:

TextStat.powers_sumner_kearl("Elementary school reading material.")  # => 4.2

Parameters:

text (String) —

the text to analyze
language (String) (defaults to: 'en_us') —

language code for syllable counting

Returns:

(Float) —

Powers-Sumner-Kearl grade level

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 249

def powers_sumner_kearl(text, language = 'en_us')
  grade = (0.0778 * avg_sentence_length(text)) + (0.0455 * syllable_count(text, language)) - 2.2029
  grade.round(2)
end

#smog_index(text, language = 'en_us') ⇒ `Float`

Calculate SMOG Index (Simple Measure of Gobbledygook)

SMOG estimates the years of education needed to understand a text. It focuses on polysyllabic words and is particularly useful for health and educational materials.

Examples:

TextStat.smog_index("The quick brown fox jumps. It is fast. Very agile.")  # => 8.2

Parameters:

text (String) —

the text to analyze (minimum 3 sentences)
language (String) (defaults to: 'en_us') —

language code for syllable counting

Returns:

(Float) —

SMOG grade level

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 77

def smog_index(text, language = 'en_us')
  sentences = sentence_count(text)

  if sentences >= 3
    polysyllab = polysyllab_count(text, language)
    smog = (1.043 * Math.sqrt((30.0 * polysyllab) / sentences)) + 3.1291
    smog.round(1)
  else
    0.0
  end
rescue ZeroDivisionError
  0.0
end
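A quick way to sanity-check the formula is to feed it counts that make the square root exact. The sketch below assumes the polysyllable and sentence counts are supplied directly. `smog_from_counts` is a hypothetical helper, not part of TextStat.

```ruby
# Hypothetical standalone helper (not part of TextStat).
def smog_from_counts(polysyllables, sentences)
  # Too little text for a meaningful estimate
  return 0.0 if sentences < 3

  ((1.043 * Math.sqrt(30.0 * polysyllables / sentences)) + 3.1291).round(1)
end

# 12 polysyllabic words in 10 sentences: sqrt(36) = 6, 1.043 * 6 + 3.1291 = 9.3871
puts smog_from_counts(12, 10)  # prints 9.4
puts smog_from_counts(12, 2)   # prints 0.0
```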

#spache(text, language = 'en_us') ⇒ `Float`

Calculate SPACHE Readability Formula

The SPACHE formula is designed for primary-grade reading materials (grades 1-4) and uses a list of familiar words for analysis.

Examples:

TextStat.spache("Primary school reading text.")  # => 2.8

Parameters:

text (String) —

the text to analyze
language (String) (defaults to: 'en_us') —

language code for dictionary selection

Returns:

(Float) —

SPACHE grade level

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 264

def spache(text, language = 'en_us')
  words = text.split.count
  # Convert to Float first; Integer division would truncate the ratio to 0 or 1
  unfamiliar_words = difficult_words(text, language).to_f / words
  grade = (0.141 * avg_sentence_length(text)) + (0.086 * unfamiliar_words) + 0.839
  grade.round(2)
end

#text_standard(text, float_output = nil) ⇒ `String`, `Float`

Calculate consensus text standard from multiple formulas

This method combines results from multiple readability formulas to provide a consensus grade level recommendation. Because it aggregates several estimates, the result is generally less sensitive to the quirks of any single formula.

Examples:

TextStat.text_standard("Sample text for analysis.")        # => "5th and 6th grade"
TextStat.text_standard("Sample text for analysis.", true)  # => 5.0

Parameters:

text (String) —

the text to analyze
float_output (Boolean) (defaults to: nil) —

whether to return numeric grade or description

Returns:

(String, Float) —

grade level description or numeric value

Since:

1.0.0

# File 'lib/textstat/readability_formulas.rb', line 283

def text_standard(text, float_output = nil)
  grade = []

  # Collect grades from all formulas
  add_flesch_kincaid_grades(text, grade)
  add_flesch_reading_ease_grade(text, grade)
  add_other_readability_grades(text, grade)

  # Find consensus grade
  final_grade = calculate_consensus_grade(grade)

  format_grade_output(final_grade, float_output)
end
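The helper methods called above are not shown in this listing, so how the consensus is actually chosen is not documented here. One plausible approach, assumed purely for illustration, is to round each grade and pick the most frequent value. `consensus_grade` below is a hypothetical stand-in and may differ from the library's `calculate_consensus_grade`.

```ruby
# Hypothetical standalone helper; the library's own consensus logic may differ.
def consensus_grade(grades)
  return nil if grades.empty?

  # Group the rounded grades so each distinct grade collects its votes
  votes = grades.map(&:round).group_by(&:itself)
  # Most votes wins; ties go to the lower grade
  winner, = votes.max_by { |grade, hits| [hits.size, -grade] }
  winner
end

puts consensus_grade([3.7, 4.2, 4.0, 6.1, 3.9])  # prints 4
```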
