Class: Obfuscator::Naturalizer

Inherits:
Object
Includes:
Constants, Internal::RNG
Defined in:
lib/obfuscator/naturalizer.rb

Overview

Naturalizes words by applying a fixed set of linguistic rules that make them more readable and natural-looking while preserving their overall structure.

The naturalizer applies several rules to improve readability:

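  1. No soft/hard signs (ь/ъ) after Latin letters (the sign is replaced with a random Russian consonant)

  2. No щ after w or th (replaced with another Russian consonant)

  3. No й after consonants (replaced with another Russian consonant)

  4. No triple consonants (the middle consonant is replaced with a vowel from the matching alphabet)

  5. No impossible letter combinations (the second letter is replaced with a consonant from the matching alphabet)

  6. No double vowels (the second vowel is replaced with a consonant from the matching alphabet)

  7. No ё, ю, я after consonants (replaced with another Russian vowel)

  8. A language-specific ending is applied to words longer than four characters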
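As a rough illustration of how a rule of this kind can work, the sketch below isolates Rule 4 (triple consonants) in a standalone module. The module name, the method name break_triple_consonants, and the letter lists are hypothetical stand-ins and not the gem's own Constants; the real implementation lives inside #naturalize and may behave differently in edge cases.

```ruby
module RuleFourSketch
  # Hypothetical stand-ins for the gem's constants; the real lists may differ.
  RU_VOWELS = %w[а е и о у].freeze
  EN_VOWELS = %w[a e i o u].freeze
  RU_CONSONANTS = %w[б в г д ж з к л м н п р с т ф х ц ч ш щ].freeze
  EN_CONSONANTS = %w[b c d f g h j k l m n p q r s t v w x z].freeze

  module_function

  # True when the character is in one of the sample consonant lists.
  def consonant?(char)
    (RU_CONSONANTS + EN_CONSONANTS).include?(char.downcase)
  end

  # True when the character belongs to the Cyrillic script.
  def cyrillic?(char)
    char.match?(/\p{Cyrillic}/)
  end

  # When three consonants appear in a row, replace the middle one
  # with a vowel drawn from the alphabet of that middle character.
  def break_triple_consonants(word, rng: Random.new)
    chars = word.chars
    (0..chars.length - 3).each do |i|
      next unless chars[i, 3].all? { |c| consonant?(c) }

      pool = cyrillic?(chars[i + 1]) ? RU_VOWELS : EN_VOWELS
      chars[i + 1] = pool.sample(random: rng)
    end
    chars.join
  end
end

# Prints "s", one English vowel, then "rk"; the vowel depends on the seed.
puts RuleFourSketch.break_triple_consonants('strk', rng: Random.new(1))
```

Passing the generator in as a keyword argument keeps the sketch deterministic under a fixed seed, which is the same property the class offers through its seed argument.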

Examples:

Basic usage

naturalizer = Naturalizer.new
naturalizer.naturalize("Thщит") # => e.g. "Thкит" (replacement letters are chosen at random)

With seed for reproducible results

naturalizer = Naturalizer.new(12345)
naturalizer.naturalize("Thщит") # => Same result for same seed
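Why a seed makes the output repeatable can be shown with Ruby's standard Random class alone. This is a minimal sketch, not the gem's code: the SeedDemo module, its draws method, and the consonant pool are all hypothetical.

```ruby
module SeedDemo
  # Hypothetical consonant pool; the gem's RUSSIAN_CONSONANTS may differ.
  POOL = %w[б в г д к л м н п р с т].freeze

  # Draws `count` replacement letters from a generator seeded with `seed`.
  def self.draws(seed, count = 5)
    rng = Random.new(seed)
    Array.new(count) { POOL.sample(random: rng) }
  end
end

first  = SeedDemo.draws(12345)
second = SeedDemo.draws(12345)
puts first == second # prints "true": same seed, same sequence of picks
```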


Constant Summary

Constants included from Constants

Constants::ENGLISH_CONSONANTS, Constants::ENGLISH_ENDINGS, Constants::ENGLISH_VOWELS, Constants::IMPOSSIBLE_COMBINATIONS, Constants::RUSSIAN_CONSONANTS, Constants::RUSSIAN_ENDINGS, Constants::RUSSIAN_VOWELS


Constructor Details

#initialize(seed = nil) ⇒ Naturalizer



# File 'lib/obfuscator/naturalizer.rb', line 35

def initialize(seed = nil)
  @seed = seed # Store the seed
  setup_rng(seed)
end
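The constructor relies on setup_rng from the included Internal::RNG module, whose source is not shown on this page. The sketch below is one plausible shape for such a mixin, assuming it wraps Ruby's Random; SketchRNG, SketchNaturalizer, and pick are hypothetical names used only for illustration.

```ruby
# Hypothetical mixin offering setup_rng and random_sample.
module SketchRNG
  # Creates a fresh generator; a nil seed falls back to a random one.
  def setup_rng(seed = nil)
    @rng = seed.nil? ? Random.new : Random.new(seed)
  end

  # Picks one element from the collection using the instance's generator.
  def random_sample(collection)
    collection.to_a.sample(random: @rng)
  end
end

class SketchNaturalizer
  include SketchRNG

  def initialize(seed = nil)
    @seed = seed # kept so the generator can be reset later
    setup_rng(seed)
  end

  # Mirrors the reseed-per-call pattern used by #naturalize.
  def pick(collection)
    setup_rng(@seed) if @seed
    random_sample(collection)
  end
end

n = SketchNaturalizer.new(42)
vowels = %w[a e i o u]
puts n.pick(vowels) == n.pick(vowels) # prints "true": the generator is reset before each call
```

Storing the seed on the instance is what allows the generator to be reset on every call, so repeated calls with the same input give the same result.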

Instance Method Details

#naturalize(word) ⇒ Object

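Applies Rules 1 to 7 to each adjacent pair of characters, then applies Rule 8 (a language-specific ending) if the result is longer than four characters. Words shorter than two characters are returned unchanged. If a seed was given to the constructor, the RNG is reset before each call.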



# File 'lib/obfuscator/naturalizer.rb', line 41

def naturalize(word)
  # Reset RNG state before each naturalization if seed was provided
  setup_rng(@seed) if @seed

  return word unless word.respond_to?(:to_s)
  return word if word.length < 2

  begin
    chars = word.chars
    result = []

    chars.each_with_index do |char, i|
      next_char = chars[i + 1]

      if next_char.nil?
        result << char
        next
      end

      # Rule 1: No ь/ъ after Latin letters
      soft_hard_signs = %w[ь ъ]
      if latin?(char) && soft_hard_signs.include?(next_char)
        chars[i + 1] = random_sample(RUSSIAN_CONSONANTS.reject { |c| soft_hard_signs.include?(c) })
      end

      # Rule 2: No щ after w/th
      if (char == 'w' || (i.positive? && chars[i - 1] == 't' && char == 'h')) && next_char == 'щ'
        chars[i + 1] = random_sample(RUSSIAN_CONSONANTS - ['щ'])
      end

      # Rule 3: No й after consonants
      chars[i + 1] = random_sample(RUSSIAN_CONSONANTS - ['й']) if consonant?(char) && next_char == 'й'

      # Rule 4: No triple consonants
      if i < chars.length - 2 &&
         consonant?(char) &&
         consonant?(next_char) &&
         consonant?(chars[i + 2])
        chars[i + 1] = if cyrillic?(next_char)
                         random_sample(RUSSIAN_VOWELS)
                       else
                         random_sample(ENGLISH_VOWELS)
                       end
      end

      # Rule 5: Handle impossible combinations
      current_pair = char + next_char
      if IMPOSSIBLE_COMBINATIONS.any? { |combo| current_pair.include?(combo) }
        chars[i + 1] = if cyrillic?(next_char)
                         random_sample(RUSSIAN_CONSONANTS)
                       else
                         random_sample(ENGLISH_CONSONANTS)
                       end
      end

      # Rule 6: No double vowels
      if vowel?(char) && vowel?(next_char)
        chars[i + 1] = if cyrillic?(next_char)
                         random_sample(RUSSIAN_CONSONANTS)
                       else
                         random_sample(ENGLISH_CONSONANTS)
                       end
      end

      # Rule 7: Handle ё, ю, я after consonants
      # This rule is a special case of Rule 5
      soft_vowels = %w[ё ю я]
      if consonant?(char) && soft_vowels.include?(next_char)
        chars[i + 1] = random_sample(RUSSIAN_VOWELS - soft_vowels)
      end

      result << char
    rescue StandardError => e
      raise Error, "Naturalization error for '#{word}': #{e.message}"
    end
  end

  # Rule 8: Apply appropriate ending if word is long enough
  final_word = result.join
  if final_word.length > 4
    if mostly_russian?(final_word)
      apply_russian_ending(final_word)
    elsif mostly_english?(final_word)
      apply_english_ending(final_word)
    else
      final_word
    end
  else
    final_word
  end
end
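The method above calls several predicates (latin?, cyrillic?, mostly_russian?, mostly_english?) that are defined elsewhere in the class and not shown on this page. The following is a guess at how such helpers could be written with Unicode script properties; the ScriptSketch module and the "more than half of the characters" threshold are assumptions, not the gem's actual definitions.

```ruby
module ScriptSketch
  module_function

  # True when the character belongs to the Cyrillic script.
  def cyrillic?(char)
    char.match?(/\p{Cyrillic}/)
  end

  # True when the character belongs to the Latin script.
  def latin?(char)
    char.match?(/\p{Latin}/)
  end

  # Assumed threshold: more than half of the characters are Cyrillic.
  def mostly_russian?(word)
    word.chars.count { |c| cyrillic?(c) } > word.length / 2.0
  end

  # Assumed threshold: more than half of the characters are Latin.
  def mostly_english?(word)
    word.chars.count { |c| latin?(c) } > word.length / 2.0
  end
end

puts ScriptSketch.mostly_russian?('Thкит') # prints "true": 3 of 5 characters are Cyrillic
puts ScriptSketch.mostly_english?('Thкит') # prints "false": 2 of 5 characters are Latin
```

Under this assumed threshold a word split evenly between the two scripts matches neither predicate, which corresponds to the final else branch of Rule 8 where the word is returned without an ending.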