Class: Spurline::Security::InjectionScanner

Inherits:
Object
  • Object
show all
Defined in:
lib/spurline/security/injection_scanner.rb

Overview

Scans Content objects for prompt injection patterns. Configurable strictness: :strict, :moderate, :permissive.

Only scans content at trust levels that could be injected (:user, :external, :untrusted). System and operator content is trusted by definition and bypasses scanning.

Pattern tiers are additive: :strict includes all :moderate patterns, :moderate includes all :permissive (BASE) patterns.

Constant Summary collapse

SKIP_TRUST_LEVELS =
i[system operator].freeze
BASE_PATTERNS =

Patterns checked at all strictness levels — the most obvious injection attempts.

[
  /ignore\s+(all\s+)?(previous|prior|above|earlier)\s+(instructions|prompts|context|rules)/i,
  /you\s+are\s+now\s+(a|an|in)\s+/i,
  /\bsystem\s*:\s*\n/i,
  /\bforget\s+(all\s+|everything\s+)?(previous|prior|your)\s+(instructions|context|rules|training)/i,
  /\bdisregard\s+(all\s+)?(previous|prior|above|your)\s+(instructions|prompts|rules)/i,
  /\bnew\s+instructions\s*:/i,
  /\bpretend\s+(you\s+are|to\s+be|that\s+you)/i,
].freeze
MODERATE_PATTERNS =

Additional patterns for :moderate and :strict — social engineering and role manipulation.

[
  /\bdo\s+not\s+follow\b/i,
  /\boverride\s+(your|the)\s+(instructions|rules|guidelines|programming)\b/i,
  /\bact\s+as\s+(if\s+you\s+are|though\s+you|a\b)/i,
  /\bbehave\s+as\s+(if|though|a\b)/i,
  /\bfrom\s+now\s+on\s*,?\s*(you|your|act|behave|respond|ignore)/i,
  /\bjailbreak/i,
  /\bdeveloper\s+mode\b/i,
  /\bDAN\s+(mode|prompt)\b/i,
  /\bdo\s+anything\s+now\b/i,
  /\bunfiltered\s+(mode|response|output)\b/i,
  /\bno\s+(restrictions|rules|limitations|filters|censorship)\b/i,
  /\bbypass\s+(your|the|any|all)\s+(restrictions|rules|filters|safety|guidelines)/i,
].freeze
STRICT_PATTERNS =

Additional patterns for :strict only — structural attacks and format manipulation.

[
  /\brole\s*:\s*(system|assistant)\b/i,
  /<\/?system>/i,
  /\[INST\]/i,
  /<<\s*SYS\s*>>/i,
  /<\|im_start\|>/i,
  /\bIMPORTANT\s*:\s*(new|override|ignore|forget|disregard|update)/i,
  /\bATTENTION\s*:\s*(new|override|ignore|forget|disregard|update)/i,
  /\b(BEGIN|END)\s+(SYSTEM|INSTRUCTION|PROMPT)\b/i,
  /---+\s*\n\s*(system|instruction|new prompt|override)/i,
  /\bbase64\s*[\s:]+[A-Za-z0-9+\/=]{20,}/i,
  /\btranslate\s+(the\s+)?(following|this)\s+(from|to)\s+.*\s+(and|then)\s+(ignore|forget|override)/i,
  /\brepeat\s+(the\s+)?(system\s+prompt|instructions|your\s+rules)/i,
  /\b(reveal|show|display|output|print)\s+(your|the)\s+(system\s+prompt|instructions|rules)/i,
].freeze
LEVELS =
i[strict moderate permissive].freeze

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(level: :strict) ⇒ InjectionScanner

Returns a new instance of InjectionScanner.



64
65
66
67
# File 'lib/spurline/security/injection_scanner.rb', line 64

def initialize(level: :strict)
  validate_level!(level)
  @level = level
end

Instance Attribute Details

#levelObject (readonly)

Returns the value of attribute level.



62
63
64
# File 'lib/spurline/security/injection_scanner.rb', line 62

def level
  @level
end

Instance Method Details

#scan!(content) ⇒ Object

Scans a Content object for injection patterns. Returns nil if clean, raises InjectionAttemptError if detected.



71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
# File 'lib/spurline/security/injection_scanner.rb', line 71

def scan!(content)
  return if SKIP_TRUST_LEVELS.include?(content.trust)

  text = content.text
  patterns_for_level.each do |pattern|
    next unless text.match?(pattern)

    raise Spurline::InjectionAttemptError,
      "Injection pattern detected in content (trust: #{content.trust}, " \
      "source: #{content.source}). Pattern: #{pattern.source[0..40]}. " \
      "Review the content or adjust injection_filter level."
  end

  nil
end