TouringTest

AI-Powered Natural Language Testing for Cucumber

Ruby License: MIT

TouringTest is a Ruby gem that integrates Google's Gemini "computer use" AI model with the Cucumber testing framework. You write high-level, natural language test instructions, and an AI agent executes them by analyzing screenshots and performing browser actions via Capybara.

Status: ⚠️ Experimental - relies on Google's preview API (gemini-2.5-computer-use-preview-10-2025)


What is TouringTest?

Traditional Cucumber tests require writing step definitions that use brittle CSS selectors and detailed browser automation logic. TouringTest flips this model:

# Traditional approach
When('I sign up with email and password') do
  visit 
  fill_in 'user[email]', with: '[email protected]'
  fill_in 'user[password]', with: 'password123'
  click_button 'Sign Up'
end

# TouringTest approach
When('the agent {string}') do |instruction|
  computer_use(instruction)
end

# In your feature file:
When the agent "signs up with email '[email protected]' and password 'password123'"

The AI agent:

  1. Takes a screenshot of the current page
  2. Analyzes it to understand the UI layout
  3. Determines what actions to take (click fields, type text, submit forms)
  4. Executes those actions via Capybara
  5. Repeats until the goal is achieved

Benefits:

  • More resilient tests - No brittle CSS selectors that break when markup changes
  • Usability testing - Tests reflect real user interactions
  • Faster test writing - Describe what you want, not how to do it
  • Better readability - Tests read like user stories
  • Self-healing - AI adapts to UI changes automatically

Quick Start

Prerequisites

Installation

# Add to your Gemfile
gem 'touring_test'

# Install
bundle install

# Set your API key
export GEMINI_API_KEY='your_api_key_here'

Minimal Example

# features/support/env.rb
require 'touring_test'
require 'capybara'

Capybara.default_driver = :selenium_chrome_headless
World(TouringTest::WorldExtension)

# features/step_definitions/agent_steps.rb
When('the agent {string}') do |instruction|
  computer_use(instruction)
end

# features/login.feature
Feature: User 
  Scenario: Successful 
    Given I am on the  page
    When the agent "logs in with username 'admin' and password 'secret'"
    Then I should see the dashboard

Example Output

Here's what TouringTest looks like in action:

[Image: TouringTest example output]

The AI agent narrates its actions in real-time, showing:

  • Each step it evaluates
  • The actions it takes (click_at, type_text_at, etc.)
  • Its reasoning about what to do next
  • Final success confirmation

Installation (Detailed)

1. Add the Gem

Add to your Gemfile:

gem 'touring_test'

Or install directly:

gem install touring_test

2. Set Up API Key

Get a Gemini API key from Google AI Studio and set it as an environment variable:

# In your shell or .env file
export GEMINI_API_KEY='your_api_key_here'

3. Configure Cucumber

Add the following to your features/support/env.rb:

require 'touring_test'
require 'capybara'

# Configure your Capybara driver (Selenium, Playwright, etc.)
Capybara.default_driver = :selenium_chrome_headless

# Add TouringTest's WorldExtension to Cucumber
World(TouringTest::WorldExtension)

If you're using Rails, this file is usually generated by rails generate cucumber:install. Create it manually if it doesn't exist yet.


Usage

Basic Usage

The core of TouringTest is the computer_use method, which accepts a natural language instruction:

# In your step definitions
When('the agent {string}') do |instruction|
  computer_use(instruction)
end

Now you can write Cucumber scenarios like:

Scenario: User creates an account
  Given I am on the homepage
  When the agent "clicks on Sign Up and creates an account with email '[email protected]'"
  Then I should see "Welcome!"

Writing Effective Natural Language Instructions

Good instructions:

  • Be specific about the goal: "sign up with email '[email protected]' and password 'password123'"
  • Include exact text when important: "click the blue 'Submit' button"
  • Break complex tasks into steps if needed

Less effective:

  • Too vague: "do the signup thing"
  • Missing critical data: "sign up with some credentials"
  • Overly complex: "navigate through multiple pages and fill out everything"

Available UI Actions

The AI agent can perform these 11 browser actions:

| Action | Description | Example Use |
|--------|-------------|-------------|
| click_at(x:, y:) | Click element at coordinates | Clicking buttons, links |
| type_text_at(x:, y:, text:) | Type in an input field | Filling forms |
| hover_at(x:, y:) | Hover over element | Revealing dropdowns |
| scroll_document(direction:) | Scroll entire page | UP, DOWN, LEFT, RIGHT |
| scroll_at(x:, y:, direction:) | Scroll specific element | Scrollable divs |
| drag_and_drop(start_x:, start_y:, end_x:, end_y:) | Drag element | Reordering lists |
| navigate(url:) | Go to URL | Changing pages |
| go_back() | Browser back button | Navigation |
| go_forward() | Browser forward button | Navigation |
| wait_5_seconds() | Explicit wait | Slow loading |
| key_combination(keys:) | Keyboard shortcuts | "enter", "ctrl+a" |

The agent automatically chooses which actions to use based on its analysis of your instruction and the page screenshot.
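As a rough illustration of how a function call returned by the model could be routed to one of these actions, here is a minimal sketch. It is not the gem's actual Driver code: ALLOWED_ACTIONS, dispatch_action, and RecordingDriver are hypothetical names used only for this example, and only the action names come from the table above.

```ruby
# Hypothetical allowlist; the action names are the ones documented above.
ALLOWED_ACTIONS = %w[
  click_at type_text_at hover_at scroll_document scroll_at drag_and_drop
  navigate go_back go_forward wait_5_seconds key_combination
].freeze

# Route a model-issued function call to a driver object.
# `driver` is any object that implements the actions with keyword arguments.
def dispatch_action(driver, name, args = {})
  unless ALLOWED_ACTIONS.include?(name)
    raise ArgumentError, "Unsupported action: #{name}"
  end

  # Parsed JSON has string keys; Ruby keyword arguments need symbols.
  driver.public_send(name, **args.transform_keys(&:to_sym))
end

# Stand-in driver that records calls instead of touching a browser.
class RecordingDriver
  attr_reader :calls

  def initialize
    @calls = []
  end

  def click_at(x:, y:)
    @calls << [:click_at, x, y]
  end

  def go_back
    @calls << [:go_back]
  end
end

driver = RecordingDriver.new
dispatch_action(driver, "click_at", { "x" => 450, "y" => 280 })
dispatch_action(driver, "go_back")
```

Checking the name against an allowlist before calling public_send keeps a malformed or unexpected model response from invoking arbitrary methods on the driver.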

Configuration Options

# Default usage (root_path defaults to the current directory)
computer_use("sign up with email '[email protected]'")

# Custom root path for screenshots
computer_use(
  "sign up with email '[email protected]'",
  root_path: Rails.root
)

Screenshots & Debugging

TouringTest automatically captures screenshots at each step:

  • Location: {root_path}/tmp/screenshots/
  • Naming: step_1.png, step_2.png, etc.
  • Cleared: At the start of each test run

API Logs:

  • Full request/response JSON logged to tmp/gemini_api_log.jsonl
  • Useful for debugging API issues or understanding agent decisions
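To skim the API log for the actions the agent chose, a small script along these lines may help. It is a sketch under assumptions: it treats every non-blank line of the log as a standalone JSON document and searches it for "functionCall" objects, without relying on any particular log schema. The helper names collect_function_calls and summarize_log are hypothetical.

```ruby
require "json"

# Recursively collect every "functionCall" hash found in a parsed JSON value.
def collect_function_calls(node, found = [])
  case node
  when Hash
    found << node["functionCall"] if node["functionCall"].is_a?(Hash)
    node.each_value { |value| collect_function_calls(value, found) }
  when Array
    node.each { |value| collect_function_calls(value, found) }
  end
  found
end

# Summarize a JSONL log as one "name(args)" string per function call.
# Assumes each non-blank line is a complete JSON document.
def summarize_log(io)
  io.each_line.flat_map do |line|
    next [] if line.strip.empty?

    collect_function_calls(JSON.parse(line)).map do |call|
      args = call.fetch("args", {}).map { |key, value| "#{key}: #{value.inspect}" }
      "#{call['name']}(#{args.join(', ')})"
    end
  end
end

# Usage (log path taken from the section above):
#   File.open("tmp/gemini_api_log.jsonl") { |f| puts summarize_log(f) }
```

Because each request carries the conversation history, the same call can appear on several lines. Calling uniq on the result is one way to condense the output.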

Console Output:

  • Shows each instruction sent to the agent
  • Displays actions taken (e.g., "click_at(x: 450, y: 320)")
  • Reports success or failure

How It Works

Architecture

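TouringTest uses a three-layer architecture: the WorldExtension exposes computer_use to step definitions, the Agent manages the conversation with the Gemini API, and the Driver executes actions through Capybara.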

Conversation Flow

1. Initial Turn:
   User: "sign up with email '[email protected]' and password 'password123'"
   + Base64 screenshot of current page
   + Current URL
   

Coordinate System

Gemini returns normalized coordinates in a 0-1000 range. TouringTest converts these to pixel coordinates:

  • API sends: {x: 500, y: 250} (middle of screen on 1000-unit scale)
  • Driver converts: (500 / 1000.0) * screenshot_width → pixel position

Critical Detail: Coordinates are denormalized using screenshot dimensions, not window size, to handle HiDPI/Retina displays correctly. On a 2x display:

  • Window size: 1512×834
  • Screenshot size: 756×417
  • Agent analyzes the 756×417 screenshot, so coordinates must match those dimensions
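The conversion described above can be sketched as follows. This is an illustration, not the gem's Driver code: the helper names png_dimensions and denormalize are hypothetical, and rounding to the nearest pixel is an assumption. Reading the width and height from the PNG header relies on the standard PNG layout, where the IHDR chunk stores both as big-endian 32-bit integers at byte offsets 16 and 20.

```ruby
# Read [width, height] from raw PNG bytes (IHDR starts right after the
# 8-byte signature, 4-byte chunk length, and 4-byte chunk type).
def png_dimensions(png_bytes)
  png_bytes.byteslice(16, 8).unpack("NN")
end

# Convert a 0-1000 normalized coordinate pair to pixels in screenshot space.
# Rounding to the nearest pixel is an assumption of this sketch.
def denormalize(x, y, screenshot_width:, screenshot_height:)
  [
    (x / 1000.0 * screenshot_width).round,
    (y / 1000.0 * screenshot_height).round
  ]
end

# The example from the text: {x: 500, y: 250} on a 756x417 screenshot.
denormalize(500, 250, screenshot_width: 756, screenshot_height: 417)
# => [378, 104]
```

With a real screenshot, the two helpers combine as `width, height = png_dimensions(File.binread(path))`, followed by a call to denormalize with those values.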

Step Limit

To prevent infinite loops from AI hallucination or impossible tasks:

  • Default: 15 steps maximum
  • Configurable: Pass max_steps to Agent (for advanced usage)
  • Exception raised: If limit exceeded
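The step limit amounts to a bounded loop. A minimal sketch of the idea, assuming each iteration reports whether the goal was reached: StepLimitExceeded and run_with_step_limit are hypothetical names for this example, and the gem's actual exception class and loop structure may differ.

```ruby
# Hypothetical error class for this sketch.
class StepLimitExceeded < StandardError; end

# Yield the step number repeatedly until the block returns :done.
# Returns the number of steps taken, or raises once max_steps is exhausted.
def run_with_step_limit(max_steps: 15)
  1.upto(max_steps) do |step|
    return step if yield(step) == :done
  end
  raise StepLimitExceeded, "Agent exceeded maximum steps (#{max_steps})"
end

# A task that finishes on its third step.
steps_taken = run_with_step_limit { |step| step == 3 ? :done : :continue }
```

In a real agent, the block would capture a screenshot, call the API, execute the returned action, and return :done once the model stops requesting actions.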

Example: Real-World Test

Here's a complete example from the test app included in this gem:

Feature file (features/sign_up.feature):

Feature: Sign up
  Scenario: User signs up with email and password
    Given I am on the sign up page
    When the agent "signs up with the email address '[email protected]' and password 'password123'"
    Then I should be signed in

Step definitions (features/step_definitions/sign_up_steps.rb):

Given('I am on the sign up page') do
  visit 
end

When('the agent {string}') do |instruction|
  computer_use(instruction, root_path: Rails.root)
end

Then('I should be signed in') do
  expect(page).to have_content('Welcome')
end

What the AI does:

  1. Analyzes screenshot of sign-up form
  2. Identifies email input field coordinates
  3. Clicks email field: click_at(x: 450, y: 280)
  4. Types email: type_text_at(x: 450, y: 280, text: "[email protected]")
  5. Identifies password field
  6. Clicks password field: click_at(x: 450, y: 350)
  7. Types password: type_text_at(x: 450, y: 350, text: "password123")
  8. Finds Submit button
  9. Clicks Submit: click_at(x: 500, y: 420)
  10. Mission accomplished!

File Structure & Architecture

TouringTest follows standard Ruby gem conventions:

touring_test/

Core Components

  1. Agent (lib/touring_test/agent.rb)

    • Orchestrates conversation with Gemini API
    • Maintains conversation history during execution
    • Enforces max step limit (default: 15)
    • Logs full API interactions to tmp/gemini_api_log.jsonl
  2. Driver (lib/touring_test/driver.rb)

    • Wraps Capybara session with AI-friendly interface
    • Handles coordinate denormalization (0-1000 → pixels)
    • Executes 11 different UI actions
    • Manages screenshot capture
  3. WorldExtension (lib/touring_test/world_extension.rb)

    • Provides computer_use() method to Cucumber World
    • Bridges step definitions to Agent/Driver
  4. Railtie (lib/touring_test/railtie.rb)

    • Automatic Rails integration
    • Generates support files
    • Zero-config experience

Test App (Non-Standard)

The spec/test_app/ directory contains a complete Rails 7.1.2 application for integration testing. This is unusual for a gem (most use minimal fixtures), but valuable for demonstrating end-to-end functionality.


API & Configuration

Gemini API Requirements

  • Model: gemini-2.5-computer-use-preview-10-2025
  • Endpoint: https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
  • Authentication: API key via query parameter: ?key={GEMINI_API_KEY}
  • Required Tool Specification: { "computer_use": { "environment": "ENVIRONMENT_BROWSER" } }

Environment Variables

  • GEMINI_API_KEY (required): Your Google API key for Gemini access

Get your API key: https://aistudio.google.com/apikey

API Request Format

The Agent sends multi-turn conversations to Gemini:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {"text": "sign up with email '[email protected]'"},
        {"inline_data": {"mime_type": "image/png", "data": "base64..."}},
        {"text": "Current URL: http://localhost:3000/sign_up"}
      ]
    },
    {
      "role": "model",
      "parts": [
        {"functionCall": {"name": "click_at", "args": {"x": 450, "y": 280}}}
      ]
    },
    {
      "role": "user",
      "parts": [
        {"functionResponse": {"name": "click_at", "response": {"success": true}}},
        {"inline_data": {"mime_type": "image/png", "data": "base64..."}}
      ]
    }
  ],
  "tools": [{"computer_use": {"environment": "ENVIRONMENT_BROWSER"}}]
}
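For readers who want to reproduce the first turn of this payload outside the gem, the sketch below builds the same structure from plain Ruby hashes. It is not the Agent's implementation: initial_request_body and endpoint_uri are hypothetical helpers, while the model name, endpoint, and field names are taken from this section. Base64 encoding uses Array#pack with "m0" (strict Base64, no line breaks) to avoid extra dependencies.

```ruby
require "json"
require "uri"

MODEL = "gemini-2.5-computer-use-preview-10-2025"

# Build the first-turn request body: instruction, screenshot, and current URL.
def initial_request_body(instruction, png_bytes, current_url)
  {
    contents: [
      {
        role: "user",
        parts: [
          { text: instruction },
          { inline_data: { mime_type: "image/png", data: [png_bytes].pack("m0") } },
          { text: "Current URL: #{current_url}" }
        ]
      }
    ],
    tools: [{ computer_use: { environment: "ENVIRONMENT_BROWSER" } }]
  }
end

# Build the generateContent endpoint with the API key as a query parameter.
def endpoint_uri(api_key)
  key = URI.encode_www_form_component(api_key)
  URI("https://generativelanguage.googleapis.com/v1beta/models/#{MODEL}:generateContent?key=#{key}")
end

body = initial_request_body("click the button", "fake-png-bytes", "http://localhost:3000/sign_up")
json = JSON.generate(body)
```

The resulting json string could then be sent as the body of a POST request with a Content-Type of application/json, for example via Net::HTTP.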

Development

Running Tests

# Unit tests (RSpec) - default rake task
bundle exec rake
# or
bundle exec rake spec

# Integration tests (Cucumber features in test app)
cd spec/test_app
bundle install
bundle exec cucumber

# Run specific feature
bundle exec cucumber features/sign_up.feature

Interactive Console

# Opens IRB with the gem loaded
bin/console

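# Experiment with the gem (assumes a Capybara driver is already configured)
> require 'touring_test'
> require 'capybara'
> session = Capybara.current_session
> driver = TouringTest::Driver.new(session, root_path: Dir.pwd)
> agent = TouringTest::Agent.new(driver, "click the button")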

Building and Installing Locally

# Build the gem
bundle exec rake build

# Install locally
bundle exec rake install

# Release (requires RubyGems permissions)
bundle exec rake release

Testing the Test App

The spec/test_app/ directory contains a complete Rails application for testing TouringTest end-to-end.

Test App Structure

spec/test_app/

Running the Test App

cd spec/test_app

# Install dependencies
bundle install

# Run Cucumber features
bundle exec cucumber

# Start Rails server (for manual testing)
bundle exec rails server

# Rails console
bundle exec rails console

Test App Configuration

  • Ruby: 3.4.5
  • Rails: 7.1.2
  • Database: SQLite3
  • Capybara Driver: selenium_chrome_headless
  • Database Cleaner: :truncation strategy (for JavaScript tests)

Troubleshooting

Missing API Key

Error: "GEMINI_API_KEY environment variable not set"

Solution:

export GEMINI_API_KEY='your_api_key_here'

Or add to .env file if using dotenv:

GEMINI_API_KEY=your_api_key_here
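To catch a missing key before any scenario runs, a check like the following could go in features/support/env.rb. This is a sketch, not part of the gem: fetch_gemini_api_key is a hypothetical helper, and it reuses the error message quoted above.

```ruby
# Return the API key, or raise if it is missing or blank.
# `env` defaults to ENV but accepts any Hash-like object, which helps in tests.
def fetch_gemini_api_key(env = ENV)
  key = env["GEMINI_API_KEY"].to_s.strip
  raise KeyError, "GEMINI_API_KEY environment variable not set" if key.empty?

  key
end

# Usage in env.rb:
#   fetch_gemini_api_key
```

Failing at load time gives a single clear error instead of one failure per scenario.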

Coordinate Misalignment (Clicks Wrong Location)

Symptom: Agent clicks in wrong places on the page

Cause: HiDPI/Retina display coordinate mismatch

Solution: TouringTest automatically handles this by extracting screenshot dimensions. If issues persist:

  1. Check tmp/screenshots/ to see what the AI sees
  2. Verify Capybara driver supports screenshot capture
  3. Check console output for coordinate denormalization debug info

Max Steps Exceeded

Error: "Agent exceeded maximum steps (15)"

Cause: Task too complex, AI stuck in loop, or impossible task

Solutions:

  1. Break instruction into smaller steps
  2. Make instruction more specific
  3. Check screenshots to see where agent got stuck
  4. For advanced usage, increase max_steps when creating Agent

Screenshot Directory Permission Issues

Error: Can't write to tmp/screenshots/

Solution:

mkdir -p tmp/screenshots
chmod 755 tmp/screenshots

Or specify a different root_path:

computer_use(instruction, root_path: '/path/with/permissions')

Agent Can't Find Elements

Symptom: "Warning: No element found at (x, y)"

Possible causes:

  1. Element not visible (hidden, off-screen)
  2. JavaScript not finished loading
  3. Element inside iframe (not currently supported)

Solutions:

  • Add explicit wait steps: "wait for the page to load, then click submit"
  • Ensure elements are visible: page.execute_script("window.scrollTo(0, 0)")
  • Check screenshots to verify element visibility

Limitations & Known Issues

Experimental API

TouringTest relies on Google's preview API (gemini-2.5-computer-use-preview-10-2025):

  • May change without notice
  • No SLA or production guarantees
  • Rate limits apply

Step Limit Constraints

  • Default 15 steps may be insufficient for complex workflows
  • No dynamic adjustment based on task complexity
  • Manual tuning required for edge cases

Performance Considerations

  • Each step requires an API call plus a screenshot capture (1-3 seconds)
  • Long tests can be slow (15 steps ≈ 15-45 seconds)
  • Not suitable for load testing or CI pipelines with strict time limits

HiDPI/Retina Display Requirements

  • Coordinate system assumes screenshot capture works correctly
  • Issues may occur on exotic display configurations
  • Tested primarily on macOS Retina displays

Iframes Not Supported

  • Agent cannot interact with elements inside iframes
  • Workaround: Use traditional Capybara within_frame blocks

No Multi-Tab/Window Support

  • Agent operates on single Capybara session
  • Cannot switch between tabs/windows automatically

Roadmap / Future Plans

  • [ ] Support for additional Gemini models
  • [ ] Configurable step limits per instruction
  • [ ] Iframe interaction support
  • [ ] Multi-tab/window handling
  • [ ] Performance optimizations (screenshot caching, parallel API calls)
  • [ ] Alternative AI providers (OpenAI, Anthropic)
  • [ ] Visual regression testing mode
  • [ ] Accessibility testing integration
  • [ ] Record/replay functionality

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/stwerner92/touring_test.

Development Setup

  1. Clone the repository
  2. Run bin/setup to install dependencies
  3. Run rake spec to run unit tests
  4. Run cd spec/test_app && bundle exec cucumber for integration tests

Pull Request Guidelines

  • Add tests for new functionality
  • Update README for user-facing changes
  • Follow existing code style
  • Keep commits focused and atomic

License

The gem is available as open source under the terms of the MIT License.


Credits & Acknowledgments

Author: Scott Werner ([email protected])

Powered by:

Inspired by: Anthropic's computer use demo and the vision of more maintainable, human-readable tests.


Support


Made with ❤️ for better testing experiences