TouringTest

AI-Powered Natural Language Testing for Cucumber

TouringTest is a Ruby gem that integrates Google's Gemini "computer use" AI model with the Cucumber testing framework. Write high-level, natural language test instructions and watch as an AI agent executes them by analyzing screenshots and performing browser actions via Capybara.

Status: ⚠️ Experimental - relies on Google's preview API (gemini-2.5-computer-use-preview-10-2025)

What is TouringTest?

Traditional Cucumber tests require writing step definitions that use brittle CSS selectors and detailed browser automation logic. TouringTest flips this model:

# Traditional approach
When('I sign up with email and password') do
  visit sign_up_path
  fill_in 'user[email]', with: '[email protected]'
  fill_in 'user[password]', with: 'password123'
  click_button 'Sign Up'
end

# TouringTest approach
When('the agent {string}') do |instruction|
  computer_use(instruction)
end

# In your feature file:
When the agent "signs up with email '[email protected]' and password 'password123'"

The AI agent:

Takes a screenshot of the current page
Analyzes it to understand the UI layout
Determines what actions to take (click fields, type text, submit forms)
Executes those actions via Capybara
Repeats until the goal is achieved
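The loop above can be sketched in plain Ruby. This is a minimal illustration, not the gem's implementation: `ScriptedModel`, `FakeBrowser`, and `run_agent` are hypothetical names, with the scripted model standing in for the Gemini API and the fake browser standing in for Capybara.

```ruby
# Hypothetical stand-in for the Gemini API: replays a fixed list of actions.
class ScriptedModel
  def initialize(actions)
    @actions = actions.dup
  end

  # Returns the next action for a screenshot, or nil once the goal is reached.
  def next_action(_goal, _screenshot)
    @actions.shift
  end
end

# Hypothetical stand-in for the Capybara-backed driver: records what it is asked to do.
class FakeBrowser
  attr_reader :performed

  def initialize
    @performed = []
  end

  def screenshot
    "png-bytes-after-#{@performed.size}-actions"
  end

  def perform(action)
    @performed << action
  end
end

def run_agent(goal, model:, browser:)
  loop do
    shot = browser.screenshot               # take a screenshot
    action = model.next_action(goal, shot)  # analyze it and decide what to do
    break if action.nil?                    # goal achieved, stop looping
    browser.perform(action)                 # execute the action, then repeat
  end
  browser.performed
end

model = ScriptedModel.new([
  { name: 'click_at', args: { x: 450, y: 280 } },
  { name: 'type_text_at', args: { x: 450, y: 280, text: 'hello' } }
])
performed = run_agent('fill in the form', model: model, browser: FakeBrowser.new)
puts performed.map { |action| action[:name] }.inspect
```

The real agent also has to bound this loop, which is what the step limit described later is for.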

Benefits:

More resilient tests - No brittle CSS selectors that break when markup changes
Usability testing - Tests reflect real user interactions
Faster test writing - Describe what you want, not how to do it
Better readability - Tests read like user stories
Self-healing - AI adapts to UI changes automatically

Quick Start

Prerequisites

Ruby >= 3.2.0
A Google Gemini API key (get one at https://aistudio.google.com/apikey)

Installation

# Add to your Gemfile
gem 'touring_test'

# Install
bundle install

# Set your API key
export GEMINI_API_KEY='your_api_key_here'

Minimal Example

# features/support/env.rb
require 'touring_test'
require 'capybara'

Capybara.default_driver = :selenium_chrome_headless
World(TouringTest::WorldExtension)

# features/step_definitions/agent_steps.rb
When('the agent {string}') do |instruction|
  computer_use(instruction)
end

# features/login.feature
Feature: User Login
  Scenario: Successful login
    Given I am on the login page
    When the agent "logs in with username 'admin' and password 'secret'"
    Then I should see the dashboard

Example Output

Here's what TouringTest looks like in action:

[Screenshot: TouringTest example output]

The AI agent narrates its actions in real-time, showing:

Each step it evaluates
The actions it takes (click_at, type_text_at, etc.)
Its reasoning about what to do next
Final success confirmation

Installation (Detailed)

1. Add the Gem

Add to your Gemfile:

gem 'touring_test'

Or install directly:

gem install touring_test

2. Set Up API Key

Get a Gemini API key from Google AI Studio and set it as an environment variable:

# In your shell or .env file
export GEMINI_API_KEY='your_api_key_here'

3. Configure Cucumber

Add the following to your features/support/env.rb:

require 'touring_test'
require 'capybara'

# Configure your Capybara driver (Selenium, Playwright, etc.)
Capybara.default_driver = :selenium_chrome_headless

# Add TouringTest's WorldExtension to Cucumber
World(TouringTest::WorldExtension)

In a Rails app, this file is usually generated by rails generate cucumber:install. Create it manually if it doesn't exist yet.

Usage

Basic Usage

The core of TouringTest is the computer_use method, which accepts a natural language instruction:

# In your step definitions
When('the agent {string}') do |instruction|
  computer_use(instruction)
end

Now you can write Cucumber scenarios like:

Scenario: User creates an account
  Given I am on the homepage
  When the agent "clicks on Sign Up and creates an account with email '[email protected]'"
  Then I should see "Welcome!"

Writing Effective Natural Language Instructions

Good instructions:

Be specific about the goal: "sign up with email '[email protected]' and password 'password123'"
Include exact text when important: "click the blue 'Submit' button"
Break complex tasks into steps if needed

Less effective:

Too vague: "do the signup thing"
Missing critical data: "sign up with some credentials"
Overly complex: "navigate through multiple pages and fill out everything"

Available UI Actions

The AI agent can perform these 11 browser actions:

| Action | Description | Example Use |
| --- | --- | --- |
| `click_at(x:, y:)` | Click element at coordinates | Clicking buttons, links |
| `type_text_at(x:, y:, text:)` | Type in an input field | Filling forms |
| `hover_at(x:, y:)` | Hover over element | Revealing dropdowns |
| `scroll_document(direction:)` | Scroll entire page | UP, DOWN, LEFT, RIGHT |
| `scroll_at(x:, y:, direction:)` | Scroll specific element | Scrollable divs |
| `drag_and_drop(start_x:, start_y:, end_x:, end_y:)` | Drag element | Reordering lists |
| `navigate(url:)` | Go to URL | Changing pages |
| `go_back()` | Browser back button | Navigation |
| `go_forward()` | Browser forward button | Navigation |
| `wait_5_seconds()` | Explicit wait | Slow loading |
| `key_combination(keys:)` | Keyboard shortcuts | "enter", "ctrl+a" |

The agent automatically chooses which actions to use based on its analysis of your instruction and the page screenshot.
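One way to route a function call from the model to a driver method is an allow-list plus `public_send`. The sketch below is an illustration under assumptions, not the gem's actual code: `RecordingDriver` and `dispatch` are hypothetical, and only the action names come from the table above.

```ruby
# Action names taken from the table of available UI actions.
ALLOWED_ACTIONS = %w[
  click_at type_text_at hover_at scroll_document scroll_at drag_and_drop
  navigate go_back go_forward wait_5_seconds key_combination
].freeze

# Hypothetical driver that only records calls instead of touching a browser.
class RecordingDriver
  attr_reader :calls

  def initialize
    @calls = []
  end

  def click_at(x:, y:)
    @calls << [:click_at, x, y]
  end

  def go_back
    @calls << [:go_back]
  end
end

def dispatch(driver, function_call)
  name = function_call.fetch('name')
  # Refuse anything outside the allow-list so model output cannot call arbitrary methods.
  raise ArgumentError, "unsupported action: #{name}" unless ALLOWED_ACTIONS.include?(name)

  # Parsed JSON has string keys; keyword arguments need symbols.
  args = function_call.fetch('args', {}).transform_keys(&:to_sym)
  driver.public_send(name, **args)
end

driver = RecordingDriver.new
dispatch(driver, { 'name' => 'click_at', 'args' => { 'x' => 450, 'y' => 280 } })
dispatch(driver, { 'name' => 'go_back' })
puts driver.calls.inspect
```

Checking the name against a fixed list before calling `public_send` keeps an unexpected model response from invoking methods that were never meant to be exposed.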

Configuration Options

# Default usage (screenshots saved under tmp/screenshots/ in the current directory)
computer_use("sign up with email '[email protected]'")

# Custom root path for screenshots
computer_use(
  "sign up with email '[email protected]'",
  root_path: Rails.root
)

Screenshots & Debugging

TouringTest automatically captures screenshots at each step:

Location: {root_path}/tmp/screenshots/
Naming: step_1.png, step_2.png, etc.
Cleared: At the start of each test run
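The layout described above can be reproduced with a few lines of standard-library Ruby. This is a sketch of the naming scheme only; the helper names `screenshot_dir`, `reset_screenshots`, and `screenshot_path` are hypothetical and are not part of the gem's API.

```ruby
require 'fileutils'
require 'tmpdir'

# {root_path}/tmp/screenshots, accepting a String or a Pathname such as Rails.root.
def screenshot_dir(root_path)
  File.join(root_path.to_s, 'tmp', 'screenshots')
end

# Empties the directory, as happens at the start of a test run.
def reset_screenshots(root_path)
  dir = screenshot_dir(root_path)
  FileUtils.rm_rf(dir)
  FileUtils.mkdir_p(dir)
  dir
end

# step_1.png, step_2.png, ...
def screenshot_path(root_path, step)
  File.join(screenshot_dir(root_path), "step_#{step}.png")
end

Dir.mktmpdir do |root|
  reset_screenshots(root)
  File.binwrite(screenshot_path(root, 1), 'fake png bytes')
  puts Dir.children(screenshot_dir(root)).inspect
end
```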

API Logs:

Full request/response JSON logged to tmp/gemini_api_log.jsonl
Useful for debugging API issues or understanding agent decisions
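To see which actions the agent chose, the log can be scanned for function calls. The sketch below assumes each line of the file is one JSON object and that function calls appear under a `functionCall` key somewhere inside it. The exact log schema is not documented here, so the sample lines are invented for illustration and the helper searches recursively rather than relying on a fixed structure.

```ruby
require 'json'
require 'stringio'

# Recursively collects every value stored under a "functionCall" key.
def collect_function_calls(node, found = [])
  case node
  when Hash
    node.each do |key, value|
      found << value if key == 'functionCall'
      collect_function_calls(value, found)
    end
  when Array
    node.each { |item| collect_function_calls(item, found) }
  end
  found
end

# Reads JSONL from any IO and returns all function calls in order.
def function_calls_in_log(io)
  io.each_line.flat_map do |line|
    next [] if line.strip.empty?

    collect_function_calls(JSON.parse(line))
  end
end

# Invented sample data; a real run would open tmp/gemini_api_log.jsonl instead.
sample = StringIO.new(<<~JSONL)
  {"response":{"candidates":[{"content":{"parts":[{"functionCall":{"name":"click_at","args":{"x":450,"y":280}}}]}}]}}
  {"request":{"contents":[]}}
JSONL

calls = function_calls_in_log(sample)
puts calls.map { |call| call['name'] }.inspect
```

For a real log, `File.open('tmp/gemini_api_log.jsonl') { |f| function_calls_in_log(f) }` would replace the `StringIO` sample.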

Console Output:

Shows each instruction sent to the agent
Displays actions taken (e.g., "click_at(x: 450, y: 320)")
Reports success or failure

How It Works

Architecture

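TouringTest uses a three-layer architecture: WorldExtension exposes computer_use to Cucumber, Agent manages the conversation with the Gemini API, and Driver executes actions and captures screenshots through Capybara.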

Conversation Flow

1. Initial Turn:
   User: "sign up with email '[email protected]' and password 'password123'"
   + Base64 screenshot of current page
   + Current URL

Coordinate System

Gemini returns normalized coordinates in a 0-1000 range. TouringTest converts these to pixel coordinates:

API sends: {x: 500, y: 250} (horizontally centered, a quarter of the way down, on the 0-1000 scale)
Driver converts: (500 / 1000.0) * screenshot_width → pixel x, and (250 / 1000.0) * screenshot_height → pixel y

Critical Detail: Coordinates are denormalized using screenshot dimensions, not window size, to handle HiDPI/Retina displays correctly. On a 2x display:

Window size: 1512×834
Screenshot size: 756×417
Agent analyzes the 756×417 screenshot, so coordinates must match those dimensions
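The conversion can be worked through with the numbers above. This is a sketch under assumptions: `png_dimensions` and `denormalize` are hypothetical helper names, and reading the size from the PNG header is just one possible way to get screenshot dimensions (a PNG stores width and height as big-endian 32-bit integers at bytes 16-23, inside the IHDR chunk).

```ruby
# Reads [width, height] from the IHDR chunk of PNG data.
def png_dimensions(png_bytes)
  png_bytes.byteslice(16, 8).unpack('NN')
end

# Maps 0-1000 normalized coordinates onto screenshot pixels.
def denormalize(x, y, width, height)
  [(x / 1000.0 * width).round, (y / 1000.0 * height).round]
end

# A hand-built PNG signature and IHDR prefix for a 756x417 image,
# used here instead of a real screenshot file.
header = "\x89PNG\r\n\x1A\n".b + [13].pack('N') + 'IHDR'.b + [756, 417].pack('NN')

width, height = png_dimensions(header)
p denormalize(500, 250, width, height) # => [378, 104]
```

With the window size (1512×834) instead of the screenshot size, the same call would return [756, 209], which is why the two sets of dimensions must not be mixed.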

Step Limit

To prevent infinite loops from AI hallucination or impossible tasks:

Default: 15 steps maximum
Configurable: Pass max_steps to Agent (for advanced usage)
Exception raised: If limit exceeded
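A step limit of this kind can be expressed as a small guard around the loop body. The sketch below is hypothetical (`with_step_limit` and `StepLimitExceeded` are illustrative names, not the gem's API); it only reuses the default of 15 and the error message quoted in the Troubleshooting section.

```ruby
class StepLimitExceeded < StandardError; end

# Yields step numbers until the block returns a truthy value (goal reached),
# and raises once max_steps attempts have been used up.
def with_step_limit(max_steps: 15)
  1.upto(max_steps) do |step|
    return step if yield(step)
  end
  raise StepLimitExceeded, "Agent exceeded maximum steps (#{max_steps})"
end

# Goal reached on the third step.
puts with_step_limit { |step| step == 3 }

# Goal never reached: the guard raises instead of looping forever.
begin
  with_step_limit(max_steps: 5) { |_step| false }
rescue StepLimitExceeded => e
  puts e.message
end
```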

Example: Real-World Test

Here's a complete example from the test app included in this gem:

Feature file (features/sign_up.feature):

Feature: Sign up
  Scenario: User signs up with email and password
    Given I am on the sign up page
    When the agent "signs up with the email address '[email protected]' and password 'password123'"
    Then I should be signed in

Step definitions (features/step_definitions/sign_up_steps.rb):

Given('I am on the sign up page') do
  visit sign_up_path
end

When('the agent {string}') do |instruction|
  computer_use(instruction, root_path: Rails.root)
end

Then('I should be signed in') do
  expect(page).to have_content('Welcome')
end

What the AI does:

Analyzes screenshot of sign-up form
Identifies email input field coordinates
Clicks email field: click_at(x: 450, y: 280)
Types email: type_text_at(x: 450, y: 280, text: "[email protected]")
Identifies password field
Clicks password field: click_at(x: 450, y: 350)
Types password: type_text_at(x: 450, y: 350, text: "password123")
Finds Submit button
Clicks Submit: click_at(x: 500, y: 420)
Mission accomplished!

File Structure & Architecture

TouringTest follows standard Ruby gem conventions:

touring_test/

Core Components

Agent (lib/touring_test/agent.rb)
- Orchestrates conversation with Gemini API
- Maintains conversation history during execution
- Enforces max step limit (default: 15)
- Logs full API interactions to tmp/gemini_api_log.jsonl
Driver (lib/touring_test/driver.rb)
- Wraps Capybara session with AI-friendly interface
- Handles coordinate denormalization (0-1000 → pixels)
- Executes 11 different UI actions
- Manages screenshot capture
WorldExtension (lib/touring_test/world_extension.rb)
- Provides computer_use() method to Cucumber World
- Bridges step definitions to Agent/Driver
Railtie (lib/touring_test/railtie.rb)
- Automatic Rails integration
- Generates support files
- Zero-config experience

Test App (Non-Standard)

The spec/test_app/ directory contains a complete Rails 7.1.2 application for integration testing. This is unusual for a gem (most use minimal fixtures), but valuable for demonstrating end-to-end functionality.

API & Configuration

Gemini API Requirements

Model: gemini-2.5-computer-use-preview-10-2025
Endpoint: https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
Authentication: API key via query parameter: ?key={GEMINI_API_KEY}
Required Tool Specification: { "computer_use": { "environment": "ENVIRONMENT_BROWSER" } }

Environment Variables

GEMINI_API_KEY (required): Your Google API key for Gemini access

Get your API key: https://aistudio.google.com/apikey

API Request Format

The Agent sends multi-turn conversations to Gemini:

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {"text": "sign up with email '[email protected]'"},
        {"inline_data": {"mime_type": "image/png", "data": "base64..."}},
        {"text": "Current URL: http://localhost:3000/sign_up"}
      ]
    },
    {
      "role": "model",
      "parts": [
        {"functionCall": {"name": "click_at", "args": {"x": 450, "y": 280}}}
      ]
    },
    {
      "role": "user",
      "parts": [
        {"functionResponse": {"name": "click_at", "response": {"success": true}}},
        {"inline_data": {"mime_type": "image/png", "data": "base64..."}}
      ]
    }
  ],
  "tools": [{"computer_use": {"environment": "ENVIRONMENT_BROWSER"}}]
}
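As an illustration of the first turn in the format above, the following sketch builds the endpoint URL and request body using only the standard library. The helper names `endpoint_uri` and `initial_request` are hypothetical; the model name, endpoint, and field names are copied from this README. Nothing is sent over the network.

```ruby
require 'json'
require 'uri'

MODEL = 'gemini-2.5-computer-use-preview-10-2025'

# Endpoint with the API key passed as a query parameter.
def endpoint_uri(api_key)
  uri = URI("https://generativelanguage.googleapis.com/v1beta/models/#{MODEL}:generateContent")
  uri.query = URI.encode_www_form(key: api_key)
  uri
end

# First user turn: instruction, Base64 screenshot, and current URL.
def initial_request(instruction, png_bytes, current_url)
  {
    contents: [
      {
        role: 'user',
        parts: [
          { text: instruction },
          # pack('m0') produces Base64 without line breaks.
          { inline_data: { mime_type: 'image/png', data: [png_bytes].pack('m0') } },
          { text: "Current URL: #{current_url}" }
        ]
      }
    ],
    tools: [{ computer_use: { environment: 'ENVIRONMENT_BROWSER' } }]
  }
end

body = initial_request('click the Sign Up link', 'fake png bytes', 'http://localhost:3000/sign_up')
puts JSON.pretty_generate(body)
# Sending it would look roughly like:
#   Net::HTTP.post(endpoint_uri(ENV.fetch('GEMINI_API_KEY')), JSON.generate(body),
#                  'Content-Type' => 'application/json')
```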

Development

Running Tests

# Unit tests (RSpec) - default rake task
bundle exec rake
# or
bundle exec rake spec

# Integration tests (Cucumber features in test app)
cd spec/test_app
bundle install
bundle exec cucumber

# Run specific feature
bundle exec cucumber features/sign_up.feature

Interactive Console

# Opens IRB with the gem loaded
bin/console

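# Experiment with the gem (the session line assumes selenium-webdriver and Chrome are installed)
> require 'touring_test'
> require 'capybara'
> session = Capybara::Session.new(:selenium_chrome_headless)
> driver = TouringTest::Driver.new(session, root_path: Dir.pwd)
> agent = TouringTest::Agent.new(driver, "click the button")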

Building and Installing Locally

# Build the gem
bundle exec rake build

# Install locally
bundle exec rake install

# Release (requires RubyGems permissions)
bundle exec rake release

Testing the Test App

The spec/test_app/ directory contains a complete Rails application for testing TouringTest end-to-end.

Test App Structure

spec/test_app/

Running the Test App

cd spec/test_app

# Install dependencies
bundle install

# Run Cucumber features
bundle exec cucumber

# Start Rails server (for manual testing)
bundle exec rails server

# Rails console
bundle exec rails console

Test App Configuration

Ruby: 3.4.5
Rails: 7.1.2
Database: SQLite3
Capybara Driver: selenium_chrome_headless
Database Cleaner: :truncation strategy (for JavaScript tests)

Troubleshooting

Missing API Key

Error: "GEMINI_API_KEY environment variable not set"

Solution:

export GEMINI_API_KEY='your_api_key_here'

Or add to .env file if using dotenv:

GEMINI_API_KEY=your_api_key_here
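To fail fast before any scenario runs, a guard like the following could go in features/support/env.rb. This is a sketch, not part of the gem: `gemini_api_key` is a hypothetical helper, and the message simply mirrors the error quoted above.

```ruby
# Returns the API key, or raises a clear error when it is missing or blank.
# Accepts any Hash-like object so it can be exercised without touching ENV.
def gemini_api_key(env = ENV)
  key = env['GEMINI_API_KEY'].to_s.strip
  raise KeyError, 'GEMINI_API_KEY environment variable not set' if key.empty?

  key
end

puts gemini_api_key({ 'GEMINI_API_KEY' => 'your_api_key_here' })

begin
  gemini_api_key({})
rescue KeyError => e
  puts e.message
end
```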

Coordinate Misalignment (Clicks Wrong Location)

Symptom: Agent clicks in wrong places on the page

Cause: HiDPI/Retina display coordinate mismatch

Solution: TouringTest automatically handles this by extracting screenshot dimensions. If issues persist:

Check tmp/screenshots/ to see what the AI sees
Verify Capybara driver supports screenshot capture
Check console output for coordinate denormalization debug info

Max Steps Exceeded

Error: "Agent exceeded maximum steps (15)"

Cause: Task too complex, AI stuck in loop, or impossible task

Solutions:

Break instruction into smaller steps
Make instruction more specific
Check screenshots to see where agent got stuck
For advanced usage, increase max_steps when creating Agent

Screenshot Directory Permission Issues

Error: Can't write to tmp/screenshots/

Solution:

mkdir -p tmp/screenshots
chmod 755 tmp/screenshots

Or specify a different root_path:

computer_use(instruction, root_path: '/path/with/permissions')

Agent Can't Find Elements

Symptom: "Warning: No element found at (x, y)"

Possible causes:

Element not visible (hidden, off-screen)
JavaScript not finished loading
Element inside iframe (not currently supported)

Solutions:

Add explicit wait steps: "wait for the page to load, then click submit"
Ensure elements are visible: page.execute_script("window.scrollTo(0, 0)")
Check screenshots to verify element visibility

Limitations & Known Issues

Experimental API

TouringTest relies on Google's preview API (gemini-2.5-computer-use-preview-10-2025):

May change without notice
No SLA or production guarantees
Rate limits apply

Step Limit Constraints

Default 15 steps may be insufficient for complex workflows
No dynamic adjustment based on task complexity
Manual tuning required for edge cases

Performance Considerations

Each step requires API call + screenshot capture (1-3 seconds)
Long tests can be slow (15 steps at 1-3 seconds each ≈ 15-45 seconds)
Not suitable for load testing or CI pipelines with strict time limits

HiDPI/Retina Display Requirements

Coordinate system assumes screenshot capture works correctly
Issues may occur on exotic display configurations
Tested primarily on macOS Retina displays

Iframes Not Supported

Agent cannot interact with elements inside iframes
Workaround: Use traditional Capybara within_frame blocks

No Multi-Tab/Window Support

Agent operates on single Capybara session
Cannot switch between tabs/windows automatically

Roadmap / Future Plans

[ ] Support for additional Gemini models
[ ] Configurable step limits per instruction
[ ] Iframe interaction support
[ ] Multi-tab/window handling
[ ] Performance optimizations (screenshot caching, parallel API calls)
[ ] Alternative AI providers (OpenAI, Anthropic)
[ ] Visual regression testing mode
[ ] Accessibility testing integration
[ ] Record/replay functionality

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/stwerner92/touring_test.

Development Setup

Clone the repository
Run bin/setup to install dependencies
Run rake spec to run unit tests
Run cd spec/test_app && bundle exec cucumber for integration tests

Pull Request Guidelines

Add tests for new functionality
Update README for user-facing changes
Follow existing code style
Keep commits focused and atomic

License

The gem is available as open source under the terms of the MIT License.

Credits & Acknowledgments

Author: Scott Werner ([email protected])

Powered by:

Google Gemini API - AI computer use capabilities
Cucumber - BDD testing framework
Capybara - Browser automation

Inspired by: Anthropic's computer use demo and the vision of more maintainable, human-readable tests.

Support

Documentation: CLAUDE.md contains detailed architectural information
Issues: GitHub Issues
Email: [email protected]

Made with ❤️ for better testing experiences