TouringTest
AI-Powered Natural Language Testing for Cucumber
TouringTest is a Ruby gem that integrates Google's Gemini "computer use" AI model with the Cucumber testing framework. Write high-level, natural language test instructions and watch as an AI agent executes them by analyzing screenshots and performing browser actions via Capybara.
Status: ⚠️ Experimental - relies on Google's preview API (gemini-2.5-computer-use-preview-10-2025)
What is TouringTest?
Traditional Cucumber tests require writing step definitions that use brittle CSS selectors and detailed browser automation logic. TouringTest flips this model:
# Traditional approach
When('I sign up with email and password') do
visit sign_up_path
fill_in 'user[email]', with: '[email protected]'
fill_in 'user[password]', with: 'password123'
click_button 'Sign Up'
end
# TouringTest approach
When('the agent {string}') do |instruction|
computer_use(instruction)
end
# In your feature file:
When the agent "signs up with email '[email protected]' and password 'password123'"
The AI agent:
- Takes a screenshot of the current page
- Analyzes it to understand the UI layout
- Determines what actions to take (click fields, type text, submit forms)
- Executes those actions via Capybara
- Repeats until the goal is achieved
Benefits:
- More resilient tests - No brittle CSS selectors that break when markup changes
- Usability testing - Tests reflect real user interactions
- Faster test writing - Describe what you want, not how to do it
- Better readability - Tests read like user stories
- Self-healing - AI adapts to UI changes automatically
Quick Start
Prerequisites
- Ruby >= 3.2.0
- A Google Gemini API key (available from Google AI Studio)
Installation
# Add to your Gemfile
gem 'touring_test'
# Install
bundle install
# Set your API key
export GEMINI_API_KEY='your_api_key_here'
Minimal Example
# features/support/env.rb
require 'touring_test'
require 'capybara'
Capybara.default_driver = :selenium_chrome_headless
World(TouringTest::WorldExtension)
# features/step_definitions/agent_steps.rb
When('the agent {string}') do |instruction|
computer_use(instruction)
end
# features/login.feature
Feature: User Login
Scenario: Successful login
Given I am on the login page
When the agent "logs in with username 'admin' and password 'secret'"
Then I should see the dashboard
Example Output
Here's what TouringTest looks like in action:

The AI agent narrates its actions in real-time, showing:
- Each step it evaluates
- The actions it takes (click_at, type_text_at, etc.)
- Its reasoning about what to do next
- Final success confirmation
Installation (Detailed)
1. Add the Gem
Add to your Gemfile:
gem 'touring_test'
Or install directly:
gem install touring_test
2. Set Up API Key
Get a Gemini API key from Google AI Studio and set it as an environment variable:
# In your shell or .env file
export GEMINI_API_KEY='your_api_key_here'
3. Configure Cucumber
Add the following to your features/support/env.rb:
require 'touring_test'
require 'capybara'
# Configure your Capybara driver (Selenium, Playwright, etc.)
Capybara.default_driver = :selenium_chrome_headless
# Add TouringTest's WorldExtension to Cucumber
World(TouringTest::WorldExtension)
In a Rails app this file is usually generated by rails generate cucumber:install; create it manually if it doesn't exist yet.
Usage
Basic Usage
The core of TouringTest is the computer_use method, which accepts a natural language instruction:
# In your step definitions
When('the agent {string}') do |instruction|
computer_use(instruction)
end
Now you can write Cucumber scenarios like:
Scenario: User creates an account
Given I am on the homepage
When the agent "clicks on Sign Up and creates an account with email '[email protected]'"
Then I should see "Welcome!"
Writing Effective Natural Language Instructions
Good instructions:
- Be specific about the goal: "sign up with email '[email protected]' and password 'password123'"
- Include exact text when important: "click the blue 'Submit' button"
- Break complex tasks into steps if needed
Less effective:
- Too vague: "do the signup thing"
- Missing critical data: "sign up with some credentials"
- Overly complex: "navigate through multiple pages and fill out everything"
Available UI Actions
The AI agent can perform these 11 browser actions:
| Action | Description | Example Use |
|---|---|---|
| click_at(x:, y:) | Click element at coordinates | Clicking buttons, links |
| type_text_at(x:, y:, text:) | Type in an input field | Filling forms |
| hover_at(x:, y:) | Hover over element | Revealing dropdowns |
| scroll_document(direction:) | Scroll entire page | UP, DOWN, LEFT, RIGHT |
| scroll_at(x:, y:, direction:) | Scroll specific element | Scrollable divs |
| drag_and_drop(start_x:, start_y:, end_x:, end_y:) | Drag element | Reordering lists |
| navigate(url:) | Go to URL | Changing pages |
| go_back() | Browser back button | Navigation |
| go_forward() | Browser forward button | Navigation |
| wait_5_seconds() | Explicit wait | Slow loading |
| key_combination(keys:) | Keyboard shortcuts | "enter", "ctrl+a" |
The agent automatically chooses which actions to use based on its analysis of your instruction and the page screenshot.
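As a rough illustration of how a function call returned by the model could be mapped onto one of the actions in the table, here is a minimal sketch. RecordingDriver, ALLOWED_ACTIONS, and dispatch are hypothetical names invented for this example and are not taken from the gem's source; the stand-in driver only records calls and never touches a browser.

```ruby
# Stand-in driver (hypothetical): records calls instead of driving a browser.
class RecordingDriver
  attr_reader :calls

  def initialize
    @calls = []
  end

  def click_at(x:, y:)
    @calls << [:click_at, x, y]
  end

  def type_text_at(x:, y:, text:)
    @calls << [:type_text_at, x, y, text]
  end
end

# Only names on this list may be invoked; anything else is rejected.
ALLOWED_ACTIONS = %w[click_at type_text_at].freeze

# Hypothetical dispatcher: turns a {"name" => ..., "args" => ...} hash
# into a keyword-argument method call on the driver.
def dispatch(driver, function_call)
  name = function_call.fetch("name")
  raise ArgumentError, "unsupported action: #{name}" unless ALLOWED_ACTIONS.include?(name)

  args = function_call.fetch("args", {}).transform_keys(&:to_sym)
  driver.public_send(name, **args)
end

driver = RecordingDriver.new
dispatch(driver, { "name" => "click_at", "args" => { "x" => 450, "y" => 280 } })
dispatch(driver, { "name" => "type_text_at", "args" => { "x" => 450, "y" => 280, "text" => "hello" } })
p driver.calls
```

Checking the name against an explicit list before calling public_send keeps an unexpected model response from invoking arbitrary methods.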
Configuration Options
# Default usage (screenshots saved to current directory)
computer_use("sign up with email '[email protected]'")
# Custom root path for screenshots
computer_use(
"sign up with email '[email protected]'",
root_path: Rails.root
)
Screenshots & Debugging
TouringTest automatically captures screenshots at each step:
- Location: {root_path}/tmp/screenshots/
- Naming: step_1.png, step_2.png, etc.
- Cleared: At the start of each test run
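When reviewing a run, it helps to read the screenshots in step order. A plain alphabetical sort places step_10.png before step_2.png, so the sketch below sorts by the numeric step instead. sort_by_step is a hypothetical helper written for this example, not part of the gem.

```ruby
# Hypothetical helper: orders screenshot names by their numeric step,
# so step_10.png sorts after step_2.png rather than before it.
def sort_by_step(filenames)
  filenames.sort_by { |name| name[/step_(\d+)\.png\z/, 1].to_i }
end

files = %w[step_10.png step_2.png step_1.png]
puts sort_by_step(files).inspect
```

The same helper should work on full paths, for example the result of Dir.glob over the screenshots directory, because the pattern only inspects the end of each name.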
API Logs:
- Full request/response JSON logged to tmp/gemini_api_log.jsonl
- Useful for debugging API issues or understanding agent decisions
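One way to skim the log is to pull out the names of the actions the model requested. The sketch below assumes each log line is a standalone JSON document and that function calls appear under a "functionCall" key somewhere inside it; the exact layout of the real file is an assumption, which is why the search walks the whole structure. function_call_names and the sample lines are hypothetical.

```ruby
require "json"

# Recursively collects every "functionCall" name found in a parsed JSON value.
def function_call_names(node)
  case node
  when Hash
    own = node["functionCall"].is_a?(Hash) ? [node["functionCall"]["name"]] : []
    own + node.values.flat_map { |value| function_call_names(value) }
  when Array
    node.flat_map { |value| function_call_names(value) }
  else
    []
  end
end

# Hypothetical log lines standing in for tmp/gemini_api_log.jsonl.
sample_log = <<~JSONL
  {"response":{"parts":[{"functionCall":{"name":"click_at","args":{"x":450,"y":280}}}]}}
  {"response":{"parts":[{"functionCall":{"name":"type_text_at","args":{"x":450,"y":280,"text":"hi"}}}]}}
JSONL

names = sample_log.each_line.flat_map { |line| function_call_names(JSON.parse(line)) }
puts names.inspect
```

Against a real run, the sample string could be swapped for File.foreach("tmp/gemini_api_log.jsonl").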
Console Output:
- Shows each instruction sent to the agent
- Displays actions taken (e.g., "click_at(x: 450, y: 320)")
- Reports success or failure
How It Works
Architecture
TouringTest uses a three-layer architecture: WorldExtension, Agent, and Driver (each described under Core Components).
Conversation Flow
1. Initial Turn:
User: "sign up with email '[email protected]' and password 'password123'"
+ Base64 screenshot of current page
+ Current URL
Coordinate System
Gemini returns normalized coordinates in a 0-1000 range. TouringTest converts these to pixel coordinates:
- API sends: {x: 500, y: 250} (horizontal midpoint, one quarter of the way down, on the 1000-unit scale)
- Driver converts: (500 / 1000.0) * screenshot_width → pixel position
Critical Detail: Coordinates are denormalized using screenshot dimensions, not window size, to handle HiDPI/Retina displays correctly. On a 2x display:
- Window size: 1512×834
- Screenshot size: 756×417
- Agent analyzes the 756×417 screenshot, so coordinates must match those dimensions
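The conversion can be sketched in a few lines. denormalize is a hypothetical helper name, and rounding to the nearest pixel is an assumption of this example rather than something confirmed from the gem's source.

```ruby
# Hypothetical helper: converts a coordinate on Gemini's 0-1000 scale
# into a pixel offset within a screenshot dimension (width or height).
def denormalize(value, screenshot_dimension)
  ((value / 1000.0) * screenshot_dimension).round
end

# Using the 756x417 screenshot size from the example above.
x = denormalize(500, 756)
y = denormalize(250, 417)
puts "pixel position: (#{x}, #{y})" # prints "pixel position: (378, 104)"
```

Dividing by 1000.0 rather than 1000 forces floating-point division, so small coordinates are not truncated to zero.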
Step Limit
To prevent infinite loops from AI hallucination or impossible tasks:
- Default: 15 steps maximum
- Configurable: Pass max_steps to Agent (for advanced usage)
- Exception raised: If limit exceeded
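The guard described above can be sketched as a bounded loop. StepLimitExceeded and run_with_step_limit are hypothetical names, and the block stands in for one screenshot/decide/act cycle; the gem's Agent may be structured differently.

```ruby
# Hypothetical error class for this sketch.
class StepLimitExceeded < StandardError; end

# Yields once per step until the block returns :done. Returns the number
# of steps used, or raises once max_steps have passed without completion.
def run_with_step_limit(max_steps: 15)
  1.upto(max_steps) do |step|
    return step if yield(step) == :done
  end
  raise StepLimitExceeded, "Agent exceeded maximum steps (#{max_steps})"
end

# A stand-in for the model that reports completion on the third step.
steps_used = run_with_step_limit { |step| step == 3 ? :done : :continue }
puts "finished in #{steps_used} steps"

# A task that never completes hits the limit.
begin
  run_with_step_limit(max_steps: 5) { :continue }
rescue StepLimitExceeded => e
  puts e.message
end
```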
Example: Real-World Test
Here's a complete example from the test app included in this gem:
Feature file (features/sign_up.feature):
Feature: Sign up
Scenario: User signs up with email and password
Given I am on the sign up page
When the agent "signs up with the email address '[email protected]' and password 'password123'"
Then I should be signed in
Step definitions (features/step_definitions/sign_up_steps.rb):
Given('I am on the sign up page') do
visit sign_up_path
end
When('the agent {string}') do |instruction|
computer_use(instruction, root_path: Rails.root)
end
Then('I should be signed in') do
expect(page).to have_content('Welcome')
end
What the AI does:
- Analyzes screenshot of sign-up form
- Identifies email input field coordinates
- Clicks email field: click_at(x: 450, y: 280)
- Types email: type_text_at(x: 450, y: 280, text: "[email protected]")
- Identifies password field
- Clicks password field: click_at(x: 450, y: 350)
- Types password: type_text_at(x: 450, y: 350, text: "password123")
- Finds Submit button
- Clicks Submit: click_at(x: 500, y: 420)
- Mission accomplished!
File Structure & Architecture
TouringTest follows standard Ruby gem conventions:
touring_test/
Core Components
Agent (lib/touring_test/agent.rb)
- Orchestrates conversation with Gemini API
- Maintains conversation history during execution
- Enforces max step limit (default: 15)
- Logs full API interactions to tmp/gemini_api_log.jsonl
Driver (lib/touring_test/driver.rb)
- Wraps Capybara session with AI-friendly interface
- Handles coordinate denormalization (0-1000 → pixels)
- Executes 11 different UI actions
- Manages screenshot capture
WorldExtension (lib/touring_test/world_extension.rb)
- Provides computer_use() method to Cucumber World
- Bridges step definitions to Agent/Driver
Railtie (lib/touring_test/railtie.rb)
- Automatic Rails integration
- Generates support files
- Zero-config experience
Test App (Non-Standard)
The spec/test_app/ directory contains a complete Rails 7.1.2 application for integration testing. This is unusual for a gem (most use minimal fixtures), but valuable for demonstrating end-to-end functionality.
API & Configuration
Gemini API Requirements
- Model: gemini-2.5-computer-use-preview-10-2025
- Endpoint: https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
- Authentication: API key via query parameter: ?key={GEMINI_API_KEY}
- Required Tool Specification:
{ "computer_use": { "environment": "ENVIRONMENT_BROWSER" } }
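For reference, the endpoint and key parameter listed above could be combined using Ruby's standard URI library as follows. MODEL and gemini_endpoint are hypothetical names for this sketch; only the URL shape comes from the list above.

```ruby
require "uri"

MODEL = "gemini-2.5-computer-use-preview-10-2025"

# Hypothetical helper: builds the generateContent URL with the API key
# passed as the "key" query parameter.
def gemini_endpoint(api_key, model: MODEL)
  uri = URI("https://generativelanguage.googleapis.com/v1beta/models/#{model}:generateContent")
  uri.query = URI.encode_www_form(key: api_key)
  uri
end

puts gemini_endpoint("your_api_key_here")
```

Using URI.encode_www_form rather than string interpolation ensures the key is escaped correctly if it contains reserved characters.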
Environment Variables
- GEMINI_API_KEY (required): Your Google API key for Gemini access
Get your API key: https://aistudio.google.com/apikey
API Request Format
The Agent sends multi-turn conversations to Gemini:
{
"contents": [
{
"role": "user",
"parts": [
{"text": "sign up with email '[email protected]'"},
{"inline_data": {"mime_type": "image/png", "data": "base64..."}},
{"text": "Current URL: http://localhost:3000/sign_up"}
]
},
{
"role": "model",
"parts": [
{"functionCall": {"name": "click_at", "args": {"x": 450, "y": 280}}}
]
},
{
"role": "user",
"parts": [
{"functionResponse": {"name": "click_at", "response": {"success": true}}},
{"inline_data": {"mime_type": "image/png", "data": "base64..."}}
]
}
],
"tools": [{"computer_use": {"environment": "ENVIRONMENT_BROWSER"}}]
}
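A first-turn body in the shape shown above could be assembled like this. build_initial_request is a hypothetical helper and the screenshot bytes are a placeholder; the sketch uses Array#pack("m0") for Base64 so it depends only on core Ruby and the json library.

```ruby
require "json"

# Hypothetical helper: builds the initial request body (instruction,
# screenshot, current URL) in the format shown above.
def build_initial_request(instruction:, screenshot_png:, current_url:)
  {
    contents: [
      {
        role: "user",
        parts: [
          { text: instruction },
          # pack("m0") produces Base64 without line breaks.
          { inline_data: { mime_type: "image/png", data: [screenshot_png].pack("m0") } },
          { text: "Current URL: #{current_url}" }
        ]
      }
    ],
    tools: [{ computer_use: { environment: "ENVIRONMENT_BROWSER" } }]
  }
end

body = build_initial_request(
  instruction: "click the Sign Up button",
  screenshot_png: "placeholder bytes".b, # stands in for real PNG data
  current_url: "http://localhost:3000/sign_up"
)
puts JSON.pretty_generate(body)
```

Later turns would append the model's functionCall and a functionResponse with a fresh screenshot to the contents array, as in the example above.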
Development
Running Tests
# Unit tests (RSpec) - default rake task
bundle exec rake
# or
bundle exec rake spec
# Integration tests (Cucumber features in test app)
cd spec/test_app
bundle install
bundle exec cucumber
# Run specific feature
bundle exec cucumber features/sign_up.feature
Interactive Console
# Opens IRB with the gem loaded
bin/console
# Experiment with the gem
> require 'touring_test'
> session = Capybara::Session.new(:selenium_chrome_headless)
> driver = TouringTest::Driver.new(session, root_path: Dir.pwd)
> agent = TouringTest::Agent.new(driver, "click the button")
Building and Installing Locally
# Build the gem
bundle exec rake build
# Install locally
bundle exec rake install
# Release (requires RubyGems permissions)
bundle exec rake release
Testing the Test App
The spec/test_app/ directory contains a complete Rails application for testing TouringTest end-to-end.
Test App Structure
spec/test_app/
Running the Test App
cd spec/test_app
# Install dependencies
bundle install
# Run Cucumber features
bundle exec cucumber
# Start Rails server (for manual testing)
bundle exec rails server
# Rails console
bundle exec rails console
Test App Configuration
- Ruby: 3.4.5
- Rails: 7.1.2
- Database: SQLite3
- Capybara Driver: selenium_chrome_headless
- Database Cleaner: :truncation strategy (for JavaScript tests)
Troubleshooting
Missing API Key
Error: "GEMINI_API_KEY environment variable not set"
Solution:
export GEMINI_API_KEY='your_api_key_here'
Or add to .env file if using dotenv:
GEMINI_API_KEY=your_api_key_here
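To surface this problem before any browser work starts, a suite could check for the key up front. This is a minimal sketch: gemini_api_key is a hypothetical helper, and treating a blank value as missing is an assumption of the example.

```ruby
# Hypothetical helper: reads the key from an environment-like hash and
# fails early when it is missing or blank.
def gemini_api_key(env = ENV)
  key = env["GEMINI_API_KEY"].to_s.strip
  raise KeyError, "GEMINI_API_KEY environment variable not set" if key.empty?

  key
end

puts gemini_api_key({ "GEMINI_API_KEY" => "your_api_key_here" })
```

Accepting the environment as an argument keeps the check easy to exercise in a unit test without mutating ENV.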
Coordinate Misalignment (Clicks Wrong Location)
Symptom: Agent clicks in wrong places on the page
Cause: HiDPI/Retina display coordinate mismatch
Solution: TouringTest automatically handles this by extracting screenshot dimensions. If issues persist:
- Check tmp/screenshots/ to see what the AI sees
- Verify Capybara driver supports screenshot capture
- Check console output for coordinate denormalization debug info
Max Steps Exceeded
Error: "Agent exceeded maximum steps (15)"
Cause: Task too complex, AI stuck in loop, or impossible task
Solutions:
- Break instruction into smaller steps
- Make instruction more specific
- Check screenshots to see where agent got stuck
- For advanced usage, increase max_steps when creating Agent
Screenshot Directory Permission Issues
Error: Can't write to tmp/screenshots/
Solution:
mkdir -p tmp/screenshots
chmod 755 tmp/screenshots
Or specify a different root_path:
computer_use(instruction, root_path: '/path/with/permissions')
Agent Can't Find Elements
Symptom: "Warning: No element found at (x, y)"
Possible causes:
- Element not visible (hidden, off-screen)
- JavaScript not finished loading
- Element inside iframe (not currently supported)
Solutions:
- Add explicit wait steps: "wait for the page to load, then click submit"
- Ensure elements are visible: page.execute_script("window.scrollTo(0, 0)")
- Check screenshots to verify element visibility
Limitations & Known Issues
Experimental API
TouringTest relies on Google's preview API (gemini-2.5-computer-use-preview-10-2025):
- May change without notice
- No SLA or production guarantees
- Rate limits apply
Step Limit Constraints
- Default 15 steps may be insufficient for complex workflows
- No dynamic adjustment based on task complexity
- Manual tuning required for edge cases
Performance Considerations
- Each step requires API call + screenshot capture (1-3 seconds)
- Long tests can be slow (15 steps ≈ 30-45 seconds)
- Not suitable for load testing or CI pipelines with strict time limits
HiDPI/Retina Display Requirements
- Coordinate system assumes screenshot capture works correctly
- Issues may occur on exotic display configurations
- Tested primarily on macOS Retina displays
Iframes Not Supported
- Agent cannot interact with elements inside iframes
- Workaround: Use traditional Capybara within_frame blocks
No Multi-Tab/Window Support
- Agent operates on single Capybara session
- Cannot switch between tabs/windows automatically
Roadmap / Future Plans
- [ ] Support for additional Gemini models
- [ ] Configurable step limits per instruction
- [ ] Iframe interaction support
- [ ] Multi-tab/window handling
- [ ] Performance optimizations (screenshot caching, parallel API calls)
- [ ] Alternative AI providers (OpenAI, Anthropic)
- [ ] Visual regression testing mode
- [ ] Accessibility testing integration
- [ ] Record/replay functionality
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/stwerner92/touring_test.
Development Setup
- Clone the repository
- Run bin/setup to install dependencies
- Run rake spec to run unit tests
- Run cd spec/test_app && bundle exec cucumber for integration tests
Pull Request Guidelines
- Add tests for new functionality
- Update README for user-facing changes
- Follow existing code style
- Keep commits focused and atomic
License
The gem is available as open source under the terms of the MIT License.
Credits & Acknowledgments
Author: Scott Werner ([email protected])
Powered by:
- Google Gemini API - AI computer use capabilities
- Cucumber - BDD testing framework
- Capybara - Browser automation
Inspired by: Anthropic's computer use demo and the vision of more maintainable, human-readable tests.
Support
- Documentation: CLAUDE.md contains detailed architectural information
- Issues: GitHub Issues
- Email: [email protected]
Made with ❤️ for better testing experiences