LLMBench

A standalone Ruby gem for benchmarking and comparing the performance of different Large Language Model providers and APIs.

Features

  • Support for both OpenAI- and Anthropic-compatible API formats
  • Parallel execution across multiple models and providers (see the sketch below)
  • Continuous tracking with CSV export
  • No external dependencies: only the Ruby standard library is used
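
To make the parallel, stdlib-only claims concrete, here is a minimal sketch of the general approach, not the gem's actual implementation: each model gets its own thread, and an OpenAI-format chat-completion request is timed with a monotonic clock. The model list, prompt, and OPENAI_API_KEY environment variable are placeholders.

require "net/http"
require "json"
require "uri"

# Placeholder model list; in the real gem these come from models.yaml.
MODELS = [
  { name: "gpt-4",       url: "https://api.openai.com/v1/chat/completions", key: ENV["OPENAI_API_KEY"] },
  { name: "gpt-4o-mini", url: "https://api.openai.com/v1/chat/completions", key: ENV["OPENAI_API_KEY"] }
].freeze

# Time a single OpenAI-format chat completion request.
def benchmark(model, prompt)
  started  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  response = Net::HTTP.post(
    URI(model[:url]),
    { model: model[:name], messages: [{ role: "user", content: prompt }] }.to_json,
    "Authorization" => "Bearer #{model[:key]}",
    "Content-Type"  => "application/json"
  )
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { model: model[:name], status: response.code, seconds: elapsed.round(2) }
end

# One thread per model; Thread#value waits for and returns each result.
threads = MODELS.map { |m| Thread.new { benchmark(m, "Say hello in five words.") } }
threads.each { |t| p t.value }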

Installation

Important: This is a standalone executable gem, not a library for use in other applications. Install it system-wide:

gem install llm_bench

Do not add this gem to your application's Gemfile; it is designed to be used as a command-line tool only.

Using Docker

If you don't have Ruby installed or prefer containerized environments, you can use the Docker image:

# Build the Docker image
docker build -t llm_bench .

# Or use the pre-built image
docker pull vitobotta/llm-bench:v2

The Docker image includes everything needed to run llm_bench without installing Ruby locally.

Usage

Configuration

Create a configuration file named models.yaml in your current directory, or specify a custom path with the --config argument:

prompt: "Explain the concept of machine learning in simple terms in exactly 300 words..."

providers:
  - name: "openai"
    base_url: "https://api.openai.com/v1"
    api_key: "your-api-key-here"
    models:
      - nickname: "gpt-4"
        id: "gpt-4"
        api_format: "openai"

  - name: "anthropic"
    base_url: "https://api.anthropic.com"
    api_key: "your-api-key-here"
    models:
      - nickname: "claude"
        id: "claude-3-sonnet-20240229"
        api_format: "anthropic"
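
A file in this shape can be parsed with nothing beyond Ruby's bundled YAML library. The snippet below is purely illustrative, not part of the gem; it loads the example config above and prints each configured model:

require "yaml"

config = YAML.load_file("models.yaml")

puts "Prompt: #{config["prompt"]}"
config["providers"].each do |provider|
  provider["models"].each do |model|
    puts "#{provider["name"]} / #{model["nickname"]} -> #{model["id"]} (#{model["api_format"]})"
  end
end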

Commands

Benchmark a single model:

llm_bench --config ./my-config.yaml --provider openai --model gpt-4

Benchmark all configured models:

llm_bench --all

Benchmark all models with a custom config:

llm_bench --config ./my-config.yaml --all

Enable continuous tracking:

llm_bench --config ./my-config.yaml --all --track

Enable continuous tracking with a custom interval (default: 600 seconds):

llm_bench --config ./my-config.yaml --all --track --interval-in-seconds 300

Enable continuous tracking with a custom output file:

llm_bench --config ./my-config.yaml --all --track --output-file ./results/benchmark_results.csv

Benchmark a single model and print the returned result:

llm_bench --config ./my-config.yaml --provider openai --model gpt-4 --print-result

Show version information:

llm_bench --version

Note: If no --config argument is provided, llm_bench looks for models.yaml in the current directory; if no configuration file is found, an error is displayed. When using --track, you can optionally pass --interval-in-seconds to control the frequency of benchmark cycles (default: 600 seconds) and --output-file to set the CSV output path (default: llm_benchmark_results_TIMESTAMP.csv in the current directory).
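
Conceptually, tracking mode reduces to appending one CSV row per model per cycle and sleeping for the interval. The sketch below shows that shape using only the standard library; run_benchmarks and the column names are hypothetical stand-ins, not the gem's actual output schema:

require "csv"
require "time"

# Hypothetical stand-in for one benchmark cycle across all configured models.
def run_benchmarks
  [{ provider: "openai", model: "gpt-4", seconds: 3.21 }]
end

path = "llm_benchmark_results_#{Time.now.strftime('%Y%m%d_%H%M%S')}.csv"
CSV.open(path, "w") { |csv| csv << %w[timestamp provider model seconds] }

loop do
  CSV.open(path, "a") do |csv|
    run_benchmarks.each { |r| csv << [Time.now.iso8601, r[:provider], r[:model], r[:seconds]] }
  end
  sleep 600 # default --interval-in-seconds
end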

Docker Usage

When using Docker, you need to mount your configuration file and any output directories:

# Benchmark a single model with Docker
docker run -v $(pwd)/my-config.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --provider openai --model gpt-4

# Benchmark all models with Docker
docker run -v $(pwd)/models.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --all

# Enable continuous tracking with Docker
docker run -v $(pwd)/models.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --all --track

# Enable continuous tracking with a custom interval (5 minutes) using Docker
docker run -v $(pwd)/models.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --all --track --interval-in-seconds 300

# Enable continuous tracking with a custom output file using Docker
docker run -v $(pwd)/models.yaml:/data/models.yaml \
           -v $(pwd)/results:/data/results \
           llm_bench --all --track --output-file /data/results/custom_benchmark.csv

The Docker container uses /data as its working directory, so mount your config file at /data/models.yaml (or pass --config with the mounted path) and mount any directories where you want output files written.
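
For example, to keep the config under a different mounted path and point the tool at it explicitly:

# Use a config mounted somewhere other than /data/models.yaml
docker run -v $(pwd)/configs:/data/configs \
           -v $(pwd)/results:/data/results \
           llm_bench --config /data/configs/my-config.yaml --all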

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

To build and install the gem locally:

gem build llm_bench.gemspec
gem install ./llm_bench-0.1.0.gem

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/vitobotta/llm-bench.

License

The gem is available as open source under the terms of the MIT License.