# llms-txt-ruby
A Ruby tool for generating llms.txt files from existing markdown documentation. Transform your docs to be AI-friendly.
## What is llms.txt?
The llms.txt file is a proposed standard for providing LLM-friendly content on websites. It offers brief background information, guidance, and links to detailed markdown files, helping Large Language Models understand and navigate your project more effectively.
Learn more at [llmstxt.org](https://llmstxt.org).
## What This Tool Does
This library converts existing human-first documentation into LLM-friendly formats:
- **Generates llms.txt** - Transforms your existing markdown documentation into a structured overview that helps LLMs understand your project's layout and find relevant information
- **Transforms markdown** - Converts individual markdown files from human-readable format to AI-optimized format by expanding relative links to absolute URLs and normalizing link structures
- **Bulk transforms** - Processes all markdown files in a directory recursively, creating LLM-friendly versions alongside originals (or transforming in-place) with customizable exclusion patterns
## Installation
Add this line to your application's Gemfile:

```ruby
gem 'llms-txt-ruby'
```

And then execute:

```shell
$ bundle install
```

Or install it yourself as:

```shell
$ gem install llms-txt-ruby
```
## Quick Start

### Option 1: Using a Config File (Recommended)

Create an `llms-txt.yml` file in your project root:
```yaml
# llms-txt.yml
docs: ./docs
base_url: https://myproject.io
title: My Awesome Project
description: A Ruby library that helps developers build amazing applications
output: llms.txt
convert_urls: true
verbose: false
```
Then simply run:

```shell
llms-txt generate
```
### Option 2: Using the CLI Only

```shell
# Generate from docs directory
llms-txt generate --docs ./docs

# Transform a single file
llms-txt transform --docs README.md

# Transform all markdown files in a directory
llms-txt bulk-transform --docs ./docs

# Use a custom config file
llms-txt generate --config my-config.yml
```
## CLI Reference

### Commands

```text
llms-txt generate [options]         # Generate llms.txt from documentation (default)
llms-txt transform [options]        # Transform a markdown file to be AI-friendly
llms-txt bulk-transform [options]   # Transform all markdown files in a directory
llms-txt parse [options]            # Parse an existing llms.txt file
llms-txt validate [options]         # Validate an llms.txt file
llms-txt version                    # Show version
```
### Options

```text
-c, --config PATH    Configuration file path (default: llms-txt.yml)
-d, --docs PATH      Path to documentation directory or file
-o, --output PATH    Output file path
-v, --verbose        Verbose output
-h, --help           Show help message
```
For advanced options like `base_url`, `title`, `description`, `suffix`, `excludes`, and `convert_urls`, use a config file.
## Configuration File

The recommended way to use llms-txt is with an `llms-txt.yml` config file. This allows you to:
- ✅ Store all your settings in one place
- ✅ Version control your llms.txt configuration
- ✅ Avoid typing long CLI commands repeatedly
- ✅ Share configuration across team members
### Config File Options

```yaml
# Path to documentation directory or file
docs: ./docs

# Base URL for expanding relative links (optional)
base_url: https://myproject.io

# Project information (optional - auto-detected if not provided)
title: My Project Name
description: Brief description of what your project does

# Output file (optional, default: llms.txt)
output: llms.txt

# Transformation options (optional)
convert_urls: true   # Convert .html links to .md
suffix: .llm         # Suffix for transformed files (use "" for in-place)
verbose: false       # Enable verbose output

# Exclusion patterns (optional)
excludes:
  - "**/private/**"
  - "**/drafts/**"
```
The config file is found automatically if named `llms-txt.yml`, `llms-txt.yaml`, or `.llms-txt.yml`.
### Configuration Options Reference

| Option | Type | Default | Description |
|---|---|---|---|
| `docs` | String | `./docs` | Directory containing markdown files to process |
| `base_url` | String | - | Base URL for expanding relative links (e.g., `https://myproject.io`) |
| `title` | String | Auto-detected | Project title for llms.txt generation |
| `description` | String | Auto-detected | Project description for llms.txt generation |
| `output` | String | `llms.txt` | Output filename for the generated llms.txt |
| `convert_urls` | Boolean | `false` | Convert HTML URLs to markdown format (`.html` → `.md`) |
| `suffix` | String | `.llm` | Suffix added to transformed files; use `""` for in-place transformation |
| `excludes` | Array | `[]` | Glob patterns for files/directories to exclude from processing |
| `verbose` | Boolean | `false` | Enable detailed output during processing |
## Bulk Transformation

The `bulk-transform` command processes all markdown files in a directory recursively, creating AI-friendly versions. By default, it creates new files with a `.llm.md` suffix, but you can also transform files in-place for build pipelines.
### Key Features

- **Recursive processing** - Finds and transforms all `.md` files in nested directories
- **Preserves structure** - Maintains your existing directory layout
- **Exclusion patterns** - Skip files/directories using glob patterns
- **Custom suffixes** - Choose how transformed files are named, or transform in-place
- **LLM optimizations** - Expands relative links, converts HTML URLs, etc.
### Default Behavior: Creating Separate Files

By default, `bulk-transform` creates new `.llm.md` files alongside your originals:

```yaml
# llms-txt.yml
docs: ./docs
base_url: https://myproject.io
suffix: .llm          # Creates .llm.md files (default if omitted)
convert_urls: true
```

```shell
llms-txt bulk-transform --config llms-txt.yml
```

Result:

```text
docs/
├── README.md
├── README.llm.md      ← AI-friendly version
├── setup.md
└── setup.llm.md       ← AI-friendly version
```
This preserves your original files and creates LLM-optimized versions separately.
### In-Place Transformation

For build pipelines where you want to transform documentation directly without maintaining separate copies, use `suffix: ""`:

```yaml
# llms-txt.yml
docs: ./docs
base_url: https://myproject.io
convert_urls: true
suffix: ""            # Transform in-place, no separate files
excludes:
  - "**/private/**"
  - "**/drafts/**"
```

```shell
llms-txt bulk-transform --config llms-txt.yml
```

Before transformation (`docs/setup.md`):

```markdown
See the [configuration guide](../config.md) for details.
Visit our [API docs](https://myproject.io/api/).
```

After transformation (`docs/setup.md` - same file, overwritten):

```markdown
See the [configuration guide](https://myproject.io/docs/config.md) for details.
Visit our [API docs](https://myproject.io/api.md).
```
This is perfect for:

- **Build pipelines** - Transform docs as part of your deployment process
- **Static site generators** - Process markdown before building HTML
- **CI/CD workflows** - Automated documentation transformation
### Real-World Example: Karafka Framework

The Karafka framework uses in-place transformation in its documentation build process. Previously, it had 140+ lines of custom Ruby code for link expansion and URL conversion. Now it uses:

```yaml
# llms-txt.yml
docs: ./online/docs
base_url: https://karafka.io/docs
convert_urls: true
suffix: ""
excludes:
  - "**/Enterprise-License-Setup/**"
```

```ruby
# In their build script (sync.rb)
system!("llms-txt bulk-transform --config llms-txt.yml")
```
This configuration:

- Processes all markdown files recursively in `./online/docs`
- Expands relative links to absolute URLs using the `base_url`
- Converts HTML URLs to markdown format (`.html` → `.md`)
- Transforms files in-place (no separate `.llm.md` files)
- Excludes password-protected enterprise documentation
- Runs as part of an automated daily deployment via GitHub Actions
Result: Over 140 lines of custom code replaced with a 6-line configuration file.
### Usage Examples

```shell
# Transform all files with default settings (creates .llm.md files)
llms-txt bulk-transform --docs ./wiki

# Transform in-place using a config file
llms-txt bulk-transform --config karafka-config.yml

# Verbose output to see processing details
llms-txt bulk-transform --config llms-txt.yml --verbose
```
### Example Config for Bulk Transformation

```yaml
# karafka-config.yml
docs: ./wiki
base_url: https://karafka.io
suffix: .llm
convert_urls: true
excludes:
  - "**/private/**"     # Skip private directories
  - "**/draft-*.md"     # Skip draft files
  - "**/old-docs/**"    # Skip legacy documentation
```
### Example Output (Default Suffix)

With the config above, these files:

```text
wiki/
├── Home.md
├── getting-started.md
├── api/
│   ├── consumers.md
│   └── producers.md
└── private/
    └── internal.md
```
Become:

```text
wiki/
├── Home.md
├── Home.llm.md                 ← AI-friendly version
├── getting-started.md
├── getting-started.llm.md
├── api/
│   ├── consumers.md
│   ├── consumers.llm.md
│   ├── producers.md
│   └── producers.llm.md
└── private/
    └── internal.md             ← Excluded, no .llm.md version
```
### Example Output (In-Place Transformation)

With `suffix: ""`, the original files are overwritten:

```text
wiki/
├── Home.md                     ← Transformed in-place
├── getting-started.md          ← Transformed in-place
├── api/
│   ├── consumers.md            ← Transformed in-place
│   └── producers.md            ← Transformed in-place
└── private/
    └── internal.md             ← Excluded from transformation
```
## Serving LLM-Friendly Documentation

After using `bulk-transform` to create `.llm.md` versions of your documentation, you can configure your web server to automatically serve these LLM-optimized versions to AI bots while showing the original versions to human visitors.

> **Note:** This section applies when using the default `suffix: .llm` behavior. If you're using `suffix: ""` for in-place transformation, the markdown files are already LLM-optimized and can be served directly.
### How It Works

The strategy is simple:

1. **Detect AI bots** by their User-Agent strings
2. **Serve `.llm.md` files** to detected AI bots
3. **Serve original `.md` files** to human visitors
4. **Automatic selection** - no manual switching needed
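If your documentation is served by a Ruby (Rack) application rather than a standalone web server, the same strategy can be sketched as a small middleware. This is a hypothetical illustration (`LlmDocsRouter` is not part of the gem); the bot patterns mirror the web server configurations shown for Apache and nginx.

```ruby
# Hypothetical Rack-style middleware: route /docs/*.md requests from
# detected AI bots to the .llm.md variant. Illustration only.
class LlmDocsRouter
  BOT_PATTERN = Regexp.union(
    /openai|anthropic|claude|gpt|chatgpt|bard|gemini|copilot/i,
    /perplexity|character\.ai|you\.com|poe\.com|huggingface|replicate/i,
    /langchain|llamaindex|semantic|embedding|vector|rag/i,
    /ollama|mistral|cohere|together|fireworks|groq/i
  )

  def initialize(app)
    @app = app
  end

  def call(env)
    # Rewrite the request path before the downstream app sees it.
    env['PATH_INFO'] = rewrite(env['PATH_INFO'], env['HTTP_USER_AGENT'].to_s)
    @app.call(env)
  end

  # Map /docs/foo.md to /docs/foo.llm.md for AI bots; leave everything
  # else (human visitors, non-docs paths, already-rewritten paths) alone.
  def rewrite(path, user_agent)
    return path unless user_agent.match?(BOT_PATTERN)
    return path unless path.start_with?('/docs/') && path.end_with?('.md')
    return path if path.end_with?('.llm.md')

    path.sub(/\.md\z/, '.llm.md')
  end
end
```

In a Rack app this would be registered with `use LlmDocsRouter`; the downstream file server then reads the rewritten path.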
### Apache Configuration

Add this to your `.htaccess` file:

```apache
# Detect LLM bots by User-Agent
SetEnvIf User-Agent "(?i)(openai|anthropic|claude|gpt|chatgpt|bard|gemini|copilot)" IS_LLM_BOT
SetEnvIf User-Agent "(?i)(perplexity|character\.ai|you\.com|poe\.com|huggingface|replicate)" IS_LLM_BOT
SetEnvIf User-Agent "(?i)(langchain|llamaindex|semantic|embedding|vector|rag)" IS_LLM_BOT
SetEnvIf User-Agent "(?i)(ollama|mistral|cohere|together|fireworks|groq)" IS_LLM_BOT

# Serve .md files as text/plain
<FilesMatch "\.md$">
  Header set Content-Type "text/plain; charset=utf-8"
  ForceType text/plain
</FilesMatch>

# Enable rewrite engine
RewriteEngine On

# For LLM bots: rewrite requests to serve .llm.md versions
RewriteCond %{ENV:IS_LLM_BOT} !^$
RewriteCond %{REQUEST_URI} ^/docs/.*\.md$ [NC]
RewriteCond %{REQUEST_URI} !\.llm\.md$ [NC]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f
RewriteRule ^(.*)\.md$ $1.llm.md [L]

# For LLM bots: handle clean URLs by appending .llm.md
RewriteCond %{ENV:IS_LLM_BOT} !^$
RewriteCond %{REQUEST_URI} ^/docs/ [NC]
RewriteCond %{REQUEST_URI} !\.md$ [NC]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.llm.md -f
RewriteRule ^(.*)$ $1.llm.md [L]

# For regular users: serve original .md files or clean URLs as usual
# (add your normal URL handling rules here)
```
### Nginx Configuration

Add this to your nginx server block:

```nginx
# Map to detect LLM bots
map $http_user_agent $is_llm_bot {
  default 0;
  "~*(?i)(openai|anthropic|claude|gpt|chatgpt|bard|gemini|copilot)" 1;
  "~*(?i)(perplexity|character\.ai|you\.com|poe\.com|huggingface|replicate)" 1;
  "~*(?i)(langchain|llamaindex|semantic|embedding|vector|rag)" 1;
  "~*(?i)(ollama|mistral|cohere|together|fireworks|groq)" 1;
}

server {
  # ... your server configuration ...

  # Serve .md files as text/plain
  location ~ \.md$ {
    default_type text/plain;
    charset utf-8;
  }

  # For LLM bots requesting .md files, serve the .llm.md version
  location ~ ^/docs/(.*)\.md$ {
    if ($is_llm_bot) {
      rewrite ^(.*)\.md$ $1.llm.md last;
    }
    try_files $uri $uri/ =404;
  }

  # For LLM bots requesting clean URLs, serve the .llm.md version
  location ~ ^/docs/ {
    if ($is_llm_bot) {
      try_files $uri.llm.md $uri $uri/ =404;
    }
    try_files $uri $uri.md $uri/ =404;
  }
}
```
### Cloudflare Workers

For serverless deployments, use Cloudflare Workers:

```javascript
export default {
  async fetch(request) {
    const url = new URL(request.url);
    const userAgent = request.headers.get('user-agent') || '';

    // Detect LLM bots
    const llmBotPatterns = [
      /openai|anthropic|claude|gpt|chatgpt|bard|gemini|copilot/i,
      /perplexity|character\.ai|you\.com|poe\.com|huggingface|replicate/i,
      /langchain|llamaindex|semantic|embedding|vector|rag/i,
      /ollama|mistral|cohere|together|fireworks|groq/i
    ];
    const isLLMBot = llmBotPatterns.some(pattern => pattern.test(userAgent));

    // If LLM bot and requesting docs
    if (isLLMBot && url.pathname.startsWith('/docs/')) {
      // Try to serve the .llm.md version
      const llmPath = url.pathname.replace(/\.md$/, '.llm.md');
      if (!url.pathname.endsWith('.llm.md')) {
        url.pathname = llmPath;
      }
    }

    return fetch(url);
  }
};
```
### Custom Suffix

If you used a different suffix with the `bulk-transform` command (e.g., `suffix: .ai`), update your web server rules accordingly.

Apache:

```apache
RewriteRule ^(.*)\.md$ $1.ai.md [L]
```

Nginx:

```nginx
rewrite ^(.*)\.md$ $1.ai.md last;
```

Cloudflare Workers:

```javascript
const llmPath = url.pathname.replace(/\.md$/, '.ai.md');
```
## Ruby API

### Basic Usage

```ruby
require 'llms_txt'

# Option 1: Using a config file (recommended)
content = LlmsTxt.generate_from_docs(config_file: 'llms-txt.yml')

# Option 2: Direct options (overrides config)
content = LlmsTxt.generate_from_docs('./docs',
  base_url: 'https://myproject.io',
  title: 'My Project',
  description: 'A great project'
)

# Option 3: Mix a config file with overrides
content = LlmsTxt.generate_from_docs('./docs',
  config_file: 'my-config.yml',
  title: 'Override Title' # This overrides the config file title
)

# Transform markdown with a config file
transformed = LlmsTxt.transform_markdown('README.md',
  config_file: 'llms-txt.yml'
)

# Transform with direct options
transformed = LlmsTxt.transform_markdown('README.md',
  base_url: 'https://myproject.io',
  convert_urls: true
)

# Bulk transform all files in a directory (creates .llm.md files)
transformed_files = LlmsTxt.bulk_transform('./wiki',
  base_url: 'https://karafka.io',
  suffix: '.llm',
  excludes: ['**/private/**', '**/draft-*.md']
)
puts "Transformed #{transformed_files.size} files"

# Bulk transform in-place (overwrites the original files)
transformed_files = LlmsTxt.bulk_transform('./wiki',
  base_url: 'https://karafka.io',
  suffix: '', # Empty string for in-place transformation
  convert_urls: true,
  excludes: ['**/private/**']
)

# Bulk transform with a config file
transformed_files = LlmsTxt.bulk_transform('./wiki',
  config_file: 'karafka-config.yml'
)

# Parse and validate (unchanged)
parsed = LlmsTxt.parse('llms.txt')
puts parsed.title
puts parsed.description

valid = LlmsTxt.validate(content)
```
## How It Works

### Generation Process

1. **Scan for markdown files** - Finds all `.md` files in the specified directory
2. **Extract metadata** - Gets title and description from each file
3. **Prioritize docs** - Orders by importance (README first, then guides, APIs, etc.)
4. **Build llms.txt** - Creates properly formatted output with links and descriptions
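The metadata-extraction step can be sketched in a few lines. This is a hypothetical helper for illustration, not the gem's actual implementation: it treats the first `#` heading as the title and the first non-heading, non-list line as the description.

```ruby
# Hypothetical sketch of metadata extraction (illustration only):
# title = first level-1 heading, description = first plain paragraph line.
def extract_metadata(markdown)
  lines = markdown.lines.map(&:chomp)
  title = lines.find { |l| l.start_with?('# ') }&.delete_prefix('# ')
  # First non-empty line that is not a heading, blockquote, or list item.
  description = lines.find do |l|
    !l.empty? && !l.match?(/\A(#|>|[-*]\s|\d+\.\s)/)
  end
  { title: title, description: description }
end
```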
### Transformation Process

1. **Expand relative links** - Convert `./docs/api.md` to `https://myproject.io/docs/api.md`
2. **Convert URLs** - Change `.html` links to `.md` for better AI understanding
3. **Preserve content** - No content modification, just link processing
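The link-handling steps can be sketched as a single pass over the markdown links. `expand_links` here is a hypothetical illustration of the idea, not the gem's code:

```ruby
require 'uri'

# Hypothetical sketch: rewrite relative markdown link targets against a
# base URL, and optionally map .html targets to .md. Illustration only.
def expand_links(markdown, base_url, convert_urls: false)
  markdown.gsub(/\[([^\]]+)\]\(([^)]+)\)/) do
    text, target = Regexp.last_match(1), Regexp.last_match(2)
    # Only relative targets are resolved against the base URL.
    target = URI.join("#{base_url}/", target).to_s unless target.match?(%r{\Ahttps?://})
    target = target.sub(/\.html\z/, '.md') if convert_urls
    "[#{text}](#{target})"
  end
end
```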
### File Prioritization

When generating llms.txt, files are automatically prioritized:

1. **README files** - Always listed first
2. **Getting Started guides** - Quick start documentation
3. **Guides and tutorials** - Step-by-step content
4. **API references** - Technical documentation
5. **Other files** - Everything else
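That ordering can be sketched as a simple sort key. `priority_for` is a hypothetical helper illustrating the idea, not the gem's actual logic:

```ruby
# Hypothetical sketch of the prioritization order (illustration only).
def priority_for(filename)
  case File.basename(filename).downcase
  when /\Areadme/            then 0
  when /getting[-_]?started/ then 1
  when /guide|tutorial/      then 2
  when /api/                 then 3
  else 4
  end
end

files = %w[api-reference.md getting-started.md README.md tutorial.md notes.md]
sorted = files.sort_by { |f| priority_for(f) }
# sorted => ["README.md", "getting-started.md", "tutorial.md", "api-reference.md", "notes.md"]
```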
## Example Output

Given a `docs/` directory with:

- `README.md`
- `getting-started.md`
- `api-reference.md`

Running `llms-txt generate --docs ./docs --base-url https://myproject.io` creates:

```markdown
# My Project

> This is a Ruby library that helps developers build amazing applications with a clean, simple API.

## Documentation

- [README](https://myproject.io/README.md): Complete overview and installation instructions
- [Getting Started](https://myproject.io/getting-started.md): Quick start guide with examples
- [API Reference](https://myproject.io/api-reference.md): Detailed API documentation and method signatures
```
## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/mensfeld/llms-txt-ruby.
## License

The gem is available as open source under the terms of the MIT License.