WebAuthor
WebAuthor is a Ruby gem that extracts author information from web pages using multiple strategies. It can detect authors from both meta tags and JSON-LD schema, providing a reliable way to identify content creators.
Features
- Extract author information from HTML meta tags
- Extract author information from JSON-LD schema (schema.org)
- Support for multiple authors in a single page
- Fallback strategy - tries different methods until an author is found
- Clean, type-safe code with Sorbet
Installation
Add this line to your application's Gemfile:
gem 'web-author'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install web-author
Usage
Basic Usage
require 'web_author'
# Create a new Page object with a URL
page = WebAuthor::Page.new(url: 'https://example.com/article')
# Get the author of the page
= page.
# => "John Doe"
WebAuthor will first try to find author information in JSON-LD schema data, then fall back to meta tags if needed.
Handling Multiple Authors
If a page has multiple authors in the JSON-LD schema, WebAuthor returns them as a comma-separated string:
page = WebAuthor::Page.new(url: 'https://example.com/collaboration-article')
= page.
# => "Jane Smith, Bob Johnson"
Error Handling
WebAuthor raises WebAuthor::Error when it encounters problems fetching the page:
begin
page = WebAuthor::Page.new(url: 'https://example.com/article')
= page.
rescue WebAuthor::Error => e
puts "Failed to get author: #{e.}"
end
How It Works
WebAuthor tries its extraction strategies in a fixed order and stops at the first one that finds an author:
- First, it tries to find author information in JSON-LD schema (often found in <script type="application/ld+json"> tags)
- If no author is found in JSON-LD, it looks for a meta tag with the name "author" (<meta name="author" content="Author Name">)
- If no author is found using any strategy, it returns nil
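The fallback order above can be sketched in plain Ruby. This is only an illustration of the idea, not WebAuthor's actual implementation: the lambdas, the hash-shaped page, and the first_author helper are all hypothetical names invented for the example.

```ruby
# Hypothetical stand-ins: each strategy is a callable that returns an author
# string, or nil when it finds nothing. These are not WebAuthor's classes.
json_ld_strategy  = ->(page) { page[:json_ld_author] }
meta_tag_strategy = ->(page) { page[:meta_author] }

# Order matters: JSON-LD is consulted before the meta tag.
strategies = [json_ld_strategy, meta_tag_strategy]

# Returns the first non-nil result, or nil when every strategy comes up empty.
def first_author(page, strategies)
  strategies.each do |strategy|
    author = strategy.call(page)
    return author unless author.nil?
  end
  nil
end

first_author({ json_ld_author: nil, meta_author: 'Author Name' }, strategies)
# => "Author Name"
first_author({}, strategies)
# => nil
```

Keeping the strategies in an ordered list means a new strategy only needs a position in that list, which fits the fallback behaviour described above.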
Supported Author Formats
Meta Tags
<meta name="author" content="Author Name" />
JSON-LD Schema
Single author:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"author": {
"@type": "Person",
"name": "Author Name"
}
}
</script>
Multiple authors:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"author": [
{
"@type": "Person",
"name": "First Author"
},
{
"@type": "Person",
"name": "Second Author"
}
]
}
</script>
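Both JSON-LD shapes above can be handled by one small routine. The following is a minimal sketch using only Ruby's standard json library; author_names is a hypothetical helper written for this example and is not part of WebAuthor's API. It takes the text content of a script tag and returns the names joined with a comma, matching the multiple-author output described under Usage.

```ruby
require 'json'

# Hypothetical helper (not WebAuthor's API): extracts author names from the
# text of one JSON-LD <script> tag.
def author_names(json_ld_text)
  data = JSON.parse(json_ld_text)
  return nil unless data.is_a?(Hash)

  authors = data['author']
  # Wrap a single author object so both shapes are processed the same way.
  authors = [authors] if authors.is_a?(Hash)

  # Keep only entries that are objects with a "name" key.
  names = Array(authors).filter_map { |a| a['name'] if a.is_a?(Hash) }
  names.empty? ? nil : names.join(', ')
end

single = '{"@type": "Article", "author": {"@type": "Person", "name": "Author Name"}}'
multiple = '{"@type": "Article", "author": [{"name": "First Author"}, {"name": "Second Author"}]}'

author_names(single)   # => "Author Name"
author_names(multiple) # => "First Author, Second Author"
```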
Requirements
- Ruby 3.4 or higher
- Nokogiri
- Sorbet Runtime
- Zeitwerk
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install.
Development Workflow
Type Checking with Sorbet
This project uses Sorbet for static type checking. To run the type checker:
$ bin/type-check
or directly:
$ bundle exec srb tc
Running Tests
Run all tests using:
$ bundle exec rake test
Run a specific test file:
$ bundle exec ruby -Ilib:test test/web_author/page_test.rb
Code Style and Linting
This project follows Ruby style guidelines enforced by RuboCop. Run the linter with:
$ bundle exec rubocop
Auto-fix issues when possible:
$ bundle exec rubocop -A
Running All Checks
The default Rake task runs both tests and RuboCop:
$ bundle exec rake
Working with Sorbet
WebAuthor uses Sorbet for static type checking. When adding new code:
- Add the comment # typed: strict at the top of the file
- Add type signatures to methods using sig blocks
- Run bin/type-check to verify type safety
Example of typed code:
extend T::Sig
sig { params(url: String).void }
def initialize(url:)
@url = T.let(url, String)
@content = T.let(nil, T.nilable(String))
end
sig { returns(T.nilable(String)) }
def
# method implementation
end
Adding a New Strategy
To add a strategy, create a new class that inherits from WebAuthor::Strategy and implement the author method.
Every strategy receives the parsed document through its initializer, as a Nokogiri::XML::Document object.
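As a rough sketch of what such a class could look like: the example below uses a stand-in base class (StrategyStandIn) so that it runs without the gem installed. The real WebAuthor::Strategy may store the document differently, so the @document instance variable is an assumption, and ArticleAuthorMetaStrategy with its article:author meta tag is a hypothetical strategy chosen only for illustration. In the gem itself you would inherit from WebAuthor::Strategy instead.

```ruby
# Stand-in for WebAuthor::Strategy so this sketch runs without the gem.
# Assumption: the real base class keeps the document it receives.
class StrategyStandIn
  def initialize(document)
    @document = document
  end

  def author
    raise NotImplementedError, "#{self.class} must implement #author"
  end
end

# Hypothetical strategy: reads <meta property="article:author" content="...">.
class ArticleAuthorMetaStrategy < StrategyStandIn
  def author
    # at_css returns the first matching node, or nil when there is none.
    node = @document.at_css('meta[property="article:author"]')
    content = node && node['content']
    return nil if content.nil? || content.strip.empty?

    content.strip
  end
end
```

Returning nil when nothing is found keeps the class compatible with the fallback behaviour described in How It Works, where the next strategy is tried.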
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Bug reports and pull requests are welcome on GitHub at https://github.com/lucianghinda/web_author.
License
The gem is available as open source under the terms of the MIT License.