SnapSearch-Client-Ruby
SnapSearch-Client-Ruby is a Ruby-based, framework-agnostic HTTP client library for SnapSearch (https://snapsearch.io/).
SnapSearch provides similar libraries in other languages: https://github.com/SnapSearch/Snapsearch-Clients
Installation
Usage
Development
Get the bundler dependency management tool.
gem install bundler
Install/update all dependencies:
bundle install
See all build tasks:
bundle exec rake -T
Make your changes. Release a new version tag with the command below (other rake version:bump:... tasks are also available):
bundle exec rake version:bump
Synchronise and push the tag to GitHub:
git push
git push --
Create the gem package:
bundle exec rake gem
Push the gem to RubyGems:
gem push pkg/snapsearch-client-ruby-MAJOR.MINOR.PATCH.gem
Setting Up the Detector
The Detector class determines whether an incoming request came from a search engine robot. It decides whether to intercept the request by applying the following checks in cascading order:
- on a GET request
- on an HTTP or HTTPS protocol
- not on any ignored robot user agents
- not on any route not matching the whitelist
- not on any route matching the blacklist
- not on any static file that is not a PHP file, if one is detected
- on requests with an _escaped_fragment_ query parameter
- on any matched robot user agents
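The cascade above can be sketched in plain Ruby as follows. This is only an illustration of the ordering, not the library's implementation: SketchRequest, intercept?, agent_in?, and the STATIC_EXTENSIONS list are hypothetical names, the user-agent test is assumed to be a case-insensitive substring match, and the static file check is reduced to a file-extension lookup.

```ruby
# Hypothetical, simplified stand-in for an incoming request (not the library's type).
SketchRequest = Struct.new(:method, :scheme, :path, :query, :user_agent)

# Illustrative robot lists in the same shape as detector.robots.
ROBOTS = { 'match' => ['SomeRobot', 'AnotherRobot'], 'ignore' => ['SnapSearch'] }.freeze

# Assumed set of static file extensions; the real list may differ.
STATIC_EXTENSIONS = %w[.css .js .png .jpg .gif .ico].freeze

# True if any name in the list appears in the user agent (case-insensitive substring match).
def agent_in?(list, user_agent)
  ua = user_agent.to_s.downcase
  list.any? { |name| ua.include?(name.downcase) }
end

# Walks the checks in the documented order and stops at the first one that decides.
def intercept?(request, robots: ROBOTS, whitelist: [], blacklist: [])
  return false unless request.method == 'GET'
  return false unless %w[http https].include?(request.scheme)
  return false if agent_in?(robots['ignore'], request.user_agent)
  return false if !whitelist.empty? && whitelist.none? { |re| re.match?(request.path) }
  return false if blacklist.any? { |re| re.match?(request.path) }
  return false if STATIC_EXTENSIONS.include?(File.extname(request.path).downcase)
  return true if request.query.key?('_escaped_fragment_')

  agent_in?(robots['match'], request.user_agent)
end

robot   = SketchRequest.new('GET', 'https', '/', {}, 'SomeRobot/2.1')
browser = SketchRequest.new('GET', 'https', '/', { '_escaped_fragment_' => '' }, 'Mozilla/5.0')
ignored = SketchRequest.new('GET', 'https', '/', {}, 'SnapSearch')

intercept?(robot)                              # => true
intercept?(browser)                            # => true
intercept?(ignored)                            # => false
intercept?(robot, blacklist: [%r{\A/}])        # => false
```

Because each check returns early, the ignore list and the route filters take precedence over both the _escaped_fragment_ parameter and the matched user agents.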
You can customize a few aspects of this process:
User Agents
Most robots send a unique user-agent HTTP header that we match against to confirm whether the request indeed comes from a robot.
We also ignore certain user agents, such as the SnapSearch robot.
The list of user agents to match and ignore is contained in resources/robots.json. You can customize this list through the Detector instance you are working with:
# Retrieve the list of user agents to match and ignore:
detector.robots # => { 'match' => ['SomeRobot', 'AnotherRobot'], 'ignore' => ['SnapSearch'] }
# Add a user agent to match against:
detector.robots['match'] << 'NewRobot'
# Add a user agent to ignore:
detector.robots['ignore'] << 'MyRobot'
# Set a new list of user agents to match and ignore:
detector.robots = { 'match' => ['WebScraper', 'SillyBot'], 'ignore' => ['MyBotToIgnore'] }
# Load from a custom JSON file:
detector.robots_json = './my_robots.json'
detector.robots # => { 'match' => ['MyCustomBot', 'AnotherRobot'], 'ignore' => ['MyLoadedBotFromJSON'] }
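A custom file such as ./my_robots.json presumably holds a JSON object with the same two keys as the hash returned by detector.robots. The sketch below assumes that layout (the actual format of resources/robots.json is not shown here) and uses only Ruby's standard library to write such a file and read it back. The temporary file and the variable names are illustrative.

```ruby
require 'json'
require 'tempfile'

# Assumed file layout: a JSON object with 'match' and 'ignore' arrays,
# mirroring the hash shape returned by detector.robots.
custom_robots = {
  'match'  => ['MyCustomBot', 'AnotherRobot'],
  'ignore' => ['MyLoadedBotFromJSON']
}

# Write the lists to a temporary file standing in for ./my_robots.json.
file = Tempfile.new(['my_robots', '.json'])
file.write(JSON.pretty_generate(custom_robots))
file.close

# Reading the file back yields a hash with string keys, like detector.robots.
loaded = JSON.parse(File.read(file.path))
loaded['match']  # => ["MyCustomBot", "AnotherRobot"]
loaded['ignore'] # => ["MyLoadedBotFromJSON"]
```

Parsing the file yourself before assigning it to detector.robots_json is a cheap way to catch malformed JSON or a missing key early.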
Tests
Tests are written with RSpec. Run tests with bundle exec rspec spec/
