Module: Jekyll::Algolia::FileBrowser

Includes:: Jekyll::Algolia

Defined in:: lib/jekyll/algolia/file_browser.rb

Overview

Module to get information about a Jekyll file. Jekyll handles posts, pages, collections, etc. Each kind needs specific processing, so knowing which kind of file we’re working on helps.

We also do not index all files. This module will help in defining which files should be indexed and which should not.

Constant Summary

Constants included from Jekyll::Algolia

VERSION

Class Method Summary

.absolute_path(filepath) ⇒ Object

Public: Return the absolute path of a Jekyll file.
.allowed_extension?(file) ⇒ Boolean

Public: Check if the file has one of the allowed extensions.
.categories(file) ⇒ Object

Public: Returns the list of categories of a file, defaults to an empty array.
.collection(file) ⇒ Object

Public: Returns the name of the collection.
.date(file) ⇒ Object

Public: Returns a timestamp of the file date.
.excerpt_html(file) ⇒ Object

Public: Returns the HTML version of the excerpt.
.excerpt_raw(file) ⇒ Object

Public: Returns the raw excerpt of a file, directly as returned by Jekyll.
.excerpt_text(file) ⇒ Object

Public: Returns the text version of the excerpt.
.excluded_from_config?(file) ⇒ Boolean

Public: Check if the file has been excluded by `files_to_exclude`.
.excluded_from_hook?(file) ⇒ Boolean

Public: Check if the file has been excluded by running a custom user hook.
.indexable?(file) ⇒ Boolean

Public: Check if the file should be indexed.
.is_404?(file) ⇒ Boolean

Public: Check if the file is a 404 error page.
.metadata(file) ⇒ Object

Public: Return a hash of all the file metadata.
.raw_data(file) ⇒ Object

Note that even if you define tags and categories in a collection item, they will not be included in the data.
.redirect?(file) ⇒ Boolean

Public: Check if the file is a redirect page.
.relative_path(filepath) ⇒ Object

Public: Return the path of a Jekyll file relative to the Jekyll source.
.slug(file) ⇒ Object

Public: Returns the slug of the file.
.static_file?(file) ⇒ Boolean

Public: Check if the specified file is a static Jekyll asset.
.tags(file) ⇒ Object

Public: Returns the list of tags of a file, defaults to an empty array.
.type(file) ⇒ Object

Public: Get the type of the document (page, post, collection, etc).
.url(file) ⇒ Object

Public: Returns the url of the file, starting from the root.
.use_default_excerpt?(file) ⇒ Boolean

Public: Return true if the Jekyll default excerpt should be used for this file.

Methods included from Jekyll::Algolia

init, load_overwrites, run, site

Class Method Details

.absolute_path(filepath) ⇒ `Object`

Public: Return the absolute path of a Jekyll file

filepath - The path of the Jekyll file to inspect

# File 'lib/jekyll/algolia/file_browser.rb', line 20

def self.absolute_path(filepath)
  pathname = Pathname.new(filepath)
  return pathname.cleanpath.to_s if pathname.absolute?

  File.expand_path(File.join(Configurator.get('source'), filepath))
end
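The resolution rule above can be tried outside Jekyll with a minimal sketch. The `source` argument is a hypothetical stand-in for `Configurator.get('source')`, and the paths are made up for illustration; absolute paths are only cleaned, while relative paths are resolved against the source directory.

```ruby
require 'pathname'

# Sketch only: `source` replaces Configurator.get('source') so that the
# method can run without a Jekyll site.
def absolute_path(filepath, source)
  pathname = Pathname.new(filepath)
  # Already absolute: normalize "." and ".." segments and return
  return pathname.cleanpath.to_s if pathname.absolute?

  # Relative: resolve against the Jekyll source directory
  File.expand_path(File.join(source, filepath))
end

puts absolute_path('_posts/2018-01-01-hello.md', '/home/user/blog')
puts absolute_path('/tmp/site/../site/index.html', '/home/user/blog')
```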

.allowed_extension?(file) ⇒ `Boolean`

Public: Check if the file has one of the allowed extensions

file - The Jekyll file

Jekyll can transform markdown files to HTML by default. With plugins, it can convert many more file formats. By default we’ll only index markdown and raw HTML files, but this list can be extended using the `extensions_to_index` config option.

Returns:

(Boolean)

# File 'lib/jekyll/algolia/file_browser.rb', line 103

def self.allowed_extension?(file)
  extensions = Configurator.algolia('extensions_to_index')
  extname = File.extname(file.path)[1..-1]
  extensions.include?(extname)
end
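As a rough illustration of the extension check, the sketch below takes a plain path string and an explicit list instead of reading `extensions_to_index` from the configuration. The default list `%w[html md]` is an assumption made for the example, not the plugin's documented default.

```ruby
# Sketch only: `extensions` stands in for
# Configurator.algolia('extensions_to_index').
def allowed_extension?(path, extensions = %w[html md])
  # File.extname returns ".md"; [1..-1] drops the leading dot.
  # Without an extension, extname is "" and ""[1..-1] is nil,
  # which is never part of the list.
  extname = File.extname(path)[1..-1]
  extensions.include?(extname)
end

puts allowed_extension?('about.md')
puts allowed_extension?('assets/logo.png')
```

A file with no extension at all is therefore rejected rather than raising an error.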

.categories(file) ⇒ `Object`

Public: Returns the list of categories of a file, defaults to an empty array

file - The Jekyll file




# File 'lib/jekyll/algolia/file_browser.rb', line 223

def self.categories(file)
  file.data['categories'] || []
end

.collection(file) ⇒ `Object`

Public: Returns the name of the collection

file - The Jekyll file

Only collection documents can have a collection name. Pages don’t. Posts are purposefully excluded as well, even though they are technically part of a collection.

# File 'lib/jekyll/algolia/file_browser.rb', line 331

def self.collection(file)
  return nil unless file.respond_to?(:collection)

  collection_name = file.collection.label

  # Posts are a special kind of collection, but it's an implementation
  # detail from my POV, so I'll exclude them
  return nil if collection_name == 'posts'

  collection_name
end

.date(file) ⇒ `Object`

Public: Returns a timestamp of the file date

file - The Jekyll file

Posts have their date coming from the filepath, or the front-matter. Pages and other collection items can only have a date set in front-matter.

# File 'lib/jekyll/algolia/file_browser.rb', line 234

def self.date(file)
  # Collections get their date from .date, while pages read it from .data.
  # Jekyll by default will set the date of collection to the current date,
  # but we overwrote this.
  date = if file.respond_to?(:date)
           file.date
         else
           file.data['date']
         end

  return nil if date.nil?
  date.to_time.to_i
end
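The conversion step at the end of the method can be isolated as follows. This is a minimal sketch with a hypothetical `timestamp` helper; it only shows how a `Time`, `Date` or `DateTime` value becomes a Unix timestamp, and how a missing date is passed through as `nil`.

```ruby
require 'date'

# Sketch only: mirrors the last two lines of the method above.
def timestamp(date)
  return nil if date.nil?

  # to_time is available on Time, Date and DateTime once 'date' is loaded.
  # A plain Date is interpreted at midnight in the local time zone.
  date.to_time.to_i
end

puts timestamp(Time.utc(2018, 3, 15, 12, 0, 0))
puts timestamp(nil).inspect
```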

.excerpt_html(file) ⇒ `Object`

Public: Returns the HTML version of the excerpt

file - The Jekyll file

# File 'lib/jekyll/algolia/file_browser.rb', line 288

def self.excerpt_html(file)
  # If it's a post with a custom separator for the excerpt, we honor it
  return excerpt_raw(file) if use_default_excerpt?(file)

  # Otherwise we take the first matching node
  html = file.content
  selector = Configurator.algolia('nodes_to_index')
  first_node = Nokogiri::HTML(html).css(selector).first
  return nil if first_node.nil?
  first_node.to_s
end

.excerpt_raw(file) ⇒ `Object`

Public: Returns the raw excerpt of a file, directly as returned by Jekyll. Swallow any error that could occur when reading.

file - The Jekyll file

This might throw an exception if the excerpt is invalid. We also silence all logger output as Jekyll is quite verbose and will display the potential Liquid error in the terminal, even if we catch the actual error.

# File 'lib/jekyll/algolia/file_browser.rb', line 257

def self.excerpt_raw(file)
  Logger.silent do
    return file.data['excerpt'].to_s.strip
  end
rescue StandardError
  nil
end

.excerpt_text(file) ⇒ `Object`

Public: Returns the text version of the excerpt

file - The Jekyll file

Only collections (including posts) have an excerpt. Pages don’t.

# File 'lib/jekyll/algolia/file_browser.rb', line 305

def self.excerpt_text(file)
  html = excerpt_html(file)
  Utils.html_to_text(html)
end

.excluded_from_config?(file) ⇒ `Boolean`

Public: Check if the file has been excluded by `files_to_exclude`

file - The Jekyll file

Returns:

(Boolean)

# File 'lib/jekyll/algolia/file_browser.rb', line 112

def self.excluded_from_config?(file)
  excluded_patterns = Configurator.algolia('files_to_exclude')
  jekyll_source = Configurator.get('source')
  path = absolute_path(file.path)

  excluded_patterns.each do |pattern|
    pattern = File.expand_path(File.join(jekyll_source, pattern))
    return true if File.fnmatch(pattern, path, File::FNM_PATHNAME)
  end
  false
end
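To see how the glob matching behaves, here is a minimal sketch that takes the patterns and the source directory as arguments instead of reading them from the configuration. The `excluded?` name, the `/site` directory and the patterns are all assumptions made for the example. With `File::FNM_PATHNAME`, a `*` does not cross a `/`, so a pattern only applies to the directory level it names.

```ruby
# Sketch only: `patterns` stands in for the `files_to_exclude` option and
# `source` for the Jekyll source directory.
def excluded?(path, patterns, source)
  full_path = File.expand_path(path, source)
  patterns.any? do |pattern|
    # Patterns are written relative to the source, so anchor them there
    full_pattern = File.expand_path(File.join(source, pattern))
    File.fnmatch(full_pattern, full_path, File::FNM_PATHNAME)
  end
end

puts excluded?('index.md', ['*.md'], '/site')
puts excluded?('docs/intro.md', ['*.md'], '/site')
puts excluded?('docs/intro.md', ['docs/*.md'], '/site')
```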

.excluded_from_hook?(file) ⇒ `Boolean`

Public: Check if the file has been excluded by running a custom user hook

file - The Jekyll file

Returns:

(Boolean)




# File 'lib/jekyll/algolia/file_browser.rb', line 128

def self.excluded_from_hook?(file)
  Hooks.should_be_excluded?(file.path)
end

.indexable?(file) ⇒ `Boolean`

Public: Check if the file should be indexed

file - The Jekyll file

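There are many reasons a file should not be indexed. We exclude all static assets, error and redirect pages, files with an extension we do not index, and files excluded by the configuration or by a user hook, and keep only the actual content.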

Returns:

(Boolean)

# File 'lib/jekyll/algolia/file_browser.rb', line 50

def self.indexable?(file)
  return false if static_file?(file)
  return false if is_404?(file)
  return false if redirect?(file)
  return false unless allowed_extension?(file)
  return false if excluded_from_config?(file)
  return false if excluded_from_hook?(file)

  true
end

.is_404?(file) ⇒ `Boolean`

Public: Check if the file is a 404 error page

file - The Jekyll file

404 pages are not Jekyll defaults but a convention adopted by GitHub Pages. We don’t want to index those. Source: help.github.com/articles/creating-a-custom-404-page-for-your-github-pages-site/

Returns:

(Boolean)




# File 'lib/jekyll/algolia/file_browser.rb', line 79

def self.is_404?(file)
  ['404.md', '404.html'].include?(File.basename(file.path))
end

.metadata(file) ⇒ `Object`

Public: Return a hash of all the file metadata

file - The Jekyll file

It contains both the raw metadata extracted from the front-matter and more specific fields such as the collection name, tags, categories, date timestamp, excerpts, slug, type and url.

# File 'lib/jekyll/algolia/file_browser.rb', line 139

def self.metadata(file)
  raw_data = raw_data(file)
  specific_data = {
    collection: collection(file),
    tags: tags(file),
    categories: categories(file),
    date: date(file),
    excerpt_html: excerpt_html(file),
    excerpt_text: excerpt_text(file),
    slug: slug(file),
    type: type(file),
    url: url(file)
  }

  metadata = Utils.compact_empty(raw_data.merge(specific_data))

  metadata
end

.raw_data(file) ⇒ `Object`

Note that even if you define tags and categories in a collection item, they will not be included in the data. Each is always an empty array.

# File 'lib/jekyll/algolia/file_browser.rb', line 168

def self.raw_data(file)
  data = file.data.clone

  # Remove all keys where we have a specific getter
  data.each_key do |key|
    data.delete(key) if respond_to?(key)
  end
  data.delete('excerpt')

  # Delete other keys added by Jekyll that are not in the front-matter and
  # not needed for search
  data.delete('draft')
  data.delete('ext')

  # Convert all values to a version that can be serialized to JSON
  data = Utils.jsonify(data)

  # Convert all keys to symbols
  data = Utils.keys_to_symbols(data)

  data
end
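The key filtering can be approximated with a small standalone sketch. It is not the module's implementation: the real method asks `respond_to?(key)` on the module itself, whereas this version uses an explicit, assumed list of getter names, and it skips the `Utils.jsonify` step entirely.

```ruby
# Sketch only: `getters` is an assumed list standing in for the
# respond_to?(key) check performed by the real method.
def raw_data(front_matter, getters: %w[collection tags categories date slug type url])
  # Keys handled by a getter, plus the keys Jekyll adds on its own
  ignored = getters + %w[excerpt draft ext]
  data = front_matter.reject { |key, _value| ignored.include?(key) }

  # Convert all keys to symbols
  data.transform_keys(&:to_sym)
end

front_matter = {
  'title' => 'Hello',
  'tags' => ['ruby'],
  'draft' => false,
  'author' => 'Jane'
}
p raw_data(front_matter)
```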

.redirect?(file) ⇒ `Boolean`

Public: Check if the file is a redirect page

file - The Jekyll file

Plugins like jekyll-redirect-from add dynamic pages that only contain an HTML meta refresh. We need to exclude those files from indexing. github.com/jekyll/jekyll-redirect-from

Returns:

(Boolean)




# File 'lib/jekyll/algolia/file_browser.rb', line 91

def self.redirect?(file)
  file.respond_to?(:name) && file.name == 'redirect.html'
end

.relative_path(filepath) ⇒ `Object`

Public: Return the path of a Jekyll file relative to the Jekyll source

filepath - The path of the Jekyll file to inspect

# File 'lib/jekyll/algolia/file_browser.rb', line 30

def self.relative_path(filepath)
  pathname = Pathname.new(filepath)
  config_source = Configurator.get('source') || ''
  jekyll_source = Pathname.new(File.expand_path(config_source))

  # Removing any starting ./
  if pathname.relative?
    fullpath = File.expand_path(File.join(jekyll_source, pathname))
    return fullpath.gsub(%r{^#{jekyll_source}/}, '')
  end

  pathname.relative_path_from(jekyll_source).cleanpath.to_s
end
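A simplified variant of the same idea is sketched below. It is an alternative, not the method shown above: it takes the source directory as an argument and relies on `Pathname#relative_path_from` for both cases, which avoids building a regular expression from the source path (one that would need escaping if the path contained special characters).

```ruby
require 'pathname'

# Sketch only: `source` stands in for Configurator.get('source').
def relative_path(filepath, source)
  jekyll_source = Pathname.new(File.expand_path(source))
  pathname = Pathname.new(filepath)

  # Resolve relative paths (including a leading "./") against the source
  pathname = Pathname.new(File.expand_path(filepath, jekyll_source.to_s)) if pathname.relative?

  pathname.relative_path_from(jekyll_source).cleanpath.to_s
end

puts relative_path('./about.md', '/site')
puts relative_path('/site/_posts/2018-01-01-hello.md', '/site')
```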

.slug(file) ⇒ `Object`

Public: Returns the slug of the file

file - The Jekyll file

Slugs can be automatically extracted from collections. For other files, we have to create them from the basename.

# File 'lib/jekyll/algolia/file_browser.rb', line 316

def self.slug(file)
  # We get the real slug from the file data if available
  return file.data['slug'] if file.data.key?('slug')

  # We create it ourselves from the filepath otherwise
  File.basename(file.path, File.extname(file.path)).downcase
end
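The fallback can be illustrated with a minimal sketch that takes a path and a front-matter hash instead of a Jekyll file object; both arguments and the sample values are assumptions made for the example.

```ruby
# Sketch only: `data` stands in for file.data and `path` for file.path.
def slug(path, data = {})
  # Use the slug from the file data when there is one
  return data['slug'] if data.key?('slug')

  # Otherwise derive it from the basename, without extension, lowercased
  File.basename(path, File.extname(path)).downcase
end

puts slug('docs/Getting-Started.md')
puts slug('docs/Getting-Started.md', { 'slug' => 'intro' })
```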

.static_file?(file) ⇒ `Boolean`

Public: Check if the specified file is a static Jekyll asset

file - The Jekyll file

We don’t index static assets (js, css, images)

Returns:

(Boolean)




# File 'lib/jekyll/algolia/file_browser.rb', line 66

def self.static_file?(file)
  file.is_a?(Jekyll::StaticFile)
end

.tags(file) ⇒ `Object`

Public: Returns the list of tags of a file, defaults to an empty array

file - The Jekyll file




# File 'lib/jekyll/algolia/file_browser.rb', line 216

def self.tags(file)
  file.data['tags'] || []
end

.type(file) ⇒ `Object`

Public: Get the type of the document (page, post, collection, etc)

file - The Jekyll file

Pages are simple HTML and markdown documents in the tree. Elements from a collection are called Documents. Posts are a custom kind of Document.

# File 'lib/jekyll/algolia/file_browser.rb', line 198

def self.type(file)
  type = file.class.name.split('::')[-1].downcase

  type = 'post' if type == 'document' && file.collection.label == 'posts'

  type
end
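The class-name lookup can be exercised without Jekyll using stand-in classes. Everything in the `FakeJekyll` module below is hypothetical and exists only so the sketch runs on its own; the `type` method body is the same as in the listing above.

```ruby
# Hypothetical stand-ins for Jekyll::Page and Jekyll::Document
module FakeJekyll
  Collection = Struct.new(:label)

  class Page; end

  class Document
    attr_reader :collection

    def initialize(label)
      @collection = Collection.new(label)
    end
  end
end

def type(file)
  # "FakeJekyll::Document" becomes "document"
  type = file.class.name.split('::')[-1].downcase

  # Documents of the "posts" collection are reported as posts
  type = 'post' if type == 'document' && file.collection.label == 'posts'

  type
end

puts type(FakeJekyll::Page.new)
puts type(FakeJekyll::Document.new('posts'))
puts type(FakeJekyll::Document.new('recipes'))
```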

.url(file) ⇒ `Object`

Public: Returns the url of the file, starting from the root

file - The Jekyll file




# File 'lib/jekyll/algolia/file_browser.rb', line 209

def self.url(file)
  file.url
end

.use_default_excerpt?(file) ⇒ `Boolean`

Public: Return true if the Jekyll default excerpt should be used for this file

file - The Jekyll file

Most of the time, we’ll use our own excerpt (the first matching element), but in some cases we’ll fall back to Jekyll’s default excerpt if it seems to be what the user wants.

Returns:

(Boolean)

# File 'lib/jekyll/algolia/file_browser.rb', line 273

def self.use_default_excerpt?(file)
  # Only posts can have excerpt
  return false unless type(file) == 'post'

  # User defined their own separator in the config
  custom_separator = file.excerpt_separator.to_s.strip
  return false if custom_separator.empty?

  # This specific post contains this separator
  file.content.include?(custom_separator)
end
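The three conditions can be sketched as a plain function over values rather than a Jekyll file. The parameter names and the `<!--more-->` separator are assumptions chosen for the example.

```ruby
# Sketch only: the arguments stand in for type(file),
# file.excerpt_separator and file.content.
def use_default_excerpt?(type, excerpt_separator, content)
  # Only posts are considered
  return false unless type == 'post'

  # A separator must be defined
  separator = excerpt_separator.to_s.strip
  return false if separator.empty?

  # The content must actually contain the separator
  content.include?(separator)
end

body = "Intro paragraph.\n<!--more-->\nRest of the post."
puts use_default_excerpt?('post', '<!--more-->', body)
puts use_default_excerpt?('page', '<!--more-->', body)
```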
