Module: Jekyll::Algolia::FileBrowser
- Includes: Jekyll::Algolia
- Defined in: lib/jekyll/algolia/file_browser.rb
Overview
Module to get information about a Jekyll file. Jekyll handles posts, pages, collections, etc. Each needs specific processing, so knowing which kind of file we’re working on helps.
We also do not index all files. This module will help in defining which files should be indexed and which should not.
Constant Summary
Constants included from Jekyll::Algolia
MissingCredentialsError, VERSION
Class Method Summary
- .absolute_path(filepath) ⇒ Object
  Public: Return the absolute path of a Jekyll file.
- .allowed_extension?(file) ⇒ Boolean
  Public: Check if the file has one of the allowed extensions.
- .categories(file) ⇒ Object
  Public: Returns the list of categories of a file, defaults to an empty array.
- .collection(file) ⇒ Object
  Public: Returns the name of the collection.
- .date(file) ⇒ Object
  Public: Returns a timestamp of the file date.
- .excerpt_html(file) ⇒ Object
  Public: Returns the HTML version of the excerpt.
- .excerpt_raw(file) ⇒ Object
  Public: Returns the raw excerpt of a file, directly as returned by Jekyll.
- .excerpt_text(file) ⇒ Object
  Public: Returns the text version of the excerpt.
- .excluded_from_config?(file) ⇒ Boolean
  Public: Check if the file has been excluded by `files_to_exclude`.
- .excluded_from_hook?(file) ⇒ Boolean
  Public: Check if the file has been excluded by running a custom user hook.
- .indexable?(file) ⇒ Boolean
  Public: Check if the file should be indexed.
- .is_404?(file) ⇒ Boolean
  Public: Check if the file is a 404 error page.
- .metadata(file) ⇒ Object
  Public: Return a hash of all the file metadata.
- .raw_data(file) ⇒ Object
  Note that even if you define tags and categories in a collection item, they will not be included in the data.
- .redirect?(file) ⇒ Boolean
  Public: Check if the file is a redirect page.
- .relative_path(filepath) ⇒ Object
  Public: Return the path of a Jekyll file relative to the Jekyll source.
- .slug(file) ⇒ Object
  Public: Returns the slug of the file.
- .static_file?(file) ⇒ Boolean
  Public: Check if the specified file is a static Jekyll asset.
- .tags(file) ⇒ Object
  Public: Returns the list of tags of a file, defaults to an empty array.
- .type(file) ⇒ Object
  Public: Get the type of the document (page, post, collection, etc).
- .url(file) ⇒ Object
  Public: Returns the url of the file, starting from the root.
- .use_default_excerpt?(file) ⇒ Boolean
  Public: Return true if the Jekyll default excerpt should be used for this file.
Methods included from Jekyll::Algolia
init, load_overwrites, run, site
Class Method Details
.absolute_path(filepath) ⇒ Object
Public: Return the absolute path of a Jekyll file
filepath - The path of the Jekyll file to inspect
# File 'lib/jekyll/algolia/file_browser.rb', line 21

def self.absolute_path(filepath)
  pathname = Pathname.new(filepath)
  return pathname.cleanpath.to_s if pathname.absolute?

  File.expand_path(File.join(Configurator.get('source'), filepath))
end
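The two branches above can be tried in isolation with a minimal sketch. It assumes a Unix-style filesystem and a hard-coded source directory. `absolute_path_sketch` and its `source` argument are hypothetical names standing in for the real method and for `Configurator.get('source')`.

```ruby
require 'pathname'

# Sketch only: `source` is a hypothetical stand-in for
# Configurator.get('source')
def absolute_path_sketch(filepath, source = '/tmp/my-site')
  pathname = Pathname.new(filepath)
  # Absolute paths are only normalized ("..", "." segments are collapsed)
  return pathname.cleanpath.to_s if pathname.absolute?

  # Relative paths are resolved against the Jekyll source directory
  File.expand_path(File.join(source, filepath))
end

absolute_path_sketch('/tmp/my-site/posts/../index.md') # => "/tmp/my-site/index.md"
absolute_path_sketch('./about.md')                     # => "/tmp/my-site/about.md"
```

Neither branch touches the disk, so the file does not need to exist for the path to be computed.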
.allowed_extension?(file) ⇒ Boolean
Public: Check if the file has one of the allowed extensions
file - The Jekyll file
Jekyll can transform markdown files to HTML by default. With plugins, it can convert many more file formats. By default we’ll only index markdown and raw HTML files but this list can be extended using the `extensions_to_index` config option.
# File 'lib/jekyll/algolia/file_browser.rb', line 112

def self.allowed_extension?(file)
  extensions = Configurator.extensions_to_index
  extname = File.extname(file.path)[1..-1]
  extensions.include?(extname)
end
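A small sketch of the extension check, assuming a hard-coded list. `EXTENSIONS_SKETCH` is a hypothetical stand-in for whatever `Configurator.extensions_to_index` returns and is not the plugin's actual default list.

```ruby
# Hypothetical list standing in for Configurator.extensions_to_index
EXTENSIONS_SKETCH = %w[html md].freeze

def allowed_extension_sketch?(path)
  # File.extname returns ".md"; [1..-1] drops the leading dot.
  # Without an extension, extname is "" and the slice is nil,
  # which is never part of the list.
  extname = File.extname(path)[1..-1]
  EXTENSIONS_SKETCH.include?(extname)
end

allowed_extension_sketch?('about.md')        # => true
allowed_extension_sketch?('assets/logo.png') # => false
allowed_extension_sketch?('Makefile')        # => false
```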
.categories(file) ⇒ Object
Public: Returns the list of categories of a file, defaults to an empty array
file - The Jekyll file
# File 'lib/jekyll/algolia/file_browser.rb', line 232

def self.categories(file)
  file.data['categories'] || []
end
.collection(file) ⇒ Object
Public: Returns the name of the collection
file - The Jekyll file
Only collection documents can have a collection name. Pages don’t. Posts are purposefully excluded as well, even though they are technically part of a collection.
# File 'lib/jekyll/algolia/file_browser.rb', line 351

def self.collection(file)
  return nil unless file.respond_to?(:collection)

  collection_name = file.collection.label

  # Posts are a special kind of collection, but it's an implementation
  # detail from my POV, so I'll exclude them
  return nil if collection_name == 'posts'

  collection_name
end
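To see the three possible outcomes without loading Jekyll, the logic can be exercised with plain structs. `CollectionStub`, `DocumentStub` and `PageStub` are hypothetical stand-ins, not Jekyll classes.

```ruby
# Hypothetical stand-ins: a collection with a label, a document that
# belongs to one, and a page that has no collection at all
CollectionStub = Struct.new(:label)
DocumentStub = Struct.new(:collection)
PageStub = Struct.new(:path)

def collection_sketch(file)
  # Pages do not respond to #collection
  return nil unless file.respond_to?(:collection)

  label = file.collection.label
  # Posts are deliberately reported as having no collection
  label == 'posts' ? nil : label
end

collection_sketch(DocumentStub.new(CollectionStub.new('recipes'))) # => "recipes"
collection_sketch(DocumentStub.new(CollectionStub.new('posts')))   # => nil
collection_sketch(PageStub.new('about.md'))                        # => nil
```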
.date(file) ⇒ Object
Public: Returns a timestamp of the file date
file - The Jekyll file
Posts have their date coming from the filepath, or the front-matter. Pages and other collection items can only have a date set in front-matter.
# File 'lib/jekyll/algolia/file_browser.rb', line 243

def self.date(file)
  # Collections get their date from .date, while pages read it from .data.
  # Jekyll by default will set the date of collection to the current date,
  # but we monkey-patched that so it returns nil for collection items
  date = if file.respond_to?(:date)
           file.date
         else
           file.data['date']
         end

  return nil if date.nil?

  # If date is a string, we try to parse it
  if date.is_a? String
    begin
      date = Time.parse(date)
    rescue StandardError
      return nil
    end
  end

  date.to_time.to_i
end
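The normalization step (nil, String, or a date-like object in, Unix timestamp out) can be sketched on its own. `timestamp_sketch` is a hypothetical helper that takes the date value directly instead of a Jekyll file.

```ruby
require 'date'
require 'time'

# Sketch of the normalization only; looking up the value on the file
# (file.date or file.data['date']) is left out
def timestamp_sketch(date)
  return nil if date.nil?

  # Strings are parsed; anything Time.parse rejects yields nil
  if date.is_a?(String)
    begin
      date = Time.parse(date)
    rescue StandardError
      return nil
    end
  end

  # Time, Date and DateTime all respond to #to_time
  date.to_time.to_i
end

timestamp_sketch('1970-01-01T00:00:00Z') # => 0
timestamp_sketch('not a date')           # => nil
timestamp_sketch(nil)                    # => nil
```

A `Date` without a time component is converted at local midnight, so the resulting timestamp depends on the machine's time zone.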
.excerpt_html(file) ⇒ Object
Public: Returns the HTML version of the excerpt
file - The Jekyll file
# File 'lib/jekyll/algolia/file_browser.rb', line 307

def self.excerpt_html(file)
  # If it's a post with a custom separator for the excerpt, we honor it
  return excerpt_raw(file) if use_default_excerpt?(file)

  # Otherwise we take the first matching node
  html = file.content
  selector = Configurator.algolia('nodes_to_index')
  first_node = Nokogiri::HTML(html).css(selector).first
  return nil if first_node.nil?

  first_node.to_s
end
.excerpt_raw(file) ⇒ Object
Public: Returns the raw excerpt of a file, directly as returned by Jekyll. Swallow any error that could occur when reading.
file - The Jekyll file
This might throw an exception if the excerpt is invalid. We also silence all logger output as Jekyll is quite verbose and will display the potential Liquid error in the terminal, even if we catch the actual error.
# File 'lib/jekyll/algolia/file_browser.rb', line 276

def self.excerpt_raw(file)
  Logger.silent do
    return file.data['excerpt'].to_s.strip
  end
rescue StandardError
  nil
end
.excerpt_text(file) ⇒ Object
Public: Returns the text version of the excerpt
file - The Jekyll file
Only collections (including posts) have an excerpt. Pages don’t.
# File 'lib/jekyll/algolia/file_browser.rb', line 325

def self.excerpt_text(file)
  html = excerpt_html(file)
  Utils.html_to_text(html)
end
.excluded_from_config?(file) ⇒ Boolean
Public: Check if the file has been excluded by `files_to_exclude`
file - The Jekyll file
# File 'lib/jekyll/algolia/file_browser.rb', line 121

def self.excluded_from_config?(file)
  excluded_patterns = Configurator.algolia('files_to_exclude')
  jekyll_source = Configurator.get('source')
  path = absolute_path(file.path)

  excluded_patterns.each do |pattern|
    pattern = File.expand_path(File.join(jekyll_source, pattern))
    return true if File.fnmatch(pattern, path, File::FNM_PATHNAME)
  end
  false
end
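The effect of `File::FNM_PATHNAME` is easiest to see on a concrete pattern. The sketch below assumes a Unix-style filesystem and uses `any?` rather than the early return of the method above. `excluded_sketch?`, its arguments and the sample pattern are hypothetical.

```ruby
# Sketch only: `patterns` and `source` are hypothetical stand-ins for
# Configurator.algolia('files_to_exclude') and Configurator.get('source')
def excluded_sketch?(path, patterns, source)
  patterns.any? do |pattern|
    # Patterns are anchored to the Jekyll source, like the file path
    absolute_pattern = File.expand_path(File.join(source, pattern))
    # FNM_PATHNAME keeps `*` from matching across "/" separators
    File.fnmatch(absolute_pattern, path, File::FNM_PATHNAME)
  end
end

patterns = ['_drafts/*.md']
excluded_sketch?('/tmp/my-site/_drafts/wip.md', patterns, '/tmp/my-site')     # => true
excluded_sketch?('/tmp/my-site/_drafts/old/wip.md', patterns, '/tmp/my-site') # => false
```

With this flag, a pattern such as `_drafts/*.md` excludes only files directly inside `_drafts`, not those in its subdirectories.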
.excluded_from_hook?(file) ⇒ Boolean
Public: Check if the file has been excluded by running a custom user hook
file - The Jekyll file
# File 'lib/jekyll/algolia/file_browser.rb', line 137

def self.excluded_from_hook?(file)
  Hooks.should_be_excluded?(file.path)
end
.indexable?(file) ⇒ Boolean
Public: Check if the file should be indexed
file - The Jekyll file
There are many reasons a file should not be indexed. We need to exclude all static assets and keep only the actual content.
# File 'lib/jekyll/algolia/file_browser.rb', line 51

def self.indexable?(file)
  return false if static_file?(file)
  return false if is_404?(file)
  return false if redirect?(file)
  return false unless allowed_extension?(file)
  return false if excluded_from_config?(file)
  return false if excluded_from_hook?(file)

  true
end
.is_404?(file) ⇒ Boolean
Public: Check if the file is a 404 error page
file - The Jekyll file
404 pages are not Jekyll defaults but a convention adopted by GitHub Pages. We don’t want to index those. Source: help.github.com/articles/creating-a-custom-404-page-for-your-github-pages-site/
# File 'lib/jekyll/algolia/file_browser.rb', line 80

def self.is_404?(file)
  ['404.md', '404.html'].include?(File.basename(file.path))
end
.metadata(file) ⇒ Object
Public: Return a hash of all the file metadata
file - The Jekyll file
It contains both the raw metadata extracted from the front-matter, as well as more specific fields like the collection name, date timestamp, slug, type and url
# File 'lib/jekyll/algolia/file_browser.rb', line 148

def self.metadata(file)
  raw_data = raw_data(file)
  specific_data = {
    collection: collection(file),
    tags: tags(file),
    categories: categories(file),
    date: date(file),
    excerpt_html: excerpt_html(file),
    excerpt_text: excerpt_text(file),
    slug: slug(file),
    type: type(file),
    url: url(file)
  }

  metadata = Utils.compact_empty(raw_data.merge(specific_data))

  metadata
end
.raw_data(file) ⇒ Object
Note that even if you define tags and categories in a collection item, they will not be included in the data. The value is always an empty array.
# File 'lib/jekyll/algolia/file_browser.rb', line 177

def self.raw_data(file)
  data = file.data.clone

  # Remove all keys where we have a specific getter
  data.each_key do |key|
    data.delete(key) if respond_to?(key)
  end
  data.delete('excerpt')

  # Delete other keys added by Jekyll that are not in the front-matter and
  # not needed for search
  data.delete('draft')
  data.delete('ext')

  # Convert all values to a version that can be serialized to JSON
  data = Utils.jsonify(data)

  # Convert all keys to symbols
  data = Utils.keys_to_symbols(data)

  data
end
.redirect?(file) ⇒ Boolean
Public: Check if the file is a redirect page
file - The Jekyll file
Plugins like jekyll-redirect-from add dynamic pages that only contain an HTML meta refresh. We need to exclude those files from indexing. github.com/jekyll/jekyll-redirect-from
# File 'lib/jekyll/algolia/file_browser.rb', line 92

def self.redirect?(file)
  # When using redirect_from, jekyll-redirect-from creates a page named
  # `redirect.html`
  return true if file.respond_to?(:name) && file.name == 'redirect.html'
  # When using redirect_to, it sets the layout to `redirect`
  if file.respond_to?(:data) && file.data['layout'] == 'redirect'
    return true
  end

  false
end
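Both detection paths rely on duck typing (`respond_to?`) rather than on a class check. A minimal sketch with plain structs shows each one. `NamedStub` and `DataStub` are hypothetical stand-ins for the objects jekyll-redirect-from would produce.

```ruby
# Hypothetical stand-ins: one object exposing only #name, one only #data
NamedStub = Struct.new(:name)
DataStub = Struct.new(:data)

def redirect_sketch?(file)
  # redirect_from: the generated page is named `redirect.html`
  return true if file.respond_to?(:name) && file.name == 'redirect.html'
  # redirect_to: the layout is set to `redirect`
  return true if file.respond_to?(:data) && file.data['layout'] == 'redirect'

  false
end

redirect_sketch?(NamedStub.new('redirect.html'))          # => true
redirect_sketch?(DataStub.new({ 'layout' => 'redirect' })) # => true
redirect_sketch?(DataStub.new({ 'layout' => 'post' }))     # => false
```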
.relative_path(filepath) ⇒ Object
Public: Return the path of a Jekyll file relative to the Jekyll source
filepath - The path of the Jekyll file to inspect
# File 'lib/jekyll/algolia/file_browser.rb', line 31

def self.relative_path(filepath)
  pathname = Pathname.new(filepath)
  config_source = Configurator.get('source') || ''
  jekyll_source = Pathname.new(File.expand_path(config_source))

  # Removing any starting ./
  if pathname.relative?
    fullpath = File.expand_path(File.join(jekyll_source, pathname))
    return fullpath.gsub(%r{^#{jekyll_source}/}, '')
  end

  pathname.relative_path_from(jekyll_source).cleanpath.to_s
end
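A standalone sketch of the same two branches, assuming a Unix-style filesystem. The `source` argument is a hypothetical stand-in for `Configurator.get('source')`, and `relative_path_sketch` is not part of the plugin.

```ruby
require 'pathname'

# Sketch only: `source` replaces Configurator.get('source')
def relative_path_sketch(filepath, source = '/tmp/my-site')
  pathname = Pathname.new(filepath)
  jekyll_source = Pathname.new(File.expand_path(source))

  if pathname.relative?
    # Expanding then stripping the source prefix removes a leading "./"
    fullpath = File.expand_path(File.join(jekyll_source, pathname))
    return fullpath.gsub(%r{^#{jekyll_source}/}, '')
  end

  # Absolute paths are expressed relative to the source directory
  pathname.relative_path_from(jekyll_source).cleanpath.to_s
end

relative_path_sketch('./about.md')                   # => "about.md"
relative_path_sketch('/tmp/my-site/_posts/hello.md') # => "_posts/hello.md"
```

The source path is interpolated into the regular expression unescaped, so a source directory containing regex metacharacters may need `Regexp.escape`.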
.slug(file) ⇒ Object
Public: Returns the slug of the file
file - The Jekyll file
Slugs can be automatically extracted from collections, but for other files, we have to create them from the basename
# File 'lib/jekyll/algolia/file_browser.rb', line 336

def self.slug(file)
  # We get the real slug from the file data if available
  return file.data['slug'] if file.data.key?('slug')

  # We create it ourselves from the filepath otherwise
  File.basename(file.path, File.extname(file.path)).downcase
end
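The fallback from front-matter slug to file basename can be illustrated with a struct exposing only `path` and `data`. `SlugStub` is a hypothetical stand-in for a Jekyll file.

```ruby
# Hypothetical stand-in: only #path and #data are needed here
SlugStub = Struct.new(:path, :data)

def slug_sketch(file)
  # An explicit slug in the file data wins
  return file.data['slug'] if file.data.key?('slug')

  # Otherwise: basename without its extension, lowercased
  File.basename(file.path, File.extname(file.path)).downcase
end

slug_sketch(SlugStub.new('docs/Getting-Started.md', {}))                   # => "getting-started"
slug_sketch(SlugStub.new('docs/Getting-Started.md', { 'slug' => 'intro' })) # => "intro"
```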
.static_file?(file) ⇒ Boolean
Public: Check if the specified file is a static Jekyll asset
file - The Jekyll file
We don’t index static assets (js, css, images)
# File 'lib/jekyll/algolia/file_browser.rb', line 67

def self.static_file?(file)
  file.is_a?(Jekyll::StaticFile)
end
.tags(file) ⇒ Object
Public: Returns the list of tags of a file, defaults to an empty array
file - The Jekyll file
# File 'lib/jekyll/algolia/file_browser.rb', line 225

def self.tags(file)
  file.data['tags'] || []
end
.type(file) ⇒ Object
Public: Get the type of the document (page, post, collection, etc)
file - The Jekyll file
Pages are simple HTML and markdown documents in the tree. Elements from a collection are called Documents. Posts are a custom kind of Document.
# File 'lib/jekyll/algolia/file_browser.rb', line 207

def self.type(file)
  type = file.class.name.split('::')[-1].downcase

  type = 'post' if type == 'document' && file.collection.label == 'posts'

  type
end
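Because the type is derived from the class name, the behaviour can be sketched with namespaced stand-in classes. The `TypeStub` module and its members are hypothetical and only mimic the naming of `Jekyll::Page` and `Jekyll::Document`.

```ruby
# Hypothetical namespace mimicking Jekyll::Page / Jekyll::Document
module TypeStub
  Page = Struct.new(:path)
  Collection = Struct.new(:label)
  Document = Struct.new(:path, :collection)
end

def type_sketch(file)
  # "TypeStub::Document" -> "document"
  type = file.class.name.split('::')[-1].downcase
  # Documents of the `posts` collection are reported as posts
  type = 'post' if type == 'document' && file.collection.label == 'posts'
  type
end

posts = TypeStub::Collection.new('posts')
recipes = TypeStub::Collection.new('recipes')
type_sketch(TypeStub::Page.new('about.md'))              # => "page"
type_sketch(TypeStub::Document.new('a.md', posts))       # => "post"
type_sketch(TypeStub::Document.new('pasta.md', recipes)) # => "document"
```

In this sketch, items from a custom collection come back as `"document"`; the collection name itself is what `.collection` reports.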
.url(file) ⇒ Object
Public: Returns the url of the file, starting from the root
file - The Jekyll file
# File 'lib/jekyll/algolia/file_browser.rb', line 218

def self.url(file)
  file.url
end
.use_default_excerpt?(file) ⇒ Boolean
Public: Return true if the Jekyll default excerpt should be used for this file
file - The Jekyll file
Most of the time, we’ll use our own excerpt (the first matching element), but in some cases we’ll fall back to Jekyll’s default excerpt if it seems to be what the user wants.
# File 'lib/jekyll/algolia/file_browser.rb', line 292

def self.use_default_excerpt?(file)
  # Only posts can have excerpt
  return false unless type(file) == 'post'

  # User defined their own separator in the config
  custom_separator = file.excerpt_separator.to_s.strip
  return false if custom_separator.empty?

  # This specific post contains this separator
  file.content.include?(custom_separator)
end
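The separator checks can be sketched independently of Jekyll. The example below leaves out the `type(file) == 'post'` guard and assumes the object is already a post. `PostStub` and the `<!--more-->` separator are hypothetical.

```ruby
# Hypothetical stand-in for a post: raw content plus its configured
# excerpt separator (which may be nil or blank)
PostStub = Struct.new(:content, :excerpt_separator)

def use_default_excerpt_sketch?(post)
  # No separator configured: use our own excerpt
  custom_separator = post.excerpt_separator.to_s.strip
  return false if custom_separator.empty?

  # Separator configured: honor it only if this post contains it
  post.content.include?(custom_separator)
end

use_default_excerpt_sketch?(PostStub.new("Intro\n<!--more-->\nRest", '<!--more-->')) # => true
use_default_excerpt_sketch?(PostStub.new('Intro only', '<!--more-->'))               # => false
use_default_excerpt_sketch?(PostStub.new('Intro only', nil))                         # => false
```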