Class: Atlas2ipynb::Converter
- Inherits:
-
Object
- Object
- Atlas2ipynb::Converter
- Defined in:
- lib/atlas2ipynb.rb
Instance Method Summary collapse
-
#convert! ⇒ Object
************************************************************************************* Convert all chapter files in the directory into ipynb *************************************************************************************.
-
#html_to_ipynb(fn) ⇒ Object
************************************************************************************* This function takes a file name with HTML content, parses it with nokogiri, does some post-processing on the image links, and then calls process_section to convert each element to the corrseponding ipynb cell type (markdown, header, or code) *************************************************************************************.
-
#process_section(n, level, out) ⇒ Object
************************************************************************************* This function processes the raw HTML sections using nokogiri.
-
#string_to_filename(s) ⇒ Object
************************************************************************************* ipynb uses the plain filename as the user’s index page, so we need to make something a name that is “meaningful”.
Instance Method Details
#convert! ⇒ Object
************************************************************************************* Convert all chapter files in the directory into ipynb *************************************************************************************
128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 |
# File 'lib/atlas2ipynb.rb', line 128 def convert! puts "Searching for ch*.html files from Atlas yeah" Dir["ch*.html"].each do |fn| out = html_to_ipynb(fn) # Compute the new filename, which is the original filename # with the ".html" (last 5 chars) replaced with ".ipynb". title_fn = self.string_to_filename(out['metadata']['name']) ipynb_fn = "#{fn[0,fn.length-5]}_#{title_fn}.ipynb" puts "... Converting #{fn} to #{ipynb_fn}" # Create the file f = File.open(ipynb_fn, 'w') f.write JSON.pretty_generate(out) f.close end puts "Done!" end |
#html_to_ipynb(fn) ⇒ Object
************************************************************************************* This function takes a file name with HTML content, parses it with nokogiri, does some post-processing on the image links, and then calls process_section to convert each element to the corrseponding ipynb cell type (markdown, header, or code) *************************************************************************************
89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 |
# File 'lib/atlas2ipynb.rb', line 89 def html_to_ipynb(fn) # # Open the file and parse it w/nokogiri doc = Nokogiri::HTML(IO.read(fn), nil, 'utf-8') # # Pre-process the doc to fix image URLs so that images can be served by the notebook server # You do this by prepending "files/" to the image's relative URL, per this question on stackoverflow: # "inserting image into ipython notebook markdown" doc.css("figure img").each do |img| src = img.attributes["src"].value img.attributes["src"].value = src.split("/").unshift("files").join("/") #prepends "files" to the src end # # Grab the first h1 tag to use as part of the notebooks filename # chapter_title = doc.css("section h1").first.text # # combine the cells we just computed with the ipynb header information # notebook = { "metadata" => { "name" => chapter_title }, "nbformat" => 3, "nbformat_minor"=> 0, "worksheets" => [ { "cells" => self.process_section(doc.css("section").first,1, []), "metadata" => {} } ] } return notebook end |
#process_section(n, level, out) ⇒ Object
************************************************************************************* This function processes the raw HTML sections using nokogiri. It looks for headers or code; everything else is treated as HTML. This can be passed to ipynb directly since markdown is a superset of HTML. Although I’d originally planned to convert HTML to markdown, this proved infeasible with all the many edge cases in conversion, such has mathml *************************************************************************************
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 |
# File 'lib/atlas2ipynb.rb', line 44 def process_section(n, level, out) n.children.each do |c| case c.name when "section", "span", "div" process_section(c, level+1, out) # since ipynb is flat, we need to recurse on sections when "h1", "h2", "h3", "h4", "h4", "h5", "h6" out << { "cell_type" => "heading", "level" => level, "metadata" => {}, "source" => c.text } when "pre", "code" # test for the code type. If it's not python, then insert the # appropriate cell magics listing = c.text code_type = c.attributes["data-code-language"] || "python" if code_type != "python" listing = "%%#{code_type}\n#{listing}" end out << { "cell_type" => "code", "collapsed" => false, "input" => listing, "language" => code_type, "metadata" => {}, "outputs" => [] } else out << { "cell_type" => "markdown", "metadata" => {}, "source" => c.to_s } end end return out end |
#string_to_filename(s) ⇒ Object
************************************************************************************* ipynb uses the plain filename as the user’s index page, so we need to make something a name that is “meaningful”. To do this, I turn the <h1> from the first section, which should be the chapter title, into a filename, per
http://stackoverflow.com/questions/1939333/how-to-make-a-ruby-string-safe-for-a-filesystem
Also, since we might have special chars and foreign language isseus, we transliterate the string to strip out accents and such, per:
http://stackoverflow.com/questions/225471/how-do-i-replace-accented-latin-characters-in-ruby
(I’m not sure what it will do with CJKV languages.) Finally, it ensures that the string is less than 50 chars long, but breaks it on a word boundry so that it looks “nice” *************************************************************************************
26 27 28 29 30 31 32 33 34 |
# File 'lib/atlas2ipynb.rb', line 26 def string_to_filename(s) I18n.enforce_available_locales = false out = ActiveSupport::Inflector.transliterate(s).downcase out.gsub!(/^.*(\\|\/)/,'') out.gsub!(/[^0-9A-Za-z]/,"_") # truncate the name at 50 chars, but do it "nicely" candidate = out[0,50].split("_") - [""] return candidate.join("_") end |