HtmlAwareTruncation
Yet another Ruby HTML-aware truncation routine. It truncates HTML to a maximum number of text characters and produces HTML that is still legal, with no unclosed tags.
I was unable to find an existing solution that met my needs:
- Uses Nokogiri (because it's really good at handling somewhat invalid HTML input, and you probably already have it as a dependency).
- Does not monkey-patch Nokogiri, String, or anything else.
- Follows the semantics of the Rails truncate helper, including a custom :separator that can be a string or a regex, usually used for word boundaries.
Usage
require 'html_aware_truncation'
string = "<p>Lots of html <b>with bolded stuff</b></p>"
HtmlAwareTruncation.truncate_html(string, length: 10)
# => "<p>Lots of h…</p>"
HtmlAwareTruncation.truncate_html(string, length: 10, separator: /\b/)
# => "<p>Lots of …</p>"
HtmlAwareTruncation.truncate_html(string, length: 10, separator: /\b/, omission: '--')
# => "<p>Lots of --</p>"
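The character counts in these examples follow Rails' String#truncate semantics: the omission counts toward :length, and a :separator moves the cut back to its last match at or before the limit. The plain-text sketch below reproduces that arithmetic on the text content of the example string. plain_truncate is a hypothetical helper for illustration only. It is not part of this gem and ignores HTML entirely. Its "…" default mirrors the examples above; Rails itself defaults to "...".

```ruby
# Hypothetical helper (not this gem's API): a plain-text approximation of
# Rails' String#truncate, used only to show how the lengths above add up.
def plain_truncate(text, length:, omission: "…", separator: nil)
  # Nothing to do if the text already fits.
  return text.dup if text.length <= length

  # The omission counts toward the limit, so leave room for it.
  room = [length - omission.length, 0].max

  # With a separator, back up to its last match at or before `room`;
  # fall back to a hard cut if it never matches.
  stop = separator ? (text.rindex(separator, room) || room) : room

  "#{text[0, stop]}#{omission}"
end

text = "Lots of html with bolded stuff"
puts plain_truncate(text, length: 10)
puts plain_truncate(text, length: 10, separator: /\b/)
puts plain_truncate(text, length: 10, separator: /\b/, omission: "--")
```

With separator: /\b/, the last word boundary at or before index 9 is index 8, just after "of ". That is why both separator examples keep "Lots of " before the omission.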
If you already have a Nokogiri node, or want to do the Nokogiri parsing and serialization yourself, you can pass a single Nokogiri node to truncate_nokogiri_node. Often a Nokogiri::HTML::DocumentFragment makes sense:
node = Nokogiri::HTML::DocumentFragment.parse(some_html_str)
HtmlAwareTruncation.truncate_nokogiri_node(node, length: 10)
# => Returns a Nokogiri node; may mutate the original node passed in (not entirely sure).
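Conceptually, node-level truncation has to walk the text nodes in document order, spend a character budget, cut the text node that crosses the limit, and drop everything after it. Every element that survives is then still serialized with its closing tag. The sketch below shows that idea on a tiny hypothetical tree. It is not this gem's implementation, which works on Nokogiri nodes. The TextNode and ElementNode structs and the text_length, cut, truncate_tree, and to_html helpers are all made up for illustration.

```ruby
# Hypothetical minimal tree types, standing in for Nokogiri nodes.
TextNode = Struct.new(:content)
ElementNode = Struct.new(:name, :children)

def text_length(node)
  node.is_a?(TextNode) ? node.content.length : node.children.sum { |c| text_length(c) }
end

# Returns [new_node, remaining_budget, done]. New element nodes are built
# rather than mutating the input; unchanged text nodes are reused as-is.
def cut(node, budget, omission)
  if node.is_a?(TextNode)
    return [node, budget - node.content.length, false] if node.content.length <= budget
    # This text node crosses the limit: keep what fits, then add the omission.
    return [TextNode.new(node.content[0, budget] + omission), 0, true]
  end

  kept = []
  node.children.each do |child|
    new_child, budget, done = cut(child, budget, omission)
    kept << new_child
    # Stop at the cut point; later siblings are dropped entirely.
    return [ElementNode.new(node.name, kept), budget, true] if done
  end
  [ElementNode.new(node.name, kept), budget, false]
end

def truncate_tree(root, length:, omission: "…")
  return root if text_length(root) <= length
  # As with Rails truncate, the omission counts toward the length.
  cut(root, [length - omission.length, 0].max, omission).first
end

def to_html(node)
  # Text is emitted raw here; a real implementation must escape it.
  return node.content if node.is_a?(TextNode)
  inner = node.children.map { |child| to_html(child) }.join
  "<#{node.name}>#{inner}</#{node.name}>"
end

doc = ElementNode.new("p", [TextNode.new("Lots of html "),
                            ElementNode.new("b", [TextNode.new("with bolded stuff")])])
puts to_html(truncate_tree(doc, length: 10))
puts to_html(truncate_tree(doc, length: 20))
```

With length: 14, the budget runs out exactly at the end of the first text node, so this sketch produces "<p>Lots of html <b>…</b></p>". That is the same omission-in-its-own-node shape listed under Known problems below.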
For convenience, you can include the HtmlAwareTruncation module to get its methods as mixins.
require 'html_aware_truncation'
class Something
  include HtmlAwareTruncation
  def something
    truncate_html(whatever)
  end
end
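Letting the same methods be called both as HtmlAwareTruncation.truncate_html(...) and as an included truncate_html(...) is a common Ruby module pattern. One minimal way to get it is extend self, sketched below with a made-up ShoutHelpers module. Whether this gem uses extend self, module_function, or something else is an assumption here, not something the README states.

```ruby
# Hypothetical module (not this gem's source): `extend self` makes every
# instance method also callable directly on the module.
module ShoutHelpers
  extend self

  def shout(text)
    text.upcase + "!"
  end
end

# Mixing the module in gives instances the same method.
class Greeter
  include ShoutHelpers

  def greet(name)
    shout("hello #{name}")
  end
end

puts ShoutHelpers.shout("hi")
puts Greeter.new.greet("bob")
```

With module_function instead of extend self, the included copies become private instance methods. Calls from inside the class, like the example above, would still work.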
Known problems
This isn't perfect, but it's good enough for me to use in several production apps. In edge cases, it may sometimes:
- produce output a character (or a few) over the specified length limit (an off-by-one error, maybe?)
- put the omission mark in a node of its own, which is kind of silly:
"<p>Stuff <b>…</b></p>"
- leave one or more empty nodes at the end:
"<p>Stuff and...<b></b></p>"
- put the omission mark in a tag/node that really ought not to have text content:
"<ul><li>stuff</li>…</ul>"
Some specs marked pending demonstrate some "bad behavior", but there may be other cases that are untested.
In general, though, this has not caused me real problems in production; it works out. I still prefer it to the alternative gems I know about, so I packaged it up in case you do too. Patches welcome.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/jrochkind/html_aware_truncation.
License
The gem is available as open source under the terms of the MIT License.
Alternatives
I adapted some code or tests from some of these, mostly from an example in a blog post that is now available only in the Wayback Machine. These alternatives can also be useful to look at for ideas on how (or whether) they solve the known problems with this gem.