Ruby Emoji Regex
A pair of Ruby regular expressions for matching Unicode Emoji symbols.
Background
This is based upon the fantastic work in Mathias Bynens' emoji-regex JavaScript package. emoji-regex is cleverly assembled from data published by the Unicode Consortium.
The regular expressions provided herein are derived from that package.
Installation
gem install emoji_regex
Usage
emoji_regex provides two regular expressions:
- EmojiRegex::Regex matches emoji which present as emoji by default, and those which present as emoji when combined with U+FE0F VARIATION SELECTOR-16.
- EmojiRegex::Text matches emoji which present as text by default (regardless of variation selector), as well as those which present as emoji by default.
Emoji vs Text Presentation
Emoji_Presentation is a property of emoji symbols, defined in Unicode Technical Standard #51, which controls whether symbols are intended to be rendered as emoji by default.
Generally, for emoji which re-use Unicode code points that existed before emoji were introduced to Unicode, Emoji_Presentation is false.
This means they should be displayed as monochrome text characters by default, and should be combined with U+FE0F VARIATION SELECTOR-16 to indicate that emoji presentation is desired.
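As a rough illustration of the property itself, the check below uses Ruby's built-in Unicode property classes rather than this gem. It assumes a Ruby version whose regex engine supports `\p{Emoji_Presentation}` (2.5 or newer, as far as I know); the helper name `emoji_presentation?` is hypothetical and not part of emoji_regex.

```ruby
# Illustrative only: relies on Ruby's own Unicode property support,
# not on the EmojiRegex constants.

# Hypothetical helper: true when the single character has
# Emoji_Presentation=Yes in the Unicode data bundled with Ruby.
def emoji_presentation?(char)
  char.match?(/\A\p{Emoji_Presentation}\z/)
end

watch = "\u{231A}" # WATCH, introduced as an emoji-presentation character
arrow = "\u{2194}" # LEFT RIGHT ARROW, predates emoji in Unicode

puts emoji_presentation?(watch) # true
puts emoji_presentation?(arrow) # false
```

The arrow still counts as an emoji (`\p{Emoji}` matches it); it is only its default presentation that is text.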
EmojiRegex::Regex follows this Unicode Consortium guidance, while EmojiRegex::Text matches anything that someone might possibly consider a Unicode emoji.
It's most likely that the regular expression you want is EmojiRegex::Regex! ☺️
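To make the difference concrete without depending on the gem, here is a minimal sketch with two stand-in patterns built from Ruby's Unicode property classes. `STRICT_SKETCH` and `LOOSE_SKETCH` are hypothetical names and are much cruder than the real EmojiRegex::Regex and EmojiRegex::Text (for instance, `\p{Emoji}` also covers digits, `#` and `*`, and neither pattern handles modifier or ZWJ sequences). They only mirror the two policies described above.

```ruby
# Rough stand-ins, NOT the patterns shipped by emoji_regex.
# Strict: emoji by default, or any emoji followed by VARIATION SELECTOR-16.
STRICT_SKETCH = /\p{Emoji_Presentation}|\p{Emoji}\u{FE0F}/
# Loose: any emoji code point, with or without the variation selector.
LOOSE_SKETCH = /\p{Emoji}\u{FE0F}?/

# WATCH, bare LEFT RIGHT ARROW, LEFT RIGHT ARROW + VS16
sample = "\u{231A} \u{2194} \u{2194}\u{FE0F}"

strict_matches = sample.scan(STRICT_SKETCH)
loose_matches  = sample.scan(LOOSE_SKETCH)

puts "strict: #{strict_matches.length} matches" # strict: 2 matches
puts "loose:  #{loose_matches.length} matches"  # loose:  3 matches
```

The bare arrow is the only sequence the strict pattern skips, which is the same distinction the two real constants draw.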
Example
require 'emoji_regex'
text = <<TEXT
\u{231A}: ⌚ default emoji presentation character (Emoji_Presentation)
\u{2194}: ↔ default text presentation character
\u{2194}\u{FE0F}: ↔️ default text presentation character with Emoji variation selector
\u{1F469}: 👩 emoji modifier base (Emoji_Modifier_Base)
\u{1F469}\u{1F3FF}: 👩🏿 emoji modifier base followed by a modifier
TEXT
puts 'EmojiRegex::Regex'
text.scan EmojiRegex::Regex do |emoji|
  puts "Matched sequence #{emoji} - code points: #{emoji.length}"
end
puts ''
puts 'EmojiRegex::Text'
text.scan EmojiRegex::Text do |emoji|
  puts "Matched sequence #{emoji} - code points: #{emoji.length}"
end
Console output:
EmojiRegex::Regex
Matched sequence ⌚ - code points: 1
Matched sequence ⌚ - code points: 1
Matched sequence ↔️ - code points: 2
Matched sequence ↔️ - code points: 2
Matched sequence 👩 - code points: 1
Matched sequence 👩 - code points: 1
Matched sequence 👩🏿 - code points: 2
Matched sequence 👩🏿 - code points: 2

EmojiRegex::Text
Matched sequence ⌚ - code points: 1
Matched sequence ⌚ - code points: 1
Matched sequence ↔ - code points: 1
Matched sequence ↔ - code points: 1
Matched sequence ↔️ - code points: 2
Matched sequence ↔️ - code points: 2
Matched sequence 👩 - code points: 1
Matched sequence 👩 - code points: 1
Matched sequence 👩🏿 - code points: 2
Matched sequence 👩🏿 - code points: 2
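Each sequence appears twice because the heredoc expands the `\u{...}` escapes, so every line contains the sequence once from the escape and once as the literal character. The "code points" figure comes from String#length, which counts code points rather than bytes or visible glyphs. The sketch below shows the breakdown using only core Ruby; `code_points` is a hypothetical helper name.

```ruby
# Hypothetical helper: list the code points of a string in U+XXXX notation.
def code_points(sequence)
  sequence.codepoints.map { |cp| format('U+%04X', cp) }
end

arrow_emoji = "\u{2194}\u{FE0F}"   # LEFT RIGHT ARROW + VARIATION SELECTOR-16
woman_dark  = "\u{1F469}\u{1F3FF}" # WOMAN + dark skin tone modifier

[arrow_emoji, woman_dark].each do |seq|
  # length counts code points; bytesize counts UTF-8 bytes
  puts "#{seq}: length=#{seq.length} bytes=#{seq.bytesize} #{code_points(seq).join(' ')}"
end
```

Both sequences render as a single glyph in most fonts, yet each reports a length of 2.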
Development
Requirements
Initial setup
To install all the Ruby and JavaScript dependencies, you can run:
bin/setup
To update the Ruby source files based on the emoji-regex library:
rake regenerate
Specs
A spec suite is provided, which can be run as:
rake spec
Creating a release
- Update the version in emoji_regex.gemspec
- Run rake release