SimpleInlineTextAnnotation (Ruby gem)
SimpleInlineTextAnnotation is a Ruby gem for working with inline text annotations. It parses annotated text into a structured hash and generates annotated text from such a hash.
Installation
To use this gem in a Rails application, add the following line to your application's Gemfile:
gem 'simple_inline_text_annotation'
Then, run the following command to install the gem:
bundle install
Usage
The SimpleInlineTextAnnotation gem provides two main methods: parse, which converts a string with inline annotations into a structured hash, and generate, which converts such a hash back into an annotated string.
parse Method
The parse method takes a string with inline annotations and extracts structured information about the annotations, including the character positions and annotation types.
Example
result = SimpleInlineTextAnnotation.parse('[Elon Musk][Person] is a member of the [PayPal Mafia][Organization].')
puts result
# => {
#   text: "Elon Musk is a member of the PayPal Mafia.",
#   denotations: [
#     {span: {begin: 0, end: 9}, obj: "Person"},
#     {span: {begin: 29, end: 41}, obj: "Organization"}
#   ]
# }
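As a rough illustration of how the span offsets relate to the returned text, the sketch below copies the hash from the output above as a literal, so it runs without the gem, and slices the text with each span. It assumes symbol keys, as the printed output suggests, and treats begin as inclusive and end as exclusive, which is what the offsets 0 and 9 for "Elon Musk" imply. The variable names are illustrative only.

```ruby
# Literal copy of the documented parse output (symbol keys are an assumption).
result = {
  text: "Elon Musk is a member of the PayPal Mafia.",
  denotations: [
    { span: { begin: 0, end: 9 }, obj: "Person" },
    { span: { begin: 29, end: 41 }, obj: "Organization" }
  ]
}

# begin is inclusive and end is exclusive here, so an exclusive range
# (three dots) recovers exactly the annotated substring.
annotated = result[:denotations].map do |denotation|
  span = denotation[:span]
  [result[:text][span[:begin]...span[:end]], denotation[:obj]]
end

annotated.each { |surface, type| puts "#{surface} (#{type})" }
```

If the hash returned in your environment uses string keys instead, replace the symbol lookups with their string equivalents.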
Explanation
The input string "[Elon Musk][Person] is a member of the [PayPal Mafia][Organization]." contains two annotations:
- [Elon Musk][Person]: The text Elon Musk is annotated as Person.
- [PayPal Mafia][Organization]: The text PayPal Mafia is annotated as Organization.
The method returns a hash with:
- "text": The plain text without annotations.
- "denotations": An array of hashes, where each hash contains:
  - "span": The character positions (begin and end) of the annotated text.
  - "obj": The annotation type.
generate Method
The generate method performs the reverse operation of parse. It takes a hash containing the plain text and its annotations, and generates a string with inline annotations.
Example
result = SimpleInlineTextAnnotation.generate({
  "text" => "Elon Musk is a member of the PayPal Mafia.",
  "denotations" => [
    { "span" => { "begin" => 0, "end" => 9 }, "obj" => "Person" },
    { "span" => { "begin" => 29, "end" => 41 }, "obj" => "Organization" }
  ]
})
puts result
# => "[Elon Musk][Person] is a member of the [PayPal Mafia][Organization]."
Explanation
- The input hash contains:
  - "text": The plain text ("Elon Musk is a member of the PayPal Mafia.").
  - "denotations": An array of hashes, where each hash specifies:
    - "span": The character positions (begin and end) of the annotated text.
    - "obj": The annotation type.
- The method generates a string where:
  - The text specified in "span" is enclosed in square brackets [].
  - The annotation type specified in "obj" is added in a second set of square brackets [].
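The bracket-wrapping described above can be sketched in a few lines of plain Ruby. This is not the gem's implementation: sketch_generate is a hypothetical helper written for illustration, and it assumes that spans do not overlap and that only "span" and "obj" are present. It works from the last span to the first so that earlier offsets remain valid while the string grows.

```ruby
# Hypothetical helper, not part of the gem: wraps each span as [text][obj].
def sketch_generate(source)
  text = source["text"].dup
  # Process spans right to left so inserting brackets does not shift
  # the offsets of spans that still have to be processed.
  ordered = source["denotations"].sort_by { |d| d["span"]["begin"] }
  ordered.reverse_each do |denotation|
    first = denotation["span"]["begin"]
    last = denotation["span"]["end"]
    label = denotation["obj"]
    text[first...last] = "[#{text[first...last]}][#{label}]"
  end
  text
end

input = {
  "text" => "Elon Musk is a member of the PayPal Mafia.",
  "denotations" => [
    { "span" => { "begin" => 0, "end" => 9 }, "obj" => "Person" },
    { "span" => { "begin" => 29, "end" => 41 }, "obj" => "Organization" }
  ]
}

annotated_text = sketch_generate(input)
puts annotated_text
```

For real use, prefer SimpleInlineTextAnnotation.generate; the sketch only makes the offset arithmetic visible.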
Relation Annotation
The SimpleInlineTextAnnotation gem supports advanced relation annotation, allowing you to define relationships between annotated entities. This is achieved by interpreting the second set of square brackets ([]) based on the number of elements it contains.
Parsing Rules
- If the second [] contains 1 element, it is treated as the annotation type (default behavior).
- If the second [] contains 2 elements, the first element is interpreted as the id of the denotation, and the second element as the obj (annotation type).
- If the second [] contains 4 elements, the elements are interpreted as follows:
  - The first element is the id of the denotation and the subj of the relation.
  - The second element is the obj (annotation type) of the denotation.
  - The third element is the pred (predicate) of the relation.
  - The fourth element is the obj of the relation.
- Any other number of elements is ignored.
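The rules above amount to a dispatch on the number of elements. The following sketch shows that dispatch in isolation; interpret_label is a hypothetical helper rather than a gem method, and it assumes elements are separated by commas with optional surrounding whitespace, as in the example below. How the gem itself tokenizes the bracket contents is not specified here.

```ruby
# Hypothetical helper illustrating the element-count rules; not the gem's code.
def interpret_label(label)
  parts = label.split(",").map(&:strip)
  case parts.size
  when 1
    # One element: the annotation type only.
    { denotation: { obj: parts[0] } }
  when 2
    # Two elements: denotation id, then annotation type.
    { denotation: { id: parts[0], obj: parts[1] } }
  when 4
    # Four elements: the denotation id doubles as the relation subject.
    {
      denotation: { id: parts[0], obj: parts[1] },
      relation: { pred: parts[2], subj: parts[0], obj: parts[3] }
    }
  end
  # Any other element count falls through and yields nil (ignored).
end

p interpret_label("Person")
p interpret_label("T2, Organization")
p interpret_label("T1, Person, member_of, T2")
p interpret_label("T1, Person, member_of")
```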
Example
source = "[Elon Musk][T1, Person, member_of, T2] is a member of the [PayPal Mafia][T2, Organization]."
result = SimpleInlineTextAnnotation.parse(source)
puts result
# => {
#   text: "Elon Musk is a member of the PayPal Mafia.",
#   denotations: [
#     { id: "T1", span: { begin: 0, end: 9 }, obj: "Person" },
#     { id: "T2", span: { begin: 29, end: 41 }, obj: "Organization" }
#   ],
#   relations: [
#     { pred: "member_of", subj: "T1", obj: "T2" }
#   ]
# }
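To show how the ids tie relations back to the annotated text, the sketch below takes the hash from the output above as a literal, so it runs without the gem, and resolves each relation's subj and obj to the text they denote. Symbol keys are assumed, following the printed output, and the arrow format is purely illustrative.

```ruby
# Literal copy of the documented parse output (symbol keys are an assumption).
result = {
  text: "Elon Musk is a member of the PayPal Mafia.",
  denotations: [
    { id: "T1", span: { begin: 0, end: 9 }, obj: "Person" },
    { id: "T2", span: { begin: 29, end: 41 }, obj: "Organization" }
  ],
  relations: [
    { pred: "member_of", subj: "T1", obj: "T2" }
  ]
}

# Map each denotation id to the substring its span covers.
text_by_id = result[:denotations].to_h do |denotation|
  span = denotation[:span]
  [denotation[:id], result[:text][span[:begin]...span[:end]]]
end

# Render each relation as "subject --predicate--> object".
statements = result[:relations].map do |relation|
  "#{text_by_id[relation[:subj]]} --#{relation[:pred]}--> #{text_by_id[relation[:obj]]}"
end

puts statements
```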
Explanation
The input string "[Elon Musk][T1, Person, member_of, T2] is a member of the [PayPal Mafia][T2, Organization]." contains:
- Two denotations:
  - [Elon Musk][T1, Person, member_of, T2]: The text Elon Musk is annotated as Person with id T1. It also serves as the subj of the relation.
  - [PayPal Mafia][T2, Organization]: The text PayPal Mafia is annotated as Organization with id T2.
- One relation:
  - member_of: Indicates that T1 (Elon Musk) is a member of T2 (PayPal Mafia).
The method returns a hash with:
- "text": The plain text without annotations.
- "denotations": An array of hashes, where each hash contains:
  - "id": The unique identifier of the denotation.
  - "span": The character positions (begin and end) of the annotated text.
  - "obj": The annotation type.
- "relations": An array of hashes, where each hash contains:
  - "pred": The predicate or type of the relation.
  - "subj": The id of the subject denotation.
  - "obj": The id of the object denotation.
Generating Relation Annotation
The generate method can also create strings with relation annotations from structured data.
result = SimpleInlineTextAnnotation.generate({
  "text" => "Elon Musk is a member of the PayPal Mafia.",
  "denotations" => [
    { "id" => "T1", "span" => { "begin" => 0, "end" => 9 }, "obj" => "Person" },
    { "id" => "T2", "span" => { "begin" => 29, "end" => 41 }, "obj" => "Organization" }
  ],
  "relations" => [
    { "pred" => "member_of", "subj" => "T1", "obj" => "T2" }
  ]
})
puts result
# => "[Elon Musk][T1, Person, member_of, T2] is a member of the [PayPal Mafia][T2, Organization]."
Explanation
- The input hash includes:
  - "text": The plain text.
  - "denotations": An array of annotations with id, span, and obj.
  - "relations": An array of relationships, where:
    - "subj" and "obj" reference ids in the denotations array.
    - "pred" specifies the relationship type.
- The method generates a string with inline annotations and relationships.
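Because "subj" and "obj" must reference ids that exist in "denotations", it can be useful to check the input before calling generate. The sketch below is one possible pre-check; dangling_relation_ids is a hypothetical helper, not a gem method, and how the gem itself handles a relation that points at a missing id is not covered by this document.

```ruby
# Hypothetical pre-check, not part of the gem: returns the ids that relations
# refer to but that no denotation defines.
def dangling_relation_ids(source)
  known_ids = source.fetch("denotations", []).map { |d| d["id"] }
  referenced = source.fetch("relations", []).flat_map { |r| [r["subj"], r["obj"]] }
  referenced.uniq - known_ids
end

valid_input = {
  "text" => "Elon Musk is a member of the PayPal Mafia.",
  "denotations" => [
    { "id" => "T1", "span" => { "begin" => 0, "end" => 9 }, "obj" => "Person" },
    { "id" => "T2", "span" => { "begin" => 29, "end" => 41 }, "obj" => "Organization" }
  ],
  "relations" => [
    { "pred" => "member_of", "subj" => "T1", "obj" => "T2" }
  ]
}

# Same input, but the relation now points at an id that is not defined.
broken_input = valid_input.merge(
  "relations" => [{ "pred" => "member_of", "subj" => "T1", "obj" => "T3" }]
)

p dangling_relation_ids(valid_input)
p dangling_relation_ids(broken_input)
```

An empty array means every relation endpoint resolves to a denotation.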