Method: Aws::Textract::Client#get_document_analysis

Defined in: lib/aws-sdk-textract/client.rb

#get_document_analysis(params = {}) ⇒ `Types::GetDocumentAnalysisResponse`

Gets the results for an Amazon Textract asynchronous operation that analyzes text in a document.

You start asynchronous text analysis by calling `StartDocumentAnalysis`, which returns a job identifier (`JobId`). When the text analysis operation finishes, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that’s registered in the initial call to `StartDocumentAnalysis`. To get the results of the text analysis operation, first check that the status value published to the Amazon SNS topic is `SUCCEEDED`. If so, call `GetDocumentAnalysis`, and pass the job identifier (`JobId`) from the initial call to `StartDocumentAnalysis`.
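The check-then-fetch flow described above can be sketched as follows. This is a minimal sketch, not the SDK's own helper: `results_for_notification` is a hypothetical name, and the sketch assumes the SNS message body is JSON carrying `"JobId"` and `"Status"` keys. `client` is expected to be an `Aws::Textract::Client`, or any object that responds to `#get_document_analysis`.

```ruby
require 'json'

# Hypothetical helper: fetches analysis results only when the SNS
# notification reports success; returns nil for any other status.
# Assumes the notification body is JSON with "JobId" and "Status" keys.
def results_for_notification(client, sns_message_json)
  message = JSON.parse(sns_message_json)
  return nil unless message['Status'] == 'SUCCEEDED'

  # Pass the JobId that StartDocumentAnalysis originally returned.
  client.get_document_analysis(job_id: message['JobId'])
end
```

With a real client, the call would look like `results_for_notification(Aws::Textract::Client.new, body)`, where `body` is the message text taken from the SNS delivery.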

`GetDocumentAnalysis` returns an array of `Block` objects. The following types of information are returned:

- Form data (key-value pairs). The related information is returned in two `Block` objects, each of type `KEY_VALUE_SET`: a KEY `Block` object and a VALUE `Block` object. For example, *Name: Ana Silva Carolina* contains a key and value. *Name:* is the key. *Ana Silva Carolina* is the value.
- Table and table cell data. A TABLE `Block` object contains information about a detected table. A CELL `Block` object is returned for each cell in a table.
- Lines and words of text. A LINE `Block` object contains one or more WORD `Block` objects. All lines and words that are detected in the document are returned (including text that doesn’t have a relationship with the value of the `StartDocumentAnalysis` `FeatureTypes` input parameter).
- Query. A QUERY `Block` object contains the query text, alias, and link to the associated query results `Block` object.
- Query Results. A QUERY_RESULT `Block` object contains the answer to the query and an ID that connects it to the query asked. This `Block` also contains a confidence score.
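As an illustration of how TABLE and CELL blocks relate, the sketch below rebuilds each table as a two-dimensional array of cell text. `tables_as_grids` is a hypothetical helper, not part of the SDK. It assumes that a TABLE block lists its cells through `CHILD` relationships, that a CELL block lists its words the same way, and that `blocks` holds every block for the job (all pages of results), so that the referenced IDs can be resolved.

```ruby
# Hypothetical helper: returns one grid (array of rows) per TABLE block.
def tables_as_grids(blocks)
  by_id = blocks.to_h { |b| [b.id, b] }

  # Blocks referenced by the given block through relationships of `type`.
  related = lambda do |block, type|
    (block.relationships || [])
      .select { |r| r.type == type }
      .flat_map(&:ids)
      .map { |id| by_id[id] }
      .compact
  end

  blocks.select { |b| b.block_type == 'TABLE' }.map do |table|
    grid = []
    related.call(table, 'CHILD').each do |cell|
      next unless cell.block_type == 'CELL'

      words = related.call(cell, 'CHILD').select { |b| b.block_type == 'WORD' }
      # row_index and column_index start at 1 in the response.
      (grid[cell.row_index - 1] ||= [])[cell.column_index - 1] = words.map(&:text).join(' ')
    end
    grid
  end
end
```

Merged cells and table titles are ignored here to keep the sketch short.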

Note: While processing a document with queries, look out for `INVALID_REQUEST_PARAMETERS` output. This indicates that either the per-page query limit has been exceeded or that the operation is trying to query a page in the document that doesn’t exist.
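To show how a QUERY block connects to its QUERY_RESULT blocks, here is a minimal sketch. `query_answers` is a hypothetical helper; it assumes the link is expressed as a relationship of type `ANSWER` (one of the relationship types listed in the response structure) and that `blocks` contains every block for the job.

```ruby
# Hypothetical helper: pairs each QUERY block with the text and
# confidence of the QUERY_RESULT blocks it points to.
def query_answers(blocks)
  by_id = blocks.to_h { |b| [b.id, b] }

  blocks.select { |b| b.block_type == 'QUERY' }.map do |query|
    answer_ids = (query.relationships || [])
                 .select { |r| r.type == 'ANSWER' }
                 .flat_map(&:ids)
    answers = answer_ids.map { |id| by_id[id] }.compact.map do |result|
      { text: result.text, confidence: result.confidence }
    end
    { query: query.query.text, query_alias: query.query.alias, answers: answers }
  end
end
```

A query with no `ANSWER` relationship yields an empty `answers` array, which is one way to spot questions that Textract could not answer.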

Selection elements such as check boxes and option buttons (radio buttons) can be detected in form data and in tables. A SELECTION_ELEMENT `Block` object contains information about a selection element, including the selection status.
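The key-value and selection-element blocks can be combined into a simple hash of form fields. The sketch below is one possible approach under stated assumptions: `form_fields` is a hypothetical helper, a KEY block is assumed to reference its VALUE block through a `VALUE` relationship, and both are assumed to reference their WORD or SELECTION_ELEMENT blocks through `CHILD` relationships.

```ruby
# Hypothetical helper: maps each form key's text to its value's text.
# A selection element contributes its selection status as the value.
def form_fields(blocks)
  by_id = blocks.to_h { |b| [b.id, b] }

  # Blocks referenced by the given block through relationships of `type`.
  related = lambda do |block, type|
    (block.relationships || [])
      .select { |r| r.type == type }
      .flat_map(&:ids)
      .map { |id| by_id[id] }
      .compact
  end

  # Text of a KEY or VALUE block, assembled from its children.
  text_of = lambda do |block|
    parts = related.call(block, 'CHILD').map do |child|
      child.block_type == 'SELECTION_ELEMENT' ? child.selection_status : child.text
    end
    parts.compact.join(' ')
  end

  keys = blocks.select do |b|
    b.block_type == 'KEY_VALUE_SET' && (b.entity_types || []).include?('KEY')
  end

  keys.to_h do |key|
    value_text = related.call(key, 'VALUE').map { |v| text_of.call(v) }.join(' ')
    [text_of.call(key), value_text]
  end
end
```

For the example above, the result would contain `"Name:" => "Ana Silva Carolina"`. Because the result is a hash, two fields with identical key text collapse into one entry; return an array of pairs instead if that matters.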

Use the `MaxResults` parameter to limit the number of blocks that are returned. If there are more results than specified in `MaxResults`, the value of `NextToken` in the operation response contains a pagination token for getting the next set of results. To get the next page of results, call `GetDocumentAnalysis`, and populate the `NextToken` request parameter with the token value that’s returned from the previous call to `GetDocumentAnalysis`.
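The pagination loop described above can be sketched like this. `all_analysis_blocks` is a hypothetical helper name; `client` is expected to be an `Aws::Textract::Client`, or any object whose `#get_document_analysis` returns something responding to `#blocks` and `#next_token`.

```ruby
# Hypothetical helper: follows next_token until every block for the
# job has been collected, then returns them as one array.
def all_analysis_blocks(client, job_id, max_results: 1000)
  blocks = []
  next_token = nil

  loop do
    params = { job_id: job_id, max_results: max_results }
    # Only send next_token after the first call has returned one.
    params[:next_token] = next_token if next_token
    resp = client.get_document_analysis(params)

    blocks.concat(resp.blocks || [])
    next_token = resp.next_token
    break if next_token.nil? || next_token.empty?
  end

  blocks
end
```

Collecting all pages first is convenient because relationships between blocks are expressed as IDs, and a referenced block may arrive on a different page than the block that points to it.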

For more information, see [Document Text Analysis][1].

[1]: https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html

Examples:

Request syntax with placeholder values


resp = client.get_document_analysis({
  job_id: "JobId", # required
  max_results: 1,
  next_token: "PaginationToken",
})

Response structure


resp.document_metadata.pages #=> Integer
resp.job_status #=> String, one of "IN_PROGRESS", "SUCCEEDED", "FAILED", "PARTIAL_SUCCESS"
resp.next_token #=> String
resp.blocks #=> Array
resp.blocks[0].block_type #=> String, one of "KEY_VALUE_SET", "PAGE", "LINE", "WORD", "TABLE", "CELL", "SELECTION_ELEMENT", "MERGED_CELL", "TITLE", "QUERY", "QUERY_RESULT", "SIGNATURE", "TABLE_TITLE", "TABLE_FOOTER", "LAYOUT_TEXT", "LAYOUT_TITLE", "LAYOUT_HEADER", "LAYOUT_FOOTER", "LAYOUT_SECTION_HEADER", "LAYOUT_PAGE_NUMBER", "LAYOUT_LIST", "LAYOUT_FIGURE", "LAYOUT_TABLE", "LAYOUT_KEY_VALUE"
resp.blocks[0].confidence #=> Float
resp.blocks[0].text #=> String
resp.blocks[0].text_type #=> String, one of "HANDWRITING", "PRINTED"
resp.blocks[0].row_index #=> Integer
resp.blocks[0].column_index #=> Integer
resp.blocks[0].row_span #=> Integer
resp.blocks[0].column_span #=> Integer
resp.blocks[0].geometry.bounding_box.width #=> Float
resp.blocks[0].geometry.bounding_box.height #=> Float
resp.blocks[0].geometry.bounding_box.left #=> Float
resp.blocks[0].geometry.bounding_box.top #=> Float
resp.blocks[0].geometry.polygon #=> Array
resp.blocks[0].geometry.polygon[0].x #=> Float
resp.blocks[0].geometry.polygon[0].y #=> Float
resp.blocks[0].geometry.rotation_angle #=> Float
resp.blocks[0].id #=> String
resp.blocks[0].relationships #=> Array
resp.blocks[0].relationships[0].type #=> String, one of "VALUE", "CHILD", "COMPLEX_FEATURES", "MERGED_CELL", "TITLE", "ANSWER", "TABLE", "TABLE_TITLE", "TABLE_FOOTER"
resp.blocks[0].relationships[0].ids #=> Array
resp.blocks[0].relationships[0].ids[0] #=> String
resp.blocks[0].entity_types #=> Array
resp.blocks[0].entity_types[0] #=> String, one of "KEY", "VALUE", "COLUMN_HEADER", "TABLE_TITLE", "TABLE_FOOTER", "TABLE_SECTION_TITLE", "TABLE_SUMMARY", "STRUCTURED_TABLE", "SEMI_STRUCTURED_TABLE"
resp.blocks[0].selection_status #=> String, one of "SELECTED", "NOT_SELECTED"
resp.blocks[0].page #=> Integer
resp.blocks[0].query.text #=> String
resp.blocks[0].query.alias #=> String
resp.blocks[0].query.pages #=> Array
resp.blocks[0].query.pages[0] #=> String
resp.warnings #=> Array
resp.warnings[0].error_code #=> String
resp.warnings[0].pages #=> Array
resp.warnings[0].pages[0] #=> Integer
resp.status_message #=> String
resp.analyze_document_model_version #=> String
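
Before reading `blocks`, a caller will usually want to branch on `job_status`. The following is a minimal sketch of that branching, not prescribed behavior: `summarize_analysis` and the shape of the hashes it returns are hypothetical, and it assumes that `warnings` is most relevant for a `PARTIAL_SUCCESS` result and `status_message` for a `FAILED` one.

```ruby
# Hypothetical helper: reduces a response to a small summary hash
# based on the four documented job_status values.
def summarize_analysis(resp)
  case resp.job_status
  when 'IN_PROGRESS'
    { state: :pending }
  when 'SUCCEEDED'
    { state: :done, block_count: (resp.blocks || []).length }
  when 'PARTIAL_SUCCESS'
    warnings = (resp.warnings || []).map do |w|
      { error_code: w.error_code, pages: w.pages }
    end
    { state: :partial, block_count: (resp.blocks || []).length, warnings: warnings }
  when 'FAILED'
    { state: :failed, message: resp.status_message }
  else
    raise ArgumentError, "unexpected job_status: #{resp.job_status.inspect}"
  end
end
```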

Parameters:

params (Hash) (defaults to: {})

Options Hash (params):

:job_id (required, String) —

A unique identifier for the text-detection job. The `JobId` is returned from `StartDocumentAnalysis`. A `JobId` value is only valid for 7 days.
:max_results (Integer) —

The maximum number of results to return per paginated call. The largest value that you can specify is 1,000. If you specify a value greater than 1,000, a maximum of 1,000 results is returned. The default value is 1,000.
:next_token (String) —

If the previous response was incomplete (because there are more blocks to retrieve), Amazon Textract returns a pagination token in the response. You can use this pagination token to retrieve the next set of blocks.

Returns:

(Types::GetDocumentAnalysisResponse) —
Returns a response object which responds to the following methods:
- #document_metadata => Types::DocumentMetadata
- #job_status => String
- #next_token => String
- #blocks => Array<Types::Block>
- #warnings => Array<Types::Warning>
- #status_message => String
- #analyze_document_model_version => String
