Class: Aws::Kendra::Types::SeedUrlConfiguration

Inherits:
Struct
  • Object
Includes:
Structure
Defined in:
lib/aws-sdk-kendra/types.rb

Overview

Provides the configuration information for the seed or starting point URLs to crawl.

*When selecting websites to index, you must adhere to the [Amazon Acceptable Use Policy][1] and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own web pages, or web pages that you have authorization to index.*

[1]: https://aws.amazon.com/aup/

Constant Summary

SENSITIVE = []

Instance Attribute Summary

Instance Attribute Details

#seed_urls ⇒ Array<String>

The list of seed or starting point URLs of the websites you want to crawl.

The list can include a maximum of 100 seed URLs.

Returns:

  • (Array<String>)
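
The 100-URL limit can be checked on the client before a request is made. The following is a minimal sketch, not part of the SDK: `build_seed_url_configuration` and `MAX_SEED_URLS` are hypothetical names introduced for illustration, and the sketch assumes the structure is supplied as a plain hash keyed by the attribute names shown on this page.

```ruby
# Hypothetical helper (not part of aws-sdk-kendra): builds a hash shaped like
# SeedUrlConfiguration and enforces the documented maximum of 100 seed URLs.
MAX_SEED_URLS = 100

def build_seed_url_configuration(seed_urls, web_crawler_mode: "HOST_ONLY")
  urls = Array(seed_urls)
  if urls.size > MAX_SEED_URLS
    raise ArgumentError,
          "at most #{MAX_SEED_URLS} seed URLs are allowed, got #{urls.size}"
  end

  # Keys mirror the struct members :seed_urls and :web_crawler_mode.
  { seed_urls: urls, web_crawler_mode: web_crawler_mode }
end

config = build_seed_url_configuration(["https://abc.example.com"])
puts config[:seed_urls].size    # prints 1
puts config[:web_crawler_mode]  # prints HOST_ONLY
```

Checking the limit locally only surfaces the error earlier; the service remains the authority on what it accepts.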


# File 'lib/aws-sdk-kendra/types.rb', line 9201

class SeedUrlConfiguration < Struct.new(
  :seed_urls,
  :web_crawler_mode)
  SENSITIVE = []
  include Aws::Structure
end

#web_crawler_mode ⇒ String

You can choose one of the following modes:

  • `HOST_ONLY`: crawl only the website host names. For example, if the seed URL is “abc.example.com”, then only URLs with host name “abc.example.com” are crawled.

  • `SUBDOMAINS`: crawl the website host names with subdomains. For example, if the seed URL is “abc.example.com”, then “a.abc.example.com” and “b.abc.example.com” are also crawled.

  • `EVERYTHING`: crawl the website host names with subdomains and other domains that the web pages link to.

The default mode is set to `HOST_ONLY`.

Returns:

  • (String)
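
The three modes described above can be pictured as a host-matching rule. The sketch below is only an illustration of those descriptions, not the crawler's actual logic: `in_crawl_scope?` is a hypothetical helper, it assumes both URLs carry a scheme so that `URI.parse` can extract a host, and it treats every linked URL as in scope under `EVERYTHING`.

```ruby
require "uri"

# Hypothetical helper (not part of aws-sdk-kendra): decides whether a
# candidate URL falls within the scope implied by a seed URL and a mode.
def in_crawl_scope?(seed_url, candidate_url, mode)
  seed_host = URI.parse(seed_url).host.to_s.downcase
  host = URI.parse(candidate_url).host.to_s.downcase

  case mode
  when "HOST_ONLY"
    # Only the exact host name of the seed URL.
    host == seed_host
  when "SUBDOMAINS"
    # The seed host itself plus any host nested under it.
    host == seed_host || host.end_with?(".#{seed_host}")
  when "EVERYTHING"
    # Assumption: any URL reached through a link is in scope.
    true
  else
    raise ArgumentError, "unknown web crawler mode: #{mode.inspect}"
  end
end

seed = "https://abc.example.com"
puts in_crawl_scope?(seed, "https://a.abc.example.com/docs", "HOST_ONLY")   # prints false
puts in_crawl_scope?(seed, "https://a.abc.example.com/docs", "SUBDOMAINS")  # prints true
puts in_crawl_scope?(seed, "https://other.example.org/", "EVERYTHING")      # prints true
```

The leading dot in the suffix check keeps an unrelated host such as “xabc.example.com” from being mistaken for a subdomain of “abc.example.com”.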

