Create Web Crawler Job

Creates a web crawler job that crawls the provided URLs and sitemaps and generates the corresponding webpages as artifacts.

Headers

Authorization (string, required)

Bearer authentication of the form Bearer <token>, where token is your auth token.
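
As a minimal sketch of the header format described above, the helper below builds the Authorization value from a token. The function name `build_headers` is hypothetical, and the `Content-Type` entry is an assumption (that the request body is sent as JSON); neither comes from this reference.

```python
def build_headers(token: str) -> dict[str, str]:
    """Return request headers carrying the bearer token."""
    if not token:
        raise ValueError("an auth token is required")
    return {
        # Form stated in this reference: "Bearer <token>".
        "Authorization": f"Bearer {token}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }
```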

Request

This endpoint expects an object.
applies_to_parts (list of strings, required)

The parts to which the webpages/articles created during this crawler job will be linked.

accept_regexes (list of strings, optional)
The list of regexes a URL must satisfy to be crawled.
description (string, optional, format: "text")
The description of the job.
domain_names (list of strings, optional)
The list of allowed domain names to crawl.
frequency (integer, optional)

Number of days between re-sync job runs. If 0, the job will run only once.

max_depth (integer, optional)
The maximum depth to crawl.
notify_on_complete (boolean, optional)
Whether to notify the user when the job is complete. Default is true.
reject_regexes (list of strings, optional)
The list of regexes that, if satisfied by a URL, result in rejection of the URL. If a URL matches both accept and reject regexes, it is rejected.
sitemap_index_urls (list of strings, optional)
The list of sitemap index URLs to crawl.
sitemap_urls (list of strings, optional)
The list of sitemap URLs to crawl.
urls (list of strings, optional)
The list of URLs to crawl.
user_agent (string, optional, format: "text", <=1024 characters)
User agent to use for crawling websites in this job.
accept_regex (string, optional, format: "text", deprecated)
The regex a URL must satisfy to be crawled.
reject_regex (string, optional, format: "text", deprecated)
The regex that, if satisfied by a URL, results in rejection of the URL. If a URL matches both accept and reject regexes, it is rejected.
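
The request fields above can be assembled into a JSON object along these lines. This is a sketch under stated assumptions, not the service's own client: `build_crawler_job_body` is a hypothetical helper, the part ID and URL are placeholders, and rejecting a negative `frequency` is an assumption (the reference only defines 0 and a number of days). The endpoint URL is not given in this section, so no HTTP call is shown.

```python
import json

MAX_USER_AGENT_LENGTH = 1024  # limit stated for user_agent


def build_crawler_job_body(applies_to_parts, urls=None, sitemap_urls=None,
                           accept_regexes=None, reject_regexes=None,
                           frequency=None, max_depth=None, user_agent=None):
    """Build the request object, omitting optional fields left unset."""
    if not applies_to_parts:
        raise ValueError("applies_to_parts is required")
    if user_agent is not None and len(user_agent) > MAX_USER_AGENT_LENGTH:
        raise ValueError("user_agent must be at most 1024 characters")
    # Assumption: negative day counts are not meaningful.
    if frequency is not None and frequency < 0:
        raise ValueError("frequency must be 0 (run once) or a number of days")

    body = {"applies_to_parts": list(applies_to_parts)}
    optional = {
        "urls": urls,
        "sitemap_urls": sitemap_urls,
        "accept_regexes": accept_regexes,
        "reject_regexes": reject_regexes,
        "frequency": frequency,
        "max_depth": max_depth,
        "user_agent": user_agent,
    }
    for name, value in optional.items():
        if value is not None:
            body[name] = value
    return body


body = build_crawler_job_body(
    applies_to_parts=["PART-1"],        # placeholder part ID
    urls=["https://example.com/docs"],  # placeholder URL
    frequency=7,                        # re-sync every 7 days
    max_depth=2,
)
print(json.dumps(body, indent=2))
```

Leaving unset optional fields out of the object, rather than sending nulls, lets the service apply its own defaults, such as `notify_on_complete` defaulting to true.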

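The precedence rule for accept and reject regexes can be illustrated with a small local filter. This sketch only mirrors the stated rule that a URL matching both lists is rejected; it is not the crawler's actual implementation. It assumes that an empty accept list means no restriction, that matching any one accept regex is enough, and that patterns are matched unanchored with Python's `re.search`. The name `should_crawl` is hypothetical.

```python
import re


def should_crawl(url, accept_regexes=(), reject_regexes=()):
    """Return True if the URL passes the accept/reject filters."""
    # Stated rule: a reject match wins, even if an accept regex also matches.
    if any(re.search(pattern, url) for pattern in reject_regexes):
        return False
    # Assumption: with no accept regexes, every non-rejected URL is allowed.
    if not accept_regexes:
        return True
    # Assumption: matching any one accept regex is sufficient.
    return any(re.search(pattern, url) for pattern in accept_regexes)


accept = [r"/docs/"]
reject = [r"/docs/private/"]
for candidate in (
    "https://example.com/docs/intro",
    "https://example.com/docs/private/keys",
    "https://example.com/blog/post",
):
    print(candidate, should_crawl(candidate, accept, reject))
```

Checking a few representative URLs this way before creating the job can catch a reject pattern that is broader than intended.
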
Response

The response to a request to create a web crawler job.
web_crawler_job (object)

Errors