Create Web Crawler Job

POST

Creates a web crawler job whose objective is to crawl the provided URLs/sitemaps and generate corresponding webpages as artifacts.

Request

This endpoint expects an object.
applies_to_parts (list of strings, Required)

The parts to which the webpages/articles created during this crawler job will be linked.

accept_regexes (list of strings, Optional)

The list of regexes a URL must satisfy to be crawled.

description (string, Optional, format: "text")

The description of the job.

domain_names (list of strings, Optional)

The list of allowed domain names to crawl.

frequency (integer, Optional)

Number of days between re-sync job runs. If 0, the job will run only once.

max_depth (integer, Optional)

The maximum depth to crawl.

notify_on_complete (boolean, Optional)

Whether to notify the user when the job is complete. Default is true.

reject_regexes (list of strings, Optional)

The list of regexes that, if satisfied by a URL, result in the URL being rejected. If a URL matches both accept and reject regexes, it is rejected.

sitemap_index_urls (list of strings, Optional)

The list of sitemap index URLs to crawl.

sitemap_urls (list of strings, Optional)

The list of sitemap URLs to crawl.

urls (list of strings, Optional)

The list of URLs to crawl.

accept_regex (string, Optional, format: "text", Deprecated)

The regex a URL must satisfy to be crawled.

reject_regex (string, Optional, format: "text", Deprecated)

The regex that, if satisfied by a URL, results in the URL being rejected. If a URL matches both accept and reject regexes, it is rejected.
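
The accept/reject precedence described for the regex fields can be illustrated with a small local sketch. This is not the crawler's actual implementation: the function name `should_crawl` is hypothetical, and the sketch assumes that a regex is "satisfied" when it matches anywhere in the URL, that matching any one accept regex is enough, and that an empty accept list places no restriction. None of these details are stated in this reference; only the rule that a reject match wins over an accept match is.

```python
import re
from typing import Iterable


def should_crawl(
    url: str,
    accept_regexes: Iterable[str] = (),
    reject_regexes: Iterable[str] = (),
) -> bool:
    """Hypothetical helper mirroring the documented precedence rule."""
    # Documented rule: a URL that matches a reject regex is rejected,
    # even if it also matches an accept regex.
    if any(re.search(pattern, url) for pattern in reject_regexes):
        return False

    accept = list(accept_regexes)
    # Assumption: with no accept regexes, every non-rejected URL passes.
    if not accept:
        return True

    # Assumption: matching at least one accept regex is sufficient.
    return any(re.search(pattern, url) for pattern in accept)


if __name__ == "__main__":
    accept = [r"/docs/"]
    reject = [r"\.pdf$"]
    for candidate in (
        "https://example.com/docs/intro",
        "https://example.com/docs/guide.pdf",
        "https://example.com/blog/post",
    ):
        print(candidate, should_crawl(candidate, accept, reject))
```

A check like this can be useful for previewing which URLs a pair of patterns would admit before submitting a job, provided the assumptions above are confirmed against the service's real behavior.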

Response

The response to a request to create a web crawler job.

web_crawler_job (object)
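
Example

As a rough illustration of how the request fields fit together, the sketch below assembles a JSON request body. The helper name `build_crawler_job_body`, the placeholder part ID `<part-id>`, and the example URLs are all hypothetical. The endpoint path, authentication, and the shape of the returned `web_crawler_job` object are not given in this reference, so the sketch stops at building the body and does not send it.

```python
import json
from typing import Any, Dict, List, Optional


def build_crawler_job_body(
    applies_to_parts: List[str],
    *,
    urls: Optional[List[str]] = None,
    sitemap_urls: Optional[List[str]] = None,
    sitemap_index_urls: Optional[List[str]] = None,
    domain_names: Optional[List[str]] = None,
    accept_regexes: Optional[List[str]] = None,
    reject_regexes: Optional[List[str]] = None,
    max_depth: Optional[int] = None,
    frequency: Optional[int] = None,
    notify_on_complete: Optional[bool] = None,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the request object from the documented fields."""
    # applies_to_parts is the only field marked Required.
    body: Dict[str, Any] = {"applies_to_parts": list(applies_to_parts)}

    optional_fields = {
        "urls": urls,
        "sitemap_urls": sitemap_urls,
        "sitemap_index_urls": sitemap_index_urls,
        "domain_names": domain_names,
        "accept_regexes": accept_regexes,
        "reject_regexes": reject_regexes,
        "max_depth": max_depth,
        "frequency": frequency,
        "notify_on_complete": notify_on_complete,
        "description": description,
    }
    # Leave out optional fields that were not supplied.
    # Note that frequency=0 ("run only once") is kept, since 0 is not None.
    body.update({k: v for k, v in optional_fields.items() if v is not None})
    return body


body = build_crawler_job_body(
    ["<part-id>"],  # placeholder, not a real part ID
    urls=["https://example.com/docs"],
    domain_names=["example.com"],
    reject_regexes=[r"\.pdf$"],
    max_depth=2,
    frequency=0,  # run only once
)
print(json.dumps(body, indent=2))
```

The deprecated `accept_regex` and `reject_regex` fields are deliberately left out of the helper in favor of their list-valued replacements.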