Create Web Crawler Job
Creates a web crawler job whose objective is to crawl the provided URLs/sitemaps and generate corresponding webpages as artifacts.
Headers
Authorization
Bearer authentication of the form Bearer <token>, where token is your auth token.
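As a minimal sketch of the header format described above, the following Python helper builds the Authorization header. The function name `auth_headers` is an illustrative assumption and not part of the API; only the `Bearer <token>` form comes from this reference.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build the Authorization header in the `Bearer <token>` form."""
    if not token:
        # An empty token would produce a malformed header, so fail early.
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.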
Request
This endpoint expects an object.
applies_to_parts
The parts to which the webpages/articles created during this crawler job will be linked.
accept_regexes
The list of regexes a URL must satisfy to be crawled.
description
The description of the job.
domain_names
The list of allowed domain names to crawl.
frequency
Number of days between re-sync job runs. If 0, the job will run only once.
max_depth
The maximum depth to crawl.
notify_on_complete
Whether to notify the user when the job is complete. Default is true.
reject_regexes
The list of regexes which, if satisfied by a URL, result in rejection of the URL. If a URL matches both accept and reject regexes, it is rejected.
sitemap_index_urls
The list of sitemap index URLs to crawl.
sitemap_urls
The list of sitemap URLs to crawl.
urls
The list of URLs to crawl.
user_agent
User agent to use for crawling websites in this job.
accept_regex (Deprecated)
The regex a URL must satisfy to be crawled.
reject_regex (Deprecated)
The regex which, if satisfied by a URL, results in rejection of the URL. If a URL matches both accept and reject regexes, it is rejected.
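The precedence rule for accept_regexes and reject_regexes (a URL matching both is rejected) can be sketched in Python as follows. This illustrates the rule only and is not the crawler's actual implementation. The function name `is_crawlable` is hypothetical, and three behaviors are assumptions this reference does not confirm: patterns are applied with `re.search`, a URL passes if it matches at least one accept regex, and an empty accept list accepts every URL.

```python
import re
from typing import Sequence


def is_crawlable(
    url: str,
    accept_regexes: Sequence[str] = (),
    reject_regexes: Sequence[str] = (),
) -> bool:
    """Return True if the URL passes the accept/reject filters."""
    # Reject patterns are checked first, so a URL matching both lists is rejected.
    if any(re.search(pattern, url) for pattern in reject_regexes):
        return False
    # Assumption: with no accept patterns, every non-rejected URL is allowed.
    if not accept_regexes:
        return True
    # Assumption: matching any one accept pattern is sufficient.
    return any(re.search(pattern, url) for pattern in accept_regexes)
```

Checking candidate patterns locally in this way can help catch a reject regex that is broader than intended before a job is created.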
Response
The response to a request to create a web crawler job.
web_crawler_job
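A sketch of assembling a request body from the parameters above and reading web_crawler_job from a JSON response might look like the following. The helper names `build_crawler_job_body` and `extract_job` are hypothetical. The value types (lists of strings, integers for frequency and max_depth) and the choice to always send urls, frequency, and notify_on_complete are assumptions, since this reference does not state which fields are required. No HTTP call is made because the endpoint URL is not given here.

```python
import json
from typing import Any, Optional, Sequence


def build_crawler_job_body(
    urls: Sequence[str],
    *,
    domain_names: Optional[Sequence[str]] = None,
    accept_regexes: Optional[Sequence[str]] = None,
    reject_regexes: Optional[Sequence[str]] = None,
    max_depth: Optional[int] = None,
    frequency: int = 0,  # 0 means the job runs only once
    notify_on_complete: bool = True,  # documented default
    description: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body, omitting optional fields that were not set."""
    body: dict[str, Any] = {
        "urls": list(urls),
        "frequency": frequency,
        "notify_on_complete": notify_on_complete,
    }
    optional = {
        "domain_names": domain_names,
        "accept_regexes": accept_regexes,
        "reject_regexes": reject_regexes,
        "max_depth": max_depth,
        "description": description,
    }
    # Only include optional fields the caller actually provided.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def extract_job(response_text: str) -> dict[str, Any]:
    """Return the `web_crawler_job` object from a JSON response body."""
    return json.loads(response_text)["web_crawler_job"]
```

The dictionary returned by `build_crawler_job_body` can be serialized with `json.dumps` and sent as the request body. The fields inside web_crawler_job are not listed in this section, so `extract_job` returns the object as-is.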