
Create Web Crawler Job

POST https://api.devrev.ai/web-crawler-jobs.create
curl -X POST https://api.devrev.ai/web-crawler-jobs.create \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "applies_to_parts": [
      "PROD-12345"
    ]
  }'
201 Created
{
  "web_crawler_job": {
    "id": "string",
    "accept_regexs": [
      "string"
    ],
    "created_by": {
      "display_id": "string",
      "id": "string",
      "display_name": "string",
      "display_picture": {
        "display_id": "string",
        "id": "string",
        "file": {
          "type": "string",
          "name": "string",
          "size": 1
        }
      },
      "email": "string",
      "full_name": "string",
      "state": "active"
    },
    "created_date": "2023-01-01T12:00:00.000Z",
    "description": "string",
    "display_id": "string",
    "domain_names": [
      "string"
    ],
    "frequency": 1,
    "max_depth": 1,
    "modified_by": {
      "display_id": "string",
      "id": "string",
      "display_name": "string",
      "display_picture": {
        "display_id": "string",
        "id": "string",
        "file": {
          "type": "string",
          "name": "string",
          "size": 1
        }
      },
      "email": "string",
      "full_name": "string",
      "state": "active"
    },
    "modified_date": "2023-01-01T12:00:00.000Z",
    "no_parent": true,
    "notify_on_complete": true,
    "num_bytes": 1,
    "num_timeout_urls": 1,
    "num_urls_scraped": 1,
    "reject_regexs": [
      "string"
    ],
    "sitemap_index_urls": [
      "string"
    ],
    "sitemap_urls": [
      "string"
    ],
    "state": "aborted",
    "urls": [
      "string"
    ],
    "user_agent": "string"
  }
}
Creates a web crawler job whose objective is to crawl the provided URLs/sitemaps and generate corresponding webpages as artifacts.
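The same request can be sketched from Python's standard library. This is an illustrative sketch, not official client code: the token and part ID below are placeholders you must replace with real values.

```python
import json
import urllib.request

# Placeholder values -- substitute a real auth token and part ID.
TOKEN = "<token>"
payload = {"applies_to_parts": ["PROD-12345"]}

req = urllib.request.Request(
    "https://api.devrev.ai/web-crawler-jobs.create",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending the request would return the created job under "web_crawler_job":
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)["web_crawler_job"]
```

The urlopen call is left commented out so the snippet can be inspected without hitting the live API.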

Headers

Authorization (string, required)
Bearer authentication of the form `Bearer <token>`, where token is your auth token.

Request

This endpoint expects an object.
applies_to_parts (list of strings, required)
The parts to which webpages/articles created during this crawler job will be linked.
accept_regexes (list of strings, optional)
The list of regexes a URL must satisfy to be crawled.
description (string, optional, format: "text")
The description of the job.
domain_names (list of strings, optional)
The list of allowed domain names to crawl.
frequency (integer, optional)
Number of days between re-sync job runs. If 0, the job will run only once.
max_depth (integer, optional)
The maximum depth to crawl.
notify_on_complete (boolean, optional)
Whether to notify the user when the job is complete. Default is true.
reject_regexes (list of strings, optional)
The list of regexes which, if satisfied by a URL, result in rejection of that URL. If a URL matches both the accept and reject regexes, it is rejected.
sitemap_index_urls (list of strings, optional)
The list of sitemap index URLs to crawl.
sitemap_urls (list of strings, optional)
The list of sitemap URLs to crawl.
urls (list of strings, optional)
The list of URLs to crawl.
user_agent (string, optional, format: "text", <=1024 characters)
User agent to use for crawling websites in this job.
accept_regex (string, optional, format: "text", deprecated)
The regex a URL must satisfy to be crawled.
reject_regex (string, optional, format: "text", deprecated)
The regex which, if satisfied by a URL, results in rejection of that URL. If a URL matches both the accept and reject regexes, it is rejected.
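The accept/reject rule described above (a URL matching both lists is rejected) can be sketched as follows. The patterns and URLs are hypothetical, and this mirrors the documented rule rather than the service's actual implementation:

```python
import re

def url_allowed(url, accept_regexes, reject_regexes):
    """Apply the documented rule: a reject match wins over an accept match."""
    if any(re.search(r, url) for r in reject_regexes):
        return False  # rejected, even if an accept regex also matches
    if accept_regexes:
        return any(re.search(r, url) for r in accept_regexes)
    return True  # no accept list given: crawl by default

# Hypothetical patterns for illustration.
accept = [r"^https://docs\.example\.com/"]
reject = [r"/internal/"]

print(url_allowed("https://docs.example.com/guide", accept, reject))       # True
print(url_allowed("https://docs.example.com/internal/x", accept, reject))  # False
```

A URL that matches neither list is also skipped whenever an accept list is present, since acceptance then requires a positive match.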

Response

The response to create a web crawler job.
web_crawler_job (object)

Errors
