Create Web Crawler Job
Creates a web crawler job whose objective is to crawl the provided URLs/sitemaps and generate corresponding webpages as artifacts.
Creates a web crawler job whose objective is to crawl the provided URLs/sitemaps and generate corresponding webpages as artifacts.
Authorization
auth-token All the DevRev APIs require a token to authenticate the user. Provide Authorization: Bearer <TOKEN> as a header to every API request.
How do I find my token?
In: header
Request Body
application/json
The regex a URL must satisfy to be crawled.
textThe list of regexes a URL must satisfy to be crawled.
items <= 100The parts to which created webpage/articles during this crawler job will be linked to.
1 <= items <= 1The description of the job.
textThe list of allowed domain names to crawl.
Number of days between re-sync job runs. If 0, the job will run only once.
int32Language code of the articles created by the job.
textlength <= 10The maximum depth to crawl.
int32Whether to notify the user when the job is complete. Default is true.
The regex which if satisfied by a URL results in rejection of the URL. If a URL matches both accept and reject regexes, it is rejected.
textThe list of regexes which if satisfied by a URL results in rejection of the URL. If a URL matches both accept and reject regexes, it is rejected.
items <= 100The list of sitemap index URLs to crawl.
items <= 2The list of sitemap URLs to crawl.
items <= 2The list of URLs to crawl.
items <= 50User agent to use for crawling websites in this job.
textlength <= 1024Response Body
application/json
application/json
application/json
application/json
application/json
application/json
application/json
application/json
curl -X POST "https://api.devrev.ai/web-crawler-jobs.create" \ -H "Content-Type: application/json" \ -d '{ "applies_to_parts": [ "PROD-12345" ] }'{
"web_crawler_job": {
"created_by": {
"type": "dev_user",
"display_id": "string",
"id": "string",
"display_name": "string",
"display_picture": {
"display_id": "string",
"id": "string",
"file": {
"type": "string",
"name": "string",
"size": 0
}
},
"email": "string",
"full_name": "string",
"state": "active"
},
"created_date": "2023-01-01T12:00:00.000Z",
"display_id": "string",
"id": "string",
"modified_by": {
"type": "dev_user",
"display_id": "string",
"id": "string",
"display_name": "string",
"display_picture": {
"display_id": "string",
"id": "string",
"file": {
"type": "string",
"name": "string",
"size": 0
}
},
"email": "string",
"full_name": "string",
"state": "active"
},
"modified_date": "2023-01-01T12:00:00.000Z",
"object_version": 0,
"accept_regexs": [
"string"
],
"description": "string",
"domain_names": [
"string"
],
"frequency": 0,
"language_code": "string",
"max_depth": 0,
"no_parent": true,
"notify_on_complete": true,
"num_bytes": 0,
"num_timeout_urls": 0,
"num_urls_scraped": 0,
"reject_regexs": [
"string"
],
"sitemap_index_urls": [
"string"
],
"sitemap_urls": [
"string"
],
"state": "aborted",
"urls": [
"string"
],
"user_agent": "string"
}
}{
"detail": "string",
"message": "string",
"type": "artifact_already_attached_to_a_parent",
"existing_parent": "string",
"is_same": true
}{
"detail": "string",
"message": "string",
"type": "dev_org_deactivated"
}{
"detail": "string",
"message": "string",
"type": "forbidden"
}{
"detail": "string",
"message": "string",
"type": "request_entity_too_large",
"max_bytes": 0
}{
"detail": "string",
"message": "string",
"type": "too_many_requests",
"retry_after": 0
}{
"detail": "string",
"message": "string",
"type": "internal_error",
"reference_id": "string"
}{
"detail": "string",
"message": "string",
"type": "service_unavailable"
}