
Create Web Crawler Job

POST https://api.devrev.ai/web-crawler-jobs.create
curl -X POST https://api.devrev.ai/web-crawler-jobs.create \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "applies_to_parts": [
      "PROD-12345"
    ]
  }'
201 Created
{
  "web_crawler_job": {
    "id": "string",
    "accept_regexs": [
      "string"
    ],
    "created_by": {
      "display_id": "string",
      "id": "string",
      "display_name": "string",
      "display_picture": {
        "display_id": "string",
        "id": "string",
        "file": {
          "type": "string",
          "name": "string",
          "size": 1
        }
      },
      "email": "string",
      "full_name": "string",
      "state": "active"
    },
    "created_date": "2023-01-01T12:00:00.000Z",
    "description": "string",
    "display_id": "string",
    "domain_names": [
      "string"
    ],
    "frequency": 1,
    "max_depth": 1,
    "modified_by": {
      "display_id": "string",
      "id": "string",
      "display_name": "string",
      "display_picture": {
        "display_id": "string",
        "id": "string",
        "file": {
          "type": "string",
          "name": "string",
          "size": 1
        }
      },
      "email": "string",
      "full_name": "string",
      "state": "active"
    },
    "modified_date": "2023-01-01T12:00:00.000Z",
    "no_parent": true,
    "notify_on_complete": true,
    "num_bytes": 1,
    "num_timeout_urls": 1,
    "num_urls_scraped": 1,
    "reject_regexs": [
      "string"
    ],
    "sitemap_index_urls": [
      "string"
    ],
    "sitemap_urls": [
      "string"
    ],
    "state": "aborted",
    "urls": [
      "string"
    ],
    "user_agent": "string"
  }
}
Creates a web crawler job whose objective is to crawl the provided URLs/sitemaps and generate corresponding webpages as artifacts.
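The same request can be sketched from Python's standard library. This is an illustrative sketch, not official client code: the token and part ID below are placeholders you must replace with real values.

```python
import json
import urllib.request

# Placeholder values -- substitute a real auth token and part ID.
TOKEN = "<token>"
payload = {"applies_to_parts": ["PROD-12345"]}

req = urllib.request.Request(
    "https://api.devrev.ai/web-crawler-jobs.create",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending the request would return the created job under "web_crawler_job":
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)["web_crawler_job"]
```

The urlopen call is left commented out so the snippet can be inspected without hitting the live API.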

Headers

Authorization (string, required)
Bearer authentication of the form `Bearer <token>`, where token is your auth token.

Request

This endpoint expects an object.
applies_to_parts (list of strings, required)
The parts to which webpages/articles created during this crawler job will be linked.
accept_regexes (list of strings, optional)
The list of regexes a URL must satisfy to be crawled.
description (string, optional, format: "text")
The description of the job.
domain_names (list of strings, optional)
The list of allowed domain names to crawl.
frequency (integer, optional)
Number of days between re-sync job runs. If 0, the job will run only once.
max_depth (integer, optional)
The maximum depth to crawl.
notify_on_complete (boolean, optional)
Whether to notify the user when the job is complete. Default is true.
reject_regexes (list of strings, optional)
The list of regexes which, if satisfied by a URL, result in rejection of that URL. If a URL matches both the accept and reject regexes, it is rejected.
sitemap_index_urls (list of strings, optional)
The list of sitemap index URLs to crawl.
sitemap_urls (list of strings, optional)
The list of sitemap URLs to crawl.
urls (list of strings, optional)
The list of URLs to crawl.
user_agent (string, optional, format: "text", <=1024 characters)
User agent to use for crawling websites in this job.
accept_regex (string, optional, format: "text", deprecated)
The regex a URL must satisfy to be crawled.
reject_regex (string, optional, format: "text", deprecated)
The regex which, if satisfied by a URL, results in rejection of that URL. If a URL matches both the accept and reject regexes, it is rejected.
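The accept/reject rule described above (a URL matching both lists is rejected) can be sketched as follows. The patterns and URLs are hypothetical, and this mirrors the documented rule rather than the service's actual implementation:

```python
import re

def url_allowed(url, accept_regexes, reject_regexes):
    """Apply the documented rule: a reject match wins over an accept match."""
    if any(re.search(r, url) for r in reject_regexes):
        return False  # rejected, even if an accept regex also matches
    if accept_regexes:
        return any(re.search(r, url) for r in accept_regexes)
    return True  # no accept list given: crawl by default

# Hypothetical patterns for illustration.
accept = [r"^https://docs\.example\.com/"]
reject = [r"/internal/"]

print(url_allowed("https://docs.example.com/guide", accept, reject))       # True
print(url_allowed("https://docs.example.com/internal/x", accept, reject))  # False
```

A URL that matches neither list is also skipped whenever an accept list is present, since acceptance then requires a positive match.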

Response

The response to create a web crawler job.
web_crawler_job (object)

Errors
