In the data extraction phase, the extractor is expected to call the external system’s APIs to retrieve all the items that were updated since the start of the last extraction. If there was no previous extraction (the current run is an initial import), then all the items should be extracted.

The extractor must store the time at which it started each extraction in its state, so that in the next sync run it can extract only items created or updated since that time.
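For illustration, a minimal sketch of this bookkeeping; ExternalClient and the state shape are hypothetical stand-ins, not part of any SDK:

interface ExternalClient {
  listItems(params: { updatedSince?: string }): Promise<unknown[]>;
}

interface ExtractorState {
  // Start time of the last extraction run, as an RFC 3339 string.
  lastExtractionStartedAt?: string;
}

async function extractItems(client: ExternalClient, state: ExtractorState) {
  // Record when *this* run started before fetching, so items updated
  // while the run is in flight are still picked up by the next run.
  const thisRunStartedAt = new Date().toISOString();

  // Incremental run: fetch only items updated since the last run started.
  // Initial import (no previous extraction): fetch everything.
  const items = state.lastExtractionStartedAt
    ? await client.listItems({ updatedSince: state.lastExtractionStartedAt })
    : await client.listItems({});

  // Advance the checkpoint; the snap-in's state is persisted between runs.
  state.lastExtractionStartedAt = thisRunStartedAt;
  return items;
}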

Triggering event

Airdrop initiates data extraction by starting the snap-in with a message with event of type EXTRACTION_DATA_START when transitioning to the data extraction phase.

During the data extraction phase, the snap-in extracts data from an external system, prepares batches of data and uploads them in the form of artifacts to DevRev.

When the snap-in has extracted some data and the maximum ADaaS snap-in runtime (12 minutes) has been reached, it must respond to Airdrop with a message with event of type EXTRACTION_DATA_PROGRESS, together with an optional progress estimate and the relevant artifacts.

If the extraction has been rate-limited by the external system and back-off is required, the snap-in must respond to Airdrop with a message with event of type EXTRACTION_DATA_DELAY, specifying the back-off time with the delay attribute.

In both cases, Airdrop then restarts the snap-in with a message with event of type EXTRACTION_DATA_CONTINUE. The restart is immediate after EXTRACTION_DATA_PROGRESS, or delayed by the requested back-off after EXTRACTION_DATA_DELAY.
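For illustration, a minimal sketch of the two responses, using the literal event names from this page. The Emit signature is a hypothetical stand-in for whatever emit function the SDK's adapter exposes, and the delay is assumed to be in seconds:

// Hypothetical emit signature; the real one comes from the ADaaS SDK.
type Emit = (eventType: string, payload?: object) => Promise<void>;

// Timebox reached with data extracted: Airdrop restarts the snap-in
// immediately with EXTRACTION_DATA_CONTINUE.
async function reportProgress(emit: Emit, percentDone?: number) {
  await emit('EXTRACTION_DATA_PROGRESS', { progress: percentDone });
}

// Rate-limited by the external system: Airdrop waits out the requested
// back-off, then restarts the snap-in with EXTRACTION_DATA_CONTINUE.
async function requestBackOff(emit: Emit, delaySeconds: number) {
  await emit('EXTRACTION_DATA_DELAY', { delay: delaySeconds });
}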

Once the data extraction is done, the snap-in must respond to Airdrop with a message with event of type EXTRACTION_DATA_DONE.

If data extraction fails at any point, the snap-in must respond to Airdrop with a message with event of type EXTRACTION_DATA_ERROR.
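A matching sketch for the two terminal events, with the same hypothetical Emit stand-in; the error payload shape shown here is an assumption, not a confirmed SDK contract:

type Emit = (eventType: string, payload?: object) => Promise<void>;

async function finishExtraction(emit: Emit, failure?: Error) {
  if (failure) {
    // A failure at any point ends the phase with an error report.
    // The payload shape is an assumed example.
    await emit('EXTRACTION_DATA_ERROR', { error: { message: failure.message } });
    return;
  }
  // Everything extracted and uploaded: signal successful completion.
  await emit('EXTRACTION_DATA_DONE');
}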

Snap-in response

During the data extraction phase, the snap-in uploads batches of extracted items (the recommended batch size is 2000 items). Each batch is formatted as JSONL (JSON Lines), gzipped, and submitted as an artifact to S3Interact with tooling from @devrev/adaas-sdk.

Each artifact is submitted with an item_type that identifies a separate domain object from the external system and matches a record_type in the provided metadata. Item types used when uploading extracted data must validate against the declarations in the metadata file.
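A hedged sketch of the batching step, assuming Node.js; uploadArtifact is a hypothetical stand-in for the S3Interact upload that the @devrev/adaas-sdk tooling performs for you:

import { gzipSync } from 'node:zlib';

// Hypothetical upload signature; the real upload is handled by the SDK.
type UploadArtifact = (artifact: {
  itemType: string; // must match a record_type declared in the metadata
  gzippedJsonl: Buffer;
}) => Promise<void>;

async function uploadInBatches(
  items: object[],
  itemType: string,
  uploadArtifact: UploadArtifact,
  batchSize = 2000, // recommended batch size
) {
  for (let i = 0; i < items.length; i += batchSize) {
    // JSON Lines: one JSON object per line.
    const jsonl = items
      .slice(i, i + batchSize)
      .map((item) => JSON.stringify(item))
      .join('\n');
    await uploadArtifact({ itemType, gzippedJsonl: gzipSync(jsonl) });
  }
}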

Extracted data must be normalized; a normalization sketch follows the example record below.

  • Null values: Fields without a value should either be omitted or set to null. For example, if the external system represents missing values as "" or -1, those must be converted to null.
  • Timestamps: Full-precision timestamps should be formatted per RFC 3339 (for example, 1972-03-29T22:04:47+01:00); plain dates should be formatted as 2020-12-31.
  • References: References must be strings, not numbers or objects.
  • Numbers: Number fields must be valid JSON numbers, not strings.
  • Multiselects: Multiselect fields must be provided as arrays, not comma-separated strings.

Each line of the file contains an id, plus optional created_date and modified_date fields, at the beginning of the record. All other fields are contained within the data attribute.

{
  "id": "2102e01F",
  "created_date": "1972-03-29T22:04:47+01:00",
  "modified_date": "1970-01-01T01:00:04+01:00",
  "data": {
    "actual_close_date": "1970-01-01T02:33:18+01:00",
    "creator": "b8",
    "owner": "A3A",
    "rca": null,
    "severity": "fatal",
    "summary": "Lorem ipsum"
  }
}
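To make the normalization rules concrete, here is a sketch for a hypothetical raw issue record from the external system; RawIssue and all of its fields are illustrative, not an SDK type:

interface RawIssue {
  id: number;
  created: string;   // already RFC 3339 in this example
  modified: string;
  owner: { id: number };
  severity: string;  // "" when missing
  tags: string;      // comma-separated in the external system
  estimate: string;  // numeric value serialized as a string
  summary?: string;
}

function normalizeIssue(raw: RawIssue) {
  return {
    id: String(raw.id), // ids and references as strings
    created_date: raw.created,
    modified_date: raw.modified,
    data: {
      owner: String(raw.owner.id), // reference as a string, not an object
      severity: raw.severity === '' ? null : raw.severity, // "" -> null
      tags: raw.tags ? raw.tags.split(',') : null, // CSV -> array
      estimate: raw.estimate === '' ? null : Number(raw.estimate), // string -> JSON number
      summary: raw.summary ?? null, // missing -> null
    },
  };
}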

Extracted artifacts can be validated with the chef-cli using the following command:

$ chef-cli validate-data -m external_domain_metadata.json -r issue < extractor_issues_2.json