Metadata extraction

During the metadata extraction phase, the ADaaS snap-in must provide an external_domain_metadata.json file on each sync run. This file describes the external system’s domain model in a structured way, including its domain entities, types, relationships, and other metadata.

The extraction function of the snap-in must provide a valid metadata file. DevRev provides a JSON schema and a CLI tool, chef-cli, to validate the file.

Triggering event

Airdrop initiates the metadata extraction by starting the snap-in with a message with an event of type EXTRACTION_METADATA_START.

The snap-in must respond to Airdrop with a message with an event of type EXTRACTION_METADATA_DONE when done, or EXTRACTION_METADATA_ERROR in case of an error.
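
The exchange above can be sketched as a small dispatcher. This is a minimal illustration in Python, not the snap-in SDK: the handle_metadata_event function, the build_metadata callback, and the shape of the returned dictionary are all hypothetical. Only the three event type names come from the text above.

```python
import json
from typing import Callable


def handle_metadata_event(event_type: str, build_metadata: Callable[[], dict]) -> dict:
    """Return the response event for a metadata extraction request.

    The response shape (a dict with an "event_type" key) is an assumption
    made for illustration only.
    """
    if event_type != "EXTRACTION_METADATA_START":
        raise ValueError(f"unexpected event type: {event_type}")
    try:
        metadata = build_metadata()
        # Serializing confirms the metadata can be written as JSON
        # before success is reported.
        payload = json.dumps(metadata)
    except Exception as exc:
        return {"event_type": "EXTRACTION_METADATA_ERROR", "error": str(exc)}
    return {"event_type": "EXTRACTION_METADATA_DONE", "metadata_json": payload}


response = handle_metadata_event(
    "EXTRACTION_METADATA_START",
    lambda: {"record_types": {"issues": {}, "comments": {}}},
)
print(response["event_type"])
```

Any failure while building the metadata is reported as an error event rather than raised, so the sync run always receives one of the two expected replies.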

Snap-in response

During the metadata extraction phase, the ADaaS snap-in must provide an external_domain_metadata.json file on each sync run.

The transformation can be crafted and finalized further using the chef-cli to ensure extracted data is mapped consistently to the DevRev domain model.

Step-by-step approach to crafting the metadata declaration

Crafting the metadata declaration in an external_domain_metadata.json file can be tedious. The step-by-step approach below helps in understanding the metadata declarations and serves as a checklist for declaring the metadata for an extraction from a specific external system.

Metadata declarations include both static and dynamic declarations. Static declarations are formulated by comparing the external domain model with the DevRev domain model. Dynamic declarations are obtained during a snap-in run from the external system’s APIs, because they describe things that are configurable in the external system and can be changed by the end user at any time, such as mandatory fields or custom fields.

  1. Declare the extracted record types

A record type is a kind of record with a well-defined schema that you extract from or load to the external system. Each record type corresponds to a domain object in the external system.

If the snap-in is extracting issues and comments, a good starting point to declare record types in external_domain_metadata.json would be:

{
  "record_types": {
    "issues": {},
    "comments": {}
  }
}

Although the names of record types are arbitrary, they must match the item_type field in the artifacts you upload.
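
The matching rule above can be checked before uploading. The following is a minimal sketch in Python; the undeclared_item_types helper is hypothetical, and it assumes each artifact is available as a dictionary with an item_type key.

```python
def undeclared_item_types(metadata: dict, artifacts: list[dict]) -> set[str]:
    """Return item types used by artifacts but missing from record_types."""
    declared = set(metadata.get("record_types", {}))
    used = {artifact["item_type"] for artifact in artifacts}
    return used - declared


metadata = {"record_types": {"issues": {}, "comments": {}}}
# Hypothetical artifact descriptions; only item_type matters for this check.
artifacts = [
    {"item_type": "issues"},
    {"item_type": "comments"},
    {"item_type": "attachments"},
]
print(undeclared_item_types(metadata, artifacts))
```

An empty result means every uploaded artifact has a matching record type declaration.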

  2. Declare the custom record types

If the external system supports custom types, or custom variants of some base record type, and you want to airdrop those too, you have to declare them in the metadata at runtime. That is, the extractor will use APIs of the external system to dynamically discover what custom record types exist.

The output of this process might look like this:

{
  "record_types": {
    "issues_stock_epic": {},
    "issues_custom2321": {},
    "issues_custom2322": {},
    "comments": {}
  }
}
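
Runtime discovery of custom record types can be sketched as follows. This is an illustration in Python under stated assumptions: the declare_record_types helper is hypothetical, and so is the input, which stands in for whatever the external system’s API returns when asked for its issue types (here, a list of dictionaries with an id key).

```python
def declare_record_types(issue_types: list[dict]) -> dict:
    """Build the record_types section from discovered issue types."""
    # "comments" is a static declaration; issue variants are dynamic.
    record_types = {"comments": {}}
    for issue_type in issue_types:
        # The "issues_" prefix mirrors the key naming used in the example output.
        record_types[f"issues_{issue_type['id']}"] = {}
    return {"record_types": record_types}


# Hypothetical response from the external system's API.
discovered = [{"id": "stock_epic"}, {"id": "custom2321"}, {"id": "custom2322"}]
metadata = declare_record_types(discovered)
print(sorted(metadata["record_types"]))
```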

  3. Provide human-readable names for external record types

Define human-readable names for the record types defined in your metadata file.

{
  "record_types": {
    "issues_stock_epic": {
      "name": "Epic"
    },
    "issues_custom2321": {
      "name": "Incident report"
    },
    "issues_custom2322": {
      "name": "Problem"
    },
    "comments": {
      "name": "Comment"
    }
  }
}

  4. Categorize external record types

The metadata allows each external record type to be annotated with one category. The category key can be an arbitrary string, but it must match one of the categories declared under record_type_categories.

Categories of external record types simplify mappings, because a mapping can be applied to a whole category of record types. Categories also provide a way to map custom record types.

If the external system allows records to change their record type within a category while preserving their identity, declare this with the are_transitions_possible field in the record_type_categories section. For example, an issue can be moved to become a problem in the external system.

{
  "record_types": {
    "issues_stock_epic": {
      "name": "Epic",
      "category": "issue"
    },
    "issues_custom2321": {
      "name": "Incident report",
      "category": "issue"
    },
    "issues_custom2322": {
      "name": "Problem",
      "category": "issue"
    },
    "comments": {
      "name": "Comment"
    }
  },
  "record_type_categories": {
    "issue": {
      "are_transitions_possible": true
    }
  }
}
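
The requirement that every category used by a record type is declared under record_type_categories can be checked with a few lines of Python. This is a sketch only; the undeclared_categories helper is hypothetical and is not a replacement for validating the file with chef-cli.

```python
def undeclared_categories(metadata: dict) -> dict[str, str]:
    """Map each record type to its category when that category is not declared."""
    declared = metadata.get("record_type_categories", {})
    return {
        key: record_type["category"]
        for key, record_type in metadata.get("record_types", {}).items()
        if "category" in record_type and record_type["category"] not in declared
    }


metadata = {
    "record_types": {
        "issues_stock_epic": {"name": "Epic", "category": "issue"},
        "issues_custom2322": {"name": "Problem", "category": "problem"},
        "comments": {"name": "Comment"},
    },
    "record_type_categories": {"issue": {"are_transitions_possible": True}},
}
print(undeclared_categories(metadata))
```

Record types without a category, such as comments here, are skipped because the category annotation is optional.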

  5. Declare fields for each record type

Fields’ keys must match what is actually found in the extracted data in the artifacts.

The supported types are:

  • bool
  • int
  • float
  • text
  • rich_text: Formatted text with mentions and images.
  • reference: IDs referring to another record. References have to declare what they can refer to, which can be one or more record types (#record:) or categories (#category:).
  • enum: A string from a predefined set of values, with optional human-readable names for each value.
  • date
  • timestamp
  • struct

If the external system supports custom fields, the set of custom fields in each record type you wish to extract must be declared too.

The set of possible values of an enum field is often customizable. A good practice is to retrieve the set of possible values for all enum fields from the external system’s APIs in each sync run.
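
Refreshing enum values on each sync run could look like the sketch below. It is written in Python under stated assumptions: the enum_field helper is hypothetical, and the input keys label and archived stand in for whatever the external system’s API returns. The output keys (values, key, name, is_deprecated) follow the example declaration in this step.

```python
def enum_field(name: str, api_values: list[dict], is_required: bool = False) -> dict:
    """Build an enum field declaration from values fetched at runtime."""
    values = []
    for api_value in api_values:
        entry = {"key": api_value["key"]}
        if api_value.get("label"):
            # Human-readable names are optional.
            entry["name"] = api_value["label"]
        if api_value.get("archived"):
            entry["is_deprecated"] = True
        values.append(entry)
    field = {"name": name, "type": "enum", "enum": {"values": values}}
    if is_required:
        field["is_required"] = True
    return field


# Hypothetical API response listing the currently configured priorities.
priorities = [
    {"key": "P-0", "label": "Super important"},
    {"key": "P-1"},
    {"key": "P-2", "archived": True},
]
print(enum_field("Priority", priorities, is_required=True))
```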

The ID (primary key) of the record, created_date, and modified_date must not be declared.

Example:

{
  "record_types": {
    "issues_stock_epic": {
      "name": "Epic",
      "category": "issue",
      "fields": {
        "actual_close_date": {
          "name": "Closed at",
          "type": "timestamp"
        },
        "owner": {
          "is_required": true,
          "type": "reference",
          "reference": {
            "refers_to": {
              "#record:user": {}
            }
          }
        },
        "creator": {
          "is_required": true,
          "type": "reference",
          "reference": {
            "refers_to": {
              "#record:user": {}
            }
          }
        },
        "priority": {
          "name": "Priority",
          "is_required": true,
          "type": "enum",
          "enum": {
            "values": [
              {
                "key": "P-0",
                "name": "Super important"
              },
              {
                "key": "P-1"
              },
              {
                "key": "P-2",
                "is_deprecated": true
              }
            ]
          }
        },
        "target_close_date": {
          "type": "date"
        },
        "headline": {
          "name": "Headline",
          "is_required": true,
          "type": "text"
        }
      }
    }
  }
}
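
The field rules in this step can be turned into a quick self-check. The following Python sketch is illustrative only: the field_problems helper is hypothetical, and treating "id" as the key of the primary key is an assumption, since the text only says that the ID must not be declared.

```python
SUPPORTED_TYPES = {
    "bool", "int", "float", "text", "rich_text",
    "reference", "enum", "date", "timestamp", "struct",
}
# "id" as the primary-key field name is an assumption made for this sketch.
RESERVED_KEYS = {"id", "created_date", "modified_date"}


def field_problems(metadata: dict) -> list[str]:
    """List fields that use a reserved key or an unsupported type."""
    problems = []
    for rt_key, record_type in metadata.get("record_types", {}).items():
        for field_key, field in record_type.get("fields", {}).items():
            if field_key in RESERVED_KEYS:
                problems.append(f"{rt_key}.{field_key}: must not be declared")
            if field.get("type") not in SUPPORTED_TYPES:
                problems.append(
                    f"{rt_key}.{field_key}: unsupported type {field.get('type')!r}"
                )
    return problems


metadata = {
    "record_types": {
        "issues_stock_epic": {
            "fields": {
                "headline": {"type": "text"},
                "estimate": {"type": "string"},
                "created_date": {"type": "timestamp"},
            }
        }
    }
}
for problem in field_problems(metadata):
    print(problem)
```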

  6. Declare arrays

If a field is an array in the extracted data, it is still typed with one of the supported types, which describes the elements of the array. In addition, the field must be marked as a collection.

{
  "name": "Assignees",
  "is_required": true,
  "type": "reference",
  "reference": {
    "refers_to": {
      "#category:agents": {}
    }
  },
  "collection": {
    "max_length": 5
  }
}
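
When field declarations are generated in code, marking a field as a collection can be a small wrapper around the scalar declaration. The as_collection helper below is a hypothetical Python sketch. It always sets max_length because that is the only collection attribute shown in the example above; whether other attributes exist is not covered here.

```python
def as_collection(field: dict, max_length: int) -> dict:
    """Return a copy of a field declaration marked as a collection."""
    result = dict(field)  # Shallow copy so the scalar declaration stays reusable.
    result["collection"] = {"max_length": max_length}
    return result


assignee = {
    "name": "Assignees",
    "is_required": True,
    "type": "reference",
    "reference": {"refers_to": {"#category:agents": {}}},
}
assignees = as_collection(assignee, max_length=5)
print(assignees["collection"])
```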

  7. Consider special references

  • Some references have the role of parent or child. This means that the child record doesn’t make sense without its parent, for example a comment attached to a ticket. Assigning a role helps Airdrop handle such fields correctly if the end user decides to filter out some of the parent records.

  • Sometimes the external system refers to records by something other than their primary key, for example to a case by its serial number, or to a user by their email. To resolve such references correctly, they must be marked with ‘by_field’, which must name a field that exists in the referenced record type and is marked ‘is_identifier’. For example:

{
  "record_types": {
    "users": {
      "fields": {
        "email": {
          "type": "text",
          "is_identifier": true
        }
      }
    },
    "comments": {
      "fields": {
        "user_email": {
          "type": "reference",
          "reference": {
            "refers_to": {
              "#record:users": {
                "by_field": "email"
              }
            }
          }
        }
      }
    }
  }
}
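
The by_field rule can be verified mechanically: every by_field must name a field in the referenced record type that is marked is_identifier. The Python sketch below uses a hypothetical by_field_problems helper and only inspects #record: targets, since the example above shows by_field only on a record reference.

```python
RECORD_PREFIX = "#record:"


def by_field_problems(metadata: dict) -> list[str]:
    """List by_field references whose target field is not an identifier."""
    record_types = metadata.get("record_types", {})
    problems = []
    for rt_key, record_type in record_types.items():
        for field_key, field in record_type.get("fields", {}).items():
            targets = field.get("reference", {}).get("refers_to", {})
            for target, options in targets.items():
                by_field = options.get("by_field")
                if by_field is None or not target.startswith(RECORD_PREFIX):
                    continue
                target_type = record_types.get(target[len(RECORD_PREFIX):], {})
                target_field = target_type.get("fields", {}).get(by_field, {})
                if not target_field.get("is_identifier"):
                    problems.append(
                        f"{rt_key}.{field_key}: {target} has no identifier field {by_field!r}"
                    )
    return problems


metadata = {
    "record_types": {
        "users": {"fields": {"email": {"type": "text", "is_identifier": True}}},
        "comments": {
            "fields": {
                "user_email": {
                    "type": "reference",
                    "reference": {"refers_to": {"#record:users": {"by_field": "email"}}},
                }
            }
        },
    }
}
print(by_field_problems(metadata))
```

For the example declaration the list is empty; removing is_identifier from the email field would produce one problem entry for comments.user_email.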