Metadata extraction | DevRev

During the metadata extraction phase, the snap-in must provide an external_domain_metadata.json file to Airdrop. This file provides a structured way of describing the external system’s domain model, including its domain entities, types, relationships, and other metadata.

You can check which object types DevRev supports for Airdrop here.

Triggering event

Airdrop initiates the metadata extraction by starting the snap-in with a message with an event of type EXTRACTION_METADATA_START.

The snap-in must respond to Airdrop with a message with an event of type EXTRACTION_METADATA_DONE when done, or EXTRACTION_METADATA_ERROR in case of an error.
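
The request/response contract above can be sketched as a small pure function. This is only an illustration: the string literals are the event types named in this section, while `metadataResponseFor` and its `succeeded` parameter are hypothetical names that are not part of any DevRev library.

```typescript
// The two event types the snap-in may answer with, as named in this section.
type MetadataResponseEvent =
  | "EXTRACTION_METADATA_DONE"
  | "EXTRACTION_METADATA_ERROR";

// Hypothetical helper: picks the response event for a metadata extraction run.
function metadataResponseFor(
  event: string,
  succeeded: boolean
): MetadataResponseEvent {
  // Only the metadata start event is handled in this phase.
  if (event !== "EXTRACTION_METADATA_START") {
    throw new Error(`Unexpected event type: ${event}`);
  }
  return succeeded ? "EXTRACTION_METADATA_DONE" : "EXTRACTION_METADATA_ERROR";
}
```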

Implementation

Metadata extraction should be implemented in the metadata-extraction.ts file.

import { ExtractorEventType, processTask } from "@devrev/ts-adaas";
import externalDomainMetadata from "../../external-system/external_domain_metadata.json";

// A single repo that holds the metadata file.
const repos = [{ itemType: "external_domain_metadata" }];

processTask({
  task: async ({ adapter }) => {
    adapter.initializeRepos(repos);
    // Push the metadata file into the repo, then signal completion.
    await adapter
      .getRepo("external_domain_metadata")
      ?.push([externalDomainMetadata]);
    await adapter.emit(ExtractorEventType.ExtractionMetadataDone);
  },
  // Report an error if the snap-in runs out of time.
  onTimeout: async ({ adapter }) => {
    await adapter.emit(ExtractorEventType.ExtractionMetadataError, {
      error: { message: "Failed to extract metadata. Lambda timeout." },
    });
  },
});

The snap-in must always emit a single message.

Validating metadata

The extraction function of the snap-in must provide a valid metadata file. DevRev provides a JSON schema and a CLI tool chef-cli to validate the proposed schema.

To check the metadata for internal consistency, you should use the following command after every step:

$ chef-cli validate-metadata < external_domain_metadata.json

This outputs any problems there may be with the metadata file.

The detailed format of this metadata is defined by the external_domain_metadata_schema.json file. The file is available as part of the chef-cli package. There, you can also find an example of a metadata file (metadata.json).

The transformation can be crafted and finalized further using the chef-cli to ensure extracted data is mapped consistently to the DevRev domain model.

Getting started with infer-metadata

The chef-cli provides a helpful command to generate initial domain metadata from example data:

$ chef-cli infer-metadata example_data_directory > metadata.json

To get good results with this approach:

1. Collect example data from the external system and place it in a directory. Each file should:
   - Contain records of a single type and be named after that type.
   - Have a .json or .jsonl extension, for example issues.json.
   - Contain either a single JSON array of objects, or newline-separated objects.
2. Run the infer-metadata command targeting this directory.
3. Inspect the generated metadata, particularly the field types and the suggestions the tool generates.

For best results:

- Provide 10-100 examples of each record type (but not more than 1000).
- Ensure logically distinct fields are separate keys at the top level.
- Use referentially consistent example data if possible.
- Make sure IDs are strings, not numbers.
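
As a rough illustration of preparing such example data, the sketch below turns records into the content of a .jsonl file with string IDs. The helper names, the default cap of 100 records, and the assumption that the ID lives in a top-level `id` key are all hypothetical; adapt them to the shape of your external system's data.

```typescript
type ExampleRecord = Record<string, unknown>;

// Hypothetical helper: converts a numeric top-level "id" into a string.
// Assumes the record ID is stored under the key "id".
function withStringId(record: ExampleRecord): ExampleRecord {
  const id = record["id"];
  return typeof id === "number" ? { ...record, id: String(id) } : record;
}

// Serializes at most maxExamples records as newline-separated JSON objects,
// which is the content expected in a .jsonl file.
function toJsonl(records: ExampleRecord[], maxExamples = 100): string {
  return records
    .slice(0, maxExamples)
    .map((record) => JSON.stringify(withStringId(record)))
    .join("\n");
}
```

The returned string could then be saved as, for example, issues.jsonl in the example data directory before running infer-metadata.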

This generated metadata serves as a starting point that needs further refinement. It can be used to prototype initial domain mappings (by running a sync with it) and to generate example normalized data, but it’s important to understand that it’s only an initial guess. The metadata must be carefully reviewed and refined to ensure accuracy and proper representation of your external system’s data model.

Craft the metadata declaration

Crafting the metadata declaration in the form of an external_domain_metadata.json file can be a tedious process. The step-by-step approach below helps you understand the metadata declarations and serves as a checklist for declaring the metadata for an extraction from a specific external system.

Metadata declarations include both static and dynamic declarations. Static declarations are formulated by deduction and by comparing the external domain model with the DevRev domain model. Dynamic declarations are obtained from the external system’s APIs during a snap-in run, because what they describe (such as mandatory fields or custom fields) is configurable in the external system and can be changed by the end user at any time.

Declare the extracted record types

A record type is a kind of record with a well-defined schema that you extract from, or load to, the external system. Each record type corresponds to a domain object in the external system.

If the snap-in is extracting issues and comments, a good starting point to declare record types in external_domain_metadata.json would be:

{
  "record_types": {
    "issues": {},
    "comments": {}
  }
}

The names of record types are arbitrary, but they must match the item_type field in the artifacts you will upload.
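
A quick consistency check along these lines can catch mismatches early. The following is a minimal sketch with hypothetical names (`undeclaredItemTypes`, `MetadataWithRecordTypes`); it only compares a list of item types you plan to upload against the keys under record_types.

```typescript
interface MetadataWithRecordTypes {
  record_types: Record<string, unknown>;
}

// Hypothetical helper: returns the item types that have no matching
// record type declaration in the metadata.
function undeclaredItemTypes(
  itemTypes: string[],
  metadata: MetadataWithRecordTypes
): string[] {
  const declared = Object.keys(metadata.record_types);
  return itemTypes.filter((itemType) => declared.indexOf(itemType) === -1);
}
```

An empty result means every uploaded item type has a declaration; the comparison is case-sensitive.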

Declare the custom record types

If the external system supports custom types, or custom variants of some base record type, and you want to airdrop those too, you have to declare them in the metadata at runtime. That is, the extractor will use APIs of the external system to dynamically discover what custom record types exist.

The output of this process might look like this:

{
  "record_types": {
    "issues_stock_epic": {},
    "issues_custom2321": {},
    "issues_custom2322": {},
    "comments": {}
  }
}
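
One way such output might be produced is sketched below. The `DiscoveredIssueType` shape stands in for whatever the external system's API returns and is purely hypothetical, as is the key naming scheme, which simply mirrors the example above.

```typescript
// Hypothetical shape of an issue type returned by the external system's API.
interface DiscoveredIssueType {
  key: string;
  custom: boolean;
}

// Builds a record type key in the style of the example above,
// e.g. "issues_stock_epic" or "issues_custom2321".
function recordTypeKey(issueType: DiscoveredIssueType): string {
  return issueType.custom
    ? `issues_custom${issueType.key}`
    : `issues_stock_${issueType.key}`;
}

// Declares one record type per discovered issue type, plus the static
// "comments" record type.
function buildRecordTypes(
  issueTypes: DiscoveredIssueType[]
): { record_types: Record<string, object> } {
  const recordTypes: Record<string, object> = {};
  for (const issueType of issueTypes) {
    recordTypes[recordTypeKey(issueType)] = {};
  }
  recordTypes["comments"] = {};
  return { record_types: recordTypes };
}
```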

Provide human-readable names to external record types

Define human-readable names for the record types defined in your metadata file.

{
  "record_types": {
    "issues_stock_epic": {
      "name": "Epic"
    },
    "issues_custom2321": {
      "name": "Incident report"
    },
    "issues_custom2322": {
      "name": "Problem"
    },
    "comments": {
      "name": "Comment"
    }
  }
}

Categorize external record types

The metadata allows each external record type to be annotated with one category. The category key can be an arbitrary string, but it must match the categories declared under record_type_categories.

Categories of external record types simplify mappings so that a mapping can be applied to a whole category of record types. Categories also provide a way to map custom record types.

If the external system allows records to change their record type within the category while still preserving identity, this can be declared with the are_record_type_conversions_possible field in the record_type_categories section. An example is an issue that can be converted to a problem in the external system.

{
  "record_types": {
    "issues_stock_epic": {
      "name": "Epic",
      "category": "issue"
    },
    "issues_custom2321": {
      "name": "Incident report",
      "category": "issue"
    },
    "issues_custom2322": {
      "name": "Problem",
      "category": "issue"
    },
    "comments": {
      "name": "Comment"
    }
  },
  "record_type_categories": {
    "issue": {
      "are_record_type_conversions_possible": true
    }
  }
}
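
The rule that every category used by a record type must be declared under record_type_categories can be checked with a few lines of code. This is a minimal sketch with hypothetical names; it is not a replacement for chef-cli validate-metadata.

```typescript
interface RecordTypeDecl {
  name?: string;
  category?: string;
}

interface CategorizedMetadata {
  record_types: Record<string, RecordTypeDecl>;
  record_type_categories?: Record<
    string,
    { are_record_type_conversions_possible?: boolean }
  >;
}

// Hypothetical helper: lists categories that record types refer to
// but that are not declared under record_type_categories.
function undeclaredCategories(metadata: CategorizedMetadata): string[] {
  const declared = Object.keys(metadata.record_type_categories ?? {});
  const missing: string[] = [];
  for (const key of Object.keys(metadata.record_types)) {
    const category = metadata.record_types[key]?.category;
    if (
      category !== undefined &&
      declared.indexOf(category) === -1 &&
      missing.indexOf(category) === -1
    ) {
      missing.push(category);
    }
  }
  return missing;
}
```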

Mark record types as loadable

Record types that are used in a two-way sync must be marked with is_loadable. This allows Airdrop to load records of those types to the external system.

{
  "record_types": {
    "issues_stock_epic": {
      "name": "Epic",
      "category": "issue",
      "is_loadable": true
    },
    "issues_custom2321": {
      "name": "Incident report",
      "category": "issue"
    },
    "issues_custom2322": {
      "name": "Problem",
      "category": "issue"
    },
    "comments": {
      "name": "Comment",
      "is_loadable": true
    }
  }
}
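
To see at a glance which record types a declaration like the one above marks for loading, a small helper can list them. The sketch below uses a hypothetical function name and assumes that a record type without the is_loadable flag is not loadable.

```typescript
interface LoadableDecl {
  name?: string;
  is_loadable?: boolean;
}

// Hypothetical helper: returns the keys of record types marked with
// is_loadable. A missing flag is treated as "not loadable" (an assumption).
function loadableRecordTypes(
  recordTypes: Record<string, LoadableDecl>
): string[] {
  return Object.keys(recordTypes).filter(
    (key) => recordTypes[key]?.is_loadable === true
  );
}
```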

Declare fields for each record type

Fields’ keys must match what is actually found in the extracted data in the artifacts. Note that field keys are case-sensitive.

The supported types are:

- bool
- int
- float
- text: Interpreted as plain text.
- rich_text: Formatted text with mentions and images. See the rich text section for more details.
- reference: IDs referring to another record. References have to declare what they can refer to, which can be one or more record types (#record:) or categories (#category:).
- enum: A string from a predefined set of values with the optional human-readable names for each value.
- date
- timestamp
- struct
- permission: Used in article shared_with field. See the permissions section for more details.
- type_key: Used to map permissions to record types. See the permissions section for more details.

Refer to the metadata schema file (external_domain_metadata_schema.json) to help choose the appropriate type for your fields.

If the external system supports custom fields, the set of custom fields in each record type you wish to extract must be declared too.

The set of possible values of an enum field is often customizable. A good practice is to retrieve the set of possible values for all enum fields from the external system’s APIs in each sync run. You can mark specific enum values as deprecated using the is_deprecated property.

ID (primary key) of the record, created_date, and modified_date must not be declared.

Example:

{
  "schema_version": "v0.2.0",
  "record_types": {
    "issues_stock_epic": {
      "name": "Epic",
      "category": "issue",
      "fields": {
        "actual_close_date": {
          "name": "Closed at",
          "type": "timestamp"
        },
        "owner": {
          "is_required": true,
          "type": "reference",
          "reference": {
            "refers_to": {
              "#record:user": {}
            }
          }
        },
        "creator": {
          "is_required": true,
          "type": "reference",
          "reference": {
            "refers_to": {
              "#record:user": {}
            }
          }
        },
        "priority": {
          "name": "Priority",
          "is_required": true,
          "type": "enum",
          "enum": {
            "values": [
              {
                "key": "P-0",
                "name": "Super important"
              },
              {
                "key": "P-1"
              },
              {
                "key": "P-2",
                "is_deprecated": true
              }
            ]
          }
        },
        "target_close_date": {
          "type": "date"
        },
        "headline": {
          "name": "Headline",
          "is_required": true,
          "type": "text"
        }
      }
    }
  }
}
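
Enum declarations like the priority field above are often generated from the external system's API at runtime. The following sketch shows one way this might look; the `ExternalOption` shape (`id`, `label`, `archived`) is a hypothetical stand-in for the API response, and only the output properties (key, name, is_deprecated) come from the example above.

```typescript
// Hypothetical shape of an option returned by the external system's API.
interface ExternalOption {
  id: string;
  label?: string;
  archived?: boolean;
}

interface EnumValue {
  key: string;
  name?: string;
  is_deprecated?: boolean;
}

interface EnumField {
  name: string;
  type: "enum";
  enum: { values: EnumValue[] };
}

// Builds an enum field declaration from the options fetched in this sync run.
function toEnumField(name: string, options: ExternalOption[]): EnumField {
  const values = options.map((option) => {
    const value: EnumValue = { key: option.id };
    if (option.label !== undefined) value.name = option.label;
    // Assumption: archived options are declared as deprecated enum values.
    if (option.archived === true) value.is_deprecated = true;
    return value;
  });
  return { name, type: "enum", enum: { values } };
}
```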

Declare arrays

If the field is an array in the extracted data, it is still typed with one of the supported types. Lists must be marked as a collection.

{
  "name": "Assignees",
  "is_required": true,
  "type": "reference",
  "reference": {
    "refers_to": {
      "#category:agents": {}
    }
  },
  "collection": {
    "max_length": 5
  }
}

Some references have the role of parent or child. This means that the child record doesn’t make sense without its parent, for example a comment attached to a ticket. Assigning a reference_type helps Airdrop correctly handle such fields in case the end-user decides to filter some of the parent records out.

Define field attributes

External system fields that shouldn’t be mapped in reverse should be marked as is_read_only. Depending on their purpose, you can also mark fields as is_indexed, is_identifier, is_filterable, is_write_only, etc. By default, these are set to false. You can find the full list of supported field attributes and their descriptions in the external_domain_metadata_schema.json.

Configure state transitions

If an external record type has some concept of states, between which only certain transitions are possible (for example, to move to the resolved status, an issue first has to be in_testing), you can declare these in the metadata too.

This allows creation of a matching stage diagram (a collection of stages and their permitted transitions) in DevRev, which enables a much simpler import and a closer preservation of the external data than mapping to DevRev’s built-in stages.

This is especially important for two-way sync, as setting the transitions correctly ensures that the transitions a record undergoes in DevRev can be replicated in the external system.

To declare this in the metadata, make sure the status is represented as an enum field, and then declare the allowed transitions (which you might have to retrieve from an API at runtime, if they are also customized):

{
  "fields": {
    "status": {
      "name": "Status",
      "is_required": true,
      "type": "enum",
      "enum": {
        "values": [
          {
            "key": "detected",
            "name": "Detected"
          },
          {
            "key": "mitigated",
            "name": "Mitigated"
          },
          {
            "key": "rca_ready",
            "name": "RCA Ready"
          },
          {
            "key": "archived",
            "name": "Archived"
          }
        ]
      }
    }
  },
  "stage_diagram": {
    "controlling_field": "status",
    "starting_stage": "detected",
    "all_transitions_allowed": false,
    "stages": {
      "detected": {
        "transitions_to": ["mitigated", "archived", "rca_ready"],
        "state": "new"
      },
      "mitigated": {
        "transitions_to": ["archived", "detected"],
        "state": "work_in_progress"
      },
      "rca_ready": {
        "transitions_to": ["archived"],
        "state": "work_in_progress"
      },
      "archived": {
        "transitions_to": [],
        "state": "completed"
      }
    },
    "states": {
      "new": {
        "name": "New"
      },
      "work_in_progress": {
        "name": "Work in Progress"
      },
      "completed": {
        "name": "Completed",
        "is_end_state": true
      }
    }
  }
}

In the above example:

- The status field is the controlling field of the stage diagram.
- If a status field has no explicit transitions but you still want a stage diagram, set all_transitions_allowed to true, which creates a diagram where all the defined stages can transition to each other.
- External systems may categorize statuses (like Jira’s status categories), which can be included in the diagram metadata (states in the example).
- The starting_stage defines the initial stage for new object instances. This data should always be provided if available; otherwise, the starting stage is selected alphabetically.
- The order and human-readable name are taken from the enum values defined on the controlling field.
- If the states field is not provided, default DevRev states are used: open, in_progress, and closed.
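
When the transitions are retrieved at runtime, it can help to sanity-check the diagram before emitting it. The sketch below is a hypothetical helper, not part of chef-cli. It assumes that stage keys, transition targets, and the starting stage should all be values of the controlling enum field, and that each referenced state should be declared under states when that section is present.

```typescript
interface StageDiagram {
  controlling_field: string;
  starting_stage?: string;
  all_transitions_allowed?: boolean;
  stages: Record<string, { transitions_to: string[]; state?: string }>;
  states?: Record<string, { name: string; is_end_state?: boolean }>;
}

// Hypothetical helper: returns human-readable problems found in the diagram.
// enumKeys are the keys of the enum values on the controlling field.
function stageDiagramProblems(
  enumKeys: string[],
  diagram: StageDiagram
): string[] {
  const problems: string[] = [];
  const isEnumKey = (key: string): boolean => enumKeys.indexOf(key) !== -1;
  const stateNames = diagram.states ? Object.keys(diagram.states) : undefined;

  if (diagram.starting_stage !== undefined && !isEnumKey(diagram.starting_stage)) {
    problems.push(`starting_stage "${diagram.starting_stage}" is not an enum value`);
  }
  for (const stage of Object.keys(diagram.stages)) {
    if (!isEnumKey(stage)) {
      problems.push(`stage "${stage}" is not an enum value`);
    }
    const definition = diagram.stages[stage];
    if (!definition) continue;
    for (const target of definition.transitions_to) {
      if (!isEnumKey(target)) {
        problems.push(`stage "${stage}" transitions to unknown stage "${target}"`);
      }
    }
    if (
      definition.state !== undefined &&
      stateNames !== undefined &&
      stateNames.indexOf(definition.state) === -1
    ) {
      problems.push(`stage "${stage}" uses undeclared state "${definition.state}"`);
    }
  }
  return problems;
}
```

An empty array means the sketch found no inconsistencies; chef-cli validate-metadata remains the authoritative check.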

Declare custom link types

External record types that represent links between other record types can be imported as custom links, which means that each declared link type will be recreated in DevRev with the same names. This allows closer preservation of the original data without the need to map the links to DevRev’s predefined link types.

To achieve this, the external record type needs to have an enum field defined that represents the link types and you need to declare a special field called link_naming_data in the record type.

{
  "fields": {
    "type": {
      "name": "Link Type",
      "is_required": true,
      "type": "enum",
      "enum": {
        "values": [
          {
            "key": "1",
            "name": "Parent Of"
          },
          {
            "key": "2",
            "name": "Related"
          },
          {
            "key": "3",
            "name": "Blocks"
          }
        ]
      }
    }
  },
  "link_naming_data": {
    "link_type_field": "type",
    "link_direction_names": {
      "1": {
        "forward_name": "Parent of",
        "backward_name": "Child of"
      },
      "2": {
        "forward_name": "Relates To",
        "backward_name": "Relates To"
      },
      "3": {
        "forward_name": "Blocks",
        "backward_name": "Is Blocked By"
      }
    }
  }
}

In the above example:

- The external type field is declared as the link_type_field.
- The link_direction_names provide a mapping of each value in the link_type_field to their directional names. forward_name and backward_name can be the same, but both are required.
- The human-readable name of the link type is taken from the enum values defined on the link_type_field. The linkable object types in DevRev are defined based on the mappings of the source_id and target_id fields.
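
Because both directional names are required for each link type, a small check can flag incomplete entries before the metadata is emitted. This is a minimal sketch with hypothetical names; it assumes that every enum key of the link_type_field should have an entry in link_direction_names.

```typescript
interface LinkDirectionNames {
  forward_name?: string;
  backward_name?: string;
}

// Hypothetical helper: returns the enum keys of the link type field that
// lack an entry, a forward_name, or a backward_name.
function incompleteLinkTypes(
  enumKeys: string[],
  linkDirectionNames: Record<string, LinkDirectionNames>
): string[] {
  return enumKeys.filter((key) => {
    const entry = Object.prototype.hasOwnProperty.call(linkDirectionNames, key)
      ? linkDirectionNames[key]
      : undefined;
    return !entry || !entry.forward_name || !entry.backward_name;
  });
}
```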