Tasks

Before configuring the Orchestrator component, make sure that all the components you want to orchestrate (data source connectors, transformations, and data destination connectors) are configured and ready.

To configure the Orchestrator component, create a new Orchestration:

Screenshot - Orchestration Create

Add Tasks

The first step is to add orchestration tasks, i.e., the component configurations you want to run. Click Configure Tasks:

Screenshot - Orchestration Main Page

Continue with New Task:

Screenshot - Orchestration Tasks

A list of configured components is shown:

Screenshot - Orchestration Tasks

After selecting a component, a list of its configurations is shown. Clicking the plus button adds the desired configuration to the orchestration:

Screenshot - Orchestration Tasks Configurations

Repeat this for every configuration you want to add to the orchestration.

Organize Tasks

Let’s assume you have the following configurations and wish to orchestrate them into a data pipeline:

  • Adform data source connector with the Campaigns configuration
  • Snowflake data source connector with the Email recipient index configuration
  • Transformations with the configurations Campaign Performance and Campaign Recipient
  • Mailchimp data destination connector with the New recipients configuration

If you add the configurations as orchestration tasks in no particular order, you will likely end up with something similar to this:

Screenshot - Orchestration Tasks Added

Here comes an important rule:

Orchestration Phases execute sequentially, tasks within a Phase execute in parallel.

This means that phases run in the order in which they are listed, and a phase starts only after the previous phase has completely finished. The order of tasks within a phase, on the other hand, does not matter: they may execute in any order or in parallel. For a more in-depth explanation, see the notes about Job execution.
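To make the rule concrete, here is a minimal Python sketch of phase-by-phase execution. It is only an illustration, not how the Orchestrator is implemented: `run_orchestration` and the `run_task` callback are hypothetical names, and the behavior that a failing task stops all later phases is an assumption made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model: a phase is a list of task names,
# and an orchestration is an ordered list of phases.
Phase = List[str]


def run_orchestration(phases: List[Phase], run_task: Callable[[str], None]) -> None:
    """Run phases one after another; run the tasks inside each phase concurrently."""
    for phase in phases:
        # Start every task of the current phase at the same time.
        with ThreadPoolExecutor(max_workers=max(len(phase), 1)) as pool:
            futures = [pool.submit(run_task, task) for task in phase]
            # Block until every task in this phase is done before moving on.
            # result() re-raises a task's exception, which (in this sketch)
            # stops the remaining phases from starting.
            for future in futures:
                future.result()
```

Calling `run_orchestration([["Campaigns", "Email recipient index"], ["Campaign Performance"]], run_task)` would start both connectors together and start the transformation only after both have returned.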

When this rule is applied to the above task configuration, it leads to the following sequence of execution:

Orchestration Tasks Sequence

This means that both transformations and the Mailchimp data destination connector will run in parallel; when they finish, the Adform data source connector will run, and when it finishes, the Snowflake data source connector will run. This is clearly wrong: the data source connectors must run before the transformations, and the transformations must run before the data destination connector. Because this is a typical scenario, there is a dedicated feature for it, Group tasks by component type:
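The problem can also be stated as a check: every task's dependencies must sit in a strictly earlier phase, because tasks in the same phase may run in parallel. The sketch below is a hypothetical helper, `find_ordering_problems`, not part of the Orchestrator; it assumes you maintain your own map of which task depends on which, and that every task appears in some phase.

```python
from typing import Dict, List, Set, Tuple


def find_ordering_problems(
    phases: List[List[str]], depends_on: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (task, dependency) pairs where the dependency is not in an earlier phase."""
    # Index of the phase each task belongs to (assumes every task is placed).
    phase_of = {task: index for index, phase in enumerate(phases) for task in phase}
    problems = []
    for task, deps in depends_on.items():
        for dep in sorted(deps):
            # Same phase means "may run in parallel", which is not safe,
            # so the dependency must be in a strictly earlier phase.
            if phase_of[dep] >= phase_of[task]:
                problems.append((task, dep))
    return problems
```

With a hypothetical map in which Campaign Performance depends on both data source connectors, the unordered layout above is flagged, while the layout produced by grouping tasks by component type passes.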

Screenshot - Orchestration Tasks Order

The tasks are now reordered:

Screenshot - Orchestration Tasks Ordered

The above will lead to the following execution sequence:

Orchestration Tasks Sequence Organized

First, the two data source connectors run in parallel, then both transformations run in parallel, and finally the data destination connector sends the results to the consumer (the Mailchimp service in this case). Each configuration now runs only after the configurations it depends on have finished.
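Conceptually, grouping by component type buckets the tasks into one phase per type, ordered data source connectors, then transformations, then data destination connectors. The sketch below assumes the type labels `extractor`, `transformation`, and `writer` and a hypothetical function `group_by_component_type`; the actual feature may treat other component types differently.

```python
from typing import Dict, List

# Assumed execution order of component types (illustrative labels).
TYPE_ORDER = ["extractor", "transformation", "writer"]


def group_by_component_type(tasks: Dict[str, str]) -> List[List[str]]:
    """Build one phase per component type, in TYPE_ORDER.

    `tasks` maps a task name to its component type.
    """
    unknown = {kind for kind in tasks.values() if kind not in TYPE_ORDER}
    if unknown:
        # This sketch only knows the three types above.
        raise ValueError(f"unknown component types: {sorted(unknown)}")
    phases = []
    for component_type in TYPE_ORDER:
        phase = [name for name, kind in tasks.items() if kind == component_type]
        if phase:  # do not create empty phases
            phases.append(phase)
    return phases
```

Tasks keep their relative order inside each phase, which is harmless because that order does not affect execution.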

Handling Dependencies

What if the two transformations also depend on each other? Say that Campaign Recipient depends on Campaign Performance and therefore must run after it. You can achieve this by moving Campaign Recipient to a new phase. Select the Campaign Recipient task, click Actions, and choose Move selected tasks between phases:

Screenshot - Move Task

Type Second Transformation Phase to create a new orchestration phase:

Screenshot - Add Phase

The phase is created and contains the Campaign Recipient transformation. Now move the phase so that it executes after the phase containing Campaign Performance and before the phase containing the New recipients data destination connector:

Screenshot - Move Phase

The result should be this:

Screenshot - Phase Moved

This corresponds to the following execution sequence:

Orchestration Tasks Sequence Serialized

This means that the Campaigns and Email recipient index configurations execute first. When both have finished, the Campaign Performance transformation runs. When it finishes, the Campaign Recipient transformation runs. Finally, the New recipients data destination connector is executed.
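The manual steps above amount to placing each task one phase after the latest phase of anything it depends on. The sketch below derives that layout from a dependency map; `phases_from_dependencies` is a hypothetical helper for planning phases outside the UI, not an Orchestrator feature, and it assumes you describe the dependencies yourself.

```python
from typing import Dict, List, Set


def phases_from_dependencies(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Place each task one phase after the deepest of its dependencies.

    `depends_on` maps a task to the set of tasks it needs (empty set if none).
    Raises ValueError if the dependencies form a cycle.
    """
    level: Dict[str, int] = {}
    visiting: Set[str] = set()

    def assign(task: str) -> int:
        if task in level:
            return level[task]
        if task in visiting:
            raise ValueError(f"dependency cycle involving {task!r}")
        visiting.add(task)
        deps = depends_on.get(task, set())
        # Phase 0 when nothing is required, else right after the deepest dependency.
        level[task] = 1 + max((assign(dep) for dep in deps), default=-1)
        visiting.discard(task)
        return level[task]

    for task in depends_on:
        assign(task)
    phases: List[List[str]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for task in sorted(level):  # sorted only to make the output deterministic
        phases[level[task]].append(task)
    return phases
```

For the example in this section (Campaign Performance needing both data source connectors, Campaign Recipient needing Campaign Performance, and New recipients needing Campaign Recipient), this yields four phases in the same order as the sequence described above.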

Another way to handle dependencies is to use nested orchestrations.