Orchestrating Multiple AWS Glue Workflows with Step Functions

In modern data architectures, it is common to manage multiple ETL pipelines that must run in sequence or in parallel. AWS Glue provides a robust framework for building workflows, but when we need to orchestrate two or more Glue Workflows together, AWS Step Functions becomes the natural choice.

In this post, we will explain how to connect multiple Glue Workflows using Step Functions, so that one workflow triggers the next only after it succeeds, and the whole chain can be monitored from a single execution.

Why Use Step Functions with Glue Workflows?

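Although Glue Workflows let you chain jobs and crawlers inside a single workflow, Step Functions add control across workflows: explicit sequencing and branching between them, error handling through dedicated failure states, and a single execution history in which the whole chain can be monitored.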

Step Functions and Glue Integration

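Step Functions can call the Glue API directly from a Task state, and this pattern needs only two actions: startWorkflowRun, which starts a run of a workflow and returns its RunId, and getWorkflowRun, which returns the current state of that run, including its Status.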

By combining these two actions, we can implement a simple pattern:

  1. Start Workflow A.
  2. Wait for it to finish, checking its status periodically.
  3. If it succeeded, trigger Workflow B.
  4. If it failed, stop or take an alternative action.
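The four steps above can be prototyped outside Step Functions, which is a quick way to check the decision logic before encoding it in a state machine. This is a minimal sketch, assuming a boto3 Glue client (or any object exposing the same `start_workflow_run` and `get_workflow_run` methods); the helper names `wait_for_workflow`, `succeeded`, and `run_a_then_b` are hypothetical, and treating a run as successful only when it is `COMPLETED` with zero `FailedActions` is our assumption about what "success" should mean.

```python
import time
from typing import Any, Callable

# GetWorkflowRun Run.Status values that mean the run has not finished yet.
IN_PROGRESS = {"RUNNING", "STOPPING"}


def wait_for_workflow(glue: Any, name: str, run_id: str, poll_seconds: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GetWorkflowRun until the run leaves RUNNING/STOPPING; return the Run dict."""
    while True:
        run = glue.get_workflow_run(Name=name, RunId=run_id, IncludeGraph=False)["Run"]
        if run["Status"] not in IN_PROGRESS:
            return run
        sleep(poll_seconds)


def succeeded(run: dict) -> bool:
    """Assumption: success means COMPLETED with no failed actions in the run statistics."""
    return run["Status"] == "COMPLETED" and run.get("Statistics", {}).get("FailedActions", 0) == 0


def run_a_then_b(glue: Any, workflow_a: str, workflow_b: str, **poll_kwargs) -> str:
    """Start A, wait for it, and start B only if A succeeded; returns B's RunId."""
    run_id_a = glue.start_workflow_run(Name=workflow_a)["RunId"]          # 1. start A
    run_a = wait_for_workflow(glue, workflow_a, run_id_a, **poll_kwargs)  # 2. wait
    if not succeeded(run_a):                                              # 4. stop on failure
        raise RuntimeError(f"{workflow_a} did not finish successfully (status {run_a['Status']})")
    return glue.start_workflow_run(Name=workflow_b)["RunId"]             # 3. start B
```

With real credentials you would call something like `run_a_then_b(boto3.client("glue"), "WorkflowA", "WorkflowB")`. Passing the client in, rather than creating it inside the function, keeps the logic testable with a stub, and the injectable `sleep` avoids real waits in tests.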

Example: Sequential Execution of Two Workflows

Below is a simplified Step Functions state machine definition in Amazon States Language (ASL):

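{
  "Comment": "Execute two Glue Workflows sequentially",
  "StartAt": "Start Workflow A",
  "States": {
    "Start Workflow A": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:glue:startWorkflowRun",
      "Parameters": {
        "Name": "WorkflowA"
      },
      "ResultPath": "$.WorkflowA",
      "Next": "Wait A"
    },
    "Wait A": {
      "Type": "Wait",
      "Seconds": 60,
      "Next": "Check A"
    },
    "Check A": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:glue:getWorkflowRun",
      "Parameters": {
        "Name": "WorkflowA",
        "RunId.$": "$.WorkflowA.RunId",
        "IncludeGraph": false
      },
      "ResultPath": "$.StatusA",
      "Next": "Decide A"
    },
    "Decide A": {
      "Type": "Choice",
      "Choices": [
        {
          "Or": [
            { "Variable": "$.StatusA.Run.Status", "StringEquals": "RUNNING" },
            { "Variable": "$.StatusA.Run.Status", "StringEquals": "STOPPING" }
          ],
          "Next": "Wait A"
        },
        {
          "And": [
            { "Variable": "$.StatusA.Run.Status", "StringEquals": "COMPLETED" },
            { "Variable": "$.StatusA.Run.Statistics.FailedActions", "NumericEquals": 0 }
          ],
          "Next": "Start Workflow B"
        }
      ],
      "Default": "Fail A"
    },
    "Start Workflow B": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:glue:startWorkflowRun",
      "Parameters": {
        "Name": "WorkflowB"
      },
      "End": true
    },
    "Fail A": {
      "Type": "Fail",
      "Error": "WorkflowAFailed",
      "Cause": "Workflow A did not finish successfully"
    }
  }
}

Glue workflows have no optimized Step Functions integration (that covers Glue jobs), so both Task states call the Glue API through the AWS SDK integration, arn:aws:states:::aws-sdk:glue:<action>. The state machine polls Workflow A every 60 seconds while its run is RUNNING or STOPPING, and starts Workflow B only if Workflow A ends with a COMPLETED status and no failed actions. A Glue workflow run reports COMPLETED even when some of its jobs failed, and it never reports SUCCEEDED, so checking Run.Statistics.FailedActions is what distinguishes a successful run. Any other outcome, such as STOPPED or ERROR, leads to the Fail A state, so the Step Functions execution fails and you can configure alerts on it. Note that Workflow B is only started, not awaited; repeat the Wait/Check/Decide loop after it if the execution should also track B's result.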
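The same Start/Wait/Check/Decide pattern extends to any number of workflows, which is tedious to write by hand. The sketch below generates such a definition as a Python dict; it is a minimal illustration under our own assumptions: the function names `glue_sdk` and `build_chain` are hypothetical, it uses the AWS SDK integration ARN form `arn:aws:states:::aws-sdk:glue:<action>`, it treats `COMPLETED` with zero `FailedActions` as success, and, unlike a start-and-forget final step, it also waits for the last workflow.

```python
import json
from typing import Sequence


def glue_sdk(action: str) -> str:
    """Hypothetical helper: Step Functions AWS SDK integration ARN for a Glue API action."""
    return f"arn:aws:states:::aws-sdk:glue:{action}"


def build_chain(workflows: Sequence[str], wait_seconds: int = 60) -> dict:
    """Return an ASL definition that runs Glue workflows strictly one after another.

    Each workflow is started, polled every `wait_seconds` while it is RUNNING or
    STOPPING, and the next one starts only if the run COMPLETED with zero
    FailedActions; any other outcome goes to that workflow's Fail state.
    """
    if not workflows or len(set(workflows)) != len(workflows):
        raise ValueError("need a non-empty list of distinct workflow names")
    states: dict = {}
    for i, wf in enumerate(workflows):
        run_path = f"$.run{i}"        # StartWorkflowRun result (holds RunId)
        status_path = f"$.status{i}"  # latest GetWorkflowRun result
        nxt = f"Start {workflows[i + 1]}" if i + 1 < len(workflows) else "Done"
        states[f"Start {wf}"] = {
            "Type": "Task",
            "Resource": glue_sdk("startWorkflowRun"),
            "Parameters": {"Name": wf},
            "ResultPath": run_path,
            "Next": f"Wait {wf}",
        }
        states[f"Wait {wf}"] = {"Type": "Wait", "Seconds": wait_seconds, "Next": f"Check {wf}"}
        states[f"Check {wf}"] = {
            "Type": "Task",
            "Resource": glue_sdk("getWorkflowRun"),
            "Parameters": {"Name": wf, "RunId.$": f"{run_path}.RunId", "IncludeGraph": False},
            "ResultPath": status_path,
            "Next": f"Decide {wf}",
        }
        status = f"{status_path}.Run.Status"
        failed = f"{status_path}.Run.Statistics.FailedActions"
        states[f"Decide {wf}"] = {
            "Type": "Choice",
            "Choices": [
                {   # still in progress: wait and poll again
                    "Or": [{"Variable": status, "StringEquals": s} for s in ("RUNNING", "STOPPING")],
                    "Next": f"Wait {wf}",
                },
                {   # finished with no failed actions: move on
                    "And": [
                        {"Variable": status, "StringEquals": "COMPLETED"},
                        {"Variable": failed, "NumericEquals": 0},
                    ],
                    "Next": nxt,
                },
            ],
            "Default": f"Fail {wf}",
        }
        states[f"Fail {wf}"] = {
            "Type": "Fail",
            "Error": "GlueWorkflowFailed",
            "Cause": f"{wf} did not finish successfully",
        }
    states["Done"] = {"Type": "Succeed"}
    return {
        "Comment": "Run Glue workflows sequentially",
        "StartAt": f"Start {workflows[0]}",
        "States": states,
    }


if __name__ == "__main__":
    print(json.dumps(build_chain(["WorkflowA", "WorkflowB"]), indent=2))
```

Result paths are keyed by position (`$.run0`, `$.status0`, ...) rather than by workflow name, so names containing characters such as hyphens do not produce invalid JSONPath expressions. The printed JSON can be pasted into the Step Functions console or passed as the `definition` of a state machine.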

Conclusion

By combining AWS Glue Workflows with Step Functions, you can run several workflows as one controlled pipeline: each one starts only when the previous one has succeeded, a failure stops the chain at an explicit Fail state, and the whole run can be followed in a single execution history. This approach lets you build complex data pipelines that span multiple workflows while keeping control over execution flow and error handling.
