
Filtering Items in Azure Data Factory: Excluding Items That Begin with an Underscore

Azure Data Factory (ADF) is a powerful tool for building ETL (Extract, Transform, Load) workflows in the cloud. One common requirement is to filter data or files based on certain conditions. In this post, we’ll explore how to use the Filter activity in ADF to exclude items that begin with an underscore ("_"), which is useful when you want to skip temporary or system files.

Use Case

Imagine you are working with a list of files or data records, and you want to process only those that do not start with an underscore. Files starting with an underscore might be temporary files or files meant to be ignored in your processing. The Filter activity in Azure Data Factory allows you to achieve this efficiently.

Steps to Filter Items That Do Not Start with "_"

  1. Create a Pipeline: Start by creating a new pipeline in Azure Data Factory. You can do this by navigating to the ADF interface and selecting the option to create a new pipeline.
  2. Add a Filter Activity:
    • Drag and drop the Filter activity from the activities pane onto your pipeline canvas.
  3. Configure the Filter Activity:
    • Click on the Filter activity to configure its settings.
    • In the Items property, specify the array you want to filter. This usually comes from a previous activity, such as a Lookup or Get Metadata activity. For example, you might reference the output of a Get Metadata activity that lists files:
@activity('GetMetadataActivityName').output.childItems

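  4. Set the Filter Condition:
    • In the Condition property, enter an expression that returns true for each item you want to keep. The expression below keeps every item whose name does not start with an underscore:
@not(startswith(item().name, '_'))
  5. Connect the Filtered Output:
    • Reference the filtered array in a downstream activity, such as a ForEach. The Filter activity returns the matching items in the value property of its output:
@activity('FilterActivityName').output.value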
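
The same configuration can also be written in the pipeline's JSON definition (the code view in the ADF authoring UI). The following is a minimal sketch under a few assumptions, not a complete pipeline: the pipeline name, the dataset name SourceFolderDataset, the ForEach activity name, and the Wait activity inside the loop are hypothetical placeholders. Replace the Wait activity with whatever processing you need.

```json
{
  "name": "FilterUnderscoreFilesPipeline",
  "properties": {
    "activities": [
      {
        "name": "GetMetadataActivityName",
        "description": "Lists the folder contents. SourceFolderDataset is a hypothetical dataset name.",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": {
            "referenceName": "SourceFolderDataset",
            "type": "DatasetReference"
          },
          "fieldList": ["childItems"]
        }
      },
      {
        "name": "FilterActivityName",
        "description": "Keeps only items whose name does not start with an underscore.",
        "type": "Filter",
        "dependsOn": [
          {
            "activity": "GetMetadataActivityName",
            "dependencyConditions": ["Succeeded"]
          }
        ],
        "typeProperties": {
          "items": {
            "value": "@activity('GetMetadataActivityName').output.childItems",
            "type": "Expression"
          },
          "condition": {
            "value": "@not(startswith(item().name, '_'))",
            "type": "Expression"
          }
        }
      },
      {
        "name": "ForEachFilteredFile",
        "description": "Hypothetical downstream loop over the filtered items.",
        "type": "ForEach",
        "dependsOn": [
          {
            "activity": "FilterActivityName",
            "dependencyConditions": ["Succeeded"]
          }
        ],
        "typeProperties": {
          "items": {
            "value": "@activity('FilterActivityName').output.value",
            "type": "Expression"
          },
          "activities": [
            {
              "name": "PlaceholderWait",
              "description": "Placeholder only. Replace with your own processing activity.",
              "type": "Wait",
              "typeProperties": {
                "waitTimeInSeconds": 1
              }
            }
          ]
        }
      }
    ]
  }
}
```

Inside the ForEach, each file name is available as @item().name, just as it is inside the Filter condition.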

Example Scenario

Let’s say you have a Get Metadata activity that retrieves a list of files from a storage account. You want to process only the files that do not start with an underscore. By using the Filter activity with the condition specified above, you can ensure that only the desired files are passed on for further processing.
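
To make the scenario concrete, here is an illustrative sketch of the data on both sides of the Filter activity. The file names are invented for this example, and the two wrapper keys (getMetadataOutput and filterOutput) are only labels to show both payloads in one block. The output property names shown (ItemsCount, FilteredItemsCount, Value) are what the Filter activity typically reports in the monitoring view; treat the exact casing as an assumption and check it against your own run output.

```json
{
  "getMetadataOutput": {
    "childItems": [
      { "name": "_SUCCESS", "type": "File" },
      { "name": "_tmp_orders.csv", "type": "File" },
      { "name": "orders.csv", "type": "File" },
      { "name": "customers.csv", "type": "File" }
    ]
  },
  "filterOutput": {
    "ItemsCount": 4,
    "FilteredItemsCount": 2,
    "Value": [
      { "name": "orders.csv", "type": "File" },
      { "name": "customers.csv", "type": "File" }
    ]
  }
}
```

The two items whose names begin with an underscore are dropped, and only the remaining two are handed to downstream activities.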

Summary

Using the Filter activity in Azure Data Factory with a @not(startswith(...)) condition is a straightforward way to exclude items based on a naming convention, such as files that start with an underscore. This approach lets downstream activities work only on the relevant items.

With these steps, you can easily set up a filtering mechanism in your ADF pipelines, ensuring that only the data you need is processed, saving time and resources.

